Protecting passwords in web applications
04 April 2016
After years of developing web applications, one question comes up time and time again: what is the safest way to store passwords in a system?
There are a number of ways to store passwords:
- Clear text
- Encrypted
- Hashing
- Password-based key derivation functions
Clear Text
If someone enters their password as "dog1", then the password is stored in the database as "dog1". Storing the password in this format means it can be "seen" by anyone with access to the database, the server or the backups, including employees and developers, and potentially by attackers while the data is in transit (man-in-the-middle attacks between application and database).
The below image shows what someone with access to your system can "see".
The problem with passwords is that users tend to reuse the same password for more than one site or service. Someone who steals my "dog1" password might therefore be able to access my Active Directory network login, my Facebook account and my online banking.
Another option that is mildly better is to encrypt the password.
Encryption
At a very basic level, encryption takes a piece of plain text and converts it into what looks like random gibberish. How that gibberish is produced depends on a mix of mathematics and the encryption key.
Suppose we choose a very common encryption method called AES (Advanced Encryption Standard), which was originally called Rijndael. Using the encryption key "iloveanimals", my encrypted "dog1" value in the database looks much better at a casual glance.
Without the key ("iloveanimals") the value is very secure, government-level secure. The weak point is the security of the key itself. The common problems with using encryption for passwords are:
- The password can be recovered. If an attacker learns the decryption key, or brute forces it, they retrieve the original password and can then use it on other systems the user has access to. There is no legitimate security reason for anyone else to know a person's password; password resets are all that is needed. Sharing passwords is never acceptable, because accountability is immediately lost.
- Storing the encryption/decryption key is a problem. The question of how to transport and store the key securely has been around for as long as symmetric encryption. The system needs the key every time a password is checked, so it must be readily available. Do you then encrypt the encryption key? If so, how do you protect that key?
So if you encrypt the passwords and an attacker steals the data containing them, they might well steal the key as well. They can then decrypt the whole database of passwords in a few seconds.
So what is the best, and really only, way to protect passwords? Answer: Hashing.
Hashing
Hashing is a mathematical algorithm that I will describe in more detail in later articles, but for now think of it as being like encryption, except that there is no encryption/decryption key.
The algorithm converts the given plain text into a fixed-length output that looks random. One advantage is that there is no key to protect. Another is that the output has the same length regardless of the input length, so it gives nothing away about how long the password was. Hash functions are also designed to be one-way, which makes it infeasible to reverse the output back into the input. My "dog1" when hashed with SHA-256 is shown below.
Just like the encryption above, I can't read what the original text is. Unlike encryption, though, there is no known mathematical way of returning the original value, and with no key involved there is nothing extra to protect.
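To make the fixed-length, keyless behaviour concrete, here is a minimal Python sketch using the standard library's hashlib and base64 modules. The helper name sha256_base64 is made up for this example, and Base64 is only assumed as the storage encoding.

```python
import base64
import hashlib


def sha256_base64(text: str) -> str:
    """Return the SHA-256 digest of `text`, Base64-encoded for storage."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()  # always 32 bytes
    return base64.b64encode(digest).decode("ascii")


short_hash = sha256_base64("dog1")
long_hash = sha256_base64("a much longer password than dog1")

print(short_hash)
print(long_hash)
# Both outputs have the same length, whatever the input length was.
print(len(short_hash), len(long_hash))
```

Note that no key appears anywhere in the call: the only input is the text being hashed.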
The problem with hashing a string is that the same input always produces the same output. A SHA-256 hash of the string 12345678 will always give the same result. Someone attacking a database of hashed passwords can look each hash up in readily available rainbow tables. These tables are precomputed from millions, or more, of likely strings, and the attacker is simply looking for a match. A match tells them, almost certainly, what the original password was. (There are a finite number of possible hashes, so in principle two different inputs can produce the same hash, which is called a collision, but for SHA-256 this is vanishingly unlikely in practice.)
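The sketch below illustrates the idea with a plain precomputed lookup table, which is a simpler relative of a real rainbow table (rainbow tables use hash chains to save space, which is not shown here). The password list, usernames and stolen rows are all invented for the example.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Unsalted SHA-256 of a string, as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Prepared once, ahead of time, from a list of common passwords.
COMMON_PASSWORDS = ["123456", "password", "12345678", "qwerty", "dog1"]
LOOKUP_TABLE = {sha256_hex(p): p for p in COMMON_PASSWORDS}

# Hypothetical stolen database rows: (username, unsalted SHA-256 hash).
stolen_rows = [
    ("alice", sha256_hex("12345678")),
    ("bob", sha256_hex("dog1")),
    ("carol", sha256_hex("correct horse battery staple")),
]

# Cracking is now a dictionary lookup per row, with no hashing at attack time.
cracked = {user: LOOKUP_TABLE[h] for user, h in stolen_rows if h in LOOKUP_TABLE}
print(cracked)  # {'alice': '12345678', 'bob': 'dog1'}
```

Only the password that was not in the precomputed list survives, which is why the same-input-same-output property is such a weakness.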
Another problem with general-purpose hashes is that they are designed to be very fast to compute. This means brute force attacks can be conducted very, very quickly. Using a SHA-2 function such as SHA-256 is an improvement over SHA-1 or MD5, but as computers gain speed something else needs to be done.
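As a rough feel for that speed, the following sketch brute forces the unsalted SHA-256 hash of "dog1" by trying every combination of lowercase letters and digits up to four characters. The alphabet and length limit are assumptions chosen to keep the example small; a real attacker would use dedicated hardware and far larger search spaces.

```python
import hashlib
import itertools
import string
import time
from typing import Optional

ALPHABET = string.ascii_lowercase + string.digits  # 36 symbols


def brute_force(target_hex: str, max_length: int = 4) -> Optional[str]:
    """Try every candidate up to max_length until one hashes to target_hex."""
    for length in range(1, max_length + 1):
        for candidate in itertools.product(ALPHABET, repeat=length):
            guess = "".join(candidate)
            if hashlib.sha256(guess.encode("utf-8")).hexdigest() == target_hex:
                return guess
    return None  # nothing in the search space matched


target = hashlib.sha256(b"dog1").hexdigest()
start = time.perf_counter()
found = brute_force(target)
elapsed = time.perf_counter() - start
print(f"recovered {found!r} in {elapsed:.2f} seconds")
```

Even in plain, unoptimised Python this takes a couple of hundred thousand guesses and very little time, because each individual hash is so cheap.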
Password-based key derivation functions
Password-based key derivation functions (KDFs) address both problems. They stop the same input string from producing the same output hash, which combats rainbow tables, and they slow down each brute force password guess.
To ensure the same string, 12345678, does not compute to the same hash, we need to add some variety into the calculation. This variety is known as a salt: a string unique to each entry, commonly stored alongside the password hash in the database. The KDF takes the salt plus the password as input to compute the output, which should be different for every entry, assuming the salt has been generated using a suitable cryptographically secure PRNG (pseudo-random number generator).
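Here is a small sketch of the effect of a salt, using hashlib.pbkdf2_hmac and secrets.token_bytes from the Python standard library. The 16-byte salt length and the helper name derive are assumptions for illustration, and the round count is simply the 10,000 figure used as an example in this article.

```python
import hashlib
import secrets

ITERATIONS = 10_000  # number of rounds; an example figure, not a recommendation


def derive(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte value from the password and salt with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


# Two users pick the same password but get different random salts.
salt_a = secrets.token_bytes(16)  # drawn from the OS's secure random source
salt_b = secrets.token_bytes(16)

hash_a = derive("12345678", salt_a)
hash_b = derive("12345678", salt_b)

print(hash_a.hex())
print(hash_b.hex())
print(hash_a == hash_b)  # False: same password, different stored values
```

Because the salt is stored next to the hash, the system can still recompute the same value at login, but a table precomputed without that salt is useless.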
To control how quickly the output value can be derived, the KDF also introduces a tactic known as 'rounds'. This is the number of times the mathematical operation is performed on the input string. You could set the KDF to conduct 10,000 rounds on the string; this adds time to computing the output, slowing down brute force attacks.
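The cost of extra rounds is easy to measure. The sketch below times PBKDF2-HMAC-SHA256 at a few round counts; the helper name time_derivation and the fixed salt are assumptions made only so the comparison is repeatable, and the actual timings will depend on your hardware.

```python
import hashlib
import time


def time_derivation(rounds: int) -> float:
    """Return the seconds taken to run PBKDF2-HMAC-SHA256 with `rounds` rounds."""
    salt = b"example-salt-16b"  # fixed for a repeatable timing; use a random salt in practice
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"12345678", salt, rounds)
    return time.perf_counter() - start


for rounds in (1, 10_000, 100_000):
    print(f"{rounds:>7} rounds: {time_derivation(rounds) * 1000:.2f} ms")
```

A legitimate login pays this cost once, while an attacker pays it for every single guess, which is where the slowdown comes from.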
If you are planning to talk to your development or security team about password-based KDFs, the implementations they are most likely to know are PBKDF2 (the one I favour) and bcrypt.
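Putting the pieces together, a minimal store-and-verify sketch with PBKDF2 might look like the following. It uses only the Python standard library. The function names, the "$"-separated record layout and the constants are all hypothetical choices for this example rather than any standard format, and the round count is the example figure from this article; current guidance generally recommends considerably more.

```python
import base64
import hashlib
import hmac
import secrets

ALGORITHM = "pbkdf2_sha256"  # label stored with each record (hypothetical format)
ROUNDS = 10_000              # example figure; tune upwards for your hardware
SALT_BYTES = 16


def _b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def hash_password(password: str, rounds: int = ROUNDS) -> str:
    """Return a record holding the algorithm, rounds, salt and derived hash."""
    salt = secrets.token_bytes(SALT_BYTES)  # fresh random salt per password
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, rounds)
    return "$".join([ALGORITHM, str(rounds), _b64(salt), _b64(derived)])


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash from the stored salt and rounds, then compare."""
    algorithm, rounds, salt_b64, hash_b64 = stored.split("$")
    if algorithm != ALGORITHM:
        return False
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(hash_b64)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, int(rounds))
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(derived, expected)


record = hash_password("dog1")
print(record)
print(verify_password("dog1", record))  # True
print(verify_password("dog2", record))  # False
```

Storing the round count inside each record means it can be raised later for new passwords without breaking verification of existing ones.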
Summary
Password storage should be treated differently to general data confidentiality. After all, most data you encrypt you want to be able to retrieve at a later date; passwords should never fall into that category.
Side note: savvy readers might have noticed that the hashed password in the image example is shown in Base64. This is a common way of storing binary data in a database.