Intelligent Password Storage
By stretch | Monday, February 21, 2011 at 5:12 a.m. UTC
I want to rant for a moment about websites and passwords. It's depressing to think that a decade into the new millennium, some web application developers still can't get their heads around the concept of secure password storage. And it's not just the amateurs. Every time I see an error message like the one below telling me my password is too long or that it contains a "special" character, I die a little inside.
Why? It means that the password is being stored in plain text. For those unfamiliar with database programming, fields (columns) must be created with a declared width prior to storing any data. Thus, a negligent programmer might create a table like the following:
+---------------+---------------+
| username(14)  | password(10)  |
+---------------+---------------+
| mcguirk       | s0cc3r        |
| brendon       | f1nal_cUt     |
| melissa       | Aquaph0be4    |
| jason         | h0meal0n3     |
+---------------+---------------+
The programmer then enforces two types of input restrictions:
- Length - Passwords chosen by users are limited in length to the width of the database column (in this example, 10 characters).
- Special characters - Certain characters (e.g. ', ", ?, etc.) are forbidden out of fear of SQL injection.
This approach has two huge flaws. First, it prevents users from choosing strong passwords. Second, should the application's database be breached, all of the account credentials can be easily gleaned. Attackers like to try reusing an individual's login and password from one site on other popular sites because they know people are lazy and tend to reuse passwords.
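On the special-character restriction: fear of SQL injection is no reason to ban characters, because a parameterized query hands user input to the database as a value rather than as SQL. Here's a minimal sketch using Python's built-in sqlite3 module. The table layout, the sample row, and the find_user helper are all made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, email TEXT)")
conn.execute(
    "INSERT INTO users (username, email) VALUES (?, ?)",
    ("mcguirk", "mcguirk@example.com"),
)


def find_user(conn, username):
    """Look up a user; the ? placeholder binds the input as a plain value."""
    cur = conn.execute(
        "SELECT username FROM users WHERE username = ?", (username,)
    )
    return cur.fetchone()


# Quotes in the input are compared literally, not interpreted as SQL.
print(find_user(conn, "mcguirk"))
print(find_user(conn, "mcguirk' OR '1'='1"))
```

The second lookup simply finds no user with that odd name, so there's nothing to gain by rejecting quotes or question marks up front.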
So what's the right way to store passwords? Passwords are an interesting type of data in that they never need to be displayed by the application. Your username needs to be retrieved in order to display a "Welcome, $username!" message. Your email address needs to be retrieved in order to send you an email. But your password never needs to be retrieved, only compared with a password you provide. This is where one-way hashing comes in handy.
I wrote an article a while back explaining how Cisco IOS MD5 hashes are generated. A plaintext (human-readable) password is provided at the command line. From this password and a randomly generated cryptographic salt, a one-way hash with a very high probability of uniqueness is generated. The same concept should be employed with credentials stored in a database.
As an example, let's look at how user passwords are stored for this site. PacketLife.net runs on the Django Python framework, so all passwords are stored as salted SHA-1 hashes in the database. This means that:
- Passwords can be as long as you're willing to type in.
- Passwords can contain any character.
- A compromise of the database will not readily reveal your password.
It's true that an attacker could perform a brute force attack against a password hash, but this is computationally intensive (the work grows exponentially with the length of the password), and the cryptographic salt defeats time-memory trade-offs like rainbow tables.
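To get a rough feel for how brute-force cost scales with password length, here's a back-of-the-envelope sketch. The 62-character alphabet and the guess rate are assumptions picked for illustration, not measurements of any real attack.

```python
def keyspace(alphabet_size: int, length: int) -> int:
    """Number of candidate passwords of exactly this length."""
    return alphabet_size ** length


ALPHABET = 62                # assumption: a-z, A-Z, 0-9
GUESSES_PER_SECOND = 10**9   # hypothetical attacker speed
SECONDS_PER_YEAR = 3600 * 24 * 365

for length in (6, 8, 10, 12):
    candidates = keyspace(ALPHABET, length)
    years = candidates / GUESSES_PER_SECOND / SECONDS_PER_YEAR
    print(f"{length:2d} chars: {candidates:.2e} candidates, "
          f"about {years:.2e} years to try them all")
```

Each extra character multiplies the work by the size of the alphabet, which is why a length cap hurts so much.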
If the example given earlier were reworked to employ proper password hashing, the resulting table would look like this:
+--------------+-----------------------------------------------------+
| username(14) | password_hash(52)                                   |
+--------------+-----------------------------------------------------+
| mcguirk      | sha1$39502$38673599e0815168ee723342a5f51b24df0b7605 |
| brendon      | sha1$295c6$a38850d2a15d7042476cde2a83efffaaae3e781e |
| melissa      | sha1$ee509$48df698e8a9c7922de0c3c5191acd9046debd7f7 |
| jason        | sha1$7f9a2$74eb751493f49e5cfe83847f58d97b42177fc06f |
+--------------+-----------------------------------------------------+
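For the curious, here's a minimal sketch of how a value in that algorithm$salt$hash layout can be produced and checked with nothing but the Python standard library. It assumes the digest is SHA-1 over the salt followed by the password and that the salt is five hex characters, which matches the shape of the rows above. The function names are my own and this is not Django's actual code.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def make_hash(password: str, salt: Optional[str] = None) -> str:
    """Return 'sha1$<salt>$<hexdigest>' for the given password."""
    if salt is None:
        # Assumption: a 5-character random hex salt, as in the table above.
        salt = secrets.token_hex(3)[:5]
    digest = hashlib.sha1((salt + password).encode("utf-8")).hexdigest()
    return f"sha1${salt}${digest}"


def check_password(password: str, stored: str) -> bool:
    """Re-hash the candidate with the stored salt and compare."""
    algorithm, salt, _digest = stored.split("$")
    if algorithm != "sha1":
        return False
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(make_hash(password, salt), stored)


stored = make_hash("Aquaph0be4 with spaces, 'quotes' and as many characters as you like")
print(stored)
print(check_password("wrong guess", stored))
```

However long or strange the password, the stored value comes out the same width, so neither a length limit nor a character blacklist is needed.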
Posted in Security
February 21, 2011 at 5:21 a.m. UTC
Yeah one of the biggest giveaways about terrible password storage is when passwords are length limited, limited to A-Za-z0-9 or even emailing it in plaintext back to you (hello MailMan).
If you get to write your own software stack it's probably easiest to use bcrypt because it's harder to brute force. A nice writeup about why to use bcrypt for passwords is at http://codahale.com/how-to-safely-store-a-password/
(There are ones with higher work AND memory usage too, but bcrypt is supported nearly everywhere) :)
February 21, 2011 at 7:48 a.m. UTC
Time for gpgAuth. One password everywhere, securely.
February 21, 2011 at 8:13 a.m. UTC
What I have also seen is sites requiring passwords exactly 8 characters long, with exactly one special character, one number, and one capital letter.
While it is a good idea to encourage this, it is bad to enforce it. They just reduced the number of possible passwords, and thus made brute-forcing the passwords easier.
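A quick count shows how much such a rule shrinks the search space. This sketch assumes printable ASCII, that the other five characters must be lowercase letters, and that "special" means the 32 ASCII punctuation characters. Those are assumptions on my part, since real sites differ.

```python
import string

LOWER = len(string.ascii_lowercase)   # 26
UPPER = len(string.ascii_uppercase)   # 26
DIGITS = len(string.digits)           # 10
SPECIAL = len(string.punctuation)     # 32
LENGTH = 8

# No rules: any of the 94 characters in each of the 8 positions.
unrestricted = (LOWER + UPPER + DIGITS + SPECIAL) ** LENGTH

# Exactly one special, one digit and one capital: pick a distinct position
# for each of the three, then fill the other five with lowercase letters.
positions = LENGTH * (LENGTH - 1) * (LENGTH - 2)
restricted = positions * SPECIAL * DIGITS * UPPER * LOWER ** (LENGTH - 3)

print(f"unrestricted: {unrestricted:.2e}")
print(f"restricted:   {restricted:.2e}")
print(f"share of the 8-character space left: {restricted / unrestricted:.2%}")
```

Under these assumptions the rule leaves less than one percent of the 8-character passwords an attacker would otherwise have to consider.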
February 21, 2011 at 8:51 a.m. UTC
This is all good and well as long as you use simple password mechanisms in the frontend (i.e. plain text), which, depending on the situation, can be stupid in itself.
If you want to use secure password mechanisms in the frontend, like CRAM-MD5 or CHAP, the backend needs to know the plain text password, so simply hashing the password does not work.
Any suggestions for that? One could symmetrically encrypt the password in the database, but that would mean distributing the key to all applications that work with the password.
February 21, 2011 at 10:26 a.m. UTC
A quirk that many people are not aware of is that the original Unix crypt(1) function actually truncated passwords to eight chars (and only cared about the lower 7 bits in each char).
I used to work at a company that had Digital Unix servers (later Tru64) running on Alpha boxen, and an IT manager who thought he was clever with a very long and convoluted root password. He looked at me nervously one day when I logged in after only typing the first eight characters of it.
February 21, 2011 at 12:40 p.m. UTC
February 21, 2011 at 6:04 p.m. UTC
Gawker passwords were done properly(ish). They were randomly salted crypt() hashes, anyway. The plaintext passwords that got published were uncovered by a dictionary attack.
I used to work at a place that had very paranoid policies. Users weren't allowed to choose their own passwords out of fear that they'd use their dog's name, wife's birthday, etc...
Instead, users had to choose a password from a list of choices presented by the password change utility. This utility had some understanding of English word construction (vowels, consonants) in order to produce generally pronounceable passwords, which sometimes wound up being actual English words.
The thing is, a long look at this system's proposed passwords made it possible to figure out how it was putting words together.
Oh, and the utility used the same salt every time.
An attack using a custom dictionary (based on the words it tended to construct) snagged 90% of the passwords. Dumb dumb dumb.
February 21, 2011 at 6:58 p.m. UTC
Lalufu: it works by a double hashing step.
P is the plaintext password.
H(x) is the hashing function.
When you first enter your password, the database calculates H(P) and stores that in the DB.
When it wants to authenticate a client, it sends a nonce, N.
The client then calculates H(N + H(P)) and sends that to the server.
The server, already knowing N and H(P), can then calculate H(N + H(P)), and if that matches what the client sent, then the client is authenticated.
This keeps the plaintext password from ever being sent over the wire, as well as provides replay-attack protection.
Edit: I'm not sure exactly how things like CRAM-MD5 and CHAP work, but I believe the implementation is very similar to what I described above.
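The exchange described above can be sketched in a few lines. This assumes H is SHA-1 and that "+" means byte concatenation. Both are guesses for illustration, and this is not how CRAM-MD5 or CHAP are actually specified.

```python
import hashlib
import hmac
import secrets


def H(data: bytes) -> bytes:
    """The hashing function; SHA-1 is an assumption here."""
    return hashlib.sha1(data).digest()


def client_response(nonce: bytes, password: bytes) -> bytes:
    """Client side: compute H(N + H(P)) from the typed password."""
    return H(nonce + H(password))


def server_verify(nonce: bytes, stored_hash: bytes, response: bytes) -> bool:
    """Server side: recompute H(N + H(P)) from the stored H(P)."""
    return hmac.compare_digest(H(nonce + stored_hash), response)


# Enrollment: the server stores H(P), never P itself.
stored = H(b"f1nal_cUt")

# Login: the server issues a fresh nonce, the client answers.
nonce = secrets.token_bytes(16)
response = client_response(nonce, b"f1nal_cUt")
print(server_verify(nonce, stored, response))

# A captured response is useless against a different nonce.
print(server_verify(secrets.token_bytes(16), stored, response))
```

Note that server_verify only ever needs the stored H(P), so in this scheme that stored value is effectively the credential.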
February 21, 2011 at 7:07 p.m. UTC
Nice article. I'm currently building a web app and was planning to employ MD5 hashing but this has made me think, now I will use SHA-1 instead.
Also, SSL encrypted login pages are on the todo list.
February 22, 2011 at 1:24 a.m. UTC
I see someone likes homemovies ;)
February 22, 2011 at 8:59 a.m. UTC
This does not solve the problem that was supposed to be solved by hashing the password in the database in the first place. If I nick the database with the hashes I can still impersonate the client, since I have all the information needed for the exchange. You just replaced the P with H(P) in the exchange, causing H(P) to be sufficient for successful authentication.
February 22, 2011 at 4:07 p.m. UTC
What I suppose could be done is some crypto skullduggery where the server is able to calculate H(N+P) when it only knows N and H(P). I seem to recall that there are hash functions which have this property, but this is not true for SHA1 and MD5, which are used in the common challenge-response based protocols.
February 24, 2011 at 7:39 a.m. UTC
Nice article. It solves the problems of SQL injection and minor dictionary and brute-force attacks. Still, limiting logon attempts and expiring sessions will help secure the password verification process.
March 1, 2011 at 10:07 a.m. UTC
"people are lazy and tend to reuse passwords"
At my workplace I have off the top of my head 5 passwords with different requirements and reset periods. That's just my workplace, then add all the personal applications and websites I use, computers etc etc.
It's a problem with the lack of reliable and trusted federation that is the real cause of this 'laziness'. I don't know that blaming users for reusing passwords is really constructive in 2011. But I digress..
March 2, 2011 at 4:18 a.m. UTC
March 13, 2011 at 5:39 a.m. UTC
Omg I'm glad someone with a largish readership is talking about this. Annoys the crap out of me when sites do this. I like the ones that are overly restrictive but it still doesn't help. For example a site saying your password has to be between 6-8 characters... Umm... Yeah
In fact the worst implementation I've come across is just that. This is the login to a 401k account at a bank. The site appears to be down right now but going by my memory the restrictions were 6-8 characters with at least one capital letter and one number, No special characters (it rejects the password if you try). Sigh