Think back to the web of fifteen years ago. Most of the web sites of the time consisted of a few pages of content along with a contact page (and maybe even a guestbook). Most often that contact page was backed by a script that mailed the results to a fixed e-mail address. Is anyone willing to admit to using formmail.pl? At the very least, we can admit that we didn’t move data around in the most secure way.
I can’t think of anyone who would argue that keeping e‐mail private is of little import or an easy task. I use e‐mail as storage for personal information, for authentication, and for communicating when I want the communications to be private. Such private information should be well‐protected. Unfortunately, this is rarely the case.
My current e‐mail service provider uses machine learning techniques to read my messages and display frighteningly‐relevant advertisements to me. (This is part of the bargain of free e‐mail service.) It also does a good job providing reasonable security against most parties who would like to read my e‐mail.
So, how would I go about keeping my messages private to those whose names are on the To and From lines of the message? There are some solutions to this challenge. Two common solutions are PGP and S/MIME. Each of these solutions uses public-key encryption.
Public‐key encryption is a type of encryption that requires separate, related keys for encryption and decryption. I control a keypair: a public key and a private key. My friend John also controls a keypair. When I send John a message, I use his public key (which is publicly available) to encrypt the message. He can then use his private key to decrypt and read the message. When he sends me a response, he uses my public key to encrypt the message and I use my private key to decrypt it.
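To make the keypair idea concrete, here is a toy sketch of textbook RSA, one well-known public-key scheme. The tiny primes and the variable names are my own illustrative assumptions; this is far too weak for real use and is not how PGP or S/MIME are implemented in practice.

```python
# Toy textbook RSA with tiny primes: for illustration only, never for real use.
p, q = 61, 53
n = p * q                      # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)

johns_public_key = (e, n)      # John publishes this
johns_private_key = (d, n)     # John keeps this secret


def encrypt(message: int, public_key: tuple) -> int:
    """Anyone holding the public key can encrypt."""
    exponent, modulus = public_key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, private_key: tuple) -> int:
    """Only the holder of the private key can decrypt."""
    exponent, modulus = private_key
    return pow(ciphertext, exponent, modulus)


message = 65                   # a message encoded as a number smaller than n
ciphertext = encrypt(message, johns_public_key)
recovered = decrypt(ciphertext, johns_private_key)
```

The point of the sketch is only the asymmetry: the value used to encrypt is safe to publish, while the value needed to decrypt never leaves John's hands.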
PGP (Pretty Good Privacy) provides a means of encrypting messages and files (but not the recipients or the Subject of a message). It is easy to use as encryption goes and relatively secure when used properly. Identities in PGP are built on the notion of a web of trust. The web of trust is a very important part of PGP: what good is encryption if you can’t be sure you are encrypting the message for the correct recipient?
The web of trust system works on the idea that you are willing to trust your friends. If you and I meet and exchange keys, you know that when you encrypt a message with my public key, I am the one who can receive that message. If I introduce you to my friend Alan, you might be willing to accept that a public key he provides you is in his control. However, with PGP, I can send you a file that proves to you that I believe that a certain public key is controlled by Alan. You can then trust that Alan is in control of that public key because you trust that I have verified Alan’s public key.
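As a rough sketch of that reasoning, the snippet below models one level of introduction from the reader's point of view. The fingerprints, the data layout, and the `is_trusted` helper are all hypothetical. Real PGP software checks cryptographic signatures and supports graded trust levels, which this simplification leaves out.

```python
# Fingerprints of keys I verified in person (all values are made up).
verified_by_me = {"Troy": "FP-TROY"}

# Certifications: (signer, key owner, fingerprint the signer vouches for).
certifications = [("Troy", "Alan", "FP-ALAN")]


def is_trusted(owner: str, fingerprint: str) -> bool:
    """Trust a key if I verified it myself, or if someone whose key
    I verified has vouched for exactly this owner/fingerprint pair."""
    if verified_by_me.get(owner) == fingerprint:
        return True
    return any(
        signer in verified_by_me and who == owner and fp == fingerprint
        for signer, who, fp in certifications
    )
```

Here `is_trusted("Alan", "FP-ALAN")` holds because Troy vouched for that key, while a different fingerprint claiming to be Alan's would be rejected.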
S/MIME (Secure/Multipurpose Internet Mail Extensions) also provides a means of securing the body of a message. It is built on the notion that central authorities should verify your identity. This makes it popular in larger companies that need to secure e‐mail. (It can also be used as the basis for single sign‐on implementations.)
The public key infrastructure system uses central authorities like companies or governments to verify identities. You (or, more accurately, your software) may trust one of these authorities to verify that Troy at Art & Logic is really Troy at Art & Logic. This is useful for companies that want their employees to communicate privately. Each company can verify the identities of the employees to the other company. This works less well when you want to communicate privately with your friends from college.
There are some drawbacks to encrypting your e‐mail.
If you want your messages to be scanned for malware, you must do that after decrypting the message. Alternatively, you or your IT team may keep a copy of your private key and use that to decrypt the message and scan it for malware. This makes your messages less private, but may save you some pain. Or you may decide that you don’t need to worry about it when you trust the sender, since both PGP and S/MIME allow you to verify the sender’s identity.
Searching encrypted messages is not easily done if your software is not built for it. You can still search Subjects, but only because the Subject of each message is not encrypted, so you need to decide how much you are willing to leave unencrypted.
In all, e‐mail encryption provides usable privacy for your messages. It does not secure the Subject line or the sender or recipients of a message. It provides some challenges for defending against malware and searching your messages in the future. If you must have privacy for your communications, I suggest you evaluate one of these options.
(This is part 2 of a series on web security; see part 1.)
In my last post we saw that what your users don’t know can hurt them. In other words, how securely you handle your users’ private data behind the scenes can have profound implications both for your business and your users’ well-being. To put it bluntly, it’s bad for your business to be publicly shamed over your handling of sensitive data, and it’s bad for your users to have their bank accounts pilfered, those being some of the worst-case scenarios.
So today I’d like to resume our discussion of secure password storage. Let’s put our black hat back on, and see what we can break.
I’ll start with the easiest case. Sometimes developers assume that as long as their database is safely hidden behind a firewall and an ordinary web server, then it’s OK to store everything in plaintext. But this is not true. There are many ways data can leak from your production database, including:
- performing a SQL injection attack through your website
- digging through a backup or archive of your database — therefore as long as you don’t create backups, you’re safe*
- gaining access to the server file system, e.g. through telnet/SSH/RDP
* That was a joke.
It is a good rule of thumb that you should design your database with the assumption that malicious users may gain unrestricted access to it at some point. But even if that happens, you should be prepared to breathe a sigh of (slightly nervous) relief, knowing that they still won’t be able to use the information maliciously. That doesn’t mean you have to encrypt everything, but you should definitely encrypt anything sensitive, such as credit card numbers, passwords, and so on.
A natural first step is to perform a one-way transformation, or “hash,” on passwords so they are no longer readable in the database. (Unlike encryption, a hash cannot be reversed with a key.) Here’s an example:
Do you see any problems with the above? OK, ignore the fact that the hashes are very small numbers. This is just pseudo-data for illustration.
Observant readers will notice that one of the hashes occurs more than once (JSmith and MRandolph). Did you catch that? This is one of the problems with storing password hashes in your database – it’s still very easy to see which users chose the same password. Remember, users won’t protect themselves, and a surprising number of users may have a password of “12345” or “password” (or “Password1”, just to anticipate and refute a well-intentioned, but ultimately insufficient attempt to solve this problem via a more strict password selection process).
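A minimal sketch of the problem, assuming SHA-256 as a stand-in hash function and made-up passwords (the user ADent is also my invention):

```python
import hashlib


def unsalted_hash(password: str) -> str:
    """Plain hash of the password with no salt."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical plaintext passwords, used only to build the demo table.
users = {"JSmith": "12345", "MRandolph": "12345", "ADent": "correct horse"}
stored = {name: unsalted_hash(pw) for name, pw in users.items()}

# Identical passwords always produce identical unsalted hashes.
same = stored["JSmith"] == stored["MRandolph"]
```

Anyone who can read the `stored` table can group users by shared password without ever cracking a single hash.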
Beware the Dictionary
An even deeper problem here is that a hashing scheme like the above is susceptible to a dictionary attack using a large, pre-calculated collection of hashes of common passwords. All it takes is one successful match to positively identify the hashing scheme used, and then start doing damage.
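Here is a small sketch of such an attack, again assuming unsalted SHA-256. The leaked table and the user KLee are hypothetical, and a real attacker's dictionary would hold millions of entries rather than three.

```python
import hashlib


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Pre-calculated once, then reusable against any database using this scheme.
common_passwords = ["12345", "password", "Password1"]
lookup = {sha256_hex(pw): pw for pw in common_passwords}

# Hypothetical leaked table of user name -> unsalted hash.
leaked = {
    "JSmith": sha256_hex("12345"),
    "KLee": sha256_hex("x9!fQ2#unlikely"),
}

# Cracking is a dictionary lookup, not a computation.
cracked = {user: lookup[h] for user, h in leaked.items() if h in lookup}
```

Only the user with the common password falls to the lookup, but that single hit is enough to confirm which hashing scheme is in use.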
Don’t Follow This Recipe
So we have to make sure the hashes we store are unique. We don’t want an attacker to be able to recognize any of them. To do this, people use what’s called a “salt” to make the output more random.
A well-salted hash.
(photo by Tavallai, CC BY-ND 2.0)
The salt is just a random number, and you can combine it with the hash process to get a more random looking output. Here’s how one person did that, showing the same data from the table above, but with salt included. Pay special attention to the JSmith and MRandolph records, as before:
Whoa, wait a minute. Do you see a new problem here? It is true that each “PasswordHash” attribute is now unique since a random number has been prefixed. And the developers may run a few simple SQL queries and verify that no two PasswordHash attributes are the same, and pat themselves on the back. But that is merely a dangerous illusion, and this is a very wrong implementation.
Since you have your black hat on, it will be obvious to you that a hacker can just bit mask out the part of the hash they are interested in, sort of like performing a Python slice, and exclude the “salt” that way. So this erroneous approach has no meaningful improvement over the “simple hash only” example above.
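To see why, here is my reconstruction of the flawed idea in Python. The original recipe is not reproduced here, so the function name, the salt length, and the use of SHA-256 are assumptions.

```python
import hashlib
import secrets

SALT_HEX_LEN = 16  # 8 random bytes rendered as 16 hex characters


def flawed_salted_hash(password: str) -> str:
    """WRONG: the salt is glued onto the output but never enters the hash."""
    salt = secrets.token_hex(SALT_HEX_LEN // 2)
    digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
    return salt + digest


a = flawed_salted_hash("12345")
b = flawed_salted_hash("12345")

# The stored values differ, but slicing off the prefix exposes
# the same unsalted hash underneath.
stripped_a = a[SALT_HEX_LEN:]
stripped_b = b[SALT_HEX_LEN:]
```

After the slice, the attacker is back to the plain unsalted hashes and every earlier attack applies unchanged.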
Note: I actually found this erroneous approach used in an online source code recipe, several years ago. Of course it seems absurd to us under this analysis, but somebody thought it was correct enough to post on a source code recipe sharing website, so I think the point was worth belaboring here a bit.
A Better Seasoned Recipe
Here is a more accurate description of how to use salt to protect your password hashes:
- In the context of creating a new user record or updating a password, receive the plaintext password from the user.
- Generate a strongly random number to use as the unique salt value for this user record.
- Compute: a hash of (the salt concatenated with a hash of (the salt concatenated with the password)). Here’s a link explaining why this expression needs to be this complex, instead of simply a hash of the concatenation.
- Store both the result of that final hash calculation, and also the unmodified salt value in your database in the user record. I personally like to concatenate the final hash and salt and store them in the same record attribute, just to be obscure. But that doesn’t really matter. Note that it’s OK to store the salt in plaintext; in fact, that’s required.
- After this process is finished, deliberately forget the plaintext password. Depending on the overall architecture, maybe it was provided by the user, or maybe it was system-generated and must now be emailed to the user. Either way, it must not be stored as plaintext.
- Later on, when the user enters their user name and password to log in, look up the record by user name, then repeat the hash calculation from the third step above using the salt value retrieved from the record. The resulting hash (using the password being entered) can be checked against the stored hash from the database to determine if the user entered the right password.
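The steps above can be sketched roughly as follows. This is an illustration under assumptions, not a production implementation: SHA-256 is only a placeholder for whatever hash function you choose, and the record layout and function names are hypothetical.

```python
import hashlib
import hmac
import secrets

SALT_BYTES = 16


def _salted_digest(salt: bytes, password: str) -> bytes:
    """hash(salt + hash(salt + password)), as described in the steps above."""
    inner = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return hashlib.sha256(salt + inner).digest()


def create_record(password: str) -> dict:
    """Generate a fresh random salt and store it beside the final hash."""
    salt = secrets.token_bytes(SALT_BYTES)
    return {"salt": salt, "hash": _salted_digest(salt, password)}


def verify(password: str, record: dict) -> bool:
    """Repeat the calculation with the stored salt and compare the results."""
    candidate = _salted_digest(record["salt"], password)
    return hmac.compare_digest(candidate, record["hash"])


record_a = create_record("12345")
record_b = create_record("12345")
```

Because each record gets its own salt, `record_a` and `record_b` hold different hashes even though the passwords match. The comparison uses `hmac.compare_digest`, a constant-time check, which is my own addition rather than something the steps call for.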
If you do it right, your database’s hashes should now look totally scrambled and inscrutable to an unauthorized reader. (Reminder to self: next time must avoid blogging while hungry, especially about recipes for salted hashes and ordering crackable things as scrambled.)
Choosing a Hash Function
This blog post is not a complete treatment of the subject of server-side salting and password hashing. Another important decision is what hashing function to use. A hash function in this context is typically chosen to be both secure and slow (slow, so that every guess costs an attacker real computing time). But it’s also a moving target, as cryptographic standards must continually respond to rapidly advancing cracking capabilities. Somehow the very weak MD5 ended up as an entrenched hash function in very widespread use in the ’90s and aughts. (Boy, was that a short-sighted mistake.) Many people are still using SHA-1, which wasn’t considered horrible just a few years ago, but really needs to be deprecated in favor of stronger options. I recommend you spend some time reading the links in this discussion to get a sense of what’s out there. I’m deliberately not giving a specific recommendation here, in order to reinforce that there is actually more than one possible answer, and also that the “correct answers” periodically change.
Don’t Try This At Home
My final advice may sound like it contradicts everything I’ve said thus far. But that’s OK. 🙂
If at all possible, you should not come up with your own implementation of these approaches. Ideally, you should rely on your framework libraries to provide high level, complete authentication and authorization wrappers. Or if not, at least you should find and integrate a secure implementation from a trusted source. I already showed you how some guy on the internet thought they were salting their passwords, but got it totally wrong; so definitely don’t trust random stuff you google up.
Be very cautious if you’re not a cryptographic expert. Certainly, you can and should learn the basics of information security, and use your knowledge to audit and critique your own systems. But any implementations you deploy to production should be from trusted frameworks, or at least closely follow standard industry best practices. Don’t assemble something off the top of your head, or it will almost certainly be cryptographically weak and defective.
Thanks for reading! You can take the black hat off now. Hopefully this was informative for somebody; if you have any questions or want to share your own advice for readers, I’d love to read your comments below.
(This is part 1 of a series on web security; see part 2.)
What’s wrong with this code?
Any jokester who says “it looks fine to me” will be sent to the spice mines of Kessel. But I think for observant readers, a couple of critical security errors will practically jump off the screen:
- User inputs are being concatenated directly into a SQL query string, risking a SQL injection attack.
- Passwords are stored in plaintext, exposing users to further harm in the event the database is accessed.
In the Real World
If you keep up with the news, you may have seen that no less an internet giant than Yahoo may have been guilty of both of the above mistakes, leading to a breach of 435K user credentials. This is a disaster for any company that safeguards private user data.
You can’t depend on users to protect themselves; somewhere out in the world today is a person who set up the same password on a Justin Bieber mailing list website and also their online banking. So if even a fluffy, unimportant website drops the ball, users may see their bank accounts emptied. That’s probably an extreme worst case, though there is a whole range of other mischief you don’t want to enable either. The point is, your users trust you, and they don’t know what you are doing with their data. Be worthy of their trust.
Get In the Mindset
A technique I find helpful for programmers is to temporarily put down their shining knight helmet and try on a black hat for size. Put yourself in a creative trouble-maker frame of mind. Ask the question: how could somebody compromise my website and harm my users or my organization?
In the case of the code snippet at the top of this article, it is trivially easy for a hacker to enter a user name or password that will disrupt the query. By placing complex SQL expressions (perhaps cleverly including SQL comments) in the user name and password inputs, an attacker can easily delete or insert records. What would totally frustrate those hacking attempts, though, is if the programmer used their framework’s feature for formal query value parameters (available for C#, Python, etc.). And that, of course, is the correct answer here. Never concatenate user inputs into SQL strings. It’s even a bad idea to write your own sanitizing functions, because you’ll probably get a detail wrong, even if you’re a smart person. Just use the framework, all the time, or else use a mainstream ORM that hides the query string safely out of your code’s sight.
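As a hedged illustration of the difference, here is a small Python sketch using the standard-library sqlite3 module. The table, column names, and helper functions are hypothetical and are not the code this article originally critiqued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password_hash TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("JSmith", "not-a-real-hash"))


def find_user_unsafe(name: str) -> list:
    # DON'T: user input is concatenated straight into the SQL text.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(name: str) -> list:
    # DO: the value travels separately from the SQL text as a parameter.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


malicious = "nobody' OR '1'='1"
unsafe_rows = find_user_unsafe(malicious)   # the injected OR matches every row
safe_rows = find_user_safe(malicious)       # treated as a literal, odd-looking name
```

With the concatenated query the injected condition matches every row in the table. With the parameterized query the same input is just a name that nobody has.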
The other major problem is the use of plaintext password storage. That’s obviously a really bad thing if the database is accessed, and many programmers are aware that if you just do a one-way hash of the password, it’s more secure. You do lose the ability to remind the user of their password, but it’s usually OK, because they can just create a new one when needed via an email challenge/response. But even hashing itself doesn’t cut it. If you are not salting your password hashes, your users are exposed to unacceptable risk.