(This is the third in the “Radically Cross Platform” series of posts; see previous posts about Xamarin and Memory Management.)
In the early stages of developing my cross platform game/graphical app engine, my first task was to read as much as I could find in internet articles, blog posts, and forum discussions. I wanted to find stuff written by people who had been there before me. I wanted to know what worked, what didn’t, and what kinds of pitfalls to avoid. One surprising piece of advice I found was to use binary serialization to optimize a mobile app’s speed and memory footprint.
The first post of this series described a choice of technologies and toolsets (based on Xamarin) that allows C# programmers to deploy graphics accelerated apps using OpenGL/DirectX across a radically cross platform spectrum of devices and operating systems.
That’s all fine and good, but it’s not the whole story. Just because you can build and run your app on a given platform doesn’t mean it’s ready to publish on an app store. Well, in the case of Windows and Mac x86 desktop apps, maybe it does. But when you take a code base with beautiful LINQ queries, effortless, strictly-typed data modeling, and world-class garbage collector (GC) expectations, and run it on a wimpy ARM-based mobile device, a funny thing happens.
Shared cross platform development is a concept that resonates very positively with all of us as programmers. It’s a nice outworking of the DRY principle, and seems like it would free engineers up to accomplish more. So why is it so rare that we do it?
I recently asked myself that question while planning a personal mobile project, and here’s what I came up with:
(This is part 2 of a series on web security; see part 1.)
In my last post we saw that what your users don’t know can hurt them. In other words, how securely you handle your users’ private data behind the scenes can have profound implications both for your business and for your users’ well-being. To put it bluntly, it’s bad for your business to be publicly shamed over your handling of sensitive data, and it’s bad for your users to have their bank accounts pilfered. Those are some of the worst-case scenarios.
So today I’d like to resume our discussion of secure password storage. Let’s put our black hat back on, and see what we can break.
I’ll start with the easiest case. Sometimes developers assume that as long as their database is safely hidden behind a firewall and an ordinary web server, it’s OK to store everything in plaintext. This is not true. There are many ways data can leak from your production database, including:
performing a SQL injection attack through your website
digging through a backup or archive of your database — therefore as long as you don’t create backups, you’re safe*
gaining access to the server file system, e.g. through telnet/SSH/RDP
*That was a joke.
It is a good rule of thumb that you should design your database with the assumption that malicious users may gain unrestricted access to it at some point. But even if that happens, you should be prepared to breathe a sigh of (slightly nervous) relief, knowing that they still won’t be able to use the information maliciously. That doesn’t mean you have to encrypt everything, but you should definitely encrypt anything sensitive, such as credit card numbers, passwords, and so on.
A natural first step is to perform a one-way encryption, or “hash” on passwords so they are no longer readable in the database. Here’s an example:
Do you see any problems with the above? OK, ignore the fact that the hashes are very small numbers. This is just pseudo-data for illustration.
Observant readers will notice that one of the hashes occurs more than once (JSmith and MRandolph). Did you catch that? This is one of the problems with storing plain password hashes in your database: it’s still very easy to see which users chose the same password. Remember, users won’t protect themselves, and a surprising number of users may have a password of “12345” or “password” (or “Password1”, just to anticipate and refute a well-intentioned but ultimately insufficient attempt to solve this problem with a stricter password selection process).
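To make the duplicate-hash problem concrete, here is a minimal Python sketch. It assumes plain, unsalted SHA-256 as the hashing scheme (the table above used pseudo-data, so this is only a stand-in), and the records, including the hypothetical `KLee`, are invented for illustration. Grouping the stored hashes by value is all it takes to see which users share a password.

```python
import hashlib
from collections import defaultdict

def unsalted_hash(password: str) -> str:
    # Naive scheme: plain SHA-256 of the password, no salt (illustration only).
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical pseudo-data: user name -> plaintext password chosen at sign-up.
signups = {
    "JSmith": "12345",
    "MRandolph": "12345",
    "KLee": "correct horse battery staple",
}

# What actually lands in the database: user name -> hash.
stored = {name: unsalted_hash(pw) for name, pw in signups.items()}

def users_sharing_passwords(stored_hashes: dict[str, str]) -> list[list[str]]:
    # Group user names by identical hash; any group with more than one name shares a password.
    groups = defaultdict(list)
    for name, digest in stored_hashes.items():
        groups[digest].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]

print(users_sharing_passwords(stored))  # [['JSmith', 'MRandolph']]
```

No password was ever decrypted here; the attacker learns that two accounts share a password purely by comparing stored values.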
Beware the Dictionary
An even deeper problem is that a hashing scheme like the one above is susceptible to a dictionary attack using a large, precalculated collection of hashes of common passwords. All it takes is one successful match for an attacker to positively identify the hashing scheme used, and then they can start doing damage.
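The dictionary attack can be sketched the same way. In this hypothetical scenario the attacker holds a leaked table of unsalted hashes but doesn’t know which hash function produced them. They precompute digests of a short list of common passwords under a few candidate schemes (SHA-1, SHA-512, and SHA-256 here, an arbitrary choice). The first scheme that yields any match identifies how the database was hashed, and every account using a listed password falls at once. Real wordlists have millions of entries, but the mechanics are the same.

```python
import hashlib

# Hypothetical leaked table: user name -> unsalted hex digest.
# (The defender happened to use SHA-256; the attacker doesn't know that yet.)
leaked = {
    "JSmith": hashlib.sha256(b"12345").hexdigest(),
    "MRandolph": hashlib.sha256(b"12345").hexdigest(),
    "KLee": hashlib.sha256(b"Password1").hexdigest(),
    "TNguyen": hashlib.sha256(b"x9$Lq!v2#Rz").hexdigest(),
}

COMMON_PASSWORDS = ["password", "12345", "123456", "Password1", "qwerty", "letmein"]
CANDIDATE_SCHEMES = ["sha1", "sha512", "sha256"]

def build_lookup(scheme: str) -> dict[str, str]:
    # Precompute digest -> plaintext for every common password under one hash scheme.
    return {hashlib.new(scheme, pw.encode("utf-8")).hexdigest(): pw for pw in COMMON_PASSWORDS}

def dictionary_attack(table: dict[str, str]):
    # Try each candidate scheme; the first one producing any match identifies the scheme,
    # and every account whose password is in the list is recovered along with it.
    for scheme in CANDIDATE_SCHEMES:
        lookup = build_lookup(scheme)
        cracked = {user: lookup[d] for user, d in table.items() if d in lookup}
        if cracked:
            return scheme, cracked
    return None, {}

scheme, cracked = dictionary_attack(leaked)
print(scheme)   # sha256
print(cracked)  # {'JSmith': '12345', 'MRandolph': '12345', 'KLee': 'Password1'}
```

The lookup table is built once and can be reused against any database that uses the same unsalted scheme, which is exactly what a per-user salt is meant to prevent.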
Don’t Follow This Recipe
So we have to make sure the hashes we store are unique. We don’t want an attacker to be able to recognize any of them. To do this, people use what’s called a “salt” to make the output more random.
The salt is just a random number, and you can combine it with the hashing process to get a more random-looking output. Here’s how one person did that, showing the same data from the table above, but with salt included. Pay special attention to the JSmith and MRandolph records, as before:
Whoa, wait a minute. Do you see a new problem here? It is true that each “PasswordHash” attribute is now unique since a random number has been prefixed. And the developers may run a few simple SQL queries and verify that no two PasswordHash attributes are the same, and pat themselves on the back. But that is merely a dangerous illusion, and this is a very wrong implementation.
Since you have your black hat on, it will be obvious to you that a hacker can simply mask out the salt prefix and keep only the part of the value they are interested in, much like taking a Python slice. So this erroneous approach offers no meaningful improvement over the “simple hash only” example above.
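Here is a minimal sketch of the flawed recipe and the slice that defeats it. It assumes SHA-256 and an 8-byte salt, both hypothetical choices; the essential flaw is that the salt is only prefixed to the stored value and is never fed into the hash itself.

```python
import hashlib
import secrets

SALT_BYTES = 8  # hypothetical salt length for this illustration

def flawed_salted_hash(password: str) -> bytes:
    # The flawed recipe: a random salt is merely prefixed to an ordinary hash.
    # The hash portion never depends on the salt.
    salt = secrets.token_bytes(SALT_BYTES)
    return salt + hashlib.sha256(password.encode("utf-8")).digest()

stored = {
    "JSmith": flawed_salted_hash("12345"),
    "MRandolph": flawed_salted_hash("12345"),
}

# The full stored values differ (random prefixes), so a naive uniqueness query passes...
print(stored["JSmith"] != stored["MRandolph"])

# ...but slicing off the prefix recovers the plain, unsalted hashes.
stripped = {user: value[SALT_BYTES:] for user, value in stored.items()}
print(stripped["JSmith"] == stripped["MRandolph"])               # True
print(stripped["JSmith"] == hashlib.sha256(b"12345").digest())   # True: dictionary lookups work again
```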
Note: I actually found this erroneous approach used in an online source code recipe, several years ago. Of course it seems absurd to us under this analysis, but somebody thought it was correct enough to post on a source code recipe sharing website, so I think the point was worth belaboring here a bit.
A Better Seasoned Recipe
Here is a more accurate description of how to use salt to protect your password hashes:
1. In the context of creating a new user record or updating a password, receive the plaintext password from the user.
2. Generate a strongly random number to use as the unique salt value for this user record.
3. Compute: a hash of (the salt concatenated with a hash of (the salt concatenated with the password)). Here’s a link explaining why this expression needs to be this complex, instead of simply a hash of the concatenation.
4. Store both the result of that final hash calculation and the unmodified salt value in the user record in your database. I personally like to concatenate the final hash and salt and store them in the same record attribute, just to be obscure, but that doesn’t really matter. Note that it’s OK to store the salt in plaintext; in fact, that’s required, because you need it to verify the password later.
5. After we are finished with this process, deliberately forget the plaintext password. Depending on the overall architecture, maybe it was provided by the user, or maybe it was system-generated and must now be emailed to the user. Either way, it must not be stored as plaintext.
6. Later on, when the user enters their user name and password to log in, look up the record by user name, then repeat the calculation in step 3 using the salt value retrieved from the record. The resulting hash (computed from the password just entered) can be checked against the stored hash from the database to determine whether the user entered the right password.
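The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a production implementation: SHA-256 stands in for whichever hash function you choose, the 16-byte salt length is an assumption, and `create_record`/`verify` are hypothetical helper names. Following the author’s preference, the hash and salt are concatenated into one stored value. Unlike the flawed recipe, slicing off the salt gains an attacker nothing here, because the salt is mixed into the hash input.

```python
import hashlib
import hmac
import secrets

SALT_BYTES = 16  # assumption: a 128-bit random salt

def _h(data: bytes) -> bytes:
    # Stand-in hash function (SHA-256); see "Choosing a Hash Function".
    return hashlib.sha256(data).digest()

def compute_hash(salt: bytes, password: str) -> bytes:
    # Step 3: hash(salt + hash(salt + password))
    inner = _h(salt + password.encode("utf-8"))
    return _h(salt + inner)

def create_record(password: str) -> bytes:
    # Steps 1-4: take the plaintext, generate a strong random salt,
    # compute the salted hash, and store hash and salt together (hash first, salt last).
    salt = secrets.token_bytes(SALT_BYTES)
    return compute_hash(salt, password) + salt

def verify(record: bytes, attempted_password: str) -> bool:
    # Step 6: split the stored value, recompute with the stored salt,
    # and compare the two hashes in constant time.
    stored_hash, salt = record[:-SALT_BYTES], record[-SALT_BYTES:]
    return hmac.compare_digest(stored_hash, compute_hash(salt, attempted_password))

# Usage with hypothetical users who chose the same password:
jsmith = create_record("12345")
mrandolph = create_record("12345")
print(jsmith != mrandolph)          # unrelated stored values, thanks to different salts
print(verify(jsmith, "12345"))      # True
print(verify(jsmith, "Password1"))  # False
```

The check uses `hmac.compare_digest` rather than `==` so that the comparison time doesn’t depend on where the bytes first differ, which avoids leaking information through timing.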
If you do it right, your database’s hashes should now look totally scrambled and inscrutable to an unauthorized reader. (Reminder to self: next time must avoid blogging while hungry, especially about recipes for salted hashes and ordering crackable things as scrambled.)
Choosing a Hash Function
This blog post is not a complete treatment of server side salting and password hashing. Another important decision is which hash function to use. A hash function in this context is typically chosen to be both secure and slow, so that every password guess costs an attacker real computing time. But it’s also a moving target, as cryptographic standards must continually respond to rapidly advancing cracking capabilities. Somehow the very weak MD5 became entrenched in widespread use in the ’90s and aughts. (Boy, was that a shortsighted mistake.) Many people are still using SHA-1, which wasn’t considered horrible just a few years ago, but really needs to be deprecated in favor of stronger options. I recommend you spend some time reading the links in this discussion to get a sense of what’s out there. I’m deliberately not giving a specific recommendation here, in order to reinforce that there is more than one possible answer, and also that the “correct answers” periodically change.
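To illustrate what “slow” means in practice, here is a sketch using PBKDF2-HMAC-SHA256, which happens to ship in Python’s standard library as `hashlib.pbkdf2_hmac`. This is not a recommendation (bcrypt, scrypt, and Argon2 are other commonly discussed options), and the iteration count below is a hypothetical value chosen only for illustration. The point is the tunable work factor: every guess an attacker makes must repeat all of those iterations. Functions like this take the salt as a parameter, so they replace the hand-rolled nested hash from step 3 rather than plugging into it.

```python
import hashlib
import hmac
import secrets

def slow_hash(password: str, salt: bytes, iterations: int) -> bytes:
    # PBKDF2-HMAC-SHA256: the iteration count is the "work factor" that makes each guess slow.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

salt = secrets.token_bytes(16)
# Hypothetical work factor for illustration; consult current guidance for your chosen function.
ITERATIONS = 200_000

record = slow_hash("12345", salt, ITERATIONS)
print(len(record))  # 32 (SHA-256 output size, the default derived key length)
print(hmac.compare_digest(record, slow_hash("12345", salt, ITERATIONS)))  # True
print(record == slow_hash("12345", salt, 1))  # False: the work factor changes the output
```

Because the iteration count affects the result, it has to be stored (or fixed) alongside the salt so that later logins can repeat exactly the same computation.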
Don’t Try This At Home
My final advice may sound like it contradicts everything I’ve said thus far. But that’s OK. 🙂
If at all possible, you should not come up with your own implementation of these approaches. Ideally, you should rely on your framework libraries to provide high level, complete authentication and authorization wrappers. Or if not, at least you should find and integrate a secure implementation from a trusted source. I already showed you how some guy on the internet thought they were salting their passwords, but got it totally wrong; so definitely don’t trust random stuff you google up.
Be very cautious if you’re not a cryptographic expert. Certainly, you can and should learn the basics of information security, and use your knowledge to audit and critique your own systems. But any implementations you deploy to production should be from trusted frameworks, or at least closely follow standard industry best practices. Don’t assemble something off the top of your head, or it will almost certainly be cryptographically weak and defective.
Thanks for reading! You can take the black hat off now. Hopefully this was informative for somebody; if you have any questions or want to share your own advice for readers, I’d love to read your comments below.