Last Sunday a dead hard drive reminded me that the TODO to upgrade and automate my cloud backups needed to move from the “do when I have time” list (which for me is hard to distinguish from the “never” list) to the “do now” list. I planned to use SpiderOak as my provider: I like their “zero knowledge” approach where, without the password that only I have, nobody can decrypt my data. A bit more shopping turned up Bitcasa, which offers the same zero knowledge storage, plus unlimited capacity for a fixed price. Sounds like a great deal. However, Bitcasa doesn’t give quite as much information about how they secure my data, so I decided to dig a bit and see if I could confirm that my data would be as secure as it is with SpiderOak. For those who don’t want to read the whole post, the answer I found for Bitcasa is “No, your private data could leak a bit.”
Convergent Encryption
A bit of Googling led me to this interview with the CEO of Bitcasa, where he explains they rely on Convergent Encryption to help save space and make unlimited storage a practical reality. With convergent encryption, each file is encrypted with a key generated from the unencrypted contents (plaintext) of the file. That means that if two different users upload copies of the same file, both uploads produce identical encrypted data and only one copy needs to be stored. In theory, that saves Bitcasa some money and lets them offer you unlimited storage.
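To make the idea concrete, here is a minimal sketch of convergent encryption in Python. It is my own illustration, not Bitcasa's code: I'm assuming the key is simply the SHA-256 hash of the plaintext, and I'm using a toy SHA-256 counter-mode keystream as a stand-in for a real cipher such as AES so the example runs with only the standard library. The function names are hypothetical.

```python
import hashlib
from typing import Tuple


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || counter). A stand-in for a real cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def convergent_encrypt(plaintext: bytes) -> Tuple[bytes, bytes]:
    """Derive the key from the plaintext itself, then encrypt deterministically."""
    key = hashlib.sha256(plaintext).digest()  # assumption: key = hash of content
    stream = keystream(key, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    return key, ciphertext


def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR with the same keystream recovers the plaintext."""
    stream = keystream(key, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


# Two users who have never met upload the same file.
key_alice, ct_alice = convergent_encrypt(b"the same movie file")
key_bob, ct_bob = convergent_encrypt(b"the same movie file")

print(ct_alice == ct_bob)  # True: the provider can store a single copy
print(convergent_decrypt(key_bob, ct_alice) == b"the same movie file")  # True
```

The point to notice is that no secret goes into the key. Anyone who holds the file can reproduce both the key and the ciphertext, which is exactly what makes deduplication possible.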
File Confirmation Attack
The downside is that, unlike the SpiderOak case, if someone (a Bitcasa employee, a hacker, a law enforcement agency) has access to your encrypted data, they can easily tell whether you have a copy of a certain file by comparing your encrypted data with encrypted data they created by encrypting a copy of the file themselves. Do you care? Probably not too frequently, but if you happen to possess a copy of a banned book, or a movie illegally downloaded from BitTorrent, you might.
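Here is a small sketch of what that confirmation check could look like from the attacker's side. Again, this is an illustration under my own assumptions: the key is the SHA-256 hash of the plaintext, the cipher is a toy SHA-256 keystream standing in for a real one, and `has_file` and the sample data are hypothetical.

```python
import hashlib
from typing import Set


def convergent_encrypt(plaintext: bytes) -> bytes:
    """Deterministic encryption with a key derived from the content (toy cipher)."""
    key = hashlib.sha256(plaintext).digest()
    stream = bytearray()
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(p ^ s for p, s in zip(plaintext, stream))


def has_file(stored_ciphertexts: Set[bytes], candidate: bytes) -> bool:
    """Encrypt our own copy of the file and look for a match in the victim's data."""
    return convergent_encrypt(candidate) in stored_ciphertexts


# What the attacker sees: only ciphertexts, no keys, no passwords.
victim_storage = {
    convergent_encrypt(b"holiday photos"),
    convergent_encrypt(b"contents of a banned book"),
}

print(has_file(victim_storage, b"contents of a banned book"))  # True
print(has_file(victim_storage, b"some other book"))            # False
```

The attacker never decrypts anything. They only need their own copy of the file they are looking for.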
Exposing Unshared Data
But more troubling is that under limited circumstances Bitcasa can end up leaking private data that you don’t have in common with anyone. Suppose you have a large file that contains a lot of boilerplate — maybe a letter from your bank that they send out to all their customers — plus a small amount of individual data — maybe the PIN for your ATM card.
Bitcasa will actually encrypt that letter by breaking it into blocks and using convergent encryption on each block. Let’s say it takes 10 blocks to encrypt the data. Nine of those blocks will be the same for everyone who has that letter, so an attacker who gets your encrypted data can confirm you have a copy of the letter, and they will know the 10th block contains your 4-digit ATM PIN. All they need to do to find your PIN is encrypt the block once for each of the 10,000 possible PINs and see which result matches the 10th block they have. They can mount that attack offline, which means it should be quite tractable for the attacker to brute force a small amount of personal data like a PIN, password, SSN, etc. if they can identify its location in the context of a larger encrypted file.
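The brute-force step is short enough to sketch. Everything specific here is an assumption made for illustration: the 64-byte block size, the layout of the letter, the toy SHA-256 keystream standing in for a real cipher, and the premise that the attacker knows the exact wording around the PIN. I don't know the block size or cipher Bitcasa really uses.

```python
import hashlib
from typing import List, Optional

BLOCK_SIZE = 64  # hypothetical; chosen so the demo letter is exactly 10 blocks


def encrypt_block(block: bytes) -> bytes:
    """Convergent encryption of one block: key = SHA-256(block), toy keystream."""
    key = hashlib.sha256(block).digest()
    stream = bytearray()
    counter = 0
    while len(stream) < len(block):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ s for b, s in zip(block, stream))


def encrypt_file(data: bytes) -> List[bytes]:
    """Split into fixed-size blocks and encrypt each one independently."""
    return [encrypt_block(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]


# Nine blocks of boilerplate that every customer receives.
BOILERPLATE = (b"Dear customer, thank you for banking with us. " * 12).ljust(
    9 * BLOCK_SIZE, b" ")


def pin_block(pin: int) -> bytes:
    """The tenth block: the only part of the letter that differs per customer."""
    return f"Your PIN is {pin:04d}\n".encode()


def make_letter(pin: int) -> bytes:
    return BOILERPLATE + pin_block(pin)


def recover_pin(stolen_blocks: List[bytes]) -> Optional[int]:
    """Offline attack: encrypt every candidate PIN block and compare."""
    target = stolen_blocks[-1]
    for pin in range(10_000):
        if encrypt_block(pin_block(pin)) == target:
            return pin
    return None


stolen = encrypt_file(make_letter(4831))       # all the attacker has
reference = encrypt_file(make_letter(0))       # attacker's own copy of the letter
print(stolen[:9] == reference[:9])             # True: the boilerplate blocks match
print(recover_pin(stolen))                     # 4831
```

Ten thousand hash-and-compare operations take a fraction of a second on a laptop. A longer secret raises the cost, but anything a person can memorize is unlikely to raise it enough.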
Certainty
Am I certain that Bitcasa suffers from this potential problem? No, I’m having to infer what they’re doing from their very limited public statements. Maybe they’ve figured out a way to make convergent encryption work without those vulnerabilities. But I do think I’m entitled to assume the problem is real until they provide a detailed, credible explanation of why this isn’t a problem for them.
Conclusion
Backups are, by their nature, supposed to stay around for a long time. With their choice of Convergent Encryption, Bitcasa has created a hard job for themselves. They need to have well-designed encryption, and they need to make sure that nobody gets their hands on my encrypted data, whether through a software bug, a badly configured server or human malice. Long term, I don’t think that’s an easy thing to do with data in the cloud.
By comparison, a service like SpiderOak has an easier job. As long as their encryption is well designed, even someone who gets my encrypted data cannot read it without the password that only I know.
Ultimately security isn’t a yes/no proposition; it’s up to each user to decide whether unlimited storage for a fixed price is worth the added risk, which convergent encryption brings, that their data might be exposed. For me the answer is “No”: the cloud isn’t a safe enough place long-term for that sort of risk. You should decide for yourself.