How to Build a Billion Dollar Password in 2018
We use passwords to defend our banking information, medical records, and personal communications, but how much do you really know about this little string of characters you trust to protect your data?
Most people’s familiarity is limited to what they are told when signing up for a new service, something like “must be at least 8-characters long”, “include a number”, and “mix upper and lowercase characters”. And why would you want to know more? The rules must be there for a reason, so just follow them and you should be good, right?
Below, you will find a fairly detailed but beginner-friendly introduction to the world of passwords. You’ll learn:
- What “strength” means when talking about passwords
- How we calculate the “cost” of a password
- Why humans are actually really bad at making passwords
- Why password rules are sort of a confusing painful mess
- Why computers are really good at cracking human-chosen passwords
- Why letting a computer make your passwords for you can make your life easier and more secure.
The strength of the lock and the key
The security that encryption provides is analogous to that of a standard lock and key: the strength depends on both the lock _and_ the key being used. For example, a lock’s body and shackle can be quite strong, able to resist all sorts of advanced tools and techniques used to cut or pry locks apart, but if the key is easy to pick or the combination is easy to guess, that strong lock won’t do much to protect what it secures.
The strength of a key can be understood as uniqueness — how many unique keys were cut for that type of lock, or how many possible combinations does the lock have? Assuming the attacker cannot break the lock itself, how long would it take to find the right key (or combination) to unlock the lock?
For example, let’s say Alice wants to lock her private journal away in a combination safe to prevent her 9-year-old brother Bobby from snooping. Bobby has no lock breaking experience, but he does come home an hour before Alice does, so he has a lot of time to test different combinations on her safe, one by one. This form of attack is known as “brute-forcing”, as there is no special technique or skill involved other than simply trying various combinations until the correct one is reached.
Now, if Alice decided to use a simple 3-digit combination lock like the one above, there are 1000 possible combinations (000 to 999), but probability says that Bobby would only need to test 501 of those combinations to be likely to crack the code. If Bobby knew a little bit more about lock breaking, he might reduce his time by prioritizing numbers Alice might be likely to choose, like her birthday or favorite number. Not considering this, Bobby starts entering combinations one by one from zero, 0–0–0, 0–0–1, 0–0–2, etc…
Only needing to rotate one number at a time and pull, Bobby is able to try a new combination every five seconds. This means Bobby can try 12 combinations a minute or 720 combinations an hour! Using a lock with only 1000 key combinations, poor Alice would have come home to find Bobby reading her journal only one day after locking it away.
Luckily, Alice, being older and wiser, thought ahead and bought a dial combination lock. The dial lock’s body and shackle are just as hard to cut or destroy as those of the three-digit combination lock, but the dial lock requires three numbers to be dialed in on each attempt to open it. The dial also has 60 possible values for each number (0–59) instead of only 10 (0–9). In total the dial lock has 216,000 possible combinations, and it takes Bobby three times as long to test each code, since he needs to dial in three numbers on each attempt instead of rotating just one. With Bobby now only able to try 240 combinations each day before Alice gets home, it would take him 1.23 years to try even half the combinations.
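The arithmetic behind Alice’s two locks can be checked with a short Python sketch. The function name is illustrative, and the one-hour daily window is taken from the story above rather than from any lock specification.

```python
def brute_force_estimate(values_per_position, positions, seconds_per_try,
                         seconds_per_day=3600):
    """Return (combinations, tries per day, days to test half of them)."""
    combinations = values_per_position ** positions
    tries_per_day = seconds_per_day // seconds_per_try
    days_for_half = (combinations / 2) / tries_per_day
    return combinations, tries_per_day, days_for_half

# 3-digit lock: 10 values per wheel, one try every 5 seconds
simple_lock = brute_force_estimate(10, 3, 5)
# Dial lock: 60 values per number, one try every 15 seconds
dial_lock = brute_force_estimate(60, 3, 15)

print(simple_lock)
print(dial_lock, round(dial_lock[2] / 365, 2), "years")
```

With these inputs, half of the 3-digit lock’s combinations fall within Bobby’s first hour, while half of the dial lock’s 216,000 combinations take 450 days, or about 1.23 years.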
Calculating the cost of a password
This question of the time and resources needed for a successful attack is at the foundation of how Peerio’s password standards were developed. Instead of just time though, we ask “how much would it cost to crack a password in a year?”
These cost estimates are usually generated in terms of how fast a computer can compute a certain function (like turning a number), and how much that computation costs in terms of hardware, energy, and time. Even a small variance in the per-operation time and cost can produce massively different totals when multiplied across quintillions (10¹⁸) of operations. So, how could we possibly estimate something like this? With the help of bitcoin miners, of course.
The following explanation is a bit techy, but bear with me. In a 2014 research paper on password memorability, security researchers Joseph Bonneau (Stanford) and Stuart Schechter (Microsoft) estimated the cost of an attack based on the total annual payout to bitcoin miners in 2013.
In 2013, Bitcoin miners collectively performed ≈ 2⁷⁵ SHA-256 hashes in exchange for bitcoin rewards worth ≈ US$257M… this is the only publicly known operation performing in excess of 2⁶⁴ cryptographic operations and hence provides the best estimate available. Even assuming a centralized effort could be an order of magnitude more efficient, this still leaves us with an estimate of US$1M to perform a 2⁷⁰ SHA-256 evaluations and around US$1B for 2⁸⁰ evaluations. (source)
Here we have the billion dollar password estimate: even for a centralized state attacker, it would cost about US$1 billion to compute 2⁸⁰ SHA-256 hashes over the course of a year. This is like saying it would cost US$1 billion to try 2⁸⁰ lock combinations over a year.
Since an attacker would be ‘likely’ to guess correctly with just one guess after the halfway point, Peerio uses an 81-bit minimum standard (2⁸¹, or 2⁸⁰ times two) for our computer generated passphrases. We chose this standard because we wanted to make sure even a state-level attacker would need to drop US$1 billion to have even a coin-toss chance of cracking a Peerio passphrase.
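To make the scaling concrete, here is a minimal Python sketch of that cost model. It assumes the only inputs are the Bonneau and Schechter figure quoted above (about US$1 billion for 2⁸⁰ SHA-256 evaluations) and that cost grows linearly with the number of guesses; the function and constant names are illustrative.

```python
BASELINE_BITS = 80          # 2**80 SHA-256 evaluations ...
BASELINE_COST_USD = 1e9     # ... estimated at about US$1 billion

def expected_attack_cost(key_bits):
    """Estimated USD cost of searching half of a 2**key_bits keyspace."""
    guesses_log2 = key_bits - 1          # half the keyspace = one bit fewer
    return BASELINE_COST_USD * 2.0 ** (guesses_log2 - BASELINE_BITS)

for bits in (61, 71, 81):
    print(bits, f"${expected_attack_cost(bits):,.0f}")
```

Under this model every additional bit doubles the bill, which is why an 81-bit secret lands on the US$1 billion mark and a 71-bit one costs only about US$1 million.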
Humans are bad at making passwords
When we were considering some of the human factors of Peerio’s security, one unfortunate problem we encountered is that people are generally pretty bad at choosing their own passwords. People have a tendency to choose easy-to-guess passwords. For example, passwords revealed from the Ashley Madison hack show that at least 200,000 users were using the passwords “123456”, “12345”, or “password”.
Beyond simply using common passwords, data collected from password breaches over the years has revealed numerous ways humans are quite predictable in password creation:
- People tend to add a number only to the end of a password1
- 23% of the time the number that gets added is “1”
- Lots of people use simple lett3r repl4cements
- Almost nobody uses more than 10 characters in their password, and 8 characters is most common (and often a website’s minimum)
- People think keyboard patterns like “asdfghjkl” or “1qaz2wsx” are pretty clever.
(You can find all of this data beautifully visualized here.)
You can see that a lot of these patterns are adaptations to common password rules (8-character minimum, add numbers, mix case, etc.).
Now, this problem might not be quite as bad if these passwords were only being used in low-stakes situations, but when a Senior Manager at IBM is using “123456” as a password, a PayPal senior engineer is using “ex1422”, and people in high-ranking positions at Nike, Uber, The Linux Foundation, the BBC, and Facebook are simply using their first name and/or birthdate, it’s pretty easy to imagine how this data could be used to extort businesses and political organizations, or to compromise individuals’ personal privacy and security.
Human predictability makes security guarantees difficult
Even if told to simply come up with random content, it turns out humans just aren’t great at being truly random. People have trouble breaking away from linguistic heuristics — that is, the way we’ve learned to use language over the years. We are apt to follow certain patterns, such as using adjectives before nouns, or pairing certain words with others more often, e.g. “inclement” and “weather”.
This predictability makes estimates of “entropy”, or the real-world “randomness” of passwords, much more difficult to establish. For example, passwords like _qwerty1!_ or _p@$$w0rD_ may both have 8 characters and include letters, symbols, and numbers, but they follow far more predictable patterns and appear much more often than _jE+5$)Sn_ or _]Y[T7t3A_, computer generated passwords following the same rules.
Password crackers are designed to exploit human biases
Unfortunately, tools used to crack passwords take advantage of this predictability. Freely available password cracking software on a home computer is able to attempt hundreds of thousands, if not billions, of guesses per second (depending on the hash used). In a few minutes, these tools can run through massive lists of known (i.e. common) passwords, modifications to these passwords (e.g. letter replacement or number additions), known personal information (e.g. your birthdate or pet’s name), and more.
In fact, back in 2012 one coder was able to crack over 50% of the 6.5 million passwords leaked from LinkedIn in a single night using a couple home computers, a few known password lists, and some simple rules for cracking.
The result is that passwords like qwerty1!, p@$$w0rD, “[Service name]12”, and “[pet’s name][birthdate]”, which some services will mark as “good” or “strong” for meeting some base requirements, are extremely weak in practice. These passwords are very “low entropy” because an attacker wouldn’t need to try all possible 8-character passwords with numbers, symbols, and mixed case. They would simply try common passwords first and would guess correctly within the first few seconds.
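As a rough illustration of why such passwords fall so quickly, the Python sketch below orders guesses the way a rule-based cracker might: common passwords first, then simple variations. The three-word list, the substitution table, and the suffixes are tiny hypothetical stand-ins; real tools use far larger lists and rule sets.

```python
COMMON = ["123456", "password", "qwerty"]        # hypothetical mini wordlist
LEET = str.maketrans({"a": "@", "s": "$", "o": "0", "e": "3"})
SUFFIXES = ("", "1", "!", "1!", "12", "123")     # hypothetical mangling rules

def candidate_guesses(words):
    """Yield each distinct guess once: base words, then simple variations."""
    seen = set()
    for word in words:
        for base in (word, word.capitalize(), word.translate(LEET)):
            for suffix in SUFFIXES:
                guess = base + suffix
                if guess not in seen:
                    seen.add(guess)
                    yield guess

def guesses_needed(target, words):
    """Position of target in the guess order, or None if never generated."""
    for position, guess in enumerate(candidate_guesses(words), start=1):
        if guess == target:
            return position
    return None

print(guesses_needed("qwerty1!", COMMON))   # found after a few dozen guesses
print(guesses_needed("jE+5$)Sn", COMMON))   # None: not reachable by these rules
print(95 ** 8)                              # size of the full 8-character space
```

Even this toy rule set reaches _qwerty1!_ in a few dozen guesses, while exhausting every 8-character string over 95 printable characters would take about 6.6 quadrillion.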
Let computers make your passwords
So, while we can’t reliably determine the strength (or entropy) of a human-chosen password, we can do so when a computer picks items at random from a given list.
Computers are able to select random content without the learned biases humans are subject to. So instead of wondering whether one part of a password that a person selected is strong enough, or requiring people to make long passwords within ridiculous rule sets, we can simply feed a lot of data into a computer and let it spit out a random string using that data. This method helps ensure we can generate passphrases with pretty reliable entropy estimates and thus meet our minimum standard for password strength: our $1 billion a year goal.
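A minimal Python sketch of this idea follows. It is not Peerio’s implementation: the placeholder word list is hypothetical, and the only real requirements are a list of distinct entries and a cryptographically secure random source, here the standard library’s `secrets` module.

```python
import secrets

def generate_passphrase(wordlist, n_words=5):
    """Pick n_words independently and uniformly using the OS-backed CSPRNG."""
    return " ".join(secrets.choice(wordlist) for _ in range(n_words))

# Hypothetical placeholder list; a real one would hold distinct dictionary words.
WORDS = [f"word{i:05d}" for i in range(11_585)]

print(generate_passphrase(WORDS))
```

Because every word is drawn uniformly and independently, the entropy of the result depends only on the list size and the number of words, not on anyone’s habits.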
Passphrases: Easier to remember, harder to crack
The idea of using passphrases for storing stronger secrets has been around since at least 1982, when Sigmund N. Porter discussed it in his research paper, _A password extension for improved human factors_, but passphrases probably really came to prominence thanks to this xkcd comic:
The idea is simply that language comes more naturally to us than strange sequences of letters, numbers, and symbols. Since there are many more words in a language than there are possible characters to type, a five word passphrase can end up being easier to type and remember than a random string of 10 characters. For example, compare the following computer generated passwords, each with similar entropy / strength:
Five word passphrases tend to be easier to remember because they are only five chunks of data instead of 10–14. Words also tend to be easier to type, since we have some learned muscle memory for these motions: we type words whenever we use a computer. A 10 character password like _,eg-;UJ 0-L_ might be shorter, but try typing it!
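The “similar entropy” comparison can be reproduced with a few lines of Python. The alphabet sizes (95 printable ASCII characters, 26 lowercase letters) and the dictionary size of 11,585 words are assumptions chosen for illustration; the dictionary size is simply the value that yields about 13.5 bits per word.

```python
import math

def random_string_bits(length, alphabet_size):
    """Entropy of a string of independently, uniformly chosen characters."""
    return length * math.log2(alphabet_size)

def passphrase_bits(n_words, dictionary_size):
    """Entropy of a passphrase of independently, uniformly chosen words."""
    return n_words * math.log2(dictionary_size)

print(round(random_string_bits(10, 95), 1))   # 10 printable characters
print(round(random_string_bits(14, 26), 1))   # 14 lowercase letters
print(round(passphrase_bits(5, 11_585), 1))   # 5 dictionary words
```

All three land in the mid-60s of bits, so the choice between them is mostly about which one a person can actually remember and type.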
Key-stretching: When slow is good
You may have noticed that this chart shows Peerio’s 5-word passphrase at 67.5 bits, a good bit shy of our 81-bit goal. Recalling our lock analogy, the dial combination lock wasn’t stronger simply because it had more combinations; it also took longer for an attacker to dial in each combination. Something similar can be applied to passwords through what’s known as key-stretching: intentionally increasing the time it takes to compute the verification function (the function that makes sure it’s really you when you log in to a service).
Peerio does this by passing your passphrase through “scrypt”, a memory-hard key derivation function that generates a secret key while increasing both the computational and memory expense of the verification function. You can read more about Peerio’s specs and scrypt parameters here, but the short story is that by increasing the time and memory needed to compute this function, we effectively add around 14 bits of entropy to the verification function. Adding these 14 bits to the base 67.5 bits of a Peerio passphrase, we end up with an 81.5-bit passphrase and our goal of a $1 billion a year attack cost.
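For readers who want to see key-stretching in action, here is a small sketch using `hashlib.scrypt` from Python’s standard library. The parameters (`n=2**14`, `r=8`, `p=1`), the salt handling, and the helper names are illustrative assumptions, not Peerio’s actual settings. The `effective_bits` helper encodes the rough accounting used above: if each guess costs about 2ᵏ times as much as a single hash, that is treated as k extra bits.

```python
import hashlib
import math
import os

def stretch_passphrase(passphrase, salt, n=2**14, r=8, p=1):
    """Derive a 32-byte key with scrypt; n, r, p are illustrative values."""
    return hashlib.scrypt(passphrase.encode("utf-8"),
                          salt=salt, n=n, r=r, p=p, dklen=32)

def effective_bits(passphrase_bits, work_factor):
    """Passphrase entropy plus log2 of the assumed per-guess slowdown."""
    return passphrase_bits + math.log2(work_factor)

salt = os.urandom(16)   # random per-user salt, stored alongside the account
key = stretch_passphrase("five random words go here", salt)
print(len(key), effective_bits(67.5, 2**14))
```

The same passphrase and salt always yield the same key, so a legitimate user pays the cost once per login while an attacker pays it on every guess.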
Why a billion? Why not a trillion?
Security is certainly one of those topics where many people seem to feel the mantra “more is better” applies unequivocally. However, in the case of computer generated passcodes, increases in strength generally come with some sacrifice to usability. There are really only two options:
- Change the parameters of the scrypt function to further increase the time of the verification function, but because this function affects both an attacker and legitimate users, this would start to result in a noticeable delay when logging in to Peerio.
- Add more words to the passphrase. Adding only one extra word will add ~13.5 bits to your passphrase and raise the estimated attack cost to around $11.5 trillion per year. We know some people have pretty strong beliefs about password strength, so we decided to include an option to use up to 10 words in your passphrase.
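The effect of the second option can be sketched numerically. The 13.5 bits per word and the $1 billion baseline for a five-word passphrase come from the text above; the function and constant names are illustrative.

```python
BITS_PER_WORD = 13.5
BASELINE_WORDS = 5
BASELINE_COST_USD = 1e9   # estimated yearly attack cost at five words

def estimated_cost_usd(n_words):
    """Scale the baseline cost by 2**(extra bits) for each added word."""
    extra_bits = BITS_PER_WORD * (n_words - BASELINE_WORDS)
    return BASELINE_COST_USD * 2 ** extra_bits

for n in range(5, 11):
    print(n, f"{estimated_cost_usd(n):.3g}")
```

A sixth word multiplies the estimate by 2^13.5, roughly 11,585, which is where the figure of around $11.5 trillion comes from.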
That said, while you could continue to increase passphrase strength in either of these ways, it may not really provide any additional real world security. A single billion dollar brute-force attack doesn’t really fit into most organizations’ budgets. Back in 2013, it was revealed that the USA’s intelligence agencies collectively had a “black budget” of about $52.6 billion, with no single agency requesting more than $15 billion for the year. We also know most of these agencies have other ways of extracting the data they’re after.
$1,000,000,000 passphrase vs. a $5 wrench
While we think the idea of measuring password strength in terms of cost/year is a pretty practical evaluation tool, it is important to remember that strong passwords are only one component of a good security ecosystem. An attacker doesn’t need to be working with a spy agency to hire a private investigator to follow someone around, or to shoulder surf passwords. A smart attacker will aim for the weakest link.
Most of us aren’t facing evil villains like those in this comic, but rather simpler and more inconspicuous threats: perhaps a snooping sibling or family member, security breaches at companies we have shared our data with, or low-level criminal hackers around the world trying to crack passwords to steal, sell, or ransom our data. By setting sufficiently strong passwords, using end-to-end encryption, and offering additional tools like two-factor authentication, Peerio can help protect your data from a pretty wide array of threats.
Designing security for everyone
No one tool or practice will somehow magically make you invulnerable to the entire world of digital threats. Security is a practice made up of many small considerations in how we interact with people and technology, the tools we use, the information we share, and the precautions we take. We know this can all seem a little overwhelming at first, but we think developers can do their part to help by actively addressing privacy and security issues in the software we make.
Most people won’t take the time to learn the detailed considerations of choosing a good password or why services provide the guidelines they do — they will simply work within those limits. Not because they’re lazy or can’t understand, but because they might have better things to do than calculate password entropy or check their passwords against existing databases.
By setting a strong standard for our passphrases and ensuring all Peerio users are given this baseline of protection, we at Peerio hope to have made at least one part of your own security puzzle a little bit easier.
Passphrases are an excellent tool, but we replaced them in Peerio. See our follow up to learn why.