What to look for in every data breach report.
Update: see the end of the article for an update related to the 9/13 Epik data breach.
We often talk about how you and I should keep our passwords secure, most commonly by using a password vault or manager.
But how do websites work? We trust them to do the same: keep our passwords secure from hacking and exposure. How do they do that?
It turns out to be deceptively simple.
When it's done correctly, of course.
Websites should only store what's called a "one-way hash" of your password, not the password itself. The original password cannot be determined from only its hashed value. If a site can tell you your password, they're doing security wrong. When you next hear of a data breach, pay attention to whether the password information is hashed or not. If it wasn't hashed, that implies your password, if included, is out there for anyone to see.
Websites shouldn't store your password
We'll start with the counterintuitive magic: websites shouldn't store your password. Period.
That leads to the question: how do they know you've entered your password correctly if they don't store it somewhere?
What websites should store is called a "one-way hash" of your password. A hash is a complex calculation that generates a large number. A good hash has three very important characteristics:
- It is statistically impossible for two passwords to generate the same hash number.
- You can create a hash from a password, but you cannot recover the password from the hash.
- A small change in the password generates a large change in the resulting hash number, making it impossible to recover "nearby" passwords, even if you know the password/hash combination for some.
That second one is key.
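To make the third property concrete, here is a minimal Python sketch that hashes two passwords differing by a single character and counts how many of the 128 hash bits change. The helper names (`md5_int`, `differing_bits`) and the sample passwords are my own assumptions for illustration. MD5 is used only because it is the algorithm this article's examples use, not because it is a good choice for real passwords.

```python
import hashlib


def md5_int(text: str) -> int:
    """Return the MD5 hash of text as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest(), "big")


def differing_bits(a: str, b: str) -> int:
    """Count how many of the 128 hash bits differ between two inputs."""
    return bin(md5_int(a) ^ md5_int(b)).count("1")


# Two hypothetical passwords that differ only in the last character.
first, second = "hunter2", "hunter3"
print(f"{first}: {md5_int(first):032x}")
print(f"{second}: {md5_int(second):032x}")
print("bits that differ:", differing_bits(first, second), "of 128")
```

With a well-behaved hash you would expect roughly half the bits to flip, so the two results look unrelated even though the passwords are nearly identical.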
Let's look at an example. I'll use everyone's favorite password: "password".
One hash[1] for "password" is 126,680,608,771,750,945,340,162,210,354,335,764,377 (more commonly expressed in base-16 numbers, aka hexadecimal, as 5f4dcc3b5aa765d61d8327deb882cf99).
So, if you take password and hash it, you'll get 126,680,608,771,750,945,340,162,210,354,335,764,377.
If you hack a database and get that hash, all you have is 126,680,608,771,750,945,340,162,210,354,335,764,377 -- you know nothing. There's no way to take that number by itself and determine what password generated it.
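If you'd like to reproduce that number yourself, the following short Python sketch computes the MD5 hash of "password" and prints it both ways. The variable names are mine; `hashlib` is part of Python's standard library.

```python
import hashlib

# Hash the example password with MD5 (used here only to match the article).
digest = hashlib.md5("password".encode("utf-8")).digest()

as_hex = digest.hex()                      # base-16 form
as_number = int.from_bytes(digest, "big")  # the same value as one big integer

print(as_hex)            # 5f4dcc3b5aa765d61d8327deb882cf99
print(f"{as_number:,}")  # the same value in base 10, with thousands separators
```

Both lines describe the same 128-bit value; hexadecimal is simply the more compact way to write it.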
Typing the right password
When you set up or change your password, the site you're setting it with will calculate the hash and store that number. If you enter "password" as your password, then our example site will store "126,680,608,771,750,945,340,162,210,354,335,764,377" in its database, along with your user ID and/or email address.
Now, days or weeks later you come back to the site and sign in. Here's what happens:
- You enter your password ("password", in our example).
- The site calculates the hash (126,680,608,771,750,945,340,162,210,354,335,764,377 in our example).
- The site compares the hash it just calculated against the hash it stored when you set your password.
- If they match, you must have typed your password correctly, since only the exact same password would generate the exact same hash.
- If they don't match, you didn't type the expected password that goes with the expected hash.
That's really all there is to it. Aside from some complex math to generate the hash, it's pretty simple.
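The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular site does it: the in-memory dictionary stands in for a real database, the function names and the example address are made up, and MD5 is used only so the stored value matches the number shown earlier.

```python
import hashlib
import hmac

# Hypothetical "database": user ID -> stored hash. The password is never kept.
stored_hashes = {}


def hash_password(password: str) -> str:
    # MD5 only to match the article's numbers; a real site should use a
    # salted, deliberately slow algorithm such as bcrypt.
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def set_password(user_id: str, password: str) -> None:
    """Called when the user sets or changes a password."""
    stored_hashes[user_id] = hash_password(password)


def sign_in(user_id: str, typed_password: str) -> bool:
    """Hash what was typed and compare it with the stored hash."""
    expected = stored_hashes.get(user_id)
    if expected is None:
        return False
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, hash_password(typed_password))


set_password("user@example.com", "password")
print(sign_in("user@example.com", "password"))  # True
print(sign_in("user@example.com", "Password"))  # False
```

Notice that nothing in `stored_hashes` lets the site tell you what your password is; it can only say whether what you typed produces the same hash.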
Telling you your password
I repeatedly used the word "should" above.
A website doing password security correctly should only store the password hash, not the actual password. Since there's no way to go backward -- recovering the password from the hash -- that means a website using proper security cannot tell you what your password is. They can only tell you that you typed it correctly or not.
Unfortunately, not all sites do security correctly, and there's at least one way to test:
If a website can tell you your password, then they've got that password stored as-is in a database the staff can access.
That's poor security because your password could be exposed in a breach.
Passwords and data breaches
The next time you hear of a large data breach, particularly if it's happened at some online service you use, pay careful attention to the wording describing the information that was exposed.
For example, here's a description of a recent breach from HaveIBeenPwned:
In June 2020, the restaurant solutions provider OrderSnapp suffered a data breach which exposed 1.3M unique email addresses. Impacted data also included names, phone numbers, dates of birth and passwords stored as bcrypt hashes. The data was provided to HIBP by dehashed.com.
(Emphasis mine.) The passwords were stored as hashes. In this breach, passwords were not exposed. While there was other information included in the hack, the most sensitive of all -- passwords -- had been stored correctly.
Contrast that with this description:
In January 2019, a large collection of credential stuffing lists (combinations of email addresses and passwords used to hijack accounts on other services) was discovered being distributed on a popular hacking forum. The data contained almost 2.7 billion records including 773 million unique email addresses alongside passwords those addresses had used on other breached services.
There's no mention of hashing at all. This particular breach included actual passwords. Wherever those passwords came from, there was a lapse in security of some sort.
This is what you pay attention to: were the passwords exposed in a breach hashed? If not, change your password on the impacted service immediately. If they were hashed, you can still change your password if you like -- and perhaps you should for other reasons -- but you can be somewhat less concerned that your password is "out in the wild".
There's more to security than passwords
It's important to note that, of course, there's much more to website security than whether or not passwords are hashed. Let's face it: data breaches shouldn't happen, either, but they do.
Hashing is just one (very important) part of website and online service security, which encompasses everything from keeping the web servers themselves secure and malware-free, to using proper database and software security, to ensuring only the proper personnel have access to sensitive information.
It's a complex world, and easy to get wrong, as every breach we hear about reminds us.
But the next time you hear of yet another breach, now you'll have at least one thing to look at to determine just how security-conscious the service was and how worried you should be.
A quick note to pedants
Since this type of overview tends to bring out those with an eye for excruciating detail and minutiae, one small caveat:
This is only a high-level overview to make the concepts accessible to more people. Of course, password management implementation details can get very complex. If you're about to comment with a complaint that I didn't discuss different hashing algorithms (ugh), or that MD5 shouldn't be used for passwords (I agree), or that password hashes should be salted (ditto), and why didn't I talk about rainbow tables (hoo, boy) . . . don't.
Those concepts were never the point.
Update 2021-09-19: an example breach
On September 19th, 2021 Have I Been Pwned sent notifications to registered users involved in the so-called "Epik" data breach.
The notification does indeed talk about passwords, specifically "passwords stored in various formats". Unfortunately, that's not a lot to go on and tells you nothing about how concerned you should be.
If you are an actual Epik customer, assume the worst and change your password immediately. Also, if you're using that same password anywhere else, change it in all the places you've used it. Take this opportunity to stop re-using passwords and set them all to something unique.
The scraped "WHOIS records" will not include passwords. This is already generally public information about who owns domains on the internet; for example, it'll tell you I own askleo.com. Nothing particularly alarming here.
But what about . . . anything else? It seems that only passwords of Epik customers were exposed (in various formats), but there's no indication of any other account-related information we need to be concerned about.
But, as always, it pays to remain watchful for unexpected activity on your accounts.
Footnotes & References
1: There are several different hash algorithms that all have different strengths and weaknesses. Examples here all use MD5. And while so-called "salt" should also be used to further secure password hashes, I'm not going to address that in this basic overview.
How secure are password managers against being cracked? Your password to open the manager must be strong but, even if it is, how resistant are the PW managers themselves to being cracked?
BTW, I highly recommend that people lock all three of their credit bureau reports to help prevent identity theft. They can be selectively unlocked for a brief time…perhaps just for a specific firm to access…then they relock automatically.
A good password manager like LastPass uses the same techniques as described in this article. Websites that do security right take it one step further and use a salted hash, which essentially means adding a few more characters to the beginning or end of the password before hashing it, to make it impossible for rainbow tables or brute force to crack it.
Sorry Leo, I mentioned two of the things in your pedantic list ;-)
Funny thing about salted hash. In the supermarket, it’s another word for Spam. ;-)
There is a fourth credit bureau out there, called Innovis (innovis.com). They are based out of Pittsburgh, PA and, while not as well known as the other three, they do generate credit reports and have the same access to information as the others. I put a freeze on that one as well.
That’s true, but be careful. If your password is too short, the hash of that password is already published in “rainbow tables”. Hackers have computers creating hashes of all possible combinations of characters and creating a database of the hashes in what’s called rainbow tables. That’s why it’s necessary to use long passwords and the recommended password length is ever increasing. Now a recommended password is 14-20 characters where a few years ago, 10 was considered sufficient.
correcthorsebatterystaple (25 characters) is better than *j*6j90@#%, as it’s likely that a hash has been created for *j*6j90@#% and added to rainbow tables. If you are compelled to change your password, adding one letter to that password (correcthorsebatterystapler, for example) is as effective as changing the entire password string.
Warning, don’t use correcthorsebatterystaple as a password. It’s too common and has likely been added manually to rainbow tables. ;-)
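As a rough illustration of the idea in this comment, here is a simplified Python sketch of a precomputed lookup table. A true rainbow table is a more compact structure that trades time for storage; this dictionary is only the plain "hash to password" version of the same idea. The password list and the `crack` helper are hypothetical, and unsalted MD5 is assumed because that is what the article's examples use.

```python
import hashlib
from typing import Optional


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# A tiny, made-up list of common passwords, hashed ahead of time.
common_passwords = ["123456", "password", "qwerty", "letmein"]
lookup = {md5_hex(p): p for p in common_passwords}


def crack(leaked_hash: str) -> Optional[str]:
    """Return the password if its hash was precomputed, otherwise None."""
    return lookup.get(leaked_hash)


print(crack("5f4dcc3b5aa765d61d8327deb882cf99"))           # password
print(crack(md5_hex("a long passphrase nobody has precomputed")))  # None
```

Nothing is being reversed here: the table only works for passwords someone already thought to hash, which is why long and unusual passwords fall outside it.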
If bad guys learn your password, the damage is mitigated (a bit) by NEVER re-using a password. Yes, it is a pain to keep track of 386 passwords, but it is necessary. Security always entails an increased hassle. You could get into your home faster if there was no lock on the front door, but …
A password manager is one approach, but for many people, the wrong one. Writing passwords on paper is a great solution for many people. There are other approaches too.
As for choosing a password, many end with a number, so putting your numbers elsewhere makes you more resistant to force guessing. Many passwords start with a capital letter, so again, don’t do that. And, the longer the better even if you are just stringing together words.
You said that a password manager is one approach, but for many people, the wrong one. I’m curious which people a password manager would be wrong for.
If different passwords ALWAYS have different hashes, then the hashing function is one-to-one and therefore has an inverse. So it would always be mathematically possible to recover a password from its hash.
The mathematics is correct, so what am I missing?
Nope. It’s not one-to-one. It’s one to almost-always-one. In theory, it’s possible for two strings to generate the same hash. In practice, it’s impossible to intentionally make that happen. There’s no way to reverse the process, other than using a table of all possible hashes and the passwords that generated them (aka a rainbow table).
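One way to see why a hash cannot be one-to-one: the number of possible passwords is unlimited, but the hash has a fixed size. The Python sketch below shrinks the hash to a single byte (only 256 possible values) so a collision turns up quickly. `tiny_hash` and `find_collision` are hypothetical helpers written for this illustration; with a full-size modern hash the same search would be hopelessly long.

```python
import hashlib


def tiny_hash(text: str) -> int:
    """First byte of MD5: a toy hash with only 256 possible values."""
    return hashlib.md5(text.encode("utf-8")).digest()[0]


def find_collision(candidates):
    """Return the first two different inputs that share a tiny_hash value."""
    seen = {}
    for candidate in candidates:
        value = tiny_hash(candidate)
        if value in seen:
            return seen[value], candidate
        seen[value] = candidate
    return None


# 257 distinct inputs into 256 slots: at least two must collide.
pair = find_collision(f"password{i}" for i in range(257))
print(pair)
```

Because two different inputs land on the same value, the hash alone cannot tell you which one was typed, so there is no inverse function to compute.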
Using what’s called “salt” with a hashing algorithm alters the computation so that service A and service B generate different hashes for the same password, even when both use the same algorithm. That means a rainbow table built for A would be useless against B, and so on. It’s also why long passwords are so important: they make a complete rainbow table impractical. And it’s why using unique passwords is so important: it makes a rainbow table of known passwords useless.
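Here is a minimal Python sketch of salting, under a few assumptions of mine: it uses PBKDF2 from the standard library rather than any algorithm named in the article, a random 16-byte salt per stored record (common practice, rather than one salt per service), and an iteration count picked only for illustration. The function names are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, for illustration only


def hash_with_salt(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def new_record(password: str):
    """Create the (salt, hash) pair a service would store."""
    salt = os.urandom(16)
    return salt, hash_with_salt(password, salt)


def verify(password: str, salt: bytes, expected: bytes) -> bool:
    return hmac.compare_digest(hash_with_salt(password, salt), expected)


salt_a, hash_a = new_record("password")  # as stored by service A
salt_b, hash_b = new_record("password")  # as stored by service B

print(hash_a != hash_b)                    # same password, different hashes
print(verify("password", salt_a, hash_a))  # each service can still verify
```

The salt is stored alongside the hash and is not a secret. Its job is to make sure a table precomputed for one salt is of no use against another.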
How can lastpass retain a list of past PWs for a given site?
I’m not sure I understand the question — the job of any password vault is to keep those passwords in an encrypted database. Since the passwords need to be entered as if you’d typed them, the actual passwords are stored, again, securely encrypted behind your master password.
Leo – I think I get it with Password Managers. In order for Password Managers to fill in a password on a website, you first need to be logged into your Password Manager with your master password (Make that password as strong as all get out). THAT password is probably (hopefully) hashed. Once you log in to your Password Manager and give it your master password, the Password Manager can then retrieve your other website passwords from a heavily encrypted file. I would hope they keep website URLs, website UserIDs, website passwords in separate heavily encrypted databases, linked by an internal ID that they use. If not, and if their one encrypted database is hacked, the hacker will have access to the website URL, a UserID, and a password.
What you’ve described is what I understand that LastPass does. The important caveat is that LastPass stores an encrypted “blob” which contains the database of everything that you’ve put in LastPass. When you provide the correct credentials the encrypted “blob” is passed back to the application or browser extension that you are using. Decryption of the “blob” occurs locally in the client application that you are using on your computer or phone. The same applies to additions and changes to the entries in LastPass.
If LastPass retains a history of passwords for a site, that information should be encrypted in the same manner as other information that you put into LastPass.
A more interesting question is how does LastPass allow you to go back to a previous password for getting into LastPass? Could be one of two ways… 1) they keep a record of the hash of the previous password and the current password and permit the older password to release a decryption key associated with the account. 2) They create a separate duplicate of your encrypted “blob” that can be accessed by the older password (effectively maintaining a record of multiple revisions).
One question: how does the hashing system work on websites that ask for a subset of one’s password characters? For example, on one login my bank may ask for the 2nd, 5th and 12th characters; on the next login, it may ask for the 6th, 10th and 13th characters. I don’t understand how such a subset can be validated against a hash of the whole password.
This is called a partial password. The theory (and that’s all it is) is that it may protect from keyloggers or an actual observer watching you type a password. These are very unlikely threats. How it’s implemented can range from something as stupid as keeping your actual password in a database (and don’t underestimate even a bank’s stupidity) to some very convoluted algorithms. For example, a simpler algorithm is to keep hashes of all combinations of the partial passwords. There is a lot of debate about whether any partial password method is secure. From what I’ve seen, the current consensus is that it’s not secure compared to hashing the entire password, but it depends on the algorithmic convolution. I see it as good feed material for PhD dissertations for a few years until it gets hacked.
It can’t be. I can think of two approaches: they may keep your actual password, OR they may keep a set of hashes of a pre-defined subset of characters.
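The second approach (a set of hashes of pre-defined character subsets) might look something like the Python sketch below. This is purely a guess at one possible design, not how any bank is known to do it; `enroll`, `check_challenge`, and the salted SHA-256 scheme are all assumptions for illustration. Positions are zero-based and must be given in ascending order.

```python
import hashlib
import hmac
import os
from itertools import combinations


def subset_hash(chars: str, positions: tuple, salt: bytes) -> bytes:
    # Bind the characters to their positions so "abc" at (0, 1, 2)
    # hashes differently from "abc" at (3, 4, 5).
    material = repr(positions).encode("utf-8") + b":" + chars.encode("utf-8")
    return hashlib.sha256(salt + material).digest()


def enroll(password: str, subset_size: int = 3):
    """Store one hash per possible set of positions, then discard the password."""
    salt = os.urandom(16)
    table = {}
    for positions in combinations(range(len(password)), subset_size):
        chars = "".join(password[i] for i in positions)
        table[positions] = subset_hash(chars, positions, salt)
    return salt, table


def check_challenge(salt: bytes, table: dict, positions: tuple, typed: str) -> bool:
    expected = table.get(positions)
    if expected is None:
        return False
    return hmac.compare_digest(subset_hash(typed, positions, salt), expected)


salt, table = enroll("correcthorse")
print(len(table))                                       # 220 stored hashes
print(check_challenge(salt, table, (1, 4, 11), "oee"))  # True
```

The weakness is easy to see from the sketch: each stored hash covers only three characters, so anyone who steals the table can try every three-character combination in moments. That is one reason to be skeptical of partial-password schemes.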
Leo, you wrote:
“It is statistically impossible for two passwords to generate the same hash number.”
That sentence is missing an important word: the word “different.” As in…
“It is statistically impossible for two different passwords to generate the same hash number.”
I.e., the likelihood of “collisions” is nearly nonexistent.
Although you did end up clarifying this point later on, I would have been a lot more comfortable if you had been clear on this matter from the outset.
I don’t think most data breach notifications would tell the customers whether the passwords were breached or not. They simply would not include that sentence, especially the phrase “passwords stored as bcrypt hashes”, as most end users wouldn’t know what a “bcrypt hash” is anyway.
From what I’ve seen . . . mostly via HaveIBeenPwned . . . the phrase “and hashed passwords” is actually quite common. Security-knowledgeable people read those reports as well, and if there’s nothing about passwords then it’s the FIRST thing they ask.
I just finished submitting the following on the Ask Leo page:
I have software that requires me to change my password every 90 days, which I dutifully do by simply changing two characters in the password to correspond to the quarter of the year, thereby enabling me to enter the password from memory. On other sites, when I have to submit a new password they may not accept a similar change. How is it, if they only have a hash of my password, that they can tell my new one is similar to my old one?
And followed a link to this discussion, which may have answered my question but I’m still uncertain. Would anyone care to expound?
They can save the hashes of your previous passwords to ensure you don’t re-use any old ones.
To judge “similarity” the only way I can think of to do that is to store the actual password. Both (judging similarity and storing the password) are bad ideas.
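To illustrate the difference, here is a minimal Python sketch of a password history that stores only salted hashes. The `PasswordHistory` class, the sample passwords, and the iteration count are hypothetical, and PBKDF2 is my choice of algorithm for the example. It can catch exact re-use, but it has no way to notice that a new password is merely similar to an old one.

```python
import hashlib
import hmac
import os

ITERATIONS = 50_000  # assumed work factor, chosen only for illustration


def _hash(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


class PasswordHistory:
    """Hypothetical per-user record of salted hashes of current and past passwords."""

    def __init__(self) -> None:
        self._entries = []  # list of (salt, hash) pairs; no passwords are kept

    def was_used_before(self, candidate: str) -> bool:
        # Re-hash the candidate with each old salt and look for an exact match.
        return any(
            hmac.compare_digest(_hash(candidate, salt), old_hash)
            for salt, old_hash in self._entries
        )

    def change_password(self, new_password: str) -> bool:
        if self.was_used_before(new_password):
            return False  # exact re-use is detectable from hashes alone
        salt = os.urandom(16)
        self._entries.append((salt, _hash(new_password, salt)))
        return True


history = PasswordHistory()
print(history.change_password("Report-Q1-x"))  # True: first use
print(history.change_password("Report-Q1-x"))  # False: exact re-use is caught
print(history.change_password("Report-Q2-x"))  # True: "similar" is invisible
```

Because a one-character change produces a completely different hash, the third call succeeds. A site that rejects it as "too similar" must be comparing against something other than a hash.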
And I answered
“If they only have the hash, they can’t tell that you used a similar password. If they know it’s similar to a previous password, it appears they are doing security wrong and storing your password unencrypted.
Try a password recovery on their site to see if they send you your password. If they do, you know they are doing it wrong.”