What to look for in every data breach report.
Update: see the end of the article for an update related to the 9/13 Epik data breach.
We often talk about how you and I should keep our passwords secure, most commonly by using a password vault or manager.
But how do websites work? We trust them to do the same: keep our passwords secure from hacking and exposure. How do they do that?
It turns out to be deceptively simple.
When it’s done correctly, of course.
Websites should store only what’s called a “one-way hash” of your password, not the password itself. The original password cannot be determined from its hashed value alone. If a site can tell you your password, it’s doing security wrong. The next time you hear of a data breach, pay attention to whether the password information was hashed. If it wasn’t, and your password was included, it’s out there for anyone to see.
Websites shouldn’t store your password
We’ll start with the counterintuitive magic: websites shouldn’t store your password. Period.
That leads to the question: how do they know you’ve entered your password correctly if they don’t store it somewhere?
What websites should store is called a “one-way hash” of your password. A hash is a complex calculation that generates a large number. A good hash has three very important characteristics:
- It is, for practical purposes, impossible to find two passwords that generate the same hash number.
- You can create a hash from a password, but you cannot recover the password from the hash.
- A small change in the password generates a large change in the resulting hash number, making it impossible to recover “nearby” passwords, even if you know the password/hash combination for some.
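To see that third property in action, here’s a minimal Python sketch using the standard hashlib module and MD5, the algorithm the examples in this article use (see the footnote). The helper names md5_hex and differing_bits are made up for illustration, and MD5 is used only to match the examples, not because it’s a good choice for storing real passwords.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the MD5 hash of text (encoded as UTF-8) as a hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many of the 128 bits differ between two MD5 hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

# Three nearly identical passwords produce unrelated-looking hashes.
for pw in ("password", "Password", "password1"):
    print(f"{pw:<10} {md5_hex(pw)}")

# How many bits changed when only the first letter was capitalized?
print(differing_bits(md5_hex("password"), md5_hex("Password")))
```

The exact number of differing bits varies from pair to pair, but for a good hash it tends to hover around half of the 128, which is why knowing one password/hash pair tells you nothing about its neighbors.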
That second one is key.
Let’s look at an example. I’ll use everyone’s favorite password: “password”.
One hash¹ for “password” is 126,680,608,771,750,945,340,162,210,354,335,764,377. (More commonly expressed in base-16 numbers, aka hexadecimal, as in 5f4dcc3b5aa765d61d8327deb882cf99.)
So, if you take “password” and hash it, you’ll get 126,680,608,771,750,945,340,162,210,354,335,764,377.
If you hack a database and get that hash, all you have is 126,680,608,771,750,945,340,162,210,354,335,764,377 — you know nothing. There’s no way to take that number by itself and determine what password generated it.
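If you’d like to check those numbers yourself, this short Python sketch uses the standard hashlib module to compute the MD5 hash of “password” and print it in both forms. It only demonstrates the arithmetic; MD5 is used because that’s what the examples here use.

```python
import hashlib

# Hash the example password with MD5 (the algorithm used for the examples here).
digest = hashlib.md5("password".encode("utf-8")).hexdigest()

print(digest)                  # 5f4dcc3b5aa765d61d8327deb882cf99
print(f"{int(digest, 16):,}")  # 126,680,608,771,750,945,340,162,210,354,335,764,377
```

Both lines are the same number; int(digest, 16) simply reads the hexadecimal string as a base-16 value.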
Typing the right password
When you set up or change your password, the site you’re setting it with will calculate the hash and store that number. If you enter “password” as your password, then our example site will store “126,680,608,771,750,945,340,162,210,354,335,764,377” in its database, along with your user ID and/or email address.
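Here’s a rough sketch of that step in Python. The users dictionary stands in for the site’s database and set_password is a hypothetical name; both are assumptions for illustration. The point is what gets stored: the email address and the hash, never the password itself.

```python
import hashlib

# Hypothetical in-memory "database" mapping an email address to a password hash.
# A real site would use an actual database table instead.
users: dict[str, str] = {}

def set_password(email: str, password: str) -> None:
    """Store only the MD5 hash of the password, never the password itself."""
    users[email] = hashlib.md5(password.encode("utf-8")).hexdigest()

set_password("user@example.com", "password")
print(users)  # {'user@example.com': '5f4dcc3b5aa765d61d8327deb882cf99'}
```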
Now, days or weeks later you come back to the site and sign in. Here’s what happens:
- You enter your password (“password”, in our example).
- The site calculates the hash (126,680,608,771,750,945,340,162,210,354,335,764,377 in our example).
- The site compares the hash it just calculated against the hash it stored when you set your password.
- If they match, you must have typed your password correctly, since only the exact same password would generate the exact same hash.
- If they don’t match, you didn’t type the password that goes with the stored hash.
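The sign-in steps above might look something like this in Python. This is a minimal sketch, assuming the stored hash is the MD5 value from our example; check_password and STORED_HASH are hypothetical names, not any particular site’s code.

```python
import hashlib
import hmac

# The hash saved when the password was set (MD5 of "password" in our example).
STORED_HASH = "5f4dcc3b5aa765d61d8327deb882cf99"

def check_password(entered: str, stored_hash: str = STORED_HASH) -> bool:
    """Hash what the user typed and compare it with the stored hash."""
    entered_hash = hashlib.md5(entered.encode("utf-8")).hexdigest()
    # compare_digest compares in constant time, so the time taken
    # doesn't hint at how much of the hash matched.
    return hmac.compare_digest(entered_hash, stored_hash)

print(check_password("password"))   # True
print(check_password("Password"))   # False
print(check_password("password "))  # False (trailing space)
```

Using hmac.compare_digest rather than == is a common precaution; a plain == comparison would give the same True/False answers. Notice that the site never needs the original password to do this check.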
That’s really all there is to it. Aside from some complex math to generate the hash, it’s pretty simple.
Telling you your password
I repeatedly used the word “should” above.
A website doing password security correctly should only store the password hash, not the actual password. Since there’s no way to go backward — recovering the password from the hash — that means a website using proper security cannot tell you what your password is. They can only tell you that you typed it correctly or not.
Unfortunately, not all sites do security correctly, and there’s at least one simple way to tell:
If a website can tell you your password, then they’ve got that password stored as-is, or in a form they can reverse, in a database their staff can access.
That’s poor security because your password could be exposed in a breach.
Passwords and data breaches
The next time you hear of a large data breach, particularly if it’s happened at some online service you use, pay careful attention to the wording describing the information that was exposed.
For example, here’s a description of a recent breach from Have I Been Pwned:
In June 2020, the restaurant solutions provider OrderSnapp suffered a data breach which exposed 1.3M unique email addresses. Impacted data also included names, phone numbers, dates of birth and passwords stored as bcrypt hashes. The data was provided to HIBP by dehashed.com.
Note the last detail: the passwords were stored as bcrypt hashes. In this breach, the passwords themselves were not exposed. While other information was included in the hack, the most sensitive item of all, the passwords, had been stored correctly.
Contrast that with this description:
In January 2019, a large collection of credential stuffing lists (combinations of email addresses and passwords used to hijack accounts on other services) was discovered being distributed on a popular hacking forum. The data contained almost 2.7 billion records including 773 million unique email addresses alongside passwords those addresses had used on other breached services.
There’s no mention of hashing at all. This particular breach included actual passwords. Wherever those passwords came from, there was a lapse in security of some sort.
This is what to pay attention to: were the passwords exposed in the breach hashed? If not, change your password on the impacted service immediately. If they were hashed, you can still change your password if you like (and perhaps you should, for other reasons), but you can be somewhat less concerned that your password is “out in the wild”.
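As a toy illustration, the rule of thumb above can be written as a few lines of Python that scan a breach description for telltale words. The keyword lists and the password_advice function are hypothetical and deliberately crude; a real description deserves a careful human read, as the two examples above show.

```python
# Hypothetical keyword lists for a rough first pass; illustrative only.
PLAIN_HINTS = ("plain text", "plaintext", "unhashed", "unencrypted")
HASHED_HINTS = ("hash", "bcrypt", "scrypt", "argon2", "pbkdf2")

def password_advice(description: str) -> str:
    """Apply the 'were the passwords hashed?' rule of thumb to a description."""
    text = description.lower()
    if "password" not in text:
        return "No passwords mentioned."
    # Check explicit plain-text wording first, since "unhashed" contains "hash".
    if any(hint in text for hint in PLAIN_HINTS):
        return "Passwords exposed as-is: change yours now."
    if any(hint in text for hint in HASHED_HINTS):
        return "Passwords were hashed: less urgent, but consider changing."
    # No mention of hashing at all: assume the worst.
    return "No mention of hashing: assume the worst and change yours now."

print(password_advice("... and passwords stored as bcrypt hashes."))
# Passwords were hashed: less urgent, but consider changing.
print(password_advice("... alongside passwords those addresses had used on other breached services."))
# No mention of hashing: assume the worst and change yours now.
```

Defaulting to “assume the worst” when hashing isn’t mentioned mirrors the advice in this article: silence about hashing is not reassurance.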
There’s more to security than passwords
Of course, there’s much more to website security than whether passwords are hashed. Let’s face it: data breaches shouldn’t happen either, but they do.
Hashing is just one (very important) part of website and online service security, which encompasses everything from keeping the web servers themselves secure and malware-free, to using proper database and software security, to ensuring only the proper personnel have access to sensitive information.
It’s a complex world, and easy to get wrong, as every breach we hear about reminds us.
But the next time you hear of yet another breach, now you’ll have at least one thing to look at to determine just how security-conscious the service was and how worried you should be.
A quick note to pedants
Since this type of overview tends to bring out those with an eye for excruciating detail and minutiae, one small caveat:
This is only a high-level overview to make the concepts accessible to more people. Of course, password management implementation details can get very complex. If you’re about to comment with a complaint that I didn’t discuss different hashing algorithms (ugh), or that MD5 shouldn’t be used for passwords (I agree), or that password hashes should be salted (ditto), and why didn’t I talk about rainbow tables (hoo, boy) . . . don’t.
Those concepts were never the point.
Update 2021-09-19: an example breach
On September 19th, 2021, Have I Been Pwned sent notifications to registered users involved in the so-called “Epik” data breach.
The notification does indeed talk about passwords, specifically “passwords stored in various formats”. Unfortunately, that’s not a lot to go on and tells you nothing about how concerned you should be.
If you are an actual Epik customer, assume the worst and change your password immediately. Also, if you’re using that same password anywhere else, change it in all the places you’ve used it. Take this opportunity to stop re-using passwords and set them all to something unique.
The scraped “WHOIS records” will not include passwords. This is already generally public information about who owns domains on the internet; for example, it’ll tell you I own askleo.com. Nothing particularly alarming here.
But what about . . . anything else? It seems that only passwords of Epik customers were exposed (in various formats), but there’s no indication of any other account-related information we need to be concerned about.
But, as always, it pays to remain watchful for unexpected activity on your accounts.
Footnotes & References
1: There are several different hash algorithms that all have different strengths and weaknesses. Examples here all use MD5. And while so-called “salt” should also be used to further secure password hashes, I’m not going to address that in this basic overview.