Errata Security: LinkedIn vs. password cracking

I'm running through the LinkedIn password hashes right now, so I thought I'd do a live blog of the steps I'm doing. As I do each step, I'll update this blog live. When you reach the end, chances are good I'll be updating it again in a few hours.

Contents

The file is simply a list of the raw hashes, containing no username or other account information. We can assume that the original hacker has that information, but since it's of little use to cracking, we don't have it ourselves.

The following shows the contents of the file my simply dumping the contents on the command-line "more combo_not.txt".

One issue becomes apparent: about half the file has had the first 5 characters zeroed out. This is discussed at ycombinator. Atom has released a version of his Hashcat password cracker to deal with this. John-the-Ripper apparently also has published a patch for this.

Special hashcat version released. Its able to crack the "masked" 3.5mil hashes in your 6.5mil hashlist, use -m 150 hashcat.net/files/hashcat-…
— hashcat (@hashcat) June 6, 2012

In the meanwhile, I'm going to split the file into two. To do this, I type "grep -v ^000000 combo_not.txt > linked.hashes". I'm running Windows, but I do this command in a Linux VM on the same machine. Unix command-line programs like "grep" are too darn useful.

Look carefully at brief few lines shown above. The file is oddly sorted, with the first few bytes random, followed by a rough sort order of the remaining bytes. This is an artifact from something, either in the manner the hacker originally got the file, or in some tool he's been using to crack the file.Internally, tools will often sort these things in order to make multi-hash lookups faster.

Dictionary Crack

The first thing hackers try is the "dictionary crack". Brute-force password cracking takes time, but quickly looking up passwords in a dictionary is very fast.

The most common dictionary to use is the "RockYou" dictionary. This was from a massive dump from a few years ago from a popular gaming site. Unlike LinkedIn, RockYou stored their passwords in the clear. Thus, it serves as a useful mega-dictionary to crack passwords with.

This took only a few seconds to run. Notice that it wasn't very successful, finding only 93 passwords. But we expected that. According to the discussion on YCombinator, the hacker had already run a dictionary crack, and marked those passwords with leading zeros.

So let's try the other file containing the corrupted hashes using the updated Hashcat feature on the other file, containing all the zeroed-out hashes. This is shown below:

As you can see, this straight dictionary lookup results in 688-thousand passwords being cracked, or about one fifth of all the zeroed hashes.

The reason for such a small hit rate is that the original hacker probably tried dictionary words plus mutations. So, I'm going to try that next. Instead of using mutations with the entire RockYou dictionary, I'm going to use smaller dictionaries, like the one that comes with Cain+Able, John the Ripper, common Facebook names, and English words. I'm going to use the "best64.rules" of common mutations that comes with Hashcat, though they are coming up with a better best64 list of mutations, I don't have that list handy at the moment to use.

[OOPS. I made a mistake and chose the wrong attack method, using "permutations" instead of "straight". This would run much faster if I turned this off. However, since it's already been running for a while, I'll just let it finish the job, and see if anything interesting comes out of it.]

Dictionary cracks are fine, but brute-force cracks are also useful. There is a limit to how fast you can do brute-force.

To brute force, I'm switching to "oclHashcat+" instead of normal "hashcat". This uses OpenCL acceleration on the GPU, and runs about 10 times faster, but it's more limited on the complexity of the mutation rules. For simple brute-force, it's very nice.

Let's start with 5 character passwords containing the full gamut of Upper/lower case, digits, and symbols/punctuation.

As you can see from the "Recovered:" line, it found zero passwords. Thus, this means that the original hacker already did this brute-force. That's reasonable, it only takes 16 seconds to run this.

Now let's try to brute-force 6 character passwords and see what results we get.

As you can see, whereas 5 characters got us nothing (they'd already been cracked), brute-forcing 6 characters go us 32,163 passwords.Below, I dump the output file showing some of these:

Note that you see "patterns" here that are purely an artifact generated by the password cracking program. The last letter is always 'a'. As we scroll down in the file, this progresses to 'b', then 'c', and so on. That's as you expect from a 'brute-force' cracker, which tries each letter in turn. But weirdly, the second letter is also 'a'. That's probably related to how password attempts are distributed among the GPU cores. A graphics processor has 2000 slow processors rather than a few fast processors, each one cracking a different set of passwords at a time.

What we see here is that a lot of these short passwords are still based on dictionary words with minor mutations, like "Ram0na". That means for longer passwords for which we cannot brute-force, we can maybe tweak the mutation rules more to do a better job.

Is 6 characters long for a password? From the RockYou file, that was the most common length of password, accounting for roughly 25% of all passwords.

Or, we can graph this by looking at that length or shorter:

The thing that most people don't understand about passwords is that brute-force is an exponential problem. The amount of time it takes quickly grows out of all reasonableness. I've created a graph of this below:

People have the misconception that massive increases in performance lead to massive differences in password cracking, but it doesn't really. Moving from my desktop processor to a GPU that's 20 times faster only slightly increases the length of password I'm able to brute-force. Even going to a 1000 instance Amazon EC2 cluster with super-computer performance doesn't dramatically increase password lengths that I'm able to crack.

Although, that difference happens to be in the "sweet spot" of password lengths, so maybe it can make a difference.

As you can see at this point, my cracking processes are running in the background, so I'm busy playing with Excel instead to produce these graphs. My main CPU is still churning away doing a mutated-dictionary attack on the "zeroed.hashes", and my GPU is busy with a 7-letter brute-force fo the "linked.hashes".

I started a job with the "best64.rules" from oclHashcat, using not just RockYou, but a few other wellknown dictionaries. This is what the command-line looks like at startup:

It spams the screen for a bit, but here's what it ends up at:

As you can see, in 12 seconds, it found 31k hashes, out of a list that had already been purged of all easily crackable passwords

Running the same "best64.rules" over the "zeroed.hashes" list using the "-m150" version of Hashcat leads to the following result:

This recovered about 40% of the zeroed passwords, or 1.4 million out of the 3.7 million. That's to be expected, as these are passwords that the hacker already found, so should be easily found by us.

---
I just saw this Tweet go buy about the original forum post:

By request: copy of the original forum post discussing the 6.5M 'LinkedIn' passwords dl.dropbox.com/u/3077/6.5kk_S… (dug out of my browser cache)
— ashkan soltani (@ashk4n) June 7, 2012

Here is a screen shot of that link. You can't see it at the tiny resolution I've shrunk this to, but you can click on it to expand and read it. What you see here is how the "password cracking underground' works. When hackers break in, they distribute the password lists to other people, who each works on the file trying to use their own methods to crack passwords that others may not have found. This usually means custom dictionaries, as well as custom mutation rules applied to those dictionaries. The InsidePro forums are full of this stuff. They have removed this post, because of course, of the press involved, but there are plenty of other posts like this from smaller password dumps.

----
Oh, crap, I just found out that I'm using my older Radeon HD 6970 instead of the newer Radeon HD 7970. That's a big difference. Actually, it's good that way. I'll just stick the new card in another machine, and let it run around the clock without having to disturb my main machine.

-----

An explanation of HashCat's modes.

First, you need to decide on "hashcat" vs. "oclHashcat-plus" vs "oclHashcat-lite". The first uses the CPU and has the most features. The last uses the GPU and has the best speed, but the fewest features, and can only crack one password at a time. The one in the middle, "oclHashcat-plus", has a good set of features and GPU speed, so it's the one I use the most. You also have a choice of 4 programs to choose from, either the Linux or Windows version, at either 64 or 32 bits.

The command-line parameters for "oclHashcat-plus" are:
--hash-type 100

There are a lot of hashing algorithms. Currently, 27 different algorithms are supported. The number 100 represents "SHA1", the algorithm used for LinkedIn.
--attack mode 3: There are many attack modes. The most common are the "mutated dictionary attack" (1) and the "brute force" attack (3).
--custom-charset1 ?l?u?s?d: You have to configure your charset. You can configure different charsets for different letters, such as using upper-case only for the first letter, and a symbol-digit for the last letter. The symbol ?l means lower-case, the symbol ?u means upper-case, ?s means symbol/punctuation, and ?d means numeric digit. Using the "full" charset of all letters (upper/lower), digits, and symbols is 96 characters.
--outfile combo_not.out: This tells the program where to save the passwords it finds
--outfile-format 2: I like to just save the passwords alone, which is format number 2.
combo_not.txt: After all the options, the next expected input is the file containing the hashes.
?1?1?1?1?1?1: Lastly, it expects the "mask" of patterns to try. To try for a 6 character password, use the mask "?1?1?1?1?1?1". The number '1' refers to "charset1" from above.

Thus, to brute-force six-character passwords, you can run the following:

oclHashcat-plus64.exe --hash-type 100 --attack-mode 3 --custom-charset1 ?l?u?s?d --outfile combo_not.out --outfile-format 2 combo_not.txt ?1?1?1?1?1?1?1

---
I've been trying to debug something with oclHashcat. It appears that while the Radeon 7970 is 30% faster at cracking a single password (2-billion hashes/second) than the Radeon 6970 (1.3-billion hashes/second), it's slower at multi-hash when given the entire LinkedIn file, doing only 200-million/sec vs 400-million/sec. This seems wrong, because the newer card is has a much better memory subsystem than the previous card, and in other tests I've done, has been much faster at multi-password cracking.
--
Use --remove. Oh, I found the source of the problem: I was using original file on the faster processor, which is twice as large as the one cut down removed the "zeroed" hashes. Memory lookups on GPUs are slower with the larger amount of memory. Thus, shrinking the file makes a big difference in speed. I wonder if splitting the file into small chunks that fit better within the GPU cache might work better.

In the meanwhile, though, simply removing from the source file all the passwords found so far (about 2 million of them) seems to improve the speed by quite a lot

--
Hashcat has updated their tools to support the zeroed-out hashes (to ignore the first few bytes when matching SHA1 hashes). Info is here:

Still have fun with the linkedin list? Here is a GPU accelerated oclHashcat-plus special version with -m 150 support: hashcat.net/files/oclHashc…
— hashcat (@hashcat) June 7, 2012

Status

I left a bunch of stuff running over night, and have about 50% of all the passwords cracked. To summarize what I did:

First, I did a dictionary crack of some very large dictionaries. This took seconds, and got a large number of passwords. I'll rerun the numbers later, but it's like a third of all the passwords.

Second, I did a brute-force up to 6 characters. It appears LinkedIn has a minimum length of 6, so you won't find shorter passwords. This took 18 minutes. Going to 7 characters will take 3 days to complete, so I'm letting that run on a separate machine while I do shorter jobs on the main machine.

Third, I did "mutated dictionary" attacks. I used several basic dictionaries, such as the RockYou list, as well as the dictionaries that come with such tools as Cain+Able, John-the-Ripper (JtR), and a list of Facebook names. I ran through all the mutations in the "rules" directory that comes with Hashcat. This found quite a few new passwords not found by the other techniques.

Fourth, I'm doing what Hashcast calls a "hybrid" attack that combines a dictionary either prefixed or followed by a brute-force. For example, right now, I'm runnign a job that does all the words in the RockYou dictionary followed by six lower-case/digits/numbers.

The first jobs took little time, so I rapidly updated this blog post as I did every little thing. Since then, updates have been coming slower as the two computers spend more time crunching numbers.

30 comments:

Dan said...: I'm impressed. Keep up the good work.; 11:50 PM
disclosure said...: Searchable DB is available at http://dazzlepod.com/linkedin/; 2:22 AM
Anonymous said...: Enjoyable read! Thanks.; 2:58 AM
Unknown said...: While interesting, aside from making the point that long and complex passwords are essential, what further is to be gained by continuing to crack the secure hashes for the original hacker?; 4:12 AM
Anonymous said...: findstr /b is windows equiv of grep hat; 4:42 AM
Anonymous said...: having alot of luck with the GAWKER dictionary :); 7:10 AM
Anonymous said...: Haha we won new dictionary...I cracked 150.000 with inremental john 2 days i'll wait more and i will crack 3.000.000; 9:28 AM
MCAndre said...: Don't forget rainbow tables.; 1:58 PM
Anonymous said...: All your base are belong to us

Joe Buffufna; 6:39 PM
Anonymous said...: My cracked password was two antonyms, like Smart%Stuped, misspelled, not in English, and more than 15 characters long. How did they crack it? With Cain & Abel and foreign dictionaries?

What are the real rules for a strong and memorable password? Would four random words work? What about a sentence? They cracked stuff like ilovemyson and iwantanewjob. Would a 20-digit number composed of other numbers be secure?; 11:35 PM
Unknown said...: I must admit, 50% *already* is faster and more than I would have guessed. I still say that you have proved the point and nothing further needs be done.

I wonder if the site had simply been using at least SHA2-256 (unsalted) how much more time it would have taken, or SHA2-512.; 2:59 AM
cheap hosting said...: Is Possible? Have Fear Of Crack Of Linkdin; 3:35 AM
epixoip said...: This comment has been removed by the author.; 5:42 AM
epixoip said...: @Jeremy Collake,

Raw SHA-256 is about 2.8x slower to crack than SHA-1, and raw SHA-512 is about 21x slower than SHA-1.

However, the choice of algorithm is only one of the problems. Yes, SHA-1 was a poor choice, but the deeper issues are 1. they used a raw hash, and 2. the hashes were unsalted.

Anything less than using a crypt() algorithm with random, unique salts for each password in the database is unacceptable IMO.; 5:49 AM
Lo Sauer said...: This comment has been removed by the author.; 3:33 PM
Lo Sauer said...: I did a follow up
analysis

http://www.lsauer.com/2012/06/linkedin-lastfm-eharmony-analysis-of.html; 3:35 PM
Anonymous said...: I wrote a script to generate permutations of linkedin word that can be added to dictionary (
you can control the complexity of permutations by extending array)

http://pastebin.com/iuXEQa80; 4:19 PM
Unknown said...: This comment has been removed by the author.; 3:43 AM
Unknown said...: This comment has been removed by the author.; 3:48 AM
Unknown said...: This comment has been removed by the author.; 3:55 AM
Unknown said...: Nice article, thanks!
For the cracking-capability graphs, what parameters did you use to generate those? The look at bit extreme... that's why I'm wondering.; 4:31 AM
Anonymous said...: @Christian Fuchs

It is something like ((size of alphabet)^(size of password))/hashs_per_second

so it is an exponentially function; 7:34 AM
Unknown said...: Hehe, nice article!; 10:35 AM
Lo Sauer said...: And some background to Database SQL Injections used by Hackers to get to the data: http://www.lsauer.com/2012/06/internet-security-sql-injection-attacks.html

Meanwhile Jeremy might get some additional stats out of his processing tasks.; 12:06 PM
Anonymous said...: Scum posted 45 million hashes.
http://forum.insidepro.com/viewtopic.php?t=14887; 1:24 AM
Lo Sauer said...: Applying permutation entropy to LinkedIn password-hash files: http://goo.gl/hZG8S

PS:The Scum resource is interesting.; 4:59 PM
f8lerror said...: Great read, you could have used my PATH script to help with some automation. My script analyzes the results of a dictionary attack, sorts and generates the hashcat masks and feeds them into hashcat one at a time. http://infosecsee.com if you're interested.; 2:17 PM
Anonymous said...: No more Updates? Or are the calculations still running?; 10:48 AM
nyxgeek said...: I'm up to 2,195,118 total cracked using JtR and RockYou + rules.; 9:35 PM
V Care Crypto said...: This comment has been removed by a blog administrator.; 8:07 PM