The file is simply a list of the raw hashes, containing no username or other account information. We can assume that the original hacker has that information, but since it's of little use to cracking, we don't have it ourselves.
The following shows the contents of the file my simply dumping the contents on the command-line "more combo_not.txt".
One issue becomes apparent: about half the file has had the first 5 characters zeroed out. This is discussed at ycombinator. Atom has released a version of his Hashcat password cracker to deal with this. John-the-Ripper apparently also has published a patch for this.
Special hashcat version released. Its able to crack the "masked" 3.5mil hashes in your 6.5mil hashlist, use -m 150 hashcat.net/files/hashcat-…
— hashcat (@hashcat) June 6, 2012
In the meanwhile, I'm going to split the file into two. To do this, I type "grep -v ^000000 combo_not.txt > linked.hashes". I'm running Windows, but I do this command in a Linux VM on the same machine. Unix command-line programs like "grep" are too darn useful.
The first thing hackers try is the "dictionary crack". Brute-force password cracking takes time, but quickly looking up passwords in a dictionary is very fast.
The most common dictionary to use is the "RockYou" dictionary. This was from a massive dump from a few years ago from a popular gaming site. Unlike LinkedIn, RockYou stored their passwords in the clear. Thus, it serves as a useful mega-dictionary to crack passwords with.
This took only a few seconds to run. Notice that it wasn't very successful, finding only 93 passwords. But we expected that. According to the discussion on YCombinator, the hacker had already run a dictionary crack, and marked those passwords with leading zeros.
So let's try the other file containing the corrupted hashes using the updated Hashcat feature on the other file, containing all the zeroed-out hashes. This is shown below:
As you can see, this straight dictionary lookup results in 688-thousand passwords being cracked, or about one fifth of all the zeroed hashes.
The reason for such a small hit rate is that the original hacker probably tried dictionary words plus mutations. So, I'm going to try that next. Instead of using mutations with the entire RockYou dictionary, I'm going to use smaller dictionaries, like the one that comes with Cain+Able, John the Ripper, common Facebook names, and English words. I'm going to use the "best64.rules" of common mutations that comes with Hashcat, though they are coming up with a better best64 list of mutations, I don't have that list handy at the moment to use.
[OOPS. I made a mistake and chose the wrong attack method, using "permutations" instead of "straight". This would run much faster if I turned this off. However, since it's already been running for a while, I'll just let it finish the job, and see if anything interesting comes out of it.]
Dictionary cracks are fine, but brute-force cracks are also useful. There is a limit to how fast you can do brute-force.
To brute force, I'm switching to "oclHashcat+" instead of normal "hashcat". This uses OpenCL acceleration on the GPU, and runs about 10 times faster, but it's more limited on the complexity of the mutation rules. For simple brute-force, it's very nice.
Let's start with 5 character passwords containing the full gamut of Upper/lower case, digits, and symbols/punctuation.
As you can see from the "Recovered:" line, it found zero passwords. Thus, this means that the original hacker already did this brute-force. That's reasonable, it only takes 16 seconds to run this.
Now let's try to brute-force 6 character passwords and see what results we get.
As you can see, whereas 5 characters got us nothing (they'd already been cracked), brute-forcing 6 characters go us 32,163 passwords.Below, I dump the output file showing some of these:
Note that you see "patterns" here that are purely an artifact generated by the password cracking program. The last letter is always 'a'. As we scroll down in the file, this progresses to 'b', then 'c', and so on. That's as you expect from a 'brute-force' cracker, which tries each letter in turn. But weirdly, the second letter is also 'a'. That's probably related to how password attempts are distributed among the GPU cores. A graphics processor has 2000 slow processors rather than a few fast processors, each one cracking a different set of passwords at a time.
What we see here is that a lot of these short passwords are still based on dictionary words with minor mutations, like "Ram0na". That means for longer passwords for which we cannot brute-force, we can maybe tweak the mutation rules more to do a better job.
Is 6 characters long for a password? From the RockYou file, that was the most common length of password, accounting for roughly 25% of all passwords.
Or, we can graph this by looking at that length or shorter:
The thing that most people don't understand about passwords is that brute-force is an exponential problem. The amount of time it takes quickly grows out of all reasonableness. I've created a graph of this below:
People have the misconception that massive increases in performance lead to massive differences in password cracking, but it doesn't really. Moving from my desktop processor to a GPU that's 20 times faster only slightly increases the length of password I'm able to brute-force. Even going to a 1000 instance Amazon EC2 cluster with super-computer performance doesn't dramatically increase password lengths that I'm able to crack.
Although, that difference happens to be in the "sweet spot" of password lengths, so maybe it can make a difference.
As you can see at this point, my cracking processes are running in the background, so I'm busy playing with Excel instead to produce these graphs. My main CPU is still churning away doing a mutated-dictionary attack on the "zeroed.hashes", and my GPU is busy with a 7-letter brute-force fo the "linked.hashes".
I started a job with the "best64.rules" from oclHashcat, using not just RockYou, but a few other wellknown dictionaries. This is what the command-line looks like at startup:
It spams the screen for a bit, but here's what it ends up at:
As you can see, in 12 seconds, it found 31k hashes, out of a list that had already been purged of all easily crackable passwords
Running the same "best64.rules" over the "zeroed.hashes" list using the "-m150" version of Hashcat leads to the following result:
This recovered about 40% of the zeroed passwords, or 1.4 million out of the 3.7 million. That's to be expected, as these are passwords that the hacker already found, so should be easily found by us.
I just saw this Tweet go buy about the original forum post:
By request: copy of the original forum post discussing the 6.5M 'LinkedIn' passwords dl.dropbox.com/u/3077/6.5kk_S… (dug out of my browser cache)
— ashkan soltani (@ashk4n) June 7, 2012
Oh, crap, I just found out that I'm using my older Radeon HD 6970 instead of the newer Radeon HD 7970. That's a big difference. Actually, it's good that way. I'll just stick the new card in another machine, and let it run around the clock without having to disturb my main machine.
An explanation of HashCat's modes.
First, you need to decide on "hashcat" vs. "oclHashcat-plus" vs "oclHashcat-lite". The first uses the CPU and has the most features. The last uses the GPU and has the best speed, but the fewest features, and can only crack one password at a time. The one in the middle, "oclHashcat-plus", has a good set of features and GPU speed, so it's the one I use the most. You also have a choice of 4 programs to choose from, either the Linux or Windows version, at either 64 or 32 bits.
The command-line parameters for "oclHashcat-plus" are:
- There are a lot of hashing algorithms. Currently, 27 different algorithms are supported. The number 100 represents "SHA1", the algorithm used for LinkedIn.
- --attack mode 3
- There are many attack modes. The most common are the "mutated dictionary attack" (1) and the "brute force" attack (3).
- --custom-charset1 ?l?u?s?d
- You have to configure your charset. You can configure different charsets for different letters, such as using upper-case only for the first letter, and a symbol-digit for the last letter. The symbol ?l means lower-case, the symbol ?u means upper-case, ?s means symbol/punctuation, and ?d means numeric digit. Using the "full" charset of all letters (upper/lower), digits, and symbols is 96 characters.
- --outfile combo_not.out
- This tells the program where to save the passwords it finds
- --outfile-format 2
- I like to just save the passwords alone, which is format number 2.
- After all the options, the next expected input is the file containing the hashes.
- Lastly, it expects the "mask" of patterns to try. To try for a 6 character password, use the mask "?1?1?1?1?1?1". The number '1' refers to "charset1" from above.
oclHashcat-plus64.exe --hash-type 100 --attack-mode 3 --custom-charset1 ?l?u?s?d --outfile combo_not.out --outfile-format 2 combo_not.txt ?1?1?1?1?1?1?1
I've been trying to debug something with oclHashcat. It appears that while the Radeon 7970 is 30% faster at cracking a single password (2-billion hashes/second) than the Radeon 6970 (1.3-billion hashes/second), it's slower at multi-hash when given the entire LinkedIn file, doing only 200-million/sec vs 400-million/sec. This seems wrong, because the newer card is has a much better memory subsystem than the previous card, and in other tests I've done, has been much faster at multi-password cracking.
Use --remove. Oh, I found the source of the problem: I was using original file on the faster processor, which is twice as large as the one cut down removed the "zeroed" hashes. Memory lookups on GPUs are slower with the larger amount of memory. Thus, shrinking the file makes a big difference in speed. I wonder if splitting the file into small chunks that fit better within the GPU cache might work better.
In the meanwhile, though, simply removing from the source file all the passwords found so far (about 2 million of them) seems to improve the speed by quite a lot
Hashcat has updated their tools to support the zeroed-out hashes (to ignore the first few bytes when matching SHA1 hashes). Info is here:
Still have fun with the linkedin list? Here is a GPU accelerated oclHashcat-plus special version with -m 150 support: hashcat.net/files/oclHashc…
— hashcat (@hashcat) June 7, 2012
StatusI left a bunch of stuff running over night, and have about 50% of all the passwords cracked. To summarize what I did:
First, I did a dictionary crack of some very large dictionaries. This took seconds, and got a large number of passwords. I'll rerun the numbers later, but it's like a third of all the passwords.
Second, I did a brute-force up to 6 characters. It appears LinkedIn has a minimum length of 6, so you won't find shorter passwords. This took 18 minutes. Going to 7 characters will take 3 days to complete, so I'm letting that run on a separate machine while I do shorter jobs on the main machine.
Third, I did "mutated dictionary" attacks. I used several basic dictionaries, such as the RockYou list, as well as the dictionaries that come with such tools as Cain+Able, John-the-Ripper (JtR), and a list of Facebook names. I ran through all the mutations in the "rules" directory that comes with Hashcat. This found quite a few new passwords not found by the other techniques.
Fourth, I'm doing what Hashcast calls a "hybrid" attack that combines a dictionary either prefixed or followed by a brute-force. For example, right now, I'm runnign a job that does all the words in the RockYou dictionary followed by six lower-case/digits/numbers.
The first jobs took little time, so I rapidly updated this blog post as I did every little thing. Since then, updates have been coming slower as the two computers spend more time crunching numbers.