Thursday, August 20, 2015

A lesson in BitTorrent

Hackers have now posted a second dump of Ashley-Madison, this time 20 gigabytes' worth of data. Many people, mostly journalists, are eagerly downloading this next dump. However, at the time of this writing, nobody has finished downloading it yet. None of the journalists have a complete copy, so you aren't seeing any new stories about the contents. The file name promises the full email spool of the CEO, but no journalist has yet looked into that mail spool and reported a story. Currently, the most any journalist has is 85% of the dump, with the rest slowly trickling in at 37 kilobytes/second.

Why is that? Is AshMad doing some sort of counter-attack to stop the download (like Sony did)? Or is it overloaded because too many people are trying to download?

No, it's because it hasn't finished seeding.

BitTorrent is p2p (peer-to-peer). You download chunks from the peers, a.k.a. the swarm, not from a central server. (The tracker only helps peers find each other; the original source is the first seeder.) Instead of slowing down as more people join the swarm to download the file(s), BitTorrent downloads become faster -- the more people you can download from, the faster it goes.

But 9 women can't make a baby in 1 month. The same goes for BitTorrent. You can only download a chunk from a peer that already has it, so nobody can finish until the swarm as a whole holds every chunk. That's the current problem with the AshMad dump: everyone combined has only 85% of all possible chunks. The remaining 15% of the chunks haven't been uploaded to the swarm yet. Nobody has a complete copy. The original seeder is uploading at a rate of 37 kilobytes/second, handing off the next chunk to a random person in the swarm, who quickly exchanges it with everyone else in the swarm.
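
To make the stall concrete, here is a minimal Python sketch of the idea, not of the real BitTorrent protocol. The peer names, the piece count of 100, and the "instant exchange" shortcut are all assumptions for illustration. It shows that however fast peers trade among themselves, nobody can get past the fraction of chunks the swarm holds collectively.

```python
def swarm_availability(peers, total_pieces):
    """Fraction of distinct pieces held by at least one peer."""
    available = set().union(*peers.values())
    return len(available) / total_pieces


def exchange(peers):
    """Idealized fast swarm: every peer ends up with every piece
    that any peer currently holds (real swarms take time to do this)."""
    available = set().union(*peers.values())
    return {name: set(available) for name in peers}


TOTAL = 100  # hypothetical number of pieces in the torrent

# Three hypothetical downloaders; together they hold pieces 0..84.
peers = {
    "alice": set(range(0, 60)),
    "bob": set(range(30, 85)),
    "carol": set(range(10, 40)),
}

peers = exchange(peers)
progress = {name: len(pieces) / TOTAL for name, pieces in peers.items()}
stalled_at = swarm_availability(peers, TOTAL)
print(progress)  # every peer sits at the same fraction

# The slow seeder hands one new piece to one random peer...
peers["alice"].add(85)
# ...and the swarm spreads it to everyone almost immediately.
peers = exchange(peers)
after_one_piece = swarm_availability(peers, TOTAL)
print(after_one_piece)
```

In this toy model every peer stalls at exactly the same percentage, and progress only advances at the pace the seeder releases new pieces.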

Thus, we see something like the following image, where everyone is stuck at 85% download:


It'll take many more hours until this is complete.
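
A back-of-the-envelope estimate of "many more hours", using the figures quoted in this post. This is only a rough sketch: it assumes decimal units (1 gigabyte = 10^9 bytes, 1 kilobyte = 1000 bytes), a constant seeding rate, and that the seeder uploads each missing chunk exactly once.

```python
total_bytes = 20 * 10**9          # 20-gigabyte dump (decimal units assumed)
missing_percent = 15              # swarm collectively holds 85%
seed_rate = 37 * 1000             # 37 kilobytes/second, assumed constant

remaining_bytes = total_bytes * missing_percent // 100
seconds = remaining_bytes / seed_rate
hours = seconds / 3600

print(f"{remaining_bytes / 10**9:.1f} GB left, about {hours:.1f} hours")
# prints "3.0 GB left, about 22.5 hours"
```

Under those assumptions the last 3 gigabytes take close to a full day, no matter how many peers are in the swarm.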

I point this out because it's a useful real-world lesson in how BitTorrent works. Peer-to-peer speeds up downloads in ideal cases, but it can't overcome physics. Physics, in this case, means that nobody in the swarm yet has a complete 100% copy, so nobody else can download one.

2 comments:

Unknown said...

Where can we find the actual 2nd data dump?

Gary Horn said...

So, if nobody has the entire data set, does that mean that somebody is extracting/exfiltrating the data from AshMad real-time, and then copies are being made from that real-time extraction/exfiltration?