Sunday, May 07, 2017

Hacker dumps, magnet links, and you

In an excellent post pointing out Wikileaks deserves none of the credit given them in the #MacronLeaks, the author erroneously stated that after Archive.org took down the files, that Wikileaks provided links to a second archive. This is not true. Instead, Wikileaks simply pointed to what's known as "magnet links" of the first archive. Understanding magnet links is critical to understanding all these links and dumps, so I thought I'd describe them.

The tl;dr version is this: anything published via BitTorrent has a matching "magnet link" address, and the contents can still be reached via magnet links when the original publisher goes away.


In this case, the leaker uploaded to "archive.org", a popular Internet archiving resource. This website allows you to either download files directly, which is slow, or via peer-to-peer using BitTorrent, which is fast. As you know, BitTorrent works by all the downloaders exchanging pieces with each other, rather getting them from the server. I give you a piece you don't have, in exchange for a piece I don't have.

BitTorrent, though still requires a "torrent" (a ~30k file that lists all the pieces) and a "tracker" (http://bt1.archive.org:6969/announce) that keeps a list of all the peers so they can find each other. The tracker also makes sure that every piece is available from at least one peer.

When "archive.org" realized what was happening, they deleted the leaked files, the torrent, and the tracking.

However, BitTorrent has another feature called "magnet links". This is simply the "hash" of the "torrent" file contents, which looks something like "06724742e86176c0ec82e294d299fba4aa28901a". (This isn't a hash of the entire file, but just the important parts, such as the filenames and sizes).

Along with downloading files, BitTorrent software on your computer also participates in a "distributed hash" network. When using a torrent file to download, your BitTorrent software still tell other random BitTorrent clients about the hash. Knowledge of this hash thus spreads throughout the BitTorrent world. It's only 16 bytes in size, so the average BitTorrent client can keep track of millions of such hashes while consuming very little memory or bandwidth.

If somebody decides they want to download the BitTorrent with that hash, they broadcast that request throughout this "distributed hash" network until they find one or more people with the full torrent. They then get the torrent description file from them, and also a list of peers in the "swarm" who are downloading the file.

Thus, when the original torrent description file, the tracker, and original copy goes away, you can still locate the swarm of downloaders through this hash. As long as all the individual pieces exist in the swarm, you can still successfully download the original file.

In this case, one of the leaked documents was a 2.3 gigabyte file called "langannerch.rar". The torrent description file called "langanerch_archive.torrent" is 26 kilobytes in size. The hash (magnet link) is 16 bytes in size, written "magnet:?xt=urn:btih:06724742e86176c0ec82e294d299fba4aa28901a". If you've got BitTorrent software installed and click on the link, you'll join the swarm and start downloading the file, even though the original torrent/tracker/files have gone away.

According to my BitTorrent client, there are currently 108 people in the swarm downloading this file world-wide. I'm currently connected to 11 of them. Most of them appear to be located in France.

Looking at the General tab, I see that "availability" is 2.95. That means there exist 2.95 complete copies of the download. In other words, if there are 20 pieces, it means that for one of the pieces in the swarm, only 2 people have it. This is dangerously small -- if those two people leave the network, then a complete copy of the dump will no longer exist in the swarm, and it'll be impossible to download it all.

Such dumps can remain popular enough for years after the original tracker/torrent has disappeared, but at some point, a critical piece disappears, and it becomes impossible for anybody to download more than 99.95%, with everyone in the swarm waiting for that last piece. If you read this blogpost 6 months from now, you are likely to see 10 people in the swarm, all stuck at 99.95% complete.

Conclusion

The upshot of this is that it's hard censoring BitTorrent, because all torrents also exist as magnet links. It took only a couple hours for Archive.org to take down the tracker/torrents/files, but after complete downloads were out in the swarm, all anybody needed was the hash of the original torrent to create a magnet link to the data. Those magnet links had already been published by many people. The Wikileaks tweet that linked to them was fairly late, all things considered, other people had already published them.

No comments: