Sunday, February 04, 2007

BitTorrent curiosities

I've been playing around with BitTorrent again lately. One of the things that's been nagging me is where "failed hashes" come from.

BitTorrent transfers files in smaller pieces, usually around 256 KB each, and checks each piece against its own hash (.torrent files are so big because they contain a list of the hashes for all the pieces). Sometimes when you download a piece from a source, it fails the hash check, meaning it was corrupted.
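To make the piece check concrete, here is a minimal sketch in Python. BitTorrent uses SHA-1 for piece hashes; the 256 KB piece length and the function names `piece_hashes` and `verify_piece` are my own assumptions for illustration, not taken from any particular client.

```python
import hashlib

# Assumed piece size for illustration; a real client reads it from the .torrent.
PIECE_LENGTH = 256 * 1024


def piece_hashes(data, piece_length=PIECE_LENGTH):
    """Return the SHA-1 digest of each fixed-size piece of data."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def verify_piece(piece, expected_digest):
    """True if the downloaded piece matches the digest listed for it."""
    return hashlib.sha1(piece).digest() == expected_digest
```

A single flipped bit anywhere in the piece changes the digest, which is why a failed hash says that something went wrong but not what.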

One reason that's been documented on the web is that sometimes Internet devices have bugs that corrupt data. D-Link has a "gaming" mode that tries to fix some gaming protocols by correcting your NATted IP address. A specific 4-byte pattern shows up in random/compressed data about once in every 2^32 positions, so roughly once in every 4 billion bytes the device will mistakenly see what it thinks is an IP address it needs to correct, thereby corrupting the chunk.

Another source of corruption is TCP. Its 16-bit checksum doesn't always catch multi-bit errors, so it will sometimes report a packet as good that is actually corrupted.
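Here is a small sketch of why that happens. The function below is a 16-bit one's-complement sum in the style of RFC 1071; it runs over a bare payload only (the real TCP checksum also covers the TCP header and a pseudo-header), and the sample bytes are just an illustration.

```python
def internet_checksum(data):
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


good = bytes.fromhex("0001f203f4f5f6f7")

# Two 16-bit words swapped: every word is still present, so the sum is unchanged.
swapped = good[2:4] + good[0:2] + good[4:]

# One word increased by 1 and another decreased by 1: the errors cancel out.
offset = bytes.fromhex("0002f202f4f5f6f7")

# A single flipped bit, by contrast, always changes the checksum.
one_bit = bytes.fromhex("0001f203f4f5f6f6")
```

Both `swapped` and `offset` produce the same checksum as `good`, while `one_bit` does not. The checksum is just a sum, so any set of errors that cancel out in the sum goes unnoticed.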

Finally, one source I've found is that large chunks of a piece can be corrupted. I'm guessing that the file system on the disk drive of the sender got corrupted.

This points to two obvious changes that would be good for BitTorrent clients. The first is that senders should re-verify pieces when they send them (not just on reception) to see if they've been corrupted on the disk in the meantime. Second, clients can easily save the bad chunks and figure out why they got corrupted.

For example, a client could compare the bad chunk with the eventual re-download of a good chunk. It could run tests on the regions of the pieces that differ. The nice thing about the TCP checksum algorithm is that you can just run it over those regions: if the corrupted region and the good region have the same TCP checksum, then there's a good chance the chunk was corrupted by a network problem that TCP failed to catch.

Likewise, if an entire 4 KB portion was corrupted, it's likely a disk error. If only a 4-byte part is different, then it's likely the D-Link bug.
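The comparison described above could be sketched roughly like this. Everything here is a guess at how a client might do it: the names `differing_spans` and `classify`, the 16-byte merge gap, and the order of the tests are my assumptions, and the labels are hints rather than a diagnosis. The checksum covers only the differing bytes, not real TCP segments.

```python
def internet_checksum(data):
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def differing_spans(bad, good, gap=16):
    """Return (start, end) ranges where the pieces differ.

    Differences separated by at most `gap` matching bytes are merged,
    because garbage data will match the original by chance now and then.
    """
    spans = []
    for i, (b, g) in enumerate(zip(bad, good)):
        if b == g:
            continue
        if spans and i - spans[-1][1] <= gap:
            spans[-1][1] = i + 1  # extend the current span
        else:
            spans.append([i, i + 1])
    return [(start, end) for start, end in spans]


def classify(bad, good, start, end):
    """Guess at the cause of one differing span (a heuristic only)."""
    if internet_checksum(bad[start:end]) == internet_checksum(good[start:end]):
        return "network"  # the change would slip past the TCP checksum
    length = end - start
    if length >= 4096:
        return "disk"     # a whole 4 KB block (or more) is wrong
    if length <= 4:
        return "rewrite"  # small enough to be a rewritten IP address
    return "unknown"
```

A client would call `differing_spans(bad_piece, good_piece)` once the good copy arrives, then log `classify(...)` for each span. The checksum test comes first because two different regions agree on a 16-bit checksum by accident only about once in 65,536 tries.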

2 comments:

a. said...

You said "For example, a client could compare the bad chunk with the eventual re-download of a good chunk. It could run tests on the regions of the pieces that differ.". What should it do, then?

Tell the user? Does she care? Would she take the time to understand the problem and do something about it (if she could and cared)? Don't think so.

There may be some small gains in trying to catch those errors. But in times when bandwidth is cheap, why not just download the same chunk from somewhere else and ignore the one machine sending out bad data?

A better idea might be to report stations which send bad chunks to a central authority: "That guy is sharing some music files, but he sends it corrupted so there is no copyright violation. No need to call the cops on him".

I guess some users would like to switch on some feature like that. ;-)

Ryan Russell said...

There are groups out there that intentionally poison downloads. I believe they eventually get voted off the island in most torrent systems.

There was a better hashing arrangement, Merkle hashes. Bram never implemented it though, and tended to squash any discussion by others.