In my previous post, I pointed out that the claims of a new TCP DoS are probably true. Like the researchers who discovered the issues, I too have been playing around in TCP stacks, and I find weirdness.
One thing that has annoyed me recently is the way that stacks abuse the "selective ack" feature. In the past, the receiver would only acknowledge the contiguous data it had received. If a packet were lost on the network and a gap appeared, the sender wouldn't know the fate of the packets after the discontinuity. This was solved with "selective acks", where the receiver could say "I received your first 100,000 bytes, and bytes 101,000-108,000, but I'm missing the bytes in between". This increased the speed at which TCP stacks could recover from lost packets and retransmit the necessary data.
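To make that concrete, here's a toy sketch of the bookkeeping a receiver has to do. This is nothing like a real kernel's code; it just shows the idea: track which byte ranges have arrived, advance the cumulative ack over contiguous data, and report SACK-style blocks for whatever sits past the gap.

```python
# Toy sketch of receiver-side SACK bookkeeping (not any real stack's code).
# Ranges are half-open [start, end) byte intervals.

class SackReceiver:
    def __init__(self, isn=0):
        self.ranges = []    # sorted, non-overlapping out-of-order ranges
        self.cum_ack = isn  # everything below this arrived contiguously

    def receive(self, start, end):
        """Record arrival of a segment covering bytes [start, end)."""
        self.ranges.append((start, end))
        self.ranges.sort()
        # merge overlapping or adjacent ranges
        merged = []
        for s, e in self.ranges:
            if merged and s <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((s, e))
        self.ranges = merged
        # advance the cumulative ack past any now-contiguous data
        while self.ranges and self.ranges[0][0] <= self.cum_ack:
            self.cum_ack = max(self.cum_ack, self.ranges[0][1])
            self.ranges.pop(0)

    def ack(self):
        """Return (cumulative ack, SACK blocks for data past the gap)."""
        return self.cum_ack, list(self.ranges)

r = SackReceiver()
r.receive(0, 100_000)        # first 100,000 bytes arrive
r.receive(101_000, 108_000)  # a 1,000-byte gap, then more data
print(r.ack())               # → (100000, [(101000, 108000)])
r.receive(100_000, 101_000)  # the retransmit fills the gap
print(r.ack())               # → (108000, [])
```

Once the retransmit arrives, the gap closes, the cumulative ack jumps forward, and the SACK blocks disappear — which is exactly the buffering burden described below.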
I'm seeing something unexpected, though. When clients are downloading large files, the servers aren't immediately retransmitting the lost packets. The file download past the gap continues for a very long time. So far, I've seen a download continue for 3 megabytes before the server goes back and fills in the gap. That can easily be 20 seconds later.
This annoys me because my network monitoring tools like Ferret have to buffer all that data. My TCP stack has to process data in order, so I have to buffer 3 megabytes until I can process the retransmitted packet.
In the old days, kernels had a fixed amount of buffer space, usually not very big. They wouldn't be able to buffer 3 megabytes like this. This implies that the kernel is allocating more memory on demand, controlled by the other side of the connection.
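The old fixed-buffer design amounts to something like the following sketch. The 64-kilobyte cap is a number I made up for illustration, not what any real kernel uses; the point is that the peer can't force unbounded allocation.

```python
# Sketch of a fixed cap on out-of-order buffering per connection.
# OOO_LIMIT is a made-up illustrative number, not a real kernel's value.
OOO_LIMIT = 64 * 1024

class ReassemblyBuffer:
    def __init__(self):
        self.buffered = 0  # bytes of out-of-order data currently held

    def accept(self, seg_len):
        """Buffer an out-of-order segment only if it fits under the cap."""
        if self.buffered + seg_len > OOO_LIMIT:
            return False   # drop it; the peer will have to retransmit later
        self.buffered += seg_len
        return True

buf = ReassemblyBuffer()
# a peer streaming past a gap eventually hits the cap and gets dropped
results = [buf.accept(1460) for _ in range(50)]
print(results.count(True), buf.buffered)  # → 44 64240
```

With the cap in place, the other side of the connection controls nothing: once the limit is hit, further out-of-order segments are simply dropped.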
This suggests a way that I could bluescreen a desktop computer. I would have a user follow a link to download a simple webpage. I then intentionally skip a packet as I send data in response. As the client selectively acknowledges my data, I continue streaming forever, but I never fill in that gap. A well-designed TCP stack would put a limit on how much memory it would allocate. A poorly designed stack would allocate memory until it ran out, at which point the machine would likely crash the next time another kernel process needed more memory.
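The segment schedule for that attack is trivial to describe. This is a sketch of the schedule, not working exploit code; the MSS and the hole position are arbitrary for illustration.

```python
# Sketch of the attack's segment schedule: stream data indefinitely,
# but never send the one segment that would fill the gap.
MSS = 1460  # illustrative maximum segment size

def attack_schedule(n_segments, hole=1):
    """Yield (seq, length) for every segment except the one at `hole`."""
    for i in range(n_segments):
        if i == hole:
            continue  # this gap is never filled, so the victim buffers on
        yield (i * MSS, MSS)

segs = list(attack_schedule(10))
print(len(segs))  # → 9; the segment at seq 1460 is never sent
```

Everything after the hole is out-of-order data from the victim's point of view, so every one of those segments lands in the reassembly buffer and stays there.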
When an application runs out of memory, it can fall back on "virtual memory" paged to disk. The kernel cannot. If an application runs out of even virtual memory, just that application crashes. When the kernel runs out of memory, the whole machine bluescreens.
I could do the same thing to a server. I could connect to the web server, send a few bytes, "drop" a packet, then continue streaming data forever after that. An Apache web server will only accept the first 16 kilobytes, but I don't care, because the kernel hasn't delivered the first 16 kilobytes yet - it is waiting for me to retransmit that missing packet before any of the data goes up the stack to Apache.
I can't explain why the server is taking 20 seconds to retransmit data. One idea is that operating systems have specialized "sendfile" functions that hand off to the kernel the responsibility for sending the contents of a file across a network socket. Maybe the reason it takes so long is that the missing packet isn't buffered in memory: the kernel has to re-read the data from disk in order to retransmit it. On a busy file server, this can take many seconds. If my theory is right, I could DoS a server by forcing it to go back and retransmit lots of 1-byte chunks from all over the disk. I could make the disk heads grind away while spending very little bandwidth myself.
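The ack pattern for that disk-grinding trick would look something like this sketch: selectively acknowledge everything except scattered 1-byte holes, so the sender has to go back for tiny retransmits. The file size and hole spacing here are illustrative, not tuned against any real stack.

```python
# Sketch of an ack pattern leaving scattered 1-byte holes in a file,
# forcing the sender into many tiny retransmits. Numbers are illustrative.
def one_byte_holes(file_size, stride):
    """Return SACK blocks [start, end) leaving one unacked byte per stride."""
    blocks, pos = [], 0
    while pos < file_size:
        end = min(pos + stride, file_size)
        blocks.append((pos + 1, end))  # the byte at `pos` is never acked
        pos = end
    return blocks

blocks = one_byte_holes(1_000_000, 100_000)
print(len(blocks))  # → 10 holes, hence 10 separate tiny retransmits
```

Each hole costs the attacker one unacknowledged byte but, if the re-read-from-disk theory holds, costs the server a seek.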
I'm not sure how selective acks interact with other mechanisms. The way a TCP stack acknowledges a "FIN" flag is to acknowledge the next sequence number after the flag. I can do that with a selective ack, without acknowledging the data right before it. This puts the TCP state machine in a weird place. One part knows that the FIN has been received and behaves accordingly, but another part is still trying to retransmit the data. This conflict wasn't possible with old stacks, because acknowledging the FIN also acknowledged all the data up to it.
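The sequence-number arithmetic behind that trick is simple enough to sketch. The FIN consumes one sequence number of its own, so a single SACK block can cover just that number while the cumulative ack still sits in front of the unacknowledged data. The specific numbers here are made up for illustration.

```python
# Sketch of SACKing a FIN without acknowledging the data before it.
# All sequence numbers are illustrative.
data_start = 1_000
data_len   = 5_000
fin_seq    = data_start + data_len       # the FIN occupies the next seq number

cum_ack     = data_start                 # none of the data is cumulatively acked
sack_blocks = [(fin_seq, fin_seq + 1)]   # ...but the FIN itself is SACKed

# one part of the state machine sees the FIN as received,
# while another still sees unacknowledged data in flight
fin_acked    = any(s <= fin_seq < e for s, e in sack_blocks)
data_pending = cum_ack < fin_seq
print(fin_acked, data_pending)  # → True True
```

Those two booleans being true at once is exactly the conflicted state: half the machine is tearing the connection down while the other half is still retransmitting.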
I find selective acks annoying, and I'm just writing a simple network monitoring application. I'm sure they cause stack designers a lot more headaches, and that if I wrote an active stack, I could cause a lot of problems. That's why I believe those researchers when they say they have found problems.