Tuesday, January 14, 2014

Masscan: diagnosing slow transmit speeds

My port scanner (masscan) can easily transmit at full 1-gbps speeds, even from a slow laptop. Yet, often when you run it, it fails to run that fast. Instead of 1.488 million packets/second, it does only 0.2 million packets/second. Why is that?

There are lots of reasons, but the biggest is probably "flow control" on the Ethernet switch.


First, you have to understand that the transmit function is "blocking". That means if your Ethernet card is slow, then masscan is slow. If masscan tells the Ethernet hardware to transmit a packet, but the hardware can't, then masscan will stop and wait until the Ethernet hardware is ready. This is a good design -- if masscan were to continue regardless, then those packets would just be dropped rather than being sent, and your results would be incomplete. (I didn't explicitly design it this way; it's just how the libpcap transmit function works.)

You can emulate "non-blocking" speed by adding the "--offline" parameter to masscan. In this case, masscan doesn't send the packets at all -- it just does everything but send. The "--offline" parameter acts as a good benchmark for testing how fast masscan could run if the network stack/hardware weren't the bottleneck. If you are using a slow notebook computer or an ARM notebook (like a Chromebook), you might want to do this benchmark just to make sure things will run as fast as you expect. Masscan does a lot of 64-bit processing and integer division, which can be slow on low-end processors.

So after doing these steps, we've determined it's the Ethernet hardware that's blocking. Well, why should this be? Shouldn't the Ethernet hardware be able to keep up with full speed? Well, while Ethernet is supposed to be dumb and just forward packets, it's gotten a lot smarter over the years, offloading processing on both transmit and receive. It can calculate and verify checksums for you. It can reassemble incoming TCP streams. It can do a lot. All of this extra processing is bad for masscan. If you are trying to diagnose errors, you should turn off all the offloading features. The tool "ethtool" can be used for this purpose.
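As a rough sketch of what turning off the offloading features might look like on Linux, here is a small Python helper that builds the "ethtool -K" command line. The interface name "eth0", the helper names, and the exact list of features are my assumptions; drivers differ in which features they expose, so treat the list as a starting point rather than a recipe.

```python
import subprocess

# Offload features commonly exposed through "ethtool -K" on Linux.
# Which ones a given driver actually supports varies; this list is an assumption.
OFFLOAD_FEATURES = ["rx", "tx", "sg", "tso", "gso", "gro", "lro"]


def offload_off_command(iface, features=OFFLOAD_FEATURES):
    """Build the argv for 'ethtool -K <iface> <feature> off ...'."""
    argv = ["ethtool", "-K", iface]
    for feature in features:
        argv += [feature, "off"]
    return argv


def apply_command(argv):
    """Run the command (needs root). Returns True if ethtool exited cleanly."""
    return subprocess.run(argv, check=False).returncode == 0


if __name__ == "__main__":
    # Print rather than run, so the sketch is safe to try.
    print(" ".join(offload_off_command("eth0")))
```

Running "ethtool -k eth0" (lowercase k) afterwards lists the current feature settings, which is a quick way to see what the driver actually accepted.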

The most important smart feature of Ethernet, though, is "flow control". When this feature is enabled, the switch informs your computer that it's overloaded, causing your Ethernet to slow down -- which causes masscan to slow down. In other words, the reason masscan is slow isn't anything on your computer itself, but in the switch.

There are two reasons why the switch might tell your computer to slow down. The first is that there really is congestion inside the switch, and if you don't slow down, the switch is going to drop the packets. In this case, flow control is good.

The other reason is that the switch is lying: there is no congestion problem. Normal network communications use large packets, meaning low packet rates. High rates of small packets are extremely unusual. Network equipment doesn't handle weird things well. I've verified that a switch will send flow control messages at max speed even when it's not overloaded.

The way to "fix" this is to turn off hardware flow control, either using 'ethtool' to reconfigure the driver, or reconfiguring the switch. If the switch is just lying to you, then this is a good fix, and your packets will get through. If the switch is telling the truth, then this is a bad fix, because your packets will be dropped by the switch and go nowhere.
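On the driver side, flow control is what ethtool calls "pause" parameters. The sketch below builds the two relevant command lines ("ethtool -a" to show, "ethtool -A" to change) and parses the "Name: value" lines that the show command prints. The function names and the interface name are made up for illustration, and the parser assumes the usual colon-separated output, which may vary between ethtool versions.

```python
def pause_show_command(iface):
    """argv that prints the current pause (flow control) settings."""
    return ["ethtool", "-a", iface]


def pause_off_command(iface):
    """argv that turns off pause autonegotiation and RX/TX pause frames."""
    return ["ethtool", "-A", iface, "autoneg", "off", "rx", "off", "tx", "off"]


def parse_pause_output(text):
    """Turn 'Name: value' lines into a dict with lowercase keys and values."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value:  # skip the heading line, which has nothing after ':'
            settings[key.strip().lower()] = value.lower()
    return settings


def flow_control_enabled(settings):
    """True if either direction of pause is reported as on."""
    return settings.get("rx") == "on" or settings.get("tx") == "on"
```

The commands need root, and some drivers refuse to change pause settings at all, in which case the switch is the only place left to turn it off.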

In order to tell if your packets are getting through, you need to look upstream, such as SNMP monitoring of your switch, or the router upstream from your switch. If these tools report the same transmit rate that masscan reports, you are likely okay. Note, though, that lots of weird things happen. For example, one ISP reversed the incoming/outgoing numbers on us. Another ISP uses load balanced routing, so the rates reported by both routers had to be combined together. Frankly, getting good numbers out of the upstream network equipment is, in itself, a big debugging hassle.
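If you do poll packet counters from the switch or router (for example the standard SNMP interface counter ifOutUcastPkts, which is my assumption about what you'd be reading), the comparison comes down to a little arithmetic. This is only a sketch with made-up function names: it turns two counter samples into a packets/second figure, tolerating a single counter wraparound, and checks it against what masscan reported.

```python
def packet_rate(count_start, count_end, seconds, counter_bits=32):
    """Packets/second between two counter samples, tolerating one wraparound."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    modulus = 1 << counter_bits
    delta = (count_end - count_start) % modulus
    return delta / seconds


def rates_agree(masscan_pps, upstream_pps, tolerance=0.05):
    """True if the upstream rate is within `tolerance` of what masscan reports."""
    return abs(masscan_pps - upstream_pps) <= tolerance * masscan_pps


# With load-balanced upstream routers, add the per-router rates before comparing.
combined = packet_rate(0, 7_000_000, 10) + packet_rate(0, 7_500_000, 10)
print(rates_agree(1_450_000, combined))  # True: 1,450,000 vs 1,450,000
```

Keep the polling interval short: at 1.488 million packets/second a 32-bit counter wraps in under an hour, and this sketch can only correct for one wrap between samples.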

Masscan only works in terms of "packets-per-second". It doesn't give you the option to specify "bits-per-second". The reason is that it is impossible to actually throttle at bits-per-second. The bits sent onto the local Ethernet wire are not the same bits that'll be sent by your ISP upstream to the Internet. Different links have different per-packet overhead -- I have no idea what your ISP's per-packet overhead will be. This matters for masscan because the packets are usually smaller than the overhead (unlike normal communications, where the overhead is only a few percent of the overall traffic). This is explained in this blog post (which you should really read). The upshot is that when you transmit at the max gigabit speed of 1.488 million packets/second, you are only transmitting around 476-mbps of Internet traffic onto the local wire. What your ISP will report as your "bandwidth" being used is likely somewhere between 476-mbps and 1000-mbps.
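For the curious, here is where the 1.488 million and 476-mbps figures come from, worked out in Python. The Ethernet framing constants are the standard ones; the 40-byte packet size (a 20-byte IPv4 header plus a 20-byte TCP header with no options) is my assumption about what a bare SYN packet looks like.

```python
LINK_BPS = 1_000_000_000  # gigabit Ethernet
MIN_FRAME = 64            # minimum Ethernet frame, including the 4-byte FCS
PREAMBLE = 8              # preamble plus start-of-frame delimiter
INTERFRAME_GAP = 12       # mandatory idle time between frames, in byte times
IP_TCP_BYTES = 40         # IPv4 header + TCP header, no options (assumed)


def max_packets_per_second(link_bps=LINK_BPS):
    """Theoretical packet rate when every frame is minimum-sized."""
    wire_bits = (MIN_FRAME + PREAMBLE + INTERFRAME_GAP) * 8  # 672 bits
    return link_bps / wire_bits


def ip_bits_per_second(pps, ip_bytes=IP_TCP_BYTES):
    """Bits/second counting only the IP packet, not Ethernet overhead."""
    return pps * ip_bytes * 8


pps = max_packets_per_second()
print(f"{pps:,.0f} packets/second")                               # 1,488,095 packets/second
print(f"{ip_bits_per_second(pps) / 1e6:.0f} mbps of IP traffic")  # 476 mbps of IP traffic
```

Each 40-byte packet occupies 84 bytes of wire time, so more than half of the gigabit is spent on padding and framing that an upstream link may or may not count.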

Just because your machine has a gigabit Ethernet adapter doesn't mean that it's running at a gigabit. In lots of situations, the switch has decided to downgrade your connection to only 100-mbps. I've seen people complain that they are only getting 47-mbps or 148,000 packets/second -- almost exactly one tenth of the gigabit numbers, which strongly indicates that they are really connected at fast-ethernet instead of gigabit-ethernet.
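On Linux you can check the negotiated speed directly instead of guessing from packet rates. The sketch below reads the "speed" file that the kernel exposes under /sys/class/net for each interface (the value is in mbps) and works out the packet rate ceiling that goes with it. The function names and the "eth0" interface name are placeholders of mine.

```python
from pathlib import Path

WIRE_BITS_PER_MIN_FRAME = (64 + 8 + 12) * 8  # frame + preamble + interframe gap


def negotiated_speed_mbps(iface, sysfs_root="/sys/class/net"):
    """Return the negotiated link speed in mbps, or None if it can't be read."""
    try:
        text = (Path(sysfs_root) / iface / "speed").read_text().strip()
        speed = int(text)
    except (OSError, ValueError):
        return None  # no such interface, link down, or not an Ethernet device
    return speed if speed > 0 else None  # the kernel reports -1 for "unknown"


def expected_max_pps(speed_mbps):
    """Packet rate ceiling for minimum-sized frames at this link speed."""
    return speed_mbps * 1_000_000 / WIRE_BITS_PER_MIN_FRAME


if __name__ == "__main__":
    speed = negotiated_speed_mbps("eth0")
    if speed is None:
        print("could not read link speed")
    else:
        print(f"{speed} mbps link, at most {expected_max_pps(speed):,.0f} packets/second")
```

If this reports 100 where you expected 1000, the usual suspects are the cable, the switch port, or autonegotiation settings rather than masscan.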

Virtual machines (VMs) add another wrinkle. The virtualization process adds more per-packet overhead to transmission, reducing transmission rates from 1.488 million packets/second down to around 500,000 packets/second. This depends on the virtualization and the hardware, of course. In theory, Intel has a solution for their cards that completely virtualizes the hardware, such that a virtual machine can directly transmit packets without involvement of the hypervisor, and hence transmit at full speed. I've never seen such a system, but they theoretically exist.

Ethernet on USB is really slow, especially when transmitting small packets. Even though USB 2.0 is nominally 480-mbps, USB Ethernet will struggle even at 100-mbps speeds. It depends heavily on which USB drivers/hardware you have and which Ethernet drivers/hardware you have. If you have USB Ethernet, but masscan appears to be running fast, then it's likely the drivers are silently dropping packets instead of forcing masscan to block. I mention this because new "ultrabooks" don't have Ethernet ports, so may be using USB Ethernet. Also, the Ethernet port on the Raspberry Pi is actually connected via internal USB, so has this class of problem.

WiFi is pretty weird. There are a few issues to be aware of. The first is that masscan isn't WiFi aware. It'll actually work, but only because your laptop changes the Ethernet packets into WiFi packets for you underneath. This may introduce problems, causing slow operation, or fast operation because packets are actually being dropped. My Macbook gets 47,000 packets/second on 802.11n whose connection speed claims to be 150-mbps.

By the way, beware NAT. When you go through WiFi you are also probably going through NAT, or network address translation. Each SYN packet masscan sends creates a new TCP/IP connection entry in the NAT tables. These tables are fairly small, designed for homes and small offices where 100,000 simultaneous connections would be considered an unreasonably large amount. At typical WiFi speeds, Masscan will fill up these tables in two seconds, probably causing your NAT to crash. 
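The "two seconds" figure is just the table size divided by the packet rate. Here is that arithmetic as a tiny sketch, using the 100,000-entry table and the 47,000 packets/second WiFi rate from this post; it assumes every SYN creates a new entry and that nothing expires in the meantime.

```python
def seconds_to_fill(table_entries, syn_per_second):
    """Time until a NAT table is full if every SYN adds one entry."""
    return table_entries / syn_per_second


print(f"{seconds_to_fill(100_000, 47_000):.1f} seconds")  # 2.1 seconds
```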

In the above post, I quote 1.488 million packets-per-second because that's the theoretical maximum for gigabit Ethernet. In theory, masscan can go even faster, doing millions of packets-per-second. Therefore, you'd expect the transmit rate reported by masscan to be exactly this number. In practice, though, it's often slower, only around 1.35 million packets-per-second. What's going on? I don't know. I've swapped out the 1-gbps card with a 10-gbps to confirm that yes, in theory, higher rates are possible. I suspect that there's something in the Linux operating system that just doesn't like hitting max speed, throttling you back to only 95% max speed.

Lastly, let's talk about operating systems. The most common platform is Linux. I've tried lots of different hardware/drivers, from Intel, Realtek, and Atheros, and they all can do full gigabit speeds, even on slow laptops. As I mention above, things get weird at 100% max speed, but virtually anything can reach 95% max speed.

I don't know how fast things are on Mac OS X. I assume you'll get gigabit performance. I should probably benchmark this and report the results. Note that on Macbooks, using USB Ethernet is bad, but Thunderbolt Ethernet should be just as fast as any built-in gigabit Ethernet adapter.

For Windows, things get really weird. Just running the normal way gets only 30,000 packets/second, which is unreasonably slow. Using the "--sendq" parameter to send packets in batches increases this to 300,000 packets/second, which is still bad. Using "--source-ip" to spoof a separate IP address, with the "--sendq" parameter, gets 1.3 million packets/second. In other words, Windows is as fast as Linux in theory, but in practice, it's got problems you have to struggle to get beyond.

Summary

Today's computers are fast enough that even 10-gbps speeds are possible, but as this post shows, even 1-gbps speeds can be a problem. It comes down to the fact that today's networks are built/tested for large packets, and sending lots of small packets is unusual. Unusual things break networks. While sometimes transmitting at max speed is as simple as setting "--rate 1500000", at other times, you'll have to play with things to get masscan running that fast.




