GitHub is a key infrastructure website for the Internet, being the largest host of open-source projects, most famously Linux. (I host my code there). It's also a popular blogging platform.
Among the zillions of projects are https://github.com/greatfire and https://github.com/cn-nytimes. These are mirrors (copies) of the websites http://greatfire.org and http://cn.nytimes.com. GreatFire provides tools for circumventing China's Internet censorship; the NYTimes site carries news stories China wants censored.
China blocks the offending websites, but it cannot easily block the GitHub mirrors. Its choices are either to block or allow everything on GitHub. Since GitHub is key infrastructure for open-source, blocking GitHub is not really a viable option.
Therefore, China chose another option: flooding those specific GitHub URLs with traffic in order to pressure GitHub into removing those pages. This is a stupid policy decision, of course, since Americans are quite touchy on the subject and are unlikely to comply with such pressure. It's likely GitHub itself can resolve the issue, as there are a zillion ways to respond. If not, other companies (like CloudFlare) would leap to their defense.
The big question is attribution. Is this attack authorized by the Chinese government? Or is it the work of rogue hackers?
Netresec could clearly identify that a man-in-the-middle attack was happening by looking at the TTL fields in the packets. TTL, or time-to-live, is a field in all Internet packets that tracks the age of the packet. Each time a router forwards a packet, one is subtracted from the field. When it reaches zero, the packet is discarded. This prevents routing loops from endlessly forwarding packets around in circles.
Many systems send packets with a starting TTL of 64. Thus, when a packet arrives with a value of 46, you know that there are 18 hops between you and the sender (64 - 18 = 46).
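The hop arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name and the list of common starting TTLs (64, 128, 255) are my assumptions, and the guess is wrong whenever a sender uses some other starting value.

```python
# Assumption: the sender started from one of these widely used default TTLs.
COMMON_INITIAL_TTLS = (64, 128, 255)


def estimate_hops(observed_ttl: int) -> tuple[int, int]:
    """Return (assumed_initial_ttl, hop_count) for a TTL seen on arrival."""
    if not 0 < observed_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    # Pick the smallest common starting value that is not below what we saw.
    initial = min(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
    return initial, initial - observed_ttl


print(estimate_hops(46))  # (64, 18)
```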
What Netresec found was a situation shown in the following picture. This picture shows a sequence of packets to and from the server. My packets sent to the Baidu server have a TTL of 64, the starting value I send with. The first response from the server has a value of 46 -- because while they transmitted the packet with a value of 64, it was reduced by 18 by the time it arrived at my computer. After I send the web request, I get weird TTLs in response, with values of 98 and 99. These obviously did not come from the original server, but some intermediate man-in-the-middle device.
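The same check can be written down as a small Python sketch: take the TTL seen during the handshake as a baseline and flag any later packet on the connection whose TTL is far from it. The function name and the tolerance of 2 are assumptions for illustration (routes can shift by a hop or so); this is not Netresec's actual method.

```python
def flag_injected(baseline_ttl: int, observed_ttls: list[int],
                  tolerance: int = 2) -> list[int]:
    """Return indexes of packets whose TTL strays from the baseline.

    baseline_ttl is the TTL seen on the first reply (the handshake).
    tolerance is a hypothetical allowance for small route changes.
    """
    return [i for i, ttl in enumerate(observed_ttls)
            if abs(ttl - baseline_ttl) > tolerance]


# TTLs from the example above: a handshake reply at 46, then 98 and 99.
print(flag_injected(46, [46, 98, 99]))  # [1, 2]
```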
I know this man-in-the-middle is somewhere between me and Baidu, but where? To answer that, we use the concept of traceroute.
Traceroute is a real cool trick. Instead of sending packets with a TTL of 64, the tool sends them with a TTL of 1, then 2, then 3, and so on. Because the TTL is so low, they won't reach their destination. Instead, the TTL will eventually reach 0, and routers along the way will drop them. When routers do this, they send back a notification packet called a Time-Exceeded message -- using the router's Internet address. Thus, I can collect all these packets and map the routers between me and a target.
The tool that does this is shown below, where I traceroute to the Baidu server from my machine:
The second column is time. As you can see, it takes almost 80 milliseconds for my packets to reach Los Angeles, and then the delay jumps to 230 milliseconds to reach China. Also note that I can't quite reach the server, as there is a firewall after hop 16 that is blocking traceroute from working.
So where along this route is the man-in-the-middle interception happening? To answer this question, I had to write some code. I wrote my own little traceroute tool. Instead of sending a single packet, it first established a connection with normal TTLs, so that it would reach all the way to the target server. Then, when it sent the web request packet, it used a smaller TTL, so it would get dropped before reaching the server -- but hopefully after the man-in-the-middle saw it. By doing this with varying TTLs, I should be able to discover at which hop the evil device is lurking.
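For readers who want to see the traceroute idea in code, here is a toy model in Python. It does not touch the network: the path is a made-up list of addresses from the documentation ranges, and the function names are mine. It only shows how rising TTLs reveal one router at a time.

```python
def forward(path: list[str], ttl: int) -> tuple[str, str]:
    """Model one probe along `path` (routers in order, destination last).

    Each router subtracts one from the TTL; the router that brings it to
    zero drops the packet and answers with a Time-Exceeded message.
    """
    for router in path[:-1]:
        ttl -= 1
        if ttl == 0:
            return ("time-exceeded", router)
    return ("delivered", path[-1])


def traceroute(path: list[str], max_ttl: int = 30) -> list[str]:
    """Send probes with TTL 1, 2, 3, ... and collect who answered each one."""
    route = []
    for ttl in range(1, max_ttl + 1):
        kind, address = forward(path, ttl)
        route.append(address)
        if kind == "delivered":
            break
    return route


# Hypothetical three-hop path, not the real route to Baidu.
print(traceroute(["192.0.2.1", "192.0.2.2", "198.51.100.7"]))
```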
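A rough Python approximation of that idea is below. It is not my actual tool, just a sketch under stated assumptions: it leans on the operating system's TCP stack, assumes an IPv4 connection, and uses the standard IP_TTL socket option to lower the TTL after the connection is up. The function names are made up for illustration.

```python
import socket


def send_with_ttl(sock, payload: bytes, ttl: int) -> None:
    """Send payload on an already-connected TCP socket with a reduced TTL.

    The TTL is deliberately left low afterwards: TCP will retransmit the
    dropped segment, and those retransmissions should expire on the way too.
    """
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    sock.sendall(payload)


def probe(host: str, ttl: int, port: int = 80, timeout: float = 3.0) -> bytes:
    """Connect with the normal TTL, then send a web request with a short one.

    Returns whatever bytes come back before the timeout (empty if nothing).
    Assumes the connection is IPv4, since IP_TTL is an IPv4 option.
    """
    request = f"GET / HTTP/1.1\r\nHost: {host}\r\n\r\n".encode()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        send_with_ttl(sock, request, ttl)
        try:
            return sock.recv(4096)
        except socket.timeout:
            return b""
```

Calling probe() with a TTL too small to reach the server should normally return nothing, so any web response that does come back is a hint that something in the middle answered.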
I found that the device lurks between 11 and 12 hops. The web request packets sent with a TTL of 11 are not seen, while packets with TTL of 12 are, generating a response, as shown below:
The black line above shows the packet I sent, with a TTL of 12. The orange line (and the two packets above it) show the packets received from the man-in-the-middle device. When I send packets with a TTL of 11, I never get a response from that evil device.
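The search itself is simple enough to sketch in Python: try each TTL in turn and stop at the first one that draws a response from the device. The callback here is a stand-in for a real probe, and the names are hypothetical.

```python
from typing import Callable, Optional


def locate_injector(got_injected_reply: Callable[[int], bool],
                    max_ttl: int = 30) -> Optional[tuple[int, int]]:
    """Return (last_silent_ttl, first_responding_ttl), or None if no reply.

    got_injected_reply(ttl) should send a web request with that TTL and
    report whether the man-in-the-middle device answered.
    """
    for ttl in range(1, max_ttl + 1):
        if got_injected_reply(ttl):
            return ttl - 1, ttl
    return None


# Stand-in probe matching the result above: silent at 11, answers at 12.
print(locate_injector(lambda ttl: ttl >= 12))  # (11, 12)
```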
By looking at the IP addresses in the traceroute, we can conclusively prove that the man-in-the-middle device is located on the backbone of China Unicom, a major service provider in China.
The next step is to traceroute in the other direction, from China to a blocked address, such as the http://www.nytimes.com address at 18.104.22.168. Using the website http://www.linkwan.net/tr.htm, I get the following:
This shows that the Great Firewall runs inside the China Unicom infrastructure.
Using my custom http-traceroute, I've proven that the man-in-the-middle machine attacking GitHub is located on or near the Great Firewall of China. While many explanations are possible, such as hackers breaking into these machines, the overwhelmingly most likely suspect for the source of the GitHub attacks is the Chinese government.
This is important evidence for our government. It'll be interesting to see how they respond to these attacks -- attacks by a nation state against key United States Internet infrastructure.