Wednesday, April 01, 2015

Pin-pointing China's attack against GitHub

For the past week, the website "GitHub" has been under attack by China. In this post, I pin-point where the attack is coming from by doing an http-traceroute.

GitHub is a key infrastructure website for the Internet, being the largest host of open-source projects, most famously Linux. (I host my code there). It's also a popular blogging platform.

Among the zillions of projects are https://github.com/greatfire and https://github.com/cn-nytimes. These are mirrors (copies) of the websites http://greatfire.org and http://cn.nytimes.com. GreatFire provides tools for circumventing China's Internet censorship, and the NYTimes publishes news stories China wants censored.

China blocks the offending websites, but it cannot easily block the GitHub mirrors. Its choices are either to block or allow everything on GitHub. Since GitHub is key infrastructure for open-source, blocking GitHub is not really a viable option.

Therefore, China chose another option, to flood those specific GitHub URLs with traffic in order to pressure GitHub into removing those pages. This is a stupid policy decision, of course, since Americans are quite touchy on the subject and are unlikely to comply with such pressure. It's likely GitHub itself can resolve the issue, as there are a zillion ways to respond. If not, other companies (like CloudFlare) would leap to their defense.

The big question is attribution. Is this attack authorized by the Chinese government? Or is it the work of rogue hackers?

The company Netresec in Sweden partially answered this question by figuring out most of the details of the hack. The way the attack worked is that some man-in-the-middle device intercepted web requests coming into China from elsewhere in the world, and then replaced the content with JavaScript code that would attack GitHub. Specifically, they intercepted requests to Baidu's analytics. The search-engine Baidu is the Google of China, and, like Google, it runs analytics software in order to track advertising. Everyone outside China visiting pages inside China that use these analytics would then run this JavaScript to attack GitHub. Since the attack appears to be coming "from everywhere", it's impractical for GitHub to block the attack.

Netresec could clearly identify that a man-in-the-middle was happening by looking at the TTL fields in the packets. TTL, or time-to-live, is a field in all Internet packets that tracks the age of the packet. Each time a router forwards a packet, one is subtracted from the field. When it reaches zero, the packet is discarded. This prevents routing loops from endlessly forwarding packets around in circles.

Many systems send packets with a starting TTL of 64. Thus, when a packet arrives with a value of 46, you know that there are 18 hops between you and the sender (64 - 18 = 46).
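The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the list of common starting TTLs (64, 128, 255) and the name `estimate_hops` are assumptions of this sketch, and a sender can be configured to start from any value.

```python
# Starting TTLs commonly seen in practice. This list is an assumption made
# for illustration; a sender can be configured to use any value.
COMMON_INITIAL_TTLS = (64, 128, 255)

def estimate_hops(observed_ttl):
    """Guess the sender's starting TTL and how many hops the packet crossed."""
    if not 0 < observed_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    # Assume the sender started from the smallest common value that is
    # not below what we observed.
    initial = min(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
    return initial, initial - observed_ttl

print(estimate_hops(46))  # (64, 18)
```

The result is a guess, not a measurement: it is only as good as the assumption about the starting value.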

What Netresec found was a situation shown in the following picture. This picture shows a sequence of packets to and from the server. My packets sent to the Baidu server have a TTL of 64, the starting value I send with. The first response from the server has a value of 46 -- because while they transmitted the packet with a value of 64, it was reduced by 18 by the time it arrived at my computer. After I send the web request, I get weird TTLs in response, with values of 98 and 99. These obviously did not come from the original server, but some intermediate man-in-the-middle device.
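As a rough sketch of that check, the snippet below flags inbound packets whose TTL strays from the TTL of the first packet received on the connection. The packet labels, the tolerance of 2, and the function name are hypothetical; the TTL values are the ones quoted above.

```python
def find_ttl_anomalies(packets, tolerance=2):
    """Return the packets whose TTL differs from the first packet's TTL
    by more than `tolerance`. `packets` is a list of (label, ttl) pairs
    received on a single connection, in arrival order."""
    if not packets:
        return []
    baseline = packets[0][1]  # TTL of the first response from the server
    return [(label, ttl) for label, ttl in packets
            if abs(ttl - baseline) > tolerance]

# Inbound packets of one connection, using the TTLs quoted in the text.
# The labels are made up for this example.
inbound = [("first response", 46), ("web response", 98), ("web response", 99)]
print(find_ttl_anomalies(inbound))
# [('web response', 98), ('web response', 99)]
```

A small tolerance is used because routes can shift slightly during a connection; a jump from 46 to 98 is far outside anything a route change would explain.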


I know this man-in-the-middle is somewhere between me and Baidu, but where? To answer that, we use the concept of traceroute.

Traceroute is a real cool trick. Instead of sending packets with a TTL of 64, the tool sends them with a TTL of 1, then 2, then 3, and so on. Because the TTL is so low, they won't reach their destination. Instead, the TTL will eventually reach 0, and routers along the way will drop them. When routers do this, they send back a notification packet called a Time-Exceeded message -- using the router's Internet address. Thus, I can collect all these packets and map the routers between me and a target.
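To make the logic concrete, here is a small simulation in Python. It does not send real packets (that needs raw sockets and usually administrator rights); instead it models a path as a list of router addresses. The addresses are documentation placeholders and the function names are made up for this sketch.

```python
def forward(path, ttl):
    """Simulate one packet sent with the given TTL along `path`, a list of
    router addresses. Returns the address of the router that drops the
    packet (and would send Time-Exceeded), or None if the packet gets
    past the last router."""
    for router in path:
        ttl -= 1              # each router subtracts one before forwarding
        if ttl == 0:
            return router     # TTL expired here
    return None

def traceroute(path, max_ttl=30):
    """Send probes with TTL 1, 2, 3, ... and record which router answers."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        router = forward(path, ttl)
        if router is None:    # probe made it past every router
            break
        hops.append((ttl, router))
    return hops

# Placeholder addresses from the documentation range 192.0.2.0/24.
path = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
print(traceroute(path))
# [(1, '192.0.2.1'), (2, '192.0.2.2'), (3, '192.0.2.3')]
```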

The tool that does this is shown below, where I traceroute to the Baidu server from my machine:


The second column is time. As you can see, it takes almost 80-milliseconds for my packets to reach Los Angeles, and then the delay jumps to 230-milliseconds to reach China. Also note that I can't quite reach the server, as there is a firewall after hop 16 that is blocking traceroute from working.

So where along this route is the man-in-the-middle interception happening? To answer this question, I had to write some code. I wrote my own little traceroute tool. Instead of sending a single packet, it first established a connection with normal TTLs, so that it would reach all the way to the target server. Then, when it sent the web request packet, it used a smaller TTL, so it would get dropped before reaching the server -- but hopefully after the man-in-the-middle saw it. By doing this with varying TTLs, I should be able to discover at which hop the evil device is lurking.
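The tool's code is not included in this post, so the following is not that tool. It is a minimal Python sketch of the same idea under some assumptions: an ordinary TCP socket is used, the TTL is lowered with the standard `IP_TTL` socket option after the handshake completes, and "any data came back" is treated as a response. The names `probe_http` and `first_responding_ttl` are made up for illustration.

```python
import socket

def probe_http(host, ttl, port=80, timeout=3.0):
    """Connect with the default TTL, then send an HTTP request with a
    reduced TTL. Returns True if any response data comes back."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect((host, port))   # handshake uses the normal TTL
        # From here on, packets sent on this socket carry the low TTL.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
        request = f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        try:
            return len(sock.recv(4096)) > 0
        except (socket.timeout, ConnectionError):
            return False             # nothing saw the request

def first_responding_ttl(probe, max_ttl=30):
    """Return the smallest TTL for which probe(ttl) is True, else None."""
    for ttl in range(1, max_ttl + 1):
        if probe(ttl):
            return ttl
    return None

# Offline demonstration with a stand-in probe that "responds" from hop 12.
print(first_responding_ttl(lambda ttl: ttl >= 12))  # 12
```

Against a real target, the probe would be passed in as `lambda ttl: probe_http(host, ttl)`, where `host` is the server being tested. If the smallest responding TTL is lower than the hop count to the server itself, something in between is answering on the server's behalf.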

I found that the device lurks between 11 and 12 hops. The web request packets sent with a TTL of 11 are not seen, while packets with TTL of 12 are, generating a response, as shown below:


The black line above shows the packet I sent, with a TTL of 12. The orange line (and the two packets above it) show the packets received from the man-in-the-middle device. When I send packets with a TTL of 11, I never get a response from that evil device.

By looking at the IP addresses in the traceroute, we can conclusively prove that the man-in-the-middle device is located on the backbone of China Unicom, a major service provider in China.

The next step is to traceroute in the other direction, from China to a blocked address, such as the http://www.nytimes.com address at 170.149.168.130. Using the website http://www.linkwan.net/tr.htm, I get the following:


This shows that the Great Firewall runs inside the China Unicom infrastructure.

Conclusion

Using my custom http-traceroute, I've proven that the man-in-the-middle machine attacking GitHub is located on or near the Great Firewall of China. While many explanations are possible, such as hackers breaking into these machines, the overwhelmingly most likely suspect for the source of the GitHub attacks is the Chinese government.

This is important evidence for our government. It'll be interesting to see how they respond to these attacks -- attacks by a nation state against key United States Internet infrastructure.

13 comments:

Finn said...

Planning on posting the code for the http traceroute tool?

Unknown said...

feels ashamed as a software engineer from China

Michael Tyler said...

It's an interesting post.

sydflyer said...

Great Post!!!!

danielkza said...

Just a note: GitHub only hosts a mirror for the Linux kernel, the original host is kernel.org

Sudsy said...

"It's choices are either to block or allow" ->

"Its choices are either to block or allow"

Anonymous said...

http://greatfire.com Should be corrected to http://greatfire.org

Unknown said...

@Def Abc: I lived in Shanghai for almost 6 years, and China produces some of the best software engineers in the world. Nothing to be ashamed of.

You're not alone, many of us feel the same way about our government. It's extremely hard for one person to influence the actions of their government. Therefore we shouldn't blame ourselves for those actions.

Robert Graham said...

Yea, @Def Abc and @Richie Thomas:

Politicians are retards, as are our fellow citizens. Nothing to be ashamed of. Praise what your country does well, but also be vocal about what you oppose.

Unknown said...

Would you release your modified http traceroute tool? It would come in helpful in detecting other attacks like this

RDM said...

@richie Thomas yes, that's gov'actions not mine, we love the world and github

newgoat said...

Trying to understand how this works.
With TTL 12, your request packet reaches the Baidu server, which sends a reply back with a normal TTL of 64. On the way, that reply packet was modified by the middle man. Where in this process can your tool determine the IP of the middle man?

Anonymous said...

Great work!
By the way, did you consider other "good" mitm-like techniques that result in ttl weirdness?

E.g.
1. TCP syn cookies in firewalls,
- first TTL will be from FW, all the rest - from server.
2. TCP offload in NICs - SYNs TTL set by OS, data transfer ttl's set by driver etc.