Friday, July 05, 2013

Scanning the Internet

Longtime readers might remember the Errata Project to map the Internet. We started in October of 2011 and we are still at it. I wanted to give a few updates:
  • We have moved the process to a new, more powerful host, 216.75.60.203. The two previously given IPs (216.75.60.94, 66.240.192.147) are no longer being used for this project.
  • We have mapped the space we intended and are now on a second round to update any changes. We are hoping to make this data available by the end of the year.
  • We are looking for interesting ways to visualize the data. If you have suggestions, I would love to hear them.
  • We have collected nearly 600 gigs worth of data so far. 
  • World events are making me feel like an old man visiting the neighborhood I grew up in: "Look at that block, before the revolution it was all HPUX servers, now it's all Dell Blade servers with Win2K8...progress they say..." *shakes fist in the air*


3 comments:

Jonathan Quimbly said...

Is it all via nmap? I have three ISPs, one home and two colos, and all of them proscribe port scanning. How are you dealing with any such limitations?

On visualization, I'd really like to see which regions leave what ports exposed, just to learn what the uptake rate of newer pre-blocked routers is. Also, a global heat map indicating where web hosts are concentrated. A similar map of hosts with open proxy ports would be interesting. Also, are there any DEC systems still connected to the 'net? Trivia would be fun.

Consider ingesting the data into a db like SQLite, and making it available via torrent, to share with your friends. :)
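
As a rough illustration of the SQLite idea, here is a minimal sketch using Python's standard sqlite3 module. The table name, the columns (ip, port, banner), and both helper functions are hypothetical; the post does not describe the project's actual data format.

```python
import sqlite3


def store_results(conn, rows):
    """Insert (ip, port, banner) tuples into a hypothetical 'scan' table."""
    # Assumed schema: one row per (ip, port) pair, with an optional banner.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scan ("
        "ip TEXT NOT NULL, port INTEGER NOT NULL, banner TEXT, "
        "PRIMARY KEY (ip, port))"
    )
    # INSERT OR REPLACE lets a second scanning round overwrite older rows.
    conn.executemany("INSERT OR REPLACE INTO scan VALUES (?, ?, ?)", rows)
    conn.commit()


def open_port_counts(conn):
    """Return (port, host_count) pairs, most common port first."""
    return conn.execute(
        "SELECT port, COUNT(*) FROM scan "
        "GROUP BY port ORDER BY COUNT(*) DESC, port"
    ).fetchall()
```

Calling store_results on a connection from sqlite3.connect("scan.db") would produce a single file that could be shared as-is, and open_port_counts is one example of the kind of summary a visualization could start from.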

Philip said...

I'm sure you would have seen the Internet Census 2012 website:
http://internetcensus2012.bitbucket.org/
It's currently down unfortunately, but should be back in a few hours.

The way they map out the IPv4 address space is really neat, using the Hilbert Curve. It groups nearby addresses together so that if, for example, you colour code things depending on certain criteria, as you zoom out you can give a summary of an address block more easily. More info here:
http://en.wikipedia.org/wiki/Hilbert_curve

XKCD made a version, too...
http://xkcd.com/195/
... and as he points out in the mouseover text, the IPv6 space is a little more complex!
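
For anyone curious how the Hilbert curve layout works in practice, here is a minimal sketch, assuming the standard distance-to-coordinate conversion described in the Wikipedia article. The function names d2xy and ip_to_cell are made up for this example, and the orientation of the resulting grid may not match the Internet Census or XKCD maps.

```python
import ipaddress


def d2xy(order, d):
    """Convert distance d along a Hilbert curve to (x, y) on a 2**order square grid."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so the sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def ip_to_cell(addr, prefix_len=8):
    """Map an IPv4 address to the grid cell of its enclosing block.

    prefix_len must be even; the default of 8 places each /8 on a 16x16 grid.
    """
    if prefix_len % 2 or not 0 < prefix_len <= 32:
        raise ValueError("prefix_len must be an even number from 2 to 32")
    block = int(ipaddress.IPv4Address(addr)) >> (32 - prefix_len)
    return d2xy(prefix_len // 2, block)
```

Because consecutive blocks always land in neighbouring cells, colouring each cell by some measurement keeps numerically adjacent networks visually adjacent, which is what makes zoomed-out summaries readable. Using prefix_len=24 would give one cell per /24 on a 4096x4096 grid.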

Scotty said...

At the university I work at we have considered tailoring border firewall responses for our /16 so we draw pictures in the data for projects like these when visualized. You'll need to let us know the details of the visualization when you figure it out.

Here is a video of some of the real-time network visualization we do if you don't mind watching some videos that might put you to sleep.
https://www.youtube.com/watch?v=iZ4SpDVagtU