The unique feature of Masscan is that it has its own TCP/IP stack, bypassing the kernel’s stack. This has interesting benefits, such as being able to maintain a TCP connection with all 30 million HTTPS servers on the Internet simultaneously. However, it means that (at the moment) it’s difficult to write your own protocols. At some point I’m going to add LUA scripting to the system and this technical detail won’t matter, but in the meantime, if you want to write your own protocols, you’ll have to know the tricks.
The issue Masscan solves is scalability, such as maintaining 30 million concurrent TCP connections. In a standard Linux environment, the system requires about 40 kilobytes per TCP connection, so holding all 30 million connections would take 1.2 terabytes of RAM. That is far more than a standard server can hold.
Masscan reduces this. At the moment, it uses only 442 bytes per TCP connection, including the memory for difficult protocols like SSL. That’s less than 16 gigabytes of RAM for 30 million concurrent connections.
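To make the arithmetic concrete, here is a small C sketch that multiplies the per-connection figures quoted above by 30 million. The constant and function names are made up for illustration, and a kilobyte is taken as 1,000 bytes; with 1,024-byte kilobytes the kernel figure rises to about 1.23 TB, and the conclusion is the same.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants taken from the figures in the text. */
#define CONNECTIONS        30000000ULL   /* ~30 million HTTPS servers */
#define KERNEL_BYTES_PER   40000ULL      /* ~40 kilobytes per kernel connection */
#define MASSCAN_BYTES_PER  442ULL        /* masscan's per-connection cost */

/* Memory needed to hold every connection at the same time. */
static uint64_t total_bytes(uint64_t connections, uint64_t bytes_per_conn)
{
    /* Guard against overflow of the 64-bit product. */
    assert(bytes_per_conn == 0 || connections <= UINT64_MAX / bytes_per_conn);
    return connections * bytes_per_conn;
}

/*
 * total_bytes(CONNECTIONS, KERNEL_BYTES_PER)  == 1,200,000,000,000  (1.2 TB)
 * total_bytes(CONNECTIONS, MASSCAN_BYTES_PER) ==    13,260,000,000  (~13.3 GB, under 16 GB)
 */
```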
This is a little excessive, because connections are short-lived. Even a fast scan of the Internet spreads its connections over enough time that, at any moment, fewer than 100,000 are open. Therefore, there is no technical reason for masscan to be so paranoid about memory consumption. I do it this way to try out other things.
Masscan’s stack does no TCP reassembly. It does handle overlaps and ordering, but it doesn’t reassemble fragments.
Protocol parsers are written as “state machines”. This means they don’t need reassembly: the state machine pauses when it runs off the end of one packet and resumes where it left off at the start of the next packet.
The lack of reassembly conserves a lot of memory in the system and increases speed. Instead of buffering incoming packets until the application reads the buffer, Masscan forces the application to parse each packet immediately as it arrives, because the packet is discarded immediately afterward.
All parsers are “state machines” in theory. Masscan simply makes this explicit. The parser reads a stream of bytes from the input and parses them one by one. Each byte causes a transition from one state to the next.
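The following C sketch shows the pattern described in the last three paragraphs: all per-connection state lives in a small struct, each byte drives one transition, and when a packet runs out the function just returns, so the next packet resumes in the same state. The names (banner_parser, banner_parse, and the states) are hypothetical and not taken from the masscan source; the example simply captures a one-line text banner.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-connection state: this is all that survives between
 * packets. The packet bytes themselves are discarded after each call. */
enum banner_state { BANNER_TEXT, BANNER_CR, BANNER_DONE };

struct banner_parser {
    enum banner_state state;
    char   text[64];    /* captured banner, silently truncated if longer */
    size_t len;
};

static void banner_init(struct banner_parser *p)
{
    memset(p, 0, sizeof(*p));   /* zeroed text keeps it NUL-terminated */
    p->state = BANNER_TEXT;
}

static void banner_append(struct banner_parser *p, unsigned char c)
{
    if (p->len + 1 < sizeof(p->text))
        p->text[p->len++] = (char)c;
}

/* Feed one packet's payload. Each byte causes one state transition; when
 * the payload runs out we return, and the next call resumes from p->state. */
static void banner_parse(struct banner_parser *p,
                         const unsigned char *px, size_t length)
{
    for (size_t i = 0; i < length && p->state != BANNER_DONE; i++) {
        unsigned char c = px[i];
        switch (p->state) {
        case BANNER_TEXT:
            if (c == '\r')
                p->state = BANNER_CR;
            else if (c == '\n')
                p->state = BANNER_DONE;
            else
                banner_append(p, c);
            break;
        case BANNER_CR:
            if (c == '\n') {
                p->state = BANNER_DONE;     /* "\r\n" ends the line */
            } else if (c != '\r') {
                p->state = BANNER_TEXT;     /* stray '\r': drop it, keep c */
                banner_append(p, c);
            }
            break;
        case BANNER_DONE:
            break;
        }
    }
}
```

Feeding "SSH-2.0-Test\r\n" in one packet, or split anywhere across two, leaves the same text in the struct; nothing from the first packet has to be kept except the parser state.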
A good model of this is the SSL parser. Put a breakpoint at the start of ‘ssl_parse_record()’ and run masscan under a debugger with the “--selftest” parameter. This exercises the SSL protocol by sending a dummy packet to it. At the same time, look at how Wireshark decodes the initial SSL packet. In the debugger, you’ll see masscan work through it a byte at a time in state-machine fashion, eventually decoding everything Wireshark does, but in a very strange manner.
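To see what “a byte at a time” means for SSL, the sketch below decodes the 5-byte TLS record header (content type, two version bytes, and a big-endian 16-bit length) with one state per byte, then skips the record body. It is a simplified, hypothetical illustration, not the code of ssl_parse_record(); a real parser would pass the body to further sub-state-machines for the handshake messages Wireshark displays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One state per header byte, plus one for the body. */
enum rec_state { REC_TYPE, REC_VER_MAJOR, REC_VER_MINOR,
                 REC_LEN_HI, REC_LEN_LO, REC_BODY };

struct tls_record_parser {
    enum rec_state state;     /* zero-initialize to start in REC_TYPE */
    uint8_t  content_type;    /* 22 = handshake, 21 = alert, ... */
    uint8_t  ver_major, ver_minor;
    uint16_t length;          /* body length from the header */
    uint16_t remaining;       /* body bytes not yet consumed */
    unsigned records;         /* complete records seen so far */
};

static void tls_record_parse(struct tls_record_parser *p,
                             const uint8_t *px, size_t length)
{
    for (size_t i = 0; i < length; i++) {
        uint8_t c = px[i];
        switch (p->state) {
        case REC_TYPE:      p->content_type = c; p->state = REC_VER_MAJOR; break;
        case REC_VER_MAJOR: p->ver_major = c;    p->state = REC_VER_MINOR; break;
        case REC_VER_MINOR: p->ver_minor = c;    p->state = REC_LEN_HI;    break;
        case REC_LEN_HI:
            p->length = (uint16_t)(c << 8);      /* high byte first */
            p->state = REC_LEN_LO;
            break;
        case REC_LEN_LO:
            p->length = (uint16_t)(p->length | c);
            p->remaining = p->length;
            if (p->remaining == 0) {             /* empty record */
                p->records++;
                p->state = REC_TYPE;
            } else {
                p->state = REC_BODY;
            }
            break;
        case REC_BODY:
            /* A real parser would feed these bytes to a handshake sub-parser. */
            if (--p->remaining == 0) {
                p->records++;
                p->state = REC_TYPE;             /* next record header */
            }
            break;
        }
    }
}
```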
After the TCP connection has been established, the next step is the “hellos” from either side of the connection. Sometimes the server initiates this, as in the case of FTP, SMTP, SSH, and VNC. Sometimes the client initiates this, as in the case of HTTP and SSL.
Masscan waits three seconds before sending client-hellos, in case the server sends a hello first. That way, when scanning for SSL or HTTP, it can detect that the port is actually being used for SSH or VNC. In other words, when you scan for HTTP, you’ll get some SSH and VNC records in response.
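The wait-for-the-server rule can be summarized as a small decision function. This is a hypothetical sketch: the names, and the idea of checking elapsed whole seconds, are assumptions for illustration; masscan’s actual timer handling is not shown here.

```c
#include <assert.h>
#include <stdbool.h>

#define HELLO_WAIT_SECONDS 3    /* the three-second window described above */

enum hello_action {
    HELLO_KEEP_WAITING,   /* nothing from the server yet, still inside the window */
    HELLO_PARSE_SERVER,   /* server spoke first: identify it from its banner */
    HELLO_SEND_CLIENT     /* window expired: send our probe (HTTP, SSL, ...) */
};

static enum hello_action hello_decide(bool server_sent_data,
                                      unsigned seconds_since_connect)
{
    if (server_sent_data)
        return HELLO_PARSE_SERVER;   /* e.g. SSH or VNC on an "HTTP" port */
    if (seconds_since_connect < HELLO_WAIT_SECONDS)
        return HELLO_KEEP_WAITING;
    return HELLO_SEND_CLIENT;
}
```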
The file “proto-banner1.c” currently contains the list of server-hello patterns and the logic that decides which protocol parser should handle a TCP connection.
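Here is a minimal sketch of the kind of decision proto-banner1.c makes, assuming a plain table of prefixes. The table, enum, and function names are hypothetical; the real file has its own pattern list and matching logic. Unlike masscan’s parsers, this sketch assumes the whole prefix arrives in the first packet, which keeps it short but is exactly the assumption a state-machine matcher avoids.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum app_proto { PROTO_UNKNOWN, PROTO_SSH, PROTO_VNC, PROTO_FTP_OR_SMTP };

struct hello_pattern {
    const char    *prefix;
    enum app_proto proto;
};

/* Hypothetical table of server-hello prefixes. */
static const struct hello_pattern patterns[] = {
    { "SSH-", PROTO_SSH },          /* SSH identification string (RFC 4253) */
    { "RFB ", PROTO_VNC },          /* RFB ProtocolVersion message (RFC 6143) */
    { "220",  PROTO_FTP_OR_SMTP },  /* both greet with reply code 220 */
};

/* Returns the protocol whose prefix matches the start of the payload. */
static enum app_proto classify_hello(const unsigned char *px, size_t length)
{
    for (size_t i = 0; i < sizeof(patterns) / sizeof(patterns[0]); i++) {
        size_t n = strlen(patterns[i].prefix);
        if (length >= n && memcmp(px, patterns[i].prefix, n) == 0)
            return patterns[i].proto;
    }
    return PROTO_UNKNOWN;
}
```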
The file “proto-ftp.c” is a good example of a server-hello protocol. If you just search everywhere for “FTP” in the source, you’ll see how to write a similar protocol for yourself. Yes, it’s an ugly hack that needs to be cleaned up.
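As a hypothetical illustration of the server-hello style (written for this guide, not copied from proto-ftp.c), here is a state machine that reads the three-digit reply code at the start of an FTP greeting. Per RFC 959, the code is followed by a space on the last line of a reply, or by a hyphen when more lines follow. Because it works byte by byte, the code can be split across packets.

```c
#include <assert.h>
#include <stddef.h>

enum ftp_state { FTP_D1, FTP_D2, FTP_D3, FTP_SEP, FTP_TEXT, FTP_DONE, FTP_BAD };

struct ftp_parser {
    enum ftp_state state;   /* zero-initialize to start in FTP_D1 */
    unsigned code;          /* e.g. 220, "Service ready for new user" */
    int multiline;          /* 1 if the code was followed by '-' */
};

/* Only the first line of the greeting is examined in this sketch. */
static void ftp_parse(struct ftp_parser *p, const unsigned char *px, size_t length)
{
    for (size_t i = 0; i < length; i++) {
        unsigned char c = px[i];
        switch (p->state) {
        case FTP_D1:
        case FTP_D2:
        case FTP_D3:
            if (c < '0' || c > '9') {
                p->state = FTP_BAD;          /* not an FTP-style reply */
                return;
            }
            p->code = p->code * 10 + (unsigned)(c - '0');
            p->state = (enum ftp_state)(p->state + 1);  /* next digit, then separator */
            break;
        case FTP_SEP:
            if (c == ' ' || c == '-') {
                p->multiline = (c == '-');
                p->state = FTP_TEXT;
            } else {
                p->state = FTP_BAD;
                return;
            }
            break;
        case FTP_TEXT:
            if (c == '\n')
                p->state = FTP_DONE;
            break;
        case FTP_DONE:
        case FTP_BAD:
            return;                          /* ignore anything further */
        }
    }
}
```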
For a client-hello protocol, use HTTP as your example.
For simple “banner” checking, all you need is to either send or receive the hello. More complicated tasks require additional transmits, with back-and-forth exchanges with the server.
These exchanges are stateless. In other words, you write your TCP parser for the data coming back from the server without regard to what you think you’ve transmitted. All the state is on the server side.
The best example of this is the “proto-vnc.c” parser. It must do several back-and-forth exchanges with the server. You’ll see that at several points it must call the “tcp_transmit()” function when parsing the response in order to send a request to the server.
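The following sketch shows the stateless back-and-forth pattern with the first step of the VNC (RFB) handshake: the server sends a 12-byte version string such as “RFB 003.008\n”, and the client answers with its own version. The parser never records what it sent; reaching the end of the server’s version string is itself the trigger to transmit. transmit_fn is a hypothetical stand-in for tcp_transmit(), whose real signature is not shown here, and fake_transmit exists only so the example can run.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for masscan's tcp_transmit(). */
typedef void (*transmit_fn)(void *conn, const void *data, size_t length);

enum vnc_state { VNC_VERSION, VNC_SECURITY, VNC_DONE };

struct vnc_parser {
    enum vnc_state state;   /* zero-initialize to start in VNC_VERSION */
    char   version[13];     /* "RFB xxx.yyy\n" is 12 bytes, plus a NUL */
    size_t len;
};

/* Parses only what the server sends. Nothing records what was transmitted:
 * completing the server's version string is itself the cue to reply. */
static void vnc_parse(struct vnc_parser *p, const unsigned char *px,
                      size_t length, transmit_fn transmit, void *conn)
{
    for (size_t i = 0; i < length; i++) {
        switch (p->state) {
        case VNC_VERSION:
            p->version[p->len++] = (char)px[i];
            if (p->len < 12)
                break;
            p->version[12] = '\0';
            if (memcmp(p->version, "RFB ", 4) != 0) {
                p->state = VNC_DONE;              /* not VNC after all */
                return;
            }
            transmit(conn, "RFB 003.008\n", 12);  /* our version reply */
            p->state = VNC_SECURITY;
            break;
        case VNC_SECURITY:
            /* A fuller parser would decode the security types the server
             * offers next and transmit again; this sketch stops here. */
            p->state = VNC_DONE;
            return;
        case VNC_DONE:
            return;
        }
    }
}

/* Test double: records what would have been sent instead of sending it. */
struct fake_conn {
    char   sent[64];
    size_t sent_len;
};

static void fake_transmit(void *conn, const void *data, size_t length)
{
    struct fake_conn *c = conn;
    if (c->sent_len + length <= sizeof(c->sent)) {
        memcpy(c->sent + c->sent_len, data, length);
        c->sent_len += length;
    }
}
```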
Long-term direction
The model sucks right now because, in the long run, I’m going to add LUA scripting.
Of all the scripting languages, it looks like LUA will have the least overhead per TCP connection.
Of all the scripting languages, it looks like LUA has the easiest support for “coroutines”. That means when a script calls “read()” to read bytes from the network, I can do a user-mode context switch. Thus, while LUA parsers appear synchronous, they are in fact asynchronous.
But of course, the real reason is to get nmap NSE compatibility.
This is a short guide for hacking your own protocol interactions into masscan. I seriously need to get working on the LUA integration, but in the meantime, this is how you’d add stuff.
The best way is to contact me, describe your problem, then have me integrate a prototype for your protocol that you can then fill out at your leisure.