Thursday, June 04, 2009

Why deep packet inspection is faster

Snort recently added a more complex NetBIOS, SMB, DCE-RPC protocol parser into its code. In other words, it added "deep packet inspection" (DPI) for these protocols.

This means Snort is now slower, right? If you've got an internal network full of these sorts of packets, shouldn't you be worried that your Snort boxes might be overloaded with this new deep-packet-inspection code?

Nope. Snort is now faster.

The reason is that deep packet inspection is actually FASTER than blindly searching traffic for patterns. The more you understand about the structure of a packet, the LESS work you have to do analyzing it for intrusions.
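As a rough illustration of that claim, here is a minimal Python sketch. The record framing (a 1-byte type, a 2-byte big-endian length, then the payload) and the names `build_record`, `blind_search`, and `structured_search` are invented for this example; they are not NetBIOS, SMB, or anything from Snort or Proventia. The point is only that a parser that understands the length field can skip whole payloads it does not care about.

```python
import struct

# Hypothetical framing, invented for this sketch (not NetBIOS/SMB):
# 1-byte record type, 2-byte big-endian payload length, then the payload.
HEADER = struct.Struct(">BH")
FILENAME, DATA = 1, 2

def build_record(rec_type, payload):
    """Frame one payload with the hypothetical 3-byte header."""
    return HEADER.pack(rec_type, len(payload)) + payload

def blind_search(stream, pattern):
    """Search with no protocol knowledge.

    Returns (found, worst-case bytes to scan), where the worst case is
    simply the length of the whole stream.
    """
    return pattern in stream, len(stream)

def structured_search(stream, pattern, wanted_type):
    """Walk record headers and search only payloads of the wanted type.

    Returns (found, bytes actually examined).
    """
    found, examined, offset = False, 0, 0
    while offset + HEADER.size <= len(stream):
        rec_type, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        examined += HEADER.size
        if rec_type == wanted_type:
            payload = stream[offset:offset + length]
            examined += len(payload)
            found = found or pattern in payload
        offset += length  # skip the payload without reading it
    return found, examined

stream = (build_record(DATA, b"x" * 1000)
          + build_record(FILENAME, b"evil.exe")
          + build_record(DATA, b"y" * 1000))

print(blind_search(stream, b"evil.exe"))                 # (True, 2017)
print(structured_search(stream, b"evil.exe", FILENAME))  # (True, 17)
```

In this toy stream the structured walk examines 17 bytes, against a worst case of 2017 for the blind scan.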

This was the curious thing we found with BlackICE/Proventia (the IDS/IPS that does more deep packet inspection than any competing product). As everyone knows, adding signatures to an IDS makes it slower. We found the reverse: as we added signatures, the product got faster. The reason was that as we added signatures, we also added more deep-packet-inspection logic. That logic meant less work later in the analysis, so the product became faster.

This is why Snort still struggles at 1-gbps, whereas Proventia scales to 6-gbps: Proventia does more DPI.

Not all DPI will speed up code, of course. When DPI can be done in a single pass, it will speed things up. Some DPI, though, requires you to backtrack, which in turn requires you to buffer old data so that you can return to it. This is the case when looking for intrusions within Word documents. Also, decompressing streams can be slow: a 1-gbps gzipped stream can easily expand to 10-gbps worth of data. If you put Proventia in front of your servers sending out compressed HTTP traffic, you might want to turn off the decompression feature for that reason.
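To make "single pass" concrete, here is a small Python sketch of a matcher that looks at each byte once as it arrives and carries only an integer of state between packets, so no earlier data has to be buffered. It uses the standard Knuth-Morris-Pratt failure table; the class name `StreamMatcher` and the sample packets are made up for the example, and nothing here is taken from Snort or Proventia.

```python
def failure_table(pattern):
    """Knuth-Morris-Pratt failure table for the pattern."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table

class StreamMatcher:
    """Finds a pattern in a byte stream delivered in arbitrary chunks.

    The only state kept between chunks is how many pattern bytes are
    currently matched, so old packets never need to be revisited.
    """

    def __init__(self, pattern):
        self.pattern = pattern
        self.table = failure_table(pattern)
        self.state = 0
        self.matched = False

    def feed(self, chunk):
        """Consume one chunk; return True once the pattern has been seen."""
        for byte in chunk:
            while self.state and byte != self.pattern[self.state]:
                self.state = self.table[self.state - 1]
            if byte == self.pattern[self.state]:
                self.state += 1
            if self.state == len(self.pattern):
                self.matched = True
                self.state = self.table[self.state - 1]
        return self.matched

# The pattern is split across three "packets".
matcher = StreamMatcher(b"ABC/ABC")
print(matcher.feed(b"User-Agent: AB"))  # False
print(matcher.feed(b"C/A"))             # False
print(matcher.feed(b"BC\r\n"))          # True
```

A parser that instead has to jump back to an earlier offset, as with some document formats, cannot get away with one integer of state; it must keep the earlier bytes around.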

Also, a lot depends upon how you write your DPI logic. The Snort NetBIOS/DCE code isn't horrendously bad, but it's slower than it needs to be. For example, it uses the "ntohs()" function to swap bytes, which is a bad way of coding. Most DPI code, like that you find in e-mail servers, is a lot worse. That's why DPI is considered "slow": most programmers don't write DPI code well.
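The post does not say what it would use in place of ntohs(). One alternative that is often suggested, and assumed here, is to build the value from the individual bytes, which gives the same result on any host without a separate load-then-swap step. A Python sketch of the two styles, using a made-up header fragment and hypothetical helper names:

```python
import socket
import struct

# First four bytes of a TCP header: source port 443, destination port 51234.
tcp_header = bytes([0x01, 0xBB, 0xC8, 0x22])

def port_via_ntohs(buf, offset):
    """Load the field in host byte order, then convert with ntohs()."""
    (raw,) = struct.unpack_from("=H", buf, offset)
    return socket.ntohs(raw)

def port_via_bytes(buf, offset):
    """Assemble the big-endian value directly from its two bytes."""
    return (buf[offset] << 8) | buf[offset + 1]

print(port_via_ntohs(tcp_header, 0), port_via_bytes(tcp_header, 0))  # 443 443
print(port_via_ntohs(tcp_header, 2), port_via_bytes(tcp_header, 2))  # 51234 51234
```

Both functions return the same ports; the second never depends on the byte order of the machine running it.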

UPDATE

Consider this rule I downloaded from EmergencyThreats.net.


alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (\
msg:"ET P2P ABC Torrent User-Agent (ABC/ABC-3.1.0)"; \
flow:to_server,established; \
content:"User-Agent\: ABC/ABC"; nocase; \
sid:2003475;)


This is blind to the HTTP protocol. It is slow, because it must search everything that goes across those ports. It's prone to false positives, because the pattern may exist for reasons unrelated to the original attack.

However, with hypothetical DPI extensions to Snort, you might write it like the following. Since it restricts the pattern search to just that header field, it would be faster and less prone to false positives.


alert http $HOME_NET any -> $EXTERNAL_NET any (\
msg:"ET P2P ABC Torrent User-Agent (ABC/ABC-3.1.0)"; \
header.useragent:"ABC/ABC"; \
sid:2003475;)
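The difference between the two rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `parse_headers`, `blind_match`, and `field_match` are hypothetical helpers, the matching is assumed to be case-insensitive containment, and the sample requests are invented. It is not how Snort evaluates rules.

```python
def parse_headers(request):
    """Split an HTTP request into a dict of lower-cased header names to values."""
    head, _, _body = request.partition(b"\r\n\r\n")
    headers = {}
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(b":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers

def blind_match(request):
    """First rule: look for the pattern anywhere in the bytes on the wire."""
    return b"user-agent: abc/abc" in request.lower()

def field_match(request):
    """Hypothetical second rule: look only inside the User-Agent value."""
    user_agent = parse_headers(request).get(b"user-agent", b"")
    return b"abc/abc" in user_agent.lower()

torrent = (b"GET /announce HTTP/1.1\r\n"
           b"Host: tracker.example\r\n"
           b"User-Agent: ABC/ABC-3.1.0\r\n\r\n")

forum_post = (b"POST /forum HTTP/1.1\r\n"
              b"Host: forum.example\r\n"
              b"User-Agent: Mozilla/5.0\r\n\r\n"
              b"my log shows User-Agent: ABC/ABC-3.1.0")

print(blind_match(torrent), field_match(torrent))        # True True
print(blind_match(forum_post), field_match(forum_post))  # True False
```

The second request is the false positive: the pattern appears in the body of a forum post, so the blind rule fires while the field-aware rule does not.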

8 comments:

Erik said...

I agree, I’ve had a similar experience when developing NetworkMiner. Well, NetworkMiner isn’t really an IDS, but I’ve still found that being able to parse the application layer protocol actually makes the traffic analysis less resource intensive (both in CPU and memory terms).

One issue that needs to be solved when performing deep packet inspection is of course to know which protocol parser to throw at a particular TCP session. Just using the port number isn’t reliable enough.

Hence, there is a great need for PIPI (port independent protocol identification, a term coined by Richard Bejtlich) methods. One such method is SPID, which performs protocol identification (or application layer protocol classification) based on statistical measurements of various properties in a session. A proof-of-concept implementation of the SPID algorithm is available at SourceForge: http://sourceforge.net/projects/spid

Just drag-and-drop a pcap to it and see which protocols are being used. It works like a charm!

There is also a research paper available on the subject here:
http://spid.sourceforge.net/sncnw09-hjelmvik_john-CR.pdf

I guess many traditional pattern matching IDS’s also are in need of protocol identification, since that would allow them to limit the scope of what patterns to look for in a session depending on the application layer protocol.

Robert Graham said...

I've never taken Richard Bejtlich seriously. He rips on BlackICE/Proventia because it's not open source, but fails to understand anything the product does. It has had port-independent protocol identification since 1998, eight years before his blog post.

mokum von Amsterdam said...

The power of appreciation.

How really kind and good of you to point out the improvement within Snort and even pointing out ways to improve it even more. I am sure this will help the Snort devs more than anything.

Anonymous said...

excellent post .... just one comment ..Proventia scaling up to 6Gbps has a lot more to do with its custom hardware architecture ..but having said that it's still twice as fast as snort...

Martin said...

This "hypothetical" extension is the current Snort HTTP preprocessor which allows this rule to be written like:

alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (\
msg:"ET P2P ABC Torrent User-Agent (ABC/ABC-3.1.0)"; \
flow:to_server,established; \
content:"User-Agent\: ABC/ABC"; nocase; \
http_header; sid:2003475;)

I've been urging the folks on the EMERGING (not "emergency") threats.net list to take advantage of the HTTP preprocessor for exactly the reasons you have described. When running the HTTP preproc, which has been enabled by default for a long time, all packets on port 80 and any other ports you specify (yes, PIPI would be nicer) get parsed as HTTP requests and responses. So, using plain old content matches means you are double matching the packets, first to parse HTTP, then to content search. When using the "http_header" content modifier, the content search is restricted to the few bytes parsed as the HTTP header and is therefore much faster and less prone to false positives.

So, when are you going to post the Proventia code for others to review, a luxury you have been graciously guaranteed by the authors of open source software? How about the signatures? I'd love to return the favor you've done the Snort community by providing some constructive criticism for you.

Robert Graham said...

BlackICE/Proventia code is the property of IBM. I do not even have a copy of it. I have not seen it in three years. One of the problems of proprietary code is that when you leave a company, you lose access to all the code you've worked hard on for years. It's a painful experience. It feels like part of your brain is missing. When trying to solve a problem, I know where in the Proventia code I've solved something similar before, but I can't simply go reference it.

Matthew Watchinski said...

Just an FYI, http_inspect hands the detection engine the http_header keyword. This limits the detection to just HTTP headers. While not as cool as http_header.useragent = blah, it anchors matching to just http headers.

Also I've got plenty of sensors running snort doing more than 6Gbps

Richard Bejtlich said...

Hey Robert, thanks for the love. I seem to have forgotten my "rips on BlackICE/Proventia because it's not open source." If you can cite anything, I'd be interested in seeing what I've said.