The same principle needs to be applied to parsing network protocols. Incoming network packets are exposed raw to hackers on the Internet. Hackers break into machines because inexperienced programmers make common mistakes, like buffer overflows.
This is the problem with Silent Circle, a company who makes a product for encrypting phone calls. Their crypto algorithms are designed by crypto experts, such as Phil Zimmerman (of PGP fame). Their network parsers, though, are designed by inexperienced coders.
Silent Circle released their code yesterday. I haven’t found an exploitable flaw yet, but the first file I looked at demonstrated common parser mistakes. I thought I’d document them in this post. Specifically, I’m referring to their SDP parser. SDP is part of SIP, which is used to establish phone calls (call, hangup, etc.), but doesn’t itself carry the data.
char vs. unsigned char
Their code refers to the packet as “char *buf”. This is wrong. The packet contains “bytes” not “characters”. It should be “unsigned char *” or “uint8_t *” instead.
SDP is a text-based protocol. That implies to programmers that it contains “characters”. But that’s not a requirement, just a convention. Hackers don’t have to follow that convention, and can insert any sort of bytes they want into the packet.
The problem is that “char” is not defined to be “signed” or “unsigned”. When “char” is signed, it sign extends. This causes 32-bit values to appear when the code is only expecting 8-bit values. A good example is the call to “islower(*buf)” on line 196 of their code. If “*buf” is 0x80, then it sign extends to 0xffffff80. This can cause weird things to happen.
Some static code analyzers check this. Some dynamic stuff also checks this. If you were to run this SDP parser under Microsoft Visual Studio and then fuzz this protocol, it’ll throw an assertion on the call to “islower(0xffffff80)”.
A lot of programmers will claim that I’m being pedantic, that this isn’t really a problem. Compilers can be set to force “char” to be unsigned anyway, so it’s “good enough”. That’s the same thinking pedantic cryptographers have been fighting for years. This is the thinking behind most crypto failures. Likewise, protocol experts have a lot of experience when “char*” isn’t good enough.
The “const” keyword is rarely used in their code. This is wrong. Most pointer parameters to should have it. Functions should avoid changing their input. The “const” keyword enforces this.
Non-const inputs lead to code instability. Over time, a programmer may decide to change input in a function, while caller depends upon the parameters not changing. Thus as a general principle of good code, more “const” is needed.
This is especially true for packets. Packet data read from the network should be immutable. Parsers should not change the data as they parse it. Again, this is one of those pedantic things: in the past, there are lots of known problems that arose because packet data was changed unexpectedly, so we’ve developed the principle that it shouldn’t change.
Thus, the correct type of packet bytes is "const unsigned char *", or equivalent things like "const uint8_t *". Everything else is incorrect.
Protocol parsing is CPU independent. There is never a reason to do something different depending upon the CPU.
Consider line 96 in their code that says the following:
#if !defined(_WIN32) && !defined(__SYMBIAN32__) //if not arm
The code is trying for an optimization here, to compare 4 bytes at once rather than 4 separate single byte compares. The problem is that same RISC processors crash when given unaligned data. Hence the conditional compile, either the fast way for some CPUs or the slow but safe way for others.
This is monumentally stupid. I don’t want to sound harsh, but this optimization is so tiny it makes no measurable difference ever. All it does is destabilize the code for no good reason. It makes testing the code a lot harder. It allows subtle bugs to creep through testing as things behave differently on different platforms.
Per-OS, per-compiler, per-CPU conditional compiles are NEVER needed for protocol parsing. If you use them, then you are doing something very wrong.
Casting and byte-swapping
The mistake of doing an #if conditional above comes from trying to fix an underlying mistake of casting between “char*” and “int*’. This is a common technique taught in your “UNIX Network Programming” class. It’s also wrong. You should never do it when parsing packets.
There are two reasons to avoid this. The first is that (as mentioned above) some CPUs, such as SPARC and some versions of ARM crash when referencing unaligned integers. This makes network code unstable on RISC systems, because most integers are usually aligned anyway, meaning a lot of alignment issues escape undetected into shipping code. The only way to make stable code is to stop casting integers in network (or file) parsers.
The second problem is that it causes confusion with byte-order/endianess that doesn’t happen if you just don’t cast integers. Consider the IP address “10.1.2.3”. There are only two forms for this number, either an integer with the value of 0x0a010203, or an array of bytes with the value 0a 01 02 03. The problem is that little endian machines are weird. The integer 0x0a010203 is represented internally as 03 02 01 0a on x86 processors, with the order of bytes “swapped”.
But this is just an internal detail that YOU NEVER NEED TO WORRY ABOUT. As long as you never cross the streams and cast from a “char*” to an “int*” (or the reverse), then the byte-order/endianness never matters.
If your first language was anything other than C/C++, then you are probably scratching your head wondering what the fuss is. Since your language doesn’t allow you to cast between “char*” and “int*”, you’ve never been confused with byte-order/endianess.
On line 119 their code calls ‘reverseIP’ to byte swap the IP address. That’s a clear sign that things aren’t getting coded right.
The only time to byte-swap is right at the moment of using the Sockets interface. That’s because Sockets itself was written wrong back in the early 1980s. Thus, the only time you byte-swap IP addresses is when you do something like “sin.sin_addr.s_addr = htonl(my_ip_address)”.
Follow the spec
Early Internet doctrine was to write code that followed the spec “close enough”. This was good in developing the protocols themselves, but was bad for long-term interoperability as popular products proliferated using wrong techniques, that all products then had to unofficially support.
Being liberal is allows hackers to inject a lot bad packets into the system that will nonetheless work. This allows them to reach vulnerable code that would otherwise have been rejected. Thus, critical software (like telephone encryption) needs to be conservative about what they accept, not liberal.
Consider the line from SDP like the following:
c=IN IP4 126.96.36.199/127
The SilentCircle parser checks for the ‘c’, then skips the next character whatever it is. Thus, the following line is allowed:
crIN IP4 188.8.131.52/127
Silent Circle needs to more formally follow the SDP spec at RFC 4566, rather than coding "whatever works".
Update: In the comments to this post, somebody proves I'm wrong on this point, because while '=' isn't check in the expected location in the code, it's checked elsewhere. Thus, my "crIN" example won't actually work.
Some good stuff
The SilentCircle code isn’t all bad. They do some things right.
One thing they get right is the “concreteness” of their parser. Protocol specifications are ugly, and code should be written to match the specification. That means protocol parsing code should be ugly. There’s nothing more annoying than debugging protocol parsers that have been prettied up, where I can't match up things in the spec and the packes with what I see in the code.
In school, your professors stressed that code should have “no magic numbers”. They were wrong: protocol parsers should be full of magic numbers. Consider the line 255 in the SDP code where is references the letter ‘c’. What does ‘c’ mean? Shouldn’t we instead use a #define somewhere, like:
#define SIP_SDP_CONNECTION_INFO 'c'
Wouldn’t that help make the code more self documenting? No! it wouldn’t! The spec says “c” so the code should too. Otherwise, I have to hunt all over the place to match things up.
My point is this: other’s might criticize the code for being ugly, but it’s not. It’s as pretty as it should be.
The other good thing the code does is correct coupling and cohesion. All this module does is parse the data, but it doesn’t really act on the data. That’s the correct feature of protocol parsers. Parsers are the external attack surface exposed to hackers. You want to keep that bit as independent from the rest of the system as possible. The SDP file has only two includes. That’s awesome.
The Silent Sircle protocol parsers aren’t all bad. Nor have I found a bug. It’s just that the first file I looked at, the SDP parser, demonstrates inexperience on the part of the coder. Good crypto needs experienced cryptographers, good network parsing code likewise needs experienced coders.