Wednesday, July 02, 2008

Code auditing: the abstract vs. concrete

[This issue comes up often enough I thought I'd write it down in a blog post]

Among the services Errata provides is "code auditing", where we read the source code (and/or binary code) of a product looking for security flaws.

Hackers think differently from coders. Hackers deal with the concrete, coders with the abstract. This difference is at the root of hacking.

Part of the indoctrination of coders is to beat "concrete thinking" out of them. This is why universities teach very abstract languages like LISP, or object-oriented languages like C++ and Java.

The consequence is that coders rarely understand how their code actually works. This is why Java is so often unbearably slow: the coders don't understand what the software is really doing. The biggest problem for Java is automatic memory management. Unlike in C/C++, Java coders don't have to worry about allocating and freeing memory, because the garbage collector takes care of that for them. The unfortunate consequence is that seemingly simple code will recklessly make copies of objects in memory, spending most of its time allocating and freeing memory in ways invisible to the coder. Coders who study the mechanics of the Java virtual machine can avoid this problem and write fast code with few memory copies. Few Java programmers study these mechanics, however.
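A classic illustration of those invisible copies (a sketch for this post, not taken from any particular audit) is building a string in a loop. Java `String` objects are immutable, so each `+=` allocates a new string and copies everything accumulated so far, and the total copying grows roughly with the square of the loop count. `StringBuilder` appends into one growable buffer instead. The class and method names below are just for illustration, and actual timings depend on the machine, the JVM and its garbage collector.

```java
// Illustrative sketch: two ways to build the same comma-separated string.
public final class ConcatDemo {

    // Each `+=` creates a brand-new String and copies every character
    // accumulated so far, so total copying grows roughly with n^2.
    public static String joinNaive(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i + ",";
        }
        return s;
    }

    // StringBuilder appends into one growable buffer; characters are copied
    // again only when the buffer is resized (and once by toString()),
    // so total copying grows roughly linearly with n.
    public static String joinBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 20_000;
        long t0 = System.nanoTime();
        String a = joinNaive(n);
        long t1 = System.nanoTime();
        String b = joinBuilder(n);
        long t2 = System.nanoTime();
        System.out.println("same result: " + a.equals(b));
        System.out.printf("naive: %d ms, builder: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

Both methods produce identical text; the difference is entirely in the allocation and copying the coder never sees, which typically shows up as a much larger time for the naive version.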

We see this tension between abstract and concrete in file formats and network protocols. When hackers attack your code (through buffer overflows, for example), they do so by feeding it corrupted files or malformed packets. Unfortunately, most coders do not understand what their format or protocol actually looks like. That concrete detail is lost in the code: the layout of the file on disk, or of the packet on the wire, is essentially an accidental byproduct of the various abstractions in the code. (This is also why there are so many vulnerabilities: file and packet processing is spread throughout the code rather than located in one place.)
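One way to keep the concrete format from dissolving into the abstractions is to write it down and parse it in exactly one place, checking every field against the bytes actually present. The sketch below assumes a hypothetical record format (a 2-byte big-endian length followed by that many bytes of UTF-8 text) and a hypothetical 1024-byte limit; neither comes from any real protocol.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public final class RecordParser {
    // Hypothetical wire format, for illustration only:
    //   [2-byte big-endian length][length bytes of UTF-8 text]
    static final int MAX_PAYLOAD = 1024; // hypothetical upper bound

    public static String parse(byte[] packet) {
        ByteBuffer buf = ByteBuffer.wrap(packet); // big-endian by default
        if (buf.remaining() < 2) {
            throw new IllegalArgumentException("truncated header");
        }
        int declared = buf.getShort() & 0xFFFF; // length field is unsigned
        if (declared > MAX_PAYLOAD) {
            throw new IllegalArgumentException("length too large: " + declared);
        }
        // Never trust the declared length: compare it to what actually arrived.
        if (declared > buf.remaining()) {
            throw new IllegalArgumentException("declared " + declared
                    + " bytes, only " + buf.remaining() + " present");
        }
        byte[] payload = new byte[declared];
        buf.get(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] good = {0x00, 0x02, 'h', 'i'};
        byte[] lying = {0x00, 0x10, 'h', 'i'}; // claims 16 bytes, carries 2
        System.out.println(parse(good)); // prints "hi"
        try {
            parse(lying);
        } catch (IllegalArgumentException e) {
            // prints "rejected: declared 16 bytes, only 2 present"
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In C, the same parser without the two length checks, copying `declared` bytes into a fixed-size buffer, is the textbook buffer overflow. Java's bounds checking would turn the lying packet into an exception rather than memory corruption, but the explicit checks also serve as documentation of what the format really is.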

So the first thing we do when analyzing code is fire up a packet sniffer (like Wireshark) and look at the packets the product actually sends on the wire. We'll open up a binary editor (like Hexview) to see how its files are formatted. We'll use reverse-engineering tools (like IDA Pro) to see what's in the compiled code. We'll run debuggers (like gdb or Visual Studio) to step through the code as it executes.
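A real hex editor does far more, but the core of "looking at the bytes" is small. This sketch prints a file in the familiar offset / hex / ASCII layout; the class name and exact layout are illustrative choices, not those of any particular tool.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public final class HexDump {
    // Formats bytes as rows of: 8-digit hex offset, 16 hex bytes, ASCII column.
    public static String dump(byte[] data) {
        StringBuilder out = new StringBuilder();
        for (int off = 0; off < data.length; off += 16) {
            out.append(String.format("%08x  ", off));
            StringBuilder ascii = new StringBuilder();
            for (int i = 0; i < 16; i++) {
                if (off + i < data.length) {
                    int b = data[off + i] & 0xFF; // treat byte as unsigned
                    out.append(String.format("%02x ", b));
                    // Show printable ASCII as-is, everything else as '.'
                    ascii.append(b >= 0x20 && b < 0x7f ? (char) b : '.');
                } else {
                    out.append("   "); // pad a short final row
                }
            }
            out.append(' ').append(ascii).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HexDump <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.print(dump(data));
    }
}
```

For example, running `java HexDump capture.bin` (any file name will do) prints one 16-byte row per line, which is often enough to spot a length field, a magic number, or a string the coder never realized was being written.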

In other words, the first step in auditing code isn't to figure out the abstract intent of what it's supposed to be doing, but to figure out the concrete reality of what it's actually doing. Even when we have the source, we are still going to reverse-engineer your binary.

Not surprisingly, on the first day of a project we often hear customers say, "...oh, I didn't know it was doing that."

1 comment:

Toby said...

Thinking in terms of the abstract is fine only if the implementation of the language in which you're writing is "fully abstract".

Roughly, full abstraction implies that assumptions made at the abstract level hold at the level of the implementation.

Failures of full abstraction naturally lead to vulnerabilities unless one thinks in terms of the implementation when one is writing code.

A great article on this:
http://research.microsoft.com/~akenn/sec/appsem-tcs.pdf