Saturday, August 10, 2013

Witty hubris

In response to my previous two blogposts on "12 steps to safe code" and "trustworthy code", @bellytales makes a fair point:

[embedded tweet from @bellytales]

We had two DoSes in the kernel which I'm pretty sure could not have resulted in reliable remote-code execution. The Witty Worm bug, on the other hand, was a horribly simple bug with devastating consequences.

The Witty bug came from code that looks like the following:

sprintf(buf, "%s", pkt);   /* no length check: 'pkt' is data straight off the wire, 'buf' a fixed-size buffer */

This is an OMFGBBQWTF?? bug. It could easily have been found by anybody running "strings" on the binary and looking for "%s". It's as obvious a vulnerability as a vulnerability can be. How could it possibly get into the code?

That's actually a pretty useful story. The answer is that it wasn't in the code, precisely. It was in a bit of code that was supposed to exist only inside an #ifdef DEBUG build. We were reverse engineering the ICQ protocol, which wasn't documented back then. We had all sorts of extra code to print stuff out on the command line to help us in that reverse-engineering effort.
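
The debug-only tracing would have looked something like the following. This is a hypothetical reconstruction purely for illustration, not the actual BlackICE source; the function and buffer names are made up:

#include <stdio.h>

static void trace_icq_payload(const char *pkt)
{
#ifdef DEBUG
    char buf[1024];
    sprintf(buf, "%s", pkt);             /* no bounds check: tolerable in a throwaway
                                            debug build, fatal once it ships in release */
    printf("ICQ payload: %s\n", buf);
#else
    (void)pkt;                           /* unused in release builds */
#endif
}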

Somehow, in the course of time, the #ifdef disappeared, and that section of code ended up in the release build. That's not an excuse, just an explanation. A large number of vulnerabilities have a similar sort of explanation: it's not just one thing that led to the vuln, but a sequence of bad decisions.

Dealing with network packets is inherently dangerous in the C programming language. With BlackICE, we had a technique for dealing with that inherent unsafety: "state machines", a weird programming model that was supposed to fix, once and for all, the problem of buffer overflows in C network code. Except it didn't stop them, as the above discussion proves. This should be a good lesson for C programmers: you need to increase your paranoia level.
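
For those who haven't seen the style, here's a minimal sketch of the idea. It's my own illustration of the general technique, not BlackICE code: the parser consumes the packet one byte at a time and keeps only a small, fixed amount of state, so there is no variable-length copy to overflow.

#include <stddef.h>
#include <stdint.h>

enum state { S_START, S_LENGTH, S_BODY, S_DONE };

struct parser {
    enum state state;        /* initialize to S_START */
    unsigned   remaining;    /* bytes of body left to consume */
};

static void parse_bytes(struct parser *p, const uint8_t *pkt, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t c = pkt[i];
        switch (p->state) {
        case S_START:
            p->state = S_LENGTH;             /* e.g. a one-byte message type */
            break;
        case S_LENGTH:
            p->remaining = c;                /* length byte from the wire */
            p->state = p->remaining ? S_BODY : S_DONE;
            break;
        case S_BODY:
            /* inspect c here; nothing gets copied into a fixed buffer */
            if (--p->remaining == 0)
                p->state = S_DONE;
            break;
        case S_DONE:
            break;                           /* ignore trailing bytes */
        }
    }
}

Because the only state is an enum and a counter, there's simply no buffer to smash. The danger reappears only when somebody bolts a quick sprintf() onto the side, which is exactly what happened.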

We tend to work from the model of "provable security" -- we prove to ourselves that something is secure. This is a critical-thinking failure: we need to keep trying to prove that something is insecure. I'm a reverse engineer. I've reverse engineered other people's products to find vulns. Had I simply taken that same hostile attitude toward my own product, I would've found the %s and hence the vuln. That's why I put in my 12 Steps the idea that you reward people for finding problems. You need to reward internal people for finding such things, and you need bounty programs so that outside people find them too.

We had a "secure development" program from the very start with BlackICE. We had automated regression testing. We were doing "fuzz testing" before that technique even had a name. As mentioned above, we used "inherently safe" programming techniques. Yet, after Witty, we overhauled things to make things even more secure.

The first thing we did was add "code coverage" to the regression tests. We required that the automated regression tests exercise something like 75% of all lines of code, that new modules meet a requirement of 100% coverage, and that over time we'd increase the coverage requirements. This would have caught the Witty vuln: those unnecessary sprintf()s would've stood out as un-exercised code.

We created a "banned function" list, to ban things like strcpy() and sprintf(). Again, this would've prevented the Witty bug.

We created our own "stack" and "heap" cookies, since at the time these weren't built into the compiler. We also included special switches in the product that deliberately allowed a stack/heap overflow with a well-known packet, so that outsiders could test this defense technique. This would've mitigated the Witty vuln, as the stack cookie would've prevented easy exploitation.
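
Conceptually, a hand-rolled cookie looks something like this. It's a sketch of the idea only, not our implementation (the real thing instrumented stack frames and heap allocations); parse_into() here is a hypothetical parser:

#include <stdint.h>
#include <stdlib.h>

#define COOKIE 0xC0DEFACEu

struct guarded_buf {
    char     data[1500];
    uint32_t cookie;              /* sits directly after the buffer */
};

extern void parse_into(char *buf, size_t buflen);   /* hypothetical parser */

static void parse_guarded(void)
{
    struct guarded_buf g;
    g.cookie = COOKIE;

    parse_into(g.data, sizeof(g.data));

    if (g.cookie != COOKIE)       /* the buffer was overflowed */
        abort();                  /* crash cleanly instead of getting exploited */
}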

We hired two companies to do static analysis of the code, one working from the source, the other with only the binary. Both confirmed that there were no other bugs in the packet/protocol parsing code. However, they did find one additional bug in the driver. This was quite annoying, as the driver code was only about 5% the size of the protocol-parsing code, yet, except for the Witty bug, it accounted for all the vulns. This was doubly annoying because I had told them "don't bother looking in the driver, because the attack surface is tiny compared to the protocol parsing code". I was dead wrong. To this day, former employees won't let me live this down.

When I created BlackICE, we had daily builds, and while the regression tests were mostly automated, we didn't run them constantly. When ISS bought my product, they changed to hourly builds and included the regression tests in the build. Thus, "breaking the build" didn't just mean your code failed to compile; it also meant the automated regression tests failed. The upshot of this was that we fixed the Witty vuln in under 24 hours and released the fix to our customers. That was a decade ago; today Google does 24-hour fixes in Chrome, a massively complex product. Having gone through the experience, I think this is a reasonable standard to hold companies to: fixing vulns within 24 hours. If a company promotes its security but cannot fix its vulns in a timely manner, then you have to question that company's commitment. This is probably Microsoft's greatest weakness -- it still takes them too long to fix a bug.

We didn't have bug bounties a decade ago. Today, with the success of so many bounty programs, I feel we have to start adding them as a "required" feature of trustworthy companies. Apple has no such bounty program, which is a reason to doubt Apple's trustworthiness.


Going back to the original tweet above with the accusation of "glass houses": the issue isn't the vuln, but the handling of the vuln. I feel I handled my vulns well; my criticism of Silent Circle is that they handle their vulns poorly.
