Since this is the ten-year anniversary of the Witty worm, I thought I’d write up something on the bug from the insider’s perspective.
Back in 1998, I created “BlackICE”, the first “intrusion prevention system”. One variant was a desktop product, “BlackICE Defender”, that acted as a personal firewall. If you are above a certain age in the cybersec industry, you probably played around with it at the time. Another variant of the code was “BlackICE Guard”, acting as a network IPS. The third variant was “BlackICE Sentry”, acting merely as an IDS – but the first IDS that could run at a full gigabit per second.
In terms of intrusion detection, my product was 10 times faster, 10 times better at catching intrusions, with 10 times fewer “false-positives” than the market leading IDS called “RealSecure” from a company called “Internet Security Systems”. This is not (merely) me boasting about my great product – it was the conclusion of their own competitive analysis team. Consequently, ISS purchased my company, and replaced their IDS technology with my own.
At the time of Witty, there were two sets of products the worm would infect: the legacy BlackICE product, and the new RealSecure code that included BlackICE code. It infected the entire range, from desktop to servers to IPS to gigabit IDS. However, since normal use of the product caused frequent engine updates, most customers had already patched the bug – even though the worm came out soon after disclosure.
It was a “sprintf(%s)” bug.
If you know vulns, you’ll know that this is the OMGWTFBBQ most obvious sort of bug possible, one that somebody like me should never, ever make, for any reason. It’s inconceivable.
My excuse is this: it was debug code that was never intended to ship with the product. We wrote a parser for ICQ, but parts of the protocol were undocumented. As programmers are wont to do, we added bunches of sprintf() calls to help us reverse engineer the parts of ICQ we didn’t understand.
Sure, they were vulnerable as heck, but it didn’t matter, because they were only compiled into the debug versions, and were only temporary. Once we figured out ICQ, we’d be taking them out. They would never ship to customers, so they were okay. Except we shipped them in the “release” version to customers.
There is a lesson here about vulns. How they are viewed by the attacker is wholly unlike how things are perceived by the defender. The attacker just loads IDApro, searches for “%s” strings, then takes one step back to see if they are used in “sprintf()”. They don’t know the whys and wherefores of the source code. They don’t see the faulty “#ifdef DEBUG”. They just know what they see in the released binary.
By the way, part of my embarrassment is that I’m also a reverse engineer. Had I loaded my own code in IDApro, I would’ve seen this bug immediately. It is the #1 most visible, easily found bug by reverse engineers. But of course, why would I reverse engineer my product if I already have the source?
In any case, I’d like to argue it’s the fault of Dennis Ritchie or Brian Kernighan. The buffer overflow happens in sprintf(), their code, so technically, IT’S NOT MY FAULT!!! It’s a good, sound argument, but I’m not sure how many of my fellow cybersecurity experts I’ll be able to convince of this.
ISS hired independent auditors to come in and look at the code, one to do a source code audit, and one to do a binary audit without access to the source code. Both audits came up clean, except for one bug in the driver (which was separate). There were no systematic bugs in the code – it really was a one-off thing.
By the way, this is why I have no patience for things like RSA’s position on the Dual_EC_DRBG backdoor. It’s an enormous failure. It demands an independent accounting. It’s what I did when I failed – it’s what everyone should be made to do.
Like many vulnerabilities, such as the recent Apple #gotofail bug, this bug would’ve been caught by “code coverage”. The branch of code containing the sprintf(%s) was never executed, and would’ve stood out had we been measuring unit test code coverage. It would’ve forced an engineer to look more closely at those lines of code – which would have led to their immediate removal.
I forget the exact details, but after Witty, we created a policy of at least 80% code coverage for any existing source file, and 100% coverage for new files. That “merely 80%” may seem lax, but it was difficult to get old files up to that level. Programming for 100% coverage requires very different techniques.
I’m generally a “security metrics” hater, because there are so few good security metrics. The ones people come up with to satisfy bureaucratic requirements are nonsense metrics. However, “code coverage” is an excellent security metric. It’s so strong it’s something you should demand of your suppliers. Your RFPs should always ask the question “Your unit tests cover what percentage of source code?”
In many respects, Witty was a typical worm.
One interesting bit was that it was a “UDP worm”, like Slammer. This meant that it was able to overload local links, spewing out packets as fast as it could. TCP worms don’t do that, because they eventually have to stop and wait for responses.
The most unique aspect of Witty was that it was destructive: it intentionally corrupted files on the disk. Blaster (which tried to overload or “blast” Microsoft’s website) and Witty were really the only two “malicious” worms of that era – worms trying to damage rather than merely spread.
Some researchers wrote a paper that cleverly reverse engineered some details about the worm. They were able to find “patient-zero”, the machine that launched the worm.
However, they made a big mistake. They erroneously claimed that the worm started with a “hit-list” of about 50 computers located on a US Army base. This caused much speculation about whether the worm was an inside job, or was an attack against the Army.
It’s such a big deal that I’m not going to discuss it in this post. Instead, I put that discussion in a second post here. I’ll show conclusive proof that’ll convince even the original authors they were wrong on this point.
As a result of the worm, I spent the next several months traveling around the world to our largest customers, prostrating myself in front of them. I think this may be the only time that the programmer who was at least partly responsible for the bug also was made to appear before big governments and big corporations to apologize for it.
It was right that I should do so, but it was also difficult. For example, the Army was convinced (by the aforementioned researchers) that it was an attack on them. I walked them through the data to show otherwise (as I’ll show in the next post). I wasn’t very successful – to this day there are many in the military who believe Witty was a cyberwar offensive attack.
There was also the business difficulty, where customers wanted to be reimbursed for the damage they suffered from the worm, or be given free products. This is impossible. It’s the nature of software, and the reason why software liability is ludicrous. The problem of software security is so huge that no vendor can shoulder that risk. It’s like my masscan program, which a bunch of people are using. It has bugs, I do reasonable things to reduce the risk (like offer bounties and good unit tests), but ultimately, I’m not going to indemnify you: it’s up to you to accept the risk, or not.
Who wrote Witty?
Because of that hit-list argument, some felt it must’ve been an inside job, possibly insiders at ISS. That’s nonsense. For one thing, the company that notified us, “eEye”, gave us virtually no details, and the patch was available in a day, giving any insider no time.
My guess is that the creator of the worm had information – but from eEye. That company had many blackhats working for it, and had extensive interaction with fellow blackhats, and would regularly leak inside information out to their tight-knit community. For example, when Kevin Mitnick was released from jail, he almost immediately got inside information on CodeRed a month before it went public.
The exploit used by the worm was related to a scanner check eEye published. Moreover, the worm’s corruption of files meant it attacked an eEye competitor.
I have no proof, of course, but I feel that the most probable identity of the attacker was a member of the eEye community.
Though, it did become a joke at ISS for a while, with people claiming “I wrote Witty”. The humor came from watching those in the marketing department panic: even as a joke, it’s not something any ISS employee should be caught saying. So we kept saying it … in their presence.
What makes my gut churn
You’d think the Witty worm was the worst experience of my career, but it wasn’t. Mistakes are part of life. I do what I can to mitigate them, and do what’s right to atone for them.
Instead, what really bothered me around the same timeframe was Gartner. What that company sells is astrology, and the industry as a whole is so stupid that they buy it. For example, Gartner claimed that “IDS was dead” because, for example, no existing IDS product of the time could handle more than 500-mbps of traffic. This was false then, as proven by the fact that IDS is still strong today. It was also false, because my product could handle 1-gbps just fine. I was at a meeting at the Pentagon where engineers were happily running my product at 800-mbps, and the Gartner analyst didn’t believe them, because he knew (based on zero data) that it wasn’t possible. In any event, since I created the first IPS, I asked the Gartner analyst why he didn’t mention that fact. His response was that I had not hired him to consult on our marketing strategy.
It’s Gartner, not the failings in my code, that caused me to give up on IDS/IPS. Even today, watching the nonsense Gartner spouts about IDS/IPS, I still can’t believe you people give them any credence. But now, this causes laughter in my gut, rather than ulcers.
There’s probably a lot more to this story. If you’ve got questions, or things to add, add them to the comments, or send me a tweet @ErrataRob.