Saturday, September 27, 2014

The shockingly obsolete code of bash

One of the problems with bash is that it's simply obsolete code. We have modern objective standards about code quality, and bash doesn't meet those standards. In this post, I'm going to review the code, starting with the function that is at the heart of the #shellshock bug, initialize_shell_variables().

K&R function headers

The code uses K&R function headers, which have been obsolete since the mid-1980s.
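For readers who have never seen the old style, here's a minimal illustration of the difference (toy functions, not the actual bash source):

```c
/* Old K&R style: parameter types are declared between the parameter list
   and the function body. Call sites get no argument type-checking from
   the compiler, which defeats static analysis. */
int add_kr(a, b)
    int a;
    int b;
{
    return a + b;
}

/* Modern prototype style (standard since C89): every call site is
   checked against the declared parameter types. */
int add_modern(int a, int b)
{
    return a + b;
}
```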

I don't think they're there to support older compilers, because other parts of the code use modern headers. I think they persist simply because the maintainers are paranoid about making unnecessary changes to the code. The effect is that they mess up static analysis, both simple compiler warnings and advanced security analysis tools.

It's also a stylistic issue. There's only one rule to coding style, which is "avoid surprising things", and this is surprising.

Ultimately, this isn't much of an issue, but a symptom that there is something seriously wrong with this code.

Global variables everywhere

Global variables are bad. Your program should have a maximum of five, for such things as the global debug or logging flag. Bash has hundred(s) of global variables.

Also note that a large number of these globals are defined in the local file, rather than including a common definition from an include file. This is really bad.

Another way of looking at the problem is looking at the functions that operate on global variables. In such cases, the functions have no parameters, as in the following:
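(The original post showed an example from bash here; as a stand-in, a minimal sketch of the pattern, with hypothetical names:)

```c
/* Illustrative sketch, not actual bash source: a void/void function can
   only communicate through global state or side effects. */
static int variables_initialized;   /* one global standing in for many */

void initialize_shell_variables_sketch(void)
{
    variables_initialized = 1;      /* mutates global state invisibly */
}
```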

Functions with no parameters (void) and no return should be an extreme rarity in C. It means the function is operating on global variables, or is producing some side effect. Since you should avoid globals and side effects, such functions should virtually never exist. In Bash, such functions are pervasive.

Lol, wat?

The first step in this function is to initialize the "environmental variables" (the ones that get executed causing the #shellshock vuln), so the first code is a for loop "for all variables". This loop contains a really weird syntax:
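(The original post showed the bash code as an image; paraphrased from bash's variables.c, with a wrapper function of mine for illustration, it looks like this:)

```c
/* The index is incremented inside the second (condition) clause of the
   for loop, and the third clause is left empty -- as in bash's
   initialize_shell_variables(). */
int count_env_bash(char **env)
{
    char *string;
    int string_index;
    int count = 0;

    for (string_index = 0; string = env[string_index++]; )
        count++;

    return count;
}
```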

This is painful for so many reasons, the most noteworthy of which is that instead of incrementing the index in the third clause of the for loop, that clause is empty and instead the programmer does it in the second clause. In other words, it should look like this:
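That is, something like the following sketch (wrapper function mine), with the increment moved where readers expect it:

```c
/* Same loop, but the increment now lives in the third clause of the
   for loop, where everyone expects to find it. */
int count_env_fixed(char **env)
{
    char *string;
    int string_index;
    int count = 0;

    for (string_index = 0; string = env[string_index]; string_index++)
        count++;

    return count;
}
```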

(Astute readers will note that this change isn't precisely identical, but since 'string_index' is used nowhere else, either inside or after the loop, that slight change in meaning is moot).

There is really no excuse for this sort of programming. In terms of compiler optimizations, it makes no difference in performance. All it does is confuse programmers later who are trying to read your spaghetti code. To be fair, we all have brain farts where we do something weird like this -- but it seems like oddities like this are rather common in bash.

I suspect the reason the programmer did this was because the line was getting rather long, and short lines are easier to read. But the fault here is poor choice of variable names. There is no need to call the index variable 'string_index' since it's used nowhere else except on line 329. In such cases, the variable 'i' is far superior. It communicates to the reader that you are doing the expected thing of simply enumerating 'env[]', and that the index variable is unimportant except as to index things. This is really more of a stylistic issue and isn't terribly important, but I use it to hammer home the point that the less surprising the code, the better. The code should've looked like this:
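(A sketch with the shorter name, wrapper function mine:)

```c
/* With the index renamed to 'i', the loop reads as the familiar
   enumerate-an-array idiom. */
int count_env_short(char **env)
{
    char *string;
    int i;
    int count = 0;

    for (i = 0; string = env[i]; i++)
        count++;

    return count;
}
```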

Finally, the middle clause needs some work. The expected operation here is a comparison not an assignment. This will cause static analyzers to throw up nasty warning messages. I suppose you could call this a "false positive", since the code means to do an assignment here, but here's the rule of modern C programming: you write to make static analyzers happy. Therefore, the code needs to be changed to:

The lesson here is that enumerating over a NULL-terminated list of strings is a damn common pattern in C. The way you do it should look the same as the way everybody does it, and the snippet above is the pattern that most everyone uses. When you don't do the expected, you confuse everyone, from code reviewers to static analyzers.

No banned functions

Today, we know not to use dangerous functions like strcpy(), strncpy(), sprintf(), and so forth. While these functions can be used safely by careful programmers, it's simply better to ban their use altogether. If you use strcpy(), the code reviewer has to check each and every instance to make sure you've used it safely. If you've used memcpy(), they don't.

The bash code uses these dangerous functions everywhere, as in the following lines:

I've included enough lines to demonstrate that their use of strcpy() is safe (char_index is the length of the name string). But that this particular instance is safe isn't the issue -- the issue is that it's really difficult for code reviewers to verify that it's safe. Simply using the safer 'snprintf()' would've been much easier to verify:

One thing to remember when doing code is that writing it to make it clear to security-reviewers tends to have the effect of making code clearer for everyone else as well. I think the above use of snprintf() is much clearer than strcpy() -- as well as being dramatically safer.


In response to #shellshock, Richard Stallman said the bug was just a "blip". It's not, it's a "blimp" -- a huge nasty spot on the radar warning of big things to come. Three more related bugs have been found, and there are likely more to be found later. The cause isn't that a programmer made a mistake, but that there is a systematic failure in the code -- it's obsolete, having been written to the standards of 1984 rather than 2014.

Where to go from here

So now that we know what's wrong, how do we fix it? The answer is to clean up the technical debt, to go through the code and make systematic changes to bring it up to 2014 standards.

This will fix a lot of bugs, but it will break existing shell-scripts that depend upon those bugs. That's not a problem -- that's what upping the major version number is for. The philosophy of the Linux kernel is a good one to emulate: documented functionality describing how the kernel should behave will be maintained until Linus's death. Undocumented, undefined behavior that stupid programs depend upon won't be maintained. Indeed, that's one of the conflicts between Gnu and Linux: Gnu projects sometimes change documented behavior while at the same time maintaining bug-compatibility.

Bash isn't crypto, but hopefully somebody will take it on as a project, like the LibreSSL cleanup effort.

Friday, September 26, 2014

Do shellshock scans violate CFAA?

In order to measure the danger of the bash shellshock vulnerability, I scanned the Internet for it. Many are debating whether this violates the CFAA, the anti-hacking law.

The answer is that everything technically violates that law. The CFAA is vaguely written allowing discriminatory prosecution by the powerful, such as when AT&T prosecuted 'weev' for downloading iPad account information that they had made public on their website. Such laws need to be challenged, but sadly, those doing the challenging tend to be the evil sort, like child molesters, terrorists, and Internet trolls like weev. A better way to challenge the law is with a more sympathetic character. Being a good guy defending websites still doesn't justify unauthorized access (if indeed it's unauthorized), but it'll give credence to the argument that the law is unconstitutionally vague because I'm obviously not trying to "get away with something".

Law is like code. The code says (paraphrased):
intentionally accesses the computer without authorization thereby obtaining information
There are two vague items here, "intentionally" and "authorization". (The "access" and "information" are also vague, but we'll leave that for later).

The problem with the law is that it was written in the 1980s before the web happened. Back then, authorization meant explicit authorization. Somebody first had to tell you "yes, you can access the computer" before you were authorized. The web, however, consists of computers that are open to the public. On the web, people intentionally access computers with the full knowledge that nobody explicitly told them it was authorized. Instead, there is some vague notion of implicit authorization, that once something is opened to the public, then the public may access it.

Unfortunately, whereas explicit authorization is unambiguous, the limits of implicit authorization are undefined. We see that in the Weev case. Weev knew that AT&T did not want him to access that information, but he believed that he was nonetheless authorized because AT&T made it public. That's the tension in the law, between unwanted access vs. unauthorized access.

It would be easy to just say that anything the perpetrator knows is unwanted is therefore unauthorized, but that wouldn't work. Take the NYTimes, for example. They transmit a "Cookie" to your web-browser in order to limit access to their site, in order to encourage you to pay for a subscription. The NYTimes knows that you don't want the cookie, that placing the cookie on your computer is unwanted access. This unwanted access is clearly not hacking.

Note that the NYTimes used to work a different way. It blocked access until you first created an account and explicitly agreed to the cookie. Now they place the cookie on your computer without your consent.

Another example is Google. They access every public website, downloading a complete copy of the site in order to profit by other people's content. They know that many people don't want this.

Finally there is the example of advertisements, especially Flash/JavaScript ads with flashy content that really annoy us. This unwanted code is designed to annoy us -- as long as it gets our attention. (h/t @munin).

These, and a thousand other examples, demonstrate that "unwanted but authorized" access on the public Internet is the norm.

Figuring out when public, but unwanted, access crosses the line to "unauthorized" is the key problem in the CFAA. Because it's not defined, it invites arbitrary prosecution. Weev embarrassed the powerful, not only AT&T and Apple, but the politicians whose names appeared in the results. Prosecutors therefore came up with a new interpretation of the CFAA by which to prosecute him.

A common phrase you'll hear in the law is that "ignorance of the law is no excuse". For example, a lot of hackers get tripped up by "obstruction of justice". It's a law that few know, but ignorance of it doesn't make you innocent. Barret Brown's mother is serving a 6-month sentence for obstruction of justice because she didn't know that hiding her child's laptop during execution of a search warrant would be "obstruction of justice".

But this "ignorance of the law" thing doesn't apply to the Weev case, because everyone is ignorant of the law. Even his lawyers, planning ahead of time, wouldn't be able to figure it out. In my mass scanning of the Internet people keep telling me I need to consult with a lawyer to figure out if it's "authorized". I do talk to lawyers about it, including experts in this field. Their answer is "nobody knows". In other words, the answer is that prosecutors might be able to successfully prosecute me, but not because the law clearly says that what I'm doing is illegal, but because the law is so vague that it can be used to successfully prosecute anybody for almost anything -- like Weev.

That's the central point of any appeal in my case of getting arrested for scanning: that the CFAA is "void for vagueness".  The law is clearly too vague for the average citizen to understand. Of course, every law suffers from a little bit of vagueness, but in the case of the CFAA, the unknown parts are extremely broad, covering virtually all public access of computers. When computers are public, as on the web, and you do something slightly unusual, there is no way for reasonable people to tell if the conduct is "authorized" under the law. The very fact that my lawyers can't tell me if mass scanning of the Internet is "authorized" is a clear indication that the law is too vague.

The reason vagueness causes the law to become void is that it violates due process. It endangers a person with arbitrary and discriminatory prosecution. Weev was prosecuted not because a reasonable person should have known that such access was impermissible under the CFAA, but because his actions embarrassed AT&T, Apple, and some prominent politicians like Rahm Emanuel.

Lawyers think that the word "intentional" in the CFAA isn't vague. It's the mens rea component, and is clearly defined. There are four levels of mens rea: accidental/negligent, reckless, knowing, and intentional. It differentiates manslaughter (negligent actions that lead to death) vs. murder (intentionally killing someone). The CFAA has the narrowest mens rea component, intentional. That partially resolves the problem of accessing public websites: you may not be authorized, but as long as you don't know it, then your access is not illegal. Thus, you can click on the following link xyzpdq, and even though you suspect that I'm trying to trick you into accessing something you shouldn't, it's still okay, because you didn't know for certain if it was unauthorized. (Yes, that URL is designed to look like hacking, but no, I'm fairly certain it won't work, because the NSA has never had a 'cgi-bin' subdirectory according to Google). You can "recklessly" access without authorization, but as long as it's not "intentional", you don't violate the CFAA.

Lawyers think this is clear, but it isn't. We know Weev's state of mind. We know he believed his actions were authorized. For one thing, all his peers in the cybersecurity community think it's authorized. For another thing, he wouldn't have published the evidence of his 'crime' on Gawker if he thought it were a crime.

Yet, somehow, this isn't a mens rea defense. You can read why in the Wikipedia article on mens rea. This is merely the subjective test, but the courts also have an objective test. It's not necessarily Weev's actual intentions that matter, but the intentions of a "reasonable person". Would a reasonable person have believed that accessing AT&T's servers that way was unauthorized?

This test is bonkers for computers, because a "reasonable person" means an "ignorant person". Reasonable people who know how the web works, who have read RFC 2616, believe Weev's actions are clearly authorized. Other reasonable people who know nothing except how to access Facebook with an iPad often believe otherwise -- and it's the iPad users the court relies upon for "reasonable person".

If you are on a desktop/laptop, you are reading this blogpost in a browser. At the top of your browser is the URL field. You can click on this and edit this field. When presented with a URL like "", you know you can edit the URL, changing the '5' to a '6', and thereby access the next article in the sequence. Reasonable people who know how the web works routinely do this every day -- we know the URL field is there for exactly this reason. Ignorant-but-reasonable people who don't know how computers work have never edited the URL. To the ignorant, the URL is some incomprehensible detail that nobody would ever edit, and that if they ever did, it was because they were "hacking".

In legal terms, this means that the mens rea for the CFAA is actually "strict liability". Your actual intentions are irrelevant, because it's the intentions of the ignorant that matter. And the ignorant think anything other than clicking on links is unauthorized. Hence, editing the URL field is "intentional unauthorized access".

I have this fantasy that one day Tim Berners-Lee (the designer of the web) gets prosecuted for incrementing the URL to access the next article. In the debate about "how the web works" and "what does authorization mean", Tim will be referring to RFC 2616, which he wrote. However, he'll be found guilty because the ignorant people in the jury box, consisting of his 'reasonable' peers, think it works a different way. Tim will say "I designed the web so that people could increment the URL" whereas the jury would claim "no reasonable person would ever increment the URL".

What we have is something akin to the Salem Witch Trials, where a reasonable jury of their peers convicted people for practicing witchcraft. To the average person on the street, computers work by magic, and those who do strange things are practicing witchcraft. Weev was convicted of witchcraft, and nothing more.

That brings me back to my scan of the Internet for the Shellshock bug. The facts are not in doubt. I document exactly what I sent to the web servers. That I didn't intend to "hack" the servers and believed my access was "authorized" is likewise clear.

Some of my peers are uncomfortable, though, because the nature of the access is unusual. But they haven't thought things through. This isn't a buffer-overflow remote-code execution, where data becomes code contrary to the expectations of the programmer. Instead, it's code execution according to the intentions of the programmer. Shellshock is a feature whose defined intent was to execute code. Shellshock is fixed by removing a feature from bash that has been used for 20 years. That servers are misconfigured to run shellshock code doesn't make it unauthorized.

Furthermore, there is the "thereby obtains information" clause. If my command were "cat /etc/passwd", I can understand there'd be an issue. In the Weev case, it's clear that the programmers intended for the iPad account information to be public, but it's clear in this case that nobody intends "/etc/passwd" to be public. But I don't use Shellshock to get the password file, I use 'ping' because clearly pinging is authorized -- because pings are a normal authorized interaction between two computers on the Internet.

If you want to claim that all "code execution" is invalid, then a lot of what we do becomes invalid. For example, our community routinely adds a tick mark ' onto URLs to test for SQL injection. That's technically code execution. By pasting strings, website programmers have implicitly authorized us to run some SQL code, like tick marks. It doesn't mean they've authorized us to execute all code, like getting the password file, or doing the famous "; DROP TABLE Students". But it does mean that they've authorized the principle of running code -- which is why we put tickmarks in URLs with reckless abandon. Heck, when websites are broken, we'll write entire SQL queries to get the information in our account that we believe we are authorized to access.

At least, that's the narrow reading we've all been using of the CFAA: when they make a website public, and they've configured certain features (albeit without full understanding of their actions), then we feel authorized to use them. It's their responsibility to make things explicitly unauthorized, not our responsibility to figure out what's been implicitly authorized. If they put a password on it, we recognize that as "authorization", and we don't try to bypass the password even if we can (even with URL editing, even with SQL code). Conversely, when it's public, we treat things as public. We have simple criteria, "authorized means explicit" and "public means public".

I know that I'm at risk for prosecution of the CFAA, but somebody has to do this. Unless security researchers are free of the chilling-effects of the law, Chinese cyberwarriors and cyberterrorists will devastate our country. More importantly, the CFAA is unconstitutionally vague violating due process, and somebody has to defend the constitution. I can handle getting prosecuted, so I'm willing to stick my neck out.

Update: The point I'm trying to make about 'mens rea' is that it doesn't resolve the ambiguity over "authorization". Some people have claimed that the law isn't void for vagueness, because 'intent' clarifies things. It doesn't. All access is intentional, it's authorization that's the question. If I think I'm authorized, but the law disagrees, then "ignorance-of-law-is-no-excuse" trumps "I thought I was authorized", thus we are right back at strict liability. Only in the case of recklessly clicking on web links is there a difference. Anything more complex that technical people do collapses to ill-intentioned witchcraft.

Thursday, September 25, 2014

Many eyes theory conclusively disproven

A bug being found in open-source does not, by itself, disprove the "many eyes" theory. What disproves it is that bugs are being found now that should've been found sometime in the last 25 years.

Many eyes are obviously looking at bash now, and they are finding fairly obvious problems. It's obvious that the parsing code in bash is deeply flawed, though any particular bug isn't so obvious. If many eyes had been looking at bash over the past 25 years, these bugs would've been found a long time ago.

Thus, we know that "many eyes" haven't been looking at bash.

The theory is the claim promoted by open-source advocates that "many eyes makes bugs shallow", the theory that open-source will have fewer bugs (and fewer security problems) since anyone can look at the code.

What we've seen is that, in fact, very few people ever read code, even when it's open-source. The average programmer writes 10x more code than they read. The only people where that equation is reversed are professional code auditors -- and they are hired primarily to audit closed-source code. Companies like Microsoft pay programmers to review code because reviewing code is not otherwise something programmers like to do.

From bash to OpenSSL to LZO, the evidence is clear: few eyes are looking at open-source.

Shellshock is 20 years old (get off my lawn)

The bash issue is 20 years old. By this I don't mean the actual bug is that old (though it appears it might be), but that we've known that long that passing HTTP values to shell scripts is a bad idea.

My first experience with this was in 1995. I worked for "Network General Corporation" (which would later merge with McAfee Associates). At the time, about 1000 people worked for the company. We made the Sniffer, the original packet-sniffer that gave its name to the entire class of products.

One day, the head of IT comes to me with an e-mail from some unknown person informing us that our website was vulnerable. He was in standard denial, asking me to confirm that "this asshole is full of shit".

But no, whoever had sent us the email was correct, and obviously so. I was enough of a security expert that our IT guy would come to me, yet I hadn't considered that bug before (to my great embarrassment). Of course, one glance at the email and I knew it was true. I didn't have to try it out on our website, because it was self-evident in the way that CGI scripting worked. I forget the exact details, but it was essentially no different than the classic '/cgi-bin/phf' bug.

So we've known for 20 years that this is a problem, so why does it even happen? I think the problem is that most people don't know how things work. Like the IT guy 20 years ago, they can't look at something and immediately understand the implications and see what's wrong. So they keep using it. This perpetuates itself into legacy code that we can never get rid of. It's how we get mainframes, 20 years out of date and still a 50-billion-dollar-a-year business for IBM.

Wednesday, September 24, 2014

Bash 'shellshock' bug is wormable

Early results from my scan: there's about 3000 systems vulnerable just on port 80, just on the root "/" URL, without Host field. That doesn't sound like a lot, but that's not where the bug lives. Update: oops, my scan broke early in the process and stopped capturing the responses -- there are probably a lot more responses than that.

Firstly, only about 1 in 50 webservers respond correctly without the proper Host field. Scanning with the correct domain names would lead to a lot more results -- about 50 times more.

Secondly, it's things like CGI scripts that are vulnerable, deep within a website (like CPanel's /cgi-sys/defaultwebpage.cgi). Getting just the root page is the thing least likely to be vulnerable. Spidering the site, and testing well-known CGI scripts (like the CPanel one) would give a lot more results, at least 10x.

Thirdly, it's embedded webservers on odd ports that are the real danger. Scanning more ports would give a couple times more results.

Fourthly, it's not just web, but other services that are vulnerable, such as the DHCP service reported in the initial advisory.

Consequently, even though my light scan found only 3000 results, this thing is clearly wormable, and can easily worm past firewalls and infect lots of systems. One key question is whether the Mac OS X and iPhone DHCP service is vulnerable -- once the worm gets behind a firewall and runs a hostile DHCP server, that would be "game over" for large networks.

Update: As many people point out, the PATH variable isn't set, so I need '/usr/bin/ping' instead to get even more results.

Update: Someone is using masscan to deliver malware. They'll likely have compromised most of the systems I've found by tomorrow morning. If they use different URLs and fix the Host field, they'll get tons more.

Bash 'shellshock' scan of the Internet

NOTE: malware is now using this as their User-agent. I haven't run a scan now for over two days.

I'm running a scan right now of the Internet to test for the recent bash vulnerability, to see how widespread this is. My scan works by stuffing a bunch of "ping home" commands in various CGI variables. It's coming from IP address

The configuration file for masscan looks something like:

target-ip =
port = 80
banners = true
http-user-agent = shellshock-scan (
http-header[Cookie] = () { :; }; ping -c 3
http-header[Host] = () { :; }; ping -c 3
http-header[Referer] = () { :; }; ping -c 3

(Actually, these last three options don't quite work due to a bug, so you have to manually add them to the code.)

Some earlier results show that this bug is widespread.
A discussion of the results is at the next blogpost here. The upshot is this: while this scan found only a few thousand systems (because it's intentionally limited), it looks like the potential for a worm is high.

Bash bug as big as Heartbleed

Today's bash bug is as big a deal as Heartbleed. That's for many reasons.

The first reason is that the bug interacts with other software in unexpected ways. We know that interacting with the shell is dangerous, but we write code that does it anyway. An enormous percentage of software interacts with the shell in some fashion. Thus, we'll never be able to catalogue all the software out there that is vulnerable to the bash bug. This is similar to the OpenSSL bug: OpenSSL is included in a bajillion software packages, so we were never able to fully quantify exactly how much software is vulnerable.

The second reason is that while the known systems (like your web-server) are patched, unknown systems remain unpatched. We see that with the Heartbleed bug: six months later, hundreds of thousands of systems remain vulnerable. These systems are rarely things like webservers, but are more often things like Internet-enabled cameras.

Internet-of-things devices like video cameras are especially vulnerable because a lot of their software is built from web-enabled bash scripts. Thus, not only are they less likely to be patched, they are more likely to expose the vulnerability to the outside world.

Unlike Heartbleed, which only affected a specific version of OpenSSL, this bash bug has been around for a long, long time. That means there are lots of old devices on the network vulnerable to this bug. The number of systems needing to be patched, but which won't be, is much larger than Heartbleed.

There's little need to rush and fix this bug. Your primary servers are probably not vulnerable to this bug. However, everything else probably is. Scan your network for things like Telnet, FTP, and old versions of Apache (masscan is extremely useful for this). Anything that responds is probably an old device needing a bash patch. And, since most of them can't be patched, you are likely screwed.

Update: I think people are calling this the "shellshock" bug. Still looking for official logo.

Update: Note that the thing with the Heartbleed bug wasn't that the Internet was going to collapse, but that it's in so many places that we really can't eradicate it all. Thus, saying "as bad as Heartbleed" doesn't mean your website is going to get hacked tomorrow, but that a year from now we'll be reading about how hackers used the vulnerability to get into something interesting.

Exploit details: The way this bug is exploited is anything that first sticks some Internet parameter in an environmental variable, and then executes a bash script. Thus, simply calling bash isn't the problem. Thus, some things (like PHP apparently) aren't necessarily vulnerable, but other things (like CGI shell scripts) are vulnerable as all get out. For example, a lot of wireless routers shell out to "ping" and "traceroute" -- these are all likely vulnerable.
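The widely-circulated one-liner test for the bug follows exactly this pattern: stuff a function definition plus a trailing command into an environment variable, then invoke bash:

```shell
# A vulnerable bash executes the trailing 'echo vulnerable' while parsing
# the environment variable; a patched bash prints only the second line.
env x='() { :;}; echo vulnerable' bash -c "echo this is a test"
```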

Tuesday, September 23, 2014

EFF, Animal Farm version

In celebration of "Banned Books Week", the EFF has posted a picture of their employees sitting around "reading" banned-books. Amusingly, the person in the back is reading "Animal Farm", a book that lampoons the populist, revolutionary rhetoric the EFF itself uses.

Orwell wrote Animal Farm at the height of World War II, when the Soviet Union was our ally against Germany, and where Stalin was highly regarded by intellectuals. The book attacks Stalin's cult of personality, showing how populist "propaganda controls the opinion of enlightened in democratic countries". In the book, populist phrases like "All animals are equal" over time get amended with such things as "...but some animals are more equal than others".

The hero worship geeks have for the EFF is a modern form of that cult of personality. Computer geeks unquestioningly support the EFF, even when the EFF contradicts themselves. There are many examples, such as supporting coder's rights while simultaneously attacking "unethical" coders. The best example, though, is NetNeutrality, where the EFF wants the government to heavily regulate Internet providers like Comcast. This is a complete repudiation of the EFF's earlier position set forth in their document "Declaration of Independence of Cyberspace".

So I thought I'd amend that document with updated EFF rhetoric:

  • You [governments] are not welcome among us, but corporations are even less welcome.
  • You have ~~no~~ some sovereignty where we gather.
  • You have no moral right to rule us to excess.
  • We did not invite you then, but we invite you now.
  • Do not think that you can build it, as though it were a public construction project. Thanks for building cyberspace, now please run it like a public utility.

Sunday, September 14, 2014

Hacker "weev" has left the United States

Hacker Andrew "weev" Auernheimer, who was unjustly persecuted by the US government and recently freed after a year in jail when the courts agreed his constitutional rights had been violated, has now left the United States for a non-extradition country:

I wonder what that means. On one hand, he could go full black-hat and go on a hacking spree. Hacking doesn't require anything more than a cheap laptop and a dial-up/satellite connection, so it can be done from anywhere in the world.

On the other hand, he could also go full white-hat. There is lots of useful white-hat research that we don't do because of the chilling effect of government. For example, in our VNC research, we don't test default password logins for some equipment, because this can be interpreted as violating the CFAA. However, if 'weev' never intends to travel to an extradition country, it's something he can do, and report the results to help us secure systems.

Thirdly, he can now freely speak out against the United States. Again, while we theoretically have the right to "free speech", we see how those like Barrett Brown are in jail purely because they spoke out against the police-state.

Thursday, September 11, 2014

Rebuttal to Volokh's CyberVor post

The "Volokh Conspiracy" is a wonderful libertarian law blog. Strangely, in the realm of cyber, Volokh ignores his libertarian roots and instead chooses authoritarian commentators, like NSA lawyer Stewart Baker or former prosecutor Marcus Christian. I suspect Volokh is insecure about his (lack of) cyber-knowledge, and therefore defers to these "experts" even when it goes against his libertarian instincts.

The latest example is a post by Marcus Christian about the CyberVor network -- a network that stole 4.5 billion credentials, including 1.2 billion passwords. The data cited in support of its authoritarian conclusions has little value.

A "billion" credentials sounds like a lot, but in reality, few of those credentials are valid. In a separate incident yesterday, 5 million Gmail passwords were dumped to the Internet. Google analyzed the passwords and found only 2% were valid, and that automated defenses would likely have blocked exploitation of most of them. Certainly, 100,000 valid passwords is a large number, but it's not the headline 5 million number.

That's the norm in cyber. Authoritarian types who want to sell you something can easily quote outrageous headline numbers, and while some can recognize the data are hyped, few have the technical expertise to adequately rebut them. I speak at hacker conferences on the topic of password hacking [1] [2]; I can assure you those headline numbers are grossly inflated. They may be true after a fashion, but they do not imply what you think they do.

That blog post also cites a study by CSIS/McAfee claiming the economic cost of cybercrime is $475 billion per year. This number is similarly inflated, by somewhere between 10 and 100 times.

We know the sources of income for hackers, such as credit card fraud, ransomware, and DDoS extortion. Of these, credit card fraud is by far the leading source of income. According to a July 2014 study by the US DoJ and FTC, all credit card fraud world-wide amounts to $5.55 billion per year. Since we know that less than half of this is due to hackers, and that credit card fraud is more than half of what hackers earn, this sets the upper limit on hacker income -- about 1% of what CSIS/McAfee claim as the cost of cybercrime. Of course, the costs hackers impose on their victims can be much higher than their income, but knowing their income puts us in the right ballpark.
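The back-of-envelope bound above can be checked in a few lines. The figures are the ones quoted in this post (the DoJ/FTC card-fraud total and the CSIS/McAfee claim); the 50% fractions stand in for the post's "less than half" and "more than half" estimates:

```python
# Upper bound on hacker income, using the figures quoted in the post.
card_fraud_total = 5.55e9            # DoJ/FTC: all credit card fraud world-wide, per year
hacker_card_fraud = card_fraud_total * 0.5    # "less than half" of card fraud is due to hackers
hacker_income_upper = hacker_card_fraud / 0.5 # card fraud is "more than half" of hacker income

csis_claim = 475e9                   # CSIS/McAfee: claimed yearly cost of cybercrime
ratio = hacker_income_upper / csis_claim

print(hacker_income_upper)  # 5.55 billion -- the upper bound on hacker income
print(round(ratio * 100))   # roughly 1 -- about 1% of the CSIS/McAfee number
```

Even taking both "half" estimates at their most generous, hacker income tops out at the card-fraud total itself, two orders of magnitude below the headline claim.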

Where CSIS/McAfee get their eye-popping numbers is from vague estimates about such things as "loss of reputation" and "intellectual property losses". These numbers are arbitrary, depending upon a wide range of assumptions. Since we have no idea where they get such numbers, we can't put much faith in them.

Some of what they do divulge about their methods is obviously flawed. For example, when discussing why some countries don't report cybercrime losses, they say:
"that some countries are miraculously unaffected by cybercrime despite having no better defenses than countries with similar income levels that suffer higher loss—seems improbable"
This is wrong for two big reasons.

I developed a popular tool for scanning the Internet, and use it often to scan everything. Among the things this has taught me is that countries vary enormously, both in the way they exploit the Internet and in their "defenses". Two neighboring countries with similar culture and economic development can nonetheless vary widely in their Internet usage. In my personal experience, it is not improbable that two countries with similar income levels will suffer different losses.

The second reason the above statement is wrong is their view of "defenses", as if the level of defense (anti-virus, firewalls, intrusion prevention) has a bearing on rates of hacking. It doesn't. It's like cars: what matters most as to whether you die in an accident is how often you drive, how far, where, and how good a driver you are. What matters less are "defenses" like air bags and anti-lock brakes. That's why automobile death rates in America correlate with things like recessions, the weather, building of freeways, and cracking down on drunk drivers. What they don't correlate with are technological advances in "defenses" like air bags. These "defenses" aren't useless, of course, but drivers respond by driving more aggressively and paying less attention to the road. The same is true in cyber: technologies like intrusion prevention aren't a magic pill that wards off hackers, but a tool that allows increased risk taking and different tradeoffs when exploiting the Internet. What you get from better defenses is increased profits from the Internet, rather than decreased losses. I say this as the inventor of the "intrusion prevention system", a popular cyber-defense that is now a $2 billion/year industry.

That McAfee and CSIS see "defenses" the wrong way reflects the fact that McAfee wants to sell "defensive" products, and CSIS wants to sell authoritarian legislation. Their report is not an honest assessment from experts, but an attempt to persuade people into buying what these organizations have to sell.

By the way, that post mentions "SQL injection". It's a phrase you should pay attention to because it's been the most common way of hacking websites for over a decade. It's so easy that teenagers with little skill can do SQL injection to hack websites. It's also easily preventable: just use a thing called "parameterized queries" instead of a thing called "string pasting". Yet, schools keep pumping out website designers who know nothing of SQL injection and who "paste strings" together. This leads to the intractable problem that if you hire a university graduate to do your website, they'll put SQL injection flaws in the code that your neighbor's kid will immediately hack. Companies like McAfee try to sell you defenses like "WAFs" that only partly defend against the problem. The solution isn't adding "defenses" like WAFs, but to change the code from "string pasting" to "parameterized queries", which completely prevents the problem. That our industry thinks in terms of "adding defenses" from vendors like McAfee, instead of just fixing the problem, is why cybersecurity has become intractable in recent years.
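The difference between "string pasting" and parameterized queries fits in a few lines. This is a minimal sketch using Python's built-in sqlite3 module; the table, the user 'alice', and the attacker payload are invented for illustration:

```python
import sqlite3

# Throwaway in-memory database with one user (illustrative schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, password TEXT)")
db.execute("INSERT INTO users VALUES ('alice', 'secret')")

attacker_input = "' OR '1'='1"  # the classic injection payload

# VULNERABLE: "string pasting" -- the payload escapes the quotes and
# rewrites the query logic, so the login check passes without a password.
query = "SELECT * FROM users WHERE name = 'alice' AND password = '%s'" % attacker_input
rows_vulnerable = db.execute(query).fetchall()

# SAFE: a parameterized query -- the driver treats the payload as a plain
# string value, never as SQL, so the login check fails as it should.
rows_safe = db.execute(
    "SELECT * FROM users WHERE name = ? AND password = ?",
    ("alice", attacker_input),
).fetchall()

print(len(rows_vulnerable))  # 1 -- injection succeeded
print(len(rows_safe))        # 0 -- injection blocked
```

The fix costs nothing at runtime: the query text and the data travel to the database separately, so no input, however malicious, can change the query's structure.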

Marcus Christian's post ends with the claim that "law enforcement agencies must assume broader roles and bear greater burdens", that "individual businesses cannot afford to face cybercriminals alone", and then paraphrases the text of recently proposed cybersecurity legislation. If you are libertarian, you should oppose this legislation. It's a power grab, increasing your own danger from law enforcement, and doing nothing to lessen the danger from hackers. I'm an expert in cybersecurity who helps companies defend against hackers, yet I'm regularly threatened and investigated by law enforcement thugs. They don't understand what I do; it's all witchcraft to them, so they see me as part of the problem rather than the solution. Law enforcement already has too much power in cyberspace; it needs to be rolled back, not extended.

In conclusion, rather than an "analysis" as Eugene Volokh claims, this post from Marcus Christian was transparent lobbying for legislation, with the standard distortion of data that the word "lobbying" implies. Readers of that blog shouldn't treat it as anything more than that.