Sunday, August 30, 2015

About the systemd controversy...

As a troll, one of my favorite targets is "systemd", because it generates so much hate on both sides. For bystanders, I thought I'd explain what that is. To begin with, I'll give a little background.

An operating system like Windows, Mac OS X, or Linux comes in two parts: a kernel and userspace. The kernel is the essential bit, though on the whole, most of the functionality is in userspace.

The word "Linux" technically only refers to the kernel itself. There are many optional userspaces that go with it. The most common is called BusyBox, a small bit of userspace functionality for the "Internet of Things" (home routers, TVs, fridges, and so on). The second most common is Android (the mobile phone system), with a Java-centric userspace on top of the Linux kernel. Finally, there are the many Linux distros for desktops/servers like RedHat Fedora and Ubuntu -- the ones that power most of the servers on the Internet. Most people think of Linux in terms of the distros, but in practice, they are a small percentage of the billions of BusyBox and Android devices out there.

The first major controversy in Linux was over the microkernel, an idea that removes most traditional kernel functionality and puts it in userspace instead. It was all the rage among academics in the early 1990s. Linus famously rejected the microkernel approach. Apple's Mac OS X was originally based on a microkernel, but they have since moved large bits of functionality back into the kernel, so it's no longer a microkernel. Likewise, Microsoft has moved a lot of functionality from userspace into the Windows kernel (such as font rendering), leading to important vulnerabilities that hackers can exploit. Academics still love microkernels today, but in the real world they're too slow.

The second major controversy in Linux is the relationship with the GNU project. The GNU project was created long before Linux in order to create a Unix-like operating system. They failed at creating a usable kernel, but produced a lot of userland code. Since most of the key parts of the userland code in Linux distros come from GNU, some insist on saying "GNU/Linux" instead of just "Linux". If you are thinking this sounds a bit childish, then yes, you are right.

Now we come to the systemd controversy. It started as a replacement for something called init. A running Linux system has about 20 different programs running in userspace. When the system boots up, it has only one, a program called "init". This program then launches all the remaining userspace programs.
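To make that concrete, here's a toy sketch in C of the job init performs. This is an illustration only, not the actual init or systemd source, and the /etc/rc path is just the traditional convention:

#include <sys/wait.h>
#include <unistd.h>

/* Toy sketch of an init-like program (illustration only, not real
 * init or systemd). As PID 1 it launches the rest of userspace,
 * then loops forever reaping orphaned child processes. */
int main(void)
{
    if (fork() == 0) {
        /* child: run the traditional startup script */
        execl("/etc/rc", "rc", (char *)NULL);
        _exit(127);
    }
    for (;;) {
        if (wait(NULL) < 0)
            pause();    /* no children right now: sleep until a signal */
    }
}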

This init system harks back to the original creation of Unix in the 1970s, and is a bit of a kludge. It worked fine back then when systems were small (when 640k of memory was enough for anybody), but works less well on today's huge systems. Moreover, the slight differences in init details among the different Linux distros, as well as other Unix systems like Mac OS X, *BSD, and Solaris, are a constant headache for those of us who have to sysadmin these boxes.

Systemd replaces the init kludge with a new design. It's a lot less kludgy. It runs the same across all Linux distros. It also boots the system a lot faster.

But on the flip side, it destroys the original Unix way of doing things, becoming a lot more like how the Windows equivalent (svchost.exe) works. The Unix init system ran as a bunch of scripts, allowing any administrator to change the startup sequence by changing a bit of code. This makes understanding the init process a lot easier, because at any point you can read the code that makes something happen. Init was something that anybody could understand, whereas nobody can say for certain exactly how things are being started in systemd.

On top of that, the designers of systemd are a bunch of jerks. Linus handles Linux controversies with maturity. While he derides those who say "GNU/Linux", he doesn't insist that it's wrong. He responds to his critics largely by ignoring them. On the flip side, the systemd engineers can't understand how anybody can think that their baby is ugly, and vigorously defend it. Linux is a big-tent system that accepts people of differing opinions; systemd is a narrow-minded religion that kicks out apostates.

The biggest flaw of systemd is mission creep. It is slowly growing to take over more and more userspace functionality of the system. This complexity leads to problems.

One example is that it's replaced traditional logging with a new journal system. Traditional, text-based logs were "rotated" in order to prevent the disk from filling up. This could be done because each entry in a log was a single line of text, so tools could parse the log files in order to chop them up. The new journal system is binary, so it's not easy to parse, and hence, people don't rotate the logs. This causes the hard drive to fill up, killing the system. This is noticeable when doing things like trying to boot a Raspberry Pi from a 4-gigabyte microSD card. It works with older, pre-systemd versions of Linux, but will quickly die with systemd if something causes a lot of logging on the system.

Another example is D-Bus. This is the core system within systemd that allows different bits of userspace to talk to each other. But it's got problems. A demonstration of the D-Bus problem is the recent Jeep hack by researchers Charlie Miller and Chris Valasek. The root problem was that D-Bus was openly (without authentication) accessible from the Internet. Likewise, the "AllJoyn" system for the "Internet of Things" opens up D-Bus on the home network. D-Bus indeed simplifies communication within userspace, but its philosophy is to put all your eggs in one basket, then drop the basket.


Personally, I have no opinion on systemd. I hate everything. Init was an uglier kludge, and systemd appears to be just as ugly, albeit for different reasons. But the amount of hate on both sides is so large that it needs to be trolled. The thing I troll most about is that one day, "systemd will replace Linux". As systemd replaces more and more of Linux userspace, and begins to drive kernel development, I think this joke will one day become true.


Saturday, August 29, 2015

No, this isn't good code

I saw this tweet go by. No, I don't think it's good code:




What this code is trying to solve is the "integer overflow" vulnerability. I don't think it solves the problem well.
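The tweeted code isn't reproduced here, but judging from the criticisms below, it presumably looks something like the following. To be clear, this is my reconstruction, not the actual tweet:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reconstruction of the tweeted function: multiply two
 * size_t values, detect overflow with a division, and write the
 * product to *res only on success. */
bool safemulti_size_t(size_t a, size_t b, size_t *res)
{
    if (b != 0 && a > SIZE_MAX / b)
        return false;            /* overflow: *res is left untouched */
    *res = a * b;
    return true;
}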

The first problem is that the result is undefined. Some programmers will call safemulti_size_t() without checking the return value. When they do, the code will behave differently depending on the previous value of *res. Instead, the code should store a defined value in *res in this case, such as zero or SIZE_MAX. Since this sort of thing will usually be used to size memory allocations, which you want to fail on overflow, a good choice would be SIZE_MAX.
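A minimal sketch of that suggestion (only the failure branch changes; the division issue discussed next still applies):

bool safemulti_size_t(size_t a, size_t b, size_t *res)
{
    if (b != 0 && a > SIZE_MAX / b) {
        *res = SIZE_MAX;    /* defined result: a later malloc(SIZE_MAX) is guaranteed to fail */
        return false;
    }
    *res = a * b;
    return true;
}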

The worse problem is the integer division. On today's Intel processors, integer multiplication takes a single clock cycle, but integer division takes between 40 and 100 clock cycles. Since you'll usually be dividing by small numbers, it's likely to be closer to 40 clock cycles than 100, but that's still really bad. If your solution to security problems imposes unacceptable tradeoffs, then you are doing security wrong. If you're willing to accept this level of performance hit, then you might as well be programming in a safer language like JavaScript rather than in C.

An alternative would be the OpenBSD function reallocarray(), which I'm considering using in all my code as a replacement for malloc(), calloc(), and realloc(). It looks like this:

#include <errno.h>   /* errno, ENOMEM */
#include <stdint.h>  /* SIZE_MAX */
#include <stdlib.h>  /* realloc */

/*
 * This is sqrt(SIZE_MAX+1), as s1*s2 <= SIZE_MAX
 * if both s1 < MUL_NO_OVERFLOW and s2 < MUL_NO_OVERFLOW
 */
#define MUL_NO_OVERFLOW (1UL << (sizeof(size_t) * 4))

void *
reallocarray(void *optr, size_t nmemb, size_t size)
{
    if ((nmemb >= MUL_NO_OVERFLOW || size >= MUL_NO_OVERFLOW) &&
        nmemb > 0 && SIZE_MAX / nmemb < size) {
            errno = ENOMEM;
            return NULL;
    }
    return realloc(optr, size * nmemb);
}

Firstly, it avoids the horrible integer division instruction (unless one of the parameters is at least 64K on a 32-bit processor, or 4 gigabytes on a 64-bit one). Secondly, it always has a defined result.
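Usage is straightforward. A quick sketch of how I'd call it -- the struct record type and alloc_records() wrapper are just illustration, and the call assumes the reallocarray() definition above is compiled in:

#include <stdio.h>
#include <stdlib.h>

struct record { int id; char name[64]; };

/* reallocarray(NULL, n, size) behaves like a checked malloc(n * size);
 * passing an old pointer makes it a checked realloc(). The definition
 * above supplies the function if your libc doesn't. */
static struct record *alloc_records(size_t count)
{
    struct record *tbl = reallocarray(NULL, count, sizeof(*tbl));
    if (tbl == NULL)
        perror("reallocarray");
    return tbl;
}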

Personally, I would improve upon this function by simply raising a signal (for example, by calling abort()). Virtually no code can recover from a failed memory allocation, so instead of returning NULL, it's better to crash right here.
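A minimal sketch of what I mean -- the xreallocarray() name is mine, and it builds on the reallocarray() shown above:

#include <stdlib.h>

/* Hypothetical wrapper: crash immediately on overflow or allocation
 * failure instead of returning NULL, since almost no caller recovers
 * from a failed allocation anyway. */
void *
xreallocarray(void *optr, size_t nmemb, size_t size)
{
    void *p = reallocarray(optr, nmemb, size);
    if (p == NULL && nmemb != 0 && size != 0)
        abort();            /* raises SIGABRT: die here, close to the bug */
    return p;
}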





Friday, August 28, 2015

On science literacy...

In this WIRED article, a scientifically illiterate writer explains "science literacy". It's as horrid as you'd expect. He preaches the Aristotelian version of science that Galileo proved wrong centuries ago. His thesis is that science isn't about knowing scientific facts, but being able to think scientifically. He then claims that thinking scientifically is all about building models of how the world works.

This is profoundly wrong. Science is about observation and experimental testing of theories.

For example, consider the following question. If you had two balls of the same size, one made of lead and the other made of wood, and you dropped them at the same time, which would hit the ground first (ignoring air resistance)? For thousands of years Aristotelian scientists claimed that heavier objects fell faster, purely by reasoning about the problem. It wasn't until the time of Galileo that scientists conducted the experiment and observed that these balls hit the ground at the same time. In other words, all objects fall at the same speed, regardless of size or weight (ignoring air resistance). Feathers fall as fast as lead on the moon. If you don't believe me, drop different objects from a building and observe for yourself.

Likewise, Aristotle taught that men had more teeth than women, you know, because that makes logical sense. Galileo first got in trouble with the "scientists" of his time by actually asking people to open their mouths and counting their teeth. As it turns out, men and women have the same number of teeth.

The point here is that science is based on observation, not pure reason. Doing science means either understanding the observations made by previous scientists (i.e. "facts") or making the observations yourself. Doing science means making predictions based on theories, then conducting experiments to see if the prediction is correct. There is no science without observation.

The WIRED writer poses a similar question about a fan pushing an object across a frictionless surface. It's a silly question because, presumably, we are supposed to assume air exists for the fan to work, but that air doesn't exist to slow things down. In any event, you can't really reason about this without first learning the scientific theories of "mass" and Newtonian equations like F=MA. These theories were developed based on observation. The writer demands that to "do science" means approaching this problem from an Aristotelian method of reasoning, divorced from previous scientific observations.

Similarly, he poses a question about the phases of the moon if it were a cube instead of a sphere. Well, this has complications. I doubt the face of the moon would appear to be a square, as my understanding of orbital mechanics suggests a corner would face the earth rather than a flat side (assuming it stayed tidally locked). But even assuming we got a cubic face, there are still the problems of inclined orbits and libration. Finally, he poses the question at the precise moment when such a side would emerge from shadow and become lit -- so it's impossible to say whether the side would be dark or lit. It's stupid to reason about this -- it's something that ought to be observed, if only with a computer model. I guess the thing you ought to learn is that the entire face of the cube is either all light or all dark, unlike a sphere, which gets partially lit.

That WIRED writer says science is not about knowing the difference between a "planet" and a "dwarf planet" like Pluto. He's wrong. Pluto is much smaller than the 8 planets. Whereas the 8 planets have nearly circular orbits in the same plane, Pluto has a highly elliptical orbit that takes it sometimes inside the orbit of Neptune and far above the orbital plane of the other planets. Moreover, in recent years, we have observed many other Pluto-sized objects that share these same characteristics with Pluto (like "Eris", which is more massive than Pluto). Yes, the names that we give these things don't matter, but the observed differences matter a heck of a lot. Science is about knowing these observations. That we teach students the names of planets, but not what we observe about them, is a travesty that leads to illiteracy.

Science is sadly politicized, such as with issues like Climate-Change/Global-Warming. We are expected to believe Science as some sort of religion, where the common people are unable to read the Latin Bible. We are not expected to understand things like "absorption spectra" or "thermal infrared".  To point out that scientific observations have shown that hurricanes haven't, in fact, gotten worse is considered heresy, because it denies computer models that claim hurricanes will get worse. Climate change is a problem we need to address, but with science rather than current scientific illiteracy and quasi-religious dogma.


Scientific literacy starts with understanding what science is, namely that it's based on observation, coming up with theories/hypotheses to explain the observations, then relentlessly testing those theories, trying to prove them wrong. Secondly, scientific literacy means learning the observations made by scientists over the last few hundred years. We don't have to come up with F=MA or the speed-of-light ourselves, but learn from previous scientists. Believing in Evolution doesn't make you scientifically literate, understanding radioisotope dating and rock strata does.

What this WIRED article highlights is that Aristotelian science illiteracy is so pervasive it even infects science writers at major publications. What you should do about this is pick up a book and try to cure your own illiteracy. Really, any high-school textbook should do.

Thursday, August 20, 2015

A lesson in BitTorrent

Hackers have now posted a second dump of Ashley-Madison, this time 20-gigabytes worth of data. Many people, mostly journalists, are eagerly downloading this next dump. However, at the time of this writing, nobody has finished downloading it yet. None of the journalists have a complete copy, so you aren't seeing any new stories about the contents. The file name promises the full email spool of the CEO, but no journalist has yet looked into that mail spool and reported a story. Currently, the most any journalist has is 85% of the dump, slowly downloading the rest at 37-kilobytes/second.

Why is that? Is AshMad doing some sort of counter-attack to stop the download (like Sony did)? Or is it overloaded because too many people are trying to download?

No, it's because it hasn't finished seeding.

BitTorrent is p2p (peer-to-peer). You download chunks from the peers, a.k.a. the swarm, not just from the original source (the initial seeder). Instead of slowing down as more people join the swarm to download the file(s), BitTorrent downloads become faster -- the more people you can download from, the faster it goes.

But 9 women can't make a baby in 1 month. The same goes for BitTorrent. You can only download chunks from peers who already have them. That's the current problem with the AshMad dump: everyone combined has only 85% of all possible chunks. The remaining 15% of the chunks haven't been uploaded to the swarm yet. Nobody has a complete copy. The original seeder is uploading at a rate of 37-kilobytes/second, handing off the next chunk to a random peer in the swarm, who quickly exchanges it with everyone else in the swarm.

Thus, we see something like the following image, where everyone is stuck at 85% download:


It'll take many more hours until this is complete.
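A back-of-the-envelope check, assuming the dump is roughly 20 gigabytes, the swarm already has 85% of it, and the seeder's 37-kilobyte/second rate holds steady:

#include <stdio.h>

int main(void)
{
    double total   = 20.0 * 1024 * 1024 * 1024;   /* ~20 GB dump */
    double missing = total * 0.15;                 /* the 15% nobody has yet */
    double rate    = 37.0 * 1024;                  /* ~37 KB/s from the seeder */
    printf("~%.0f hours to go\n", missing / rate / 3600.0);
    return 0;
}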

I point this out because it's a useful real-world lesson for BitTorrent. Peer-to-peer speeds up downloads in ideal cases, but it can't overcome physics. Physics, in this case, means that nobody yet has a complete 100% copy, so nobody else can download one.

AshMad is prostitution not adultery

The Ashley-Madison website advertises adultery, but that's a lie. I've talked to a lot of users of the site, and none of them used it to cheat on their spouse. Instead, they used it as just a "dating" site -- and even that is a misnomer, since "dating" often just means a legal way to meet prostitutes. According to several users, prostitutes are really the only females they'd consistently meet on Ashley-Madison.

In other words, Ashley-Madison is a prostitution website, not an adultery website. "Cheating" is just the hook, to communicate to the users that they should expect sex, but not a future spouse. And the website is upfront about charging for it.

I point this out because a lot of people have gone over-the-top on the adultery angle, such as this piece in The Intercept. That's rather silly, since Ashley-Madison wasn't really about adultery in the first place.







Wednesday, August 19, 2015

Trump is right about the 14th Amendment

Trump sucks all the intelligence out of the room, converting otherwise intelligent and educated pundits into blithering idiots. Today's example is the claim that Trump said:
"The 14th Amendment is unconstitutional."
Of course he didn't say that. What he did say is that the 14th Amendment doesn't obviously grant "birthright citizenship" to "anchor babies". And he's completely correct. The 14th Amendment says:
"All persons born or naturalized in the United States, and subject to the jurisdiction thereof, are citizens of the United States"
The complicated bit is the middle clause, "and subject to the jurisdiction thereof". If you remove that clause, then of course Trump would be wrong, since it would clearly say that being born in the U.S. grants citizenship.

But the phrase is there, so obviously some babies born in the U.S. aren't guaranteed (by the constitution) citizenship. Which babies are those?

The immigration law 8 U.S.C. § 1408(a) lists some of them: babies of ambassadors, heads of state, and military prisoners. [UPDATE: this appears wrong, I saw it in many Internet posts, but it appears to be untrue. But, it doesn't change the conclusion. I'll update this post again when I figure this out].

It's this law that currently grants babies citizenship, not the constitution. Laws can be changed by Congress. Presumably, "illegal aliens" could easily be added to the list.

This would be challenged, of course, and it'd probably work its way up to the Supreme Court, at which point they'd rule definitively on whether the Constitution grants all babies citizenship. The point is simply that the Supreme Court hasn't ruled yet. Nobody can cite a Supreme Court decision clearly disproving Trump.

Thus, if you listen to Trump's remarks that everyone is criticizing, you'll see that he's right. Not all babies are granted citizenship (those of foreign ambassadors, heads of state, and military prisoners). Lots of legal scholars believe the same extends to babies of illegal aliens. There is a good chance the Supreme Court would rule in Trump's favor on the issue if current immigration law were changed. (And likewise, a good chance they'd rule against him).


My point is this. Trump is a filthy populist troll. Don't feed the trolls. No really, stop it. It's like the Kansas farmer's advice: never wrestle a pig. The pig loves it, and you'll just get muddy. Trump is going to say lots of crazy things. Just ignore him rather than descending to his level and saying crazy/dumb things back.





The closest we have to a Supreme Court decision on the matter is Plyler v. Doe, which deals with a separate issue. Its discussion of 'jurisdiction' could potentially apply to newborns.

The second closest is US v. Wong Kim Ark, which (as the Wikipedia article says) many legal scholars do not think applies to illegal immigrants.


Notes on the Ashley-Madison dump

Ashley-Madison is a massive dating site that claims 40 million users. The site is specifically for those who want to cheat on their spouse. Recently, it was hacked. Yesterday, the hackers published the dumped data.

It appears legit. I asked my Twitter followers for those who had created accounts. I have verified multiple users of the site, one of which was a throw-away account used only on the site. Assuming my followers aren't lying, this means the dump is confirmed. Update: one follower verified that the last 4 digits of his credit-card number and his billing address were exposed.

It's over 36-million accounts. That's not quite the 40 million they claim, but it's pretty close. However, glancing through the data, it appears that a lot of the accounts are bogus, obviously made-up entries by people who just wanted to look at the site without creating a "real" account.

It's heavily men. I count 28-million men to 5-million women, according to the "gender" field in the database (with 2-million undetermined). However, glancing through the credit-card transactions, I find only male names.

It's full account information. This includes full name, email, and password hash as you'd expect. It also includes dating information, like height, weight, and so forth. It appears to contain addresses, as well as GPS coordinates. I suspect that many people created fake accounts, but with an app that reported their real GPS coordinates.

Passwords hashed with bcrypt. Almost all the records appear to be protected with bcrypt. This is a refreshing change. Most of the time when we see big sites hacked, the passwords are protected either poorly (with MD5) or not at all (in "clear text", so that they can be immediately used to hack people). Hackers will be able to "crack" many of these passwords where users chose weak ones, but users who chose strong passwords are safe.

Maybe 250k deleted accounts. There are about 250k accounts that appear to have the password information removed. I don't know why, maybe it's accounts that have paid to be removed. Some are marked explicitly as such, others imply that.

Partial credit card data. It appears to have credit card transaction data -- but not the full credit card number. It does have full name and addresses, though. This is data that can "out" serious users of the site.

You can download everything via BitTorrent. The magnet number is
40ae8a90de40ca3afa763c8edb43fc1fc47d75f1. If you've got BitTorrent installed, you can use this to download the data. It's 9.7 gigabytes compressed, so you'll need a good Internet connection.

The hackers call themselves the "Impact Team". Their manifesto is here. They appear to be motivated by the immorality of adultery, but in all probability, their motivation is that #1 it's fun and #2 because they can. They probably used phishing, SQL injection, or re-used account credentials in order to break in.

They deserve some praise. Compared to other large breaches, it appears Ashley-Madison did a better job at cybersecurity. They tokenized credit card transactions and didn't store full credit card numbers. They hashed passwords correctly with bcrypt. They stored email addresses and passwords in separate tables, to make grabbing them (slightly) harder. Thus, this hasn't become the massive breach of passwords and credit-card numbers that other large breaches have led to. They deserve praise for this.

Josh Duggar. This Gawker article appears correct from my reading of the data.

Some stories in the press:
http://www.wired.com/2015/08/happened-hackers-posted-stolen-ashley-madison-data/
http://arstechnica.com/security/2015/08/data-from-hack-of-ashley-madison-cheater-site-purportedly-dumped-online/
http://fusion.net/story/184982/heres-what-we-know-about-the-ashley-madison-hack/




Thursday, July 30, 2015

A quick review of the BIND9 code

BIND9 is the oldest and most popular DNS server. Today, a DoS vulnerability was announced that can crash the server with an easily crafted query. I could use my "masscan" tool to blanket the Internet with those packets and crash all publicly facing BIND9 DNS servers in about an hour. A single vuln doesn't mean much, but if you look at the recent BIND9 vulns, you see a pattern forming. BIND9 has lots of problems -- problems that critical infrastructure software should not have.


Its biggest problem is that it has too many features. It attempts to implement every possible DNS feature known to man, few of which are needed on publicly facing servers. Today's bug was in the rarely used "TKEY" feature, for example. DNS servers exposed to the public should have the minimum number of features -- a server priding itself on having the maximum number of features is automatically disqualified.

Another problem is that DNS itself has some outdated design issues. The control-plane and data-plane need to be separate. This bug is in the control-plane code, but it's exploited from the data-plane. (Data-plane is queries from the Internet looking up names; control-plane is zone updates, key distribution, and configuration.) The control-plane should be on a separate network adapter, separate network address, and separate port numbers. These should be hidden from the public, and protected by a firewall.

DNS should have hidden masters, servers with lots of rich functionality, such as automatic DNSSEC zone signing. It should have lightweight exposed slaves, with just enough code to answer queries on the data-plane, and keep synchronized with the master on the control-plane.

But what this post is really about is looking at BIND9's code. It's nicer than the OpenSSL code and that of some other open-source projects, but there do appear to be some issues. The bug was in the dns_message_findname() function. The function header looks like:

isc_result_t
dns_message_findname(dns_message_t *msg, dns_section_t section,
    dns_name_t *target, dns_rdatatype_t type,
    dns_rdatatype_t covers, dns_name_t **name,
    dns_rdataset_t **rdataset);

The thing you should notice here is that none of the parameters are declared const, even though all but one of them should be. A quick grep shows that a lack of const correctness is pretty common throughout the BIND9 source code. Every quality guide in the world strongly recommends const correctness -- that it's lacking here hints at larger problems.
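For comparison, a const-correct declaration might look roughly like this. Which parameters can actually take const depends on ISC's internals, so treat this as a sketch, not their API:

isc_result_t
dns_message_findname(const dns_message_t *msg, dns_section_t section,
    const dns_name_t *target, dns_rdatatype_t type,
    dns_rdatatype_t covers, dns_name_t **name,
    dns_rdataset_t **rdataset);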

The bug was an assertion failure on the "name" parameter of this function. An assertion is supposed to double-check the internal consistency of data, to catch bugs early. But in this case, there was no bug being caught -- it was the assertion itself that was the problem. The programmers are confused about the difference between in, out, and in/out parameters. You assert on the expected values of the in and in/out parameters, but not on write-only out parameters. Since the function doesn't read them, their value is immaterial. If the function wants an out parameter to be NULL on input, it can just set it itself -- demanding that the caller do this is just bad.
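To illustrate the distinction with a hypothetical function (not BIND9 code): assert on the parameters the function reads, but not on the previous contents of a write-only out parameter:

#include <assert.h>
#include <stddef.h>

static int find_first_positive(const int *values, size_t count, int *out)
{
    assert(values != NULL);   /* in: the function reads this */
    assert(out != NULL);      /* the pointer itself is read when storing */
    /* wrong: assert(*out == 0); -- *out is write-only, so its previous
     * value is the caller's business, not ours */
    for (size_t i = 0; i < count; i++) {
        if (values[i] > 0) {
            *out = values[i];
            return 1;
        }
    }
    return 0;
}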

By the way, assertions are normally enabled only for testing, not for production code. That's because they can introduce bugs (as in this case), and have performance costs. However, in the long run, aggressive double-checking leads to more reliable code, so I'm a fan of such aggressive checking. Quickly glancing at the recent BIND9 vulns, though, it appears many of them are caused by assertions failing. This may be good, meaning the code was going to crash (or get exploited) anyway, and the assertion caught the problem early. Or it may be bad, with the assertion itself being the bug, or at least the user would have been happier without the assertion triggering (because of a memory leak, for example). If the latter is the case, then it sounds like people should just turn off the assertions when building BIND9 (it's a single command-line switch).

Last year, ISC (the organization that maintains BIND9) finished up their BIND10 project, which was to be a rewrite of the code. This was a fiasco, of course. Rewrites of large software projects are doomed to failure. The only path forward for BIND is the current code-base. This means refactoring and cleaning up technical debt on a regular basis, such as fixing the const correctness problem. This means arbitrarily deciding to drop support for 1990s-era computers when necessary. If the architecture needs to change (such as separating the data-plane from the control-plane), it can be done within the current code-base -- just create a solid regression test, then go wild on the changes, relying upon the regression test to maintain quality.

Lastly, I want to comment on the speed of BIND9. It's dog slow -- the slowest of all the DNS servers. That's a problem firstly because slow servers should not be exposed to DDoS attacks on the Internet. It's a problem secondly because slow servers should not be written in dangerous languages like C/C++. These languages should only be used when speed is critical. If your code isn't fast anyway, then you should be using safe languages, like C#, Java, or JavaScript. A DNS server written in these languages is unlikely to be any slower than BIND9.

Conclusion

The point I'm trying to make here is that BIND9 should not be exposed to the public. It has code problems that should be unacceptable in this day and age of cybersecurity. Even if it were written perfectly, it has far too many features to be trustworthy. Its feature-richness makes it a great hidden master, but all those features get in the way of it being a simple authoritative slave server, or a simple resolver. They shouldn't rewrite it from scratch, but if they did, they should choose a safe language and not use C/C++.




Example#2: strcpy()

BIND9 has 245 instances of the horribly unsafe strcpy() function, spread through 94 files. This is unacceptable -- yet another piece of technical debt they need to fix. It needs to be replaced with the bounded strcpy_s() function.

In the file lwresutil.c is an example of flawed thinking around strcpy(). It's not an exploitable bug, at least not yet, but it's still flawed.

lwres_getaddrsbyname(...)
{
    unsigned int target_length;

    target_length = strlen(name);
    if (target_length >= sizeof(target_name))
        return (LWRES_R_FAILURE);
    strcpy(target_name, name); /* strcpy is safe */
}

The problem here is the unsigned int declaration of target_length. On a 64-bit machine, an unsigned int is only 32 bits, but string lengths can be longer than a 32-bit value can hold. Thus, a 4-billion-byte name would cause the integer to wrap around, and the length check would pass when it shouldn't. I don't think you can get any name longer than 256 bytes through this code path, so it's likely not vulnerable now, but the "4-billion bytes of data" problem is pretty common in other code, and frequently exploitable in practice.

The comment /* strcpy is safe */ is no more accurate than those emails that claim "Checked by anti-virus".

Modern code should never use strcpy(), at all, under any circumstances, not even in unit-test code where it doesn't matter. It's easy to manage a project by simply grepping for the string "strcpy" and checking whether it exists or not; it's hard managing a project with just a few strcpy()s. It's like being a little bit pregnant.
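A sketch of the kind of fix I mean for the lwresutil.c code above -- hypothetical, not ISC's actual patch: keep the length in a size_t so it can't be truncated, and copy with an explicit bound instead of strcpy():

#include <stddef.h>
#include <string.h>

static int copy_name(char *target_name, size_t target_size, const char *name)
{
    size_t target_length = strlen(name);            /* size_t can't truncate */
    if (target_length >= target_size)
        return -1;                                   /* too long: fail cleanly */
    memcpy(target_name, name, target_length + 1);    /* +1 copies the NUL */
    return 0;
}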






Wednesday, July 22, 2015

Infosec's inability to quantify risk

Infosec isn't a real profession. Among the things missing is proper "risk analysis". Instead of quantifying risk, we treat it as an absolute. Risk is binary, either there is risk or there isn't. We respond to risk emotionally rather than rationally, claiming all risk needs to be removed. This is why nobody listens to us. Business leaders quantify and prioritize risk, but we don't, so our useless advice is ignored.

An example of this is the car hacking stunt by Charlie Miller and Chris Valasek, where they turned off the engine at freeway speeds. This has led to an outcry of criticism in our community from people who haven't quantified the risk. Any rational measure of the risk of that stunt is that it's pretty small -- while the benefits are very large.

In college, I owned a poorly maintained VW bug that would occasionally lose power on the freeway, such as from an electrical connection falling off from vibration. I caused more risk by not maintaining my car than these security researchers did.

Indeed, cars losing power on the freeway is a rather common occurrence. We often see cars on the side of the road. Few accidents are caused by such cars. Sure, they add risk, but so do people abruptly changing lanes.

No human is a perfect driver. Every time we get into our cars, instead of cycling or taking public transportation, we add risk to those around us. The majority of those criticizing this hacking stunt have caused more risk to other drivers this last year by commuting to work. They cause this risk not for some high ideal of improving infosec, but merely for personal convenience. Infosec is legendary for its hypocrisy; this is just one more example.

Google, Tesla, and other companies are creating "self driving cars". Self-driving cars will always struggle to cope with unpredictable human drivers, and will occasionally cause accidents. However, in the long run, self-driving cars will be vastly safer. To reach that point, we need to quantify risk. We need to be able to show that for every life lost due to self-driving cars, two have been saved because they are inherently safer. But here's the thing: if we use the immature risk analysis from the infosec "profession", we'll always point to the one life lost, and never quantify the two lives saved. Using infosec risk analysis, safer self-driving cars will never happen.

In hindsight, it's obvious to everyone that Valasek and Miller went too far. Renting a track for a few hours costs less than the plane ticket for the journalist to come out and visit them. Infosec is like a pride of lions that'll leap on and devour one of its members at the first sign of weakness. This minor mistake is weakness, so many in infosec have jumped on the pair, reveling in righteous rage. But any rational quantification of the risks shows that the mistake is minor compared to the huge benefit of their research. I, for one, praise these two, and hope they continue their research -- knowing full well that they'll likely continue to make other sorts of minor mistakes in the future.