Friday, August 28, 2015

On science literacy...

In this WIRED article, a scientifically illiterate writer explains "science literacy". It's as horrid as you'd expect. He preaches the Aristotelian version of science that Galileo proved wrong centuries ago. His thesis is that science isn't about knowing scientific facts, but being able to think scientifically. He then claims that thinking scientifically is all about building models of how the world works.

This is profoundly wrong. Science is about observation and experimental testing of theories.

For example, consider the following question. If you had two balls of the same size, one made of lead and the other made of wood, and you dropped them at the same time, which would hit the ground first (ignoring air resistance)? For thousands of years Aristotelian scientists claimed that heavier objects fell faster, purely by reasoning about the problem. It wasn't until the time of Galileo that scientists conducted the experiment and observed that these balls hit the ground at the same time. In other words, all objects fall at the same speed, regardless of size or weight (ignoring air resistance). Feathers fall as fast as lead on the moon. If you don't believe me, drop different objects from a building and observe for yourself.

The point here is that science is based on observation, not pure reason. Doing science means either understanding the observations made by previous scientists (i.e. "facts") or making the observations yourself. Doing science means making predictions based on theories, then conducting experiments to see if the prediction is correct. There is no science without observation.

The WIRED writer poses a similar question about a fan pushing an object across a frictionless surface. It's a silly question because, presumably, we are supposed to assume air exists for the fan to work, but that air doesn't exist to slow things down. In any event, you can't really reason about this without first learning the scientific concept of "mass" and Newtonian equations like F=ma, both based on observation. The writer insists that to "do science" means approaching this problem with an Aristotelian method of pure reasoning, divorced from previous scientific knowledge.

Similarly, he poses the question of the phases of the moon if it were a cube instead of a sphere. Well, this has complications. I doubt the face of the moon would appear to be a square, as my understanding of orbital mechanics suggests that it'd be a corner facing the earth rather than a flat side (assuming it stayed tidally locked). But even assuming we got a cubic face, there are still the problems of inclined orbits and libration. Finally, he poses the question at the precise moment when such a side would emerge from the shadow and become lit -- so it's impossible to say whether the side would be dark or lit. It's stupid to reason about this; it's something that ought to be observed, if only with a computer model. I guess the thing you ought to learn is that the entire face of the cube is either all light or all dark, unlike a sphere, which gets partially lit.

That WIRED writer says science is not about knowing the difference between a "planet" and a "dwarf planet" like Pluto. He's wrong. Pluto is much smaller than the 8 planets. Whereas the 8 planets have nearly circular orbits in the same plane, Pluto has a highly elliptical orbit that sometimes takes it inside the orbit of Neptune and far above the orbital plane of the other planets. Moreover, in recent years, we have observed many other Pluto-sized objects that share these same characteristics with Pluto. Yes, the names that we give these things don't matter, but the observed differences matter a heck of a lot. Science is about knowing these observations. That we teach students the names of planets, but not what we observe about them, is a travesty that leads to illiteracy.

Science is sadly politicized, such as with issues like Climate-Change/Global-Warming. We are expected to treat Science as some sort of religion, where the common people are unable to read the Latin Bible. We are not expected to understand things like "absorption spectra" or "thermal infrared". To point out that scientific observations have shown that hurricanes haven't, in fact, gotten worse is considered heresy, because it denies computer models that claim hurricanes will get worse. Climate change is a problem we need to address, but with science rather than the current scientific illiteracy.

Scientific literacy starts with understanding what science is, namely that it's based on observation, coming up with theories/hypotheses to explain the observations, then relentlessly testing those theories, trying to prove them wrong. Secondly, scientific literacy means learning the observations made by scientists over the last few hundred years. Believing in Evolution doesn't make you scientifically literate; understanding radioisotope dating and rock strata does.

What this WIRED article highlights is that science illiteracy is so pervasive it even infects science writers at major publications. What you should do about this is pick up a book and try to cure your own illiteracy.

Thursday, August 20, 2015

A lesson in BitTorrent

Hackers have now posted a second dump of Ashley-Madison data, this time 20 gigabytes worth. Many people, mostly journalists, are eagerly downloading it. However, at the time of this writing, nobody has finished downloading it yet. None of the journalists have a complete copy, so you aren't seeing any new stories about the contents. The file name promises the full email spool of the CEO, but no journalist has yet looked into that mail spool and reported a story. Currently, the most any journalist has is 85% of the dump, slowly downloading the rest at 37 kilobytes/second.

Why is that? Is AshMad doing some sort of counter-attack to stop the download (like Sony did)? Or is the server overloaded because too many people are trying to download?

No, it's because it hasn't finished seeding.

BitTorrent is p2p (peer-to-peer). You download chunks from the other peers, aka. the swarm, not just from the original source (the seeder; the tracker merely coordinates the swarm). Instead of slowing down as more people join the swarm to download the file(s), BitTorrent downloads become faster -- the more people you can download from, the faster it goes.

But 9 women can't make a baby in 1 month. The same goes for BitTorrent. You can only download a chunk from a peer if that peer already has it. That's the current problem with the AshMad dump: everyone combined has only 85% of all possible chunks. The remaining 15% of the chunks haven't been uploaded to the swarm yet. Nobody has a complete copy. The original seeder is uploading at a rate of 37 kilobytes/second, handing off the next chunk to a random person in the swarm, who quickly exchanges it with everyone else in the swarm.

Thus, we see something like the following image, where everyone is stuck at 85% download:

It'll take many more hours until this is complete.

I point this out because it's a useful real-world lesson for BitTorrent. Peer-to-peer speeds up downloads in ideal cases, but it can't overcome physics. Physics, in this case, means that nobody yet has a complete 100% copy, so nobody else can download one.

AshMad is prostitution not adultery

The Ashley-Madison website advertises adultery, but that's a lie. I've talked to a lot of users of the site, and none of them used it to cheat on their spouse. Instead, they used it as just a "dating" site -- and even that is a misnomer, since "dating" often just means a legal way to meet prostitutes. According to several users, prostitutes are really the only females they'd consistently meet on Ashley-Madison.

In other words, Ashley-Madison is a prostitution website, not an adultery website. "Cheating" is just the hook, to communicate to the users that they should expect sex, but not a future spouse. And the website is upfront about charging for it.

I point this out because a lot of people have gone over-the-top on the adultery angle, such as this piece in The Intercept. That's rather silly, since Ashley-Madison wasn't really about adultery in the first place.

Wednesday, August 19, 2015

Trump is right about the 14th Amendment

Trump sucks all the intelligence out of the room, converting otherwise intelligent and educated pundits into blithering idiots. Today's example is the claim that Trump said:
"The 14th Amendment is unconstitutional."
Of course he didn't say that. What he did say is that the 14th Amendment doesn't obviously grant "birthright citizenship" to "anchor babies". And he's completely correct. The 14th Amendment says:
"All persons born or naturalized in the United States, and subject to the jurisdiction thereof, are citizens of the United States"
The complicated bit is the clause set off by commas. If you remove that clause, then of course Trump would be wrong, since the amendment would clearly say that being born in the U.S. grants citizenship.

But the phrase is there, so obviously some babies born in the U.S. aren't guaranteed (by the constitution) citizenship. Which babies are those?

The immigration law 8 U.S.C. § 1408(a) lists some of them: babies of ambassadors, heads of state, and military prisoners. [UPDATE: this appears wrong, I saw it in many Internet posts, but it appears to be untrue. But, it doesn't change the conclusion. I'll update this post again when I figure this out].

It's this law that currently grants babies citizenship, not the constitution. Laws can be changed by Congress. Presumably, "illegal aliens" could easily be added to the list.

This would be challenged, of course, and it'd probably work its way up to the Supreme Court, at which point they'd rule definitively on whether the Constitution grants all babies citizenship. The point is simply that the Supreme Court hasn't ruled yet. Nobody can cite a Supreme Court decision clearly disproving Trump.

Thus, if you listen to Trump's remarks that everyone is criticizing, you'll see that he's right. Not all babies are granted citizenship (those of foreign ambassadors, heads of state, and military prisoners). Lots of legal scholars believe the same extends to babies of illegal aliens. There is a good chance the Supreme Court would rule in Trump's favor on the issue if current immigration law were changed. (And likewise, a good chance they'd rule against him).

My point is this. Trump is a filthy populist troll. Don't feed the trolls. No really, stop it. It's like Kansas farmer's advice: Never wrestle a pig. The pig loves it, and you'll just get muddy. Trump is going to say lots of crazy things. Just ignore him rather than descending to his level and saying crazy/dumb things back.

The closest we have to a Supreme Court decision on the matter is Plyler v. Doe, which deals with a separate issue. Its discussion of "jurisdiction" could potentially apply to newborns.

The second closest is United States v. Wong Kim Ark, which (as the Wikipedia article says) many legal scholars do not think applies to illegal immigrants.

Notes on the Ashley-Madison dump

Ashley-Madison is a massive dating site that claims 40 million users. The site is specifically for those who want to cheat on their spouse. Recently, it was hacked. Yesterday, the hackers published the dumped data.

It appears legit. I asked my Twitter followers for those who had created accounts. I have verified multiple users of the site, one of whom used a throw-away email address only on that site. Assuming my followers aren't lying, this means the dump is confirmed. Update: one follower verified that the last 4 digits of his credit-card number and his billing address were exposed.

It's over 36 million accounts. That's not quite the 40 million they claim, but it's pretty close. However, glancing through the data, it appears that a lot of the accounts are bogus -- obviously made-up entries from people who just wanted to look at the site without creating a "real" account.

It's heavily male. I count 28 million men to 5 million women, according to the "gender" field in the database (with 2 million undetermined). However, glancing through the credit-card transactions, I find only male names.

It's full account information. This includes full name, email, and password hash as you'd expect. It also includes dating information, like height, weight, and so forth. It appears to contain addresses, as well as GPS coordinates. I suspect that many people created fake accounts, but with an app that reported their real GPS coordinates.

Passwords hashed with bcrypt. Almost all the records appear to be protected with bcrypt. This is a refreshing change. Most of the time when we see big sites hacked, the passwords are protected either poorly (with MD5) or not at all (in "clear text", so that they can be immediately used to hack people). Hackers will be able to "crack" many of these passwords where users chose weak ones, but users who chose strong passwords are safe.

Maybe 250k deleted accounts. There are about 250k accounts that appear to have the password information removed. I don't know why, maybe it's accounts that have paid to be removed. Some are marked explicitly as such, others imply that.

Partial credit card data. It appears to have credit card transaction data -- but not the full credit card number. It does have full name and addresses, though. This is data that can "out" serious users of the site.

You can download everything via BitTorrent. The magnet hash is
40ae8a90de40ca3afa763c8edb43fc1fc47d75f1. If you've got BitTorrent installed, you can use this to download the data. It's 9.7 gigabytes compressed, so you'll need a good Internet connection.

The hackers call themselves the "Impact Team". Their manifesto is here. They appear to be motivated by the immorality of adultery, but in all probability, their motivation is #1 it's fun and #2 they can. They probably used phishing, SQL injection, or re-used account credentials to break in.

They deserve some praise. Compared to other large breaches, it appears Ashley-Madison did a better job at cybersecurity. They tokenized credit-card transactions and didn't store full credit-card numbers. They hashed passwords correctly with bcrypt. They stored email addresses and passwords in separate tables, to make grabbing them (slightly) harder. Thus, this hasn't become the massive breach of passwords and credit-card numbers that other large breaches have led to. They deserve praise for this.

Josh Duggar. This Gawker article appears correct from my reading of the data.


Thursday, July 30, 2015

A quick review of the BIND9 code

BIND9 is the oldest and most popular DNS server. Today, a DoS vulnerability was announced that would crash the server with a single crafted query. I could use my "masscan" tool to blanket the Internet with those packets and crash all publicly facing BIND9 DNS servers in about an hour. A single vuln doesn't mean much, but if you look at the recent BIND9 vulns, you see a pattern forming. BIND9 has lots of problems -- problems that critical infrastructure software should not have.

Its biggest problem is that it has too many features. It attempts to implement every possible DNS feature known to man, few of which are needed on publicly facing servers. Today's bug was in the rarely used "TKEY" feature, for example. DNS servers exposed to the public should have the minimum number of features -- a server priding itself on having the maximum number of features is automatically disqualified.

Another problem is that DNS itself has some outdated design issues. The control-plane and data-plane need to be separate. This bug is in the control-plane code, but it's exploited from the data-plane. (The data-plane is queries from the Internet looking up names; the control-plane is zone updates, key distribution, and configuration.) The control-plane should be on a separate network adapter, separate network address, and separate port numbers. These should be hidden from the public, and protected by a firewall.
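In BIND's own configuration, this separation can at least be approximated. A sketch (the addresses, network, and key name are placeholders, not a recommended production config):

```
options {
    // data-plane: answer public queries on the public interface only
    listen-on { 203.0.113.1; };
};

controls {
    // control-plane: rndc management only on an internal address,
    // protected by a key and (ideally) a firewall
    inet 192.0.2.1 port 953 allow { 192.0.2.0/24; } keys { "ctrl-key"; };
};
```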

DNS should have hidden masters, servers with lots of rich functionality, such as automatic DNSSEC zone signing. It should have lightweight exposed slaves, with just enough code to answer queries on the data-plane, and keep synchronized with the master on the control-plane.

But what this post is really about is looking at BIND9's code. It's nicer than the OpenSSL code and some other open-source projects, but there do appear to be some issues. The bug was in the "dns_message_findname()" function. The function header looks like:

isc_result_t
dns_message_findname(dns_message_t *msg, dns_section_t section,
    dns_name_t *target, dns_rdatatype_t type,
    dns_rdatatype_t covers, dns_name_t **name,
    dns_rdataset_t **rdataset);

The thing you should notice here is that none of the parameters are declared const, even though all but one of them should be. A quick grep shows that lack of const correctness is pretty common throughout the BIND9 source code. Every quality guide in the world strongly suggests const correctness -- that it's lacking here hints at larger problems.
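For comparison, here's what const correctness looks like on a small, hypothetical lookup function (not BIND code): everything the function merely reads is declared const, and only the true out-parameter is not.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup illustrating const correctness: the compiler now
 * enforces that 'haystack' and 'needle' are read-only; only the
 * out-parameter can be written through. */
int find_name(const char *haystack, const char *needle,
              const char **out_location) {
    const char *p = strstr(haystack, needle);  /* read-only search */
    if (p == NULL)
        return 0;
    *out_location = p;   /* the one thing we actually write */
    return 1;
}
```

A caller passing a string literal or shared buffer gets a compile-time guarantee the function won't modify it.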

The bug was an assertion failure on the "name" parameter in the code above, as you can see in the picture. An assertion is supposed to double-check the internal consistency of data, to catch bugs early. But in this case, there was no bug being caught -- the assertion itself was the problem. The programmers were confused by the difference between in, out, and in/out parameters. You assert on the expected values of in and in/out parameters, but not on write-only out parameters. Since the function doesn't read them, their value is immaterial. If the function wants an out parameter to be NULL on input, it can just set it itself -- demanding that the caller do this is just bad.
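The rule can be illustrated with a small, hypothetical function (not the actual BIND code):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the in/out parameter rule. 'input' is read, so we assert on
 * its value; 'result' is write-only, so we assert only that the pointer
 * itself is valid -- then initialize it ourselves, rather than demanding
 * the caller pre-set it to NULL. */
void lookup(const int *input, int **result) {
    assert(input != NULL);     /* in-parameter: check what we read */
    assert(result != NULL);    /* the pointer must be writable... */
    *result = NULL;            /* ...but its prior value is immaterial */
    /* ... do the lookup, possibly setting *result ... */
}
```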

By the way, assertions are normally enabled only for testing, not for production code. That's because they can introduce bugs (as in this case), and have performance costs. However, in the long run, aggressive double-checking leads to more reliable code, so I'm a fan of it. Quickly glancing at the recent BIND9 vulns, though, it appears many of them are caused by failing assertions. This may be good, meaning the code was going to crash (or get exploited) anyway and the assertion caught it early. Or it may be bad, with the assertion itself being the bug, or at least the user would've been happier without the assertion triggering (accepting, say, a memory leak instead). If the latter is the case, then it sounds like people should just turn off assertions when building BIND9 (it's a single command-line switch).

Last year, ISC (the organization that maintains BIND9) finished up their BIND10 project, which was to be a rewrite of the code. It was a fiasco, of course: rewrites of large software projects are doomed to failure. The only path forward for BIND is the current code-base. This means refactoring and cleaning up technical debt on a regular basis, such as fixing the const correctness problem. This means arbitrarily deciding to drop support for 1990s-era computers when necessary. If the architecture needs to change (such as separating the data-plane from the control-plane), it can be done within the current code-base -- just create a solid regression test, then go wild on the changes, relying upon the regression test to maintain quality.

Lastly, I want to comment on the speed of BIND9. It's dog slow -- the slowest of all the DNS servers. That's a problem, firstly, because slow servers should not be exposed to DDoS attacks on the Internet. It's a problem, secondly, because slow servers should not be written in dangerous languages like C/C++. These languages should only be used when speed is critical. If your code isn't fast anyway, you should be using safe languages like C#, Java, or JavaScript. A DNS server written in these languages is unlikely to be any slower than BIND9.


The point I'm trying to make here is that BIND9 should not be exposed to the public. It has code problems that should be unacceptable in this day and age of cybersecurity. Even if it were written perfectly, it has far too many features to be trustworthy. Its feature-richness makes it a great hidden master; it's just that all those features get in the way of it being a simple authoritative slave server, or a simple resolver. They shouldn't rewrite it from scratch, but if they did, they should choose a safe language and not use C/C++.

Example#2: strcpy()

BIND9 has 245 instances of the horribly unsafe strcpy() function, spread through 94 files. This is unacceptable -- yet another technical debt they need to fix. It needs to be replaced with the strcpy_s() function.

In the file lwresutil.c is an example of flawed thinking around strcpy(). It's not an exploitable bug, at least not yet, but it's still flawed.

{
	unsigned int target_length;

	target_length = strlen(name);
	if (target_length >= sizeof(target_name))
		return (LWRES_R_FAILURE);
	strcpy(target_name, name);	/* strcpy is safe */

The problem here is the type of target_length. On a 64-bit machine, an unsigned int is only 32 bits, but string lengths can be longer than a 32-bit value can hold. Thus, a 4-billion-byte name would cause the integer to overflow and the length check to fail. I don't think you can get any name longer than 256 bytes through this code path, so it's likely not vulnerable now, but the "4 billion bytes of data" problem is pretty common in other code, and frequently exploitable in practice.
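The truncation bug can be demonstrated in miniature (a sketch, assuming the typical platform where unsigned int is 32 bits):

```c
#include <stdint.h>

/* The truncation bug in miniature. A length of 4GB+1 bytes stored in a
 * 32-bit unsigned int wraps around to 1, so the bounds check wrongly
 * reports "safe" and a following strcpy() would smash the buffer.
 * Using size_t for the length avoids the wraparound. */
int length_check_says_safe(uint64_t real_length, uint64_t buffer_size) {
    unsigned int truncated = (unsigned int)real_length;  /* the bug */
    return truncated < buffer_size;
}
```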

The comment /* strcpy is safe */ is no more accurate than those emails that claim "Checked by anti-virus".

Modern code should never use strcpy(), at all, under any circumstances -- not even in unit-test code where it doesn't matter. It's easy to manage a project by simply grepping for "strcpy()": either it exists or it doesn't. It's hard managing a project with just some strcpy()s. It's like being a little bit pregnant.

Wednesday, July 22, 2015

Infosec's inability to quantify risk

Infosec isn't a real profession. Among the things missing is proper "risk analysis". Instead of quantifying risk, we treat it as an absolute. Risk is binary, either there is risk or there isn't. We respond to risk emotionally rather than rationally, claiming all risk needs to be removed. This is why nobody listens to us. Business leaders quantify and prioritize risk, but we don't, so our useless advice is ignored.

An example of this is the car hacking stunt by Charlie Miller and Chris Valasek, where they turned off the engine at freeway speeds. This has led to an outcry of criticism in our community from people who haven't quantified the risk. Any rational measure of the risk of that stunt is that it's pretty small -- while the benefits are very large.

In college, I owned a poorly maintained VW bug that would occasionally lose power on the freeway, such as from an electrical connection falling off from vibration. I caused more risk by not maintaining my car than these security researchers did.

Indeed, cars losing power on the freeway is a rather common occurrence. We often see cars on the side of the road. Few accidents are caused by such cars. Sure, they add risk, but so do people abruptly changing lanes.

No human is a perfect driver. Every time we get into our cars, instead of cycling or taking public transportation, we add risk to those around us. The majority of those criticizing this hacking stunt have caused more risk to other drivers this last year by commuting to work. They cause this risk not for some high ideal of improving infosec, but merely for personal convenience. Infosec is legendary for its hypocrisy; this is just one more example.

Google, Tesla, and other companies are creating "self driving cars". Self-driving cars will always struggle to cope with unpredictable human drivers, and will occasionally cause accidents. However, in the long run, self-driving cars will be vastly safer. To reach that point, we need to quantify risk. We need to be able to show that for every life lost due to self-driving cars, two have been saved because they are inherently safer. But here's the thing, if we use the immature risk analysis from the infosec "profession", we'll always point to the one life lost, and never quantify the two lives saved. Using infosec risk analysis, safer self-driving cars will never happen.

In hindsight, it's obvious to everyone that Valasek and Miller went too far. Renting a track for a few hours costs less than the plane ticket for the journalist to come out and visit them. Infosec is like a pride of lions that'll leap on and devour one of their members when he shows a sign of weakness. This minor mistake is weakness, so many in infosec have jumped on the pair, reveling in righteous rage. But any rational quantification of the risks shows that the mistake is minor compared to the huge benefit of their research. I, for one, praise these two, and hope they continue their research -- knowing full well that they'll likely make other minor mistakes in the future.

Monday, July 20, 2015

My BIS/Wassenaar comment

This is my comment I submitted to the BIS on their Wassenaar rules:


I created the first “intrusion prevention system”, as well as many tools and much cybersecurity research over the last 20 years. I would not have done so had these rules been in place. The cost and dangers would have been too high. If you do not roll back the existing language, I will be forced to do something else.

After two months, reading your FAQ, consulting with lawyers and export experts, the cybersecurity industry still hasn’t figured out precisely what your rules mean. The language is so open-ended that it appears to control everything. My latest project is a simple “DNS server”, a piece of software wholly unrelated to cybersecurity. Yet, since hackers exploit “DNS” for malware command-and-control, it appears to be covered by your rules. It’s specifically designed for both the distribution and control of malware. This isn’t my intent, it’s just a consequence of how “DNS” works. I haven’t decided whether to make this tool open-source yet, so therefore traveling to foreign countries with the code on my laptop appears to be a felony violation of export controls.

Of course you don’t intend to criminalize this behavior, but that isn’t the point. The point is that the rules are so vague that they become impossible for anybody to know exactly what is prohibited. We therefore have to take the conservative approach. As we’ve seen with other vague laws, such as the CFAA, enforcement is arbitrary and discriminatory. None of us would have believed that downloading files published on a public website would be illegal until a member of community was convicted under the CFAA for doing it. None of us wants to be a similar test case for export controls. The current BIS rules are so open-ended that they would have a powerful chilling effect on our industry.

The solution, though, isn’t to clarify the rules, but to roll them back. You can’t clarify the difference between good/bad software because there is no difference between offensive and defensive tools -- just the people who use them. The best way to secure your network is to attack it yourself. For example, my “masscan” tool quickly scans large networks for vulnerabilities like “Heartbleed”. Defenders use it to quickly find vulnerable systems, to patch them. But hackers also use my tool to find vulnerable systems to hack them. There is no solution that stops bad governments from buying “intrusion” or “surveillance” software that doesn’t also stop their victims from buying software to protect themselves. Export controls on offensive software means export controls on defensive software. Export controls mean the Sudanese and Ethiopian people can no longer defend themselves from their own governments.

Wassenaar was intended to stop “proliferation” and “destabilization”, yet intrusion/surveillance software is neither of those. Human rights activists have hijacked the arrangement for their own purposes. This is a good purpose, of course, since these regimes are evil. It’s just that Wassenaar is the wrong way to do this, with a disproportionate impact on legitimate industry, while at the same time, hurting the very people it’s designed to help. Likewise, your own interpretation of Wassenaar seems to have been hijacked by the intelligence community in the United States for their own purposes to control “0days”.

Rather than the current open-ended and vague interpretation of the Wassenaar changes, you must do the opposite, and create the narrowest of interpretations. Better yet, you need to go back and renegotiate the rules with the other Wassenaar members, as software is not a legitimate target of Wassenaar control. Computer code is not a weapon; if you make it one, then you’ll destroy America’s standing in the world. On a personal note, if you don’t drastically narrow this, my research and development will change. Either I will stay in this country and do something else, or I will move out of this country (despite being a fervent patriot).

Robert Graham
Creator of BlackICE, sidejacking, and masscan.
Frequent speaker at cybersecurity conferences.

Wednesday, July 15, 2015

Software and the bogeyman

This post about the July 8 glitches (United, NYSE, WSJ failed) keeps popping up in my Twitter timeline. It's complete nonsense.

What's being argued here is that these glitches were due to some sort of "moral weakness", like laziness, politics, or stupidity. It's a facile and appealing argument, so scoundrels make it often -- to great applause from the audience. But it's not true.


Layers and legacies exist because working systems are precious. More than half of big software projects are abandoned, because getting new things to work is a hard task. We place so much value on legacy, working old systems, because the new replacements usually fail.

An example of this is the failed BIND10 project. BIND, the Berkeley Internet Name Domain server, is the oldest and most popular DNS server. It is the de facto reference standard for how DNS works, more so than the actual RFCs. Version 9 of the project is 15 years old. Therefore, the consortium that maintains it funded development of version 10. They completed the project, then effectively abandoned it, as it was worse in almost every way than the previous version.

The reason legacy works well is the enormous regression testing that goes on. In robust projects, every time there is a bug, engineers create one or more tests to exercise the bug, then add that to the automated testing, so that from now on, that bug (or something similar) can never happen again. You can look at a project like BIND9 and point to the thousands of bugs it's had over the last decade. So many bugs might make you think it's bad, but the opposite is true: it means that it's got an enormous regression test system that stresses the system in peculiar ways. A new replacement will have just as many bugs -- but no robust test that will find them.
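The practice is simple enough to show concretely. A hypothetical example (the parser, the bug, and the bug number are all invented for illustration): once a crash is fixed, a tiny test pins the fix forever.

```c
#include <assert.h>
#include <string.h>

/* A hypothetical parser that once crashed on empty/NULL input. */
int parse_label_length(const char *label) {
    if (label == NULL || label[0] == '\0')
        return -1;               /* the fix: reject degenerate input */
    return (int)strlen(label);
}

/* The regression test pins the old bug forever: if anyone reintroduces
 * the crash, the automated test suite catches it immediately. */
void regression_test_empty_label(void) {
    assert(parse_label_length("") == -1);      /* bug #1234 (hypothetical) */
    assert(parse_label_length(NULL) == -1);
    assert(parse_label_length("www") == 3);
}
```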

A regression test is often more important than the actual code. If you want to build a replacement project, start with the old project's regression test. If you are a software company and want to steal your competitor's intellectual property, ignore their source code; steal their regression test instead.

People look at the problems of legacy and believe that we'd be better off without it, if only we had the will (the moral strength) to do the right thing and replace old systems. That's rarely true. Legacy is what's reliable and working -- it's the new stuff that is ultimately untrustworthy and likely to break. You should worship legacy, not fear it.

Technical debt

Almost all uses of the phrase "technical debt" call it a bad thing. The opposite is true. The word was coined to refer to a good thing.

The analogy is financial debt. That, too, is used incorrectly as a pejorative. People focus on the negatives, like the tiny percentage of bankruptcies. They don't focus on the positives: what that debt finances, like factories, roads, and education. Our economic system is "capitalism", where "capital" just means "debt". The dollar bills in your wallet are a form of debt. When you contribute to society, it becomes indebted to you, and gives you a marker, which you can later redeem for something you want, like a beer at your local bar.

The same is true of technical debt. It's a good thing, a necessary thing. The reason we talk about technical debt isn't so that we can get rid of it, but so that we can keep track of it and exploit it.

The Medium story claims:
A lot of new code is written very very fast, because that’s what the intersection of the current wave of software development (and the angel investor / venture capital model of funding) in Silicon Valley compels people to do.
This is nonsense. Every software project of every type has technical debt. Indeed, it's open-source that overwhelmingly has the most technical debt. Most open-source software starts as somebody wanting to solve a small problem now. If people like the project, then it survives, and more code and features are added. If people don't like it, the project disappears. By sheer evolution, that which survives has technical debt. Sure, some projects are better than others at going back and cleaning up their debt, but it's something intrinsic to all software engineering.

Figuring out what users want is 90% of the problem; how the code works is only 10%. Most software fails because nobody wants to use it. Focusing on removing technical debt, investing many times more effort in creating the code, just magnifies the cost of failure when your code still doesn't do what users want. The overwhelmingly correct development methodology is to incur lots of technical debt at the start of every project.

Technical debt isn't about bugs. People like to equate the two, as both are seen as symptoms of moral weakness. Instead, technical debt is about the fact that fixing bugs (or adding features) gets more expensive the more technical debt you have. If a section of code is bug-free, and unlikely to be extended in the future, then there's no payback for cleaning up its technical debt. On the other hand, if you are constantly struggling with a core piece of code, making lots of changes to it, then you should refactor it, cleaning up the technical debt so that changes become cheaper.
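A toy illustration of that payback calculation (my own example, not from the article): neither version below is buggy, but the first charges "interest" every time a new format is added, which only matters if formats actually keep getting added.

```python
# Quick version: works fine, ships today. The "debt" is that every
# new format means editing this if/elif chain by hand.
def export(data, fmt):
    if fmt == "csv":
        return ",".join(str(x) for x in data)
    elif fmt == "tsv":
        return "\t".join(str(x) for x in data)
    else:
        raise ValueError(fmt)

# Refactored version: new formats become a one-line table entry.
# Worth the cleanup only if this code is under constant change --
# if nobody ever touches it again, the debt never costs you anything.
SEPARATORS = {"csv": ",", "tsv": "\t"}

def export2(data, fmt):
    try:
        sep = SEPARATORS[fmt]
    except KeyError:
        raise ValueError(fmt)
    return sep.join(str(x) for x in data)
```

The point of tracking debt is knowing which of your `export`-style functions are being changed often enough to justify the `export2` treatment.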

In summary, technical debt is not some sort of weakness in code that needs to be fought, but merely an aspect of code that needs to be managed.


The Medium story also claims:
More and more, software ends up interacting with other software. This causes unexpected things to happen.

That's true, but the alternative is worse. As a software engineer building a system, you can either link together existing bits of code, or try to "reinvent the wheel" and write those bits yourself. Reinventing is sometimes good, because you get something tailored for your purpose without all the unnecessary complexity. But more often you experience the rewriting problem I describe above: your new code is untested and buggy, as opposed to the well-tested, robust, albeit complex module that you avoided.

The reality of complexity is that we demand it of software. We all want Internet-connected lightbulbs in our homes that we can turn on/off with a smartphone app while vacationing in Mongolia. This demands a certain level of complexity. We like such complexity -- arguing that we should get rid of it and go back to a simpler time of banging rocks together is unacceptable.

When you look at why glitches at United, the NYSE, and the WSJ happen, it's because once they've got a nice robust system working, they can't resist adding more features to it. It's like bridges. Over decades, bridge builders get more creative and less conservative. Then a bridge fails because builders were too aggressive, and the entire industry moves back toward being more conservative, overbuilding bridges, and being less creative about new designs. It's been like that for millennia. It's a good thing; have you seen the new bridges lately? Sure, this approach has a long-term cost, but it also has benefits that more than make up for those costs. Yes, the NYSE will go down for a few hours every couple of years because of a bug they've introduced into their system, but the new features are worth it.

By the way, I want to focus on the author's quote:
Getting rid of errors in code (or debugging) is a beast of a job
There are two types of software engineers. One type avoids debugging, finding it an unpleasant aspect of their job. The other kind thinks debugging is their job -- that writing code is just the brief interlude before you start debugging. The first kind often gives up on bugs, finding them unsolvable. The second type quickly finds every bug they encounter, even the most finicky kind. Every large organization is split between these two camps: those busy writing code causing bugs, and the other camp fixing them. You can tell which camp the author of this Medium story falls into, and I have enormous disrespect for such people.

"Lack of interest in fixing the actual problem"

The NYSE already agrees that uptime and reliability are the problem, above all others, that they have to solve. If they have a failure, it doesn't mean they aren't focused on failures as the problem.

But in truth, it's not as big a problem as they think. The stock market doesn't actually need to be that robust. It's more likely to "fail" for other reasons. For example, every time a former President dies (as in the case of Ford, Nixon, and Reagan), the markets close for a day in mourning. Likewise, wild market swings caused by economic conditions will automatically shut down the market, as they did in China recently.

Insisting that code be perfect is absurd, and impossible. Instead, the only level of perfection the NYSE needs is so that glitches in code shut down the market less often than dead presidents or economic crashes.

The same is true of United Airlines. Sure, a glitch grounded their planes, but weather and strikes are a bigger problem. If you think grounded airplanes are such an unthinkable event, then the first thing you need to do is ban all unions. I'm not sure I disagree with you, since it seems every flight I've had through Charles de Gaulle airport in Paris has been delayed by a strike (seriously, what is wrong with the French?). But that's the sort of thing you are effectively demanding.

The only people who think that reliability and uptime are "the problem" that needs to be fixed are fascists. They made trains "run on time" by imposing huge taxes on the people to overbuild the train system, then putting guns to the heads of the conductors, making them indentured servants. The reality is that "glitches" are not "the problem" -- making systems people want to use is the problem. Nobody likes it when software fails, of course, but that's like saying nobody likes losing money when playing poker. It's a risk vs. reward, we can make software more reliable but at such a huge cost that it would, overall, make software less desirable.


Security and reliability are tradeoffs. Problems happen not because engineers are morally weak (political, stupid, lazy), but because small gains in security/reliability would require enormous sacrifices in other areas, such as features, cost, performance, and usability. But "tradeoffs" are a complex answer, requiring people to think. "Moral weakness" is a much easier, and more attractive, answer that doesn't require much thought, since everyone is convinced everyone else (but them) is morally weak. This is why so many people in my Twitter timeline keep mentioning that stupid Medium article.

More ProxyHam stuff

Somebody asked how my solution in the last post differed from the "ProxyGambit" solution. They missed my point. Just because I change the tires on the car doesn't mean I get credit for inventing or building the car. The same thing with this ProxyHam nonsense: nobody is "building a solution". Instead, we are all just using existing products the way they are intended. We are all just choosing a different mix of components.

People get all excited when they see a bare Raspberry Pi board, but the reality is that there's nothing interesting going on here, no more than lifting the hood/bonnet on your car. This is a photograph from ProxyGambit:

What ProxyGambit is doing here is using cellular data on the far end rather than stealing WiFi from Starbucks or the local library. Their solution looks fancy, but you can do the same thing with off-the-shelf devices for a lot cheaper. Here is the same solution with off-the-shelf products:

This is just a TL-WR703N ($26) router with a 3G USB dongle. You can get these dongles cheap off eBay used, or new for around $17. Combined, they are cheaper than a Raspberry Pi. If you want to customize this, you can replace the firmware on the router with dd-wrt/OpenWRT Linux.
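If you do reflash with OpenWRT, bringing up a 3G dongle as the WAN link is just a few lines of stock configuration. A sketch using the classic `3g` protocol (the tty device, APN, and service values are examples; yours depend on your carrier and dongle, and newer dongles use the `qmi` or `ncm` protocols instead):

```shell
# Configure the 3G dongle as the WAN interface via OpenWrt's uci tool.
uci set network.wan=interface
uci set network.wan.proto='3g'           # legacy serial-modem protocol
uci set network.wan.device='/dev/ttyUSB0' # dongle's modem tty (example)
uci set network.wan.service='umts'
uci set network.wan.apn='internet'        # carrier-specific APN (example)
uci commit network
/etc/init.d/network restart
```

That's the whole "build": stock firmware features, no custom code, which is exactly the point about nobody here really inventing anything.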

Like my solution, they chose Ubiquiti's NanoStation. However, they went with the 2.4 GHz version (locoM2 for $49) rather than my choice of 900 MHz (locoM9 for $125). There's also a 5 GHz version one could choose (locoM5 for $62).

The 900 MHz, 2.4 GHz, and 5 GHz bands are all unlicensed ISM bands. They all require relatively direct line-of-sight. The 5 GHz band requires absolutely no obstructions -- you have to be able to see the other end of the connection with binoculars. The 2.4 GHz band allows some light foliage in the way. The 900 MHz band is very forgiving, allowing heavy foliage and possibly a house in the way.
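The frequency dependence can be made concrete with the standard free-space path loss formula, FSPL(dB) = 20·log10(d_km) + 20·log10(f_MHz) + 32.44. A quick calculation (my own illustration; real links lose far more to foliage and obstructions than this free-space figure, which is the effect described above):

```python
import math

def fspl_db(distance_km, freq_mhz):
    # Free-space path loss in dB for a clear line-of-sight link.
    return 20 * math.log10(distance_km) + 20 * math.log10(freq_mhz) + 32.44

# Loss over a 1 km link for each unlicensed band:
for f_mhz in (900, 2400, 5800):
    print(f"{f_mhz} MHz: {fspl_db(1.0, f_mhz):.1f} dB")

# The 2.4 GHz band starts ~8.5 dB worse than 900 MHz, and 5.8 GHz
# ~16 dB worse, even before any foliage penalty -- one reason the
# lower band is so much more forgiving.
```

This is only part of the story (antenna gain and foliage absorption also scale with frequency), but it shows why the cheap 2.4/5 GHz radios demand a cleaner line of sight than the 900 MHz gear.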

The upshot is that the difference between "ProxyGambit" and my solution is the use of a cellular modem on the far end rather than hitching a ride with Starbucks, and the choice of a 2.4 GHz for the long distance connection rather than 900 MHz. But don't be limited by these choices -- there is a huge range of choices that can be made here. ProxyGambit made some interesting choices -- give it a try yourself and make some different ones.