Sunday, August 30, 2015

About the systemd controversy...

As a troll, one of my favorite targets is "systemd", because it generates so much hate on both sides. For bystanders, I thought I'd explain what that is. To begin with, I'll give a little background.

An operating system like Windows, Mac OS X, or Linux comes in two parts: a kernel and userspace. The kernel is the essential bit, though on the whole, most of the functionality is in userspace.

The word "Linux" technically only refers to the kernel itself. There are many optional userspaces that go with it. The most common is called BusyBox, a small bit of userspace functionality for the "Internet of Things" (home routers, TVs, fridges, and so on). The second most common is Android (the mobile phone system), with a Java-centric userspace on top of the Linux kernel. Finally, there are the many Linux distros for desktops/servers like RedHat Fedora and Ubuntu -- the ones that power most of the servers on the Internet. Most people think of Linux in terms of the distros, but in practice, they are a small percentage of the billions of BusyBox and Android devices out there.

The first major controversy in Linux was the use of what's known as the microkernel, an idea that removes most traditional kernel functionality and puts it in userspace instead. It was all the rage among academics in the early 1990s. Linus famously rejected the microkernel approach. Apple's Mac OS X was originally based on a microkernel, but they have since moved large bits of functionality back into the kernel, so it's no longer a microkernel. Likewise, Microsoft has moved a lot of functionality from userspace into the Windows kernel (such as font rendering), leading to important vulnerabilities that hackers can exploit. Academics still love microkernels today, but in the real world they're too slow.

The second major controversy in Linux is the relationship with the GNU project. The GNU project was created long before Linux in order to create a Unix-like operating system. They failed at creating a usable kernel, but produced a lot of userland code. Since most of the key parts of the userland code in Linux distros come from GNU, some insist on saying "GNU/Linux" instead of just "Linux". If you are thinking this sounds a bit childish, then yes, you are right.

Now we come to the systemd controversy. It started as a replacement for something called init. A running Linux system has about 20 different programs running in userspace. When the system boots up, it has only one, a program called "init". This program then launches all the remaining userspace programs.

This init system harks back to the original creation of Unix back in the 1970s, and is a bit of a kludge. It worked fine back then when systems were small (when 640k of memory was enough for anybody), but works less well on today's huge systems. Moreover, the slight difference in init details among the different Linux distros, as well as other Unix systems like Mac OS X, *BSD, and Solaris, is a constant headache for those of us who have to sysadmin these boxes.

Systemd replaces the init kludge with a new design. It's a lot less kludgy. It runs the same across all Linux distros. It also boots the system a lot faster.

But on the flip side, it destroys the original Unix way of doing things, becoming a lot more like how the Windows equivalent (svchost.exe) works. The Unix init system ran as a bunch of scripts, allowing any administrator to change the startup sequence by changing a bit of code. This makes understanding the init process a lot easier, because at any point you can read the code that makes something happen. Init was something that anybody could understand, whereas nobody can say for certain exactly how things are being started in systemd.

On top of that, the designers of systemd are a bunch of jerks. Linus handles Linux controversies with maturity. While he derides those who say "GNU/Linux", he doesn't insist that it's wrong. He responds to his critics largely by ignoring them. On the flip side, the systemd engineers can't understand how anybody can think that their baby is ugly, and vigorously defend it. Linux is a big-tent system that accepts people of differing opinions, systemd is a narrow-minded religion, kicking out apostates.

The biggest flaw of systemd is mission creep. It is slowly growing to take over more and more userspace functionality of the system. This complexity leads to problems.

One example is that it's replaced traditional logging with a new journal system. Traditional, text-based logs were "rotated" in order to prevent the disk from filling up. This could be done because each entry in a log was a single line of text, so tools could parse the log files in order to chop them up. The new journal system is binary, so it's not easy to parse, and hence, people don't rotate the logs. This causes the hard drive to fill up, killing the system. This is noticeable when doing things like trying to boot a Raspberry Pi from a 4-gigabyte microSD card. It works with older, pre-systemd versions of Linux, but will quickly die with systemd if something causes a lot of logging on the system.

Another example is D-Bus. This is the core system within systemd that allows different bits of userspace to talk to each other. But it's got problems. A demonstration of the D-Bus problem is the recent Jeep hack by researchers Charlie Miller and Chris Valasek. The root problem was that D-Bus was openly (without authentication) accessible from the Internet. Likewise, the "AllJoyn" system for the "Internet of Things" opens up D-Bus on the home network. D-Bus indeed simplifies communication within userspace, but its philosophy is to put all your eggs in one basket, then drop the basket.

Personally, I have no opinion on systemd. I hate everything. Init was an ugly kludge, and systemd appears to be just as ugly, albeit for different reasons. But, the amount of hate on both sides is so large that it needs to be trolled. The thing I troll most about is that one day, "systemd will replace Linux". As systemd replaces more and more of Linux userspace, and begins to drive kernel development, I think this joke will one day become true.

Saturday, August 29, 2015

No, this isn't good code

I saw this tweet go by. No, I don't think it's good code:

What this code is trying to solve is the "integer overflow" vulnerability. I don't think it solves the problem well.

The first problem is that the result is undefined. Some programmers will call safemulti_size_t() without checking the result. When they do, the code will behave differently depending on the previous value of *res. Instead, the code should return a defined value in this case, such as zero or SIZE_MAX. Knowing that this sort of thing will usually be used for memory allocations, which you want to have fail, then a good choice would be SIZE_MAX.
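As a minimal sketch of the "always return a defined value" point, here is a made-up helper (the name mul_size_saturating is hypothetical, and this is not the code from the tweet) that saturates to SIZE_MAX on overflow. Note that it still uses the division complained about in the next paragraph; it only illustrates the defined-result idea.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only: multiply two sizes and
 * saturate to SIZE_MAX on overflow, so the result is always defined
 * even when the caller forgets to check for an error. */
size_t
mul_size_saturating(size_t a, size_t b)
{
    /* a * b overflows exactly when b > SIZE_MAX / a (for a != 0). */
    if (a != 0 && b > SIZE_MAX / a)
        return SIZE_MAX;
    return a * b;
}
```

A caller that passes the result straight to malloc() gets a request for SIZE_MAX bytes on overflow, which should fail, instead of a too-small buffer.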

The worse problem is integer division. On today's Intel processors, integer multiplication takes a single clock cycle, but integer division takes between 40 and 100 clock cycles. Since you'll be usually dividing by small numbers, it's likely to be closer to 40 clock cycles rather than 100, but that's still really bad. If your solution to security problems is by imposing unacceptable tradeoffs, then you are doing security wrong. If you introduced this level of performance hit, then you might as well be programming in a safer language like JavaScript than in C.

An alternative would be the OpenBSD function reallocarray(), which I'm considering using in all my code as a replacement for malloc(), calloc(), and realloc(). It looks like this:

#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * This is sqrt(SIZE_MAX+1), as s1*s2 <= SIZE_MAX
 * if both s1 < MUL_NO_OVERFLOW and s2 < MUL_NO_OVERFLOW
 */
#define MUL_NO_OVERFLOW (1UL << (sizeof(size_t) * 4))

void *
reallocarray(void *optr, size_t nmemb, size_t size)
{
    /* Only divide when one operand is large enough to overflow. */
    if ((nmemb >= MUL_NO_OVERFLOW || size >= MUL_NO_OVERFLOW) &&
        nmemb > 0 && SIZE_MAX / nmemb < size) {
            errno = ENOMEM;
            return NULL;
    }
    return realloc(optr, size * nmemb);
}

Firstly, it doesn't call the horrible integer division function (unless one of the parameters is 65,536 or larger on a 32-bit processor, or 2^32 or larger on a 64-bit one). Secondly, it always has a defined result.

Personally, I would improve upon this function by simply raising a signal, such as by calling abort(). Virtually no code can recover from a bad memory allocation, so instead of returning NULL, it's better to crash right here.
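A rough sketch of what that crash-on-failure variant might look like follows. The name reallocarray_or_die and the SKETCH_ macro are made up for this example, and it assumes abort() (which raises SIGABRT) is an acceptable way to crash. It repeats the overflow check from the OpenBSD code rather than calling reallocarray(), since not every platform ships that function.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Same threshold idea as the OpenBSD code: sqrt(SIZE_MAX + 1). */
#define SKETCH_MUL_NO_OVERFLOW ((size_t)1 << (sizeof(size_t) * 4))

/* Hypothetical wrapper: behaves like reallocarray(), but crashes
 * instead of reporting an overflow or an out-of-memory condition. */
void *
reallocarray_or_die(void *optr, size_t nmemb, size_t size)
{
    void *p;

    /* The division only runs when one operand is at least the threshold. */
    if ((nmemb >= SKETCH_MUL_NO_OVERFLOW || size >= SKETCH_MUL_NO_OVERFLOW) &&
        nmemb > 0 && SIZE_MAX / nmemb < size) {
        fprintf(stderr, "allocation size overflow\n");
        abort();
    }

    p = realloc(optr, size * nmemb);

    /* A zero-byte request may legitimately yield NULL, so only treat
     * NULL as fatal when a nonzero size was requested. */
    if (p == NULL && size * nmemb != 0) {
        fprintf(stderr, "out of memory\n");
        abort();
    }
    return p;
}
```

Callers can then use the returned pointer without a NULL check for any nonzero request, which is the point of crashing early.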

Friday, August 28, 2015

On science literacy...

In this WIRED article, a scientifically illiterate writer explains "science literacy". It's as horrid as you'd expect. He preaches the Aristotelian version of science that Galileo proved wrong centuries ago. His thesis is that science isn't about knowing scientific facts, but being able to think scientifically. He then claims that thinking scientifically is all about building models of how the world works.

This is profoundly wrong. Science is about observation and experimental testing of theories.

For example, consider the following question. If you had two balls of the same size, one made of lead and the other made of wood, and you dropped them at the same time, which would hit the ground first (ignoring air resistance)? For thousands of years Aristotelian scientists claimed that heavier objects fell faster, purely by reasoning about the problem. It wasn't until the time of Galileo that scientists conducted the experiment and observed that these balls hit the ground at the same time. In other words, all objects fall at the same speed, regardless of size or weight (ignoring air resistance). Feathers fall as fast as lead on the moon. If you don't believe me, drop different objects from a building and observe for yourself.

Likewise, Aristotle taught that men had more teeth than women, you know, because that makes logical sense. Galileo first got in trouble with the "scientists" of his time by actually asking people to open their mouths and counting their teeth. As it turns out, men and women have the same number of teeth.

The point here is that science is based on observation, not pure reason. Doing science means either understanding the observations made by previous scientists (i.e. "facts") or making the observations yourself. Doing science means making predictions based on theories, then conducting experiments to see if the prediction is correct. There is no science without observation.

The WIRED writer poses a similar question about a fan pushing an object across a frictionless surface. It's a silly question because, presumably, we are supposed to assume air exists for the fan to work, but that air doesn't exist to slow things down. In any event, you can't really reason about this without first learning the scientific theories of "mass" and Newtonian equations like F=MA. These theories were developed based on observation. The writer demands that to "do science" means approaching this problem from an Aristotelian method of reasoning, divorced from previous scientific observations.

Similarly, he poses the question about the phases of the moon if it were a cube instead of a sphere. Well, this has complications. I doubt the face of the moon would appear to be a square, as my understanding of orbital mechanics suggests that it'd be a corner facing the earth instead of a square side (assuming it would stay tidally locked). But even assuming we got a cubic face, then there are still the problems of inclined orbits and libration. Finally, he poses the question right at the precise moment between when such a side would emerge from the shadow and become lit -- so therefore it's impossible to say whether the side would be dark or lit. It's stupid reasoning about this -- it's something that ought to be observed -- if only with a computer model. I guess the thing you ought to learn is that the entire face of the cube is either all light or all dark, unlike a sphere which gets partially lit.

That WIRED writer says science is not about knowing the difference between a "planet" and a "dwarf planet" like Pluto. He's wrong. Pluto is much smaller than the 8 planets. Whereas the 8 planets have nearly circular orbits in the same plane, Pluto has a highly elliptical orbit that takes it sometimes inside the orbit of Neptune and far above the orbital plane of the other planets. Moreover, in recent years, we have observed many other Pluto-sized objects that share these same characteristics with Pluto (like "Eris", which is more massive than Pluto). Yes, the names that we give these things don't matter, but the observed differences matter a heck of a lot. Science is about knowing these observations. That we teach students the names of planets, but not what we observe about them, is a travesty that leads to illiteracy.

Science is sadly politicized, such as with issues like Climate-Change/Global-Warming. We are expected to believe Science as some sort of religion, where the common people are unable to read the Latin Bible. We are not expected to understand things like "absorption spectra" or "thermal infrared".  To point out that scientific observations have shown that hurricanes haven't, in fact, gotten worse is considered heresy, because it denies computer models that claim hurricanes will get worse. Climate change is a problem we need to address, but with science rather than current scientific illiteracy and quasi-religious dogma.

Scientific literacy starts with understanding what science is, namely that it's based on observation, coming up with theories/hypotheses to explain the observations, then relentlessly testing those theories, trying to prove them wrong. Secondly, scientific literacy means learning the observations made by scientists over the last few hundred years. We don't have to come up with F=MA or the speed-of-light ourselves, but learn from previous scientists. Believing in Evolution doesn't make you scientifically literate, understanding radioisotope dating and rock strata does.

What this WIRED article highlights is that Aristotelian science illiteracy is so pervasive it even infects science writers at major publications. What you should do about this is pick up a book and try to cure your own illiteracy. Really, any high-school textbook should do.

Thursday, August 20, 2015

A lesson in BitTorrent

Hackers have now posted a second dump of Ashley-Madison, this time 20-gigabytes worth of data. Many, mostly journalists, are eagerly downloading this next dump. However, at the time of this writing, nobody has finished downloading it yet. None of the journalists have a complete copy, so you aren't seeing any new stories about the contents. It promises the full email spool of the CEO in the file name, but no journalist has yet looked into that mail spool and reported a story. Currently, the most any journalist has is 85% of the dump, slowly downloading the rest at 37-kilobytes/second.

Why is that? Is AshMad doing some sort of counter-attack to stop the download (like Sony did)? Or is it overloaded because too many people are trying to download?

No, it's because it hasn't finished seeding.

BitTorrent is p2p (peer-to-peer). You download chunks from the peers, aka the swarm, not just from the original source (the initial seeder). Instead of slowing down as more people join the swarm to download the file(s), BitTorrent downloads become faster -- the more people you can download from, the faster it goes.

But 9 women can't make a baby in 1 month. The same goes for BitTorrent. You can only download all the chunks from peers if, between them, they've got all the chunks. That's the current problem with the AshMad dump: everyone combined has only 85% of all possible chunks. The remaining 15% of the chunks haven't been uploaded to the swarm yet. Nobody has a complete copy. The original seeder is uploading at a rate of 37-kilobytes/second, handing off the next chunk to a random person in the swarm, who quickly exchanges it with everyone else in the swarm.

Thus, we see something like the following image, where everyone is stuck at 85% download:

It'll take many more hours until this is complete.

I point this out because it's a useful real-world lesson for BitTorrent. Peer-to-peer speeds up downloads in ideal cases, but it can't overcome physics. Physics, in this case, means that nobody yet has a complete 100% copy, so nobody else can download one.
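To make the "stuck at 85%" effect concrete, here is a toy model, not real BitTorrent client code. It assumes each peer's set of held pieces fits in a single 64-bit word (real clients use much longer bitfields), and the function name swarm_pieces_available is made up for this sketch. The number of pieces that anyone can obtain is just the number of bits set in the OR of all the peers' bitfields.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model: bit i of peer_have[p] is set when peer p holds piece i.
 * Returns how many of the first npieces pieces exist anywhere in the
 * swarm, which caps how far any single download can progress. */
unsigned
swarm_pieces_available(const uint64_t *peer_have, size_t npeers,
                       unsigned npieces)
{
    uint64_t have_any = 0;
    unsigned count = 0;
    unsigned bit;
    size_t i;

    /* A piece is obtainable if at least one peer holds it. */
    for (i = 0; i < npeers; i++)
        have_any |= peer_have[i];

    /* Count the obtainable pieces among the first npieces. */
    for (bit = 0; bit < npieces && bit < 64; bit++)
        if (have_any & ((uint64_t)1 << bit))
            count++;
    return count;
}
```

For an 8-piece torrent with three peers holding pieces 0x0F, 0x3C, and 0x01, the union is 0x3F, so only 6 of 8 pieces exist in the swarm. Adding more peers with the same pieces makes those 6 faster to fetch, but nobody gets past 75% until the seeder uploads the rest.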

AshMad is prostitution not adultery

The Ashley-Madison website advertises adultery, but that's a lie. I've talked to a lot of users of the site, and none of them used it to cheat on their spouse. Instead, they used it as just a "dating" site -- and even that is a misnomer, since "dating" often just means a legal way to meet prostitutes. According to several users, prostitutes are really the only females they'd consistently meet on Ashley-Madison.

In other words, Ashley-Madison is a prostitution website, not an adultery website. "Cheating" is just the hook, to communicate to the users that they should expect sex, but not a future spouse. And the website is upfront about charging for it.

I point this out because a lot of people have gone over-the-top on the adultery angle, such as this The Intercept piece. That's rather silly since Ashley-Madison wasn't really about adultery in the first place.

Wednesday, August 19, 2015

Trump is right about the 14th Amendment

Trump sucks all the intelligence out of the room, converting otherwise intelligent and educated pundits into blithering idiots. Today's example is the claim that Trump said:
"The 14th Amendment is unconstitutional."
Of course he didn't say that. What he did say is that the 14th Amendment doesn't obviously grant "birthright citizenship" to "anchor babies". And he's completely correct. The 14th Amendment says:
"All persons born or naturalized in the United States, and subject to the jurisdiction thereof, are citizens of the United States"
The complicated bit is the phrase "and subject to the jurisdiction thereof". If you remove that bit, then of course Trump would be wrong, since it would clearly say that being born in the U.S. grants citizenship.

But the phrase is there, so obviously some babies born in the U.S. aren't guaranteed (by the constitution) citizenship. Which babies are those?

The immigration law 8 U.S.C. § 1408(a) lists some of them: babies of ambassadors, heads of state, and military prisoners. [UPDATE: this appears wrong, I saw it in many Internet posts, but it appears to be untrue. But, it doesn't change the conclusion. I'll update this post again when I figure this out].

It's this law that currently grants babies citizenship, not the constitution. Laws can be changed by Congress. Presumably, "illegal aliens" could easily be added to the list.

This would be challenged, of course, and it'd probably work its way up to the Supreme Court, at which point they'd rule definitively on whether the Constitution grants all babies citizenship. The point is simply that the Supreme Court hasn't ruled yet. Nobody can cite a Supreme Court decision clearly disproving Trump.

Thus, if you listen to Trump's remarks that everyone is criticizing, you'll see that he's right. Not all babies are granted citizenship (those of foreign ambassadors, heads of state, and military prisoners). Lots of legal scholars believe the same extends to babies of illegal aliens. There is a good chance the Supreme Court would rule in Trump's favor on the issue if current immigration law were changed. (And likewise, a good chance they'd rule against him).

My point is this. Trump is a filthy populist troll. Don't feed the trolls. No really, stop it. It's like the Kansas farmer's advice: Never wrestle a pig. The pig loves it, and you'll just get muddy. Trump is going to say lots of crazy things. Just ignore him rather than descending to his level and saying crazy/dumb things back.

The closest we have to a Supreme Court decision on the matter is Plyler v. Doe, which deals with a separate issue. Its discussion of 'jurisdiction' could potentially apply to newborns.

The second closest is United States v. Wong Kim Ark, which (as the Wikipedia article says) many legal scholars do not think applies to illegal immigrants.

Notes on the Ashley-Madison dump

Ashley-Madison is a massive dating site that claims 40 million users. The site is specifically for those who want to cheat on their spouse. Recently, it was hacked. Yesterday, the hackers published the dumped data.

It appears legit. I asked my twitter followers for those who had created accounts. I have verified multiple users of the site, one of which was a throw-away account used only on the site. Assuming my followers aren't lying, this means the dump is confirmed. Update: one follower verified his last 4 digits of credit-card number and billing address was exposed.

It's over 36-million accounts. That's not quite what they claim, but it's pretty close. However, glancing through the data, it appears that a lot of the accounts are bogus, obviously made up things for people who just want to look at the site without creating a "real" account.

It's heavily men. I count 28-million men to 5-million women, according to the "gender" field in the database (with 2-million undetermined). However, glancing through the credit-card transactions, I find only male names.

It's full account information. This includes full name, email, and password hash as you'd expect. It also includes dating information, like height, weight, and so forth. It appears to contain addresses, as well as GPS coordinates. I suspect that many people created fake accounts, but with an app that reported their real GPS coordinates.

Passwords hashed with bcrypt. Almost all the records appear to be protected with bcrypt. This is a refreshing change. Most of the time when we see big sites hacked, the passwords are protected either poorly (with MD5) or not at all (in "clear text", so that they can be immediately used to hack people). Hackers will be able to "crack" many of these passwords when users chose weak ones, but users who chose strong passwords are safe.

Maybe 250k deleted accounts. There are about 250k accounts that appear to have the password information removed. I don't know why, maybe it's accounts that have paid to be removed. Some are marked explicitly as such, others imply that.

Partial credit card data. It appears to have credit card transaction data -- but not the full credit card number. It does have full name and addresses, though. This is data that can "out" serious users of the site.

You can download everything via BitTorrent. The magnet number is
40ae8a90de40ca3afa763c8edb43fc1fc47d75f1. If you've got BitTorrent installed, you can use this to download the data. It's 9.7 gigabytes compressed, so you'll need a good Internet connection.

The hackers call themselves the "Impact Team". Their manifesto is here. They appear to be motivated by the immorality of adultery, but in all probability, their motivation is that #1 it's fun and #2 because they can. They probably used phishing, SQL injection, or re-used account credentials in order to break in.

They deserve some praise. Compared to other large breaches, it appears Ashley-Madison did a better job at cybersecurity. They tokenized credit card transactions and didn't store full credit card numbers. They hashed passwords correctly with bcrypt. They stored email addresses and passwords in separate tables, to make grabbing them (slightly) harder. Thus, this hasn't become the massive breach of passwords and credit-card numbers that other large breaches have led to. They deserve praise for this.

Josh Duggar. This Gawker article appears correct from my reading of the data.

Some stories in the press: