Thursday, January 04, 2018

Some notes on Meltdown/Spectre

I thought I'd write up some notes.

You don't have to worry if you patch. If you download the latest update from Microsoft, Apple, or Linux, then the problem is fixed for you and you don't have to worry. If you aren't up to date, then there's a lot of other nasties out there you should probably also be worrying about. I mention this because while this bug is big in the news, it's probably not news the average consumer needs to concern themselves with.

This will force a redesign of CPUs and operating systems. While not a big news item for consumers, it's huge in the geek world. We'll need to redesign operating systems and how CPUs are made.

Don't worry about the performance hit. Some, especially avid gamers, are concerned about the claims of "30%" performance reduction when applying the patch. That's only in some rare cases, so you shouldn't worry too much about it. As far as I can tell, 3D games aren't likely to see less than 1% performance degradation. If you imagine your game is suddenly slower after the patch, then something else broke it.

This wasn't foreseeable. A common cliche is that such bugs happen because people don't take security seriously, or that they are taking "shortcuts". That's not the case here. Speculative execution and timing issues with caches are inherent issues with CPU hardware. "Fixing" this would make CPUs run ten times slower. Thus, while we can tweek hardware going forward, the larger change will be in software.

There's no good way to disclose this. The cybersecurity industry has a process for coordinating the release of such bugs, which appears to have broken down. In truth, it didn't. Once Linus announced a security patch that would degrade performance of the Linux kernel, we knew the coming bug was going to be Big. Looking at the Linux patch, tracking backwards to the bug was only a matter of time. Hence, the release of this information was a bit sooner than some wanted. This is to be expected, and is nothing to be upset about.

It helps to have a name. Many are offended by the crassness of naming vulnerabilities and giving them logos. On the other hand, we are going to be talking about these bugs for the next decade. Having a recognizable name, rather than a hard-to-remember number, is useful.

Should I stop buying Intel? Intel has the worst of the bugs here. On the other hand, ARM and AMD alternatives have their own problems. Many want to deploy ARM servers in their data centers, but these are likely to expose bugs you don't see on x86 servers. The software fix, "page table isolation", seems to work, so there might not be anything to worry about. On the other hand, holding up purchases because of "fear" of this bug is a good way to squeeze price reductions out of your vendor. Conversely, later generation CPUs, "Haswell" and even "Skylake" seem to have the least performance degradation, so it might be time to upgrade older servers to newer processors.

Intel misleads. Intel has a press release that implies they are not impacted any worse than others. This is wrong: the "Meltdown" issue appears to apply only to Intel CPUs. I don't like such marketing crap, so I mention it.




Statements from companies:










Wednesday, January 03, 2018

Why Meltdown exists

So I thought I'd answer this question. I'm not a "chipmaker", but I've been optimizing low-level assembly x86 assembly language for a couple of decades.





The tl;dr version is this: the CPUs have no bug. The results are correct, it's just that the timing is different. CPU designers will never fix the general problem of undetermined timing.

Let's see if I've got Metldown right

I thought I'd write down the proof-of-concept to see if I got it right.

So the Meltdown paper lists the following steps:

 ; flush cache
 ; rcx = kernel address
 ; rbx = probe array
 retry:
 mov al, byte [rcx]
 shl rax, 0xc
 jz retry
 mov rbx, qword [rbx + rax]
 ; measure which of 256 cachelines were accessed

So the first step is to flush the cache, so that none of the 256 possible cache lines in our "probe array" are in the cache. There are many ways this can be done.

Now pick a byte of secret kernel memory to read. Presumably, we'll just read all of memory, one byte at a time. The address of this byte is in rcx.

Now execute the instruction:
    mov al, byte [rcx]
This line of code will crash (raise an exception). That's because [rcx] points to secret kernel memory which we don't have permission to read. The value of the real al (the low-order byte of rax) will never actually change.

But fear not! Intel is massively out-of-order. That means before the exception happens, it will provisionally and partially execute the following instructions. While Intel has only 16 visible registers, it actually has 100 real registers. It'll stick the result in a pseudo-rax register. Only at the end of the long execution change, if nothing bad happen, will pseudo-rax register become the visible rax register.

But in the meantime, we can continue (with speculative execution) operate on pseudo-rax. Right now it contains a byte, so we need to make it bigger so that instead of referencing which byte it can now reference which cache-line. (This instruction multiplies by 4096 instead of just 64, to prevent the prefetcher from loading multiple adjacent cache-lines).
 shl rax, 0xc

Now we use pseudo-rax to provisionally load the indicated bytes.
 mov rbx, qword [rbx + rax]

Since we already crashed up top on the first instruction, these results will never be committed to rax and rbx. However, the cache will change. Intel will have provisionally loaded that cache-line into memory.

At this point, it's simply a matter of stepping through all 256 cache-lines in order to find the one that's fast (already in the cache) where all the others are slow.


Tuesday, December 19, 2017

Bitcoin: In Crypto We Trust

Tim Wu, who coined "net neutrality", has written an op-ed on the New York Times called "The Bitcoin Boom: In Code We Trust". He is wrong about "code".

Wednesday, December 06, 2017

Libertarians are against net neutrality

This post claims to be by a libertarian in support of net neutrality. As a libertarian, I need to debunk this. "Net neutrality" is a case of one-hand clapping, you rarely hear the competing side, and thus, that side may sound attractive. This post is about the other side, from a libertarian point of view.

Friday, November 24, 2017

A Thanksgiving Carol: How Those Smart Engineers at Twitter Screwed Me

Thanksgiving Holiday is a time for family and cheer. Well, a time for family. It's the holiday where we ask our doctor relatives to look at that weird skin growth, and for our geek relatives to fix our computers. This tale is of such computer support, and how the "smart" engineers at Twitter have ruined this for life.

Thursday, November 23, 2017

Don Jr.: I'll bite

So Don Jr. tweets the following, which is an excellent troll. So I thought I'd bite. The reason is I just got through debunk Democrat claims about NetNeutrality, so it seems like a good time to balance things out and debunk Trump nonsense.

Wednesday, November 22, 2017

NetNeutrality vs. limiting FaceTime

People keep retweeting this ACLU graphic in regards to NetNeutrality. In this post, I debunk the fourth item. In previous posts [1] [2] I debunk other items.

NetNeutrality vs. Verizon censoring Naral

People keep retweeting this ACLU graphic in support of net neutrality. It's wrong. In this post, I debunk the second item. I debunk other items in other posts [1] [4].

NetNeutrality vs. AT&T censoring Pearl Jam

People keep retweeting this ACLU graphic in response to the FCC's net neutrality decision. In this post, I debunk the first item on the list. In other posts [2] [4] I debunk other items.

The FCC has never defended Net Neutrality

This op-ed by a "net neutrality expert" claims the FCC has always defended "net neutrality". It's garbage.

This wrong on its face. It imagines decades ago that the FCC inshrined some plaque on the wall stating principles that subsequent FCC commissioners have diligently followed. The opposite is true. FCC commissioners are a chaotic bunch, with different interests, influenced (i.e. "lobbied" or "bribed") by different telecommunications/Internet companies. Rather than following a principle, their Internet regulatory actions have been ad hoc and arbitrary -- for decades.

Sure, you can cherry pick some of those regulatory actions as fitting a "net neutrality" narrative, but most actions don't fit that narrative, and there have been gross net neutrality violations that the FCC has ignored.