Sunday, June 02, 2013

Haswell and cybersec: it's about the crypto

This last weekend Intel formally released their “Haswell” x86 microarchitecture. I thought I’d discuss the changes that are meaningful to cybersec. The tl;dr version is this:
  • New instructions double the speed of most crypto
  • Transactions that let code use more cores
  • Triple graphics speed, programmable with OpenCL
  • Lower power


What is a “microarchitecture”?


Every Intel x86 processor has two names: the technology codename and the market segment.

Intel updates its technology every year. The codename for last year’s technology was “Ivy Bridge”, this year’s is “Haswell”, next year’s is “Broadwell”.

Intel targets its chips at different markets. The “Core i3” is for the low-end of the desktop/notebook market, the “Core i7” is for the high-end, and “Xeon” is for servers. The lower end chips are slower and less capable than the higher end chips.

When a geek asks you “what chip do you have in your laptop”, they want to know both the technology version and the market version. If you say “Core i3”, that will annoy them, because they want to know if it’s a “Core i3 Ivy Bridge” or a “Core i3 Haswell”.

This document is about the new features that will appear in the Haswell versions of the Core i3 through Core i7 and Xeon.

New crypto instructions


Previously, Intel added instructions for specific algorithms, such as the “aesenc” instruction to accelerate the AES encryption algorithm (this appeared in “Westmere” in 2010).

With Haswell, Intel adds instructions that should accelerate all encryption algorithms. Some example instructions are:
  • “andn” – Combines an “and” and a “not”. The idea behind this and other instructions is to combine simple operations that used to take multiple instructions into a single instruction.
  • “rorx” – Same as a “ror”, but it ignores flags. Flags, like “carry” or “zero”, are a hidden dependency in code, preventing instructions from being executed in parallel or out of order. Instructions that ignore flags can be executed four at a time.
  • “mulx” – Same as “mul”, but ignores flags, making it extremely useful for RSA calculations, which execute a lot of multiplications.
  • “movbe” – Used to byte-swap big-endian data, because most crypto algorithms are big-endian.
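
To make the list above concrete, here is a rough Python model of what each instruction computes. This is only a sketch of the arithmetic, not of the performance benefit: the function names are simply borrowed from the mnemonics, the 32-bit width for “andn” and “rorx” is an assumption chosen to match SHA-256, and the two SHA-256 helpers at the end are included as a familiar place where such operations show up.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def andn(a, b):
    """Model of andn: (NOT a) AND b, on 32-bit values."""
    return (~a & b) & MASK32


def rorx(x, n):
    """Model of rorx: rotate a 32-bit value right by n bits (no flags involved)."""
    x &= MASK32
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32


def mulx(a, b):
    """Model of mulx: full 64x64 -> 128-bit unsigned product as (high, low)."""
    product = (a & MASK64) * (b & MASK64)
    return product >> 64, product & MASK64


def movbe_load32(buf, offset=0):
    """Model of a movbe load: read 4 bytes as a big-endian 32-bit integer."""
    return int.from_bytes(buf[offset:offset + 4], "big")


def sha256_ch(e, f, g):
    """SHA-256 'choose' function: (e AND f) XOR ((NOT e) AND g)."""
    return (e & f) ^ andn(e, g)


def sha256_big_sigma1(e):
    """SHA-256 big-sigma-1: three rotations XORed together."""
    return rorx(e, 6) ^ rorx(e, 11) ^ rorx(e, 25)
```

In real code these would be single machine instructions emitted by a compiler or hand-written assembly; the Python versions only show the results they produce.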

The upshot of these instructions is that they will likely double the speed of any arbitrary crypto algorithm. Soon after the release of Haswell, we’ll see these improvements in software libraries like OpenSSL and John-the-Ripper.

Note that Intel’s motivation isn’t necessarily speed but power consumption. When using Haswell in notebooks and tablets, there is a noticeable increase in power consumption when video/audio is protected by DRM. Greater crypto efficiency translates to longer battery life.

Note that the new AMD “Jaguar” chips found in the upcoming Xbox and PlayStation will also have some of these new instructions.

Transactions


Even low-end Intel CPUs come with multiple cores, but most software fails to take advantage of more than a few of them. Intel hasn’t been adding more cores to processors because software can’t take advantage of them.

To fix this, Intel has added new “transaction” instructions. The problem with multi-core software is that when two threads change the same memory location, they will corrupt it. The solution for this problem is for one thread to stop-and-wait for the other to finish its changes on the small chance that corruption might happen. That’s why software doesn’t scale to many cores, because it spends most of its time waiting.

What “transaction” instructions do is allow code to continue forward without waiting. The hardware tracks whether corruption occurs, and when it does, rolls everything back to the start of the transaction. Such rollbacks are no more expensive than the original stop-and-waits they replace, but will happen much less frequently. This will enable code to scale to many more cores.
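
The idea of “go ahead without waiting, and roll back if there was a conflict” can be sketched in software. The following is a toy Python analogy, not how Haswell implements it: the hardware does the tracking in the cache, whereas this sketch uses a version counter, and every name here (`VersionedCell`, `run_transaction`, `demo`) is made up for illustration. The `demo` function fakes a second thread by writing to the cell in the middle of the first attempt.

```python
import threading


class VersionedCell:
    """A value plus a version counter that bumps on every committed write."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self._commit_lock = threading.Lock()  # held only for brief reads/commits

    def snapshot(self):
        # Read value and version together so they are consistent.
        with self._commit_lock:
            return self.value, self.version

    def try_commit(self, expected_version, new_value):
        # Commit only if nobody else wrote since our snapshot.
        with self._commit_lock:
            if self.version != expected_version:
                return False  # conflict detected: caller must roll back
            self.value = new_value
            self.version += 1
            return True


def run_transaction(cell, update, max_retries=100):
    """Apply update(value) optimistically; return the number of rollbacks."""
    for rollbacks in range(max_retries):
        value, version = cell.snapshot()
        new_value = update(value)  # speculative work, no lock held
        if cell.try_commit(version, new_value):
            return rollbacks
        # Commit failed: discard new_value and retry from a fresh snapshot.
    raise RuntimeError("too many conflicts")


def demo():
    cell = VersionedCell(10)
    interfered = []

    def update(value):
        # On the first attempt only, simulate another thread writing
        # to the same cell in the middle of our transaction.
        if not interfered:
            interfered.append(True)
            _, current_version = cell.snapshot()
            cell.try_commit(current_version, value + 100)
        return value + 1

    rollbacks = run_transaction(cell, update)
    return cell.value, rollbacks
```

When there is no conflict, the transaction commits on the first try with zero rollbacks; in the `demo` run, the simulated interference forces exactly one rollback before the retry succeeds.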

These new instructions have a special trick: they are “compatible” with older processors. Intel takes advantage of an instruction prefix that older processors ignore, treating it as a NOP (“no operation”). Programmers can add this prefix to code that normally waits every time on older processors, and on Haswell, such code will continue without waiting, with occasional rollbacks.

This is important to us because a lot of what we do in cybersecurity, from IDS to firewalls to crypto, is limited by the speed of the processor. Scaling across many cores will dramatically increase the speed of a lot of security code.

Triple graphics speed


Companies like nVidia and ATI thrived because the default graphics that come with Intel processors have been slow. Games were sluggish without the addition of a separate graphics processor, a.k.a. “GPU”.

Haswell triples the speed of its graphics over last year’s Ivy Bridge, which in turn was twice as fast as the previous Sandy Bridge. Haswell graphics match the speed of low-end GPUs, but at half the power. Last year’s MacBooks came with nVidia graphics; the rumor is that this year’s MacBooks will just use Intel’s graphics.

Moreover, Haswell graphics are programmable with OpenCL. Password-cracking tools like Hashcat and John-the-Ripper should be able to exploit this in order to dramatically speed up cracking.
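
The reason password cracking maps so well onto graphics hardware is that every candidate password is hashed independently of every other. Here is a minimal CPU-only Python sketch of that workload; it is not OpenCL and not how Hashcat or John-the-Ripper are written, and the function name, the SHA-256 choice, and the tiny lowercase keyspace are all assumptions for illustration. On a GPU, each iteration of the inner loop would become its own work-item.

```python
import hashlib
import string
from itertools import product


def crack_sha256(target_hex, alphabet=string.ascii_lowercase, max_len=3):
    """Brute-force a short password whose unsalted SHA-256 digest is target_hex."""
    for length in range(1, max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            # Each candidate is hashed on its own, with no shared state,
            # which is what makes the problem easy to run in parallel.
            digest = hashlib.sha256(candidate.encode("ascii")).hexdigest()
            if digest == target_hex:
                return candidate
    return None  # not found within the searched keyspace
```

For example, passing the SHA-256 digest of `"abc"` returns `"abc"` after trying every shorter and lexicographically earlier candidate.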

Note that while Haswell graphics catch up with nVidia and ATI for notebook graphics, they’ll still be behind on the desktop. Desktop chips are free to become very large and power hungry, so they are much faster than even the fastest notebook chips. Thus, gamers will still want the most expensive desktop cards.

Low power and “System on Chip”


Haswell is Intel’s answer to the encroachment by ARM.

There is a myth that ARM has an inherent RISC advantage over Intel x86. This is nonsense. Intel already has many RISC competitors, like SPARC, PowerPC, and MIPS, and data centers aren’t flocking to them for their low power. Instead, the advantage ARM has is that it comes as a “System on Chip” or “SoC”, combining the microprocessor with all the other chips that make up a system.

While Haswell will come as a standalone processor, it will also be shipped as an SoC. This will dramatically change the notebook market, allowing notebooks to last twice as long on battery. It will also make Haswell a great chip for tablets. Rumors are that tablets with Haswell will consume as much electricity as Apple’s iPad does now – while being much faster.

Conclusion


I think the biggest change is Intel’s dedication to crypto. That crypto causes performance and battery problems is one reason its adoption has been slow. By fixing this, Intel helps ensure that crypto will be included everywhere.



