Tuesday, March 16, 2010

Latest Intel processor security features

Intel has released an updated version of their "Nehalem" processors, called "Westmere". The flagship processor, "Westmere-EP", has 6 cores running at 3.33 GHz, and can be purchased for desktops (Core i7 980x) or servers (Xeon 5600 series). Low-end Westmere variants are available for notebooks and desktops as well (Core i3).

Westmere contains several security features beyond what Nehalem had, so I thought I'd discuss them here.

AES speed

The press echoes Intel's claims that they speed up AES by 9 times over software implementations, but that's not completely true. It's complicated.

You only get that performance increase when you can encrypt (or decrypt) multiple blocks at a time. That is because the instructions have a high (6 clock cycle) latency. When encrypting a block, each instruction depends upon the results of the previous instruction, so the processor must stop and wait. When encrypting multiple blocks in parallel, the instructions for encrypting different blocks don't depend upon each other, and therefore the instructions can operate in parallel.
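As a rough illustration of why interleaving helps, here is a minimal sketch of the latency argument. The 6-cycle latency is the figure quoted above; the `issue_interval` of 2 cycles (how often a new AES instruction can be started) is my own assumption, and the function name is hypothetical. It is a back-of-the-envelope model, not a measurement.

```python
AES_BLOCK_BYTES = 16

def cycles_per_byte(rounds=10, latency=6, issue_interval=2, parallel_blocks=1):
    """Estimate cycles per byte when independent blocks are interleaved.

    Each round of a block must wait `latency` cycles for the previous round
    of the same block, but rounds belonging to other blocks can be started
    in the meantime, one every `issue_interval` cycles (an assumed figure).
    """
    # One round across all interleaved blocks is bounded either by the
    # instruction latency or by how quickly instructions can be issued.
    cycles_per_round = max(latency, parallel_blocks * issue_interval)
    total_cycles = rounds * cycles_per_round
    return total_cycles / (AES_BLOCK_BYTES * parallel_blocks)

if __name__ == "__main__":
    for n in (1, 2, 3, 4, 8):
        print(n, "block(s) in flight:", cycles_per_byte(parallel_blocks=n))
```

Under these assumptions, a single block is limited by latency, and the gain levels off once enough blocks are in flight to keep the AES unit busy.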

Some uses of AES "chain" blocks together: they use data from the previous block in order to encrypt the next block of data. This prevents the CPU from executing the AES instructions in parallel, which is a huge performance loss. This is the default mode for SSL.
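The difference in data dependencies can be sketched as follows. This is only an illustration: `toy_block_encrypt` is a hypothetical stand-in built from SHA-256 (it is NOT AES and is not invertible), and the counter-style mode is simply one example of a non-chaining mode that I picked for contrast.

```python
import hashlib

BLOCK = 16

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Stand-in for one AES block encryption (NOT real AES, not invertible).
    return hashlib.sha256(key + block).digest()[:BLOCK]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def chained_encrypt(key: bytes, iv: bytes, blocks: list) -> list:
    """Chaining: block i cannot start until block i-1 has finished."""
    out = []
    prev = iv
    for plaintext in blocks:
        prev = toy_block_encrypt(key, xor(plaintext, prev))
        out.append(prev)
    return out

def counter_encrypt_block(key: bytes, nonce: bytes, index: int, block: bytes) -> bytes:
    """Non-chaining: block i depends only on its own index, so blocks
    can be processed in any order or in parallel."""
    counter = nonce + index.to_bytes(8, "big")  # 8-byte nonce + 8-byte index
    return xor(block, toy_block_encrypt(key, counter))

if __name__ == "__main__":
    key, iv, nonce = b"k" * 16, b"\x00" * 16, b"n" * 8
    blocks = [b"A" * 16, b"A" * 16, b"B" * 16]
    print([c.hex() for c in chained_encrypt(key, iv, blocks)])
    # Counter-style blocks can be computed last-to-first with the same result.
    print([counter_encrypt_block(key, nonce, i, blocks[i]).hex() for i in (2, 1, 0)])
```

In the chained version the loop carries `prev` from one iteration to the next, which is exactly the dependency that forces the processor to wait.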

Disk encryption products typically don't chain blocks together, because software often reads from the middle of files ("random access"). You wouldn't want to have to decrypt the file from the start in order to read bytes from the end of the file.

Thus, if you want to use the new AES instructions for TrueCrypt or BitLocker disk encryption, you'll probably get around a 9-fold increase in encryption performance. However, if you want to use these new processors for SSL website hosting, you are only likely to get a 3-fold increase in encryption performance.

Note that in both cases, AES encryption is only part of the web hosting or disk encryption, so overall performance will not change as much.
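To put a number on this, here is a small sketch using Amdahl's law. The fractions of time spent in AES are made-up values for illustration only (I have not measured them), and `overall_speedup` is a hypothetical helper name.

```python
def overall_speedup(aes_fraction: float, aes_speedup: float) -> float:
    """Amdahl's law: only the AES share of the work gets faster."""
    return 1.0 / ((1.0 - aes_fraction) + aes_fraction / aes_speedup)

if __name__ == "__main__":
    # Hypothetical workloads: (description, fraction of time in AES, AES speedup)
    examples = [
        ("disk encryption, 30% of time in AES", 0.30, 9.0),
        ("SSL hosting, 10% of time in AES", 0.10, 3.0),
    ]
    for name, fraction, speedup in examples:
        print(f"{name}: {overall_speedup(fraction, speedup):.2f}x overall")
```

Even a 9-fold speedup of AES yields only a modest overall gain unless AES dominates the workload.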

AES security

A software implementation accelerates encryption by using lookup tables in memory. Each key causes a different pattern of memory lookups. Hackers can write software that, even though it's running on a different virtual machine, can still detect the pattern of memory access and thus recover part of the AES encryption key.
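A simplified sketch of why this leaks only part of the key: in a table-based implementation, the first-round lookup index is a plaintext byte XORed with a key byte. The sketch assumes the attacker knows the plaintext byte and can observe which cache line was touched, and that 16 table entries share a cache line (64-byte lines, 4-byte entries). These are my assumptions for illustration, and the function names are hypothetical.

```python
ENTRIES_PER_CACHE_LINE = 16  # assumption: 64-byte cache lines, 4-byte table entries

def first_round_index(plaintext_byte: int, key_byte: int) -> int:
    # First-round table lookup index in a table-based AES implementation.
    return plaintext_byte ^ key_byte

def observed_cache_line(plaintext_byte: int, key_byte: int) -> int:
    # What the attacker is assumed to see: the cache line, not the exact entry.
    return first_round_index(plaintext_byte, key_byte) // ENTRIES_PER_CACHE_LINE

def recover_high_nibble(plaintext_byte: int, cache_line: int) -> int:
    # The cache line reveals the top 4 bits of (plaintext XOR key);
    # XOR with the known plaintext's top 4 bits leaves the key's top 4 bits.
    return cache_line ^ (plaintext_byte >> 4)

if __name__ == "__main__":
    secret_key_byte = 0xA7
    known_plaintext_byte = 0x3C
    line = observed_cache_line(known_plaintext_byte, secret_key_byte)
    print(hex(recover_high_nibble(known_plaintext_byte, line)))
```

The low 4 bits stay hidden in this simple model, which is why such an observation recovers part of the key rather than all of it.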

Intel's new AES instructions prevent this. They use internal calculations in the CPU rather than lookup tables, so the memory access pattern is the same regardless of the key.

This is largely a theoretical attack. In the typical case of multiple customers sharing hardware for hosting websites, SSL generates a new key for every session, and those sessions are too short to make key recovery practical. However, in cryptography, "theoretical" attacks are frequently proven practical. Therefore, the new instructions are an important improvement.

SHA-3 selection

The U.S. government (through NIST) is currently looking for a new hash standard to replace SHA-1, which has proven to be weak.

Some have proposed algorithms that can be easily implemented in software, like Skein.

Others have proposed algorithms that are based on the same building blocks as AES. This means that while they may be slower on many processors, they will be faster on the latest Intel processors (and other processors that similarly contain AES features). Experiments with AES-like hash algorithms show that they can be sped up 5 to 10 times with the new Intel instructions.

Now that Intel is shipping these new processors, it might prejudice the SHA-3 selection committee toward one of the AES-based proposals.

Trusted Execution (TXT) and vPro

Intel added TXT features to the previous generation of processors (Core 2), but they were missing from the current generation (Nehalem, Core i7). The Westmere processor now includes the same TXT features as Core 2.

Trusted Execution protects against some specific hacker attacks. For example, "full disk encryption" products require the user to enter a password before the system can boot from the encrypted disk. In theory, a hacker could change the bootloader to first steal the password before booting the system. TXT (in theory) prevents the bootloader from being changed.

Another attack is to hook up a hostile device to the Firewire port that reads the contents of memory to a flash drive, or installs a virus on a running system. In theory, TXT features (VT-d) prevent this by restricting the range of memory that the Firewire hardware can access.

Fiddling with hardware, such as the sound or video card, has been one way that software running on a virtual machine could break into another virtual machine. TXT makes this more secure, by doing a better job of isolating hardware.

A lot of this is "theory". While it certainly makes things harder for hackers, researchers have found ways around some of the technology.

Conclusion

Corporations should take a look at "vPro" laptops and desktops, in particular with "full disk encryption" in mind. TXT will protect the bootup process for BitLocker, and the new AES instructions will accelerate encryption.

Web-hosting providers will like the AES encryption acceleration and greater isolation of virtual machines. The 6 cores of "Westmere" over the 4 cores of "Nehalem" processors are also a clear benefit. These processors use the same sockets, so web hosters can easily swap out the old processors for the new ones.

Notes

Intel has a whitepaper called "Intel Advanced Encryption Standard (AES) Instructions Set". It has good information, including sample implementations of AES using the new instructions.

DJB has a paper benchmarking optimized software AES, "New AES software speed records". He gets 10.57 clocks-per-byte in his optimized software for the Core 2 processor, compared to my guess of 3.75 clocks-per-byte for the new AES instructions (in non-parallel modes like CBC). This implies a roughly 3-fold increase for the new AES instructions.

The paper "The Intel AES Instructions Set and the SHA-3 Candidates" looks at how these new instructions might accelerate SHA-3 candidates. It also guesses that the core AES instructions have a 6-cycle latency. From this, I guess that the new AES instructions will encrypt data at 3.75 cycles per byte (each instruction executes a full AES round, each block requires 10 rounds, and there are 16 bytes per block, thus 6 * 10 / 16 = 3.75).
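The arithmetic behind that guess, and the resulting ratio against the software figure quoted above, can be checked with a few lines. The inputs are the numbers from these notes; the variable names are mine.

```python
LATENCY_CYCLES = 6    # guessed latency of one AES round instruction
ROUNDS = 10           # rounds per block, as assumed above
BLOCK_BYTES = 16      # bytes per AES block
SOFTWARE_CPB = 10.57  # DJB's optimized software figure on Core 2

# Serial (chained) case: every round waits for the previous one.
hardware_cpb = LATENCY_CYCLES * ROUNDS / BLOCK_BYTES
speedup = SOFTWARE_CPB / hardware_cpb

print(hardware_cpb, round(speedup, 2))  # 3.75 2.82
```

So the "3-fold" figure is a rounding of about 2.8.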

Security researcher Joanna Rutkowska has found several issues with TXT that are worth reading about. You should also look at the rebuttal to her points. Overall, her findings don't mean TXT is worthless; it still makes it harder to hack a system when deployed correctly.

Sniffing my Gmail connection, I see that it negotiates AES in chaining mode:
Secure Socket Layer
    TLSv1 Record Layer: Handshake Protocol: Server Hello
        Content Type: Handshake (22)
        Version: TLS 1.0 (0x0301)
        Length: 74
        Handshake Protocol: Server Hello
            Handshake Type: Server Hello (2)
            Length: 70
            Version: TLS 1.0 (0x0301)
            Random
            Session ID Length: 32
            Session ID: F71EC579BD9E19F3EA64CAE5F78D3B2...
            Cipher Suite: TLS_RSA_WITH_AES_256_CBC_SHA (0x0035)
            Compression Method: null (0)

AnandTech and Tom's Hardware have benchmarks of the new AES instructions (using the "Clarkdale" desktop processor instead of "Westmere", but the performance should be essentially the same).
