Friday, December 13, 2019

This is finally the year of the ARM server

"RISC" was an important architecture from the 1980s when CPUs had fewer than 100,000 transistors. By simplifying the instruction set, they free up transistors for more registers and better pipelining. It meant executing more instructions, but more than making up for this by executing them faster.

But once CPUs exceeded a million transistors, around 1995, they moved to out-of-order (OoO), superscalar architectures. OoO replaced RISC by decoupling the front-end instruction set from the back-end execution engine. A "reduced instruction set" no longer matters; the back-end architecture differs little between Intel chips and competing RISC chips like ARM. Yet people have remained fixated on the instruction set. The reason is simply politics. Intel's x86 has been the dominant instruction set for the computers we use as servers, desktops, and laptops, and many people instinctively resist whoever dominates. In addition, colleges indoctrinate students on the superiority of RISC; much college computer-science instruction is decades out of date.

For ten years, the ignorant press has been championing the cause of ARM's RISC processors for servers. The refrain has always been that RISC has some inherent power-efficiency advantage, and that ARM processors, with their natural power efficiency from the mobile world, will be more power efficient in the data center.

None of this is true. There have been plenty of RISC alternatives to Intel, like SPARC, POWER, and MIPS, and none of them turned out to have a power-efficiency advantage.

Mobile chips aren't actually power efficient. Yes, they consume less power, but only because they are slower. ARM's mobile chips have roughly the same computations-per-watt as Intel's chips. When you scale them up to the same amount of computation as Intel's server chips, they end up consuming just as much power.

People are essentially innumerate; they can't do this math. The only factor they see is that ARM chips consume less power. They fail to factor into the equation that those chips are also doing fewer computations.
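For the record, the math is simple division. Here's a minimal sketch in Python; every figure is hypothetical, chosen only to make the scaling argument concrete:

    # Performance-per-watt is what matters, not watts alone.
    # All figures are hypothetical, for illustration only.

    def perf_per_watt(ops_per_sec, watts):
        return ops_per_sec / watts

    mobile_arm = perf_per_watt(ops_per_sec=2e10, watts=5)    # small, slow, cool
    server_x86 = perf_per_watt(ops_per_sec=4e11, watts=100)  # big, fast, hot

    print(f"mobile: {mobile_arm:.1e} ops/joule")  # 4.0e+09
    print(f"server: {server_x86:.1e} ops/joule")  # 4.0e+09 -- the same

    # Scaling the mobile chip up to server throughput needs
    # 4e11 / 2e10 = 20 chips at 5 W each = 100 W: the same power.
    print(f"scaled-up mobile: {(4e11 / 2e10) * 5:.0f} W")

If computations-per-watt are equal, consuming less power just means doing less work; nothing about the instruction set changes the equation.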

There have been three attempts by chip makers to produce server chips that compete against Intel. The first was the "flock of chickens" approach: instead of one beefy OoO core, you make a chip with a bunch of wimpy, traditional RISC cores.

That's not a bad design for highly-parallel, large-memory workloads. Such workloads spread themselves efficiently across many CPUs, and spend a lot of time halted, waiting for data to be returned from memory.
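A quick Amdahl's-law sketch shows why the design pays off only for those workloads. The quarter-speed cores and the 16-core budget below are assumptions I made up for illustration:

    # Amdahl's law: with parallel fraction p, the speedup on n cores
    # is 1 / ((1 - p) + p / n). Numbers here are illustrative only.

    def speedup(p, n):
        return 1.0 / ((1.0 - p) + p / n)

    for p in (0.50, 0.90, 0.99):
        beefy = 1.0                    # baseline: one fast OoO core
        wimpy = 0.25 * speedup(p, 16)  # 16 cores, each 1/4 the speed
        print(f"parallel fraction {p:.0%}: flock of chickens runs at "
              f"{wimpy / beefy:.2f}x the beefy core")

At 99% parallelism the flock wins handily (roughly 3.5x here); at 50% it loses badly (roughly 0.5x).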

But such chips didn't succeed in the market. The basic reason was that interconnecting all the cores introduced so much complexity and power consumption that it wasn't worth the effort.

The second attempt was multithreaded chips. Intel's chips support two threads per core, so that when one thread halts waiting for memory, the other can continue processing what's already in cache and registers. It's a cheap way to increase effective speed while adding few additional transistors to the chip. But it has diminishing marginal returns, which is why Intel supports only two threads. Other vendors created chips with as many as eight threads per core. Again, they were chasing the highly parallel workloads that wait on memory; with multithreaded chips, they could avoid all that interconnect nastiness.
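The diminishing returns fall naturally out of a toy model (my sketch, not anything published by Intel or the other vendors): if each thread independently stalls on memory some fraction of the time, the core does useful work whenever at least one thread is ready:

    # Toy SMT model, illustration only: with stall fraction s per
    # thread, core utilization is 1 - s**n for n hardware threads.

    def utilization(s, n):
        return 1.0 - s ** n

    s = 0.4  # hypothetical: a thread waits on memory 40% of the time
    for n in (1, 2, 4, 8):
        print(f"{n} threads: {utilization(s, n):.1%} core utilization")

    # 1 thread:  60.0%
    # 2 threads: 84.0%  <- the big win Intel takes with two threads
    # 4 threads: 97.4%  <- smaller gain per added thread
    # 8 threads: 99.9%  <- almost nothing left to harvest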

This still didn't work. The chips were quite good, but it turns out that these workloads are only a small portion of the market.

Finally, chip makers decided to compete head-to-head with Intel by creating server chips optimized for the same workloads as Intel's, with fast single-threaded performance. A good example was Qualcomm, which created a server chip that CloudFlare promised to use. They announced this to much fanfare, then abandoned it a few months later when nobody adopted it.

The reason was simple: when you scale to Intel-like performance, you get Intel-like liabilities. Your only customers are the innumerate who can't do the math, who believe, like the emperor, that their clothes are made from the finest of fabrics. Techies who do the math won't buy the chip, because any advantage is marginal. Moreover, it's a risk: if they invest heavily in the platform, how do they know it'll continue to exist and keep up with Intel a year from now, two years, ten years? Even if they can eke out a 10% benefit for their workloads today, it's just not worth the trouble if the platform gets abandoned two years later.
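That back-of-envelope math looks roughly like this; the dollar amounts, the 10% edge, and the migration cost are all hypothetical:

    # Back-of-envelope platform-risk math; every number is made up.
    compute_bill = 1_000_000   # hypothetical annual spend on Intel instances, $
    edge = 0.10                # the marginal benefit the new chip offers
    migration_cost = 500_000   # porting, testing, dual-stack operations, $

    for years_alive in (2, 5, 10):
        net = compute_bill * edge * years_alive - migration_cost
        print(f"platform lasts {years_alive:>2} years: net ${net:+,.0f}")

    # lasts  2 years: net $-300,000  <- abandoned early, you lose
    # lasts  5 years: net $+0        <- merely break even
    # lasts 10 years: net $+500,000  <- only longevity pays off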

Thus, ARM server processors can be summarized this way: the performance and power efficiencies aren't there, and without them, there's no way the market will accept them as competitors to Intel's chips.

This brings us to chips like Graviton2, and similar efforts at other companies like Apple and Microsoft. I'm pretty sure this generation is going to succeed.

The reason is the market, rather than the technology.

The old market was this: chip makers (Intel, AMD, etc.) sold to box makers (Dell, HP, etc.) who sold to Internet companies (Amazon, Rackspace, etc.).

However, this market has been obsolete for a while. The leading Internet companies long ago abandoned the box vendors and started making their own boxes, based on Intel chips.

Making their own chips, making the entire computer from the ground up to their specs, is the next logical evolution.

This has been going on for some time; we just didn't notice. Almost all of the largest tech companies have their own custom CPUs. Apple has a custom ARM chip in the iPhone. Samsung makes custom ARM chips for its phones. IBM has POWER and mainframe chips. Oracle has (or had) SPARC. Qualcomm makes custom ARM chips. And so on.

In the past, having your own CPU meant having your own design, your own instruction set, your own support infrastructure (like compilers), and your own fabs for making the chips. This is no longer true. You license a CPU design from ARM, then have a fab like TSMC manufacture the chip. Since it's ARM, you get all the rest of the support infrastructure (compilers, operating systems, libraries) for free.

Amazon's first-generation Graviton chip used the same CPU core (the ARM Cortex-A72) found in the Raspberry Pi 4. The second-generation Graviton2 uses ARM's Neoverse N1 core, a derivative of the Cortex-A76 that also underlies the Microsoft SQ1 chip in the new ARM-based Surface Pro X.

Amazon doesn't care about the instruction set, or whether a chip is RISC. It cares about the rest of the chip's features. For example, its chips support encrypted memory, a feature you might want in a cloud environment that hosts content from many different customers.

Recently, Sony and Microsoft announced their next-gen consoles. Like the previous generation, these are based on custom AMD designs. Gaming consoles have long been the forerunners of this new market: they ship in high enough volumes to justify a custom chip design. It's just that Amazon, through its cloud instances, is now at sufficient scale that it can sell as many instances as there are game consoles.

The upshot is that custom chips are becoming less and less of a barrier, just as custom boxes became less of a barrier a decade ago. More and more, the world's top tech companies will have their own chips. Sometimes this will be an x86 chip designed in partnership with AMD. Most of the time, it'll be the latest ARM design, manufactured in TSMC or Samsung fabs. IBM will still have POWER and mainframe chips for its legacy markets. Sometimes there will be small microcontroller designs, like Western Digital's RISC-V chips. Intel's chips are still very good, so its market isn't disappearing. However, the market for companies like Dell and HP is clearly a legacy market, to be thought of in the same class as IBM's still-sizable mainframe market.

1 comment:

51rkJIyHmQBFRHehncjl said...

Yeah. So, exactly which "Surface" laptop from the recent Microsoft lineup is on ARM?
