Friday, August 28, 2009

Intel’s Atom vs. Cybersecurity


Intel has two exciting new CPUs: the low-powered "Atom" and the fast "Nehalem", a.k.a. Core i7. I thought I'd cover some points related to the Atom processor.

WHAT MAKES IT DIFFERENT

The Atom sacrifices performance for power efficiency. It's roughly 1/10th as fast as the fastest desktop processor, but consumes 1/100th the electrical power.
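A quick sketch of what those two ratios imply for work done per watt. The function name is made up for illustration, and the inputs are just the rough 1/10 and 1/100 figures above, not measurements.

```python
from fractions import Fraction

def relative_efficiency(relative_speed, relative_power):
    """Work done per unit of energy, relative to the baseline processor."""
    return relative_speed / relative_power

# Rough figures from the text: 1/10th the speed at 1/100th the power.
atom = relative_efficiency(Fraction(1, 10), Fraction(1, 100))
print(atom)
```

By these rough numbers the Atom does about ten times as much work per watt, so on work that parallelizes perfectly, ten Atoms would match the desktop chip's throughput for about a tenth of the power.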

It's a completely new design. Intel's current processors (like the Nehalem/Core-i7 and the Core2) are derived from the line of processors first shipped in 1995 as the "Pentium Pro" or "P6". The major difference in the designs is that the mainstream processors are "out-of-order", whereas the Atom is "in-order/hyper-threaded". That means that for single-threaded applications, the Atom is roughly half as fast as an out-of-order processor at a comparable clock speed.

The major competitor to the Atom is the "CULV" or "Consumer Ultra Low Voltage" processors from Intel. You'll see equivalent netbook/notebook designs from manufacturers like Asus, Acer, or MSI that look otherwise identical except for the processor: either a 1.6-GHz Atom or a 1.4-GHz Core2-Solo/CULV. Because of the in-order vs. out-of-order difference, single-threaded tasks will be half as fast on the Atom machines. On the other hand, in applications that can take advantage of two threads, the Atom machine is just as fast as the CULV machine.

DISPOSABLE COMPUTING

In my pentests, I need computers that I can damage, lose, or deliberately throw away. The Atom forms the basis for many cheap $200 "netbook" computers. This is less than our hourly consulting rate, so it fits the bill perfectly.

These are great for "wired" assessments, where I'm running tools like Nessus to scan behind the firewall or sniffing packets from a (100-Mbps) connection.

These are even better for "wireless" assessments, where I need to leave a computer outside a building to scan, or to set up an "evil twin" to trick employees. Maybe somebody discovers the computer and takes it, maybe it gets rained on -- it's only $200, so it's not a big deal.

The devices are also extremely small and portable. We can travel with a bunch of them on the plane in our carry-on luggage. They are also damn sexy: I've never been one to mess up my laptop with stickers and trinkets, but it's fun to decorate the cheap netbooks.

This story is apparently about a pentest/hack where the perp sent netbooks to an office, appearing to come from HP but likely containing malware.

VIRUS ANALYSIS

I'm infecting my Windows netbooks with viruses. It's pretty easy to clone a small system, infect it with a virus, then restore the cloned image.

I prefer doing this because I get a more "real" assessment of the virus. A lot of them check for VMware, a lot of them check for "known" IP addresses. I can take a netbook to a public cafe, log on there, infect my computer, then sniff the traffic with a second computer. It simulates a much more "real" environment for the virus.

LOW POWER

Like all such geeks, I have a large test lab running many operating systems and servers. These systems run 24-hours a day. This causes a large electricity bill. I've converted most of these to Atom processor systems, such as the Eee Box desktop computer (typically 15 watts), netbooks (10 watts), and I'm thinking of the Acer easyStore home server.

This has had a noticeable effect on my server room, drastically reducing temperatures. It's a big drop from a system running at over 100 watts at idle to one running at 15 watts.
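To put that drop in perspective, here is a rough sketch of the yearly energy saved by one machine that runs around the clock. The 100-watt and 15-watt figures come from the text (100 is a lower bound, since the old systems ran "over" that). The function names and the price per kWh are placeholder assumptions, so plug in your own rate.

```python
HOURS_PER_YEAR = 24 * 365  # the lab machines run 24 hours a day

def annual_kwh(watts):
    """Energy used in one year by a device drawing `watts` continuously."""
    return watts * HOURS_PER_YEAR / 1000

def annual_savings_kwh(old_watts, new_watts):
    """Yearly energy saved by replacing one always-on machine."""
    return annual_kwh(old_watts - new_watts)

# Hypothetical placeholder price, not a real tariff.
ASSUMED_PRICE_PER_KWH = 0.12

saved = annual_savings_kwh(100, 15)
print(f"{saved:.1f} kWh saved per year per machine")
print(f"about {saved * ASSUMED_PRICE_PER_KWH:.0f} currency units at the assumed rate")
```

That works out to roughly 745 kWh per year for every machine replaced, before counting whatever the cooling no longer has to remove.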

Note that the Atom processor itself runs at just a couple of watts, but the remaining chips in the system run at 10 to 15 watts. I notice that on the lowest power system I have, there is less than 1 watt of difference between "sleep" mode and "password cracking" mode.

FULL FEATURE

The Atom processor line supports all the recent major features of Intel processors, such as "virtualization", "NX" bit, SSE3, 64-bit, hyper-threading, and so on.

Strangely, there isn't a single version of the processor that supports all these features at the same time. The ones that support 64-bit don't support the VT virtualization extensions (although you can still do the older form of virtualization). According to this website, a guy is running ESXi on a Dell Mini 9.

Intel has a nice site for comparing features of the Atom processor.

PASSWORD CRACKING

One of the biggest changes in the Core2 processor (vs. the older Pentium M and Pentium 4) is that SSE instructions run at the full 128 bits. Prior to that, while SSE registers were 128 bits wide, the processor would only process the first 64 bits in one clock cycle, then the second 64 bits in the next clock cycle. Thus, the Core2 represented a 2x increase in SSE speed.

That was one of my biggest questions for the Atom: is their SSE implementation like the old processors or the new processors? I couldn't find this documented anywhere, so I had to benchmark my password cracking code (which uses SSE instructions).

I assumed the worst, but was pleasantly surprised: the Atom processor executes a full 128 bits in a single clock cycle. That means that for SSE code, a 1.6-GHz Atom will be faster than a 1.4-GHz Core2-Solo/CULV at password cracking. These are indeed the results that I get. Likewise, my dual-core Atom 330 system (Eee Box) is as fast as my MacBook Air with its dual-core 1.86-GHz Core 2 Duo (faster, even, because the MacBook's cooling often kicks in and throttles the CPU).
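The difference between the two SSE implementations can be pictured with a toy model. This is only a sketch: the function names are invented, and the "passes" count stands in for the clock cycles described above, ignoring everything else a real pipeline does. A 128-bit register is modeled as four 32-bit lanes.

```python
LANE_MASK = 0xFFFFFFFF  # each lane is a 32-bit unsigned integer

def add_lanes(x, y):
    """Lane-wise 32-bit add with wraparound, one result per lane."""
    return [(a + b) & LANE_MASK for a, b in zip(x, y)]

def packed_add_full_width(x, y):
    """Model of a 128-bit datapath: all four lanes in a single pass."""
    return add_lanes(x, y), 1

def packed_add_half_width(x, y):
    """Model of a 64-bit datapath: low two lanes first, then high two."""
    low = add_lanes(x[:2], y[:2])
    high = add_lanes(x[2:], y[2:])
    return low + high, 2

x = [0xFFFFFFFF, 1, 2, 3]
y = [1, 1, 1, 1]
full_result, full_passes = packed_add_full_width(x, y)
half_result, half_passes = packed_add_half_width(x, y)
print(full_result == half_result, full_passes, half_passes)
```

Both versions compute the same four sums. The half-width model just needs two passes to get there, which is where the 2x difference comes from.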

Note that the processors require different optimizations. The Atom requires very simple code that can be easily hyper-threaded. The Core2 requires manually interleaving two streams of instructions that run in a single thread.
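The structural difference between the two styles looks roughly like this. It is a sketch only: `mix_step` is a made-up toy round, not the real password-cracking code, and Python gains nothing from interleaving. The point is just the shape of the loops: one dependency chain per thread versus two independent chains advanced in lockstep inside one thread.

```python
MASK = 0xFFFFFFFF
INITIAL_STATE = 0x67452301  # arbitrary starting value for the toy hash

def mix_step(state, word):
    """Toy 32-bit mixing step (hypothetical stand-in for one hash round)."""
    state = (state + word) & MASK
    return ((state << 7) | (state >> 25)) & MASK

def hash_simple(words):
    """Atom style: one simple chain; run a second copy in the other hyper-thread."""
    state = INITIAL_STATE
    for w in words:
        state = mix_step(state, w)
    return state

def hash_interleaved(words_a, words_b):
    """Core2 style: two independent chains interleaved by hand in one thread."""
    sa = sb = INITIAL_STATE
    for wa, wb in zip(words_a, words_b):
        sa = (sa + wa) & MASK                   # chain A: add
        sb = (sb + wb) & MASK                   # chain B: add (independent of A)
        sa = ((sa << 7) | (sa >> 25)) & MASK    # chain A: rotate
        sb = ((sb << 7) | (sb >> 25)) & MASK    # chain B: rotate
    return sa, sb

candidate_a = [1, 2, 3, 4]
candidate_b = [5, 6, 7, 8]
print(hash_interleaved(candidate_a, candidate_b) ==
      (hash_simple(candidate_a), hash_simple(candidate_b)))
```

On an out-of-order core, alternating the two chains gives the scheduler independent work to overlap. On the Atom, the simple loop is left alone and the second hardware thread supplies the independent work instead.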

Since 100% CPU usage is roughly the same electrical power usage as 0%, I leave password cracking running in the background on Atom servers.

SMALL DEVICES


These netbooks use close to the same power as other devices in my home. My WRT54G uses 8-Watts, my Acer Aspire uses 12-Watts (picture on right) with screen turned off and battery removed (while running password cracker at 100% CPU). The WRT54G is a WiFi access-point/router from Cisco that is famous for hackers replacing the firmware with their own special Linux distros. With only 4-megs of flash and 16-megs of RAM, it's much more limited than netbooks that start at 4-GIGS of flash and 512-megs of RAM.

You can install "soft APs" to convert a netbook into an access-point, and install other goodies like intrusion-detection systems and firewalls. While they are far from perfect, they can make nice little home devices.

X86 VS ARM

In theory, RISC processors (especially ARM) should be a better solution for low-powered, highly-functional devices. There are lots of nice ARM solutions (like this wallplug computer or bigger devices like this one). The new ARM Cortex-A9 looks extremely sexy.

Yet, these don't turn out so well in practice. These ARM devices don't work like computers I'm familiar with. I can't simply stick in a CD or USB drive, boot the machine, and install my favorite distro with my favorite developer tools. Instead, I have to install ARM cross compilers on my Linux box and go from there. It's very annoying. I'd be willing to go through the effort if I'm developing a special device to sell to customers, but I'm not willing to bother if I just want to create a device for myself. It's just easier to get a $200 netbook.

There is also some value in familiarity with the x86 instruction set. While Atom's in-order design is a radical departure from previous Intel CPUs, the old rules for optimization generally apply. More importantly, things like SSE behave the same, and work elegantly, whereas on ARM processors, multimedia instructions are a bit weird.

CONCLUSION

I like the Atom because I can now throw a cheap computer at a problem and solve it, especially my ever hotter server room.

Comments:

David Andersen said...

Thanks for the cool (such as it was) post, Robert -- it's fun to see more uses for the Atom, and now it's clear I have to find someone to do some comprehensive power efficiency benchmarks of the Atom vs. the Nehalem for the password cracking benchmark you mentioned. Thanks! On a related note, we're using these processors (and friends) for building data-processing clusters that we call FAWNs (fast arrays of wimpy nodes). They turn out to be pretty amazing in terms of gigabytes/watt and megabytes per second per watt for data-intensive processing.

Anonymous said...

One important thing is that CPUs use two operands per instruction, meaning SSE instructions are performed as destination = destination [operation] source, while GPUs use three operands per instruction, so it's destination = source1 [operation] source2. It may not look that important, but it actually means that we need to perform 2 more MOVE instructions per subfunction (one for the logic, one for the cyclic rotation). And with so few instructions overall (only 10), that produces a +20% penalty for CPUs. So we end up with 752 instructions on 32-bit integers required to perform one MD5_Transform when there are no three-operand instructions available.

Another important thing is that for hash cracking we don't need to perform the full 64 iterations. As the hash is divided into four 32-bit values, it's enough to perform 61 iterations to fully compute one of these values, and to abandon the 3 remaining iterations if this value doesn't match the hash we're looking for. It's a pretty common optimization done by almost every hash cracker.
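The early-exit trick can be sketched in Python. This is an illustration under stated assumptions, not how a real cracker is written: `first_digest_word` and `could_match` are made-up names, only single-block inputs (55 bytes or fewer) are handled, and the check against `hashlib` is there only to show that 61 steps already fix the first 32-bit value of the digest.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF
SHIFTS = ([7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 +
          [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4)
K = [int(abs(math.sin(i + 1)) * 2**32) & MASK for i in range(64)]
A0, B0, C0, D0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476

def rotl(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK

def first_digest_word(password: bytes) -> int:
    """First 32-bit word of MD5(password), using only 61 of the 64 steps."""
    if len(password) > 55:
        raise ValueError("sketch handles single-block inputs only")
    block = (password + b"\x80" + b"\x00" * (55 - len(password))
             + struct.pack("<Q", 8 * len(password)))
    m = struct.unpack("<16I", block)
    a, b, c, d = A0, B0, C0, D0
    for i in range(61):  # stop 3 steps early
        if i < 16:
            f, g = (b & c) | (~b & d), i
        elif i < 32:
            f, g = (d & b) | (~d & c), (5 * i + 1) % 16
        elif i < 48:
            f, g = b ^ c ^ d, (3 * i + 5) % 16
        else:
            f, g = c ^ (b | (~d & MASK)), (7 * i) % 16
        f = (f + a + K[i] + m[g]) & MASK
        a, d, c, b = d, c, b, (b + rotl(f, SHIFTS[i])) & MASK
    # The value just computed would only be shuffled into the A slot by the
    # last 3 steps, so the first digest word is already determined.
    return (A0 + b) & MASK

def could_match(password: bytes, target_digest: bytes) -> bool:
    """Cheap rejection test: compare only the first 32-bit word."""
    return first_digest_word(password) == struct.unpack("<I", target_digest[:4])[0]

target = hashlib.md5(b"password").digest()
print(could_match(b"password", target), could_match(b"letmein", target))
```

A candidate that fails this test can be dropped without finishing the hash. One that passes still needs the remaining three steps and a full comparison.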

