Sunday, October 23, 2022

The RISC Deprogrammer

I should write up a larger technical document on this, but in the meanwhile is this short (-ish) blogpost. Everything you know about RISC is wrong. It's some weird nerd cult. Techies frequently mention RISC in conversation, with other techies nodding their head in agreement, but it's all wrong. Somehow everyone has been mind controlled to believe in wrong concepts.

An example is this recent blogpost which starts out saying that "RISC is a set of design principles". No, it wasn't. Let's start from this sort of viewpoint to discuss this odd cult.

What is RISC?

Because of the march of Moore's Law, every year, more and more parts of a computer could be included onto a single chip. When chip densities reached the point where we could almost fit an entire computer on a chip, designers made tradeoffs, discarding unimportant stuff to make the fit happen. They made tradeoffs, deciding what needed to be included, what needed to change, and what needed to be discarded.

RISC is a set of creative tradeoffs, meaningful at the time (early 1980s), but which were meaningless by the late 1990s.

The interesting parts of CPU evolution are the three decades from 1964 with IBM's System/360 mainframe and 2007 with Apple's iPhone. The issue was a 32-bit core with memory-protection allowing isolation among different programs with virtual memory. These were real computers, from the modern perspective: real computers have at least 32-bit and an MMU (memory management unit).

The year 1975 saw the release of Intel 8080 and MOS 6502, but these were 8-bit systems without memory protection. This was at the point of Moore's Law where we could get a useful CPU onto a single chip.

In the year 1977 we saw DEC release it's VAX minicomputer, having a 32-bit CPU w/ MMU. Real computing had moved from insanely expensive mainframes filling entire rooms to less expensive devices that merely filled a rack. But the VAX was way too big to fit onto a chip at this time.

The real interesting evolution of real computing happened in 1980 with Motorola's 68000 (aka. 68k) processor, essentially the first microprocessor that supported real computing.

But this comes with caveats. Making microprocessor required creative work to decide what wasn't included. In the case of the 68k, it had only a 16-bit ALU. This meant adding two 32-bit registers required passing them twice through the ALU, adding each half separately. Because of this, many call the 68k a 16-bit rather than 32-bit microprocessor.

More importantly, only the lower 24-bits of the registers were valid for memory addresses. Since it's memory addressing that makes a real computer "real", this is the more important measure. But 24-bits allows for 16-megabytes of memory, which is all that anybody could afford to include in a computer anyway. It was more than enough to run a real operating system like Unix. In contrast, 16-bit processors could only address 64-kilobytes of memory, and weren't really practical for real computing.

The 68k didn't come with a MMU, but it allowed an extra MMU chip. Thus, the early 1980s saw an explosion of workstations and servers consisting of a 68k and an MMU. The most famous was Sun Microsystems launched in 1982, with their own custom designed MMU chip.

Sun and its competitors transformed the industry running Unix. Many point to IBM's PC from 1982 as the transformative moment in computer history, but these were non-real 16-bit systems that struggled with more than 64k of memory. IBM PC computers wouldn't become real until 1993 with Microsoft's Windows NT, supporting full 32-bits, memory-protection, and pre-emptive multitasking.

But except for Windows itself, the rest of computing is dominated by the Unix heritage. The phone in your hand, whether Android or iPhone, is a Unix computer that inherits almost nothing from the IBM PC.

These 32-bit Unix systems from the early 1980s still lagged behind DEC's VAX in performance. The VAX was considered a mini-supercomputer. The Unix workstations were mere toys in comparison. Too many tradeoffs were made in order to fit everything onto a single chip, too many sacrifices made.

Some people asked "What if we make different tradeoffs?"

Most people thought the VAX was the way of the future, and were all chasing that design. The 68k CPU was essentially a cut down VAX design. But history had anti-VAX designs that worked very differently, notably the CDC 6600 supercomputer from the 1960s and the IBM 801/ROMP processor from the 1970s.

It's not simply one tradeoff, but a bunch of inter-related tradeoffs. They snowball -- each choice you make changes the costs-vs-benefit analysis of other choices, changing them as well.

This is why people can't agree upon a single definition of RISC. It's not one tradeoff made in isolation, but a long list of tradeoffs, each part of a larger scheme.

In 1987, Motorola shipped its 68030 version of the 68k processor, chasing the VAX ideal. By then, we had ARM, SPARC, and MIPS processors that significantly outperformed it. Given a budget of roughly 100,000 transistors allowed by Moore's Law of the time, the RISC tradeoffs were better than VAX-like tradeoffs.

So really, what is RISC?

Let's define things in terms of 1986, comparing the [ARM, SPARC, MIPS] processors called "RISC" to the [68030, 80386] processors that weren't "RISC". They all supported full 32-bit processing, memory-management, and preemptive multitasking operating systems like Unix.

The major ways RISC differed were:

  • fixed-length instructions (32-bits or 4-bytes each)
  • simple instruction decoding
  • horizontal vs. vertical microcode
  • deep pipelines of around 5 stages
  • load/store aka reg-reg
  • simple address modes
  • compilers optimized code
  • more registers

If you are looking for the one thing that defines RISC, it's the thing that nobody talks about: horizontal microcode.

The VAX/68k/x86 architecture decoded external instructions into internal control ops that were pretty complicated, supporting such things as loops. Each external instruction executed an internal microprogram with a variable number of such operations.

The classic RISC worked differently. Each external instruction decoded into exactly 4 internal ops. Moreover, each op had a fixed purpose:

  1. read from two registers into the ALU (arithmetic-logic unit)
  2. execute a math operation in the ALU
  3. access memory (well, the L1 cache)
  4. write results back into one register

(This explanation has been fudged and simplified, btw).

This internal detail was expressed externally in the instruction set, simplifying decoding. The external instructions specified two registers to read, an ALU opcode, and one register to write. All of this was fit into a constant 32-bits. In contrast, the [68k/x86/VAX] model meant a complex decoding of instructions with a large ROM containing microprograms.

Roughly half (50%) of the 68000's transistors contained this complex decoding logic and ROM. In contrast, for RISC processors, it was closer to 1%. All those transistors could be dedicated to other things. See how tradeoffs snowball? Saving so many transistors involved in instruction decoding meant being able to support other features elsewhere. It's not clear this is a benefit, however. This meant that RISC needed multiple instructions to do the same thing as a single [68k/x86/VAX] instruction.

This meant instructions could be deeply pipelined. Instructions could be overlapped. When reading registers for the current instruction, we can simultaneously be fetching the next, and performing the ALU calculation on the previous instruction. The classic RISC pipeline had 5 stages (the 4 mentioned above plus 1 for fetching the next instruction). Each clock cycle would execute part of 5 instructions simultaneous, each at a different stage in the pipeline.

This was called scalar operation, In previous processor, it would take a variable number of clock cycles for an instruction to complete. In RISC, every instruction had 5 clock cycle latency from beginning to end. And since execution was overlapped/pipelined, executing 5 instructions at a time, the throughput was one instruction per clock cycle.

All CPUs are pipelined to some extent, but they need complex interlocks to prevent things from colliding with each other, such as two pipelined instructions trying to read registers at the same time. RISC removed most of those interlocks, by strictly regulation what an instruction could do in each stage of the pipeline. Removing these interlocks reduced transistor count and sped things up. This could be one possible definition of RISC that you never hear of: it got rid of all these interlocks found in other processors.

Some pipeline conflicts were worse. Because pipelining, the results of an instruction won't be available until many clock cycles later. What if one instruction writes its results to register #5 (r5), and the very next instruction attempts to read from register #5 (r5)? It's too soon, it has to wait more clock cycles for the result.

The answer: don't do that. Assembly language programmers need to know this complication, and are told to simply not write code that does this, because then the program won't work.

This was anathema of the time. Throughout history to this point, each new CPU architecture also had a new operating-system written in assembly language, with many applications written in assembly language. Thus, a programmer-friendly assembly language was considered one of the biggest requirements for any new system. Requiring programmers to know such quirks lead to buggy code was simply unacceptable. Everybody knew that programmer-hostile instruction-sets would never work in the market, even if they performed faster and cheaper.

But technology is littered with what everybody knowing being wrong. In this case, by 1980 we had the  C programming language that was essentially a "portable assembly language" and the Unix operating system written in C. The only people who needed to know about a quirky assembly language were the compiler writers. They would take care of all such problems.

That's why the history lesson above talks about Unix and real computing. Without Unix and C, RISC wouldn't have happened. An operating-system written in a high-level language was a prerequisite for RISC. It's as import an innovation as Moore's Law allowing 100,000 transistors to fit on a chip.

Because of the lack of complex decoding logic, the transistor budget was freed up to support such things as more registers. The Intel x86 architecture famously had 8 registers, while the RISC competitors typically had as many as 32. The limitation was decode space. It takes 5 bits to specify one of 32 possibilities. Given that most every instructions specified two registers to read from and one register to write to, that's 15 bits, or half of the instruction space, leaving 17 bits for other purposes.

The creators of the RISC, Hennessy and Patterson, wrote a textbook called a "Computer Architecture: A Quantitative Approach". It's horrible. It imagines a world where people need to be taught tradeoffs and transistor budgets. But there is no other approach than a quantitative one, it's like an economics textbook "Economics: A Supply And Demand Approach". While the textbook has a weird obsession with quantitative theory, it misses non quantitative tradeoffs, like the fact that RISC couldn't happen without C and Unix. 

Among the snowballing tradeoffs is the load/store architecture, while at the same time, having fewer addressing modes. It's here that we need to go back and discuss history -- what the heck is an "addressing mode"????

In the beginning, computers had only a single general purpose register, called the accumulator. All calculations, like adding two numbers together, involved reading the second value from memory and combining with the first value already in the accumulator. All calculations, whether arithmetic (add, subtract) or logical (AND, OR, XOR) involved one value already in the register, and another value from memory.

Addresses have to be calculated. For example, when accessing elements in a table, we have to take the row number, multiply it by the size of the table, add an offset into the row for desired column, then add all that to the address at the start of the table. Then after calculating this address, we often want to increment the index to fetch the next row.

If the table base address and row index are held in registers, we might get a complex instructions like the following. This calculates and address using two registers r10 and r11, fetches that value from memory, then adds it into register r9.

 ADD r9, [r10 + r11*8 + 4]

Such calculations embedded in the instruction-set were necessary for such early computers. While they had only a single general purpose register (the accumulator), they still had multiple special purpose registers used this way for address calculations. 

For complex computers like the VAX, such address modes imbedded in instructions were no longer necessary, but still desirable. Half the work of the computer is in calculating memory addresses. It's very tedious for programmers to do it manually, easier when the instruction-set takes care of common memory access patterns (like accessing cells within a table).

This leads us to the load/store issue.

With many registers, we no longer need to read another value from memory (a reg-mem calculation). We can instead perform the calculation using two registers (reg-reg). The VAX had such reg-reg instructions, but programmers still mostly used the reg-mem instructions with the complex address calculations.

RISC changed this. Calculations were now exclusively reg-reg, where math operations like addition could only operate on registers. To add something from memory, you needed first to load it from memory into a register, using a separate, explicit instruction. Likewise, writing back to memory required an explicit store operation.

This architecture can be called either reg-reg or load/store, with the second name being more popular.

With RISC, addressing modes were still desirable, but now they applied to only the two load and store instructions.

The available addressing modes were constrained by the limited RISC pipeline and limited 32-bit fixed-length instructions. Since the pipeline allowed for the reading of two registers at the start, adding two registers together to form the address was allowed. The example shown above was too complex, though.

What you are supposed to be reading from all of this is that all of these tradeoffs are linked. Each decision that diverges from the ideal VAX-like architecture snowballed into other decisions that drifted further and further from this ideal, until what we had was something that looked nothing like a VAX.

The upshot of these decisions was being able to reduce a 32-bit MMU CPU into roughly a single chip because it needed fewer transistors, while at the same time performing much faster. It required maybe twice as many instructions to perform the same tasks (mostly due to needing more complex address calculations due to lack of addressing modes), but performed them at maybe 5 times faster, for a significant speed up.

At the time, the VAX was the standard benchmark target. When Sun shipped it's first SPARC RISC systems (the Sun-4), they benchmarked about twice as fast as the latest VAX systems, while being considerably cheaper.

The end of RISC

By the late 1980s, everybody knew that RISC was the future. Sure, Intel continued with its x86 and Motorola with it's 68000, but that's because the market wanted backwards compatibility with legacy instruction-sets. Both attempted to build their own RISC alternatives, but failed. When backwards compatibility wasn't required, everybody created RISC processors, because for 32-bit MMU real computing, they were  better. And everybody knew it.

But of course, everybody was eventually wrong. Even as early as the 80486 in 1989, Intel was converting the innards of the processor into something that looked more RISC-like.

The nail in the coffin came in 1995 with Intel's Pentium Pro processor that supported out-of-order (or OoO) processing. Again, it wasn't really a new innovation. Out-of-order instructions first appeared on 1960s era supercomputers from CDC and IBM. This was the first time that transistor budgets allowed it to be practically used on single-chip microprocessors.

Transistor budgets were so high that designers no longer had to make basic painful tradeoffs. The decisions necessary trying to cram everything into 100,000 transistors were longer meaningful when you had more than 1-million transistors to work with. Instruction-set decoding requiring 20k transistors is important with small budgets, but meaningless with large budgets.

With OoO, the microarchitecture inside the chip looks roughly the same, regardless if it's an Intel x86, ARM, SPARC, or whatever.

This was proven in benchmarks. When Intel released its out-of-order Pentium Pro in 1995, it beat all the competing in-order RISC processors on the market.

Everybody was wrong -- RISC wasn't the future, the future was OoO.

One way of describing the Pentium Pro is that it "translates x86 into RISC-like micro-ops". What that really means is that instead of vertical microcode, it translated things into horizontal, pipelined micro-ops. Most of the typical math operations were split into two micro-ops, one a load/store operation, and the other a reg-reg operation. (Some x86 instructions need even more micro-ops: address calculation, then load/store, then ALU op).

Intel still has an "x86 tax" decoding complex instructions. But in terms of pipeline stages, that tax only applies to the first stage. Typical OoO processors have at least 10 more stages after that. Even RISC instruction-set processors like ARM must translate external instructions into internal micro-ops.

The only significant difference left is the fact that Intel's instructions are variable length. The fixed length instructions of RISC means that multiple can be fetched at once, and decoded all in parallel. This is impossible with Intel x86, they must at least partially be decoded serially, one before the next. You don't know where the next instruction starts until you've figured out the length of the current instruction.

Intel and AMD find crafty ways to get around this. For example, AMD has often put hints in its instruction cache (L1I) to so that decoders can know the length of instructions. Intel has played around with "loop caches" (so-called because they are most useful for loops) that track instructions after they've been decoded, so they don't need to be decoded again.

The upshot is that for most code, there's no inherent difference between x86 and RISC, they have essentially the same internal architecture for out-of-order (OoO) processors. No instruction-set has an inherent advantage over the other.

And it's been that way since 1995.

I mention this because bizarrely this cult has persisted for the last 30 years after OoO replaced RISC for high-end real computers. It ceased being a useful technical distinction, so what are techies still discussing it?

They persist in believing dumb things, for which no amount of deprogramming is possible. For example, they look at mobile (battery powered) devices and note that they use ARM chips to conserve power. They make the assumption that there must be some sort of inherent power-efficiency advantage.

This isn't true. These chips consume less power by simply being slower. Fewer transistors mean less power consumption. This meant while desktops/servers used power-hungry OoO processors, mobile phones went back to the transistor budgets of yesteryear, meaning back to in-order RISC.

But as Moore's Law turned, transistors got smaller, to the point where even mobile phones got OoO chips. They use clever tricks to keep that OoO chip powered down most of the time, often including an in-order chip that runs slower on less power for minor tasks.

We've reached the point where mobile and laptops now use the same chips, your MacBook uses (essentially) the same chip as your iPhone, which is the same chip as Apple desktops.

Now Apple's M1 ARM (and hence RISC) processor is much better at power consumption than it's older Intel x86 chip, but this isn't because it's RISC. Apple did a good job at analyzing what people do on mobile devices like laptops and phones and optimized for that. For example, they added a lot of great JavaScript features, cognizant of the ton of online and semi-offline apps that are written in JavaScript. In contrast, Intel attempts to optimize a chip simultaneously for laptops, desktops, and servers, leading poorly optimizations for laptops.

Apple also does crazy things like putting a high end GPU (graphics processor) on the same chip. This has the effect of making their M1 ARM CPU crazy good for desktops for certain applications, those requiring the sorts of memory-bandwidth normally needed by GPUs.

But overall, x86 chips from AMD and Intel are still faster on desktops and servers.

In addition to the fixed-length instructions providing a tiny benefit, ARM has another key advantage, but it has nothing to do with RISC. When they upgraded their instruction-set to support 64-bit instead of just 32-bit, they went back and redesigned it from scratch. This allowed them to optimize the new instruction-set for the OoO pipeline, such as removing some dependencies that slow things down.

This was something that Intel couldn't do. When it came time to support 64-bit, AMD simply extended the existing 32-bit instructions. A long sequence of code often looks identical between the 32-bit and 64-bit versions of the x86 instruction-sets, whereas they look completely different on ARM 32-bit vs. 64-bit.

What about RISC-V and ARM-on-servers?

We've reached the point in tech where the instruction-set doesn't matter. It's not simply that code is written in high-level language. It's mostly that micro-architectural details have converged.

Take byte-order, for example. Back in the 1980s, most of the major CPUs in the world were big-endian, while Intel bucked the trend being little-endian. The reason is that some engineer made a simple optimization back when the 8008 processor was designed for terminals, and because of backwards compatibility, the poor decision continues to plague x86 today.

Except when it annoys programmers debugging memory dumps, byte-order doesn't matter. Therefore, all the RISC processors allowed a simple bit to be set to switch processors from big-endian to little-endian mode.

Over time, that has caused everyone to match Intel's little-endianess, driven primarily by Linux. The kernel itself supports either mode, but a lot of drivers depend upon byte-order, and user-mode programs developed on x86 sometimes have byte-order bugs. As it was ported to architectures like ARM or PowerPC, most of the time it was done in little-endian mode. (You can get PowerPC Linux in big-endian, but the preference is little-endian, because drivers).

The same effect happens even in things that aren't strictly CPU related, like memory and I/O. The tech stack has converged so that processors look more and more alike except for the instruction-set.

The convergence of architecture is demonstrated most powerfully by Apple's M1 transition, where they stopped using Intel's processors in their computers in favor of their custom ARM processor they created for the iPhone.

The MacBook Air M1 looks identical on the outside compared to the immediately preceding x86 MacBook. But more the the point, it performs almost identically running x86 code -- it runs x86 code at native x86 speeds but on an ARM CPU. The processors are so similar architecturally that instruction-sets could be converted on the fly -- it simply reads the x86 program, converts to ARM transparently on the fly, then runs the ARM version. Previous code translation attempts have incurred massive slowdowns to account for architectural differences, but the M1 cheated by removing any differences that weren't instruction-set related, allowing smooth translation of the instructions.

Technically, instruction-sets don't matter, but for business reasons, they still do. Intel and AMD control x86, and prevent others from building compatible processors. ARM lets others build compatible processors (indeed, making no CPUs themselves), but charges them a license fee.

Especially for processors on the low-end, people don't want to pay license fees.

For that reason, RISC V has become popular. For low-end processors (in-order microcontrollers competing against ARM Cortex Ms) in the 100,000 transistor range, it matters that an instruction-set be RISC. The only free alternative is the aging MIPS. It has annoying quirks, like "delay slots", which are fixed by RISC V. Since RISC V is an open standard, free of license fees, those designing their own low end processor have adopted it.

For example, nVidia uses RISC V extensively throughout its technology. GPUs contain tiny embedded CPUs to manage things internally. They have ARM licenses, but they don't want to pay the pennies it would cost for every unit that ARM charges. Likewise, Western Digital (a big hard-drive maker) designed a RISC V core for its drives. 

There are a lot of RISC V fans due to the RISC cult who insist it should go everywhere, but it's not going anywhere for high-end processors. At the high-end, you are going to pay licensing fees for designs anyway. In other words, while big companies have the resources to design small in-order processors, they don't have the resources to design big OoO processors, and would therefore buy designs from others.

Amazon's AWS Graviton is a good example (ARM-based servers). They aren't licensing the instruction-set from ARM so much as the complete OoO CPU design. They include the ARM cores on a chip of Amazon's design, having memory, I/O, security features tailored to AWS use cases. Neither the instruction-set architecture or micro-architecture particularly matter to Amazon compared to all the other features of their chips.

Lots of big companies are getting into the custom CPU game, licensing ARM cores. Big tech companies tend to have their own programming language, their own operating systems, their own computer designs, and nowadays their own CPUs. This includes Microsoft, Google, Apple, Facebook, and so on. The advantage of ARM processors (or in the future, possibly RISC V processors) isn't their RISC nature, or their instruction-sets, but the fact they are big processor designs that others can included with their own chips. There is no inherent power efficiency or speed benefit -- only the business benefit.

Conclusion

This blogpost is in reaction to that blogpost I link above. That writer just recycles old RISC rhetoric of the past 30 years, like claiming it's a "design philosophy". It, it was a set of tradeoffs meaningful to small in-order chips -- the best way of designing a chip with 100,000 transistors.

The term "RISC" has been obsolete for 30 years, and yet this nonsense continues. One reason is the Penisy textbook that indoctrinates the latest college students. Another reason is the political angle, people hating whoever is dominant (in this case, Intel on the desktop). People believe in RISC, people evangelize RISC. But it's just a cult, it's all junk. Any conversation that mentions RISC can be improved by removing the word "RISC".

OoO has replaced RISC as the dominant architecture for CPUs, and it did so in 1995, and ever since then, the terminology "RISC" is obsolete. The only thing you care about when looking at chips is whether it's an in-order design or an out-of-order design. Well, that's if you care about theory. If you care about practice, you care about whether it supports your legacy tooling and code. In the real world, whether you use x86 or ARM or MIPS or PowerPC is simply because of legacy market conditions. We still launch rockets to Mars using PowerPC processors because that's what the market for radiation-hardened CPUs has always used.

Sunday, July 03, 2022

DS620slim tiny home server

In this blogpost, I describe the Synology DS620slim. Mostly these are notes for myself, so when I need to replace something in the future, I can remember how I built the system. It's a "NAS" (network attached storage) server that has six hot-swappable bays for 2.5 inch laptop drives.

That's right, laptop 2.5 inch drives. It makes this a tiny server that you can hold in your hand.

The purpose of a NAS is reliable storage. All disk drives eventually fail. If you stick a USB external drive on your desktop for backups, it'll eventually crash, losing any data on it. A failure is unlikely tomorrow, but a spinning disk will almost certainly fail some time in the next 10 years. If you want to keep things, like photos, for the rest of your life, you need to do something different.

The solution is RAID, an array of redundant disks such that when one fails (or even two), you don't lose any data. You simply buy a new disk to replace the failed one and keep going. With occasional replacements (as failures happen) it can last decades. My older NAS is 10 years old and I've replaced all the disks, one slot replaced twice.

Monday, January 31, 2022

No, a researcher didn't find Olympics app spying on you

For the Beijing 2022 Winter Olympics, the Chinese government requires everyone to download an app onto their phone. It has many security/privacy concerns, as CitizenLab documents. However, another researcher goes further, claiming his analysis proves the app is recording all audio all the time. His analysis is fraudulent. He shows a lot of technical content that looks plausible, but nowhere does he show anything that substantiates his claims.

Tuesday, December 07, 2021

Journalists: stop selling NFTs that you don't understand

The reason you don't really understand NFTs is because the journalists describing them to you don't understand them, either. We can see that when they attempt to sell an NFT as part of their stories (e.g. AP and NYTimes). They get important details wrong.

The latest is Reason.com magazine selling an NFT. As libertarians, you'd think at least they'd get the technical details right. But they didn't. Instead of selling an NFT of the artwork, it's just an NFT of a URL. The URL points to OpenSea, which is known to remove artwork from its site (such as in response to DMCA takedown requests).

If you buy that Reason.com NFT, what you'll actually get is a token pointing to:

https://api.opensea.io/api/v1/metadata/0x495f947276749Ce646f68AC8c248420045cb7b5e/0x1F907774A05F9CD08975EBF7BF56BB4FF0A4EAF0000000000000060000000001

This is just the metadata, which in turn contains a link to the claimed artwork:

https://lh3.googleusercontent.com/8Q2OGcPuODtCxbTmlf3epFGOqbfCbs4fXZ2RcIMnLpRdTaYHgqKArk7uETRdSZmpRAFsNE8KB4sFJx6czKE5cBKB1pa7ovc4wBUdqQ

If either OpenSea or Google removes the linked content, then any connection between the NFT and the artwork disappears.

It doesn't have to be this way. The correct way to do NFT artwork is to point to a "hash" instead which uniquely identifies the work regardless of where it's located. That $69 million Beeple piece was done this correct way. It's completely decentralized. If the entire Internet disappeared except for the Ethereum blockchain, that Beeple NFT would still work.

This is an analogy for the entire blockchain, cryptocurrency, and Dapp ecosystem: the hype you hear ignores technical details. They promise an entirely decentralized economy controlled by math and code, rather than any human entities. In practice, almost everything cheats, being tied to humans controlling things. In this case, the "Reason.com NFT artwork" is under control of OpenSea and not the "owner" of the token.

Journalists have a problem. NFTs selling for millions of dollars are newsworthy, and it's the journalists place to report news rather than making judgements, like whether or not it's a scam. But at the same time, journalists are trying to explain things they don't understand. Instead of standing outside the story, simply quoting sources, they insert themselves into the story, becoming advocates rather than reporters. They can no longer be trusted as an objective observers.

From a fraud perspective, it may not matter that the Reason.com NFT points to a URL instead of the promised artwork. The entire point of the blockchain is caveat emptor in action. Rules are supposed to be governed by code rather than companies, government, or the courts. There is no undoing of a transaction even if courts were to order it, because it's math.

But from a journalistic point of view,  this is important. They failed at an honest description of what actually the NFT contains. They've involved themselves in the story, creating a conflict of interest. It's now hard for them to point out NFT scams when they themselves have participated in something that, from a certain point of view, could be viewed as a scam.



Sunday, November 07, 2021

Example: forensicating the Mesa County system image

Tina Peters, the election clerk in Mesa County (Colorado) went rogue and dumped disk images of an election computer on the Internet. They are available on the Internet via BitTorrent [Mesa1][Mesa2], The Colorado Secretary of State is now suing her over the incident.

The lawsuit describes the facts of the case, how she entered the building with an accomplice on Sunday, May 23, 2021. I thought I'd do some forensics on the image to get more details.

Specifically, I see from the Mesa1 image that she logged on at 4:24pm and was done acquiring the image by 4:30pm, in and (presumably) out in under 7 minutes.

In this blogpost, I go into more detail about how to get that information.


The image

To download the Mesa1 image, you need a program that can access BitTorrent, such as the Brave web browser or a BitTorrent client like qBittorrent. Either click on the "magnet" link or copy/paste into the program you'll use to download. It takes a minute to gather all the "metadata" associated with the link, but it'll soon start the download:

What you get is file named EMSSERVER.E01. This is a container file that contains both the raw disk image as well as some forensics metadata, like the date it was collected, the forensics investigator, and so on. This container is in the well-known "EnCase Expert Witness" format. EnCase is a commercial product, but its container format is a quasi-standard in the industry.

Some freeware utilities you can use to open this container and view the disk include "FTK Imager", "Autopsy", and on the Linux command line, "ewf-tools".

However you access the E01 file, what you most want to look at is the Windows operating-system logs. These are located in the directory C:\Windows\system32\winevtx. The standard Windows "Event Viewer" application can load these log files to help you view them.

When inserting a USB drive to create the disk image, these event files will be updated and written to that disk before the image was taken. Thus, we can see in the event files all the events that happen right before the disk image happens.


Disk image acquisition

Here's what the event logs on the Mesa1 image tells us about the acquisition of the disk image itself.

The person taking the disk image logged in at 4:24:16pm, directly to the console (not remotely), on their second attempt after first typing an incorrect password. The account used was "emsadmin". Their NTLM password hash is 9e4ec70af42436e5f0abf0a99e908b7a. This is a "role-based" account rather than an individual's account, but I think Tina Peters is the person responsible for the "emsadmin" roll.

Then, at 4:26:10pm, they connected via USB a Western Digital  "easystore™" portable drive that holds 5-terabytes. This was mounted as the F: drive.

The program "Access Data FTK Imager 4.2.0.13" was run from the USB drive (F:\FTK Imager\FTK Imager.exe) in order to image the system. The image was taken around 4:30pm, local Mountain Time (10:30pm GMT).

It's impossible to say from this image what happened after it was taken. Presumably, they immediately hit "eject" on the drive, logged off, and disconnected the hard drive. Thus from beginning to end, it took about 7 minutes to take the image once they sat down at the computer.


Dominion Voting Systems

The disk image is that of a an "EMS Server" part of the Dominion Voting Suite. This is a server on an air-gapped network (not connected to any other network) within the count offices.

Most manuals for Colorado are online, though some bits and pieces are missing, and can be found in documents posted to other state's websites (though each state does things a little different, so such cross referencing can't be completely trusted).

The locked room with an air-gapped network  you see in the Mesa County office appears to look like the following, an "EMS Standard" configuration (EMS stands for Election Management System).


This small network is "air gapped", meaning there is no connection from this network to any other network in the building, nor out to the Internet. By looking at the logs from the Mesa1 image, we can see what this network looks like:
  • The EMS Server is named "EMSERVER" with IP address 192.168.100.10 and MAC address 44-A8-42-30-01-5D. The hard drive matches Dominion's specs: a 1-terabyte boot drive (C:) and a 2-terabyte data drive (D:) that is shared with the rest of the network as \\EMSERVER\NAS. This also acts as the network's DHCP and DNS server.
  • At least one network printer, model Dell E310dw.
  • Two EMS Workstations (EMSCLIENT01 and EMSCLIENT02). This is where users spend most of their time, before an election to create the ballots, and after all the ballots have been counted to construct the final tally.
  • Four ImageCast Central (ICC) (ICC01 - ICC04) scanners, for automatically scanning and tabulating ballots.
  • Two Adjudication Workstations (ADJCLIENT01 and ADJCLIENT03). These are used when the scanners reject ballots, such as when somebody does a write-in candidate, or marks two candidates. Humans need to get involved to make the final judgement on what the ballot actually says.

Note this isn't the machines you'd expect to see in a precinct when you vote (which would be "ballot marking devices" predominantly). These are the machines in the back office that count the votes and store the official results.


Conclusion

What we see here is that "system logs" can tell us a lot of interesting things about the system. There's good reason to retain them in the future.

On the other hand, they generally can't answer the most important question: whether the system was hacked and votes flipped.

Mike Lindell claims to have "Absolute Proof" that Chinese hackers flipped votes throughout the country, including Maricopa County. If so, this would've been the system that the Chinese hackers would've hacked. Yet, in the system image, there is no evidence of this. By this, I mean the Mesa1 image, the one from before the system logs were deleted (obviously, there would be nothing in the Mesa2 image).

This lack of hacking evidence in the logs isn't proof that it didn't happen, though. The fact is, the logs aren't comprehensive enough to record most hacks, and the hackers could've deleted the logs anyway. That's why system logs aren't considered "election records" and that laws don't mandate keeping them: they could have some utility, as I've shown above, but they really wouldn't show the things that we most want to know.






Sunday, October 31, 2021

Debunking: that Jones Alfa-Trump report

The Alfa-Trump conspiracy-theory has gotten a new life. Among the new things is a report done by Democrat operative Daniel Jones [*]. In this blogpost, I debunk that report.

If you'll recall, the conspiracy-theory comes from anomalous DNS traffic captured by cybersecurity researchers. In the summer of 2016, while Trump was denying involvement with Russian banks, the Alfa Bank in Russia was doing lookups on the name "mail1.trump-email.com". During this time,  additional lookups were also coming from two other organizations with suspicious ties to Trump, Spectrum Health and Heartland Payments.

This is certainly suspicious, but people have taken it further. They have crafted a conspiracy-theory to explain the anomaly, namely that these organizations were secretly connecting to a Trump server.

We know this explanation to be false. There is no Trump server, no real server at all, and no connections. Instead, the name was created and controlled by Cendyn. The server the name points to for transmitting bulk email and isn't really configured to accept connections. It's built for outgoing spam, not incoming connections. The Trump Org had no control over the name or the server. As Cendyn explains, the contract with the Trump Org ended in March 2016, after which they re-used the IP address for other marketing programs, but since they hadn't changed the DNS settings, this caused lookups of the DNS name.

This still doesn't answer why Alfa, Spectrum, Heartland, and nobody else were doing the lookups. That's still a question. But the answer isn't secret connections to a Trump server. The evidence is pretty solid on that point.

Sunday, October 24, 2021

Review: Dune (2021)

One of the most important classic sci-fi stories is the book "Dune" from Frank Herbert. It was recently made into a movie. I thought I'd write a quick review.

The summary is this: just read the book. It's a classic for a good reason, and you'll be missing a lot by not reading it.

But the movie Dune (2021) movie is very good. The most important thing to know is see it in IMAX. IMAX is this huge screen technology that partly wraps around the viewer, and accompanied by huge speakers that overwhelm you with sound. If you watch it in some other format, what was visually stunning becomes merely very pretty.