If I ever get around to writing a computer book, one of the first I would write would be about the lore of hexadecimal. We teach children the basics of mathematics by starting out with addition and subtraction. The equivalent in hacking is hexadecimal (or simply, "hex"). Hex is the starting point for most hacking.
The reason I'd want to write an entire book about it is that most people don't fully grok hex. Indeed, some very smart people can demonstrate a lot of expertise in hacking without quite understanding hex. It's one of those things that you can safely skip most of the time, but you can't quite skip it all the time. Among those at the very top of our industry, grokking hex is required. There are some things that can only be fully explained by analyzing a raw hex dump.
I think one problem people have is that hexadecimal is introduced to people as a "base-16 numbering system". This mathematical explanation is unsatisfying because hackers rarely add/subtract hex numbers. We may occasionally add '0x9' to '0x41' to get '0x4A', but this is rare. Instead, hackers are mostly concerned that '0x41' represents the ASCII letter 'A', the x86 instruction 'inc ecx', or the binary value 01000001.
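To make those different readings of one byte concrete, here is a small Python sketch. It only illustrates the arithmetic, ASCII, and binary views mentioned above; the variable names are mine, and the x86 reading is left out because it would need a disassembler.

```python
value = 0x41

# Arithmetic view: the occasional bit of hex addition
total = value + 0x9
print(hex(total))             # 0x4a

# ASCII view: the same byte is the letter 'A'
print(chr(value))             # A

# Binary view: the same byte as eight bits
print(format(value, "08b"))   # 01000001
```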
Hackers are also interested in a deeper lore. Many intrusion detection systems trigger on a sequence of bytes having the value 0x41 because that is a common buffer filler in proof-of-concept exploits. A significant amount of hacker literature ends with a demonstration of a computer crashing because it tried to execute code at location 0x41414141 - which meant that the hacker was able to redirect execution of computer code with the contents of a buffer. This gives the hex data a "meaning" far beyond any mathematics.
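As a rough illustration of both ideas, the sketch below measures the longest run of 0x41 bytes in a buffer and shows how four 'A' bytes read as an address become 0x41414141. The function name, the sample payload, and the threshold of 16 are my own assumptions for the example; this is not how any particular intrusion detection system is implemented.

```python
import struct

def longest_run(data: bytes, value: int = 0x41) -> int:
    """Return the length of the longest run of `value` in `data`."""
    longest = current = 0
    for b in data:
        current = current + 1 if b == value else 0
        longest = max(longest, current)
    return longest

# Hypothetical payload: a request padded with 64 'A' bytes
payload = b"GET /" + b"A" * 64 + b" HTTP/1.0\r\n"

RUN_THRESHOLD = 16  # assumed cutoff, chosen only for this example
suspicious = longest_run(payload) >= RUN_THRESHOLD
print(longest_run(payload), suspicious)   # 64 True

# Four filler bytes interpreted as a 32-bit little-endian address
(address,) = struct.unpack("<I", b"AAAA")
print(hex(address))                       # 0x41414141
```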
Another part of hex lore is seeing data structures. Data inside a computer has structure. In other words, some of the bytes hold the data itself, and other bytes tell us how to interpret the data. For example, text within a computer is sometimes "nul-terminated" and sometimes "length encoded". The 'nul' byte has the value of zero, or 0x00. The string "ABC" represented as a nul-terminated string would look like:
41 42 43 00
A length-encoded string would look different. The first byte would have a value that would indicate how long the rest of the string is:
03 41 42 43
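The two layouts above can be reproduced with a few lines of Python. This is a minimal sketch assuming ASCII text and a one-byte length field; the helper names are made up for the example.

```python
def encode_nul_terminated(text: str) -> bytes:
    # Data bytes followed by a single 0x00 terminator
    return text.encode("ascii") + b"\x00"

def encode_length_prefixed(text: str) -> bytes:
    # One length byte, then the data bytes
    raw = text.encode("ascii")
    if len(raw) > 255:
        raise ValueError("a one-byte length cannot describe more than 255 bytes")
    return bytes([len(raw)]) + raw

def decode_nul_terminated(data: bytes) -> str:
    # Read up to, but not including, the first 0x00 byte
    end = data.index(b"\x00")
    return data[:end].decode("ascii")

def decode_length_prefixed(data: bytes) -> str:
    # The first byte says how many data bytes follow
    length = data[0]
    return data[1:1 + length].decode("ascii")

print(encode_nul_terminated("ABC").hex(" "))    # 41 42 43 00
print(encode_length_prefixed("ABC").hex(" "))   # 03 41 42 43
```

Note that bytes.hex() with a separator argument needs Python 3.8 or later.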
Back in 2006, Dave Maynor caused a bit of controversy by claiming that Apple had bugs in their WiFi drivers. The bugs were triggered when the SSID (name of the access point) was longer than 96 bytes, or the number of speeds (11 Mbps, 54 Mbps, etc.) was larger than 17. The controversy started when Apple's PR machine claimed that Dave Maynor hadn't found these bugs. Whether you believe Apple or Dave Maynor is largely determined by whether you've seen the hex dump of a WiFi packet. The WiFi standard says that an SSID should not be longer than 32 bytes, and that there aren't more than 17 possible speeds that an access point can have. However, WiFi "length encodes" these fields with a single byte, meaning they can be as long as 255 bytes. Therefore, if a hacker creates a packet with lengths longer than the code expects, they can cause a problem.
This issue is obvious to anybody who has looked at the packets in hex, but a mystery to everybody else. Hence, the controversy.
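Here is a sketch of what that looks like from the parser's side. It assumes the usual 802.11 layout of one element ID byte (0 for the SSID) followed by one length byte and then the value; that detail and the function name are my additions, not something taken from Apple's driver. The point is only that the length byte can claim up to 255 even though the standard caps the SSID at 32, so the code has to check.

```python
MAX_SSID_LEN = 32   # limit from the WiFi standard
SSID_ELEMENT_ID = 0  # assumed element ID for the SSID field

def parse_ssid_element(data: bytes) -> bytes:
    """Parse an ID/length/value element, rejecting oversized SSIDs."""
    if len(data) < 2:
        raise ValueError("truncated element header")
    element_id, length = data[0], data[1]
    if element_id != SSID_ELEMENT_ID:
        raise ValueError("not an SSID element")
    # The length byte can hold 0..255; trust the standard, not the packet
    if length > MAX_SSID_LEN:
        raise ValueError(f"SSID length {length} exceeds {MAX_SSID_LEN}")
    if len(data) < 2 + length:
        raise ValueError("element shorter than its length byte claims")
    return data[2:2 + length]

# A well-formed element: 00 04 'h' 'o' 'm' 'e'
print(parse_ssid_element(bytes([0, 4]) + b"home"))   # b'home'

# A hostile element claiming 200 bytes of SSID
try:
    parse_ssid_element(bytes([0, 200]) + b"A" * 200)
except ValueError as err:
    print("rejected:", err)
```

Code that skips the length check and copies the value into a 32-byte buffer is the kind of problem described above.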
Grokking hex gives you psychic abilities. For example, one of my favorite computer games of all time is Diablo 2. I completely hacked that program to get an advantage over other players. There are two basic ways of hacking online games: (1) hack the packets and (2) hack the code. I started by looking at the packets. I noticed that they consisted of purely random bytes with the occasional hex sequence of FF FF FF. From this data, I immediately concluded that they were "compressed using Huffman encoding", and quickly found the tables involved within the code.
It would take a chapter of a book to explain the above conclusion - and I think that would be a good basis for a book. Another chapter could explain WiFi packets. Another chapter could dissect a hex dump of a virus. My favorite chapter would be a discussion of the "Witty" worm: everyone believes it was launched with a "hit" list, but that can be disproved by analyzing the hex in the slack area of the packet.
If anybody has similarly interesting tales of taking raw hex information and turning it into useful information, I'd love to hear about them. Please send me mail, or add a comment to this blog.