Friday, January 18, 2008

Hex.lore

If I ever get around to writing a computer book, one of the first would be about the lore of hexadecimal. We teach children mathematics by starting with addition and subtraction. The equivalent in hacking is hexadecimal (or simply "hex"). Hex is the starting point for most hacking.

The reason I'd want to write an entire book about it is that most people don't fully grok hex. Indeed, some very smart people demonstrate a lot of expertise in hacking without quite understanding hex. It's one of those things you can safely skip most of the time, but not all of the time. Among those at the very top of our industry, grokking hex is required. Some things can only be fully explained by analyzing a raw hex dump.

I think one problem is that hexadecimal is introduced as a "base-16 numbering system". This mathematical explanation is unsatisfying because hackers rarely add or subtract hex numbers. We may occasionally add '0x9' to '0x41' to get '0x4A', but this is rare. Instead, hackers care that '0x41' represents the ASCII letter 'A', the x86 instruction 'inc ecx', or the binary value 01000001.
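
To make that concrete, here is a small Python sketch that takes one byte and prints the readings listed above. The helper name `interpretations` is hypothetical, and it only decodes the one-byte 32-bit x86 'inc reg32' opcodes (0x40 to 0x47); a real tool would hand the byte to a full disassembler.

```python
# One byte, several meanings: a minimal sketch, not a real disassembler.
# In 32-bit x86, opcodes 0x40-0x47 are 'inc' on these registers, in order.
INC_REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def interpretations(byte: int) -> dict:
    """Return a few common readings of a single byte value."""
    x86 = f"inc {INC_REG32[byte - 0x40]}" if 0x40 <= byte <= 0x47 else None
    return {
        "hex": f"0x{byte:02X}",
        "ascii": chr(byte) if 0x20 <= byte < 0x7F else None,  # printable ASCII only
        "binary": f"{byte:08b}",
        "x86_32": x86,
    }

if __name__ == "__main__":
    for name, meaning in interpretations(0x41).items():
        print(f"{name:>7}: {meaning}")
    # The rare bit of actual arithmetic: 0x41 + 0x9 == 0x4A
    print(f"0x41 + 0x9 = 0x{0x41 + 0x9:02X}")
```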

Hackers are also interested in a deeper lore. Many intrusion detection systems trigger on a sequence of bytes having the value 0x41 because that is a common buffer filler in proof-of-concept exploits. A significant amount of hacker literature ends with a demonstration of a computer crashing because it tried to execute code at address 0x41414141, which meant that the hacker was able to redirect execution of computer code with the contents of a buffer. This gives the hex data "meaning" far beyond any mathematics.
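
Why does a buffer full of 'A's end up as the address 0x41414141? When four filler bytes overwrite a saved 32-bit return address, the CPU reads them back as one integer. A minimal sketch using Python's `struct` module, assuming little-endian byte order as on x86:

```python
import struct

# Four bytes of the classic proof-of-concept filler, "AAAA".
filler = b"A" * 4

# Read them as one little-endian unsigned 32-bit integer, the way a
# 32-bit x86 CPU would read an overwritten return address.
(address,) = struct.unpack("<I", filler)
print(f"0x{address:08X}")  # 0x41414141

# With identical bytes, byte order doesn't matter. A mixed pattern
# such as "ABCD" shows the little-endian reversal.
(mixed,) = struct.unpack("<I", b"ABCD")
print(f"0x{mixed:08X}")  # 0x44434241
```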

Another part of hex lore is seeing data structures. Data inside a computer has structure. In other words, some of the bytes hold the data itself, and other bytes tell us how to interpret the data. For example, text within a computer is sometimes "nul-terminated" and sometimes "length encoded". The 'nul' byte has the value zero, or 0x00. The string "ABC" represented as a nul-terminated string would look like:

41 42 43 00

A length-encoded string would look different. The first byte would have a value that would indicate how long the rest of the string is:

03 41 42 43
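
The two layouts can be sketched in a few lines of Python (3.8 or later, for `bytes.hex` with a separator). The function names are hypothetical; the point is that the nul-terminated decoder must scan for the 0x00 byte, while the length-encoded decoder trusts the first byte and should check it against the real buffer size.

```python
def encode_nul_terminated(text: str) -> bytes:
    # The data, then a 0x00 byte marking the end.
    return text.encode("ascii") + b"\x00"

def encode_length_prefixed(text: str) -> bytes:
    data = text.encode("ascii")
    if len(data) > 255:
        raise ValueError("a one-byte length prefix can describe at most 255 bytes")
    # A single length byte, then the data.
    return bytes([len(data)]) + data

def decode_nul_terminated(buf: bytes) -> str:
    end = buf.index(b"\x00")  # raises ValueError if there is no terminator
    return buf[:end].decode("ascii")

def decode_length_prefixed(buf: bytes) -> str:
    length = buf[0]
    # Never trust the length byte blindly; compare it to what we actually have.
    if length > len(buf) - 1:
        raise ValueError("length byte claims more data than the buffer holds")
    return buf[1:1 + length].decode("ascii")

if __name__ == "__main__":
    print(encode_nul_terminated("ABC").hex(" "))   # 41 42 43 00
    print(encode_length_prefixed("ABC").hex(" "))  # 03 41 42 43
```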

Back in 2006, Dave Maynor caused a bit of controversy by claiming that Apple had bugs in its WiFi drivers. The bugs were triggered when the SSID (the name of the access point) was longer than 96 bytes, or when the number of speeds (11 Mbps, 54 Mbps, etc.) was larger than 17. The controversy started when Apple's PR machine claimed that Dave Maynor hadn't found these bugs. Whether you believe Apple or Dave Maynor is largely determined by whether you've seen the hex dump of a WiFi packet. The WiFi standard says that an SSID should not be longer than 32 bytes, and that an access point can't have more than 17 possible speeds. However, WiFi "length encodes" these fields with a single byte, meaning they can be as long as 255 bytes. Therefore, if a hacker creates a packet with lengths longer than the code expects, they can cause a problem.

This issue is obvious to anybody who has looked at the packets in hex, but a mystery to everybody else. Hence, the controversy.
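
Here is a sketch of the check a parser needs, assuming the usual 802.11 layout of tagged elements (one ID byte, one length byte, then the value) and that element ID 0 is the SSID. This is not Apple's driver code, which I haven't seen; the function names are hypothetical.

```python
MAX_SSID_LEN = 32  # the limit the 802.11 standard places on an SSID

def parse_elements(body: bytes):
    """Yield (element_id, value) pairs from a run of tag-length-value elements."""
    offset = 0
    while offset + 2 <= len(body):
        elem_id, length = body[offset], body[offset + 1]
        value = body[offset + 2 : offset + 2 + length]
        if len(value) != length:
            raise ValueError("element truncated")
        yield elem_id, value
        offset += 2 + length

def get_ssid(body: bytes) -> bytes:
    for elem_id, value in parse_elements(body):
        if elem_id == 0:  # element ID 0 is the SSID
            # The single length byte allows up to 255, but the standard says 32.
            # Code that copies into a 32-byte buffer without this check overflows.
            if len(value) > MAX_SSID_LEN:
                raise ValueError(f"SSID length {len(value)} exceeds {MAX_SSID_LEN}")
            return value
    raise ValueError("no SSID element")
```

A well-formed body such as `00 04 74 65 73 74` yields the SSID `test`, while a crafted `00 C8` followed by 200 bytes is rejected instead of being copied.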

Grokking hex gives you psychic abilities. For example, one of my favorite computer games of all time is Diablo 2. I completely hacked that program to get an advantage over other players. There are two basic ways of hacking online games: (1) hack the packets and (2) hack the code. I started by looking at the packets. I noticed that they consisted of what looked like purely random bytes, with the occasional hex sequence FF FF FF. From this data, I immediately concluded that they were "compressed using Huffman encoding", and quickly found the tables involved within the code.
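
One common way to put a number on "looks purely random" is Shannon entropy per byte: compressed or encrypted data scores close to 8 bits, while plain text scores much lower. This is a heuristic I'm adding for illustration, not necessarily how the conclusion above was reached, and entropy alone can't tell Huffman coding apart from other compression or from encryption. The helper names are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Bits of entropy per byte: 0.0 for constant data, up to 8.0 for uniform data."""
    if not data:
        return 0.0
    n = len(data)
    # Sum of p * log2(1/p) over each distinct byte value.
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())

def find_marker(data: bytes, marker: bytes = b"\xff\xff\xff") -> list:
    """Return every offset where the marker sequence begins (overlaps included)."""
    offsets, start = [], data.find(marker)
    while start != -1:
        offsets.append(start)
        start = data.find(marker, start + 1)
    return offsets
```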

It would take a chapter of a book to explain the above conclusion - and I think that would be a good basis for a book. Another chapter could explain WiFi packets. Another chapter could dissect a hex dump of a virus. My favorite chapter would be a discussion of the "Witty" worm: everyone believes it was launched with a "hit" list, but that can be disproved by analyzing the hex in the slack area of the packet.

If anybody has similarly interesting tales of taking raw hex information and turning it into useful information, I'd love to hear about them. Please send me mail, or add a comment to this blog.

9 comments:

Chris Rohlf said...

A long time ago my first introduction to hex and computer security came in the form of a packet sniffer. The sniffer I was using at the time did not have decoding support for many protocols. Even though there were better sniffers available, I looked up the RFC for the particular protocol I was looking at and manually parsed the packet. It was pretty neat to see that seemingly random data come to life as a real, valid structure with meaning. Recognizing valid data/structures without that RFC or any prior knowledge comes with experience. Those first lessons have been a tremendous help when examining undocumented protocols.

kowsik said...

0x0806 and 0x0800 are two good magic numbers if you are debugging network packets in a sniffer or IDS/IPS, or working on the network stack of a kernel. The first is the Ethernet type for ARP and the second is for IP. Finally, finding 0x45 somewhere in the packet usually tells you where the IP header begins: 4 is the IP version and 5 is the header length in 32-bit words (20 bytes, with no options).
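
Those magic numbers can be checked mechanically. A minimal sketch, assuming an untagged Ethernet II frame (VLAN tags and other EtherTypes are not handled); `describe_frame` is a hypothetical name:

```python
import struct

ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP"}

def describe_frame(frame: bytes) -> str:
    if len(frame) < 14:
        raise ValueError("too short for an Ethernet header")
    # Bytes 12-13 of an Ethernet II header hold the EtherType, big-endian.
    (ethertype,) = struct.unpack("!H", frame[12:14])
    kind = ETHERTYPES.get(ethertype, f"0x{ethertype:04X}")
    if ethertype != 0x0800 or len(frame) < 15:
        return kind
    # First IP byte: high nibble is the version, low nibble the header
    # length in 32-bit words, so 0x45 means IPv4 with a 20-byte header.
    first = frame[14]
    version, ihl = first >> 4, first & 0x0F
    return f"{kind}: version {version}, header {ihl * 4} bytes"
```

For example, `describe_frame(bytes(12) + bytes.fromhex("080045") + bytes(19))` returns `"IPv4: version 4, header 20 bytes"`, which is the `08 00 45` pattern to look for in a dump.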

Robert Graham said...

Chris: I divide the world into two types of people: those who have manually parsed a hexdump using an RFC, and those who haven't :-).

As for "admin"'s comment above, did you notice in my post that I link to a hexdump of a Witty Worm packet? Did you see the 08 00 45 in that packet? Discuss.

awing said...

Take, for example, embedded programming on Microchip PIC microcontrollers (and other CPUs). Say PORTA is a collection of 8 pins, one per bit. If PORTA=0x0C, you get 00001100 on the actual pins of the processor as logic (+V/GND) levels.

Also, any single hex digit maps to 4 bits. Two hex digits are a byte, and so on. For me, thinking in hex is just thinking in binary, in 4-bit blocks.
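
The nibble-per-digit mapping is easy to demonstrate. A small sketch with hypothetical helper names; it assumes bit 0 corresponds to the lowest-numbered pin (RA0), which is the usual convention but worth checking against the datasheet for a given part:

```python
def hex_to_nibbles(hex_str: str) -> str:
    """Expand each hex digit into its 4-bit binary pattern."""
    return " ".join(f"{int(digit, 16):04b}" for digit in hex_str)

def high_pins(value: int, width: int = 8) -> list:
    """Bit positions (assumed to equal pin numbers) that are set to 1."""
    return [bit for bit in range(width) if value & (1 << bit)]

if __name__ == "__main__":
    print(hex_to_nibbles("0C"))  # 0000 1100
    print(high_pins(0x0C))       # [2, 3]
```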

Jeff said...

I would really encourage you to write such a book. Interpreting hex is one of those black arts that just isn't taught anymore, and is somewhat difficult to learn on your own. I'll admit it's certainly one of my weak areas... Sure, I can figure out how to write common tcpdump filters for packet types and such, but usually when I see a long string of hex, I'm in for a lot of time with documents, pen/paper, and a confused look on my face. I'd love for there to be a reference demystifying this somewhat.

Jolly said...

I'd personally love such a book.

ae said...

Great post; great idea. Hex *is* some strange kind of magic that's given me the power to amaze and enthrall otherwise smarter people than myself. I'll be releasing a whitepaper tangentially related to this magic in a few months, and presenting on it in April (at RSA no less, lol).

I first learned the beauty of hex when debugging horribly written and deployed webapps at an ecommerce venture... I still remember the magic of seeing that our software was trying to connect to table names across the wire (DNS/WINS/NBB locate), as opposed to trying to connect to the DB instance & host name as it should have. Ah, and the ensuing spray of NB broadcasts.

I got me a pro grade sniffer and sat around on the weekends for months learning about the traffic and the magic of hex, and soon after became enthralled with snort for a year or so. (the place I was at was an ISS shop and I just hated the RS product, since it FP'd all the time and back in those days you couldn't get under the hood and figure out what it was doing.)

My next hex evolution came when I noticed people gluing all sorts of crap together in their monolithic webapp projects, and canonicalizing all sorts of crazy things in crazy ways (see any large European PHP portal project)... and the magic of hex breathed new life into my pen testing.

Today I work with WhiteHat Security's Sentinel platform, which performs a significant number of encoded and layered-encoding attacks, and flags interesting/suspicious canonicalization so a human can explore it for exploitable transcoding conditions. Just this morning we found some very weird malformed UTF-7 canonicalization to US Latin ASCII, allowing us to punch certain control characters right past the UTF-7-friendly input validation.
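
For readers who haven't seen UTF-7 in hex: it encodes '<' (U+003C) as base64 of its UTF-16 code unit, wrapped in '+' and '-', so the raw 0x3C byte never appears on the wire. This is the textbook example, not the specific bug described above, and `naive_filter_allows` is a hypothetical validator:

```python
# '<' written in UTF-7: base64 of the UTF-16 code unit 0x003C.
utf7_bytes = b"+ADw-"
decoded = utf7_bytes.decode("utf-7")

print(utf7_bytes.hex(" "))          # 2b 41 44 77 2d  (no 0x3C byte anywhere)
print(decoded, hex(ord(decoded)))   # < 0x3c

def naive_filter_allows(raw: bytes) -> bool:
    """Hypothetical validator that only looks for the raw '<' byte."""
    return b"<" not in raw

payload = b"+ADw-script+AD4-"
# The filter passes it, yet decoding as UTF-7 yields a script tag.
print(naive_filter_allows(payload), payload.decode("utf-7"))  # True <script>
```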

Great stuff man. Cheers. -ae

John said...

What resources would you recommend for someone who's interested in learning hex? I probably don't have enough time to wait for your book.