Thursday, January 29, 2009

Coding Heresy: shellcode in software development

One of the odd things about coding is that you are taught to follow some sort of moral code. Your professors are high priests that try to teach you right from wrong. For example, “global variables” are a well-known “evil”, and it’s morally wrong to use them.

Another heresy is using “shellcode” in the course of normal software development. Shellcode is the opaque array of bytes that hackers include in the buffers they overflow, and which gives them a remote command shell on the victim machine. However, you can also use the idea to execute bits of assembly language code in a portable manner within software projects.

Today, you have the same x86 processor underneath Windows, Linux, and Macintosh. However, there is no easy way to write assembly language for all these systems. The syntax ‘gcc’ uses to assemble code (AT&T syntax) is different from the Microsoft syntax (Intel syntax), and the operand order is reversed. What you write as “mov %ebx, %eax” under gcc becomes “mov eax, ebx” in the Microsoft syntax.

The obvious solution to this problem is to assemble the code only once, then dump the resulting bytes into your project.

I’m currently writing a password cracker. I want to run the password cracking code on all the machines in my home, and I have Linux, Windows, and Mac OS X systems. I also want it to use SSE instructions, which are 5 times faster than normal code. SSE code must be programmed in assembly language.

To write the assembly language functions, I use Microsoft’s assembler. I don’t use the resulting functions, but instead dump their raw bytes. I then use those raw bytes in the code.
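The post does not say how the bytes get dumped. One possible sketch, assuming you already have a pointer to the assembled routine and know its length, is a small formatter that turns a byte buffer into a C array definition. The helper name `format_c_array` and its parameters are hypothetical, not part of the original project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the original post): format `len` bytes as
 * a C array definition called `name`, writing into `dst`, which holds `cap`
 * characters. Returns the number of characters written (excluding the
 * terminating NUL), or -1 if `dst` is too small.
 */
int format_c_array(char *dst, size_t cap, const char *name,
                   const unsigned char *bytes, size_t len)
{
    size_t used;
    size_t i;
    int n;

    assert(dst != NULL && name != NULL);

    n = snprintf(dst, cap, "unsigned char %s[] = {", name);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    used = (size_t)n;

    for (i = 0; i < len; i++) {
        /* Start a new line before the first byte and after every 8 bytes. */
        const char *sep = (i == 0) ? "\n" : (i % 8 == 0) ? ",\n" : ", ";

        n = snprintf(dst + used, cap - used, "%s0x%02X", sep,
                     (unsigned)bytes[i]);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(dst + used, cap - used, "\n};\n");
    if (n < 0 || (size_t)n >= cap - used)
        return -1;
    used += (size_t)n;

    return (int)used;
}
```

In a Microsoft-only build, you could point this at the assembled function and print the result, then paste the output into the portable source file. The hand-written comments next to each instruction still have to be added manually.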

This is shown below with a simple function that returns ‘1’ if SSE2 is supported by the processor, and ‘0’ if it is not. It uses the assembly language instruction "cpuid" and tests bit 26 of the feature flags returned in EDX.
unsigned cpu_supports_sse2()
{
    static unsigned char cpu_supports_sse2_shellcode[] = {
        0x53,                       /*push ebx*/
        0x33, 0xC0,                 /*xor eax,eax*/
        0x0F, 0xA2,                 /*cpuid*/
        0x83, 0xF8, 0x01,           /*cmp eax,1*/
        0x73, 0x04,                 /*jae continue*/
        0x33, 0xC0,                 /*xor eax,eax*/
        0xEB, 0x0F,                 /*jmp end*/
        /*continue:*/
        0xB8, 0x01,0x00,0x00,0x00,  /*mov eax,1*/
        0x0F, 0xA2,                 /*cpuid*/
        0x8B, 0xC2,                 /*mov eax,edx*/
        0xC1, 0xE8, 0x1A,           /*shr eax,1Ah*/
        0x83, 0xE0, 0x01,           /*and eax,1*/
        /*end:*/
        0x5B,                       /*pop ebx*/
        0xC3                        /*ret*/
    };
    return ((unsigned (*)())cpu_supports_sse2_shellcode)();
}


As you can see, this code will compile and run the same whether you build it with gcc on Mac OS X and Linux or with Microsoft’s compilers on Windows. As far as the compilers can figure out, it’s just an array of bytes. (I include the original assembly language as comments, but these are ignored by the compiler.)

Elsewhere in my code, I do the same thing for the SSE version of the password cracking routine. In one file, I call the following function:
crypto_ntmd4_sse(data_block);

In another file, I define the symbol as an array of bytes:
unsigned char crypto_ntmd4_sse[] = {
0x55,0x8b,0x6c,0x24, 0x08,0x53,0x56,0x57,
0x66,0x0f,0x6f,0x45, 0x90,0x8b,0x45,0x90,
0x66,0x0f,0x6f,0x4d, 0xa0,0x8b,0x5d,0xa0,
...


This function is over 2 kilobytes long, so I didn’t disassemble it by hand like I did the last one, but rest assured, those bytes are x86 code.

The heresy works because C is an “unmanaged” language. It knows the difference between “code” and “data”, but you can easily trick it into treating “data” as “code”. It will happily jump to your data.

Code auditing tools should know the difference. They should be able to detect that something heretical is going on and put up warnings. Likewise, security tools might notice that the program is executing code within the “data” segment.

As a result of this heresy, I’ve got a nice little program that compiles cleanly on all the systems, without any hassle (such as the hell that is Cygwin).

What happens if I want to change the assembly language routine? The first answer is “I usually don’t”. That’s what makes assembly language different from other source. You frequently edit C software, but you avoid touching assembly language routines. If you want to make a change, you usually end up rewriting the routine from scratch. In any event, I still have the assembly source; it’s just conditionally compiled into an unused function. If I want to change it, I’ll just manually copy the bytes out again. Here is the cpuid function source:
#ifdef _MSC_VER
unsigned __declspec(naked) msvc_cpu_supports_sse2()
{
    _asm {
        push ebx
        xor eax, eax
        cpuid
        cmp eax, 1
        jae xcontinue
        xor eax, eax
        jmp end
    xcontinue:
        mov eax, 1
        cpuid
        mov eax, edx
        shr eax,1Ah
        and eax,1
    end:
        pop ebx
        ret
    }
}
#endif


The last question is how this works with non-x86 processors. The answer to that is easy. For the ‘cpuid’ function that tests for SSE, you replace it with a version that simply does a “return 0;”. For the actual cracking function, you simply use the non-SSE version written in normal C. Thus, my program runs just fine on a PowerPC version of the Macintosh, but a lot slower. I’ll have to figure out an AltiVec version eventually, probably sometime after I get the GPU version up and running.

4 comments:

Unknown said...

Very interesting. I hope that last part about a GPU version wasn't a joke, that'd make for a neat blog post as well! :-D

Sam Mason said...

If you use an operating system that sets the NX feature of modern processors this isn't going to work nearly as easily. You'll end up with lots of untidy calls to mmap(PROT_EXEC) (or the equivalent in Windows) on your arrays.

@nbrito said...

Awesome post about programs with embedded shellcodes.

I just want to point out that there is another approach that could be used to take advantage of shellcode instead of the "de facto" 'char shellcode[] ='.

Someone could use an external binary file, which is already compiled and has all the assembly instructions in binary format.

So even doing a binary audit the program's user would not be able to identify the malicious shellcode.

The file could be compiled and released under '.DLL' or '.VXD' extensions, which are not as suspicious as '.BIN', '.EXE' or even '.DAT'.

So imagine the 'PROGRAM.EXE' which requires a 'GOODSTUFF.[DLL|VXD]'.

Any time 'PROGRAM.EXE' runs (and at this point I don't need to say that we could do such a bad thing under any other OS), it would 'fopen("GOODSTUFF.DLL", "r")' and then read all the contents to perform bad things.

We have a good example with LDS Research Group's 'WASM.DAT'.

This approach will be just as portable as the simplest mode, 'char shellcode[] =', would be... And one will release it in binary format...

My 2 cents.

-nb

Capt. Awesome said...

uhm.... gcc -masm??? It's documented for use with -S for outputting asm, but it works just fine for compilation as well. I think part of the reason you wouldn't have found this when Googling around, is that the two syntax (syntaxes??) are called Intel and AT&T not Microsoft and GCC.