Update: Peter was kind enough to whip up some legit web 2.0-ish graphing with some IDAPython to visualize the read() function referenced in this blog post. Check it out here (it's draggable, and stuff).
Quite often at the ZDI we receive submissions that go something like this:
"When this fuzzed file is loaded into process X it causes an access violation. Here is the assembly at the point of the crash and a call stack:
0:030> g
(530.758): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=020108d8 ebx=015d0178 ecx=02012bf0 edx=41414141 esi=020108d0 edi=015d0000
eip=7c82a99f esp=015afd20 ebp=015afdf0 iopl=0 nv up ei ng nz na pe cy
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010287
ntdll!RtlFreeHeap+0x4bd:
7c82a99f 8902 mov dword ptr [edx],eax ds:0023:41414141=????????
0:030> kv
ChildEBP RetAddr Args to Child
015afdf0 78134c39 015d0000 00000000 020108d8 ntdll!RtlFreeHeap+0x4bd
015afe3c 0042f18c 020108d8 00000000 00000000 MSVCR90!free+0xcd
015aff58 009737e6 01c663c0 01bb2380 020108d8 x+0x2f18c
015aff78 781329bb 01c663c0 8113a173 00000000 x+0x846fa
015affb0 78132a47 00000000 77e64829 015d4c88 MSVCR90!_callthreadstartex+0x1b
015affb8 77e64829 015d4c88 00000000 00000000 MSVCR90!_threadstartex+0x66
015affec 00000000 781329e1 015d4c88 00000000 kernel32!BaseThreadStart+0x34

Please make an offer on this information, !exploitable says its exploitable. I do not have the original non-fuzzed file anymore due to disk space problems."
Now, this is obviously heap corruption, as the backtrace shows us that a heap chunk's metadata was probably corrupted by some prior operation and is being used in a free or coalesce. What we are seeing are the effects of the corruption... and unfortunately this doesn't give us much information that will help us locate the root cause of the bug. Trying to debug from this point is what I usually refer to as "bottom-up" reversing and it's usually a bit trickier than reversing "top-down"--that is, reversing from the point of user input to the problematic code.
The process of reversing from user input can be a tedious one and it can be sped up by making use of hardware breakpoints. The idea is that if we find where our user data is first read into the process, we can set a memory breakpoint on it and find the first operation that acts upon it. Now, after you've done this on hundreds of bugs it might occur to you that this can be automated. Indeed, we can automate a lot of this using a debugger, however we'd have to use software breakpoints (int 3) and those require a context switch which can be inefficient.
Summary
In this blog post I will walk through an alternate way to perform this debugging process that can be abstracted to solve a bunch of different problems. Additionally, the example code we'll use here requires only Python and its built-in ctypes module.
First off, let's start with an overview of the technique we're going to code the functionality for. The following example will be automating the discovery of the point at which a program reads in user data using MSVCR90.dll!read. This just so happens to be the way that Adobe Shockwave reads in DIR files via Internet Explorer, so we'll use that as an example.
So, first off... we need to disassemble MSVCR90.dll and check out its read() function. Here's a screenshot of it:
Now, we'd like to hook this function at two points. One to catch the arguments that were passed to the function, and the other to search the data that was read in for the value we want to find.
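Hook Point #1
When we hook, we are going to need to clobber at least 5 bytes worth of instructions. This is because what we'll effectively be doing is patching in a jump (5 bytes: the 0xE9 opcode plus a 4-byte relative offset) that points to the hook code we want to run. Since we can only lift whole instructions, the patched region may end up longer than 5 bytes, and any leftover bytes get padded with NOPs.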
Here is an ideal location to hook:
.text:78586EB5 8B C8                 mov ecx, eax
.text:78586EB7 C1 F9 05              sar ecx, 5
.text:78586EBA 8D 1C 8D A0 D6 5B 78  lea ebx, ___pioinfo[ecx*4]
We choose this location for several reasons. Firstly, it's near the function prologue and the arguments that were pushed to read should be easily retrieved from the stack. Secondly, within 5 bytes from 0x78586EB5 there are no jump instructions. If we were to lift this code out and it had a jump instruction, it would be a near jump. Thus if we relocated it, we'd have to promote the branch to a jmp in order for it to disassemble properly (and that's out of scope for this post).
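To make the "whole instructions only" rule concrete, here is a minimal sketch (written for Python 3) that computes how many bytes must be lifted to make room for a 5-byte jump. The helper name bytes_to_lift is hypothetical and the instruction lengths are read off the disassembly above; nothing here decodes instructions for you.

```python
def bytes_to_lift(instruction_lengths, jump_size=5):
    """Return the number of bytes covered by the fewest whole instructions
    that leave room for a jump of jump_size bytes."""
    total = 0
    for length in instruction_lengths:
        if total >= jump_size:
            break
        total += length
    if total < jump_size:
        raise ValueError("not enough instruction bytes to hold the jump")
    return total

# Lengths of the three instructions at 0x78586EB5: mov (2), sar (3), lea (7)
minimum = bytes_to_lift([2, 3, 7])
print(minimum)                # smallest region that can be lifted
print(2 + 3 + 7 - 5)          # NOP padding needed if all 12 bytes are lifted
```

The first two instructions already add up to exactly 5 bytes, so that is the minimum. Lifting all three instructions (12 bytes) and padding the remainder with NOPs, as the code later in this post does, satisfies the same rule.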
Hook Point #2
The second ideal location to hook is the following:
.text:78586F35 C7 45 FC FE FF FF FF mov [ebp+ms_exc.disabled], 0FFFFFFFEh
This location is ideal as it is near the end of the function and thus the user data would have been read into the destination buffer that was pushed to read() by this point. So, what we'll end up doing is patching in another jump here that will jump to our code that will search that buffer for the value we care about.
Side-note: We don't actually need 2 hook points to accomplish this task, but I'm showing how to do 2 so the ideas can be abstracted for other purposes...
Overview of Injection Plan
Here's the plan... we are going to patch those code locations with jumps. But, jumps to what?
Basically, we're going to set up an "arena" in the heap in which to stuff our custom code to run as well as the code we "lifted" from MSVCR90!read (we want our code to run, then we want to run the original code, and then return execution back to MSVCR90). So, here's what the memory arena will look like:
Offset  Contains
0x00    our code to grab read()'s arguments off the stack followed by a jump to offset 0x60
0x20    our code to search the destination buffer for our value followed by a jump to offset 0x80
0x60    the original code from MSVCR90!read (0x78586EB5) that we are "lifting" to here
0x80    the original code from MSVCR90!read (0x78586F35) that we are "lifting" to here
The offsets were chosen arbitrarily (I just ensured they were large enough to contain the code we're going to put there).
So, the first thing we need to do is write the code that we want to run at the first hook point. In other words, the code that will grab the arguments off the stack (specifically the destination buffer pointer and the size passed to read). Here's what I cobbled together:
[BITS 32]
; save registers we'll use
push ecx
push esi
mov ecx, [esp+0x10]    ; grab the size passed to read
mov esi, [esp+0x0c]    ; grab the destination pointer
mov [0x41414141], ecx  ; write it to memory at 0x41414141
mov [0x42424242], esi  ; write it to memory at 0x42424242
pop esi
pop ecx
Now, you'll notice I used two static addresses there, 0x41414141 and 0x42424242. I will replace those addresses with Python when we inject this code.
At this point we need to assemble that into its opcodes. We'll use nasm for that task:
[deft@host v90]$ nasm -f bin -o grab_args grab_args.asm
[deft@host v90]$ xxd grab_args
0000000: 5156 8b4c 2410 8b74 240c 890d 4141 4141  QV.L$..t$...AAAA
0000010: 8935 4242 4242 5e59                      .5BBBB^Y
[deft@host v90]$
In Python-speak, this will be:
args_hook = "\x51\x56\x8b\x4c\x24\x10\x8b\x74\x24\x0c\x89\x0d"
args_hook += saved_size + "\x89\x35" + saved_dst + "\x5e\x59"
We obviously don't want to write those values to those static addresses (0x41414141 and 0x42424242), so what we'll end up doing is allocating some memory to hold them. We'll get to that later, but you'll note we have two variables, saved_dst and saved_size.
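As a rough illustration of that substitution, the sketch below (Python 3, so the opcodes are bytes) swaps the two placeholders in the assembled blob for little-endian addresses. The function name patch_placeholder is hypothetical, and the two slot addresses are simply the ones that show up in the WinDBG session later in this post; yours will differ.

```python
import struct

# grab_args as assembled, still containing the 0x41414141 / 0x42424242 placeholders
template = bytes.fromhex(
    "5156"            # push ecx / push esi
    "8b4c2410"        # mov ecx, [esp+0x10]
    "8b74240c"        # mov esi, [esp+0x0c]
    "890d41414141"    # mov [0x41414141], ecx
    "893542424242"    # mov [0x42424242], esi
    "5e59"            # pop esi / pop ecx
)

def patch_placeholder(code, placeholder, address):
    """Replace one 4-byte placeholder with a little-endian 32-bit address."""
    old = struct.pack("<L", placeholder)
    if code.count(old) != 1:
        raise ValueError("placeholder must appear exactly once")
    return code.replace(old, struct.pack("<L", address))

# Example addresses only (assumption: taken from the later debugger output)
size_slot = 0x05C20000
dst_slot = 0x05C20004

args_hook = patch_placeholder(template, 0x41414141, size_slot)
args_hook = patch_placeholder(args_hook, 0x42424242, dst_slot)
print(args_hook.hex())
```

Building the string by concatenation, as the post does, gives the same bytes; patching a template just makes it harder to lose track of which placeholder is which.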
The second hook is a bit more complicated because it will actually be searching for a 4-byte value in the memory that read() wrote user data to. Here it is:
[BITS 32]
; save registers+flags
pushad
pushfd
; ecx = the size passed to read (retrieved from our first hook)
mov ecx, 0x61616161
; esi = the address of the buffer that
; read wrote to (as retrieved by our first hook)
mov esi, 0x62626262
; divide for 4-byte search
shr ecx, 2
; eax = the value we want to find
; this will be changed in our python
mov eax, 0x41414141
TEXT
.loop
    cmp [esi], eax
    jnz .increment
    jz .success
.increment
    lea esi, [esi+4]
    dec ecx
    jz .fail
    jmp .loop
.success
    ; on success, throw an int3 to the debugger
    int3
    ; and esi will point to the value in memory
    jmp .exit
.fail
    ; explicitness ;)
    jmp .exit
.exit
    popfd
    popad
And here's what it looks like assembled:
[deft@host v90]$ nasm -f bin -o search search.asm
[deft@host v90]$ xxd search
0000000: 609c b961 6161 61be 6262 6262 c1e9 02b8  `..aaaa.bbbb....
0000010: 4141 4141 3906 7502 7408 8d76 0449 7408  AAAA9.u.t..v.It.
0000020: ebf2 cce9 0500 0000 e900 0000 009d 61    ..............a
[deft@host v90]$
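...and in Python. Note that the Python version loads ecx and esi from the saved memory locations (opcodes 8b 0d and 8b 35, i.e. mov reg, [addr]) rather than using the immediate moves (b9 and be) assembled above, since the size and destination pointer are only known at run time. The "A"*4 and "B"*4 placeholders stand in for those two addresses:

search_hook = "\x60\x9c\x8b\x0d" + "A"*4 + "\x8b\x35" + "B"*4 + "\xc1\xe9\x02\xb8" + needle
search_hook += "\x39\x06\x75\x02\x74\x08\x8d\x76\x04\x49\x74\x08\xeb\xf2\xcc\xe9\x05\x00"
search_hook += "\x00\x00\xe9\x00\x00\x00\x00\x9d\x61"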
where needle is the 4 byte value we are looking for.
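If the assembly is hard to follow, this small Python 3 model mirrors what the search loop does: it compares one dword at a time and advances 4 bytes per step. It is only a sketch of the logic (the name find_needle is made up), and the early return for sizes under 4 bytes is my own guard, not something the assembly does.

```python
import struct

def find_needle(buf, size, needle):
    """Model of the search hook: return the offset of the first 4-byte-aligned
    match of needle within the first `size` bytes of buf, or None."""
    count = size >> 2                          # shr ecx, 2
    if count == 0:
        return None                            # guard added here; the asm has no such check
    want = struct.unpack("<L", needle)[0]      # mov eax, <needle>
    for i in range(count):                     # dec ecx / jz .fail
        offset = i * 4                         # lea esi, [esi+4]
        if struct.unpack_from("<L", buf, offset)[0] == want:
            return offset                      # int3 fires here, esi = buf + offset
    return None

aligned = b"\x00" * 8 + b"RIFX" + b"\x00" * 4
unaligned = b"\x00" + b"RIFX" + b"\x00" * 3
print(find_needle(aligned, len(aligned), b"RIFX"))
print(find_needle(unaligned, len(unaligned), b"RIFX"))
```

One consequence worth knowing: because the pointer moves in 4-byte steps from the start of the buffer, a needle that sits at an unaligned offset is never seen. For a file magic at the very start of a read that is fine.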
Injection with Python
In order to accomplish this code injection fu, we're going to need the ability to allocate memory in a remote process as well as write to its memory space (and change memory permissions where needed).
We are going to use Python's ctypes module, which allows us to load DLLs and call their functions. The module we're going to need is kernel32.dll. Specifically, we're going to use the following functions:
- OpenProcess
- VirtualAllocEx
- VirtualProtectEx
- WriteProcessMemory
To utilize the allocation routines and WriteProcessMemory in a remote process, we're going to need a handle to it. So, we need to start off with a call to OpenProcess. Here's the relevant Python:
import ctypes

kernel32 = ctypes.WinDLL('kernel32.dll')

def get_handle(pid):
    PROCESS_VM_OPERATION = 0x0008
    PROCESS_VM_READ = 0x0010
    PROCESS_VM_WRITE = 0x0020
    PROCESS_SET_INFORMATION = 0x0200
    PROCESS_QUERY_INFORMATION = 0x0400
    PROCESS_INFO_ALL = PROCESS_QUERY_INFORMATION|PROCESS_SET_INFORMATION
    PROCESS_VM_ALL = PROCESS_VM_OPERATION|PROCESS_VM_READ|PROCESS_VM_WRITE
    res = kernel32.OpenProcess(PROCESS_INFO_ALL | PROCESS_VM_ALL, False, pid)
    print "Returning handle %d" % res
    return res
Once we have a handle, we can allocate memory and change memory permissions. Our first step at this point is to allocate our heap arena to hold MSVCR90's lifted code as well as our hooks. To do this we'll need to use VirtualAllocEx:
def allocate(handle, size):
    MEM_COMMIT = 0x1000
    MEM_RESERVE = 0x2000
    PAGE_EXECUTE_READWRITE = 0x40
    # size is a count of 0x1000-byte pages
    count = size
    # the allocation type and the page protection are separate arguments
    alloc_type = MEM_COMMIT | MEM_RESERVE
    res = kernel32.VirtualAllocEx(handle, 0x0, count * 0x1000, alloc_type, PAGE_EXECUTE_READWRITE)
    print "Allocated memory for handle %d at 0x%08x" % (handle, res)
    return res
Then, we'll want to write our hooks and lifted code to the address returned by allocate. But first, we want to make two more allocations to hold the saved destination pointer and the saved size:
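import struct

# pack the addresses as little-endian 32-bit values so they can be spliced into opcodes
saved_size = struct.pack("<L", allocate(handle, 4))
saved_dst = struct.pack("<L", allocate(handle, 4))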
Now, when we inject our hooks we can use those addresses and thus dynamically change our injected opcodes. Also, let's go ahead and define the "needle", or the four bytes we are going to look for. In this case, we're going to search for the first 4 bytes of any DIR file (Shockwave Director), which is "RIFX", as can be seen in this partial hexdump of a DIR file:
0000h: 52 49 46 58 00 00 DC 50 4D 56 39 33 69 6D 61 70  RIFX...PMV93imap
0010h: 00 00 00 18 00 00 00 01 00 00 00 2C 00 00 07 3A  ...........,...:
0020h: 00 00 00 00 00 00 00 00 00 00 00 00 6D 6D 61 70  ............mmap
0030h: 00 00 0F E0 00 18 00 14 00 00 00 CA 00 00 00 97  ................
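Since the needle gets embedded as the immediate of a mov eax instruction and compared against a dword read from memory, byte order matters. A quick Python 3 check of how the four bytes "RIFX" look when interpreted as a little-endian versus a big-endian 32-bit value:

```python
import struct

needle = b"RIFX"

# What an x86 dword load sees when these four bytes sit in memory
little = struct.unpack("<L", needle)[0]
# The same bytes read most-significant byte first
big = struct.unpack(">L", needle)[0]

print("little-endian: 0x%08x" % little)
print("big-endian:    0x%08x" % big)
```

The little-endian value is the one you should expect to see in a register after the bytes are loaded from the buffer on x86, so don't be surprised when a debugger prints the characters of that register in reverse order.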
Here is the dynamic replacement we perform on our hook's opcodes:
needle = "RIFX"
args_hook = "\x51\x56\x8b\x4c\x24\x10\x8b\x74\x24\x0c\x89\x0d" + saved_size
args_hook += "\x89\x35" + saved_dst + "\x5e\x59"
search_hook = "\x60\x9c\x8b\x0d" + saved_size + "\x8b\x35" + saved_dst
search_hook += "\xc1\xe9\x02\xb8" + needle
search_hook += "\x39\x06\x75\x02\x74\x08\x8d\x76\x04\x49\x74\x08"
search_hook += "\xeb\xf2\xcc\xe9\x05\x00\x00\x00\xe9\x00\x00\x00\x00\x9d\x61"
Also, let's go ahead and define the original opcode bytes for the code we are lifting out:
# .text:78586EB5 8B C8                 mov ecx, eax
# .text:78586EB7 C1 F9 05              sar ecx, 5
# .text:78586EBA 8D 1C 8D A0 D6 5B 78  lea ebx, ___pioinfo[ecx*4]
hook1_orig = "\x8b\xc8\xc1\xf9\x05\x8d\x1c\x8d\xa0\xd6\x5b\x78"
# .text:78586F35 C7 45 FC FE FF FF FF  mov [ebp+ms_exc.disabled], 0FFFFFFFEh
hook2_orig = "\xc7\x45\xfc\xfe\xff\xff\xff"
Now that we have all the opcodes ready for injection, let's review the offsets we're going to use. Remember:
Offset  Contains
0x00    our code to grab read()'s arguments off the stack followed by a jump to offset 0x60
0x20    our code to search the destination buffer for our value followed by a jump to offset 0x80
0x60    the original code from MSVCR90!read (0x78586EB5) that we are "lifting" to here
0x80    the original code from MSVCR90!read (0x78586F35) that we are "lifting" to here
And after each of those chunks of code we're going to want some jumps. The order of execution should be something like this:
- MSVCR90!read is called.
- At 0x78586EB5 the execution will hit a jump we will patch in. This will jump to our heap arena at offset 0x00.
- After our hook executes, it will jump to our heap arena at offset 0x60 to execute the original code we lifted from 0x78586EB5.
- After that executes, it will jump back to read() at 0x78586EC1 (right after what we lifted).
- Execution will continue normally until our hook at 0x78586F35 is hit. Then, the process will hit our patched jump that will go to our heap arena at offset 0x20.
- Our hook will throw an INT 3 if it finds our needle. Either way, once the hook finishes we will jump to the original lifted code at our heap arena offset 0x80.
- After that's done, it will jump back to read() at 0x78586F3C (right after what we lifted).

Make sense?
So let's make this easier on ourselves by writing some functions to make these jumps (opcode 0xE9 is a jump instruction):
def makejump(start, target, length):
    print "Asked to make a jump from 0x%08x to 0x%08x" % (start, target)
    # the relative offset is measured from the end of the 5-byte jmp; mask it
    # so that backward jumps (negative offsets) pack as unsigned 32-bit values
    buf = "\xe9" + struct.pack("<L", (target - start - 5) & 0xffffffff)
    # pad whatever is left of the clobbered instructions with NOPs
    buf += "\x90"*(length-len(buf))
    return buf

def patchjump(handle, x, y, length):
    opcodes = makejump(x, y, length)
    dst = ctypes.cast(x, ctypes.c_char_p)
    src = ctypes.c_char_p(opcodes)
    print "Patching jump from 0x%08x to 0x%08x" % (x, y)
    res = ctypes.windll.kernel32.WriteProcessMemory(handle, dst, src, length, 0x0)
    print "WriteProcessMemory returned 0x%08x" % res
    return res
At this point we should have the ability to allocate memory and to write our opcodes to it. Now we can actually start "doing stuff".
To begin, we should change the permissions on MSVCR90.dll's .text segment to ensure we can actually write our jump there. Its default permissions are the following (from WinDBG):

0:021> !address 0x78586EB5
78520000 : 78521000 - 00096000
Type 01000000 MEM_IMAGE
Protect 00000020 PAGE_EXECUTE_READ
State 00001000 MEM_COMMIT
Usage RegionUsageImage
FullPath C:\WINDOWS\system32\MSVCR90.dll

We need to change the Protect flags to 0x40 (PAGE_EXECUTE_READWRITE).
This can be accomplished with a call to VirtualProtectEx:
def vprotect(handle, address):
    PAGE_EXECUTE_READWRITE = 0x40
    # VirtualProtectEx needs somewhere to store the previous protection value
    crap = ctypes.byref(ctypes.create_string_buffer("\x00"*4))
    res = kernel32.VirtualProtectEx(handle, address, 0x1000, PAGE_EXECUTE_READWRITE, crap)
    print "VirtualProtectEx returned 0x%08x" % res
    return res
Once we've run this on the process, we can verify with WinDBG that the permissions were changed:
0:021> !address 0x78586EB5
78520000 : 78586000 - 00001000
Type 01000000 MEM_IMAGE
Protect 00000040 PAGE_EXECUTE_READWRITE
State 00001000 MEM_COMMIT
Usage RegionUsageImage
FullPath C:\WINDOWS\system32\MSVCR90.dll
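Page protections are applied at page granularity, which is why the region reported above is exactly 0x1000 bytes starting at 78586000. The sketch below (Python 3, with a hypothetical helper named pages_touched and an assumed 4 KB page size) works out which pages a patch of a given length lands in, so you can tell whether one protection change covers both hook points.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KB pages, as on 32-bit x86 Windows

def pages_touched(address, length):
    """Return the base address of every page overlapped by [address, address+length)."""
    first = address & ~(PAGE_SIZE - 1)
    last = (address + length - 1) & ~(PAGE_SIZE - 1)
    return list(range(first, last + PAGE_SIZE, PAGE_SIZE))

hook1_pages = pages_touched(0x78586EB5, 12)   # first patch: 12 bytes
hook2_pages = pages_touched(0x78586F35, 7)    # second patch: 7 bytes
print([hex(p) for p in hook1_pages])
print([hex(p) for p in hook2_pages])

# A made-up patch that straddles a page boundary would need both pages writable
print([hex(p) for p in pages_touched(0x78586FFC, 7)])
```

Here both patches fall inside the page at 0x78586000, so a single change is enough. A patch that crossed into the next page would need that page made writable too.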
Now, let's allocate our memory and then write our jumps into read()'s code:
# the two hook points inside MSVCR90!read
hook1 = 0x78586EB5
hook2 = 0x78586F35

addr = allocate(handle, 1024)

# .text:78586EB5 8B C8                 mov ecx, eax
# .text:78586EB7 C1 F9 05              sar ecx, 5
# .text:78586EBA 8D 1C 8D A0 D6 5B 78  lea ebx, ___pioinfo[ecx*4]
hook1_orig = "\x8b\xc8\xc1\xf9\x05\x8d\x1c\x8d\xa0\xd6\x5b\x78"
# patch a jump from hook1 to addr
patchjump(handle, hook1, addr+0x00, 12)

# .text:78586F35 C7 45 FC FE FF FF FF  mov [ebp+ms_exc.disabled], 0FFFFFFFEh
hook2_orig = "\xc7\x45\xfc\xfe\xff\xff\xff"
# patch a jump from hook2 to addr+0x20
patchjump(handle, hook2, addr+0x20, 7)
At this point we can verify that our jumps were patched in to the process properly:
0:001> u 0x78586EB5
MSVCR90!_read+0x5e
78586eb5 e946918c95 jmp +0xde4ffff (0de50000)
78586eba 90 nop
78586ebb 90 nop
78586ebc 90 nop
0:001> u 78586F35
MSVCR90!_read+0xde:
78586f35 e9e6908c95 jmp +0xde5001f (0de50020)
78586f3a 90 nop
78586f3b 90 nop
You should notice that the first hook jumps to offset 0x00 and the second jumps to offset 0x20, which makes sense.
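You can also sanity-check those bytes by hand. A jmp rel32 stores the distance from the end of the 5-byte instruction to the target, as a little-endian value that wraps modulo 2^32 for backward jumps. This Python 3 sketch (encode_jump is a made-up name, not the function from the post) recomputes the two patches from the addresses in the disassembly above:

```python
import struct

def encode_jump(start, target, length=5):
    """Encode `jmp target` placed at `start`, NOP-padded out to `length` bytes."""
    rel = (target - (start + 5)) & 0xFFFFFFFF     # wrap negative offsets
    buf = b"\xe9" + struct.pack("<L", rel)
    return buf + b"\x90" * (length - len(buf))

first = encode_jump(0x78586EB5, 0x0DE50000, 12)   # hook 1 -> arena + 0x00
second = encode_jump(0x78586F35, 0x0DE50020, 7)   # hook 2 -> arena + 0x20
print(first.hex())
print(second.hex())
```

The first five bytes of each result match the e946918c95 and e9e6908c95 encodings WinDBG shows, with NOPs filling the rest of the clobbered instructions. Both jumps go backward in the address space, which is why the offsets start with 0x95 rather than something small.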
Now, let's implement a quick function to write to memory so that we can copy our opcodes to our heap arena:
def writemem(handle, mem, data):
    src = ctypes.c_char_p(data)
    dst = ctypes.cast(mem, ctypes.c_char_p)
    length = ctypes.c_int(len(data))
    res = ctypes.windll.kernel32.WriteProcessMemory(handle, dst, src, length, 0x0)
    return res
And to use it:
# write the lifted code to our heap arena
writemem(handle, addr+0x60, hook1_orig)
writemem(handle, addr+0x80, hook2_orig)
And now we can write our jumps from the end of our hooks to the lifted code as well as the jumps back from the lifted code back to read():
# write jumps after our hooks that go to the lifted code
jmp_hook1 = patchjump(handle, addr+0x00+len(args_hook), addr+0x60, 5)
jmp_hook2 = patchjump(handle, addr+0x20+len(search_hook), addr+0x80, 5)

# write our hooks to our arena
writemem(handle, addr, args_hook)
writemem(handle, addr+0x20, search_hook)

# write in some patches from the lifted code back to read()
patchjump(handle, addr+0x60+len(hook1_orig), hook1+12, 5)
patchjump(handle, addr+0x80+len(hook2_orig), hook2+7, 5)
At this point we should be done and we can verify with WinDBG:
0:001> u 0de50000 L9
+0xde4ffff:
0de50000 51 push ecx
0de50001 56 push esi
0de50002 8b4c2410 mov ecx,dword ptr [esp+10h]
0de50006 8b74240c mov esi,dword ptr [esp+0Ch]
0de5000a 890d0000c205 mov dword ptr [+0x5c1ffff (05c20000)],ecx
0de50010 89350400c205 mov dword ptr [+0x5c20003 (05c20004)],esi
0de50016 5e pop esi
0de50017 59 pop ecx
0de50018 e943000000 jmp +0xde5005f (0de50060)
0:001> u 0de50000+0x20 L13
+0xde5001f:
0de50020 60 pushad
0de50021 9c pushfd
0de50022 8b0d0000c205 mov ecx,dword ptr [+0x5c1ffff (05c20000)]
0de50028 8b350400c205 mov esi,dword ptr [+0x5c20003 (05c20004)]
0de5002e c1e902 shr ecx,2
0de50031 b852494658 mov eax,58464952h
0de50036 3906 cmp dword ptr [esi],eax
0de50038 7502 jne +0xde5003b (0de5003c)
0de5003a 7408 je +0xde50043 (0de50044)
0de5003c 8d7604 lea esi,[esi+4]
0de5003f 49 dec ecx
0de50040 7408 je +0xde50049 (0de5004a)
0de50042 ebf2 jmp +0xde50035 (0de50036)
0de50044 cc int 3
0de50045 e905000000 jmp +0xde5004e (0de5004f)
0de5004a e900000000 jmp +0xde5004e (0de5004f)
0de5004f 9d popfd
0de50050 61 popad
0de50051 e92a000000 jmp +0xde5007f (0de50080)
0:001> u 0de50000+0x60 L4
+0xde5005f:
0de50060 8bc8 mov ecx,eax
0de50062 c1f905 sar ecx,5
0de50065 8d1c8da0d65b78 lea ebx,MSVCR90!__pioinfo (785bd6a0)[ecx*4]
0de5006c e9506e736a jmp MSVCR90!_read+0x6a (78586ec1)
0:001> u 0de50000+0x80 L2
+0xde5007f:
0de50080 c745fcfeffffff mov dword ptr [ebp-4],0FFFFFFFEh
0de50087 e9b06e736a jmp MSVCR90!_read+0xe5 (78586f3c)
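If you'd rather check the layout before touching a live process, the whole arena can be assembled into a local buffer first. The following Python 3 sketch does that with the opcodes from this post; the arena base and the two saved-value addresses are just the ones from the WinDBG session above (they will be different on your machine), and the slot-overflow check is an addition of mine.

```python
import struct

def jmp(start, target):
    """Encode a 5-byte jmp rel32 located at `start` that lands on `target`."""
    return b"\xe9" + struct.pack("<L", (target - start - 5) & 0xFFFFFFFF)

ARENA = 0x0DE50000                      # arena base from the session above
HOOK1, HOOK2 = 0x78586EB5, 0x78586F35   # hook points in MSVCR90!read
saved_size = struct.pack("<L", 0x05C20000)
saved_dst = struct.pack("<L", 0x05C20004)

args_hook = (b"\x51\x56\x8b\x4c\x24\x10\x8b\x74\x24\x0c\x89\x0d" + saved_size
             + b"\x89\x35" + saved_dst + b"\x5e\x59")
search_hook = (b"\x60\x9c\x8b\x0d" + saved_size + b"\x8b\x35" + saved_dst
               + b"\xc1\xe9\x02\xb8" + b"RIFX"
               + b"\x39\x06\x75\x02\x74\x08\x8d\x76\x04\x49\x74\x08"
               + b"\xeb\xf2\xcc\xe9\x05\x00\x00\x00\xe9\x00\x00\x00\x00\x9d\x61")
hook1_orig = b"\x8b\xc8\xc1\xf9\x05\x8d\x1c\x8d\xa0\xd6\x5b\x78"
hook2_orig = b"\xc7\x45\xfc\xfe\xff\xff\xff"

# (arena offset, code placed there, where its trailing jump should land)
slots = [
    (0x00, args_hook, ARENA + 0x60),
    (0x20, search_hook, ARENA + 0x80),
    (0x60, hook1_orig, HOOK1 + len(hook1_orig)),
    (0x80, hook2_orig, HOOK2 + len(hook2_orig)),
]

image = bytearray(b"\x90" * 0x90)       # local stand-in for the remote arena
for index, (offset, code, target) in enumerate(slots):
    body = code + jmp(ARENA + offset + len(code), target)
    limit = slots[index + 1][0] if index + 1 < len(slots) else len(image)
    assert offset + len(body) <= limit, "chunk overflows its slot"
    image[offset:offset + len(body)] = body

for offset, code, _ in slots:
    end = offset + len(code)
    print("0x%02x: %s" % (end, bytes(image[end:end + 5]).hex()))
```

The four trailing jumps it prints line up with the e943000000, e92a000000, e9506e736a and e9b06e736a instructions in the disassembly, which is a decent sign the arbitrarily chosen offsets leave enough room for each chunk.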
Now, let's test it out by loading a Director file into Internet Explorer with a debugger attached:

0:001> g
ModLoad: 01a00000 01a0d000 C:\WINDOWS\system32\Adobe\Shockwave 11\xtras\Speech.x32
ModLoad: 01a20000 01a4d000 C:\WINDOWS\system32\Adobe\Shockwave 11\xtras\Multiusr.x32
ModLoad: 01a10000 01a16000 C:\WINDOWS\system32\Adobe\Shockwave 11\DYNAPLAYER.DLL
ModLoad: 69000000 69108000 C:\WINDOWS\system32\Adobe\Shockwave 11\IML32.dll
ModLoad: 68000000 681ad000 C:\WINDOWS\system32\Adobe\Shockwave 11\DIRAPI.dll
ModLoad: 6c100000 6c119000 C:\WINDOWS\system32\Adobe\Shockwave 11\SwMenu.dll
ModLoad: 01ab0000 01ad7000 C:\WINDOWS\system32\Adobe\Shockwave 11\xtras\Netfile.x32
ModLoad: 71ad0000 71ad9000 C:\WINDOWS\system32\WSOCK32.dll
ModLoad: 03b10000 03b17000 C:\WINDOWS\system32\Adobe\Shockwave 11\xtras\CBrowser.x32
(11ec.1910): Break instruction exception - code 80000003 (first chance)
eax=58464952 ebx=785bd6a0 ecx=00002000 edx=7c90e514 esi=039e2a94 edi=000000c0
eip=0de50044 esp=0160ba3c ebp=0160ba90 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00200246
Missing image name, possible paged-out or corrupt data.
Missing image name, possible paged-out or corrupt data.
+0xde50033:
0de50044 cc int 3
As can be seen above, an int3 was thrown at 0x0de50044 (which is inside our heap arena). At this point, we can verify that at ESI is our needle value.
0:008> dc esi
039e2a94 58464952 50dc0000 3339564d 70616d69 RIFX...PMV93imap
039e2aa4 18000000 01000000 2c000000 3a070000 ...........,...:
039e2ab4 00000000 00000000 00000000 70616d6d ............mmap
039e2ac4 e00f0000 14001800 ca000000 97000000 ................
039e2ad4 90000000 ffffffff 68000000 58464952 ...........hRIFX
039e2ae4 50dc0000 00000000 00000100 00000000 ...P............
039e2af4 70616d69 18000000 0c000000 00000100 imap............
039e2b04 24d9af0a 70616d6d e00f0000 2c000000 ...$mmap.......,
At this point we can set a memory breakpoint on that location and it should tell us where Shockwave first decides to parse our data:
0:008> ba r1 039e2a94
0:008> g
Breakpoint 0 hit
eax=00000052 ebx=0000000c ecx=0000000c edx=0160bb6c esi=039e2a94 edi=0160bb6c
eip=69007295 esp=0160ba94 ebp=0160baa0 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00200202
IML32!Ordinal9231+0x7295:
69007295 83c601 add esi,1
0:008> ub
IML32!Ordinal9231+0x727b:
6900727b 3bc1 cmp eax,ecx
6900727d 7567 jne IML32!Ordinal9231+0x72e6 (690072e6)
6900727f 8b0dfcbc0b69 mov ecx,dword ptr [IML32!Ordinal2344+0x32fac (690bbcfc)]
69007285 8b7d08 mov edi,dword ptr [ebp+8]
69007288 8b750c mov esi,dword ptr [ebp+0Ch]
6900728b f7c707000000 test edi,7
69007291 740f je IML32!Ordinal9231+0x72a2 (690072a2)
69007293 8a06 mov al,byte ptr [esi]
0:008> t
eax=00000052 ebx=0000000c ecx=0000000c edx=0160bb6c esi=039e2a95 edi=0160bb6c
eip=69007298 esp=0160ba94 ebp=0160baa0 iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00200206
IML32!Ordinal9231+0x7298:
69007298 8807 mov byte ptr [edi],al ds:0023:0160bb6c=a8
0:008> t
eax=00000052 ebx=0000000c ecx=0000000c edx=0160bb6c esi=039e2a95 edi=0160bb6c
eip=6900729a esp=0160ba94 ebp=0160baa0 iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00200206
IML32!Ordinal9231+0x729a:
6900729a 83c701 add edi,1
0:008> ba r1 @edi
0:008> g
Breakpoint 1 hit
eax=58464952 ebx=00000000 ecx=00000000 edx=058273a8 esi=03af32c0 edi=00000000
eip=68002ffc esp=0160bb5c ebp=0160bb98 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00200246
DIRAPI+0x2ffc:
68002ffc 3d58464952 cmp eax,52494658h
0:008> .formats @eax
Evaluate expression:
  Hex: 58464952
  Decimal: 1481001298
  Octal: 13021444522
  Binary: 01011000 01000110 01001001 01010010
  Chars: XFIR
  Time: Mon Dec 05 23:14:58 2016
  Float: low 8.72073e+014 high 0
  Double: 7.31712e-315

So, we can see it first copies it to some other buffer. We set a breakpoint on that new buffer they copied to, and when it's next hit we can see they are comparing it to RIFX. This is the place we should begin reversing to discover more about their file format and corresponding parsing.
This was how Logan and I accomplished a lot of the Shockwave reversing that we talked about during our CanSecWest presentation (PPTX).
The above code can be snagged here in one .py file: thePublicHooker.py.
Expect some future posts on other tricks that can be accomplished using injection in this manner.
As Recon 2011 in Montreal (July 8-10) is fast approaching we wanted to let ZDI researchers know there is a training being offered by two of the ZDI team members: Bug Hunting and Analysis 0x65.
Some of the case studies offered on day 2 of the training will be submissions that were patched and disclosed through the ZDI. Many researchers have been interested in our analysis of their submissions. Recon is a great place to discuss these cases. If you will be attending and want to request a case study of one of your *patched/disclosed* submissions, send a request via email. We probably have time to get one or two more done before the training, so get your request over ASAP if you're interested.
As always things are better with free stuff, so:
Any new or existing ZDI researchers who wish to attend this class will receive a 5,000 pt reward bonus credited to their researcher account.
Bug Hunting and Analysis 0x65
This 3 day course is structured to impart upon the students the skills necessary to effectively utilize debuggers, disassemblers, and other tools to discover vulnerabilities in binary code. The curriculum will begin by introducing students to the tools and generic techniques that will enable them to actively participate in reversing applications during the rest of the course.
After gaining a basic understanding of the tools involved, the instructors will spend day 2 walking students through case studies from patched vulnerabilities. That is, we will be choosing specific vulnerabilities and walking the students through the methodology used to verify them (debugging) and how the discoverer likely found them (fuzzing, static reverse engineering, dynamic instrumentation, etc). As each flaw is dissected, we will focus on how the student's arsenal of techniques can be extended to more easily debug applications and eventually discover similar bugs going forward.
On day 3 we will begin focusing on automating our tools to build a checklist that we can use to more efficiently reverse engineer a binary code base. We will walk through a complete audit of a default installation (latest version) of a popular enterprise server application culminating in the discovery of a remote pre-authentication 0day vulnerability. Students will be required to sign a minimal NDA in order to participate in this portion of the training.
Instructors: Aaron Portnoy and Zef Cekaj
Dates: 5-7 July 2010
Availability: 18 Seats
Dates: 11-13 July 2010
Availability: 18 Seats
Price 2600$ CAD before May 15, 3200$ CAD after.
Anyone who utilizes IDA Pro is very likely familiar with the concept of subviews, the window panes that give a reverser the ability to view and query many characteristics of a binary stored in IDA's database. The default views available in IDA are great for displaying generic characteristics of any disassembled object. However, as is quite often the case, one may wish to collect data about an application (or otherwise) that may be more specifically targeted.
For example, we often find ourselves reversing a file format parser for one reason or another. Many of them support various FourCC-based formats (like MOV, RealMedia, Shockwave, PNG, ICC, ...) and as such usually have code that resembles this:
These compares are useful to locate as they will usually be dealing with an input file and given some complementary runtime data, these locations can help us understand a bit about how the parser may work. So, as Elias Bachaalany demonstrated on the Hex-Rays blog (http://www.hexblog.com/?p=119) IDA supports custom viewer creations. I'll show how to implement a quick, clickable interface to FourCC compares within a binary.
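The finder that follows relies on a simple heuristic: pack the immediate as a big-endian dword and call it a FourCC candidate if at least 3 of its 4 bytes are printable. Here is a standalone Python 3 sketch of just that test, so you can try it on values without IDA open. The function name looks_like_fourcc is made up for this example.

```python
import string
import struct

# string.printable includes letters, digits, punctuation and whitespace
PRINTABLE = set(string.printable.encode("ascii"))

def looks_like_fourcc(value, minimum=3):
    """True if at least `minimum` bytes of the 32-bit immediate are printable."""
    raw = struct.pack(">L", value & 0xFFFFFFFF)
    return sum(1 for byte in raw if byte in PRINTABLE) >= minimum

for candidate in (0x52494658, 0xFFFFFFFE, 0x00000020, 0x0A0D0A0D):
    print("0x%08x -> %s" % (candidate, looks_like_fourcc(candidate)))
```

Note the last value: because whitespace counts as printable, something like 0x0A0D0A0D passes the test even though it is hardly a FourCC. Expect a few false positives like that in real output.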
First, we need code to find FourCC compares:
import string
import struct

fourccs = []

# loop functions
for func in Functions():
    # loop instructions
    for instr in FuncItems(func):
        # only care about compares
        disasm = GetDisasm(instr)
        if "cmp" in disasm and "offset" not in disasm:
            # ensure immediate compare
            if GetOpType(instr, 1) == 5 and GetOpType(instr, 0) == 1:
                # ensure at least 3 bytes of ASCII in the immediate
                opval = GetOpnd(instr, 1)
                opval = int(opval[:-1], 16)
                opval = struct.pack('>L', opval)
                oplen = len(opval)
                opstrlen = len("".join((x for x in opval if x in string.printable)))
                if (opstrlen >= 3):
                    # change it to character representation in IDA
                    OpChr(instr, 1)
                    print "0x%08x: %s" % (instr, GetOpnd(instr, 1))
                    # remember the address and operand text for the viewer
                    fourccs.append((instr, GetOpnd(instr, 1)))
This should output something like the following:
Now, we can build a custom viewer and populate it with our data:
class FourCCViewer(idaapi.simplecustviewer_t):
    def __init__(self, data):
        self.fourccs = data
        self.Create()
        print "Launching FourCC subview..."
        self.Show()

    def Create(self):
        title = "FourCCs"
        # create the underlying custom viewer before adding lines to it
        idaapi.simplecustviewer_t.Create(self, title)
        comment = idaapi.COLSTR("; Double-click to follow", idaapi.SCOLOR_BINPREF)
        self.AddLine(comment)
        comment = idaapi.COLSTR("; Hover for preview", idaapi.SCOLOR_BINPREF)
        self.AddLine(comment)
        for item in self.fourccs:
            addy = item[0]
            immvalue = item[1]
            address_element = idaapi.COLSTR("0x%08x: " % addy, idaapi.SCOLOR_REG)
            value_element = idaapi.COLSTR("%s" % immvalue, idaapi.SCOLOR_STRING)
            line = address_element + value_element
            self.AddLine(line)
        return True

    def OnDblClick(self, something):
        line = self.GetCurrentLine()
        if "0x" not in line: return False
        # skip COLSTR formatting, find address
        addy = int(line[2:line.find(":")], 16)
        # follow the address in the disassembly view
        Jump(addy)
        return True

    def OnHint(self, lineno):
        # the first two lines are the comment header
        if lineno < 2: return False
        else: lineno -= 2
        line = self.GetCurrentLine()
        if "0x" not in line: return False
        # skip COLSTR formatting, find address
        addy = int(line[2:line.find(":")], 16)
        disasm = idaapi.COLSTR(GetDisasm(addy) + "\n", idaapi.SCOLOR_DREF)
        return (1, disasm)
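The address parsing in OnDblClick and OnHint depends on how a colored line is laid out: two formatting characters, then the "0x..." text, then a colon. The Python 3 sketch below imitates that outside IDA. The colstr helper is only a stand-in that mimics the shape of idaapi.COLSTR (an "on" escape plus a tag before the text, an "off" escape plus the tag after it), and the tag characters used are placeholders, not IDA's real color codes.

```python
def colstr(text, tag):
    """Stand-in for idaapi.COLSTR: wrap text in on/off escapes plus a color tag."""
    return "\x01" + tag + text + "\x02" + tag

def address_from_line(line):
    """Recover the address from a viewer line, or None for header lines."""
    if "0x" not in line:
        return None
    # skip the two leading formatting characters, stop at the first colon
    return int(line[2:line.find(":")], 16)

header = colstr("; Double-click to follow", "\x21")
entry = colstr("0x%08x: " % 0x68002FFC, "\x21") + colstr("RIFX", "\x22")

print(address_from_line(header))
print(hex(address_from_line(entry)))
```

This only works because the address is the first colored element on the line. If you reorder the columns, the slice offsets need to change with them.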
Now we just need to instantiate an instance of FourCCViewer and it should pop up a new subview that looks like so (assuming you've already run the fourCC finding code above):
foo = FourCCViewer(fourccs)
Pretty simple stuff, but it can come in handy when you've got an arsenal of extra analysis runs you perform prior to reversing a complex binary. A slightly more interesting use case for this was when displaying data scraped from a run time histogram of Shockwave's memory manager as Logan and I spoke about at CanSecWest this year:
That's all for now, hopefully we can get some more MindshaRE posts in the pipeline.
BlackHole exploit kit is yet another in an ongoing wave of attack toolkits flooding the underground market. The kit first appeared on the crimeware market in September of 2010 and ever since then has quickly been gaining market share over its vast number of competitors. In fact, many antivirus vendors now claim that this is one of the most prevalent exploit kits used in the wild. Even Malware Domain List is showing quite a few domains infected with the BlackHole exploit kit. So what is it that makes this attack toolkit stand out above the rest?
While the number of reported infections by BlackHole kit is indeed impressive we think there is nothing truly revolutionary about this exploit kit. Part of its newfound success can be attributed to its rich feature set which it shares in common with myriad of other recent exploit kits such as Siberia Exploit Kit. The other major factor contributing to its success is its flexible pricing scheme. Unlike some other kits out there BlackHole uses a timed licensing plan. Users can purchase the annual license for $1500, semi-annual license for $1000, or just a quarterly license for $700. The license includes free software updates for the duration of the contract. For those malicious users with a commitment phobia the makers of the kit offer yet another solution. You can rent the kit (on the author’s servers) for $50 for 24 hours, $200 for 1 week, $300 for 2 weeks, $400 for 3 week, and $500 for 4 weeks. A domain name comes included with the rental agreement, but should you desire to change it you need to pay another $35. There’s also an array of other “services” such as changing the encryption method, which can be purchased by users on demand should the need arise. Another popular service is an AV checker that allows you to scan your payload files to make sure they’re not detected by any AV vendors. This service is very similar to VirusTotal, except its aimed for criminals because uploaded test files are not reported to any AV vendors. Yet another paid service is a domain change. Should your domain get blacklisted by any security vendors you can pay a small fee to have it changed.
One highly touted feature of BlackHole toolkit is its TDS or Traffic Direction Script. While this is not an entirely new concept in attack toolkits the TDS included her is much more sophisticated and powerful than those in other kits. A TDS is basically an engine that allows redirection of traffic through a set of rules. For example, a user can set up a set of rules that redirect flow to different landing pages on their domain. These rules could be based on operating system, browser, country of origin, exploit, files, etc. One rule might redirect traffic to page A for all users that are running Windows OS from XP to Vista and running IE 8, while another rule can redirect Windows 7 users to page B. Those were just simple example rules. More advanced rules could set expiration dates for certain payloads and replace them with new ones when the date is reached. The TDS included in BlackHole even goes the extra step and allows you to create traffic flows based on these rules and provides management interface for the flows. A savvy malicious user with a lot of experience could easily utilize this rule engine to increase their infection numbers.
From a web application standpoint BlackHole is built just like other kits, consisting of a PHP and MySQL backend. Since the majority of web servers run on the LAMP stack, this makes for very easy application deployment. The user interface for this kit is a cut above the rest, and it definitely looks nicer than almost any other attack kit we’ve analyzed. It resembles some of the best legitimate web apps we see in the world of commercial software.
Here’s a screenshot that shows various payloads delivered by this kit instance along with hit stats and other details:
Here’s another screenshot that shows the security module of BlackHole. This allows you to blacklist any addresses that you don’t want poking around your exploit kit.
As with any exploit kit this one comes pre-packaged with a bunch of exploits. Below is a list of CVEs that correspond to exploits packaged with BlackHole:
- CVE-2010-1885 HCP
- CVE-2010-1423 Java argument injection vulnerability in the URI handler in Java NPAPI plugin
- CVE-2010-0886 Java Unspecified vulnerability in the Java Deployment Toolkit component in Oracle Java SE
- CVE-2010-0842 Java JRE MixerSequencer Invalid Array Index Remote Code Execution Vulnerability
- CVE-2010-0840 Java trusted Methods Chaining Remote Code Execution Vulnerability
- CVE-2009-1671 Java buffer overflows in the Deployment Toolkit ActiveX control in deploytk.dll
- CVE-2009-0927 Adobe Reader Collab GetIcon
- CVE-2008-2992 Adobe Reader util.printf
- CVE-2007-5659 Adobe Reader CollectEmailInfo
- CVE-2006-0003 IE MDAC
Some of these exploits are pretty old, one even dating back to 2006, but that doesn’t mean that they’re not still effective. Take a look at these infection screens taken from the stats module.
Here’s a screenshot showing infection statistics sorted by exploit:
And here we have infection rates sorted by browser:
The following shows infection statistics organized by operating system:
As you can see, some of the infection rates recorded in BlackHole kits are very high. It’s worth noting that most of the CVEs found in BlackHole are also found in other exploit toolkits. However, what is interesting in this toolkit is the fact that it uses Java OBE (in the form of a JAR file) to serve up Java exploits. Java OBE (Open Business Engine) is a flexible, modular, standards-compliant open source Java workflow engine. This is something we haven’t seen before, as other toolkits have not used open source projects like the OBE in the past. The exploits served by Java OBE are CVE-2010-0840 and CVE-2010-0842.
The attack typically works as follows. A victim visits an infected domain which has an iFrame pointing to the server hosting the exploit kit. BlackHole’s TDS, which we discussed earlier, automatically directs the traffic to the exploit most likely to work on the victim’s machine. That could be the Java OBE, IE, Adobe Reader, or any other exploit included. Chances are one of the Java exploits will be used, especially given the fact that 50% of the exploits in this kit are Java based. If exploitation is successful, a malicious payload is delivered to the machine. The payload most often downloaded is the Carberp trojan. Carberp is a very dangerous trojan often compared to the likes of Zeus and has been gaining in popularity lately. Once Carberp is successfully installed on the machine it starts to talk to its C&C server, from which it downloads additional modules. It downloads the following:
- stopav.plug - Disables the antivirus if any is installed on the victim’s computer.
- miniav.plug - Checks for the presence of other Trojans, such as Zeus, and if found, deletes them. This is very similar to SpyEye behavior.
- passw.plug - Key logger module. It hooks the export table of a number of WININET.dll and USER32.dll functions and will log all username/password combinations, as well as any URLs visited.
Afterwards more malware is installed on the victim’s computer, including Trojan.Zefarch and FakeAV. This trend of using Java exploits is becoming increasingly common. It seems that Java exploits are becoming the weapon of choice for attack toolkit writers because of Java’s cross-platform nature. Fortunately for you, if you’re using a TippingPoint IPS you are well protected. DVLabs has filters for 100% of the CVEs found in this toolkit. Here are the filters as well as the matching CVEs for the BlackHole exploit kit:
- CVE-2010-1885 9889
- CVE-2010-1423 9697,9698
- CVE-2010-0886 9697
- CVE-2010-0842 9651
- CVE-2010-0840 10985
- CVE-2009-1671 10919
- CVE-2009-0927 6255
- CVE-2008-2992 6833
- CVE-2007-5659 6435,6436
- CVE-2006-0003 4244
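The two figures quoted for this kit (filters for 100% of the CVEs, and half of the exploits being Java based) can be checked directly from the lists in this post. The sketch below simply encodes the CVE list and the filter list as a Python dictionary; the exploit "family" labels are our own shorthand for the descriptions given earlier, and the helper names are made up for illustration.

```python
# CVE -> (exploit family, TippingPoint filter numbers), copied from the lists in this post.
KIT_CVES = {
    "CVE-2010-1885": ("HCP", [9889]),
    "CVE-2010-1423": ("Java", [9697, 9698]),
    "CVE-2010-0886": ("Java", [9697]),
    "CVE-2010-0842": ("Java", [9651]),
    "CVE-2010-0840": ("Java", [10985]),
    "CVE-2009-1671": ("Java", [10919]),
    "CVE-2009-0927": ("Adobe Reader", [6255]),
    "CVE-2008-2992": ("Adobe Reader", [6833]),
    "CVE-2007-5659": ("Adobe Reader", [6435, 6436]),
    "CVE-2006-0003": ("IE MDAC", [4244]),
}


def uncovered(table):
    """Return the CVEs that have no filter listed."""
    return sorted(cve for cve, (_, filters) in table.items() if not filters)


def family_share(table, family):
    """Return the fraction of exploits in the table that belong to a family."""
    hits = sum(1 for fam, _ in table.values() if fam == family)
    return hits / len(table)


missing = uncovered(KIT_CVES)
java_share = family_share(KIT_CVES, "Java")
print("CVEs without a filter:", missing)   # prints an empty list
print("Java share:", java_share)           # prints 0.5
```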
On top of these filters we are also engaged in ongoing research projects where we continuously collect and analyze samples of the newest exploit kits (and other types of malware) in order to quickly create filters that protect our customers. We’ve analyzed over two dozen web exploit kits in the past 6 months and are continuously working on acquiring new kits as soon as they’re released. We’ll keep you posted with further blogs as we discover new and interesting exploit toolkits.
Background behind the problem
Cloud computing has quickly evolved from a hot industry buzzword into a multi-billion dollar emerging market, with all the big names striving to grab a piece of the pie. Amazon, with its Amazon Elastic Compute Cloud (EC2), is arguably the dominant leader of the cloud services market. Even the video streaming giant Netflix moved its operation into Amazon's EC2, opting out of building out its own data centers. With such a high growth technology sector it's no wonder we are starting to see more and more malicious activity spreading in the cloud ecosystem.
Let us quickly go over a typical usage scenario from the viewpoint of an EC2 user, in order to better explain how this attack works. Users sign up for Amazon Web Services because they want to host something in the cloud, be that a web site, web service, or just a data backup. More often than not they want to host a server in the cloud in order to offset the cost of purchasing and maintaining their own hardware. So how does this process work? Well, when a user wants to create a virtual web server, referred to as an instance, they usually have two choices. They can either create a software image to install with, or they can use one of the pre-existing software images available from EC2. These images are called AMIs, or Amazon Machine Images. They're stacks of software created to help users deploy servers quickly. For example, a typical AMI might consist of a SuSE Linux image with Apache web server and MySQL database already installed and configured. Other images may have WordPress, Joomla, Drupal or a number of other content management systems pre-installed for you. You get the picture. The AMIs allow you to deploy a server, for whatever purpose you have in mind, rather quickly. A user just has to select the type of instance they want to deploy (number of CPU cores + memory) and then pick the image they want to deploy with. The whole process takes less than 5 minutes.
There are two major types of AMIs to work with. There are images built by Amazon itself, of which there are 170 at the time of this writing. But then there are also public images, created by the other members of the EC2 community. There are about 7390 of these, all created by other users of Amazon's cloud services and then shared with the rest of the community to help others. Or so you would assume ...
Certified Pre-owned AMI
On April 8th, 2011, Amazon sent out the following email to its Elastic Compute Cloud customers acknowledging the presence of compromised images in their community. In this email they notified the members using the compromised AMIs of the danger they're facing and the necessary course of action to remediate the threat. This is one such email:
The email identifies the affected AMI and warns against any continued use of it. It suggests, and rightfully so, that any server instance running an infected image should be for all intents and purposes considered 100% compromised. All services running on those instances should be migrated to new (clean) images. This is definitely the right recommendation to make to its customers, as nobody can guarantee that the affected servers have not been compromised already. Naturally, they do not disclose the number of instances that have been affected by this AMI, but judging from the sheer size of EC2 and the number of servers hosted there we can't help but assume the number is not very small. The infected image consists of Ubuntu 10.04 server, running Apache and MySQL along with PHP. This is a pretty typical LAMP server setup, so we're assuming a lot of users opted to use it, especially if they are hosting a web site. To make things worse, the image appears to have been published in October of 2010, which is 6 months ago, and we are only hearing about this problem now.
So what exactly happened here? An EC2 user that goes by the name of guru created this image, with the software stack he uses most often and then published it to the Amazon AMI community. This would all be fine and dandy if it wasn't for one simple fact. The image was published with his SSH key still on it. This means that the image publisher, in this case guru, could log into any server instance running his image as the root user. The keys were left in /root/.ssh/authorized_keys and /home/ubuntu/.ssh/authorized_keys. We refer to the resulting image as 'certified pre-owned'. The publisher claims this was purely an accident, a mere result of his inexperience. While this may or may not be true, this incident exposes a major security hole within the EC2 community.
While the ability to publish and use community created AMIs is a nice addition to EC2, it also leaves its community exposed to a wide range of security vulnerabilities. A foreign SSH key injected into ~/.ssh/authorized_keys, like the one described in Amazon's email, is something that most users would probably never notice. The vast majority of users will simply select the AMI they want to use, pick their instance type and off they go. They will not take time to do a security audit of the system. They might not even know how to do a security audit of the system. Most businesses are rushing to get their products to market, and the cloud is just another shortcut to getting their servers up and running without having to worry about machine maintenance. To them, the more time they save configuring the systems, the more time they have to get their product out of the door quicker. In which case it makes perfect sense to them to select a community released AMI if it meets all their software requirements. This is exactly why we think this type of attack on the EC2 community can prove to be particularly effective.
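A basic check for the specific problem described above is to list every key in an authorized_keys file and flag any that you did not put there yourself. The following is a minimal sketch in Python, assuming you maintain your own set of trusted key fingerprints; the function names and the sample keys are hypothetical, and the parser is deliberately simple (it does not handle quoted options that contain spaces). It will not find a recompiled daemon or a rootkit, only stray keys.

```python
import base64
import hashlib

KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-", "sk-")


def fingerprint(blob_b64):
    """OpenSSH-style SHA256 fingerprint of a base64-encoded public key blob."""
    digest = hashlib.sha256(base64.b64decode(blob_b64)).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def parse_authorized_keys(text):
    """Yield (key_type, fingerprint, comment) for each key line in the text."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        tokens = line.split()
        # Options may precede the key type, so scan for the first key-type token.
        for i, token in enumerate(tokens[:-1]):
            if token.startswith(KEY_TYPE_PREFIXES):
                try:
                    fp = fingerprint(tokens[i + 1])
                except ValueError:
                    break  # not valid base64; skip this line
                yield token, fp, " ".join(tokens[i + 2:])
                break


def unexpected_keys(text, trusted_fingerprints):
    """Return the entries whose fingerprint is not in the trusted set."""
    return [e for e in parse_authorized_keys(text) if e[1] not in trusted_fingerprints]


# Made-up key blobs for illustration; real blobs come from real public keys.
my_blob = base64.b64encode(b"hypothetical-owner-key").decode("ascii")
foreign_blob = base64.b64encode(b"hypothetical-publisher-key").decode("ascii")
sample = (
    "# sample authorized_keys content\n"
    f"ssh-rsa {my_blob} me@laptop\n"
    f"ssh-rsa {foreign_blob} publisher@example\n"
)
flagged = unexpected_keys(sample, {fingerprint(my_blob)})
for key_type, fp, comment in flagged:
    print(key_type, fp, comment)
```

On a running instance you would pass in the contents of /root/.ssh/authorized_keys and /home/ubuntu/.ssh/authorized_keys (the two locations named in this incident), plus the same file for any other account that can log in.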
Tip of the iceberg
Whether this was an attack or not is uncertain. I suspect it was just an honest mistake, but that is beside the point. What we do know is that if something this transparent took 6 months to get caught, then the cloud community is due for a rude awakening. This exposes a real security vulnerability present within the community that needs to be addressed as soon as possible. Currently there is no true way for customers of EC2 to trust any of the publicly published AMIs. How could they? These images are created by unknown 3rd parties and there's no real way of knowing how the images were created. A truly malicious user can easily put in a truly hidden backdoor that might never get discovered. They could recompile the SSH daemon with their own backdoor that would grant them access to the system. Such a backdoor would be near impossible to detect. An attacker could also put a backdoor into the kernel itself, install a rootkit, create a trojan that phones home, etc. The possibilities are endless and the security measures are unfortunately too few. The point here is that any AMI could have a backdoor that is so hard to detect that it would probably never be detected by the community. This sort of vulnerability allows for a lot of blanket attacks. Just as the AMI mentioned in Amazon's email was composed of widely used software, so could a truly malicious AMI created by an attacker be. Malicious entities could potentially flood the EC2 community with hundreds of infected AMIs carrying the most popular software stacks such as LAMP, CMS servers, video/audio streaming servers, etc. These AMIs would then get consumed by unsuspecting cloud users and their machines compromised.
Cloud security is no small issue, and the problem with compromised AMIs is not one to take lightly. We hope this incident proves to be a tipping point (no pun intended) for the way EC2 manages AMIs. Hopefully the Amazon EC2 Security Team finds a way to deal with these challenges. One possible solution would be for Amazon to become prolific in creating the AMIs that the community is demanding. Perhaps have a form where customers can request certain types of images and Amazon employees create them. Another solution might be to create a method of easily building images from scratch. Perhaps some sort of automated install scripts that create software stacks based on customer needs. Yet another solution would be to create an open community that does peer reviews of published AMIs, although this could prove to be an insurmountable task. As it currently stands the EC2 community AMIs are unsafe for use. We suggest that all EC2 users carefully check their running instances for problems such as the ones we described, and ideally they should create and install their own AMIs.
Professionalism in the Underground
It’s no secret to those who study illicit (shadow) economies that things change rapidly in order to meet supply and demand. Profit (regardless of how you define it) remains supreme; loss the enemy. This is true in all markets, legal or illegal, with cybercriminal markets being no exception. Take botnets for example. The market for botnets changes at an amazing rate. The purpose, style, functionality, models for acquisition (do I rent or do I own?), size, and effectiveness are dynamic and evolving. Often advanced marketing campaigns (some more formal than others) are employed which showcase the botnet (and author’s) vision and dedication to their products. Many times these marketing campaigns advertise information such as:
- Service Level Agreements
- Technical Assistance Centers (TAC)
- Price guarantees
- Competitive Analysis Intelligence
Winds of Change: ZeuS and SpyEye
No better example of this comes to mind than that of the infamous ZeuS botnet also known as the Zeus banking Trojan (Zbot, PRG, Wsnpoem, Gorhax and Kneber). Not long ago what initially looked like a hostile takeover involving the authors of the SpyEye Trojan and the authors of the ZeuS banking Trojan was underway. The upstart authors of the SpyEye Trojan made international headlines in 2010 when it was discovered that the Trojan had the capability of automatically searching for and removing ZeuS from compromised hosts before installing itself. The team behind SpyEye (called the ‘ZeuS Killer’ by its author) also made sweeping allegations regarding the inefficiencies of their competitor while touting their strengths. Then something odd occurred. The underground forums rang like cathedral bells when it was made known that a Russian hacker known by the handles “Slavik” and “Monstr” had no future plans for maintaining the now ubiquitous crimeware kit. Instead, according to numerous hacker forums and IRC channels the author decided to transfer the original source code of his Trojan to the authors of the SpyEye Trojan.
Figure 1a: SpyEye Advertisement
Figure 1b: SpyEye Advertisement
Figure 1c: SpyEye Advertisement
Figure 1d: SpyEye Advertisement
A Possible New Variant of ZeuS?
That this sort of activity is occurring in the underground is not surprising, but it does make me wonder whether or not the authors of ZeuS sold to only one buyer. I believe that they did not, based on the following information gathered from open sources:
Figure 2: New ZeuS Variant
Figure 3: New ZeuS Variant
Figure 4: New ZeuS Variant
Figure 5: New ZeuS Variant
The advent of this new variant may partially explain the uptick in activity that we and our peers are seeing in our research. You’ll note that in Figure 5, which is a data graph provided by abuse.ch Zeus Tracker, there appears to be an uptick in ZeuS activity beginning right about the same time this latest variant was made public. In speaking with researchers in Latin America and Europe, we found that this correlates with the data we at HP DVLabs have collected. You’ll note that in Figures 6 and 7 respectively the light green bar represents unique source IP addresses while the light blue represents unique destination IP addresses.
Figure 6: abuse.ch Zeus Tracker Statistics for February and March 2011
Figure 7: ZeuS Botnet Command and Control Phone Home Request
Figure 7 depicts a phone home attempt (indicative of a backdoor C&C model) made by a compromised host infected with the ZeuS Trojan (botnet). What is interesting to note is the valley that occurred between March 13 and March 15, 2011, as that correlates with the alleged ‘transition’ period of the ZeuS source code from ‘Slavik’ to ‘Harderman’, author of the SpyEye Trojan (botnet).
Figure 8: Spyeye Botnet Command and Control Phone Home Request
Similarly, Figure 8 depicts a phone home attempt (indicative of a backdoor C&C model) made by a compromised host infected with the SpyEye Trojan (botnet). Note the comparable uptick in activity that closely parallels that seen in our research and that of our peers. As mentioned earlier in this blog, SpyEye is a cleverly crafted Trojan that has the ability, among other things, to:
- Enumerate target hosts for the presence of ZeuS and remove it prior to installing itself
- Monitor keystrokes
- Record username / password combinations
- Harvest credit card numbers
- Upload all acquired data to remote servers for storage and collection once it has concluded harvesting the data
As it stands we will continue to monitor ZeuS’s evolution in concert with SpyEye and independent of it as our findings demonstrate that it remains alive and well. This latest variant of ZeuS is being offered for approximately $5500.00 USD payable via a number of means. We predict continued growth and the potential for expansion with respect to this botnet and will monitor its activity moving forward.