As a malware analyst and reverse engineer, I am often faced with reversing some type of cryptographic algorithm or decompression routine that can take hours, days, months, or even years to fully understand. I am often tasked with answering one question: what is the blob of data that the malware uses?
Answering the “what” is always the challenging part, and I usually don’t have a lot of time to fully reverse a crypto routine. I simply need to be able to say either “this data is a configuration file that the malware uses to do XYZ” or “I don’t know what this data is” (I don’t like to give that answer, but it happens).
There are several different approaches one can take to decrypt or decompress data from malware. You can run the malware and dump memory segments (then dump strings from each sample afterwards), step through the malware in a debugger, place hooks on the decryption/decompression routines and dump their return values, perform static analysis, and so on. While all of these approaches work and will give you the desired answers, they can be time consuming. What if you have several blobs of data that need to be decoded or decompressed? Wouldn’t it be great if you could take the assembly code directly from the malware’s decompression/decoding routine, put it in a compiler such as Visual Studio, compile it into a dynamic link library (DLL), and then call into it using your favorite scripting language, such as Python? This blog will show a technique that can be used to achieve just this. A link to the finished tool, hosted on Unit 42’s public tools GitHub repository, which decompresses a data blob within Reaver used as a database lookup for API calls and strings, can be found here.
The Scenario
For this example, I was tasked with trying to identify the compression algorithm used as part of our recent analysis into the Reaver malware family and determine if the strings within the malware could be decompressed from the binary without running it. The keyword here is “without”.
During my analysis, I found that the Reaver malware family appears to implement a modified Lempel-Ziv-Welch (LZW) compression algorithm. In this example, the decompression routine was found at address 0x100010B2 and is approximately 200 lines of assembly. An excerpt of the decompression routine is shown in Figure 1 below:
; void __thiscall decompress(_DWORD *this, int nstream, int output, int zero, int zero2, int zero3)
decompress proc near ; CODE XREF: decompressingData+5A↓p
nstream = dword ptr 8
output = dword ptr 0Ch
zero = dword ptr 10h
zero2 = dword ptr 14h
zero3 = dword ptr 18h
push ebp
mov ebp, esp
push ebx
push esi
push edi
mov esi, ecx
push 16512 ; unsigned int
call Malloc
pop ecx
mov edi, eax
mov ecx, 1020h
xor eax, eax
mov [esi], edi
xor ebx, ebx
rep stosd
Figure 1 Reaver decompression routine
For brevity, the entire code of the malware function is not shown. The important parts to take away from this are:
- Calling convention is __thiscall (indicates C++)
- Function takes five arguments
- The function is called once from the malware (number of cross references identified in IDA Pro)
Here is what the function looks like when being called:
xor eax, eax
mov ecx, [ebp+v6]
push eax
push eax
push eax
movzx eax, word ptr [ebx+24]
push dword ptr [edx] ; output
lea eax, [eax+ebx+26]
push eax
call decompress
Figure 2 Calling Reaver decompression routine
Here’s an overview of the call to the decompression function:
- Clears the EAX register, so EAX is zero
- Stores the pointer to the object in ECX (__thiscall)
- Pushes EAX three times, which indicates that the last three parameters to the decompression routine are always zero
- Parameter two is a pointer to a destination buffer
- Parameter one is a pointer to the compressed data
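The pointer arithmetic at the call site can be mirrored in a script to locate the compressed stream without running the sample. The following is a minimal Python 3 sketch, assuming that `blob` begins where EBX points in Figure 2 and that the 16-bit value at offset 24 is little-endian (as on x86). The function name and the sample bytes are hypothetical and only illustrate the calculation.

```python
import struct

def compressed_stream_offset(blob: bytes) -> int:
    """Mirror `movzx eax, word ptr [ebx+24]` / `lea eax, [eax+ebx+26]`.

    Returns the offset, relative to the start of `blob`, of the buffer
    that is pushed as the first argument (the compressed data).
    """
    (variable_len,) = struct.unpack_from("<H", blob, 24)
    return 26 + variable_len

# Hypothetical container: 24 filler bytes, a 16-bit value of 4, four bytes
# that get skipped, then the first bytes of the compressed stream.
sample = bytes(24) + struct.pack("<H", 4) + b"SKIP" + b"\x08\x00\xa5\x04"
start = compressed_stream_offset(sample)
stream = sample[start:]
```

What the 16-bit value at offset 24 represents is not established here; the sketch only reproduces the address computation visible in the listing.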
The compressed data is:
08 00 A5 04 01 12 03 06 8C 18 36 7A 04 21 62 25 ..¥.....Œ.6z.!b%
08 94 24 33 64 B8 20 C3 86 4D 03 05 02 09 1A 8C .”$3d¸ ÆM.....Œ
71 A3 C7 91 32 74 AA CC 29 23 C7 49 98 36 65 82 q£Ç‘2tªÌ)#ÇI˜6e‚
5C CC 58 F0 20 8E 1E 52 CA 9C 19 C2 E6 CD C8 25 \ÌXð Ž.RÊœ.ÂæÍÈ%
65 F2 AC 1C D8 32 46 0E 98 32 9F C0 29 E3 06 67 eò¬.Ø2F.˜2ŸÀ)ã.g
9E 22 78 54 62 E4 69 50 06 0C A0 33 E5 94 09 43 ž"xTbäiP.. 3å”.C
A7 8C 51 A4 4A 59 36 8D 01 75 0A 48 2B 61 D8 D4 §ŒQ¤JY6..u.H+aØÔ
29 83 75 A7 46 18 32 64 40 25 52 86 0D C8 32 60 )ƒu§F.2d@%R†.È2`
C5 A6 34 DB 52 C6 0C 85 64 D4 D4 99 43 87 CA 9B Ŧ4ÛRÆ.…dÔÔ™C‡Ê›
35 44 A1 C8 49 63 27 8D DB 33 65 E6 D0 6D 4A A3 5D¡ÈIc'.Û3eæÐmJ£
07 93 37 7F EB C0 11 4C D8 B0 4C B8 61 C7 66 65 .“7.ëÀ.LØ°L¸aÇfe
8A B6 46 0F A1 81 E5 BC 19 93 78 8E 5F C0 6E 16 Š¶F.¡.å¼.“xŽ_Àn.
A3 4D 38 85 4E 18 39 74 BC CA 29 4C 7A F3 59 19 £M8…N.9t¼Ê)LzóY.
Figure 3 Reaver compressed data
For brevity, the entire contents of the compressed data are not shown. The total size is approximately 45,115 bytes.
Bytes 1-7 (08 00 A5 04 01 12 03) appear to be a magic header for the compression routine and were found in all Reaver malware variants.
Armed with this knowledge, we can now turn our focus to the inner workings of the decompression routine.
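Because the seven header bytes are constant, they make a convenient anchor for locating compressed blobs statically. Below is a minimal Python 3 sketch of such a search. The byte sequence is the one quoted above; the function name and the sample buffer are hypothetical.

```python
import re

LZW_HEADER = b"\x08\x00\xA5\x04\x01\x12\x03"

def find_lzw_headers(data: bytes) -> list:
    """Return every offset at which the 7-byte header occurs in `data`."""
    return [m.start() for m in re.finditer(re.escape(LZW_HEADER), data)]

# Hypothetical buffer containing the header twice.
sample = b"\x00" * 16 + LZW_HEADER + b"\x06\x8C" + b"\xFF" * 8 + LZW_HEADER
offsets = find_lzw_headers(sample)
```

`re.escape` is used so that no header byte is ever interpreted as a regular-expression metacharacter.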
Note: From here one could simply monitor the return from the call and dump the contents of the destination buffer, which would contain the decompressed data, but this would require running the code from a debugger. Remember our goal is to not run the sample.
At this point we have enough general information that we can begin to create a DLL, so start up Visual Studio or any compiler that handles compiling assembly (NASM/MASM). Create a new empty DLL project and add a new header file. For example, I created a header with the following information:
#pragma once
#ifndef _DEFINE_LZWDecompress_DLL
#define _DEFINE_LZWDecompress_DLL
#ifdef __cplusplus
extern "C" {
#endif
__declspec(dllexport) BOOL Decompress(char *src, char *dst);
#ifdef __cplusplus
}
#endif
BOOL Decompress(char *src, char *dst);
#endif
Figure 4 C Header file
The above code declares a single export named “Decompress” that accepts two arguments. Why two and not five? Since the other three arguments are always zero, there is no need to define them. The return type of our function is BOOL.
For your source file (.cpp or .c), take the assembly from IDA Pro or your debugger and add it to your source file. Here is what my source file looks like (after I fixed it up):
#include <windows.h>
#include <stdio.h>
#include "TestDLL.h"

BOOL Decompress(char *src, char *dst)
{
    //Use calloc vs malloc. Temp buffer is for the dictionary
    void *pTmpbuff;
    pTmpbuff = (int*) calloc(0x4080u, sizeof(unsigned int));
    if (src && dst)
    {
        __asm
        {
            xor ebx, ebx; //Need to clear ebx register
            SUB ESP, 0x40; //Need to subtract stack, so we don’t overwrite some Ctypes return data
            MOV ESI, ESP;
            PUSH EAX;
            POP EDI; //Our Temp Buffer
            PUSH[EBP + 8]; //Source Buffer
            POP EAX;
            PUSH[EBP + 0xC]; //Destination Buffer
            POP EDX;
            LEA ECX, DWORD PTR DS : [EAX + 1]; //Where we start. Get the 1st DWORD of the compressed data appears to be magic value
            MOV DWORD PTR DS : [ESI], EDI;//Temp buffer address
            MOV DWORD PTR DS : [ESI + 0x1C], EDX;//Destination address
            MOV DWORD PTR DS : [ESI + 0x18], ECX;//Compressed Data
            MOV BYTE PTR DS : [ESI + 0x20], BL;//0
            MOV CL, BYTE PTR DS : [EAX];//08
            PUSH 1;
            POP EAX;
            MOV BYTE PTR DS : [ESI + 0x22], CL;
            SHL EAX, CL;
            MOV DWORD PTR DS : [ESI + 0x30], EBX;
            MOV WORD PTR DS : [ESI + 8], AX;
            INC EAX;
            MOV WORD PTR DS : [ESI + 0xA], AX;
            MOV EAX, DWORD PTR SS : [EBP + 0x10];
            MOV DWORD PTR DS : [ESI + 0x2C], EAX;
            LEA EAX, DWORD PTR DS : [EAX * 8 + 0x1F];
            SHR EAX, 5;
            SHL EAX, 2;
            CMP BYTE PTR SS : [EBP + 0x18], BL;
            MOV DWORD PTR DS : [ESI + 0x38], EAX;
            SETE AL;
            DEC EAX;
            AND AL, 1;
            ADD EAX, 0x0FF;
            CMP AL, BL;
            MOV BYTE PTR DS : [ESI + 0xC], AL;
            JNZ SHORT check3;
            MOV EAX, DWORD PTR SS : [EBP + 0x14];
            MOV DWORD PTR DS : [ESI + 0x14], EDX;
            MOV DWORD PTR DS : [ESI + 0x28], EAX;
            MOV DWORD PTR DS : [ESI + 0x34], EBX;
        check3:
            MOV ECX, ESI;
            CALL check4;
        check26:
            MOV ECX, ESI;
            CALL check10;
            MOV EDI, EAX;
            CMP DI, WORD PTR DS : [ESI + 0xA];
            JE Finished;
            CMP DI, WORD PTR DS : [ESI + 8];
            JNZ SHORT check22;
            MOV ECX, ESI;
            CALL check4;
        check24:
            MOV ECX, ESI;
            CALL check10;
            MOV EDI, EAX
            CMP DI, WORD PTR DS : [ESI + 8]
            JNZ SHORT check23;
            JMP SHORT check24;
        check22:
            CMP DI, WORD PTR DS : [ESI + 0X24]
            JNB SHORT check25;
            PUSH EDI
            JMP SHORT check27;
        check25:
            PUSH EBX;
        check27:
            MOV ECX, ESI;
            CALL check28;
            MOVZX AX, AL;
            PUSH EAX;
            PUSH EBX;
            MOV ECX, ESI;
            CALL check31;
            PUSH EDI;
            MOV ECX, ESI;
            CALL check35;
            MOV EBX, EDI;
            JMP SHORT check26;
        check10:
            MOVZX EAX, BYTE PTR DS : [ECX + 0x20];
            PUSH EBX;
            PUSH ESI;
            PUSH EDI;
            MOVZX EDI, BYTE PTR DS : [ECX + 0x23];
            ADD EAX, EDI;
            CMP EAX, 8;
            JA SHORT Check6;
            MOV EDX, DWORD PTR DS : [ECX + 0x18];
            MOVZX ESI, BYTE PTR DS : [EDX];
            JMP SHORT Check8;
        Check6:
            MOV EDX, DWORD PTR DS : [ECX + 0x18];
            CMP EAX, 0x10;
            JA SHORT Check7;
            MOVZX ESI, WORD PTR DS : [EDX];
            JMP SHORT Check8;
        Check7:
            MOVZX ESI, BYTE PTR DS : [EDX + 2];
            MOVZX EBX, WORD PTR DS : [EDX];
            SHL ESI, 0X10;
            OR ESI, EBX;
        Check8:
            MOV EBX, EAX;
            PUSH 0x20;
            SHR EBX, 3;
            ADD EBX, EDX;
            MOV DL, AL;
            AND DL, 7;
            MOV DWORD PTR DS : [ECX + 0X18], EBX;
            MOV BYTE PTR DS : [ECX + 0X20], DL;
            POP ECX;
            SUB ECX, EAX;
            MOV EAX, ESI;
            PUSH 0x20;
            SHL EAX, CL;
            POP ECX;
            SUB ECX, EDI;
            POP EDI;
            POP ESI;
            POP EBX;
            SHR EAX, CL;
            RETN;
        check28:
            MOV EAX, DWORD PTR DS : [ECX];
            MOV EDX, DWORD PTR SS : [ESP + 4];
        check30:
            MOVZX ECX, DX;
            MOV CX, WORD PTR DS : [EAX + ECX * 4];
            CMP CX, 0x0FFFF;
            JE SHORT check29;
            MOV EDX, ECX;
            JMP SHORT check30;
        check29:
            MOVZX ECX, DX;
            MOV AL, BYTE PTR DS : [EAX + ECX * 4 + 2];
            RETN 4;
        check31:
            MOVZX EDX, WORD PTR DS : [ECX + 0x24];
            LEA EAX, DWORD PTR DS : [ECX + 0x24];
            PUSH ESI;
            MOV ESI, DWORD PTR DS : [ECX];
            PUSH EDI;
            MOV DI, WORD PTR SS : [ESP + 0xC];
            MOV WORD PTR DS : [ESI + EDX * 4], DI;
            MOV ESI, DWORD PTR DS : [ECX];
            MOVZX EDX, WORD PTR DS : [EAX];
            MOV DI, WORD PTR SS : [ESP + 0x10];
            MOV WORD PTR DS : [ESI + EDX * 4 + 2], DI;
            INC WORD PTR DS : [EAX];
            MOV AX, WORD PTR DS : [EAX];
            POP EDI;
            CMP AX, 8;
            POP ESI;
            JE SHORT check32;
            CMP AX, 0x10;
            JE SHORT check32;
            CMP AX, 0x20;
            JE SHORT check32;
            CMP AX, 0x40;
            JE SHORT check32;
            CMP AX, 0x80;
            JE SHORT check32;
            CMP AX, 0x100;
            JE SHORT check32;
            CMP AX, 0x200;
            JE SHORT check32;
            CMP AX, 0x400;
            JE SHORT check32;
            CMP AX, 0x800;
            JNZ SHORT check33;
        check32:
            INC BYTE PTR DS : [ECX + 0x23];
        check33:
            RETN 8;
        check4:
            MOV EDX, ECX;
            PUSH EDI;
            MOV ECX, 0x1000;
            OR EAX, 0xFFFFFFFF;
            MOV EDI, DWORD PTR DS : [EDX]
            REP STOS DWORD PTR ES : [EDI];
            XOR EAX, EAX;
            POP EDI;
            CMP WORD PTR DS : [EDX + 8], AX;
            JBE SHORT check1;
            PUSH ESI;
            MOV ESI, DWORD PTR DS : [EDX];
        check2:
            MOVZX ECX, AX;
            MOV WORD PTR DS : [ESI + ECX * 4 + 2], AX;
            INC EAX;
            CMP AX, WORD PTR DS : [EDX + 8];
            JB SHORT check2;
            POP ESI;
        check1:
            MOV AX, WORD PTR DS : [EDX + 0xA];
            INC AX;
            MOV WORD PTR DS : [EDX + 0x24], AX;
            MOV AL, BYTE PTR DS : [EDX + 0x22];
            INC AL;
            MOV BYTE PTR DS : [EDX + 0x23], AL;
            RETN;
        check23:
            PUSH EDI;
            MOV ECX, ESI;
            CALL check35;
            MOV EBX, EDI;
            JMP SHORT check26;
        check35:
            PUSH EBP;
            MOV EBP, ESP;
            PUSH ESI;
            PUSH EDI;
            MOV ESI, ECX;
            NOP;
            MOV AX, WORD PTR SS : [EBP + 8];
            CMP AX, WORD PTR DS : [ESI + 8];
            JNB SHORT check36;
            NOP;
            MOV ECX, DWORD PTR DS : [ESI];
            MOV EDX, DWORD PTR DS : [ESI + 0x1C];
            MOV EDI, DWORD PTR DS : [ESI + 0x30];
            MOVZX EAX, AX;
            MOV AL, BYTE PTR DS : [ECX + EAX * 4 + 2];
            MOV BYTE PTR DS : [EDX + EDI], AL;
            INC DWORD PTR DS : [ESI + 0x30];
            NOP;
            MOV EAX, DWORD PTR DS : [ESI + 0x30];
            CMP EAX, DWORD PTR DS : [ESI + 0x2C];
            JNZ SHORT FuncRetn;
            MOV ECX, ESI;
            CALL check37;
            NOP;
            JMP SHORT FuncRetn;
        check36:
            MOVZX EDI, AX;
            MOV EAX, DWORD PTR DS : [ESI];
            MOV ECX, ESI;
            SHL EDI, 2;
            MOV AX, WORD PTR DS : [EDI + EAX];
            PUSH EAX;
            CALL check35;
            NOP;
            MOV EAX, DWORD PTR DS : [ESI];
            MOV ECX, ESI;
            MOV AX, WORD PTR DS : [EDI + EAX + 2];
            PUSH EAX;
            CALL check35;
            NOP;
            NOP;
            POP EDI;
            POP ESI;
            POP EBP;
            RETN 4;
        check38:
            MOVZX EDX, AL;
            MOVZX EDX, BYTE PTR DS : [EDX + ECX + 0xD];
            ADD DWORD PTR DS : [ECX + 0x34], EDX;
            MOV EDX, DWORD PTR DS : [ECX + 0x34];
            CMP EDX, DWORD PTR DS : [ECX + 0x28];
            JB SHORT FuncRetrn2;
            INC AL;
            CMP AL, 4;
            MOV BYTE PTR DS : [ECX + 0xC], AL;
            JNB SHORT Frtn;
            MOVZX EAX, AL;
            MOVZX EAX, BYTE PTR DS : [EAX + ECX + 0xD];
            SHR EAX, 1;
            MOV DWORD PTR DS : [ECX + 0x34], EAX;
        FuncRetrn2:
            MOV EAX, DWORD PTR DS : [ECX + 0x38];
            MOV EDX, DWORD PTR DS : [ECX + 0x14];
            IMUL EAX, DWORD PTR DS : [ECX + 0x34];
            SUB EDX, EAX;
            MOV DWORD PTR DS : [ECX + 0x1C], EDX;
        Frtn:
            RETN;
        FuncRetn:
            NOP;
            POP EDI;
            POP ESI;
            POP EBP;
            RETN 4;
        check37:
            MOV AL, BYTE PTR DS : [ECX + 0xC];
            AND DWORD PTR DS : [ECX + 0x30], 0;
            CMP AL, 0x0FF;
            JNZ SHORT check38;
            MOV EAX, DWORD PTR DS : [ECX + 0x38];
            SUB DWORD PTR DS : [ECX + 0x1C], EAX;
            RETN;
        Finished:
            MOV ESP,EBP;
            POP EBP;
            //Debug VS Release build have different stack sizes. The following is needed for the return parameters and CTYPES
#ifdef _DEBUG
            ADD ESI, 0x120;
#else
            ADD ESI, 0x58; //Need for Python CTypes return parameters!
#endif
            RETN;
        }
    }
    return TRUE;
}
Figure 5 Our decompression routine
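To make the listing easier to follow, it helps to know what the `check10` helper appears to do: it reads the next variable-width code from the compressed stream, least significant bit first, and advances a byte pointer and a bit offset. The following Python 3 sketch is my reading of that helper, not code taken from the malware; the function name and return convention are assumptions.

```python
def read_code(stream: bytes, byte_pos: int, bit_off: int, width: int):
    """Read one `width`-bit code, least significant bit first.

    Returns (code, new_byte_pos, new_bit_off). check10 fetches 1, 2 or 3
    bytes depending on how many bits are needed; a single 3-byte
    little-endian read followed by a mask gives the same value.
    """
    raw = int.from_bytes(stream[byte_pos:byte_pos + 3], "little")
    code = (raw >> bit_off) & ((1 << width) - 1)
    total = bit_off + width
    return code, byte_pos + total // 8, total % 8

# First bytes of Figure 3. Byte 0 (0x08) is the minimum code size; the
# routine starts reading codes at byte 1 with a width of 8 + 1 = 9 bits.
header = bytes.fromhex("0800A504011203068C")
pos, bit = 1, 0
codes = []
for _ in range(4):
    code, pos, bit = read_code(header, pos, bit, 9)
    codes.append(code)
# codes is now [256, 0x52, 0x41, 0x40]
```

The first code, 256, equals the value the routine computes with `SHL EAX, CL` (1 << 8) and compares against before calling `check4`, so it behaves like a dictionary reset code. The next three codes are the ASCII values of `R`, `A` and `@`, which is consistent with the `RA@` prefix of the decompressed strings.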
Taking assembly from IDA Pro or a debugger such as Immunity Debugger isn’t a one-to-one copy; it requires some work on your part. Unfortunately, you can’t take the assembly and expect things to just magically work. One area that requires special attention is the function calls made within your code block. Each call target needs a name (label), and all the code needs to be arranged in the proper calling order; otherwise you will get unexpected results or a crash. It’s also important to copy the assembly for every function that is called. In this sample, I used the word “check” for function names and jump locations, as I was quickly working my way through this.
Since LZW encodes data as indexes into a dictionary, the first thing the decompression routine does is allocate a 16,512-byte (0x4080) buffer to hold that dictionary. The assembly calls the C runtime function malloc to allocate the buffer and then zeroes it with rep stosd, because malloc does not initialize the memory it returns. A simpler approach is to use calloc, which allocates and zero-fills the buffer in a single call and lets us drop those instructions from the inline assembly.
We start by coding this in C++ and then switch to Visual Studio inline assembly using the __asm keyword. The code block within the __asm keyword is where you will place your assembly instructions and make the necessary adjustments, not only so that the code compiles, but also to ensure that the stack is aligned properly. In studying the decompression routine, I found that the following setup is required before the routine can execute:
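The way the dictionary is initialized can be read from the `check4` helper in Figure 5. The Python 3 sketch below models that initialization under my interpretation of the listing: each entry is a pair of 16-bit values (a prefix code and a suffix byte), every entry starts as 0xFFFF/0xFFFF, and the literal entries below the reset code get their own value as suffix. The function name and the list-of-pairs representation are assumptions made for illustration.

```python
def init_dictionary(min_code_size: int):
    """Rebuild the state that check4 appears to set up before decoding."""
    clear_code = 1 << min_code_size       # SHL EAX, CL
    end_code = clear_code + 1             # INC EAX
    # 0x1000 entries of (prefix, suffix), all set to 0xFFFF by the
    # REP STOS with EAX = 0xFFFFFFFF.
    table = [[0xFFFF, 0xFFFF] for _ in range(0x1000)]
    for value in range(clear_code):
        table[value][1] = value           # literal entry: suffix is the byte
    next_code = end_code + 1              # first free dictionary slot
    code_width = min_code_size + 1        # bits per code at the start
    return table, clear_code, end_code, next_code, code_width

# 8 is the first byte of the compressed data shown in Figure 3.
table, clear_code, end_code, next_code, code_width = init_dictionary(8)
```

With a minimum code size of 8, decoding would start with 9-bit codes and the first free dictionary slot at 258.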
- Set EBX to zero.
- Subtract 64 bytes (0x40) from the stack pointer, which is necessary to prevent us from overwriting any stack data
- Save our stack pointer into ESI
- EDI needs to point to our dictionary buffer created via calloc
- EAX needs to point to our source data
- EDX needs to point to our destination buffer
The following nine lines were manually added in order to satisfy the requirements of the decompression algorithm. The remaining code was copied directly from Immunity Debugger.
xor ebx, ebx; //Need to clear ebx register
SUB ESP, 0x40; //Need to subtract stack, so we don’t overwrite some Ctypes return data
MOV ESI, ESP;
PUSH EAX;
POP EDI; //Our Temp Buffer
PUSH[EBP + 8]; //Source Buffer
POP EAX;
PUSH[EBP + 0xC]; //Destination Buffer
POP EDX;
Figure 6 Setting up decompression routine requirements
At this point, all that remains is to give the assembly calls and jumps meaningful names and arrange them in the correct order. The code should now compile and run, but when our routine finishes we must restore the stack so that execution returns to the proper caller, in this case Python ctypes. The following code was added:
Finished:
MOV ESP,EBP;
POP EBP;
//Debug VS Release build have different stack sizes. The following is needed for the return parameters and CTYPES
#ifdef _DEBUG
ADD ESI, 0x120;
#else
ADD ESI, 0x58; //Need for CTypes return parameters!!!!
#endif
RETN;
}
Figure 7 Adjusting stack for return
Here we restore the stack pointer and base pointer and add 0x120 or 0x58 to ESI, depending on whether the DLL is a Visual Studio debug build or release build.
Now that we have a DLL, we can begin to call into it and pass it data via Python and ctypes. The following Python script uses our DLL to decompress Reaver data.
#-------------------------------------------------------------------------------
# Name:        LzwDecompression
# Purpose:
#
# Author:      Mike Harbison Unit 42
#
# Created:     11/11/2017
#-------------------------------------------------------------------------------
# Note: written for Python 2 (print statements, str.decode('hex'))
from ctypes import *
import sys
import os.path
import argparse
import re,struct
import subprocess, random

# MAP types to ctypes
LPBYTE = POINTER(c_ubyte)
LPCSTR = LPCTSTR = c_char_p
BOOL = c_bool

if os.name != 'nt':
    print ("Script can only be run from Windows")
    sys.exit("Sorry Windows only")

def assert_success(success):
    if not success:
        raise AssertionError(FormatError())

def LzwDecompress(hdll,data):
    inbuf = create_string_buffer(data)
    outbuf= create_string_buffer(len(data))
    success = hdll.Decompress(inbuf,outbuf)
    assert_success(success)
    return outbuf.raw

def CabExtract(match,pargs,data):
    offset = match.start()
    CabHeaderMagicValue = offset + 124
    CabSizeStart = offset + 132
    CabFileNameStart = offset + 184
    CabFileNameEnd = data[CabFileNameStart:].find('\0')
    CabName = data[CabFileNameStart:CabFileNameStart+CabFileNameEnd]
    CabSize = struct.unpack("L",data[CabSizeStart:CabSizeStart+4])[0]
    CabData = data[CabHeaderMagicValue:CabHeaderMagicValue+CabSize]
    FileName=pargs.input_file
    #Add magic value
    Cab="4D534346".decode('hex')+CabData[4:]
    print "Found our CAB Data at file offset-->{}".format(offset)
    CabDir=os.path.splitext(FileName)[0]
    if not os.path.exists(CabDir):
        os.makedirs(CabDir)
    else:
        CabDir+='_'+str(random.randint(1111,9999))
        os.makedirs(CabDir)
    CabFile=os.path.basename(FileName).split('.')[0]+".cab"
    with open(CabDir+"\\"+CabFile,"wb") as fp:
        fp.write(Cab)
    print "Wrote CAB File-->%s"%CabDir+"\\"+CabFile
    print "Expanding CAB File %s"%CabName
    args = [" -r ",CabDir + "\\" + CabFile,' ',CabDir]
    result=subprocess.Popen("expand "+"".join(args), stdout=subprocess.PIPE)
    result.wait()
    if "Expanding Files Complete" not in result.stdout.read():
        print "Error Expanding CAB file"
        sys.exit(1)
    ExpandedFile = CabDir + "\\" + CabName
    if not os.path.isfile(ExpandedFile):
        print "Did not find our expanded file %s"%CabName
        sys.exit(1)
    print "Check directory %s for expanded file %s"%(CabDir,CabName)
    return ExpandedFile

def DecompressRoutine(pargs,hlzw,data):
    LzwCompPattern = "\x08\x00\xA5\x04\x01\x12\x03"
    regex = re.compile(LzwCompPattern)
    for match in regex.finditer(data):
        offset=match.start()
        print "Found our compression header at file offset-->{}".format(offset)
        Deflated=LzwDecompress(hlzw,data[offset:])
        if Deflated:
            with open(pargs.out_file, "wb") as wp:
                wp.write(Deflated)
            print "Wrote decompressed stream to file-->%s"%(pargs.out_file)
            return True
    return False

def Start(pargs,hlzw,data):
    CabCompPattern = bytearray("46444944657374726F790000464449436F7079004644494973436162696E657400000000464449437265617465000000636162696E65742E646C6C004D6963726F736F6674")
    #Check For CAB file magic value first
    found = False
    regex = re.compile(CabCompPattern.decode('hex'))
    for match in regex.finditer(data):
        found = True
        ExpandedFile=CabExtract(match,pargs,data)
        if ExpandedFile:
            with open(ExpandedFile,"rb") as fp:
                ExpandedData=fp.read()
            DecompressRoutine(pargs,hlzw,ExpandedData)
            return True
    if not found:
        result=DecompressRoutine(pargs,hlzw,data)
        if result:
            return True
        else:
            return False

def main():
    parser=argparse.ArgumentParser()
    parser.add_argument("-i", '--infile' , dest='input_file',help="Input file to process",required=True)
    parser.add_argument("-o", '--outfile', dest='out_file',help="Optional Output file name",required=False)
    results = parser.parse_args()
    if not results.out_file:
        results.out_file=results.input_file + "_dec.txt"
    lzwdll="LzwDecompress.dll"
    lzwdllpath = os.path.dirname(os.path.abspath(__file__)) + os.path.sep + lzwdll
    if os.path.isfile(lzwdllpath):
        lzw = windll.LoadLibrary(lzwdllpath)
        lzw.Decompress.argtypes=(LPCSTR,LPCSTR)
        # ctypes reads "restype"; a misspelled attribute is silently ignored
        lzw.Decompress.restype=BOOL
    else:
        print ("Missing LzwDecompress.DLL")
        sys.exit(1)
    with open(results.input_file,"rb") as fp:
        FileData=fp.read()
    Success=Start(results,lzw,FileData)
    if not Success:
        print("Did not find CAB or Compression routine in file %s")%(results.input_file)

if __name__ == '__main__':
    main()
The Python script was recently updated to support multiple Reaver variants. The newer Reaver variants use Microsoft CAB compression as a first layer, followed by the modified LZW compression. The script does the following:
- Loads our DLL, LzwDecompress.dll
- Attempts to locate the magic signature values for the modified LZW header or Microsoft CAB
- For the LZW decompression routine, creates two string buffers: the source buffer holds the data that needs to be decompressed, and the destination buffer is where the decompressed data will be stored
- Calls the export named Decompress and passes it the two buffers
- Writes the decompressed data to a file
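The buffer handling in the steps above can be tried in isolation. The following Python 3 sketch assumes only that the library object exposes a `Decompress(src, dst)` callable. `FakeDll` is a hypothetical stand-in so the example runs without the real DLL, and the `out_size` parameter is my addition (the original script sizes the output buffer from the input length).

```python
import ctypes

def lzw_decompress(dll, data: bytes, out_size: int) -> bytes:
    """Call dll.Decompress(src, dst) with two ctypes buffers."""
    inbuf = ctypes.create_string_buffer(data, len(data))
    outbuf = ctypes.create_string_buffer(out_size)   # zero-filled
    if not dll.Decompress(inbuf, outbuf):
        raise RuntimeError("Decompress reported failure")
    return outbuf.raw

class FakeDll:
    """Hypothetical stand-in for the real DLL; runs on any platform."""
    @staticmethod
    def Decompress(src, dst):
        payload = b"RA@10001=ole32.dll"
        ctypes.memmove(dst, payload, len(payload))   # write into dst buffer
        return True

result = lzw_decompress(FakeDll(), b"\x08\x00\xA5\x04\x01\x12\x03", 64)
```

With the real library, `dll` would be the object returned by `windll.LoadLibrary` as in the script above, and the destination buffer must be large enough for the decompressed data because the routine writes into it without a length check.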
The following is an example of the script running:
Figure 8 Script decompressing data
The first example is of an older version of Reaver that uses the LZW decompression routine. The decompressed data is written to a text file that contains the following:
RA@10001=ole32.dll
RA@10002=CoCreateGuid
RA@10003=Shlwapi.dll
RA@10004=SHDeleteKeyA
RA@10005=wininet.dll
RA@10006=InternetOpenA
RA@10007=InternetOpenUrlA
RA@10008=InternetCloseHandle
RA@10009=HttpQueryInfoA
RA@10010=InternetReadFile
[TRUNCATED]
RA@10276=image/jpeg
RA@10277=netsvcs
RA@10282=Global\%sEvt
RA@10283=\temp\%sk.~tmp
RA@10284=Global\%skey
RA@10285=%08x%s
RA@10286=%s\
RA@10287=%s\*.*
RA@10288=%s\%s
RA@10289=CMD.EXE
RA@10290=%s=
RA@10311=\%sctr.dll
RA@10312=\uc.dat
RA@10313=ChangeServiceConfig2A
RA@10314=QueryServiceConfig2A
The next example is of a newer Reaver sample that added a layer of compression using Microsoft CAB.
Figure 9 Script expanding CAB file and decompressing data
Here the script found the magic values for Microsoft CAB, expanded the file, read in the expanded file, found the magic value in that file for the decompression routine and wrote the same decompressed data to a text file.
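The CAB carving step performed by the script can be sketched on its own. The offsets 124 and 132 are taken from the script's CabExtract function; the Python 3 function below, its name, and the synthetic sample are assumptions for illustration. The size field sits 8 bytes into the cabinet, which matches the cabinet-size field of a standard CAB header, and the script overwrites the first four bytes with the `MSCF` signature before expanding the file.

```python
import struct

CAB_HEADER_DELTA = 124   # marker offset -> start of the embedded cabinet
CAB_SIZE_DELTA = 132     # marker offset -> 32-bit cabinet size field

def extract_cab(data: bytes, marker_offset: int) -> bytes:
    """Carve the embedded cabinet and restore its 'MSCF' signature."""
    start = marker_offset + CAB_HEADER_DELTA
    (size,) = struct.unpack_from("<I", data, marker_offset + CAB_SIZE_DELTA)
    cab = data[start:start + size]
    return b"MSCF" + cab[4:]

# Synthetic sample: a 20-byte fake cabinet with a wiped signature,
# a reserved field, the size field (20), and 8 bytes of payload.
body = b"\x00" * 4 + b"\x00" * 4 + struct.pack("<I", 20) + b"PAYLOAD!"
sample = b"\xAA" * 10 + b"\xBB" * 124 + body
cab = extract_cab(sample, 10)
```

The original script reads the size with the native `"L"` format; the sketch uses an explicit little-endian 32-bit read (`"<I"`) so the result does not depend on the platform.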
Conclusion
This blog has shown that taking the existing Reaver decompression routine straight from assembly, placing it into Visual Studio, compiling it into a DLL, and then calling into it via Python saves a considerable amount of time. You no longer have to reimplement the routine in C or Python; you simply call the routine and pass it the same data the malware would. The tradeoff is that you need to understand assembly, the stack, and which registers the routine requires. Once armed with that knowledge, the technique is easy to implement and can be applied to any function within a binary.