This post is also available in: 日本語 (Japanese)
Executive Summary
Malware authors embed multiple anti-analysis techniques in their code to retard the analysis processes of human analysts and sandboxes. However, there are ways defenders can defeat these techniques in turn. This blog post describes two methods for faster analysis of malware that employs two distinctive anti-analysis techniques. The first technique is API function hashing, a known trick to obfuscate which functions are called. The second is opaque predicate, a technique used for control flow obfuscation.
The scripts that we are going to show here can be applied to BazarLoader, as well as other malware families that utilize similar anti-analysis techniques. As an illustration, we will show the IDAPython scripts we created during a recent analysis of BazarLoader with the reverse engineering tool IDA Pro to defeat these anti-analysis techniques. BazarLoader is a Windows backdoor that is used by various ransomware groups.
Palo Alto Networks customers are protected from malware families using similar anti-analysis techniques with Cortex XDR or the Next-Generation Firewall with the WildFire and Threat Prevention security subscriptions.
Primary Malware Discussed | BazarLoader |
Related Unit 42 Topics | Malware, anti-analysis techniques |
Reusing Malware Code to Defeat Obfuscated API Calls
Malware compiled as native files has to call Windows API functions to carry out malicious behaviors. The information on which functions are used is usually stored in the Import Address Table (IAT) in the file. Therefore, this table is often a good place to start the analysis process to get an idea of what the malware is trying to do.
To demonstrate, we focused on a BazarLoader sample we recently detected. After peeling away the packer layer of our BazarLoader sample, we saw that it doesn’t have an IAT (see Figure 1). Also, there is no IAT constructed during execution, a technique sometimes seen in other malware. BazarLoader obfuscates its function calls to make analysis more difficult and to evade detection techniques that rely on reading the IAT.
In fact, BazarLoader resolves every API function to be called individually at run time. After we figured out that the functions are resolved during execution, the following function caught our attention as it was referenced more than 300 times:
While most pieces of malware rely on publicly known hashing algorithms to resolve the functions’ addresses, the one used by BazarLoader is unique. The API function resolution procedure (sub_18000B9B0, labelled as FN_API_Decoder) requires three parameters and returns the address of the requested function.
Now, we could reverse engineer the algorithm used in FN_API_Decoder and reimplement it in Python to get all functions resolved. However, this would take a lot of time and we would have to repeat the whole process for every piece of malware that uses a different hashing algorithm.
Instead, the approach we used is independent from the hashing algorithm as it makes use of the hashing function itself. For this, we used the Appcall feature with IDAPython in IDA Pro to call FN_API_Decoder and pass it the required parameters. The result from Appcall would be the resolved address of the Windows API function. The Appcall feature used while debugging the malware allows us to execute any function from the sample as if it were a built-in function.
Using the following code, we can run FN_API_Decoder to resolve Windows API function addresses while debugging the malware process.
Next, we gathered all the required parameters by looking up all the cross references to FN_API_Decoder. The following code will search and extract the required parameters for resolving the API function calls.
Finally, by using the returned value from Appcall we are able to rename all the dynamic calls to the APIs to their corresponding names and apply comments:
Putting the above steps together, we deobfuscated the API function calls:
After all the API function calls are renamed, we can now easily locate other interesting functions in the malware. For example, sub_1800155E0 is the procedure in BazarLoader that carries out code injection.
With the help of our IDAPython scripts, we are now able to faster assess which functionality this BazarLoader sample contains.
Automating Opaque Predicate Removal
Opaque Predicate (OP) is used in BazarLoader to protect it from reverse engineering tools. OP is an expression that evaluates to either true or false at runtime. Malware authors make use of multiple OPs together with unexecuted code blocks to add complexities that static analysis tools have to deal with.
The following disassembled code shows one of the OPs in Bazarloader:
From the above control flow graph (CFG), the code flow won’t end up in infinite loops (Figure 10, red code blocks). Therefore, the above OP will be evaluated to avoid the infinite loop.
We can demonstrate the extent of the challenge OPs pose to malware analysts. The following CFG shows the unexecuted code blocks (Figure 11, red code blocks) in one of the smaller functions (sub_18000F640) in the sample.
We could manually patch away the code blocks that are not executed as we analyze each function in the sample, but this is not very practical and takes a lot of time. Instead, we will choose a smarter way by doing it automatically.
First, we have to locate all the OPs. The most common way to do this is to make use of the binary search mechanism in IDA Pro to find all the byte sequences of the OPs. This turns out not to be possible, as the OPs were likely generated by a compiler during the build process of the malware sample. There are just too many variants of the OPs that could be covered using the byte sequence.
Not only do we need to locate the OPs, we also have to know the exact point when the malware sample decides to avoid the unexecuted code blocks.
Using the following code, we locate the OPs in a function:
Next, we have to patch the instructions in OPs to force the code flow away from the unexecuted code blocks.
Using the following code, we patch the OPs in a function:
The OPs also messed with the output of the HexRays decompiler. This is how the function (sub_18000F640) looks before the OPs are patched:
After applying the two techniques above, we have decompiled pseudocode that is much easier to read and understand.
After patching all the OPs and renaming the obfuscated API calls, we could then tell that the function (sub_18000F640) is just a wrapper function for GetModuleFileNameW().
Malware Analysts vs Malware Authors
Malware authors often include anti-analysis techniques with the hope that they will increase the time and resources taken for malware analysts. With the above script snippets showing how to defeat these techniques for BazarLoader, you can reduce the time needed to analyze malware samples of other families that use similar techniques.
Palo Alto Networks customers are further protected from malware families using similar anti-analysis techniques with Cortex XDR or the Next-Generation Firewall with the WildFire and Threat Prevention cloud-delivered security subscriptions.
Indicators of Compromise
BazarLoader Sample ce5ee2fd8aa4acda24baf6221b5de66220172da0eb312705936adc5b164cc052
Additional Resources
Complete IDAPython script to rename or resolve obfuscation API calls is available on GitHub.
Complete IDAPython script to search and patch Opaque Predicates in a function is available on GitHub.