Defeating BazarLoader Anti-Analysis Techniques

By

Category: Malware

Tags: ,

A conceptual image representing malware, such as BazarLoader, often known for anti-analysis techniques

This post is also available in: 日本語 (Japanese)

Executive Summary

Malware authors embed multiple anti-analysis techniques in their code to retard the analysis processes of human analysts and sandboxes. However, there are ways defenders can defeat these techniques in turn. This blog post describes two methods for faster analysis of malware that employs two distinctive anti-analysis techniques. The first technique is API function hashing, a known trick to obfuscate which functions are called. The second is opaque predicate, a technique used for control flow obfuscation.

The scripts that we are going to show here can be applied to BazarLoader, as well as other malware families that utilize similar anti-analysis techniques. As an illustration, we will show the IDAPython scripts we created during a recent analysis of BazarLoader with the reverse engineering tool IDA Pro to defeat these anti-analysis techniques. BazarLoader is a Windows backdoor that is used by various ransomware groups.

Palo Alto Networks customers are protected from malware families using similar anti-analysis techniques with Cortex XDR or the Next-Generation Firewall with the WildFire and Threat Prevention security subscriptions.

Primary Malware Discussed BazarLoader
Related Unit 42 Topics Malware, anti-analysis techniques

Table of Contents

Reusing Malware Code to Defeat Obfuscated API Calls
Automating Opaque Predicate Removal
Malware Analysts vs Malware Authors
Indicators of Compromise
Additional Resources

Reusing Malware Code to Defeat Obfuscated API Calls

Malware compiled as native files has to call Windows API functions to carry out malicious behaviors. The information on which functions are used is usually stored in the Import Address Table (IAT) in the file. Therefore, this table is often a good place to start the analysis process to get an idea of what the malware is trying to do.

To demonstrate, we focused on a BazarLoader sample we recently detected. After peeling away the packer layer of our BazarLoader sample, we saw that it doesn’t have an IAT (see Figure 1). Also, there is no IAT constructed during execution, a technique sometimes seen in other malware. BazarLoader obfuscates its function calls to make analysis more difficult and to evade detection techniques that rely on reading the IAT.

BazarLoader obfuscates its function calls to make analysis more difficult and to evade detection techniques that rely on reading the IAT.
Figure 1. Missing IAT in BazarLoader as seen with CFF Explorer.

In fact, BazarLoader resolves every API function to be called individually at run time. After we figured out that the functions are resolved during execution, the following function caught our attention as it was referenced more than 300 times:

BazarLoader resolves every API function to be called individually at run time, as shown here.
Figure 2. Function for resolving the obfuscated Windows API functions (marked in yellow).

While most pieces of malware rely on publicly known hashing algorithms to resolve the functions’ addresses, the one used by BazarLoader is unique. The API function resolution procedure (sub_18000B9B0, labelled as FN_API_Decoder) requires three parameters and returns the address of the requested function.

Now, we could reverse engineer the algorithm used in FN_API_Decoder and reimplement it in Python to get all functions resolved. However, this would take a lot of time and we would have to repeat the whole process for every piece of malware that uses a different hashing algorithm.

Instead, the approach we used is independent from the hashing algorithm as it makes use of the hashing function itself. For this, we used the Appcall feature with IDAPython in IDA Pro to call FN_API_Decoder and pass it the required parameters. The result from Appcall would be the resolved address of the Windows API function. The Appcall feature used while debugging the malware allows us to execute any function from the sample as if it were a built-in function.

Using the following code, we can run FN_API_Decoder to resolve Windows API function addresses while debugging the malware process.

Using the code shown here, we can run FN_API_Decoder to resolve Windows API function addresses while debugging the malware process.
Figure 3. Using Appcall with IDAPython.

Next, we gathered all the required parameters by looking up all the cross references to FN_API_Decoder. The following code will search and extract the required parameters for resolving the API function calls.

The code shown here will search and extract the required parameters for resolving the API function calls.
Figure 4. IDAPython code to search and extract the three parameters.

Finally, by using the returned value from Appcall we are able to rename all the dynamic calls to the APIs to their corresponding names and apply comments:

By using the returned value from Appcall we are able to rename all the dynamic calls to the APIs to their corresponding names and apply comments.
Figure 5. IDAPython code to locate dynamic calls.

Putting the above steps together, we deobfuscated the API function calls:

We address anti-analysis techniques by deobfuscating the API function calls, as shown here.
Figure 6. Before executing the above IDAPython scripts.
A second step in deobfuscating API function calls (renaming with added comment)
Figure 7. Renamed API function call with added comment.

After all the API function calls are renamed, we can now easily locate other interesting functions in the malware. For example, sub_1800155E0 is the procedure in BazarLoader that carries out code injection.

After all the API function calls are renamed, we can now easily locate other interesting functions in the malware. For example, sub_1800155E0 is the procedure in BazarLoader that carries out code injection.
Figure 8. Before renaming API calls.
Anti-analysis techniques can include obfuscated API calls, as shown here - these are labeled with APIs related to code injection.
Figure 9. Obfuscated API calls labeled with APIs related to code injection.

With the help of our IDAPython scripts, we are now able to faster assess which functionality this BazarLoader sample contains.

Automating Opaque Predicate Removal

Opaque Predicate (OP) is used in BazarLoader to protect it from reverse engineering tools. OP is an expression that evaluates to either true or false at runtime. Malware authors make use of multiple OPs together with unexecuted code blocks to add complexities that static analysis tools have to deal with.

The following disassembled code shows one of the OPs in Bazarloader:

The disassembled code shown here showns one of the opaque predicates used in BazarLoader to protect it from reverse engineering tools - one of the common anti-analysis techniques covered here.
Figure 10. One example of OP in BazarLoader.

From the above control flow graph (CFG), the code flow won’t end up in infinite loops (Figure 10, red code blocks). Therefore, the above OP will be evaluated to avoid the infinite loop.

We can demonstrate the extent of the challenge OPs pose to malware analysts. The following CFG shows the unexecuted code blocks (Figure 11, red code blocks) in one of the smaller functions (sub_18000F640) in the sample.

Red blocks in the control flow graph shown here represent unexpected code blocks in one of the smaller functions in the BazarLoader sample.
Figure 11. sub_18000F640 function in BazarLoader with unexecuted code blocks colored in red.

We could manually patch away the code blocks that are not executed as we analyze each function in the sample, but this is not very practical and takes a lot of time. Instead, we will choose a smarter way by doing it automatically.

First, we have to locate all the OPs. The most common way to do this is to make use of the binary search mechanism in IDA Pro to find all the byte sequences of the OPs. This turns out not to be possible, as the OPs were likely generated by a compiler during the build process of the malware sample. There are just too many variants of the OPs that could be covered using the byte sequence.

Not only do we need to locate the OPs, we also have to know the exact point when the malware sample decides to avoid the unexecuted code blocks.

Using the following code, we locate the OPs in a function:

We need to located the OPs, as well as the exact point when the malware sample decides to avoid the unexecuted code blocks. The code shown here allows us to locate the OPs in a function.
Figure 12. IDAPython code to locate the OPs in a function.

Next, we have to patch the instructions in OPs to force the code flow away from the unexecuted code blocks.

Using the following code, we patch the OPs in a function:

The code shown here patches the instructions in OPs to force the code flow away from the unexecuted code blocks.
Figure 13. IDAPython code to patch the OPs.

The OPs also messed with the output of the HexRays decompiler. This is how the function (sub_18000F640) looks before the OPs are patched:

This shows how the function (sub_18000F640) looks before the OPs are patched.
Figure 14. Decompiled sub_18000F640 function.

After applying the two techniques above, we have decompiled pseudocode that is much easier to read and understand.

After patching all the OPs and renaming the obfuscated API calls, we could then tell that the function (sub_18000F640) is just a wrapper function for GetModuleFileNameW().

After applying the two techniques described in the text to defeat anti-analysis techniques, we have decompiled pseudocode that is much easier to read and understand, as shown.
Figure 15. Decompiled sub_18000F640 function after removing the OPs.

Malware Analysts vs Malware Authors

Malware authors often include anti-analysis techniques with the hope that they will increase the time and resources taken for malware analysts. With the above script snippets showing how to defeat these techniques for BazarLoader, you can reduce the time needed to analyze malware samples of other families that use similar techniques.

Palo Alto Networks customers are further protected from malware families using similar anti-analysis techniques with Cortex XDR or the Next-Generation Firewall with the WildFire and Threat Prevention cloud-delivered security subscriptions.

Indicators of Compromise

BazarLoader Sample ce5ee2fd8aa4acda24baf6221b5de66220172da0eb312705936adc5b164cc052

Additional Resources

Complete IDAPython script to rename or resolve obfuscation API calls is available on GitHub.

Complete IDAPython script to search and patch Opaque Predicates in a function is available on GitHub.