
A Mega Malware Analysis Tutorial Featuring Donut-Generated Shellcode


Executive Summary

We created an in-depth malware analysis tutorial featuring shellcode generated by a tool named Donut. The tutorial walks through a single infection chain from end to end, starting with a sample, and assuming no prior knowledge of the malware in question.

By the end of the tutorial, readers will better understand many components of the infection chain and identify the family of the final payload. The tutorial is designed to be a beginner-friendly lesson for those who understand the basics of malware analysis but have yet to analyze many samples in the wild on their own.

With the help of this tutorial, we hope that readers will:

Become familiar with common malware analysis tools like dnSpy, IDA Pro, x64dbg and ProcessHacker
Learn how to leverage both static and dynamic analysis to form a complete picture of malware behavior
Recognize common techniques used by malware in its natural context, such as:
- Dynamic API resolution
- Process injection
- Bypassing AMSI by using memory patching
Gain insight into how malware analysts at Palo Alto Networks might approach an unknown sample in their daily operations

The infection chain in this tutorial is composed of different stages, each playing a different role. These stages include downloading the initial malware, hiding traces of malicious activity and dropping the final payload.

Along the way, we record every step in our analysis, and we explain our thought process behind each decision. We explain not only what the malware sample is doing, but also the reasons why a malware sample might do the observed activity.

Due to the large size of the tutorial, we have included a small excerpt in this article as a preview. To read the tutorial in its entirety, please view it on our GitHub page.

Palo Alto Networks customers are better protected from the malware reviewed in this tutorial through the following products and services:

Cortex XDR and XSIAM
Our Next-Generation Firewall with Cloud-Delivered Security Services

If you think you might have been compromised or have an urgent matter, contact the Unit 42 Incident Response team.

Related Unit 42 Topics: Shellcode, Static Analysis

Excerpt of Donut Malware Analysis Tutorial

This excerpt features the analysis of an unknown function in the Donut-generated shellcode used during the attack chain. The analysis helps explain some basic techniques using IDA Pro as a disassembler and decompiler and x64dbg as a debugger.

The screenshot below shows the decompiled shellcode in IDA Pro. The unknown function is sub_10A31A, highlighted in a red box in Figure 1. This unknown function does not take any arguments.

Screenshot of computer code in an IDE, highlighting function definitions and calls, with specific lines marked in red to indicate errors or warnings. — Figure 1. Decompiled shellcode viewed in IDA Pro.

Using x64dbg as a debugger for this shellcode, we can view the content of the EAX register after stepping over the sub_10A31A function. EAX holds the function's return value, which is merely the address of the function itself: 06CDA31A, as Figure 2 shows.

Text displaying two alphanumeric codes, "EAX" in red and "06CDA31A" in red. — Figure 2. The return value of sub_10A31A.

Figure 3 below shows the decompiled code of the sub_10A31A function.

Screenshot of a simple C programming code involving a function. The function is defined to return an integer and involves pointer operations. — Figure 3. The decompiled code of sub_10A31A.

This function is extremely simple because it just returns the address of the function, so it matches what we just observed in x64dbg. But what is the purpose of returning the address of the function? Let’s return to the debugger to find some clues.

Stepping through the shellcode in x64dbg, the Extended Instruction Pointer (EIP) is on the first instruction, call 6CDA31A as shown below in Figure 4. The operand of the call instruction, 6CDA31A, is the address of the sub_10A31A function.

Screenshot of a computer debug screen highlighting code operations with assembly language, including call and mov instructions, and memory addresses in hexadecimal notation. The first line is highlighted. — Figure 4. The call to sub_10A31A as shown in x64dbg.

This call transfers execution to the instructions starting at 0x06CDA31A. Figure 5 below shows these instructions.

Screenshot of a segment of computer code, highlighting various operations and memory addresses in different colors. — Figure 5. Instructions at 0x06CDA31A shown in x64dbg.

We can find the same instructions for this function by viewing the shellcode in IDA. However, where the debugger shows the call with an absolute target address, IDA shows it as call $+5 in the disassembled code, as highlighted in the red box in Figure 6 below.

Screenshot of computer code in an IDE, highlighting a subroutine call at an address with 'call' command in red text. — Figure 6. The assembly instructions of sub_10A31A in IDA.

Let’s break down the call $+5 instruction shown in IDA:

$+5 just means “the current address (EIP) plus 5.” With a value of E8 00 00 00 00, the full call instruction is 5 bytes, so $+5 effectively refers to the instruction immediately after the call instruction (i.e., the address of the pop eax instruction).
call pushes the return address (i.e., the address right after the call instruction) onto the stack and jumps to the operand of the call instruction.

Putting these two facts together, call $+5 means “push the address immediately after the call instruction onto the stack and then jump to that address.”
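The arithmetic behind call $+5 can be checked with a short Python sketch. It decodes the 5-byte E8 encoding quoted above and computes the call target the way the processor does for a near relative call: the signed 32-bit displacement is added to the address of the next instruction. The helper name call_rel32_target is ours and is only for illustration; the address 0x06CDA31A is the one observed in the debugger session.

```python
import struct

def call_rel32_target(call_address: int, encoded: bytes) -> int:
    """Return the destination of a 5-byte near call (opcode E8 followed by rel32)."""
    if len(encoded) != 5 or encoded[0] != 0xE8:
        raise ValueError("expected a 5-byte E8 rel32 call")
    # rel32 is a signed, little-endian displacement.
    (rel32,) = struct.unpack("<i", encoded[1:])
    # The displacement is relative to the address of the NEXT instruction.
    next_instruction = call_address + len(encoded)
    return (next_instruction + rel32) & 0xFFFFFFFF

call_address = 0x06CDA31A  # address of the call instruction in the debugger
target = call_rel32_target(call_address, bytes.fromhex("E800000000"))
print(hex(target))  # 0x6cda31f
```

Because the displacement is zero, the target equals the return address, which is why the debugger displays this instruction with the address of pop eax as its operand.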

This might seem like a very roundabout way of pushing the address of the next instruction onto the stack, but the 32-bit x86 instruction set does not provide a more straightforward way of doing so. An instruction like push eip+5 is not valid, as EIP cannot be used directly as an operand.

Let’s turn our attention back to the debugger to observe this in action. The instruction call 6CDA31F pushes 0x06CDA31F onto the stack and then jumps to 0x06CDA31F as shown in Figure 7.

Image showing a computer screen with hexadecimal code and arrow indicators highlighting specific segments of the code in different colors. — Figure 7. The operand of the call instruction is also the address of the next instruction.

Now that 0x06CDA31F is on the stack, it gets stored in the EAX register with the pop eax instruction as shown in Figure 8.

Text displaying "EAX 06CDA31F" in red on a white background. — Figure 8. EAX after the pop eax instruction.

And then we subtract 5 from 0x06CDA31F with the sub eax, 5 instruction as shown in Figure 9.

As we observed when we first stepped over sub_10A31A, the result is that 0x06CDA31A gets stored in EAX.
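The three steps we just traced can be modeled in a few lines of Python. This is a simplified sketch, not an emulator: it tracks only a stack and the EAX value, and the function name simulate_get_pc is our own.

```python
def simulate_get_pc(function_address: int) -> int:
    """Model call $+5 / pop eax / sub eax, 5 for a function at function_address."""
    stack = []

    # call $+5: push the return address (the instruction after the 5-byte call).
    # Execution then continues at that same address.
    return_address = function_address + 5
    stack.append(return_address)

    # pop eax: move the pushed address from the stack into EAX.
    eax = stack.pop()

    # sub eax, 5: step back over the 5-byte call to the start of the function.
    eax = (eax - 5) & 0xFFFFFFFF
    return eax

print(hex(simulate_get_pc(0x06CDA31A)))  # 0x6cda31a
```

Whatever address the function is loaded at, the value left in EAX is that same address, which matches what we saw in x64dbg.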

The sequence of instructions inside sub_10A31A is commonly used to implement PC-relative addressing and allows the shellcode to be position-independent. Why is this important? Just like any program, malware may have some resources that it needs to access.

Resources can be accessed via absolute addresses or an offset relative to a base address. Regular PE files can access resources using absolute addresses because the PE loader applies relocation adjustments if the program is loaded into a memory region different from its preferred base address. However, shellcode doesn’t have this capability and thus must rely on relative addresses.
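To illustrate what the PE loader does and what shellcode has to live without, here is a minimal Python sketch of a relocation adjustment. It is a deliberately simplified model: a real PE file describes its fixups in a base relocation table with typed entries, whereas this sketch just takes a list of offsets. The image contents, the base addresses and the name apply_relocations are all hypothetical.

```python
import struct

def apply_relocations(image: bytearray, fixup_offsets, preferred_base: int, actual_base: int) -> bytearray:
    """Add the load delta to every 32-bit absolute address listed in fixup_offsets."""
    delta = actual_base - preferred_base
    for offset in fixup_offsets:
        (address,) = struct.unpack_from("<I", image, offset)
        struct.pack_into("<I", image, offset, (address + delta) & 0xFFFFFFFF)
    return image

# Hypothetical image: a 4-byte absolute pointer to the data that follows it.
preferred_base = 0x00400000
image = bytearray(struct.pack("<I", preferred_base + 4) + b"DATA")

# The image ends up at a different base, so the stored pointer must be fixed.
apply_relocations(image, [0], preferred_base, 0x06CD0000)
(pointer,) = struct.unpack_from("<I", image, 0)
print(hex(pointer))  # 0x6cd0004
```

Shellcode is usually copied into memory and executed without any loader performing this step, so an absolute pointer baked into it would still point at the old location.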

By calling sub_10A31A, the shellcode can access the resources it needs by using an offset relative to the address of sub_10A31A in memory. We can then look at the decompiled code in Figure 10 to see how it’s used. The address returned by sub_10A31A (which we’ll now call get_pc) is used in the second argument of memcpy to compute the address of the source buffer.

Screenshot of computer code in an IDE, featuring functions and parameters highlighted in red and blue. — Figure 10. The decompiled code after renaming sub_10A31A.
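The pattern in Figure 10 can be sketched in Python as follows. The blob layout, the offsets and the names are hypothetical and do not come from the sample; the point is only that the source address is computed as the get_pc result plus a fixed distance, so the same bytes are found wherever the shellcode is loaded.

```python
# Hypothetical shellcode blob: 16 bytes standing in for code, then a resource.
SHELLCODE = b"\x90" * 16 + b"CONFIG"
GET_PC_OFFSET = 4     # hypothetical position of get_pc inside the blob
RESOURCE_DELTA = 12   # hypothetical distance from get_pc to the resource
RESOURCE_SIZE = 6

def copy_resource(load_base: int) -> bytes:
    """Copy the resource out of a blob mapped at load_base, memcpy-style."""
    # Model process memory as a mapping from address to byte value.
    memory = {load_base + i: value for i, value in enumerate(SHELLCODE)}
    get_pc = load_base + GET_PC_OFFSET   # what get_pc would return at run time
    source = get_pc + RESOURCE_DELTA     # source address derived from get_pc
    return bytes(memory[source + i] for i in range(RESOURCE_SIZE))

print(copy_resource(0x06CD0000))  # b'CONFIG'
print(copy_resource(0x00100000))  # b'CONFIG'
```

The two calls use different load addresses but return the same data, which is the property that position-independent shellcode depends on.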

Conclusion

Analyzing malware is a very detailed and complex process. Through the full tutorial, we hope to help others improve their skills in malware analysis through a step-by-step analysis of an infection chain.

If you found this excerpt interesting, please read the full tutorial. Happy analyzing!

Palo Alto Networks customers are better protected from the shellcode discussed in this article through the following products:

The Advanced WildFire machine-learning models and analysis techniques have been reviewed and updated in light of the indicators shared in this research.
Advanced URL Filtering and Advanced DNS Security identify known domains and URLs associated with this activity as malicious.
Cortex XDR and XSIAM are designed to prevent the execution of known malware, and also prevent the execution of unknown malware using Behavioral Threat Protection and machine learning based on the Local Analysis module.

If you think you may have been compromised or have an urgent matter, get in touch with the Unit 42 Incident Response team or call:

North America: Toll Free: +1 (866) 486-4842 (866.4.UNIT42)
UK: +44.20.3743.3660
Europe and Middle East: +31.20.299.3130
Asia: +65.6983.8730
Japan: +81.50.1790.0200
Australia: +61.2.4062.7950
India: 00080005045107

Palo Alto Networks has shared these findings with our fellow Cyber Threat Alliance (CTA) members. CTA members use this intelligence to rapidly deploy protections to their customers and to systematically disrupt malicious cyber actors. Learn more about the Cyber Threat Alliance.