Security Technology

Tailoring Sandbox Techniques to Hidden Threats

Clock Icon 9 min read
Related Products

This post is also available in: 日本語 (Japanese)

Executive Summary

Malware authors often throw curve balls that are meant to confound automated detection systems. We’ve adapted to these techniques by tailoring our analysis platform in a couple of notable ways that we’ll discuss, particularly to address malware that engages in sandbox evasion.

The first is an approach we call “dependency emulation,” where we can successfully detonate malware samples that would not normally be able to execute in our sandbox environment due to a lack of dependencies. We will also discuss our strategies of stealthy instrumentation for analysis of encrypted network traffic, to combat the ways SSL can be used to complicate command-and-control (C2) traffic analysis.

One overarching goal malware authors share is to remain undetected until their overall objective is reached. As we’ve previously detailed in this blog post series, this has resulted in a seemingly limitless number of evasions and strategies targeted at confounding sandboxing and static analysis approaches. Due to the endless cat-and-mouse nature of reacting to malware authors’ efforts to remain undetected, we’ve come to understand that the most important characteristic for any automated malware analysis platform is adaptability.

Palo Alto Networks customers receive improved detection for the evasions discussed in this blog through Advanced WildFire.

Related Unit 42 Topics Sandbox Evasion, Memory Detection

Dependency Emulation

In the context of malware execution, we talk a lot about dependencies. When we do, we’re referring to additional code that an executable depends on that is not inside the executable file. On Windows, external dependencies are usually Dynamically Linked Libraries (DLLs) that contain additional functionality that can be imported into the process to execute.

Often malware does not require external libraries in order to execute its malicious code, or it might call functions from those external libraries in a very late stage of its execution. Over the past few years we’ve noticed a large uptick in malicious Windows executables that require external libraries for things like graphical user interfaces or better cross-platform support, and if the libraries are missing, then the malicious sample won’t be able to run. When Windows cannot load an application because of a missing library, it will show a “System Error” error message box that displays the missing library name.

When we run any executable in our sandbox environment that requires dependencies that are not present, we would see the same error. But we wouldn’t be able to get any analysis information from the sample because no malicious code was executed.

In Figure 1, we can see a screenshot of the ATMSpitter malware family, which requires the DLL MSXFS.dll to fully detonate. Without the required DLL, we would just see an error dialog telling us that Windows can’t run this file. If we place the correct file inside the same directory and run the sample again, it runs just fine and we can extract information such as executed system calls and modified memory regions.

Image 1 is a screenshot of a system error popup which says “The program can’t start because MSXFS.dll is missing from your computer. Try reinstalling the program to fix this problem.”
Figure 1. ATMSpitter malware requires MSXFS.dll to run.

File infectors like Sality, Floxif, Expiro and Ramnit typically infect predefined sets of executables that they find on the disk. If the infected file requires specific libraries that are not part of the operating system (OS), then chances are it’s not going to detonate inside a malware analysis sandbox because these dependencies likely won’t be preinstalled in the environment. Our sandbox environments are constrained in resources like disk space, and the number of DLL files in the world is seemingly infinite.

Normally, when we run into this situation and the OS loader refuses to execute our file, we are left with no data at all from dynamic analysis. What is even more frustrating about this situation is that in many cases, those dependencies don’t matter for infected files in that they aren’t even going to be used.

File infectors usually call their malicious code very early (e.g., by replacing an early call instruction to their malicious code, or by overwriting the entry point in the PE header) before any function calls from external libraries are ever called. The OS will refuse to load and execute our file, despite the fact that none of the dependencies mattered anyway.

Example File Infector

Let’s take a look at one file that has been infected by the Sality malware, which requires external dependencies to run inside a sandbox. The original file is Groove Audit Service from Microsoft Office, and it requires dependencies from the libraries GrooveNew.dll and GroveUtil.dll (shown in Figure 2). If those two libraries are missing on the system, then the sample won’t be able to run.

Image 2 is a screenshot of the Groove Audit Service file. It shows the required dependencies needed to run the sample.
Figure 2. The required dependencies.

If we compare the original file to the infected file inside a PE viewer like StudPE (shown in Figure 3), we notice that the infector has modified only a few values in the PE headers including the size of the image, checksum and section flags to make the last section executable.

Image 3 is a screenshot of the software StudPE comparing an infected file with the original file. It shows options including the PE structure and Binary as methods, the Target File path, and the structure itself broken into columns.
Figure 3. Sality infected executable compared to the original file.

But if we open both the executables inside IDA Pro, we immediately see that the code at the entry point is totally different compared to the original (shown in Figures 4 and 5, below). The malicious code in the infected file is being called right at the entry point and doesn’t require any dependencies to fully infect the system. However, because of the required dependencies, regular malware analysis sandboxes won’t be able to run this kind of file.

Image 4 is a screenshot of many lines of code showing the original entry point.
Figure 4. Original entry point.
Image 5 is a screenshot of many lines of code showing the modified entry point.
Figure 5. Modified entry point.

Mitigating the Dependency Problem

What can we do about this dependency problem for analysis at scale? Dependency emulation is an approach we’ve recently prototyped and found useful to address this.

The main idea of this approach is to detect this situation automatically, and to adapt our sandbox environment to the files being detonated to ensure executables will have a chance to detonate. Since we have full control over the sandbox, it’s easy to determine that the executable under analysis requires an external library, and to lie to the executable that all of its dependency requirements are met.

Given the million or so unique Windows executables we process daily, it would be an extremely difficult (if not impossible) task to automatically detect and add the specific external libraries for each executable that required external dependencies. There are literally hundreds of millions of libraries out there with numerous versions for each, and it’s not really feasible to support providing any arbitrary library in a way that guarantees the intended execution of every sample.

However, as we’ve discussed in our initial post in this blog series, we often don't need full execution for accurate detection. If we can get the malware sample under analysis to get past the Windows loader, initialize itself and unpack any additional payloads, we have a much better shot at detection. Otherwise, we simply get an error from the Windows loader (as shown in Figure 1) and no additional information beyond what static analysis would buy us.

But what happens when the executable tries to call a function in one of those dependencies? The bad news is that, since we aren’t trying to provide actual copies of the required libraries, we can’t correctly simulate the function calls into any of the libraries not present in our environment.

The good news is that, in most cases, this is still enough for accurate detection. We don’t need to correctly execute every code path present in the executable, we just need enough execution and memory artifacts for accurate detection. For this reason, in order to avoid unintentional crashes or bugs from the analyzed sample, our sandbox stops the execution if any function from the generated stub libraries is called.

All things considered, with this approach we've seen a lot of samples detonate that wouldn't otherwise, and we've seen many correct detections.

VMI SSL/TLS Decryption

Another challenge we’ve faced with analyzing malware at scale is how to stealthily spy on TLS encrypted traffic in a way that’s not detectable by the malware. As HTTPS and other protocols built on SSL have become increasingly ubiquitous for accessing resources across the internet, it’s not surprising that malware authors also use this protocol for other things, such as pulling down the next stage of their payload, or C2 communications.

It is possible, of course, to take the easy route and intercept and decrypt all of the traffic from our sandbox with a meddler-in-the-middle approach. The problem with this is that it’s detectable by examining and validating TLS certificates. It’s also possible to instrument and log calls to some of the higher level networking APIs, but again, this is also detectable in many cases and isn’t going to get us all encrypted communications.

One solution we’ve found useful for this problem is to detect when TLS connections are being initiated and log the symmetric keys generated for the SSL/TLS connection using virtual machine introspection (VMI).

Before we look into that detection method, let’s have a quick lesson on TLS handshakes.

TLS Handshakes

When an SSL/TLS connection is being established, certain data between the local system and the remote server is exchanged to validate each side's authenticity and generate symmetric keys to encrypt data over the wire.

This data includes the following:

  • Preferred TLS version
  • Supported cipher suites
  • 32 bytes of random data used in the key generation algorithm
  • A data signature, in the case of an ephemeral elliptic-curve Diffie-Hellman (ECDHE) key exchange

An example of an ECDHE “Server Hello” message is shown in Figure 6.

Image 6 is a screenshot of Wireshark. There are three red highlights showing, from top down, the version, the random tab, and the cipher suite. It is the initial SSL connection handshake.
Figure 6. Wireshark screenshot illustrating the initial handshake for an SSL connection.

In the case of a Diffie-Hellman (DH) handshake, additional DH parameters are then exchanged, and each side separately calculates a pre-master secret. Finally, the same master secret key is calculated on each side by utilizing the pre-master secret and the client and server random values.

The calculated master secret is 48 bytes in length. Through reverse engineering efforts, we discovered that this key generation occurs in the ncrypt.dll library in recent versions of the Windows OS.

The key generation happens in memory, so the key is never written to disk and it doesn’t traverse the wire. By knowing the exact location in code where each master key is generated and precisely where it resides in memory, it is possible to extract the key. This can be accomplished with, for example, an analysis engine via virtual machine introspection, which makes a system invisible to the malware. A similar approach can be implemented in instances where other third party libraries are used for key generation, such as OpenSSL.

An example output, which includes everything needed to decrypt TLS traffic, is shown in Figure 7.

Image 7 is a screenshot of an extracted random value and a master key.
Figure 7. Extracted random value and master key.

Final keys and initialization vectors used to encrypt and decrypt network traffic are generated from the master secret through a key expansion. Conveniently, the open source tool Wireshark/Tshark accepts a keyfile to perform this decryption automatically on collected pcap traffic.

Example SSL Decryption With Wireshark

Let's take a look at a real world example of a sample that belongs to the malware family FormBook. The sample is loaded by a loader written in Delphi, which downloads the next stage from a public file storage service using HTTPS.

If we open the recorded package capture in Wireshark and provide the extracted SSL/TLS keys that were extracted by the analysis platform, we should be able to see the SSL encrypted traffic to the public file storage service, including the downloaded data.

The random client value and the master secret key must be stored in a specific file format that Wireshark supports. It looks like this:

To provide the decryption keys to Wireshark, we have to use the option ssl.keylog_file, which we can pass through the command line. For example:

After opening the capture (pcap) in Wireshark, the TLS encrypted data will be decrypted, and we can use the http filter to get the decrypted HTTPS traffic. This is shown in Figure 8.

Image 8 is a screenshot of Wireshark showing the decrypted HTTPS traffic.
Figure 8. Decrypted HTTPS traffic.

In Figures 9 and 10, we can see the communication to the public file storage service.

Image 9 is a screenshot of Wireshark showing the communication in a followed HTTP stream to the public file storage service.
Figure 9. Communication to the public file storage service.
Image 10 is a screenshot of Wireshark showing the communication in a followed HTTP stream to the public file storage service.
Figure 10. Downloaded data from public file storage.

With the decrypted content, we can write more precise behaviors and signatures that match a malicious sample's network communication.

Conclusion

In this post we have discussed two important ways we’ve been able to tailor our analysis environment. As we’ve mentioned in previous posts, adaptability in sandboxing is exceptionally important when it comes to staying on top of the malware detection problem. Threats are continually evolving, and architecting analysis systems as more of a flexible, nicely abstracted software development kit instead of a stand-alone monolithic application is crucial.

With this in mind, we’ve gone over two important ways our platform adaptability has paid off in Advanced WildFire. With Dependency Emulation, we can increase our detonation rates to collect more execution logs necessary for accurate detection. With VMI SSL/TLS Decryption, we have visibility into all SSL/TLS communications in a way that isn’t detectable by malware authors.

Once again, if you’ve made it this far, thank you again for reading and we hope you enjoyed this peek into some of the more interesting work we’ve been doing as part of our never-ending endeavor to accurately detect every last malicious file the global internet has to offer us.

Indicators of Compromise

SHA-256 Name
c5b43b02a62d424a4e8a63b23bef8b022c08a889a15a6ad7f5bf1fd4fe73291f ATMSpitter
a3b2de8f0d648f3e157300d0a88971919eb273b7d1c7b9ed023f26b5cc0ac3ca Sality infected file
4e32a6000a2b33ed0b8e4cf1256876c356cf5508ce0df2752fcfa214b6c2795b Formbook (SSL Decryption)

 

Enlarged Image