The Gopher in the Room: Analysis of GoLang Malware in the Wild

Category: Unit 42

Executive Summary

In recent months, I have taken a keen interest in malware written in the Go programming language. Go, sometimes referred to as GoLang, was created by Google in 2009 and has gained additional popularity within the malware development community in recent years.

While an increasing number of blog posts in recent years have discussed Go malware families, I wanted to know whether the language was actually on the rise as a malware development platform. I was also curious which malware families would be most prevalent, as many believe that Go is primarily used by penetration testers and red teamers. With that in mind, I set out to collect as much malware written in Go as possible and cluster it by malware family. This blog discusses my data collection methodology and my results.

In total, roughly 10,700 unique malware samples written in Go were obtained. Based on the samples’ first seen timestamps, we can conclude that Go-compiled malware has been steadily on the rise for a number of months. Additionally, 92% of the samples identified were compiled for the Windows operating system, indicating that this is the most heavily targeted system by Go malware developers.

I was able to attribute 75% of the samples to a malware family. The most prominent families were Veil, GoBot2, and HERCULES, and the most prevalent categories were Pentesting, Remote Access Trojans (RATs), and Backdoors.

As a point of clarification, the distinction between RATs and Backdoors comes down to the malware family’s feature set. Families that provided remote access with only minimal functionality were labeled Backdoors, while fully featured families were labeled Remote Access Trojans.

Why Write in Go?

Go has a number of features that might entice an attacker to use this particular programming language. Certainly one of the biggest draws to Go is the fact that a single codebase may be compiled for all of the major operating system platforms, including Windows, OSX, and Linux. This allows an attacker to focus on a single codebase that can be used to infect victims on various platforms, versus other programming languages that might require an attacker to have three different code repositories.

Another option is to write the codebase in a cross-platform scripting language, such as Python. The Chafer threat group took this approach when it wrote one of its payloads in Python, and the Seaduke malware family is another example. However, because Windows has historically not shipped with Python, these codebases must be packaged with a utility such as PyInstaller in order to run in that environment. Such a tool gets the job done, but it leaves a number of traces in the files it drops at runtime. Go, in contrast, leaves none of these artifacts, which may work to the attacker’s benefit.

Another positive of Go (or negative, depending on how you view it) is that all necessary libraries are statically linked into the compiled binary. This typically results in a larger than average binary: across the 10,700 malware samples written in Go, the average size was 4.65MB. This is far larger than typical malware, which makes Go binaries more difficult to use in Trojaned packages. It may also make them more difficult to include in phishing emails, as the email server may not permit such a large attachment.

However, this large size has unexpected benefits as well. In certain circumstances, anti-virus products may ignore files, or be unable to scan them, in the event they are too large. This was witnessed in the past in targeted attacks involving the Comnie malware family, where the malware authors appended 64MB of garbage data to their files in an attempt to circumvent anti-virus products.

Methodology

To begin this research, I first had to collect as many malware samples compiled in Go as I possibly could. This task alone proved to be relatively difficult. The repositories used for this research included both our own and the third-party VirusTotal service. I began by simply collecting all of the Go samples that could be identified, regardless of whether they were malicious.

To collect these samples, I took a number of approaches, including, but not limited to, the following:

  • OSX or Linux samples with embedded URLs referencing ‘Go.org’
  • Samples using the ‘Go-http-client/1.1’ user-agent
  • Samples using the ‘GRequests’ user-agent
  • PE samples containing the ‘.symtab’ section name
  • PE samples using a series of identified import hashes
  • OSX samples referencing Google’s gopacket github repository
  • OSX samples referencing gopkg.in
  • Samples matching YARA rules
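
Several of the heuristics above reduce to simple byte-level checks. As a rough sketch only (this is not the tooling used for this research, and the function names are hypothetical), the following Python shows how the ‘.symtab’ section name, the ‘Go-http-client/1.1’ user-agent, and the ‘go.buildid’ marker used by the YARA rules in this post might be checked against a file’s raw bytes:

```python
import struct


def pe_section_names(data: bytes) -> list:
    """Return the section names of a PE file, or [] if it is not a PE."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return []
    # e_lfanew (offset 0x3C) points at the "PE\0\0" signature.
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_off:pe_off + 4] != b"PE\x00\x00" or len(data) < pe_off + 24:
        return []
    # COFF header: NumberOfSections at +6, SizeOfOptionalHeader at +20.
    (num_sections,) = struct.unpack_from("<H", data, pe_off + 6)
    (opt_size,) = struct.unpack_from("<H", data, pe_off + 20)
    table = pe_off + 24 + opt_size
    names = []
    for i in range(num_sections):
        # Each section header is 40 bytes; the name is the first 8.
        raw = data[table + 40 * i: table + 40 * i + 8]
        if len(raw) < 8:
            break
        names.append(raw.rstrip(b"\x00").decode("ascii", "replace"))
    return names


def go_indicators(data: bytes) -> list:
    """List which of the (hypothetically named) Go heuristics match."""
    hits = []
    if ".symtab" in pe_section_names(data):
        hits.append("pe-symtab-section")
    if b"Go-http-client/1.1" in data:
        hits.append("go-http-user-agent")
    if b"go.buildid" in data:
        hits.append("go-buildid")
    return hits
```

A match on any single indicator only suggests a Go binary; it says nothing about whether the file is malicious, which is why the verdict checks described later are a separate step.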

Regarding the YARA rules, three separate rules were created to identify Go samples for each of the major platforms. As an example, the following rule was used to identify Go samples compiled for OSX:

rule osx_GoLang
{
     meta:
          author = "Josh Grunzweig"
          description = "Attempts to identify samples written in Go compiled for OSX."
     strings:
          $Go = "go.buildid"
     condition:
          (
               uint32(0) == 0xfeedface or
               uint32(0) == 0xcefaedfe or
               uint32(0) == 0xfeedfacf or
               uint32(0) == 0xcffaedfe or
               uint32(0) == 0xcafebabe or
               uint32(0) == 0xbebafeca
          ) and
          $Go
}
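
For readers less familiar with YARA, the rule’s logic can be approximated in a few lines of Python. This is a sketch under the assumption that the whole file is already in memory; the function name is hypothetical. Note that YARA’s uint32() reads a little-endian integer, which is why each Mach-O magic value appears in both byte orders.

```python
import struct

# The six values compared against uint32(0) in the rule: the 32-bit and
# 64-bit Mach-O magics and the fat-binary magic, each in both byte orders.
MACHO_MAGICS = {
    0xFEEDFACE, 0xCEFAEDFE,
    0xFEEDFACF, 0xCFFAEDFE,
    0xCAFEBABE, 0xBEBAFECA,
}


def looks_like_osx_go(data: bytes) -> bool:
    """Approximate the osx_GoLang rule: Mach-O magic plus 'go.buildid'."""
    if len(data) < 4:
        return False
    # "<I" mirrors YARA's little-endian uint32(0).
    (magic,) = struct.unpack_from("<I", data, 0)
    return magic in MACHO_MAGICS and b"go.buildid" in data
```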

Using these various techniques, I was able to aggregate roughly 611,000 unique samples.

After all sample hashes were acquired, I queried both our systems and VirusTotal to determine which samples were malicious. A sample was treated as malicious if our systems returned a verdict of ‘malware’ or if VirusTotal reported 5 or more positives. Finally, I downloaded the remaining samples and ran the previously created YARA rules against them to confirm they were in fact Go samples. When all was said and done, I was left with around 13,000 unique samples.
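
The verdict check described above amounts to a simple filter. The sketch below is illustrative only: the field layout, the sample identifiers, and the function name are assumptions, and neither service’s real API is shown.

```python
def is_malicious(internal_verdict, vt_positives, min_positives=5):
    """Apply the two triage checks to one sample (hypothetical fields)."""
    if internal_verdict == "malware":
        return True
    # A sample unknown to VirusTotal is represented here as None.
    return vt_positives is not None and vt_positives >= min_positives


# Hypothetical (hash, internal verdict, VirusTotal positives) records.
candidates = [
    ("hash-a", "malware", 0),
    ("hash-b", "benign", 7),
    ("hash-c", "benign", 4),
    ("hash-d", None, None),
]
kept = [h for h, verdict, positives in candidates
        if is_malicious(verdict, positives)]
print(kept)  # ['hash-a', 'hash-b']
```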

Now that I had my dataset, I began to work on clustering these samples into their respective malware families. To do this I primarily took a manual approach, analyzing a given file and creating YARA rules based on the identified malware family. To assist me, I also wrote some helper scripts to extract helpful information from the identified binaries. The following helper script attempts to extract user-defined function names from the binary, as well as any user-defined paths that may exist:

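import re
import sys

# Shortest run of printable characters that counts as a string.
MINIMUM = 5

# Read the binary as raw bytes.
with open(sys.argv[1], 'rb') as fh:
    data = fh.read()

# Byte patterns (required on Python 3): printable ASCII runs, and
# null-terminated UTF-16LE ("wide") runs of the same characters.
char = rb"[\t\n\x20-\x7f]{%d,}" % MINIMUM
wchar = rb"((?:[\t\n\x20-\x7f]\x00){%d,})\x00" % MINIMUM

all_strings = [s.decode('ascii') for s in re.findall(char, data)]
all_strings += [s.replace(b'\x00', b'').decode('ascii')
                for s in re.findall(wchar, data)]

interesting = set()
for s in all_strings:
    # Source paths that point at the user's main.go.
    if s.endswith('.go') and 'main.go' in s:
        interesting.add(s)
    # Symbols from the user's main package, minus compiler-generated
    # temporaries, init helpers, and pointer-receiver methods.
    elif s.startswith('main.'):
        if 'statictmp' not in s and '.init.' not in s and '.(*' not in s:
            interesting.add(s)

for s in interesting:
    print(repr(s))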

An example of running this is below:

$ python find_interesting_strings.py fc684bbf9428a4e33c390e3963c9bfa24e81cb040ccd601c6e7f5b6c193e2808.bin
'main.encryptFile'
'main.writeLog.func1'
'main.writeLog'
'main.init'
'main.scanDir'
'main.ignoreUsersFolders'
'main.ignoreRootFolders'
'main.encryptFile.func1'
'main.logFilePath'
'main.ignoreProgramFilesFolders'
'C:/Users/pc/go/src/scaner/main.go'
'main.ignoreProgramDataFolders'
'main.initdone'
'main.makeReadmeFile.func1'
'main.ignoreFiles'
'main.ignoreFileExtensions'
'main.main'
'main.makeReadmeFile'
'main.DEBUG'
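
Output like the above, collected per sample, can be tallied to see which names recur across a set of binaries. The following is a minimal sketch of that idea rather than the exact process used here; the function name and the sample data are hypothetical.

```python
from collections import Counter


def common_symbols(samples: dict, top: int = 5) -> list:
    """samples maps a sample hash to the strings extracted from it."""
    counts = Counter()
    for names in samples.values():
        counts.update(set(names))  # count each name once per sample
    return counts.most_common(top)


# Hypothetical extraction results for three samples.
samples = {
    "sample-1": {"main.encryptFile", "main.scanDir"},
    "sample-2": {"main.encryptFile", "main.main"},
    "sample-3": {"main.encryptFile", "main.scanDir"},
}
print(common_symbols(samples, top=2))
# [('main.encryptFile', 3), ('main.scanDir', 2)]
```

Names that appear in many samples but are unusual outside of them are natural candidates for the strings section of a family-specific YARA rule.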

The helper script’s output allowed me to easily determine which function names and code paths were most common, and in some cases allowed me to cluster samples based on this information alone. One example of a malware family’s YARA rule is as follows:

rule trojan_golang_hercules: Pentesting
{
     meta:
          author = "jgrunzweig - PaloAltoNetworks"
          date = "2019-06-15"
          description = "the HERCULES malware family written in Go."
          hash1 = "2a7da0a0acadb61fb79fa4a33130d09ecff5a904b0999d264d8c1edffeffea95"
          hash2 = "6e68dafbb717daf6a505d8a95c41e5114d91c4fde703343356352c1ca5cd24ea"
          hash3 = "645ed38f2d55b2f7731d5c9223329428592497eb95c96bcd7c01a4eaeb38e137"
          reference = "https://github.com/EgeBalci/HERCULES"
     strings:
          $buildid = "go.buildid"
          $uniq1 = "cGFja2FnZSBtYWluCgppbXBvcnQgIm5ldCIKaW1wb3J0ICJvcy9leGVjIgppbXBvcnQgImJ1ZmlvIgppbXBvcnQgInN0cmluZ3MiCmltcG9ydCAic3lzY2FsbCIKaW1wb3J0ICJ0aW1lIgppbXBvcnQgIkVHRVNQTE9JVCIKCgoKY29uc3QgSVAgc3RyaW5nID0gIjEwLjEwLjEwLjg0Igpjb25zdCBQT1JUIHN0cmluZyA9ICI1NTU1IgoKY29uc3QgQkFDS0RPT1IgYm9vbCA9IGZhbHNlOw"
          $uniq2 = "cGFja2FnZSBtYWluCgoKaW1wb3J0ICJlbmNvZGluZy9iaW5hcnkiCmltcG9ydCAic3lzY2FsbCIKaW1wb3J0ICJ1bnNhZmUiCi8vaW1wb3J0ICJFR0VTUExPSVQvUlNFIgoKY29uc3QgTUVNX0NPTU1JVCAgPSAweDEwMDAKY29uc3QgTUVNX1JFU0VSVkUgPSAweDIwMDAKY29uc3QgUEFHRV9BbGxvY2F0ZVVURV9SRUFEV1JJVEUgID0gMHg0MAoKCnZhciBLMzIgPSBzeXNjYWxsLk5ld0"
          $uniq3 = "cGFja2FnZSBtYWluCgppbXBvcnQgIm5ldC9odHRwIgppbXBvcnQgInN5c2NhbGwiCmltcG9ydCAidW5zYWZlIgppbXBvcnQgImlvL2lvdXRpbCIKLy9pbXBvcnQgIkVHRVNQTE9JVC9SU0UiCgoKCmNvbnN0IE1FTV9DT01NSVQgID0gMHgxMDAwCmNvbnN0IE1FTV9SRVNFUlZFID0gMHgyMDAwCmNvbnN0IFBBR0VfQWxsb2NhdGVVVEVfUkVBRFdSSVRFICA9IDB4NDAKCnZhciBLMzIgPS"
          $path = "/HERCULES/"
          $banner = "HERCULES REVERSE SHELL"
          $help1 = "~DOS -A \"www.targetsite.com\""
          $help2 = "~WIFI-LIST "
          $help3 = "~KEYLOGGER-DUMP "
          $help4 = "Creates a reverse http meterpreter session at given pid (EXPERIMENTAL)"
     condition:
          (
               // Windows binary
               (uint16(0) == 0x5a4d) or
               // OSX binary
               (
                    uint32(0) == 0xfeedface or
                    uint32(0) == 0xcefaedfe or
                    uint32(0) == 0xfeedfacf or
                    uint32(0) == 0xcffaedfe or
                    uint32(0) == 0xcafebabe or
                    uint32(0) == 0xbebafeca
               ) or
               // Linux binary
               (uint32(0) == 0x464C457F)
          ) and
          filesize > 500KB and
          $buildid and
          (
               any of ($uniq*) or
               $banner or
               any of ($help*) or
               $path
          )
}
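
The $uniq strings in this rule are Base64 text rather than raw code. As a quick illustration, decoding the first 16 characters shared by all three shows that each one begins with the opening line of a Go source file, which suggests the family carries Base64-encoded Go source templates:

```python
import base64

# First 16 characters common to $uniq1, $uniq2 and $uniq3 in the rule above.
prefix = "cGFja2FnZSBtYWlu"

decoded = base64.b64decode(prefix)
print(decoded)  # b'package main'
```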

This manual approach also allowed me to identify false positives. When I was finally through, about 2,000 false positives had been identified, which brought my total malware sample count to 10,700. Of these, 75% were attributed to a specific malware family.

A total of 53 unique malware families were identified during this research, with a YARA rule being created for each.

Results

Perhaps one of the clearest conclusions to draw from this research is the relatively small number of malware files compiled in Go that were identified. While it is possible this is a result of my methodology, I believe it’s a fairly accurate number overall. Go as a malware development language is still very much in its infancy, and has yet to truly gain high popularity within this community. That being said, a timeline of the first seen dates of these samples shows that Go appears to be growing in popularity.

Figure 1. Timeline of Go Malware samples based on first seen dates.

Another interesting result from this research was identifying the operating systems for which Go malware samples were most frequently compiled. The majority were compiled for Windows, which may come as no surprise to many.

Figure 2. Operating systems for which Go malware samples were compiled.

In total, 92% of the Go malware samples identified were compiled for the Windows operating system, 4.5% were compiled for Linux, and the remainder were compiled for OSX. As Windows continues to be the platform most heavily targeted by attackers, this data is not surprising in itself. However, going into this research I did not expect Windows to account for such a large percentage of the overall malware identified.

As I previously stated, a total of 53 malware families were identified during the course of this research. The results are as follows:

Malware Family                      Malware Count (Percentage)
Veil                                3772 (47%)
GoBot2                              1025 (12.8%)
HERCULES                            475 (5.9%)
CHAOS                               471 (5.9%)
Generic Coinminer                   406 (5.1%)
Infostealer                         360 (4.5%)
TinyBanker                          182 (2.3%)
GoBrut                              165 (2.1%)
Neshta                              164 (2.1%)
ARCANUS                             150 (1.9%)
Gandalf Botnet                      120 (1.5%)
hershell                            86 (1.1%)
rocke                               60 (0.75%)
Infostealer Variant B               44 (0.55%)
GoBot                               43 (0.53%)
Shellcode Loader Variant B          41 (0.51%)
Downloader Variant D                41 (0.51%)
ShurL0ckr                           37 (0.46%)
Mirai                               36 (0.45%)
merlin                              35 (0.44%)
EGESPLOIT                           32 (0.4%)
Downloader Variant B                32 (0.4%)
Mauri870 Ransomware Family          27 (0.34%)
nett Botnet                         21 (0.26%)
gscript                             18 (0.23%)
Malicious FireFox Extension Loader  14 (0.18%)
Supic Backdoor                      12 (0.15%)
r2r2                                11 (0.14%)
RobbinHood                          10 (0.13%)
jimm Ransomware                     9 (0.11%)
braincrypt                          9 (0.11%)
Rakos                               8 (0.1%)
TrumpHead Ransomware                7 (0.08%)
HTRAN                               7 (0.08%)
Keylogger Variant A                 7 (0.08%)
YourRansom Ransomware               5 (0.06%)
RDW                                 5 (0.06%)
Ransomware Variant A                5 (0.06%)
Italian Downloader                  4 (0.05%)
Scanner Variant A                   4 (0.05%)
Shifr Ransomware                    4 (0.05%)
Shellcode Loader Variant A          4 (0.05%)
go-bot                              3 (0.04%)
Exploit Utility Variant A           3 (0.04%)
Zebrocy                             3 (0.04%)
Downloader Variant C                3 (0.04%)
Downloader Variant A                3 (0.04%)
RaaS Ransomware                     3 (0.04%)
goshell                             2 (0.03%)
TeleGrab                            2 (0.03%)
gorsh                               2 (0.03%)
Czech Downloader                    1 (0.01%)
Bitfinex Lending Bot                1 (0.01%)
TOTAL:                              7997 (100%)

Table 1. Go malware families identified.
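
When reading Table 1, note that the percentages appear to be relative to the 7,997 samples that were assigned a family, not to all 10,700 malware samples. A short calculation using figures from the table and the text makes this explicit (the helper name is just for illustration):

```python
CLASSIFIED_TOTAL = 7997   # TOTAL row of Table 1
ALL_MALWARE = 10700       # approximate count of Go malware samples


def pct(count, total):
    """Percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(pct(3772, CLASSIFIED_TOTAL))         # Veil   -> 47.2
print(pct(1025, CLASSIFIED_TOTAL))         # GoBot2 -> 12.8
print(pct(CLASSIFIED_TOTAL, ALL_MALWARE))  # samples with a family -> 74.7
```

The last figure is consistent with the roughly 75% of samples that could be attributed to a malware family.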

To provide a different representation of the results, the individual malware families were placed into different categories based on their attributes and purpose. These results are illustrated below:

 

Figure 3. Go Malware Categories.

As we can see, a majority of the files identified are associated with penetration testing activity. While these tools can be used maliciously, penetration testing was their intended purpose. Beyond this majority, a number of the malware samples identified have no legitimate purpose. RATs, Backdoors, Coinminers, and Information Stealers top the list of the remaining categories.

Conclusion

Overall, this research exercise proved enlightening to me personally for a number of reasons. While certain preconceived notions about the prevalence of pentesting-related Go malware were indeed confirmed, there was also a wealth of true malware families present, ranging from backdoors to botnets to banking Trojans. The overall low number of malware samples identified was also an interesting data point, showing that, generally speaking, Go has not yet gained significant interest from malware developers. However, the first seen timestamps of the identified samples indicate that Go malware is gaining popularity. Looking at the January to March timeframe from 2017 to 2019, the number of identified malware samples rose by a factor of roughly 20 (+1944%).
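
For readers converting between the two ways of stating this growth, a percentage increase of p corresponds to a multiplier of 1 + p/100. The snippet below works through that arithmetic; the sample counts in it are purely hypothetical and are not the figures behind the timeline.

```python
def growth_factor(percent_increase):
    """Multiplier implied by a percentage increase."""
    return 1 + percent_increase / 100


def percent_increase(old, new):
    """Percentage increase from old to new."""
    return 100 * (new - old) / old


print(round(growth_factor(1944), 2))   # 20.44
# Hypothetical counts: 100 samples growing to 2,044 is a +1944% rise.
print(percent_increase(100, 2044))     # 1944.0
```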

Go malware is still in its infancy. However, it is gaining the attention of both malware developers and the security community in general, as new malware families are frequently discovered and written about. Because developers can compile a single codebase for all major operating systems, it is my belief that Go will account for a much larger share of developed malware in the years to come, and should be on the security community’s proverbial radar.

All of the research mentioned within this blog post has been used to further protections within Palo Alto Networks’ product suite.

Indicators of Compromise

To assist the security community, I am releasing the full list of hashes and their YARA rule matches. They can be downloaded here.