Angler Exploit Kit Continues to Evade Detection: Over 90,000 Websites Compromised

By:
- Yuchen Zhou
- Wei Xu
Published: 11 January, 2016 at 9:45 AM PST
Categories:
- Malware
- Threat Research

Exploit Kits (EK), arguably the most impactful malicious infrastructure on the Internet, constantly evolve to evade detection by security technology. Tremendous effort has been spent on tracking new variations of different EK families. In this report, we look at an EK from an operational point of view. Specifically, we have been tracking the activity of the notorious Angler Exploit Kit and have uncovered traces of what we believe to be a large underground industry behind this EK.

Given the numerous existing reports from Sophos, Malwarebytes, and USENIX that cover different variants of Angler, we will focus on the new findings in terms of the global operation of Angler in this work. All of the findings are based on the results of our malicious web content detection system.

Key findings include:

- Detected over 90,000 compromised websites involved in Angler’s operation. Of these, 30 are within the Alexa top 100,000 rankings. We estimate the number of monthly visits to these 30 compromised websites to be over 11 million, based on visit counts from TrafficEstimate.com.
- Discovered a highly organized operation that periodically updates the malicious content across all of the compromised websites and all of the EK gate sites at the same time. This indicates a sophisticated and persistent command and control channel between attackers and compromised websites.
- Discovered fine-grained control over the distribution of malicious content. The injected scripts can stay invisible for days to evade detection, and the compromised websites can choose to target only certain victim IP ranges and certain configurations. This has led to very low detection rates from the scanners used by VirusTotal (VT). Even weeks after our initial discovery, most of the compromised sites we found were not listed as malicious in VT.
- Found potential connections between the scanning of vulnerable websites and the use of the scanned websites as entry points for the EK. This suggests an industry chain behind the operation of this EK.

Overview and Impact

Between November 5 (when we started to scan highly vulnerable websites for similar injections) and November 16, we discovered a total of 90,558 unique domains that had been compromised and used by Angler EK.

The compromised domains resolve to a total of 29,531 unique IPs. Among these, 1,457 IPs hosted more than 10 compromised domains. The IP address 184.168.47.225 hosted a total of 422 compromised sites. Some of the compromised sites were very popular, with 177 of the domains (30 FQDNs) in the Alexa top 100,000 and 40 in the top 10,000.

Most of the compromised sites remain undetected by VT. We tested early scanning results (5,235 malicious sites discovered at that time) with VT on November 16 and found that VT only reported 226 sites as malicious. At midnight of November 17 we repeated this experiment and VT still only found 232 sites. This amounts to a less than 5% detection rate. On December 14 we tested our full list (all 90,558) against VT and it only found 2,850 compromised sites – a 3% detection rate.
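The detection rates quoted above follow directly from the counts in the text. The short sketch below recomputes them; the helper name `detection_rate` is illustrative and not part of any tool we used.

```python
def detection_rate(detected: int, total: int) -> float:
    """Return the share of known-compromised sites flagged by VT, in percent."""
    return 100.0 * detected / total


# (label, sites flagged by VT, sites submitted) as reported in the text
checks = [
    ("Nov 16, early list", 226, 5235),
    ("Nov 17, early list", 232, 5235),
    ("Dec 14, full list", 2850, 90558),
]

for label, detected, total in checks:
    print(f"{label}: {detection_rate(detected, total):.1f}%")
```

The first two values stay below 5%, and the full-list value rounds to roughly 3%.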

Figure 1: Angler EK Compromise Topology

Figure 1 outlines the redirection flow of a full compromise. The victims visit a list of compromised WordPress/Apache hosts and get redirected to the malicious server hosting the EK, either directly or via a middle layer, which is commonly referred to as an ‘EK gate’. The final malicious payload served can vary and includes ransomware such as Cryptowall, as well as spyware or botnets that connect to a C2 server. A more concrete redirection chain example and its Fiddler packet capture are shown in Figures 2 and 3. The redirection from the EK gate to the malicious file hosting server can happen within the same domain (shown in red in Figure 2) or cross-domain (shown in blue in Figure 2).

Figure 2: Redirection Chain

Figure 3: Fiddler packet captured during redirection (excerpt)

Figure 4 shows some of the post-infection traffic we obtained using Fiddler. In this case the infected VM sent out a C2-like request and received back a long encrypted response.

Figure 4: Post infection traffic

Figures 5 and 6 give an overview of all compromised hosts’ IP information.

Figure 5: Compromised host ISP distribution

Figure 6: Compromised IP country distribution (truncated to show major items)

We can see in the figures that the compromised sites are primarily hosted inside the United States, with a few exceptions in Europe and Asia. Among the systems in the United States, most of the identified sites are hosted on GoDaddy’s infrastructure and a few other popular hosting services.

Angler EK Evasion Techniques

In this section, we highlight some of the notable behaviors of the malicious scripts that attackers injected into the compromised websites.

The original version of the injected malicious JavaScript code served on the compromised servers (left side of Figure 1) targets almost all major versions of the IE browser (versions 8-11 confirmed), presumably because these users may also have vulnerable Flash versions installed. A more recent variant targets all modern major browsers, including Webkit-based ones such as Chrome and Gecko-based Firefox. In addition to its JavaScript-based anti-detection maneuvers, the next-hop redirection (EK gate) also selectively serves malicious content based on the victim’s geolocation. We break down some of these behaviors further in this section.

Static/Signature analysis evasion

The injected JavaScript code in all of the compromised websites looks similar but is not exactly the same. Figure 7 is an excerpt of the injected code. The obfuscated code is constantly morphing with random variable names. Even the responses between two consecutive requests are different due to the randomness. We are only able to find four meaningful keywords that remained constant in this script: ActiveXObject, window.sidebar, charAt, and Function. As we will detail in the Organized Evolution section, these keywords will slowly vanish as the EK evolves over time. This polymorphic behavior likely renders many signature-based static analysis methods ineffective.
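To illustrate why this matters for static detection, the sketch below compares two made-up, harmless stand-in strings that differ only in variable names. An exact hash signature treats them as unrelated, while the four constant keywords named above are present in both. The sample strings and helper names are hypothetical and are not taken from the actual injected code.

```python
import hashlib

# The four keywords the text reports as constant across script variants
CONSTANT_KEYWORDS = ("ActiveXObject", "window.sidebar", "charAt", "Function")


def sha256_hex(script: str) -> str:
    """Exact-match signature: any renamed variable changes the digest."""
    return hashlib.sha256(script.encode("utf-8")).hexdigest()


def keywords_present(script: str) -> list:
    """Return the constant keywords that occur in the script text."""
    return [kw for kw in CONSTANT_KEYWORDS if kw in script]


# Hypothetical stand-ins that differ only in randomly chosen variable names
variant_a = "var qwe = new ActiveXObject('x'); var t = +[window.sidebar]; 'ab'.charAt(0); Function('return 1');"
variant_b = "var zxc = new ActiveXObject('x'); var k = +[window.sidebar]; 'ab'.charAt(0); Function('return 1');"

print(sha256_hex(variant_a) == sha256_hex(variant_b))  # False
print(keywords_present(variant_a) == keywords_present(variant_b))  # True
```

A keyword check of this kind is only a weak indicator, since all four keywords also appear in benign scripts.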

Figure 7: injected script excerpt

Browser emulator evasion

We found that the exploit kit code contains multiple layers of behavior cloaking, and in one case uses ActiveXObject initialization as shown in Figure 7. The variable zmomfokopbpbbi shown in this example contains a long random-looking string (truncated in line 2). The malicious JavaScript attempts to fingerprint the browser to evade browser emulators and malware detectors, which are usually designed to provide a valid ActiveXObject at all times or which use symbolic/concolic execution that intentionally suppresses errors and forces execution to take the other branch. In other words, a real IE browser will throw an error for the try clause but a detector agent may not. Given that the value of qvqymkykvfzpl is 1 before line 9, the value of zkvluyycbrtp will be 0 for real IE browsers and any other browser that does not support ActiveXObject, but it will remain 1 for certain browser emulators. Looking further down the malicious script (Figure 8) we see that the value of zkvluyycbrtp is used again in many other functions, and the return value of those functions depends on its value. This ultimately determines whether the malicious script will carry out the attack or not. This is one of the many cloaking mechanisms attackers employ to separate the real, intended victim population from the browser simulators used by AV vendors for detection.

Figure 8: injected script excerpt (cont’d)

In addition to ActiveXObject initialization, the obfuscated code also examines UserAgent. This examination is different from a naïve substring search. Particularly, the code searches for the existence of two strings “rv:11” and “MSIE”. Combining the user agent examination with a special ‘browser quirk’ of old IE (version 10.0 or older), the malicious JavaScript ONLY exhibits malicious behavior when the browser is ‘truly’ Internet Explorer, but not other browser brands (e.g. Firefox/Chrome/other testing browser agents), even if they are ‘mimicking’ old IE user agents. We show the detailed combination of different configurations and their interactions in Table 1.

Malicious content trigger condition: zfglugdvsvhpmstz - hladygwivaoha == 2
Potential target	UserAgent	JS Engine	“zfglugdvsvhpmstz”(Browser Quirk Testing)	“hladygwivaoha”(UserAgent pattern match)
Old IE User	IE 8-10	JScript/Chakra	3	1
New IE User	IE 11	Chakra	2	0
Firefox/Chrome User	FF/Chrome	SpiderMonkey/V8	2	2
Security researcher	IE 8-10	SpiderMonkey/V8	2	1
Security researcher	IE 11	SpiderMonkey/V8	2	0

Table 1: Malicious content trigger condition

In this table, we can see that the malware authors target IE users and attempt to avoid security researchers; however, they left out one scenario: a non-IE browser mimicking IE 11. In this scenario the malicious behavior is actually exposed, and this is how we were able to automatically extract a number of next-hop redirections (i.e. EK gate URLs), listed in Table 2.
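The trigger condition can be checked row by row. The sketch below encodes the five rows of Table 1 and applies the condition zfglugdvsvhpmstz - hladygwivaoha == 2; the `Row` type and field names are illustrative labels for the two table columns, not names from the injected script.

```python
from typing import NamedTuple


class Row(NamedTuple):
    target: str
    user_agent: str
    quirk_value: int     # "zfglugdvsvhpmstz" column (browser quirk testing)
    ua_match_value: int  # "hladygwivaoha" column (UserAgent pattern match)


TABLE_1 = [
    Row("Old IE User", "IE 8-10", 3, 1),
    Row("New IE User", "IE 11", 2, 0),
    Row("Firefox/Chrome User", "FF/Chrome", 2, 2),
    Row("Security researcher", "IE 8-10", 2, 1),
    Row("Security researcher", "IE 11", 2, 0),
]


def triggers(row: Row) -> bool:
    """Apply the trigger condition stated above Table 1."""
    return row.quirk_value - row.ua_match_value == 2


for row in TABLE_1:
    outcome = "malicious content served" if triggers(row) else "cloaked"
    print(f"{row.target} ({row.user_agent}): {outcome}")
```

Both real IE rows satisfy the condition, and so does the last row, which is the non-IE engine presenting an IE 11 user agent.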

IP address filter evasion

The successful execution of the abovementioned JavaScript always results in an injection of an iframe pointing to the EK gate. The malicious JavaScript injects an iframe similar to what’s shown in Figure 9.

Figure 9: iframe injection by malicious JavaScript

This URL structure resembles others that were previously disclosed by sources like malware-traffic-analysis.net; however, as mentioned earlier, the way this iframe is injected in this campaign is entirely different from the previous mechanism, which simply injected a Flash file (<object>) into the HTML code.

It’s not easy to obtain the malicious content of these iframes because when we visit the compromised URLs from an IP address that belongs to Palo Alto Networks, the attacker’s server either does not respond or returns an empty 200 response. The same results occurred when we used different browsers with different versions, residential IPs in California, and then an Amazon EC2 instance. On November 16 we used a proxy service to redirect our traffic through IP blocks across the world and found that when we used an IP block from Turkey, the server returned the Angler EK’s landing page. The landing page looks similar to previously posted ones, and eventually redirected the victim browser to download a Flash file.

It is also interesting to note that many domain names hosting the EK gate pages, like filchnerkunstkring.diversityadvice[.]com or ullshift-vastreden.avimiller[.]org, have legitimate and benign root level domains, diversityadvice.com and avimiller.org. We suspect:

1) that the DNS nameservers of these domains are compromised and a rogue DNS record was created to point the malicious subdomain to the attacker’s server; and

2) that the credential that allows registering subdomains has been stolen. This kind of DNS compromise is popularly known as domain shadowing.
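As a rough illustration of how a shadowed hostname relates to its benign parent, the sketch below splits a defanged hostname into the attacker-created subdomain and the parent domain. It assumes the parent is a simple two-label domain such as example.com; suffixes like .com.au would need a Public Suffix List lookup, which is out of scope here. The function name is hypothetical.

```python
from typing import Tuple


def split_shadowed_host(hostname: str) -> Tuple[str, str]:
    """Split a hostname into (subdomain, parent domain).

    Assumption: the parent domain is the last two labels. This is wrong
    for multi-label public suffixes such as .com.au or .gov.cn.
    """
    host = hostname.replace("[.]", ".").lower().rstrip(".")
    labels = host.split(".")
    if len(labels) < 3:
        return "", host
    return ".".join(labels[:-2]), ".".join(labels[-2:])


print(split_shadowed_host("filchnerkunstkring.diversityadvice[.]com"))
# ('filchnerkunstkring', 'diversityadvice.com')
```

Grouping observed EK gate hostnames by the second element makes it easy to see which legitimate domains have had rogue subdomains added.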

In addition to the EK gate IP filtering, the compromised host seems to serve the malicious redirection scripts using similar IP filtering rules. We initiated requests from two clean machines using different outgoing IP addresses and the same user agent at almost the same time. The machine using one IP address consistently received a malicious page while the other only received clean HTML. It is particularly worthwhile to note that the attackers perform IP cloaking adaptively: we used one IP address range to scan the web for compromised sites, and after approximately two weeks of scanning, the attacker stopped serving malicious content to these IPs. We suspect that the attackers detected abnormal scanning behavior from the IPs and therefore cloaked themselves to avoid detection.

Timing-based evasion

It also appeared to us that the injected content switches on and off over the duration of our scan. After we discovered this behavior, we picked ten sites and significantly increased their scanning frequency to every ten minutes. Figure 10 shows the infection status of three of these ten sites over the course of 24 hours. The markings in the top portion indicate that the site’s malicious code was active during that time slot, while the markings in the bottom portion indicate that the site was benign, or dormant, during that time slot. It appears to us that nine out of ten sites share a similar (but not exactly the same) dormant/active pattern, as shown by the orange and blue dots, while the other site (www.grillman[.]com.au) shows a somewhat different pattern. We are not exactly sure why the injection exhibits such behavior over time, but our guess is that the malicious code intends to hide itself and give the website owner or security companies the illusion that the threat has been cleaned up.

Figure 10: Time-based cloaking of compromised hosts – Pacific Standard Time.

User Agent-based evasion

In addition to the User Agent checks inside the JavaScript code we described earlier, the servers also perform their own check of the visiting browser’s UA string. We found that unless we used a special user agent string (mimicking IE 8 and 9), we were unable to access the malicious content. This is yet another way that the attackers appear to be attempting to evade detection by web scanners.

Cookie-based evasion

Finally, the compromised site sets a cookie the first time the victim visits the site, and never sends the injected code a second time to a browser if it detects the same cookie on subsequent requests. We consider this one of the many mechanisms used to cloak the threat from security researchers, who may employ dynamic analysis approaches that visit the compromised sites repeatedly.

Organized Evolution

Detection evasion techniques are crucial to an attacker’s operation, but in time researchers will identify and expose them. To avoid being caught, attackers constantly evolve the compromised sites to further complicate the detection and prevention process. We list some of the more important changes we observed below.

EK Gate URL evolution

Continuous monitoring of the EK gate URLs (a result of domain shadowing) shows that they change frequently, at approximately half-hour to one-hour intervals. Our large-scale continuous scanning reveals that, at any given time, almost all compromised sites point to the same next-hop domain, but roughly every half an hour to an hour this domain changes completely and all infected hosts make the change at approximately the same time. Table 2 shows our scanning results for the source hostnames of the injected iframe URLs. Because we cannot get hold of a compromised host and capture the traffic ourselves, we can only suspect that this synchronized behavior indicates malicious C2 server(s) actively and continuously communicating with the compromised hosts to activate the switch to new EK gates.

Approximate switching time	src host of injected iframe
START	hxxp://perintprinssinhadezeu1.mirastravels[.]net
2015-11-13 00:38 AM	hxxp://gren1elintensiirto.bonihutchinson[.]com
2015-11-13 01:48 AM	hxxp://collatiesmuskambrette.tsm-nj[.]com
2015-11-13 02:17 AM	hxxp://orneginsanscritista.grownmanbody[.]com
2015-11-13 02:48 AM	hxxp://syvimpisubnumber.dura-tekllc[.]com
2015-11-13 03:50 AM	hxxp://qreplies-gabberhouse.curionemotorsports[.]com

Table 2: EK gate domain changes
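The gaps between the switching times in Table 2 can be computed directly. The sketch below assumes the timestamps are on a 24-hour clock (the table writes "00:38 AM", so the "AM" suffix is dropped here) and that they all fall on the same day; the function name is illustrative.

```python
from datetime import datetime

# Approximate switching times from Table 2, with the "AM" suffix dropped
SWITCH_TIMES = [
    "2015-11-13 00:38",
    "2015-11-13 01:48",
    "2015-11-13 02:17",
    "2015-11-13 02:48",
    "2015-11-13 03:50",
]


def switch_intervals_minutes(timestamps):
    """Return the gap in whole minutes between consecutive switch times."""
    parsed = [datetime.strptime(t, "%Y-%m-%d %H:%M") for t in timestamps]
    return [int((b - a).total_seconds() // 60) for a, b in zip(parsed, parsed[1:])]


print(switch_intervals_minutes(SWITCH_TIMES))  # [70, 29, 31, 62]
```

The resulting gaps of roughly 30 to 70 minutes are in line with the half-hour to one-hour switching period described above.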

Although the hostname changes frequently, we are able to confirm using passive DNS data that the IPs these domains resolved to are relatively limited, including 91.239.74.80 and 188.120.235.94. This indicates the attacker is reusing some IP resources behind the subdomain-fluxing mechanism.

Injected JavaScript evolution

Figure 11 illustrates the timeline of the injected JavaScript’s evolution. We denote the first version we saw (November 5) as Version 1.0. During our continuous tracking, we found that the EK’s malicious JavaScript code evolved into a slightly more advanced variant on November 11. In this version (1.1), the code previously used to detect a Firefox browser, i.e. (+[window.sidebar]), is gone. We suspect this line was removed because it might trigger traditional signature-based detection, as it is always written in plain text. On November 21, another version (1.2) appeared across all of the compromised sites and completely removed ActiveXObject initialization from the scripts. Upon observing this evolution we quickly adjusted our detection method to continue tracking the compromised pages. On November 27, the JavaScript changed again, this time to a completely different structure (version 2.0). Instead of concatenating tiny strings together, this version uses a very long string that it decodes into the iframe injection statement. A particularly interesting aspect of this variation is that the injected code targets all major browsers including Firefox and Chrome, with the exception that for IE 7, 8 and 9 it does additional checks via browser quirks to see if the declared userAgent matches the actual browser behavior. Then, on December 5, we found that the malicious code reverted to version 1, but with a slight tweak of statement order; it also changed the initialization of the variable zmomfokopbpbbi in Figure 7 from a static long random string to a concatenation of smaller strings.

Figure 11: Injected JavaScript evolution

In each of the evolutionary steps, almost all compromised sites we had identified presented the update at the same time. This timing provides further evidence that a C2 channel is likely maintained at all times between the attacker and the compromised hosts.

SWF/Binary file evolution

Continuous monitoring of the Flash files served by the EK revealed that they change slightly on a daily basis, and that VT has never seen these samples by the time we obtain them. We submitted the SWF file distributed on November 16 to VT, and the immediate detection score was 3, with low-confidence verdicts such as ‘behaves like Flash Exploit’. On December 3 we requested a rescan of the same file. This time VT gave a score of 11, with many major AV vendors picking up the detection.

Infection Vectors

In this section we explore some interesting common properties that the compromised hosts share. We demonstrate how we use this information to discover many more compromised websites.

Inferred infection vectors

Generally speaking, we found that the infections fell into two categories, indicating that there may be two infection vectors used to compromise the websites.

1) For a small portion of sites, the malicious script is injected at the very top of the HTML source code, before the opening <html> tag. One example is www.cxda[.]gov.cn. We think this is because the compromised host has an Apache or system-level vulnerability which was exploited by the attacker.

2) For most of the compromised websites, the malicious script is injected right after the opening of the <body> element, and all these injections happen on WordPress-powered websites. This type of infection also shares another common attribute – a cookie-setting snippet is installed just before the large malicious JavaScript file to make sure the malicious content is not served twice to a single victim, as described above.
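The two placement patterns can be told apart with a simple positional check on the page source. The sketch below is a rough heuristic under our own assumptions: it only looks at where the first <script> tag sits, does not parse HTML properly, and does not decide whether a script is malicious. The function name and labels are hypothetical.

```python
import re


def classify_injection(html: str) -> str:
    """Classify where the first <script> tag sits in the page source."""
    lowered = html.lower()
    script_pos = lowered.find("<script")
    if script_pos == -1:
        return "no-script"
    # Category 1: script appears before the opening <html> tag
    html_match = re.search(r"<html[\s>]", lowered)
    if html_match is not None and script_pos < html_match.start():
        return "before-html"
    # Category 2: script is the first thing after the opening <body> tag
    body_match = re.search(r"<body[^>]*>", lowered)
    if body_match and lowered[body_match.end():].lstrip().startswith("<script"):
        return "after-body-open"
    return "other"


print(classify_injection("<script>x</script><html><body></body></html>"))
# before-html
```

Benign pages can also place a script directly after <body>, so this label is only useful for grouping pages that are already known to carry the injection.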

Extending scans

Based on our injection vector inference, we extended our scan from newly registered domains to two additional categories of websites, greatly increasing the number of detections.

Collocated hosts

Since the attacker may exploit the same Apache/web server vulnerability on the same machine, we believe the hosts collocated with the known compromised sites have a higher chance of being compromised as well. Many hosting services host multiple websites on the same host and IP address (i.e. virtual hosting). During our daily scan of newly registered domains, we found a large number of compromised sites served by popular hosting services including GoDaddy, and found that some of them share the same IPs. Using passive DNS data, we were able to retrieve a sizable list of likely-vulnerable sites: those that are hosted on the same IPs as ones we had already detected. The list contains a total of 82,000 unique domains. Of these, approximately 65,000 domains are actively hosting websites and at least 3,880 of them are compromised.
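The collocation lookup amounts to grouping passive DNS records by IP and keeping the neighbours of known compromised domains. The sketch below shows that idea on made-up data; the record format, function name, example domains, and documentation-range IP addresses are all assumptions and do not reflect our actual passive DNS source.

```python
from collections import defaultdict


def colocated_candidates(pdns_records, known_compromised):
    """Return domains that share an IP with a known compromised domain.

    pdns_records: iterable of (domain, ip) pairs (hypothetical format).
    known_compromised: set of domains already detected as compromised.
    """
    domains_by_ip = defaultdict(set)
    for domain, ip in pdns_records:
        domains_by_ip[ip].add(domain)

    candidates = set()
    for domains in domains_by_ip.values():
        if domains & known_compromised:
            # Neighbours on the same IP become scan candidates
            candidates |= domains - known_compromised
    return candidates


records = [
    ("infected.example", "192.0.2.10"),
    ("neighbour-a.example", "192.0.2.10"),
    ("neighbour-b.example", "192.0.2.10"),
    ("unrelated.example", "192.0.2.99"),
]
print(sorted(colocated_candidates(records, {"infected.example"})))
# ['neighbour-a.example', 'neighbour-b.example']
```

The candidates still have to be scanned individually, since sharing an IP only raises the likelihood of compromise.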

WordPress sites

Based on the high percentage of WordPress websites present in the compromised site list, it is highly likely that the attacker is exploiting one or more WordPress vulnerabilities. However, to compromise these websites the attackers would first have to perform some type of reconnaissance. We theorized that the malware sample behavior collected in Palo Alto Networks WildFire could help us discover more of these infected websites. In WildFire scans, we identified many malware samples actively probing vulnerable WordPress sites by requesting their xmlrpc.php file. This file is linked to several vulnerabilities and hazards that have been previously disclosed. We collected such probing behavior from the WildFire history, which amounts to a total of 201K unique domains. We determined that 174K URLs were still alive and responding, and of these, our malicious web detection system identified 535 additional compromised sites.

Following the success of this scan, we further obtained a large list of websites using WordPress which contained almost 17 million sites and scanned them using the same system. This revealed over 84,000 compromised WordPress websites in total.

Infection lifecycle

Looking at how many websites are being compromised and how quickly their operators detect and remove the infections helps us better understand the lifecycle of Angler EK infections.

First, our daily scan reveals tens to hundreds of new compromised sites that have never previously been detected, as seen in Table 3 and Figure 12. These numbers suggest that this is still a very active threat.

	11/6	11/7	11/8	11/9	11/10	11/11	11/12	11/13	11/14	11/15	11/16	11/17
Accu.	214	285	467	472	549	593	704	880	932	1196	1211	1241
Incr.	NA	71	182	5	77	44	111	176	52	264	15	30

Table 3: Unique new compromised sites detected every day

Figure 12: Unique new and total compromised websites each day.
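The "Incr." row of Table 3 is simply the day-over-day difference of the "Accu." row. The sketch below recomputes it from the accumulated counts; the variable and function names are illustrative.

```python
# "Accu." row of Table 3, November 6 through November 17
ACCUMULATED = [214, 285, 467, 472, 549, 593, 704, 880, 932, 1196, 1211, 1241]


def daily_increments(accumulated):
    """Return the number of newly detected sites for each day after the first."""
    return [b - a for a, b in zip(accumulated, accumulated[1:])]


print(daily_increments(ACCUMULATED))
# [71, 182, 5, 77, 44, 111, 176, 52, 264, 15, 30]
```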

Before we make any statements about how quickly compromised websites are cleaned up, we would like to point out that the numbers discussed here are educated upper-bound estimates, because the injected scripts may simply be dormant for a long time. For example, we observed one site, ‘seorewolucja[.]pl’, that was first seen as infected on November 6 and followed the on-and-off infection pattern until 05:30 PST on November 15. After that the site remained clean of infection for three days, until the morning of November 18 when the injected script appeared again. Although the injected code looks similar before November 15 and after November 18, we cannot be sure whether the site owner disinfected their system and it was later re-compromised, or whether the infection simply stayed dormant for three days. This demonstrates how long the infection may stay dormant, and that we should not make hasty decisions about whether a site has been cleaned up and patched appropriately to prevent future infections.

To get a rough idea of the cleanup rate and status, we rescanned the entire infected population collected through November 16, a total of 5,234 unique URLs, every six hours. We aggregated the scanning results on November 18: our system found that 5,002 URLs were infected at least once during our scanning period, and a total of 396 sites never showed any infection behavior throughout the scans from noon on November 16 to November 18. Even if we consider all of these sites disinfected, they account for less than 8 percent of the entire infected population. When we checked this number again on November 19, the total number of clean sites had dropped to 377. This means that some of the 396 sites that we thought had been cleaned up were simply dormant from November 16 to 18. Since many of these sites were discovered in early November and possibly infected even earlier, the scanning results indicate that disinfection is happening very slowly.

Conclusion

Modern exploit kits are becoming harder to catch as they maneuver to avoid detection by security researchers. Particularly, the Angler EK boasts the following features:

- Targeted Exploitation: This family of JavaScript and iframe injections is targeted at specific configurations and/or geographic and IP distributions. The attack’s malicious scripts and servers use multiple techniques to target IE users and visits from IPs outside of the United States.
- Cloaking against researchers: The constantly evolving injected scripts try their best to identify malware researchers’ sandboxes and hide their malicious behavior from sandbox/emulated environments. The techniques used include browser fingerprinting using browser quirks, as well as IP and UserAgent checks.
- Frequent evolution and persistent control: Large-scale tracking of many compromised domains revealed that the attackers have persistent control over the compromised machines. We saw three major version changes in the injected scripts as well as hourly switches of the malicious EK gate domains over the course of one month. These actions cannot occur without continuous control of the compromised hosts. This contradicts the common assumption that the hosts are compromised at only one point in time and injected with malicious code once.
- Growing number of infections: According to our observations, newly compromised sites appear at a consistent rate of over 100 sites per day (this is a lower bound as we can only scan a limited number of websites per day), while older compromised sites do not seem to be disinfected promptly, if at all. This results in a steady increase in total active compromised sites, and this threat is still a long way from elimination.

Despite these challenges, we also found some consistent behavior patterns and limitations of this attack:

- Suspicious redirections: Although the redirection script may change, the redirection chain stays relatively stable. The EK is always served from a different WordPress-like domain and a Flash file is downloaded soon afterwards.
- Infrastructure reuse: Exploiting known WordPress vulnerabilities and weak DNS configurations for domain shadowing may be easy for the attackers; however, changing the exploit kit’s hosting server is relatively hard. This requires the attacker to physically control a new machine or move an existing machine. At least for now, we have never seen the attacker serve the actual EK file from a compromised machine, possibly to avoid bandwidth spikes or AV detection on the compromised sites.

We will continue to track down the compromised sites, learn more about modern exploit kits and offer maximum protection for our customers.

Get a list of the compromised domains analyzed in this research.