Executive Summary
Unit 42 researchers have been observing various malicious campaigns abusing either legitimate challenge and response services (such as Google’s reCAPTCHA) or deploying customized fake CAPTCHA-like validation. Recent security blogs on phishing campaigns and cybercriminals using reCAPTCHA and research papers like PhishTime and CrawlPhish show an increasing trend of CAPTCHA-protected phishing pages. Hiding phishing content behind CAPTCHAs prevents security crawlers from detecting malicious content and adds a legitimate look to phishing login pages.
In this blog, we show techniques to detect malicious content with security crawlers even in the presence of CAPTCHA evasion. In some cases, these techniques can even track and detect such campaigns. We see many malicious campaigns reuse CAPTCHA service keys, either to simplify their malware infrastructure or to avoid being blocked by the legitimate reCAPTCHA provider for creating too many CAPTCHA accounts and keys.
Our research paper “Betrayed by your Dashboard” (published in 2018 at TheWebConf) shows that web analytics IDs can be used to identify large-scale malicious campaigns, as attackers often use legitimate web analytics services. Here, we show how similar pipelines can be used to detect phishing pages through the association of CAPTCHA keys.
Looking at the top 10 most popular malicious CAPTCHA keys across broad phishing campaigns just over the last month, we blocked 7,572 unique URLs over 4,088 pay-level domains, protecting our customers from visiting them at least 202,872 times. At the same time, we see that such URLs are slower in appearing in third-party malicious feeds, presumably because of hidden phishing, scam and other malicious content.
Palo Alto Networks Next-Generation Firewall customers with Advanced URL Filtering and WildFire security subscriptions are protected against such sophisticated phishing campaigns.
CAPTCHA-Protected Phishing Campaigns
At Palo Alto Networks, we focus on how we can detect and track malicious campaigns across various domains and URLs.
Phishing Example for Apple ID Credentials
Let’s look at the example (hxxp://utem[.]com/[.]YSou8XI) of a long-running phishing campaign that we have been monitoring since July 2020. It has been pushing phishing pages and targeting Microsoft Outlook, Apple and other login pages. Users see the following CAPTCHA challenge when they visit the page.
After the user solves a standard reCAPTCHA challenge, the browser displays a classic phishing page, shown in Figure 2 below. In this example, phishing content was generated dynamically on the same page, but more often a top-level redirection occurs.
However, on the main page (before solving the CAPTCHA challenge), we observe the following sub-requests, which reveal the reCAPTCHA API key in their URL parameters:
Such identifiers can be parsed out and searched for on other pages, which gives us the ability to find other phishing pages. For example, a webpage using the same ID was pushing Apple ID phishing too.
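One way to parse such identifiers out of recorded sub-requests is sketched below in Python. It assumes that reCAPTCHA requests carry the site key in the "k" query parameter (widget frames) or the "render" parameter (the api.js loader), and that keys look like the 40-character "6L..." strings listed in this post. The function name and the sample URL are illustrative and are not taken from our pipeline.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumption: site keys look like the IDs listed in this post
# (40 characters, starting with "6L", base64url-style alphabet).
SITE_KEY_RE = re.compile(r"^6L[0-9A-Za-z_-]{38}$")

# Assumption: the key travels in "k" (widget frames) or "render" (api.js).
KEY_PARAMS = ("k", "render")


def extract_captcha_key(request_url):
    """Return the reCAPTCHA site key found in a sub-request URL, or None."""
    parsed = urlparse(request_url)
    if "/recaptcha/" not in parsed.path:
        return None
    query = parse_qs(parsed.query)
    for name in KEY_PARAMS:
        for value in query.get(name, []):
            # Values such as render=explicit are loading modes, not keys.
            if SITE_KEY_RE.match(value):
                return value
    return None


# Illustrative sub-request URL built around a key from the table in this post.
sample = ("https://www.google.com/recaptcha/api2/anchor"
          "?ar=1&k=6LcEthAUAAAAANLeILVZiZpPDbVwyoQuQ7c3qlsy&hl=en")
print(extract_captcha_key(sample))
```

Keys collected this way can then be looked up against other crawled pages to find further URLs that share the same identifier.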
Alternatively, CAPTCHA keys can be extracted from HTML. The example shown below was used in another recent Outlook phishing campaign:
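As a rough illustration of this kind of HTML extraction (the markup in the snippet is invented for the sketch and is not the campaign sample), the following Python code collects the values of the "data-sitekey" attribute, which is where reCAPTCHA widgets conventionally declare their key. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class SiteKeyParser(HTMLParser):
    """Collect every data-sitekey attribute value seen in the document."""

    def __init__(self):
        super().__init__()
        self.keys = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names before calling this method.
        for name, value in attrs:
            if name == "data-sitekey" and value:
                self.keys.append(value)


def extract_keys_from_html(html):
    """Return the list of CAPTCHA site keys declared in an HTML string."""
    parser = SiteKeyParser()
    parser.feed(html)
    return parser.keys


# Invented markup for illustration; the key is one of the IDs from this post.
page = """
<form action="/verify" method="post">
  <div class="g-recaptcha"
       data-sitekey="6LcJK64UAAAAAKwjDYyWpakQ_5aFAb34tK-EkiDA"></div>
</form>
"""
print(extract_keys_from_html(page))
```

Parsing attributes with an HTML parser rather than a single regular expression tolerates differences in quoting and attribute order, at the cost of missing keys that are only assembled by JavaScript at runtime.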
Such CAPTCHA keys are a strong signal for detecting malicious pages even without getting phishing content. Moreover, malicious CAPTCHA keys can be mined automatically using ground truth data and filtering pipelines similar to those presented in the paper Betrayed by Your Dashboard: Discovering Malicious Campaigns via Web Analytics. However, we noticed that such sophisticated malicious pages are slow to appear in third-party malware and phishing feeds. As such, more useful CAPTCHA keys come from manually verified ground truth data or from clustering CAPTCHA keys across an unlabeled feed of URLs.
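A minimal sketch of the association step might look like the following. It groups crawled URLs by the CAPTCHA key they expose and flags keys whose URLs are mostly present in a verified ground truth set. The thresholds, function name and example URLs are assumptions made for illustration; they are not the filtering pipeline from the paper.

```python
from collections import defaultdict


def score_keys(observations, known_malicious, min_urls=2, min_ratio=0.5):
    """Flag CAPTCHA keys that are mostly seen on known-malicious URLs.

    observations:    iterable of (url, captcha_key) pairs from a crawl
    known_malicious: set of URLs verified as malicious (ground truth)
    min_urls, min_ratio: hypothetical thresholds, to be tuned on real data
    """
    urls_by_key = defaultdict(set)
    for url, key in observations:
        urls_by_key[key].add(url)

    flagged = {}
    for key, urls in urls_by_key.items():
        ratio = len(urls & known_malicious) / len(urls)
        if len(urls) >= min_urls and ratio >= min_ratio:
            flagged[key] = ratio
    return flagged


# Made-up crawl data: KEY_A is shared by two verified phishing URLs.
observations = [
    ("http://a.test/login", "KEY_A"),
    ("http://b.test/login", "KEY_A"),
    ("http://c.test/shop", "KEY_B"),
    ("http://d.test/shop", "KEY_B"),
    ("http://e.test/form", "KEY_C"),
]
ground_truth = {"http://a.test/login", "http://b.test/login"}
print(score_keys(observations, ground_truth))
```

Once a key is flagged, any new URL exposing the same key becomes a candidate detection even if its phishing content stays hidden behind the challenge.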
Microsoft Phishing Example
Here we see op[.]g2yu-bere[.]xyz/?e=c2Nhc2VAY2l0Y28uY29t, where an attacker is attempting to phish for Microsoft account credentials. The CAPTCHA challenge makes it seem legitimate for both the users and security scanners. After the user solves the CAPTCHA, the attacker attempts to phish Office 365 credentials from the user.
Is It Only About Phishing?
In addition to various phishing campaigns, beginning in October 2020 we started to observe more scam campaigns and malicious gateways using CAPTCHA evasion. Often, they show CAPTCHA challenges only if they suspect automation by other means (for example, based on IP and browser versions).
Grayware Campaigns
Another category of malicious pages protected by CAPTCHA is grayware. Survey and lottery scams are some of the most common grayware pages. In exchange for a fake payment or chance at winning the lottery, the user is lured into disclosing sensitive information, including address, date of birth, banking information, annual income, etc.
Below is another example of a lottery scam page (win[.]click2win4life[.]com/api/offer) that uses CAPTCHA evasion with ID 6LfKnxEUAAAAAO1iXBX9FqL0w-68XqXGl3UPBF5p and attempts to collect user information.
Malware Delivery
We have seen recent examples of malware delivery pages abusing legitimate CAPTCHA services. For example, the URL hxxps://davidemoscato[.]com serves a malicious JAR file (PayeeAdvice_IN00231_Q1626801_32843.jar) that is hidden from security scanners by protecting the page with a CAPTCHA challenge.
Efficacy of CAPTCHA Signatures
We present the statistics of the 10 most popular malicious CAPTCHA IDs in a one-month period (April 18-May 18). The graph below shows the number of new detections per day for each ID. We see that on a daily average, 529 new URLs are found to use such malicious IDs. We received a total of 7,572 unique URLs from these top 10 IDs in a 30-day period.
We ranked their popularity using the number of unique detections per day. Because we see that attackers use these IDs for a long time – more than 250 days in some cases – they are robust indicators of malicious activity.
ID Live Days:
CAPTCHA ID | Unique 30-day detections | Avg detections / day | ID live (days) |
6LcEthAUAAAAANLeILVZiZpPDbVwyoQuQ7c3qlsy | 3,290 | 228 | 264 |
6LcJK64UAAAAAKwjDYyWpakQ_5aFAb34tK-EkiDA | 2,094 | 87 | 287 |
6Le-dsYUAAAAABJa32oIuo9LEPsur7OcBz-a9kyL | 1,132 | 42 | 294 |
6LfKnxEUAAAAAO1iXBX9FqL0w-68XqXGl3UPBF5p | 1,021 | 39 | 238 |
6Lc8-cQUAAAAAF60sMK0PjhPOA6ciyzy6cfnGcl0 | 784 | 38 | 294 |
6LeihuEUAAAAAEgMRhYQKQCxnJvsqIZnRghJAPcH | 222 | 42 | 182 |
6LezpHMUAAAAALunasQAvKdhRwFC1oqRE0OZW8f4 | 216 | 23 | 295 |
6LdkVo0aAAAAAN5yxjGbJPH39rF--s6ZVsl_LxzE | 201 | 10 | 43 |
6LdVFrgUAAAAAEMNq1ljl8HZSQ2sA8Hu6a8umPQr | 191 | 7 | 287 |
6LfrPbMUAAAAAF2DLXNWH8-s0Ln08lXtaX9k1tRC | 152 | 13 | 294 |
Table 1. Top 10 CAPTCHA IDs ranked by 30-day unique detections count.
It is also interesting to note that the three top-ranked CAPTCHA IDs alone account for 70% of the detections.
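The figures quoted here can be reproduced from the columns of Table 1. The short Python check below copies the two numeric columns from the table; the share of the top three IDs is computed against the sum of the table's 30-day detection column.

```python
# Columns copied from Table 1, in ranked order.
detections_30d = [3290, 2094, 1132, 1021, 784, 222, 216, 201, 191, 152]
avg_per_day = [228, 87, 42, 39, 38, 42, 23, 10, 7, 13]

# Share of the table's detections contributed by the three top-ranked IDs.
top3_share = sum(detections_30d[:3]) / sum(detections_30d)
print(f"Top 3 share: {top3_share:.0%}")            # Top 3 share: 70%

# Sum of the per-ID daily averages.
print(f"Daily average: {sum(avg_per_day)}")        # Daily average: 529
```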
Impact of Detections
Let’s look at the impact these detections have on our customers. For the same 30-day timeframe, we observe that our customers attempted to visit these pages at least 202,872 times. The graph below shows the number of visits to the 10 most popular malicious URLs. Six of them belong to grayware, and four belong to malware categories. The grayware page that collects user information for a chance at the lottery (win[.]omgsweeps[.]info) accounts for 51% of the customer visits to these malicious pages.
Other Detection Methods
We observe that CAPTCHA IDs are often not the only signal in the detected sites. In addition to the IDs, we can use some other methods to detect these malicious sites.
Static URL analysis: In some instances, we can identify malicious sites just by looking at the URL. Many campaigns reuse similar URL patterns, related domains, IPs or other signals. For example, because we have previously seen malicious URLs with the same pattern as runswift-besthighlyfile[.]best/ZW2RR5af4KcKjjWeJS2qTOgg92QyTjh7NL0_4Yv8R98, we can mark this URL as malicious.
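To make the idea of a reusable URL pattern concrete, here is a hedged Python sketch. The regular expression is a hypothetical generalization of the single example above (hyphenated hostname on the .best TLD followed by a long token path); it is not an actual production signature, and a real pattern would be derived from many confirmed samples.

```python
import re


def refang(url):
    """Undo the defanging used in this post so the URL can be matched."""
    return url.replace("[.]", ".").replace("hxxp", "http")


# Hypothetical pattern: hyphenated host under .best, then a token of
# 32 or more base64url-style characters as the whole path.
CAMPAIGN_PATTERN = re.compile(
    r"^(?:https?://)?[a-z0-9]+(?:-[a-z0-9]+)+\.best/[A-Za-z0-9_-]{32,}$"
)


def matches_campaign_pattern(url):
    """Return True if the URL fits the illustrative campaign pattern."""
    return CAMPAIGN_PATTERN.match(refang(url)) is not None


example = ("runswift-besthighlyfile[.]best/"
           "ZW2RR5af4KcKjjWeJS2qTOgg92QyTjh7NL0_4Yv8R98")
print(matches_campaign_pattern(example))
print(matches_campaign_pattern("example[.]com/login"))
```

A pattern this loose would also match benign URLs, so in practice it would be combined with other signals such as related domains, IPs or a known CAPTCHA ID.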
Traffic analysis: In a few cases, we can look into the HTML traffic for malicious activity. For example, the malicious page hxxps://syans2008[.]3dn[.]ru/news/barbi_princessa_rapuncel_skachat_igru/2013-10-23-1705 can be detected with the CAPTCHA ID 6LcpAwsUAAAAAPif4MyLJQVv7r5Nr1Wv31NB86C6 or with the YARA rule below.
rule Rule {
    strings:
        // $s1 is the text of $s2 written as comma-separated decimal character codes
        $s1 = "100, 111, 99, 117, 109, 101, 110, 116, 46, 99, 117, 114, 114, 101, 110, 116, 83, 99, 114, 105, 112, 116, 46, 112, 97, 114, 101, 110, 116, 78, 111, 100, 101, 46, 105, 110, 115, 101, 114, 116, 66, 101, 102, 111, 114, 101, 40, 115, 44, 32, 100, 111, 99, 117, 109, 101, 110, 116, 46, 99, 117, 114, 114, 101, 110, 116, 83, 99, 114, 105, 112, 116, 41"
        // Plain-text form: a new script element inserted before the running script
        $s2 = "document.currentScript.parentNode.insertBefore(s, document.currentScript)"
        $s3 = "s=d.createElement('script')"
    condition:
        // Match the encoded form alone, or both plain-text strings together
        $s1 or ($s2 and $s3)
}
When simulating client-side behavior, we observe the HTML traffic (SHA256: 781e16b89604cdcd37928009920654628cc95f6e1b34916fd47b880ff3c7cc92) that the page havnsardf[.]ga loads. The YARA rule above can uncover many cases of malicious JavaScript injections or downloads. This execution behavior is usually seen when attackers have taken over a web server and intend to inject malicious JavaScript from their own servers into the victim web server.
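The first string in the rule is the second string written as decimal character codes, which the Python sketch below verifies by decoding it. It also mirrors the rule's condition with plain substring checks as a simplified stand-in for a YARA engine; the function name and the sample content are illustrative only.

```python
# Strings copied from the YARA rule above.
S1_CODES = (
    "100, 111, 99, 117, 109, 101, 110, 116, 46, 99, 117, 114, 114, 101, "
    "110, 116, 83, 99, 114, 105, 112, 116, 46, 112, 97, 114, 101, 110, "
    "116, 78, 111, 100, 101, 46, 105, 110, 115, 101, 114, 116, 66, 101, "
    "102, 111, 114, 101, 40, 115, 44, 32, 100, 111, 99, 117, 109, 101, "
    "110, 116, 46, 99, 117, 114, 114, 101, 110, 116, 83, 99, 114, 105, "
    "112, 116, 41"
)
S2 = "document.currentScript.parentNode.insertBefore(s, document.currentScript)"
S3 = "s=d.createElement('script')"

# Decode the character codes back into text.
decoded = "".join(chr(int(code)) for code in S1_CODES.split(","))
print(decoded == S2)


def rule_matches(content):
    """Approximate the rule condition: $s1 or ($s2 and $s3)."""
    return S1_CODES in content or (S2 in content and S3 in content)


# Invented page content that contains both plain-text strings.
injected = "<script>var d=document;" + S3 + ";" + S2 + ";</script>"
print(rule_matches(injected))
print(rule_matches("<script>console.log('hello')</script>"))
```

Covering both the character-code form and the plain-text form lets the same rule catch the injection whether or not the attacker obfuscates it in this particular way.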
Using content analysis: In some cases, malicious phishing content is already present in the HTML, but just not shown, or a custom/fake CAPTCHA is used. Such pages are usually JavaScript-rich, and detectable with malicious JavaScript analysis used at Palo Alto Networks. For example, the malicious site, yourstorecentre[.]com, protected by CAPTCHA ID, 6LcA2tEZAAAAAJj7FTYTF9cZ4NL3ShgBCBfkWov0, contains the malicious JS with SHA256: 68687db7ae5029f534809e3a41f288ec4e2718c0bbdefdf45ad6575b69fed823, which is shown to be malicious when analyzed.
Finally, the simplicity of CAPTCHA signatures has the benefit of enabling earlier detection. For example, the site lowautocasion[.]es remained undetected in third-party vendor feeds by many standard methods until July 7, but was detected as malware by Palo Alto Networks Advanced URL Filtering using CAPTCHA signatures as early as May 18.
Conclusion
Mass phishing and grayware campaigns have become more sophisticated, using evasion techniques to escape detection by automated security crawlers. Fortunately, when malicious actors use infrastructure, services or tools across their ecosystem of malicious websites, we have a chance to leverage these indicators against them. CAPTCHA identifiers are one great example of such detection by association.
Palo Alto Networks continually monitors CAPTCHA IDs as one example of a malicious indicator, and we use it to detect phishing, malware and grayware pages. Palo Alto Networks Next-Generation Firewall customers with Advanced URL Filtering and WildFire security subscriptions are protected against such sophisticated phishing campaigns.
Signatures
Below is the list of the top 10 most popular CAPTCHA ID signatures for the period April 18-May 18.
6LcEthAUAAAAANLeILVZiZpPDbVwyoQuQ7c3qlsy
6LcJK64UAAAAAKwjDYyWpakQ_5aFAb34tK-EkiDA
6Le-dsYUAAAAABJa32oIuo9LEPsur7OcBz-a9kyL
6LfKnxEUAAAAAO1iXBX9FqL0w-68XqXGl3UPBF5p
6Lc8-cQUAAAAAF60sMK0PjhPOA6ciyzy6cfnGcl0
6LeihuEUAAAAAEgMRhYQKQCxnJvsqIZnRghJAPcH
6LezpHMUAAAAALunasQAvKdhRwFC1oqRE0OZW8f4
6LdkVo0aAAAAAN5yxjGbJPH39rF--s6ZVsl_LxzE
6LdVFrgUAAAAAEMNq1ljl8HZSQ2sA8Hu6a8umPQr
6LfrPbMUAAAAAF2DLXNWH8-s0Ln08lXtaX9k1tRC
Indicators of Compromise
The list of all IOCs can be found on GitHub.
Acknowledgements
We’d like to thank Unit 42 for helping us with this blog. Special thanks to Bahman Rostamyazdi, David Fuertes, Taojie Wang, Tao Yan and Hector Debuc for helping us with the data.