Discovering CAPTCHA Protected Phishing Campaigns



Executive Summary

Unit 42 researchers have been observing various malicious campaigns that either abuse legitimate challenge-response services (such as Google’s reCAPTCHA) or deploy customized fake CAPTCHA-like validation. Recent security blogs on phishing campaigns and cybercriminals using reCAPTCHA, and research papers like PhishTime and CrawlPhish, show an increasing trend of CAPTCHA-protected phishing pages. Hiding phishing content behind CAPTCHAs prevents security crawlers from detecting malicious content and adds a legitimate look to phishing login pages.

In this blog, we show techniques to detect malicious content with security crawlers even in the presence of CAPTCHA evasion. In some cases, these techniques can even track and detect such campaigns. We see many malicious campaigns reuse CAPTCHA service keys, either to simplify their malware infrastructure or to avoid being blocked by the legitimate reCAPTCHA provider for creating too many CAPTCHA accounts and keys.

Our research paper “Betrayed by Your Dashboard” (published in 2018 at TheWebConf) shows that web analytics IDs can be used to identify large-scale malicious campaigns, as attackers often use legitimate web analytics services. Here, we show how similar pipelines can be used to detect phishing pages through the association of CAPTCHA keys.

Looking at the top 10 most popular malicious CAPTCHA keys across broad phishing campaigns just over the last month, we blocked 7,572 unique URLs over 4,088 pay-level domains, protecting our customers from visiting them at least 202,872 times. At the same time, we see that such URLs are slower in appearing in third-party malicious feeds, presumably because of hidden phishing, scam and other malicious content.

Palo Alto Networks Next-Generation Firewall customers with Advanced URL Filtering and WildFire security subscriptions are protected against such sophisticated phishing campaigns.

CAPTCHA-Protected Phishing Campaigns

At Palo Alto Networks, we focus on how we can detect and track malicious campaigns across various domains and URLs.

Phishing Example for Apple ID Credentials

Let’s look at the example (hxxp://utem[.]com/[.]YSou8XI) of a long-running phishing campaign that we have been monitoring since July 2020. It has been pushing phishing pages and targeting Microsoft Outlook, Apple and other login pages. Users see the following CAPTCHA challenge when they visit the page.

Figure 1. CAPTCHA challenge.

After solving a standard reCAPTCHA challenge, the user sees a classic phishing page, shown in Figure 2 below. In this example, phishing content was generated dynamically on the same page, but more often a top-level redirection occurs.

Figure 2. Phishing page.

However, on the main page (before the CAPTCHA challenge is solved), we observe the following sub-requests, which reveal the reCAPTCHA API key in their URL parameters:

Figure 3. The sub-requests shown reveal the reCAPTCHA API key used in the URL parameters.

Such identifiers can be parsed out and searched for on other pages, which gives us the ability to find other phishing pages. For example, a webpage using the same ID was pushing Apple ID phishing too.
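Parsing the key out of sub-requests can be sketched as follows. This is a minimal illustration, not the pipeline used in this research: it assumes that reCAPTCHA widget requests carry the site key in a "k" query parameter (or in "render" for the script loader), and the function name and the sample URL are hypothetical.

```python
from urllib.parse import urlparse, parse_qs

# Assumption: the site key travels in "k" (widget frames) or "render" (api.js).
KEY_PARAMS = ("k", "render")
# Values of "render" that are loading modes rather than keys.
NON_KEY_VALUES = {"explicit", "onload"}


def extract_captcha_keys(request_urls):
    """Return the set of candidate reCAPTCHA site keys found in request URLs."""
    keys = set()
    for url in request_urls:
        parsed = urlparse(url)
        # Only look at requests that target a reCAPTCHA endpoint.
        if "/recaptcha/" not in parsed.path:
            continue
        query = parse_qs(parsed.query)
        for param in KEY_PARAMS:
            for value in query.get(param, []):
                if value not in NON_KEY_VALUES:
                    keys.add(value)
    return keys


# Hypothetical sub-requests; the key is one of the IDs listed in this post.
sample_requests = [
    "https://www.google.com/recaptcha/api.js?render=explicit",
    "https://www.google.com/recaptcha/api2/anchor?ar=1"
    "&k=6LfKnxEUAAAAAO1iXBX9FqL0w-68XqXGl3UPBF5p&hl=en",
    "https://example.com/static/logo.png",
]
print(extract_captcha_keys(sample_requests))
```

The extracted keys can then be compared against keys seen on other crawled pages to link pages that belong to the same campaign.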

Screenshot of a webpage phishing for Apple ID.
Figure 4. Phishing for Apple ID credentials.

Alternatively, CAPTCHA keys can be extracted from HTML. The example shown below was used in another recent Outlook phishing campaign:

Figure 5. HTML from a recent Outlook phishing campaign.
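Extraction from HTML can be sketched in a similar way. The example below assumes the common reCAPTCHA embedding, in which the widget element carries the key in a "data-sitekey" attribute; the markup in the sample is hypothetical and is not the HTML from Figure 5.

```python
from html.parser import HTMLParser


class SiteKeyParser(HTMLParser):
    """Collects the value of every data-sitekey attribute in a document."""

    def __init__(self):
        super().__init__()
        self.keys = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names before calling this method.
        for name, value in attrs:
            if name == "data-sitekey" and value:
                self.keys.append(value)


def extract_sitekeys(html):
    """Return all data-sitekey values found in the given HTML string."""
    parser = SiteKeyParser()
    parser.feed(html)
    parser.close()
    return parser.keys


# Hypothetical page fragment using one of the IDs listed in this post.
sample_html = """
<form action="/verify" method="post">
  <div class="g-recaptcha"
       data-sitekey="6LcEthAUAAAAANLeILVZiZpPDbVwyoQuQ7c3qlsy"></div>
</form>
"""
print(extract_sitekeys(sample_html))
```

Keys set only from JavaScript at runtime would not be found this way, which is one reason to also inspect sub-requests.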

Such CAPTCHA keys are a strong signal for detecting malicious pages even without retrieving the phishing content. Moreover, malicious CAPTCHA keys can be mined automatically using ground truth data and filtering pipelines similar to those presented in the paper Betrayed by Your Dashboard: Discovering Malicious Campaigns via Web Analytics. However, we noticed that such sophisticated malicious pages are slow to appear in third-party malware and phishing feeds. As such, more useful CAPTCHA keys come from manually verified ground truth data or from clustering CAPTCHA keys over an unlabeled feed of URLs.
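One way to picture the mining step is to group crawled URLs by CAPTCHA key and rank keys by how much verified malicious content they cover. The sketch below is a simplified illustration under stated assumptions, not the filtering pipeline from the paper: the function name, the "min_domains" threshold, the sample data and the use of hostnames instead of pay-level domains are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def rank_captcha_keys(observations, known_malicious, min_domains=2):
    """Rank CAPTCHA keys by overlap with verified malicious URLs.

    observations: iterable of (url, captcha_key) pairs from a crawl.
    known_malicious: set of URLs that were manually verified as malicious.
    Returns tuples (key, url_count, domain_count, malicious_ratio).
    """
    urls_by_key = defaultdict(set)
    for url, key in observations:
        urls_by_key[key].add(url)

    ranked = []
    for key, urls in urls_by_key.items():
        domains = {urlparse(u).hostname for u in urls}
        malicious = len(urls & known_malicious)
        # Keep keys with ground-truth hits that are reused across domains.
        if malicious and len(domains) >= min_domains:
            ranked.append((key, len(urls), len(domains), malicious / len(urls)))

    # Highest malicious ratio first, then most widely used.
    ranked.sort(key=lambda row: (row[3], row[1]), reverse=True)
    return ranked


# Hypothetical crawl results with placeholder keys.
crawl = [
    ("http://a.example/login", "KEY_A"),
    ("http://b.example/login", "KEY_A"),
    ("http://c.example/", "KEY_B"),
]
verified = {"http://a.example/login"}
print(rank_captcha_keys(crawl, verified))
```

Keys that also appear on popular benign sites would need to be filtered out before any of them is used as a signature.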

Microsoft Phishing Example

Here we see op[.]g2yu-bere[.]xyz/?e=c2Nhc2VAY2l0Y28uY29t, where an attacker is attempting to phish for Microsoft account credentials. The CAPTCHA challenge makes the page seem legitimate to both users and security scanners. After the user solves the CAPTCHA, the attacker attempts to phish Office 365 credentials from the user.

Screenshot of a CAPTCHA-protected phishing campaign seeking Microsoft account credentials.
Figure 6. Phishing page protected by CAPTCHA.

Is It Only About Phishing?

In addition to various phishing campaigns, beginning in October 2020 we started to observe more scam campaigns and malicious gateways using CAPTCHA evasion. Often, they show CAPTCHA challenges only if they suspect automation by other means (for example, based on IP address and browser version).

Grayware Campaigns

Another category of malicious pages protected by CAPTCHA is grayware. Survey and lottery scams are some of the most common grayware pages. In exchange for a fake payment or chance at winning the lottery, the user is lured into disclosing sensitive information, including address, date of birth, banking information, annual income, etc.

Screenshot of survey scams asking for sensitive information, including address, date of birth, banking information, annual income, etc.
Figure 7. Survey scam examples.

Below is another example of a lottery scam page (win[.]click2win4life[.]com/api/offer) that uses CAPTCHA evasion with ID 6LfKnxEUAAAAAO1iXBX9FqL0w-68XqXGl3UPBF5p and attempts to collect user information.

Screenshot of Click to Win 4 Life lottery scam.
Figure 8. Lottery scam examples.

Malware Delivery

We have seen recent examples of malware delivery pages abusing legitimate CAPTCHA services. For example, the URL hxxps://davidemoscato[.]com serves a malicious JAR file (PayeeAdvice_IN00231_Q1626801_32843.jar) that is hidden from security scanners by protecting the page with a CAPTCHA challenge.

Screenshot of malware hidden behind a verification page asking users to select all images with crosswalks.
Figure 9. Malware page protected by CAPTCHA.

Efficacy of CAPTCHA Signatures

We present the statistics of the 10 most popular malicious CAPTCHA IDs in a one-month period (April 18-May 18). The graph below shows the number of new detections per day for each ID. We see that on a daily average, 529 new URLs are found to use such malicious IDs. We received a total of 7,572 unique URLs from these top 10 IDs in a 30-day period.

Graph showing number of detections of top 10 IDs from April 18-May 18.
Figure 10. Daily detections of top 10 IDs in 30 days (April 18 - May 18).

We ranked their popularity using the number of unique detections per day. Because we see that attackers use these IDs for a long time – more than 250 days in some cases – they are robust indicators of malicious activity.

CAPTCHA ID                                Unique 30-day detections  Avg detections / day  ID live (days)
6LcEthAUAAAAANLeILVZiZpPDbVwyoQuQ7c3qlsy  3,290                     228                   264
6LcJK64UAAAAAKwjDYyWpakQ_5aFAb34tK-EkiDA  2,094                     87                    287
6Le-dsYUAAAAABJa32oIuo9LEPsur7OcBz-a9kyL  1,132                     42                    294
6LfKnxEUAAAAAO1iXBX9FqL0w-68XqXGl3UPBF5p  1,021                     39                    238
6Lc8-cQUAAAAAF60sMK0PjhPOA6ciyzy6cfnGcl0  784                       38                    294
6LeihuEUAAAAAEgMRhYQKQCxnJvsqIZnRghJAPcH  222                       42                    182
6LezpHMUAAAAALunasQAvKdhRwFC1oqRE0OZW8f4  216                       23                    295
6LdkVo0aAAAAAN5yxjGbJPH39rF--s6ZVsl_LxzE  201                       10                    43
6LdVFrgUAAAAAEMNq1ljl8HZSQ2sA8Hu6a8umPQr  191                       7                     287
6LfrPbMUAAAAAF2DLXNWH8-s0Ln08lXtaX9k1tRC  152                       13                    294

Table 1. Top 10 CAPTCHA IDs ranked by 30-day unique detections count.

It is also interesting to note that the three top-ranked CAPTCHA IDs alone account for 70% of the detections.
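The 70% figure can be reproduced from the per-ID counts in Table 1. The short calculation below assumes the share is taken over the sum of the ten "Unique 30-day detections" values (9,303), which is our reading of how the figure was derived; the variable names are illustrative.

```python
from itertools import accumulate

# "Unique 30-day detections" column of Table 1, in ranked order.
detections = [3290, 2094, 1132, 1021, 784, 222, 216, 201, 191, 152]

total = sum(detections)
# Running share of all detections covered by the top-k IDs.
cumulative_share = [running / total for running in accumulate(detections)]

top3_share = cumulative_share[2]
print(f"total={total}, top-3 share={top3_share:.1%}")
# prints "total=9303, top-3 share=70.0%"
```

The same running share, computed for every rank, corresponds to the kind of cumulative curve described in Figure 11.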

Graph of cumulative detections of CAPTCHA IDs.
Figure 11. Cumulative detections ordered by popularity of CAPTCHA ID.

Impact of Detections

Let’s look at the impact these detections have on our customers. For the same 30-day timeframe, we observe that our customers attempted to visit these pages at least 202,872 times. The graph below shows the number of visits to the 10 most popular malicious URLs. Six of them belong to grayware, and four belong to malware categories. The grayware page that collects user information for a chance at the lottery (win[.]omgsweeps[.]info) accounts for 51% of the customer visits to these malicious pages.

Bar graph showing number of customer visits to malicious web pages within 30 days.
Figure 12. Top 10 most visited malicious URLs by customers.

Other Detection Methods

We observe that CAPTCHA IDs are often not the only signal in the detected sites. In addition to the IDs, we can use some other methods to detect these malicious sites.

Static URL analysis: In some instances, we can identify malicious sites just by looking at the URL. Many campaigns reuse similar URL patterns, related domains, IPs or other signals. For example, we can mark the URL runswift-besthighlyfile[.]best/ZW2RR5af4KcKjjWeJS2qTOgg92QyTjh7NL0_4Yv8R98 as malicious because previously seen malicious URLs follow the same pattern.
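As an illustration of static URL matching, the sketch below checks a URL against one hand-written pattern. The pattern itself is an assumption derived only from the single example above (a hostname under the .best TLD and a path made of one 43-character URL-safe token); it is not a signature used by our products, and all names in the code are hypothetical.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern: a path that is a single 43-character URL-safe token.
TOKEN_PATH = re.compile(r"/[A-Za-z0-9_-]{43}")
# Hypothetical set of TLDs associated with the campaign.
SUSPECT_TLDS = {"best"}


def refang(url):
    """Undo the defanging used in this post ([.] and hxxp)."""
    url = url.replace("[.]", ".")
    if url.startswith("hxxp"):
        url = "http" + url[4:]
    return url


def matches_campaign_pattern(url):
    """Return True if the URL fits the hypothetical campaign pattern."""
    url = refang(url)
    if "://" not in url:
        url = "http://" + url  # urlparse needs a scheme to find the host
    parsed = urlparse(url)
    host = parsed.hostname or ""
    tld = host.rsplit(".", 1)[-1]
    return tld in SUSPECT_TLDS and TOKEN_PATH.fullmatch(parsed.path) is not None


print(matches_campaign_pattern(
    "runswift-besthighlyfile[.]best/ZW2RR5af4KcKjjWeJS2qTOgg92QyTjh7NL0_4Yv8R98"
))
print(matches_campaign_pattern("hxxps://example[.]com/index.html"))
```

A pattern this loose would produce false positives on its own, so in practice it would be combined with other signals such as related domains or IPs.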

Traffic analysis: In a few cases, we can look into the HTML traffic for malicious activity. For example, the malicious page https://syans2008[.]3dn[.]ru/news/barbi_princessa_rapuncel_skachat_igru/2013-10-23-1705 can be detected with the CAPTCHA ID 6LcpAwsUAAAAAPif4MyLJQVv7r5Nr1Wv31NB86C6 or with the YARA rule below.

rule Rule {
     strings:
         // $s2 written as comma-separated decimal character codes
         $s1 = "100, 111, 99, 117, 109, 101, 110, 116, 46, 99, 117, 114, 114, 101, 110, 116, 83, 99, 114, 105, 112, 116, 46, 112, 97, 114, 101, 110, 116, 78, 111, 100, 101, 46, 105, 110, 115, 101, 114, 116, 66, 101, 102, 111, 114, 101, 40, 115, 44, 32, 100, 111, 99, 117, 109, 101, 110, 116, 46, 99, 117, 114, 114, 101, 110, 116, 83, 99, 114, 105, 112, 116, 41"
         // Inserts element s before the currently running script
         $s2 = "document.currentScript.parentNode.insertBefore(s, document.currentScript)"
         // Creates the script element to be injected
         $s3 = "s=d.createElement('script')"
     // Match the encoded injection, or the plain injection together with the element creation
     condition: $s1 or ($s2 and $s3)
}

When simulating client-side behavior, we observe that the page havnsardf[.]ga loads HTML traffic (SHA256: 781e16b89604cdcd37928009920654628cc95f6e1b34916fd47b880ff3c7cc92) matching this rule. The YARA rule above can uncover many cases of malicious JavaScript injections or downloads. This execution behavior is usually seen when attackers have taken over a web server and inject malicious JavaScript from their own servers into the victim web server.
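The first string in the YARA rule is the second string written as decimal character codes, which can be checked with a few lines of Python. That the injected script rebuilds the text from these codes at runtime (for example with JavaScript's String.fromCharCode) is our assumption; the helper name below is illustrative.

```python
# $s1 and $s2 copied from the YARA rule above.
S1 = (
    "100, 111, 99, 117, 109, 101, 110, 116, 46, 99, 117, 114, 114, 101, "
    "110, 116, 83, 99, 114, 105, 112, 116, 46, 112, 97, 114, 101, 110, "
    "116, 78, 111, 100, 101, 46, 105, 110, 115, 101, 114, 116, 66, 101, "
    "102, 111, 114, 101, 40, 115, 44, 32, 100, 111, 99, 117, 109, 101, "
    "110, 116, 46, 99, 117, 114, 114, 101, 110, 116, 83, 99, 114, 105, "
    "112, 116, 41"
)
S2 = "document.currentScript.parentNode.insertBefore(s, document.currentScript)"


def decode_char_codes(encoded):
    """Turn a comma-separated list of decimal character codes into text."""
    return "".join(chr(int(code)) for code in encoded.split(","))


decoded = decode_char_codes(S1)
print(decoded)
print(decoded == S2)  # prints "True"
```

This is why the rule condition treats a match on the encoded string alone as sufficient, while the plain-text form must appear together with the script element creation.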

Content analysis: In some cases, malicious phishing content is already present in the HTML but not shown, or a custom/fake CAPTCHA is used. Such pages are usually JavaScript-rich and detectable with the malicious JavaScript analysis used at Palo Alto Networks. For example, the malicious site yourstorecentre[.]com, protected by CAPTCHA ID 6LcA2tEZAAAAAJj7FTYTF9cZ4NL3ShgBCBfkWov0, contains JavaScript with SHA256: 68687db7ae5029f534809e3a41f288ec4e2718c0bbdefdf45ad6575b69fed823, which is shown to be malicious when analyzed.

Finally, because detection with CAPTCHA signatures is simple, it has the benefit of catching new malicious pages early. For example, the site lowautocasion[.]es remained undetected by many standard methods on third-party vendor feeds until July 7, but was detected as malware by Palo Alto Networks Advanced URL Filtering using CAPTCHA signatures as early as May 18.

Conclusion

Mass phishing and grayware campaigns have become more sophisticated, using evasion techniques to escape detection by automated security crawlers. Fortunately, when malicious actors use infrastructure, services or tools across their ecosystem of malicious websites, we have a chance to leverage these indicators against them. CAPTCHA identifiers are one great example of such detection by association.

Palo Alto Networks continually monitors CAPTCHA IDs as one example of a malicious indicator, and we use it to detect phishing, malware and grayware pages. Palo Alto Networks Next-Generation Firewall customers with Advanced URL Filtering and WildFire security subscriptions are protected against such sophisticated phishing campaigns.

Signatures

Below is the list of the 10 most popular CAPTCHA ID signatures for the period April 18-May 18.
6LcEthAUAAAAANLeILVZiZpPDbVwyoQuQ7c3qlsy
6LcJK64UAAAAAKwjDYyWpakQ_5aFAb34tK-EkiDA
6Le-dsYUAAAAABJa32oIuo9LEPsur7OcBz-a9kyL
6LfKnxEUAAAAAO1iXBX9FqL0w-68XqXGl3UPBF5p
6Lc8-cQUAAAAAF60sMK0PjhPOA6ciyzy6cfnGcl0
6LeihuEUAAAAAEgMRhYQKQCxnJvsqIZnRghJAPcH
6LezpHMUAAAAALunasQAvKdhRwFC1oqRE0OZW8f4
6LdkVo0aAAAAAN5yxjGbJPH39rF--s6ZVsl_LxzE
6LdVFrgUAAAAAEMNq1ljl8HZSQ2sA8Hu6a8umPQr
6LfrPbMUAAAAAF2DLXNWH8-s0Ln08lXtaX9k1tRC

Indicators of Compromise

The list of all IOCs can be found on GitHub.

Acknowledgements

We’d like to thank Unit 42 for helping us with this blog. Special thanks to Bahman Rostamyazdi, David Fuertes, Taojie Wang, Tao Yan and Hector Debuc for helping us with the data.