Detecting Patient Zero Web Threats in Real Time With Advanced URL Filtering



Executive Summary

URL filtering solutions based on blocklists and databases are generally unable to catch patient zero web threats – in other words, malicious URLs that are being seen for the first time. This is partly because such classification is reactive (a malicious URL, domain or IP must be seen and allowed at least once before it gets analyzed and blocked), and partly because sophisticated malicious actors use cloaking techniques. Malware and phishing campaigns widely use one-time URLs, short-lived domains, bot detection and other measures to bypass security crawlers and scanners.

Our inline analyzers and machine learning techniques detected and blocked more than 500,000 patient zero URLs and about 200,000 malicious scanning requests in June-September. In this blog, we describe examples of the most noticeable campaigns from that data set and the types of stealthy techniques they use. We observe examples of malware delivery, command-and-control (C2) URLs, phishing links and grayware scams. The inline vantage point also gives us the ability to detect malicious scanning activity and attacking requests.

We detect thousands of patient zero URLs per day. Among these patient zero detections, Unit 42 researchers have found multiple campaigns that traditional detection crawlers are unable to catch because of the cloaking techniques deployed (e.g., browser-version cloaking, CAPTCHA protection). Moreover, we see abuse of legitimate hosting and file sharing platforms, as well as many child URLs of recently compromised websites. Such cases are hard to cover with blocklists, as the root domain usually has a high benign score. As such, it is not surprising that 40% of these patient zero detections remain uncaught by VirusTotal vendors even 48 hours after the first detection by our inline analyzers. This highlights the benefits of inline real-time analysis.

Palo Alto Networks Next-Generation Firewall customers with Advanced URL Filtering are protected against patient zero malicious campaigns similar to the ones described in this blog. All the mentioned malicious indicators (domains, IPs, URLs and hashes) are also covered by DNS Security and WildFire products.

How Do We Detect Patient Zero Web Threats?

Figure 1. Traditional URL filtering versus Advanced URL Filtering.

Figure 1(a) illustrates how a traditional URL filtering service works. Whenever a client visits a URL, the firewall holds the request for a short time and queries the URL category database to decide whether to allow or block it. If the request passes this database lookup, the firewall receives web content from the target host server and passes it to the client. After a delay, offline crawlers pick up the visited URL for analysis and crawl the same URL again. Only after crawling and analyzing the web contents will the URL category database be updated.

There are known weaknesses in this traditional URL filtering process. First, organizations are exposed to the very first attack, since a traditional URL filtering solution can only detect a threat after protected organizations have already seen it. Protection the second time and beyond is welcome, but a first attack can be extremely dangerous. For example, say a new URL appears in network traffic but is not blocked by the URL category database. Even if it looks suspicious – such as mybank-account-ma234[.]com shown in Figure 1(a) – the traffic may go through the firewall (depending on the firewall configuration) and the organization may be compromised. Second, malicious hosts can fool the offline crawlers that feed the database with techniques such as cloaking and CAPTCHA challenges. Offline crawlers might receive completely different web content from the malicious host than a real visitor does. In this way, malicious URLs can bypass offline crawler-based detections.

A system with both a URL database and real-time detectors, as shown in Figure 1(b), can address issues with the database-only method. With integrated inline detectors, it becomes possible to detect threats when they are first seen. When the client tries to visit a URL, a cloud-based system can check both the URL database and the real-time detectors before allowing traffic to go through the firewall. If a real-time detector identifies that a request involves security threats, it can block the traffic even if the URL is unknown or benign in the URL database. For example, in Figure 1(b), a new URL appears in the network, which is not in the blocklist of the URL category database. With the power of inline machine learning and pattern matching techniques, the real-time analyzers can still detect it as malicious and block it.
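The difference between the two flows in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration only: the verdict labels, the function names and the toy detector are assumptions made for this example and do not describe how Advanced URL Filtering is implemented.

```python
from typing import Callable, Dict, Iterable

# Hypothetical verdict labels; the post does not name the real categories.
MALICIOUS, UNKNOWN = "malicious", "unknown"


def database_only_allows(url: str, category_db: Dict[str, str]) -> bool:
    """Figure 1(a): allow anything the database does not already list as malicious."""
    return category_db.get(url, UNKNOWN) != MALICIOUS


def inline_allows(
    url: str,
    category_db: Dict[str, str],
    detectors: Iterable[Callable[[str], bool]],
) -> bool:
    """Figure 1(b): consult the database AND every real-time detector first."""
    if category_db.get(url, UNKNOWN) == MALICIOUS:
        return False
    # Any detector returning True blocks the request, even for unknown URLs.
    return not any(detector(url) for detector in detectors)


def toy_detector(url: str) -> bool:
    """Stand-in for an inline model: a banking keyword on a host containing digits."""
    host = url.split("/")[0]
    return "bank" in host and any(ch.isdigit() for ch in host)


new_url = "mybank-account-ma234.com"  # the Figure 1 example, refanged
empty_db: Dict[str, str] = {}         # the URL has never been seen before
print(database_only_allows(new_url, empty_db))           # first visit is allowed
print(inline_allows(new_url, empty_db, [toy_detector]))  # first visit is blocked
```

The point of the sketch is the control flow: in the database-only path an unseen URL defaults to "allow", while the inline path gives detectors a chance to veto before any content reaches the client.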

Palo Alto Networks Advanced URL Filtering, for instance, leverages multiple machine learning models for different types of threats. Our research paper “Innocent Until Proven Guilty,” published in 2021 at the 4th Deep Learning and Security Workshop (co-located with the 42nd IEEE Symposium on Security and Privacy), shows one of the novel deep learning paradigms deployed in Advanced URL Filtering. This framework has multiple benefits, such as resistance to adversarial noise and resistance to performance loss due to a data distributional shift. In other words, we improve the model robustness to out-of-distribution content by discovering noise-resistant and uniquely identifiable features of the modeled classes. In addition, Advanced URL Filtering incorporates millions of signatures generated via a continuous signature mining process with machine learning, as well as signatures added by Unit 42 researchers to cover specific malicious or suspicious campaigns.

Malicious Cloaking Campaigns

Malicious websites may deploy various techniques to hide their malware or phishing content from non-human visitors in order to avoid automated analysis by security scanners and crawlers, while maximizing the number of victims. For example, attackers may check for a residential IP range, target specific browser versions or even show a CAPTCHA challenge to make sure that the user is not an automated bot. Such techniques are commonly known as “cloaking.” Several recent research papers show the high effectiveness of client-side cloaking by phishing campaigns in addition to the classic server-side cloaking. For instance, campaigns may fingerprint and bypass known crawlers (PhishPrint: Evading Phishing Detection Crawlers by Prior Profiling by Acharya et al., USENIX Security 2021), or deploy various dynamic checks on the page (CrawlPhish: Large-scale Analysis of Client-side Cloaking Techniques in Phishing by Zhang et al., IEEE Symposium on Security and Privacy 2021).

While this is a limitation for crawling, it is not for inline analysis, which sees the original content in real time. Moreover, we still have the original URL to inspect in both cases – one thing that an attacker can’t hide, which may reveal malicious signs or links to known campaigns that are enough for a confident detection. In fact, by using static URL analysis, it is possible to track various malicious campaigns which use cloaking and are undetectable by other crawler-based analyzers.

Here, we focus on the popular cloaking trend of targeting mobile browsers only (as first discovered in PhishFarm: A Scalable Framework for Measuring the Effectiveness of Evasion Techniques Against Browser Phishing Blacklists by Oest et al., IEEE Symposium on Security and Privacy 2019). To identify malicious cloaking campaigns that employ this method, we used different browser profiles – including well-known mobile versions of web browsers (mimicking mobile OS platforms) – to repeatedly crawl a small fraction of our daily URL feeds. Then, we selected URLs that were detected as malicious after crawling with the mobile browser, but received a benign verdict after the desktop crawl. From June to August, we identified 15,948 malicious cloaked URLs using this method. Of these, 2,601 were also detected by our static URL analysis (similar to the one deployed in Advanced URL Filtering) and would otherwise have gone undetected by the desktop crawlers. In other words, 16.3% is a lower bound on the cloaking detection benefits of URL-only detectors, and this is based only on a single cloaking variant. Among detected malicious cloaking URLs, we indeed see expected examples of mobile-only phishing pages, which reveal phishing content only when we emulate a mobile browser. Other notable campaigns are described below.
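The selection step and the 16.3% figure can be reproduced with a short sketch. The verdict dictionaries and placeholder URLs below are hypothetical; only the two counts (15,948 and 2,601) come from the measurement described above.

```python
from typing import Dict, List


def mobile_only_detections(mobile: Dict[str, str], desktop: Dict[str, str]) -> List[str]:
    """URLs judged malicious on the mobile crawl but benign on the desktop crawl."""
    return sorted(
        url
        for url, verdict in mobile.items()
        if verdict == "malicious" and desktop.get(url) == "benign"
    )


# Hypothetical crawl verdicts for three placeholder URLs.
mobile = {"a.example/x": "malicious", "b.example/y": "malicious", "c.example/z": "benign"}
desktop = {"a.example/x": "benign", "b.example/y": "malicious", "c.example/z": "benign"}
cloaked = mobile_only_detections(mobile, desktop)

# Counts reported in the post (June to August).
cloaked_total = 15_948   # cloaked URLs found by the mobile-vs-desktop comparison
static_detected = 2_601  # of those, URLs the static URL analysis also flags
share = static_detected / cloaked_total

print(cloaked)           # ['a.example/x']
print(f"{share:.1%}")    # 16.3%
```

Only the URL whose verdict flips between the two crawls counts as cloaked; a URL that is malicious under both profiles is an ordinary detection.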

Multi-step Phishing Campaigns

First, in addition to classic phishing campaigns, we see a popular multi-step phishing campaign that serves a PDF file first with a malicious link hidden behind a fake CAPTCHA prompt, as shown in the figure below. Then, the link performs several redirections, which include more anti-bot checks.

Figure 2. Example of a popular fake-CAPTCHA PDF phishing campaign at aliceinformaticasrl[.]com/user/pages/20991962233.pdf

Cloaking Malware Delivery

Second, we see cloaking in malware delivery. URLs that distribute malicious APK downloads usually work only in a browser that presents the Android Chrome user agent. For example, cq6ydl[.]qp8u[.]com/yx/22238.apk was serving Android adware with SHA256: d3b0fbd6ff688034471e4400717742ffa21dcb1c909b0c1a1b2e82b34ae91d03. The download might not work for visitors whose browsers do not present the Android Chrome user agent.

In addition, we see malware delivery URLs that do not target mobile users but cloak against the default browsers used by Palo Alto Networks crawlers. Examples include the URLs below:

178stu[.]com/new2_r_login.exe?collcc=739845076&collcc=3630194067&
25697[.]xc[.]wenpie[.]com/down/matlab%202017a%20???????????????@1166_3054.exe
xz[.]duote[.]com[.]cn/softdown/minjiehf@_29773.exe
down2[.]abckantu[.]com/tui/tips/2/v1.2.0.17/tips2-4.exe
govole[.]info/d23c1cda8c7736f9842243148d5eaf5b/getfp.exe
48995[.]xz[.]dy008[.]com/acdiu/setup_2000.exe
48156[.]xc[.]zhongguohao123[.]com/down/%E6%B8%85%E5%8D%8E%E5%A4%A9%E6%B2%B3pccad2015%2064%E4%BD%8D%E7%A0%B4%E8%A7%A3%E7%89%88@1166_9653.exe
72jdxe[.]securedfile[.]ru/b2/3/7/888e525e633be262a4412eff50518a2f/SpywareTerminatorSetup.exe
24910[.]xc[.]wenpie[.]com/down/cfree@1577_2873.exe
48272[.]xz[.]dy008[.]com/czasd/Setup_2000.exe

Another campaign, which injects malicious VBScript (targeting Internet Explorer browsers) and attempts to run commands through the WScript shell, was also observed using server-side techniques to cloak against our crawlers, as shown in Figure 3 below.

Figure 3. Malicious VBScript injected into http://www[.]bjcslper[.]com/info/js/js/js/css/js/js/js/js/js/css/js/js/css/js/js/css/js/js/css/js/js/css/js/js/js/js/js/js/css/js/css/js/js/css/js/js/css/css/js/js/js/js/js/js/css/js/js/js/css/style.css

Malvertising URLs

Third, we see cloaking widely used in malvertising. In general, a crawler is more likely to be redirected to a malicious landing page when it uses a mobile browser, and potentially when it revisits the URL several times. URLs such as the ones below are rather complex and have a fingerprintable structure and recognizable patterns, which helps inline detection:

hxxp://best-targeted-traffic[.]com/install.php?pais=Unknown&unq=19o721145058oildxbc&version=1.7
click[.]imageperfect[.]in/lp/lp.php?urlid=2bccd82ee1&adst=152313&nsrc=5090&visitor_id=bmconv_20210809015727_8eea308d_b7d0_4637_9aa6_214ef468fbb9&siteid=2_to
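One way to picture a "fingerprintable structure" is to discard the per-visit values and keep only the script name and the set of parameter names. The sketch below does this with Python's standard urllib.parse; the function name and the choice of features are assumptions for illustration and are not the features used by the product.

```python
from typing import Tuple
from urllib.parse import parse_qsl, urlsplit


def structure_fingerprint(url: str) -> Tuple[str, Tuple[str, ...]]:
    """Return (script name, sorted parameter names), ignoring parameter values."""
    # Undo the defanging used in this post before parsing.
    refanged = url.replace("hxxp://", "http://", 1).replace("[.]", ".")
    if "://" not in refanged:
        refanged = "http://" + refanged
    parts = urlsplit(refanged)
    script = parts.path.rsplit("/", 1)[-1]
    names = tuple(sorted(name for name, _ in parse_qsl(parts.query, keep_blank_values=True)))
    return script, names


examples = [
    "hxxp://best-targeted-traffic[.]com/install.php?pais=Unknown&unq=19o721145058oildxbc&version=1.7",
    "click[.]imageperfect[.]in/lp/lp.php?urlid=2bccd82ee1&adst=152313&nsrc=5090"
    "&visitor_id=bmconv_20210809015727_8eea308d_b7d0_4637_9aa6_214ef468fbb9&siteid=2_to",
]
for example in examples:
    print(structure_fingerprint(example))
```

Two URLs from the same campaign would typically differ in identifiers such as unq or visitor_id while sharing this fingerprint, which is why structure can be matched without ever fetching the page.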

What About Patient Zero Web Threats?

Given that static URL detection adds coverage of cloaking campaigns, it is not surprising that Advanced URL Filtering produces first-time, real-time detections of cloaked malicious web pages. For example, the URL coursera-quiz-answers-quora.pageinternetinfo[.]pw was registered on June 19, 2021. On July 18, 2021, at 16:49:51.414 UTC, we first detected and blocked it inline in our customer traffic. VirusTotal first scanned this URL on July 22, 2021, at 22:28:45 UTC. However, as of August 19, 2021, at 04:10:43 UTC, when we requested a re-analysis (see Figure 5), no vendor in VirusTotal detected it. The domain uses cloaking techniques such as CAPTCHA before redirecting users to phishing or grayware websites (such as websites distributing unwanted browser extensions) of the type shown in Figure 4. Cloaking techniques could be the reason it remained undetected on VirusTotal.
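The intervals implied by these timestamps are easy to work out. The short calculation below uses only the dates quoted above; the variable names are illustrative.

```python
from datetime import date, datetime, timezone

registered = date(2021, 6, 19)  # registration date; no time of day is given
inline_block = datetime(2021, 7, 18, 16, 49, 51, 414000, tzinfo=timezone.utc)
vt_first_scan = datetime(2021, 7, 22, 22, 28, 45, tzinfo=timezone.utc)
vt_reanalysis = datetime(2021, 8, 19, 4, 10, 43, tzinfo=timezone.utc)

# Age of the registration on the day of the first inline block.
domain_age_days = (inline_block.date() - registered).days

# How long after the inline block VirusTotal first scanned the URL.
scan_lag = vt_first_scan - inline_block

# How long after the inline block the URL still had no vendor detections.
undetected_for = vt_reanalysis - inline_block

print(domain_age_days)      # 29
print(scan_lag)             # 4 days, 5:38:53.586000
print(undetected_for.days)  # 31
```

In other words, the registration was 29 days old when it was first blocked inline, the first VirusTotal scan came more than four days later, and the URL was still unflagged there a month after the inline block.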

Figure 4. CAPTCHA-protected page after redirecting from the malicious URL coursera-quiz-answers-quora.pageinternetinfo[.]pw

Figure 5. VirusTotal’s results for the URL coursera-quiz-answers-quora.pageinternetinfo[.]pw by August 19, 2021.

Ransomware Patient Zeros

Ransomware is one of the top threats in cybersecurity today. Unit 42 has been tracking ransomware threats for years. According to research we recently released on the behavior observed in common ransomware families in 2021, web browsing is the second most popular way to deliver ransomware.

To evaluate the patient-zero prevention efficacy of our techniques against ransomware, we tested the real-time analyzers in Advanced URL Filtering against all URLs observed delivering ransomware from January 2021 to August 2021. The result shows that the real-time analyzers in Advanced URL Filtering are capable of blocking 29.18% of these ransomware URLs. Those URLs cover 29.6% of all ransomware samples we collected during this period.

Please note that databases already cover many of the samples we used in our testing. The significance of the testing in this case is that it demonstrates that real-time analyzers can block almost a third of the ransomware delivery URLs we checked – even if they are patient zero web threats to our databases.

Figure 6 below shows the distribution of the ransomware families that can be captured with our real-time analyzers. As we found earlier, these ransomware variants are very diverse. However, our real-time detectors are able to capture a large variety of them.

Figure 6. Distributions of ransomware variants detected by Advanced URL Filtering.

We also measured how well our real-time detectors in Advanced URL Filtering can block different ransomware variants. As the figure below shows, the real-time detectors perform very well against TorrentLocker and Xorist, but still need to improve on the popular VirLock variant and other small ransomware families.

Figure 7. Real-time detector’s detection rate for different ransomware variants.

Compromised or Abused Websites

Compromised or abused websites (such as web and file hosting services) are usually popular websites, which makes it easier for attackers to reach more victims. Traditional URL filtering databases usually fail to block compromised or abused popular websites as the websites may host both legitimate and malicious content, and the malicious content changes URLs rapidly.

This can be addressed by adding inline detection against these patient zero threats, especially to protect against the rapidly changing malicious child URLs produced by abusing a web or file hosting service. For example, Advanced URL Filtering uses this technique to detect more than 100 compromised or abused websites daily on average. The following two examples show abuse of 000webhost and Discord.

Abuse of 000webhost

000webhost (000webhostapp.com) is one of the most popular web hosting services. It has been abused to host malware and phishing. The following example shows one piece of malware that has been hosted on this service. Malicious websites hosted on this service tend to have a very short lifespan.

http[:]//udskhhkdsjdjskjdds[.]000webhostapp[.]com/nnv.exe

  • SHA256 of downloaded sample: 335cf91959d1dcb04c2e68431300d06a62035e31daba1d19dbfcca0aa398bda2
Figure 8. Screenshot of URL http[:]//udskhhkdsjdjskjdds[.]000webhostapp[.]com, which hosts malware (e.g., nnv[.]exe) and has no index page set up.

Abuse of Discord

Discord (discordapp.com) is a VoIP, instant messaging and digital distribution platform designed for creating communities. Its content delivery network (cdn.discordapp.com) is where Discord hosts images and other files. It has at times been abused to store malware. The following figure shows two pieces of malware (Setup2.exe, Nitro_Gen.exe) stored in this service. Some malware samples are no longer accessible because of a permission change in Google Cloud storage.

Figure 9. Two pieces of malware downloaded from discordapp[.]com (download bar); one of the malware samples is no longer accessible due to a permission change (XML error).
https[:]//cdn[.]discordapp[.]com/attachments/873992598220599389/873994139908313148/Setup2.exe

  • SHA256 of downloaded sample: 0984c3c6a8ab0a4e8f4564ebcd54ab74ae2d22230afafe48b346485251f522e2

https[:]//cdn[.]discordapp[.]com/attachments/831792884545093653/834461595358199908/Nitro_Gen.exe

  • SHA256 of downloaded sample: f5b7b51ef8f1d1e76c86bd1e78d99c439c6e65361d4560b2c9e7345cebffdcca

Command-and-control (C2) URLs

Real-time detectors can also capture C2 URLs by recognizing specific patterns and signatures for various C2 families. Traditionally, researchers collect C2 indicators of compromise (IoCs) through analysis of malicious files and then publish them to a URL filtering database. The same analysis can, however, also feed real-time detectors. For example, with the vast malicious file database of WildFire, Unit 42 researchers are able to mine the patterns and signatures of the behavior of malicious files and apply them to Advanced URL Filtering. As a result, Advanced URL Filtering is able to capture new C2 URLs that are similar to what has already been observed and added to WildFire. During August, Advanced URL Filtering detected and blocked 1,685 first-time C2 communications in real time. Below are some examples.

C2/Sality-A is the threat name associated with the C2 servers used by members of the Sality malware family. The following are some detection examples of C2/Sality-A.

  • www[.]eri[.]edu[.]pk/images/logo.gif?213c963=209107026
  • st1[.]dist[.]su[.]lt/logoh.gif?397ab36=301357070
  • motherengineering[.]com/images/logo.gif%3f345a38=24016776
  • cart133[.]org/images/main.gif?1f3bc6f=163753515
  • cacs[.]org[.]br/novosite/logos.gif?5f4e290=499674320
  • web4m[.]de/wordpress/wp-content/themes/twentyfourteen/image.gif?2df1b=1505496
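The six URLs above share a visible shape: a .gif resource followed by a single query pair whose key is hexadecimal and whose value is decimal. In these six examples the decimal value also happens to be a small integer multiple of the key. The sketch below checks both properties. The pattern is inferred from this list alone; it is an assumption for illustration, not a published signature and not a statement about the Sality family in general.

```python
import re

# Shape inferred from the listed examples only: ".gif?<hex>=<decimal>".
# "%3f" covers the percent-encoded question mark seen in one example.
SALITY_LIKE = re.compile(r"\.gif(?:\?|%3f)([0-9a-f]+)=(\d+)$", re.IGNORECASE)


def sality_like(url: str) -> bool:
    """True if the URL ends in .gif?<hex>=<dec> and <dec> is a multiple of <hex>."""
    match = SALITY_LIKE.search(url)
    if not match:
        return False
    key = int(match.group(1), 16)
    value = int(match.group(2))
    return key > 0 and value % key == 0


listed = [
    "www[.]eri[.]edu[.]pk/images/logo.gif?213c963=209107026",
    "st1[.]dist[.]su[.]lt/logoh.gif?397ab36=301357070",
    "motherengineering[.]com/images/logo.gif%3f345a38=24016776",
    "cart133[.]org/images/main.gif?1f3bc6f=163753515",
    "cacs[.]org[.]br/novosite/logos.gif?5f4e290=499674320",
    "web4m[.]de/wordpress/wp-content/themes/twentyfourteen/image.gif?2df1b=1505496",
]
print(all(sality_like(url) for url in listed))           # True
print(sality_like("example[.]com/images/logo.gif?v=3"))  # False
```

The arithmetic check makes the rule far stricter than the regular expression alone, which matters on popular paths such as /images/logo.gif where benign cache-busting parameters are common.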

The Ursnif Trojan can collect the system activity of the victims, record keystrokes and keep track of network traffic and browser activity. The malware stores the data in an archive and then sends it to the C2 servers. The following are some detection examples of Ursnif.

  • 13[.]59[.]135[.]197/wp-includes/fqhw5-6k88r-dgufy.view/
  • 35[.]233[.]127[.]71/zjed1-iae7t-kdzwv.view/
  • 114[.]116[.]171[.]195/wp-includes/h5zf-65kb9-btmdu.view/
  • 119[.]9[.]136[.]146/ctkfp-ebmhpu-vifzs.view/
  • 13[.]127[.]110[.]92/wcs3-94yxcd-vpne.view/
  • 128[.]199[.]72[.]218[:]4700/wp-content/uploads/b4t7-uqcaw8-bvfis.view/
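The Ursnif examples also follow a regular shape: the final path segment consists of three short lowercase alphanumeric tokens joined by hyphens, followed by ".view/". A minimal matching sketch is below. The token lengths (4 to 6 characters) are read off the six listed URLs and are an assumption; real traffic may vary.

```python
import re

# Final path segment like "fqhw5-6k88r-dgufy.view/": three 4-6 character tokens.
URSNIF_LIKE = re.compile(r"/[a-z0-9]{4,6}-[a-z0-9]{4,6}-[a-z0-9]{4,6}\.view/$")


def ursnif_like(url: str) -> bool:
    """True if the URL ends with the three-token '.view/' segment seen above."""
    return bool(URSNIF_LIKE.search(url))


listed = [
    "13[.]59[.]135[.]197/wp-includes/fqhw5-6k88r-dgufy.view/",
    "35[.]233[.]127[.]71/zjed1-iae7t-kdzwv.view/",
    "114[.]116[.]171[.]195/wp-includes/h5zf-65kb9-btmdu.view/",
    "119[.]9[.]136[.]146/ctkfp-ebmhpu-vifzs.view/",
    "13[.]127[.]110[.]92/wcs3-94yxcd-vpne.view/",
    "128[.]199[.]72[.]218[:]4700/wp-content/uploads/b4t7-uqcaw8-bvfis.view/",
]
print(all(ursnif_like(url) for url in listed))                 # True
print(ursnif_like("example[.]com/wp-includes/index.view/"))    # False
```

Because the rule looks only at the path, it still applies when the host is a bare IP address that has never been seen before, which is the patient zero case.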

Malicious Newly Registered Domains

Newly registered domains are often employed to host phishing websites or malware. Attackers may host malicious content on the new domains anytime they want – sometimes after a delay. Until that happens, traditional URL filtering may label the domain as benign or unknown. However, URL-based analysis makes it possible to detect a newly registered domain the first time it hosts malicious content and appears in firewall traffic. For example, from mid-July to mid-August, Advanced URL Filtering successfully detected and blocked 3,594 sessions with newly registered domains.

More Phishing Campaigns

Phishing attacks lure people to click seemingly legitimate links, but redirect them to fake web pages to steal credentials. Attackers usually use camouflage techniques to make the phishing URLs look more legitimate. For instance, attackers may use squatting domains in the display URLs to trick users into clicking. However, this also exposes semantic information that a deep learning model can capture and analyze.

Advanced URL Filtering is empowered with multiple machine learning models trained with hundreds of millions of malicious and benign URLs to detect malicious URLs like those commonly associated with phishing campaigns. The examples below are some phishing attacks we detected recently.

The URL hxxp[:]//securty-supporrt[.]sun2seauvprotection[.]com[.]au/customer_center/customer-IDPP00C793/myaccount/signin/ is a phishing website targeting PayPal users as shown in the figure below. The website is likely compromised as the domain itself has been registered for years and points to a legitimate website.
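The hostname in this URL illustrates the kind of semantic information a squatting name leaks: "securty" and "supporrt" are near misses of common account-related words. The toy heuristic below surfaces such tokens with Python's standard difflib. It is a deliberately simple stand-in and not the deep learning model described above; the watchword list and the 0.85 cutoff are assumptions chosen for this example.

```python
import re
from difflib import get_close_matches
from typing import Dict

# Hypothetical list of words that phishing hostnames often imitate.
WATCHWORDS = ["security", "support", "account", "paypal", "signin"]


def misspelled_tokens(hostname: str) -> Dict[str, str]:
    """Map hostname tokens to the watchword they closely resemble but do not equal."""
    found = {}
    for token in re.split(r"[.\-]", hostname.replace("[.]", ".")):
        if token in WATCHWORDS:
            continue  # an exact match is not a misspelling
        close = get_close_matches(token, WATCHWORDS, n=1, cutoff=0.85)
        if close:
            found[token] = close[0]
    return found


host = "securty-supporrt[.]sun2seauvprotection[.]com[.]au"
print(misspelled_tokens(host))  # {'securty': 'security', 'supporrt': 'support'}
```

The long registered name (sun2seauvprotection) produces no match, which fits the observation that the domain itself is probably a compromised legitimate site and the suspicious signal sits in the added subdomain.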

Figure 10. Screenshot of phishing URL http[:]//securty-supporrt[.]sun2seauvprotection[.]com[.]au/customer_center/customer-IDPP00C793/myaccount/signin/

Detecting Scanning URLs

It is important to detect attacking or scanning patterns in URLs as well. Unlike other web threats, scanning or attacking URLs found in web traffic usually mean that a client has a rogue process running that probes or compromises legitimate web servers. In particular, this may indicate a host infected with malware that attempts to spread to other servers.

Traditional blocklist-based URL filtering platforms are not able to filter scanning and attacking traffic, as the attacker’s target URLs change frequently. Attackers search for and try to exploit vulnerabilities in remote servers, and until those servers are compromised they cannot be added to a blocklist, since they are still legitimate.

Using real-time detectors changes the equation, as scanning and attacking traffic can be blocked inline, avoiding the need to add legitimate target URLs to blocklists. For example, during the first three months after release (from June to August), Advanced URL Filtering captured and blocked several malicious scanning activities and probing behaviors. Below are several examples:

LokiBot scanning:
192.xx.xxx.xxx/sqladmin/fre.php
192.xx.xxx.xxx/axis2/axis2-admin/fre.php
192.xx.xxx.xxx/websql/fre.php
192.xx.xxx.xxx/phppma/fre.php
192.xx.xxx.xxx/mysql/dbadmin/fre.php
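What stands out in the LokiBot list is the same file name (fre.php) requested under many unrelated administrative directories. A simple way to surface that behavior is to group requests by host and file name and count the distinct directories. The function name and the threshold of three directories are assumptions for this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def repeated_filename_probes(
    requests: Iterable[str], min_dirs: int = 3
) -> Dict[Tuple[str, str], List[str]]:
    """Group by (host, file name); keep groups spread over at least min_dirs directories."""
    dirs_by_key = defaultdict(set)
    for url in requests:
        host, _, path = url.partition("/")
        directory, _, filename = path.rpartition("/")
        dirs_by_key[(host, filename)].add(directory)
    return {
        key: sorted(dirs)
        for key, dirs in dirs_by_key.items()
        if len(dirs) >= min_dirs
    }


observed = [
    "192.xx.xxx.xxx/sqladmin/fre.php",
    "192.xx.xxx.xxx/axis2/axis2-admin/fre.php",
    "192.xx.xxx.xxx/websql/fre.php",
    "192.xx.xxx.xxx/phppma/fre.php",
    "192.xx.xxx.xxx/mysql/dbadmin/fre.php",
]
print(repeated_filename_probes(observed))
```

For the five listed requests, the single group for fre.php spans five directories, well above the threshold, while ordinary browsing rarely asks one host for the same script under several admin paths.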

Ursnif scanning:
192.xxx.xx.xx/index.php/Index/images/images/calendar/images/css/css/css/js/css/js/images/user.gif
192.xxx.xx.xx/index.php/Index/images/images/calendar/images/css/css/css/js/css/js/images/logo.gif
192.xxx.xx.xx/index.php/index/css/images/calendar/js/css/js/js/js/js/js/images/key.gif
192.xxx.xx.xx/index.php/index/css/images/calendar/js/css/js/js/js/js/js/images/user.gif

Path traversal:
192.xx.xxx.xx/..../..../..../..../..../..../..../..../..../windows/win.ini
159.xxx.x.xx/iisadmpwd/..\..\..\..\..\../winnt/system32/cmd.exe?/c%2Bdir%2Bc:
212.xxx.xxx.xx/pbserver/..\..\..\..\..\../winnt/system32/cmd.exe?/c%2Bdir%2Bc:/%2B/OG
owner-experience.com/httpInboundTracking/modules/profile/user.php?aXconf%5Bdefault_language%5D=../../../../../../../../../../../../../../../../../../etc/passwd
212.xxx.xxx.xx/php.exe?c:\boot.ini
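The path traversal examples can be recognized from the request alone, without knowing anything about the target server. The sketch below decodes the URL, counts dot-dot hops written with either slash direction, and looks for a few well-known sensitive targets. The two patterns and the threshold of two hops are assumptions chosen to cover the five examples above; a production rule set would be broader and more careful about false positives.

```python
import re
from urllib.parse import unquote

# Two or more dots followed by a forward slash or a backslash ("../", "..\", "..../").
TRAVERSAL_HOP = re.compile(r"\.{2,}[/\\]")
# A few sensitive targets that appear in the listed requests.
SENSITIVE_TARGET = re.compile(r"win\.ini|boot\.ini|cmd\.exe|/etc/passwd", re.IGNORECASE)


def looks_like_probe(url: str) -> bool:
    """True for repeated traversal hops or a request for a sensitive system file."""
    decoded = unquote(url)
    hops = len(TRAVERSAL_HOP.findall(decoded))
    return hops >= 2 or bool(SENSITIVE_TARGET.search(decoded))


listed = [
    r"192.xx.xxx.xx/..../..../..../..../..../..../..../..../..../windows/win.ini",
    r"159.xxx.x.xx/iisadmpwd/..\..\..\..\..\../winnt/system32/cmd.exe?/c%2Bdir%2Bc:",
    r"212.xxx.xxx.xx/pbserver/..\..\..\..\..\../winnt/system32/cmd.exe?/c%2Bdir%2Bc:/%2B/OG",
    r"owner-experience.com/httpInboundTracking/modules/profile/user.php"
    r"?aXconf%5Bdefault_language%5D=../../../../../../../../../../../../../../../../../../etc/passwd",
    r"212.xxx.xxx.xx/php.exe?c:\boot.ini",
]
print(all(looks_like_probe(url) for url in listed))        # True
print(looks_like_probe("example.com/docs/../index.html"))  # False
```

Nothing in the check depends on the destination host, which is the property that lets an inline detector block the probe without ever listing the legitimate target server.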

Conclusion

To perform URL filtering more effectively, it’s important to go beyond the traditional approach of relying on a database of known threats. In this blog, we discussed how a malicious URL filter database can be combined with real-time detection capabilities to capture and prevent patient zero threats.

We used Advanced URL Filtering as the example. Powered by machine learning and the work of Unit 42 researchers, the inline analyzers used in the product have detected and blocked hundreds of thousands of patient zero URLs since release.

We also presented several campaigns captured by the real-time analyzers that the traditional URL filtering approach would likely miss, including ransomware, phishing and C2. Some campaigns leverage evasion techniques like cloaking to avoid being detected by offline crawlers. Some host the malicious content on popular web hosting services or newly registered domains to bypass URL blocklists. Finally, we showed the importance of blocking scanning activities with real-time analysis.

Palo Alto Networks Next-Generation Firewall customers with Advanced URL Filtering are protected against patient zero malicious campaigns similar to the ones described in this blog. All the mentioned malicious indicators (domains, IPs, URLs and hashes) are also covered by DNS Security and WildFire products.

Acknowledgment

We want to thank Jun Javier Wang, Kelvin Kwan, Erica Naone and Laura Novak for their invaluable input on this blog post.

Indicators of Compromise

Indicators of compromise including suspicious URLs, phishing URLs, indicators of malware behind cloaking, compromised/abused websites, C2/Sality-A and Ursnif Trojan can be found in our GitHub repository.