Threat actors register thousands of new domains daily, preparing for future malicious activities such as serving command and control (C2), hosting malware and delivering deceptive content. Palo Alto Networks employs state-of-the-art methods to detect emerging network threats and protect customers through a cloud-delivered domain denylist. The majority of existing domain abuse detectors focus on digging up DNS lookup patterns of ongoing attacks and actively crawling web content for malicious indicators. Because of visibility and resource limitations, they usually discover new threats late and thus fail to protect patient zero. In particular, to avoid being blocked, malicious domains usually conduct attacks only for a short period of time after the threats are hosted on them. As a result, it is often too late to block such domains after the malicious activity has been observed.
To detect potentially abused domains as quickly as possible and protect our customers, we developed a proactive system for Palo Alto Networks DNS Security to identify malicious domains at the time of registration based on their registration records. Our method leverages predictive indicators from WHOIS records that can expose abused network hotspots (e.g., registrars, name servers) and abnormal registration behaviors (e.g., bulk domain registration). Compared to a well-known publicly available online URL scanner (denoted as public-scanner going forward), our detector reduces the discovery time for malicious domains by 9.25 days on average. It achieves a five-times higher detection rate for suspicious newly registered domains (NRDs) from abnormally large registration campaigns compared to public-scanner.
Once the proactive detector captures a "will-be-malicious" domain, the knowledge is distributed from DNS Security to other Palo Alto Networks Next-Generation Firewall security subscriptions, including URL Filtering and WildFire.
To recognize malicious domains before their content launch, we needed to identify predictive features as indicators of abnormal behaviors by attackers, at the time of domain registration. The most common indicators include the specific network services favored by attackers due to cost, anonymity and censorship. Additionally, criminals usually launch their campaigns on thousands of domains registered in bulk to maximize profits and sustain attacks before the domains get blocked. Furthermore, malicious domain names also present unique lexical characteristics, such as using intimidating words, which are discussed below. All of these indicators can be extracted from WHOIS records, which are disclosed to the public once a domain’s registration is complete. Previous research has demonstrated that WHOIS information can effectively and accurately expose the domains potentially used for network abuse.
Based on the data available to us and our prior knowledge of network abuse, we leverage three groups of predictive indicators. The largest group of predictive indicators is the comprehensive reputation score of WHOIS records. Each domain's WHOIS record includes domain owners, registrars and name servers. Combined with the knowledge we accumulated during our continuous threat hunting, we can identify cybercriminal hotspots in the WHOIS dataset. To extract these indicators, we built a reputation system analyzing each field in WHOIS records. Then, the proactive detector calculates the reputation score of each NRD to quantify its similarity to confirmed malicious domains.
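The per-field reputation idea can be sketched as follows. This is a minimal illustration, not the production system: the field names in `FIELDS`, the smoothing parameters and the sample records are all assumptions made for the example. Each feature here is a smoothed historical malicious rate, so a higher value corresponds to a worse reputation.

```python
from collections import defaultdict

# Hypothetical WHOIS fields; the exact feature set used in production is not
# described in the post, so these names are assumptions.
FIELDS = ("registrant_email", "registrar", "name_server")


def build_reputation(labeled_records):
    """Count malicious and total domains seen for each (field, value) pair."""
    counts = defaultdict(lambda: [0, 0])  # (field, value) -> [malicious, total]
    for record, is_malicious in labeled_records:
        for field in FIELDS:
            value = record.get(field)
            if value is None:
                continue
            entry = counts[(field, value)]
            entry[0] += int(is_malicious)
            entry[1] += 1
    return counts


def malicious_rate(counts, field, value, prior=0.5, prior_weight=2.0):
    """Smoothed historical malicious rate; unseen values fall back to the prior."""
    malicious, total = counts.get((field, value), (0, 0))
    return (malicious + prior * prior_weight) / (total + prior_weight)


def reputation_features(counts, record):
    """One malicious-rate feature per WHOIS field for a newly registered domain."""
    return {f: malicious_rate(counts, f, record.get(f)) for f in FIELDS}


# Made-up history: two confirmed malicious domains and one benign domain.
history = [
    ({"registrant_email": "a@x.test", "registrar": "R1", "name_server": "ns1.test"}, True),
    ({"registrant_email": "a@x.test", "registrar": "R1", "name_server": "ns1.test"}, True),
    ({"registrant_email": "b@y.test", "registrar": "R1", "name_server": "ns2.test"}, False),
]
counts = build_reputation(history)
new_domain = {"registrant_email": "a@x.test", "registrar": "R2", "name_server": "ns1.test"}
features = reputation_features(counts, new_domain)
print(features)
```

The smoothing keeps a registrar or registrant with only one or two observed domains from being scored as 0% or 100% malicious; the resulting per-field rates could then be fed to a downstream classifier.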
From the reputation databases, we can identify hotspots abused by the darknet market. We directly capture the registrants of known malicious domains. For example, the registrant email email@example.com is the identity of an attack operator, as 85.14% of its domains are confirmed phishing hosting sites. As shown in Figure 1, one of its phishing domains, ophenhand[.]org, hosts a fake shared document requesting Microsoft Outlook and Office 365 account credentials. While the first login option only redirects to an official Microsoft site with an error message, the other two send victims' credentials to the attacker’s server through the URL ophenhand[.]org/ghose123354/next.php.
Attackers favor some service providers, including specific registrars and name servers, due to low cost and loose censorship. Therefore, particular service combinations could be indicators for potential malicious activities. For example, we observed a large cluster of malicious domains using the same services. Their registrar is a major internet service provider based in the Asia-Pacific region, the WHOIS server is discount-domain[.]com and the name server is zi3qe[.]com. Out of all the NRDs with this profile, 87.01% are categorized as malicious or adult. Most of the domains are generated by domain generation algorithms (DGA), producing results such as hfcclixb[.]xyz.
Apart from what can be seen in WHOIS, burst domain registration is another reliable indicator for future network abuse. Threat actors usually deploy their services on hundreds and thousands of domains to evade threat hunters. This enables them to switch to alternate domains quickly when the old ones are taken down. To control cost and reduce operation efforts, adversaries are more likely to buy domains from the same registrars in bulk with the same WHOIS information. Our detection pipeline clusters daily WHOIS data to reveal registration campaigns and feeds the cluster information into the verdict prediction models. Intuitively, the larger a campaign a domain belongs to, the more suspicious it is.
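A rough sketch of how a campaign-size feature might be derived from daily WHOIS data is shown below. The grouping key (registration date, registrant, registrar and name server) and the record layout are assumptions for illustration; the real pipeline may cluster on other fields or use fuzzier matching.

```python
from collections import defaultdict


def cluster_registrations(records):
    """Group same-day NRDs that share registrant, registrar and name server."""
    clusters = defaultdict(list)
    for rec in records:
        key = (rec["date"], rec["registrant"], rec["registrar"], rec["name_server"])
        clusters[key].append(rec["domain"])
    return clusters


def campaign_size_feature(records):
    """Map each domain to the size of the registration cluster it belongs to."""
    sizes = {}
    for domains in cluster_registrations(records).values():
        for domain in domains:
            sizes[domain] = len(domains)
    return sizes


# Made-up records: three domains registered in bulk plus one unrelated domain.
records = [
    {"domain": "shop1.test", "date": "2020-12-30", "registrant": "r1",
     "registrar": "RegA", "name_server": "ns.bulk.test"},
    {"domain": "shop2.test", "date": "2020-12-30", "registrant": "r1",
     "registrar": "RegA", "name_server": "ns.bulk.test"},
    {"domain": "shop3.test", "date": "2020-12-30", "registrant": "r1",
     "registrar": "RegA", "name_server": "ns.bulk.test"},
    {"domain": "blog.test", "date": "2020-12-30", "registrant": "r2",
     "registrar": "RegB", "name_server": "ns.other.test"},
]
sizes = campaign_size_feature(records)
print(sizes)
```

The cluster size is attached to every member domain, so the prediction model can treat membership in a large campaign as one signal among several.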
The last group of features focuses on the lexical characteristics of malicious domains. Some keywords, such as secure, alert and award, are commonly used by attackers to generate deceptive domains similar to squatting domains. These intimidating words tend to convince victims that the malicious domains are related to something legitimate, important or profitable. On the other hand, noticeably random domain names are likely generated by DGAs. These domain names are meaningless for humans, but widely leveraged to carry C2 traffic. We build separate language models for both known malicious and legitimate domains to evaluate the likelihood of an NRD being dangerous.
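One simple way to realize the two-language-model idea is a pair of character bigram models compared with a log-likelihood ratio. The sketch below is an assumption about how such a feature could be built, not a description of the production models; the class name, the add-one smoothing and the tiny training lists are purely illustrative.

```python
import math
from collections import Counter

# Characters expected in a domain label; other characters still work but
# make the smoothing slightly less exact.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-"


class CharBigramModel:
    """Character bigram model with add-one smoothing over domain labels."""

    def __init__(self, names):
        self.pair_counts = Counter()
        self.context_counts = Counter()
        for name in names:
            padded = "^" + name.lower() + "$"  # start and end markers
            for prev, cur in zip(padded, padded[1:]):
                self.pair_counts[(prev, cur)] += 1
                self.context_counts[prev] += 1
        self.vocab_size = len(ALPHABET) + 1  # alphabet plus the end marker

    def log_prob(self, name):
        """Log-probability of a label under the smoothed bigram model."""
        padded = "^" + name.lower() + "$"
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            numerator = self.pair_counts[(prev, cur)] + 1
            denominator = self.context_counts[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total


def maliciousness_score(name, malicious_model, benign_model):
    """Log-likelihood ratio: positive means closer to the malicious corpus."""
    return malicious_model.log_prob(name) - benign_model.log_prob(name)


# Toy corpora, far too small for real use.
malicious_model = CharBigramModel(["secure-login", "secure-alert", "account-secure"])
benign_model = CharBigramModel(["weather", "gardening", "recipes"])
print(maliciousness_score("secure-update", malicious_model, benign_model))
print(maliciousness_score("weather", malicious_model, benign_model))
```

With realistic training sets, the same ratio would score keyword-stuffed names close to the malicious corpus, while DGA-style random strings would tend to receive low probability under both models, which is itself a usable signal.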
Based on all of the above features, we train multiple supervised machine learning models to predict malicious domains from daily NRDs. Figure 2 shows the detection performance on the domains registered in December 2020. The system detected on average 500 malicious domains out of roughly 20,000 NRDs every day. The average daily detection rate is 2.56%. The following sections will illustrate how this predictive coverage provides significant protection, using statistics and real-world cases.
New malicious domains usually carry active attacks shortly after registration and are listed in public denylists later. In contrast, legitimate service developers typically buy their domains long before they release websites officially and serve many visitors. Figure 3 shows the DNS traffic distribution of suspicious domains captured by our proactive detector at the time of registration. We retrieve this DNS traffic from the passive DNS dataset. Of the DNS queries to these domains, 62.31% are issued within the first 10 days of activation. Only about 1% of traffic occurs more than 30 days after activation, which means most attacks are launched within the first month. Thus, it's crucial to detect malicious domains as soon as possible. Unlike most network abuse detectors, which are equipped to recognize ongoing attacks, proactive detection can block the malicious domains before they cause any damage.
To evaluate the benefit brought by our system, we cross-checked the public-scanner's coverage rate for the malicious DNS traffic detected by our system (Figure 4). A DNS query of a domain is considered to be covered by the public-scanner as long as one of its engines classifies the domain as malicious. As the detection time varies for separate domains and their DNS traffic distribution is different, the overall daily coverage fluctuates. However, the coverage rate trends upward over time. The public-scanner only blocks 8.23% of attack traffic on the registration day. The average coverage rate for traffic in the first 10 days is 17.14%. The public-scanner does not block 60% of the malicious DNS traffic until roughly 30 days after domain registration. In comparison, our proactive detector captures these domains 9.25 days earlier on average and covers 4.28 times more DNS traffic of these malicious domains.
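The coverage measurement described above can be sketched as follows, assuming passive DNS data is available as (domain, days since registration, query count) tuples and that each domain has at most one first-detection day. The function name and data layout are hypothetical.

```python
def daily_coverage(queries, detection_day):
    """Fraction of each day's DNS queries that hit an already-flagged domain.

    queries: iterable of (domain, day_offset, query_count), where day_offset
        is the number of days since the domain's registration.
    detection_day: dict mapping domain -> day offset of its first detection;
        domains that were never detected are simply absent.
    """
    totals, covered = {}, {}
    for domain, day, count in queries:
        totals[day] = totals.get(day, 0) + count
        detected_on = detection_day.get(domain)
        # A query counts as covered only if the domain was flagged by then.
        if detected_on is not None and detected_on <= day:
            covered[day] = covered.get(day, 0) + count
    return {day: covered.get(day, 0) / totals[day] for day in sorted(totals)}


# Made-up traffic for two domains, one flagged at registration, one on day 5.
queries = [
    ("a.test", 0, 10),
    ("b.test", 0, 30),
    ("a.test", 5, 20),
    ("b.test", 5, 20),
]
coverage = daily_coverage(queries, {"a.test": 0, "b.test": 5})
print(coverage)
```

Because each domain's traffic peaks on different days and detection arrives at different times, the per-day ratio computed this way can fluctuate even when the long-run trend is upward.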
C2 domain minorleage[.]top is an example illustrating the early detection advantage. The domain was registered on Nov. 13, 2020, and labeled as suspicious by our system. Its WHOIS record received a low reputation score because all domains registered by its registrant are confirmed malicious. Based on other publicly available information, the historical malicious rate of its registrant state, “Moskow,” is 74%, and that of its registrar is 44%. Palo Alto Networks WildFire observed it serving a Trojan campaign from Dec. 23, 2020, to Jan. 13, 2021. WildFire detected 298 pieces of malware in this campaign, performing penetration activities including stealing Windows vault passwords, accessing digital currency wallets and process injection. The malware resolved minorleage[.]top to three IP addresses (104.24.101[.]218, 104.24.100[.]218 and 172.67.167[.]27) hosting the C2 server. The malware then set up SSL connections to one of these addresses directly through port 443. After the initial communication, the C2 server sent a malicious payload of about 3.3 MB to the victims' machine. The JA3 fingerprint of the C2 connection (JA3: 6312930a139fa3ed22b87abb75c16afa, JA3S: 8685e43ade3e6ec8993efb5d149fb4bc) is widely used by the Sodinokibi ransomware. While the public-scanner started blocking the domain on Dec. 24, 2020, 68 pieces of malware had already been distributed by Dec. 23. Therefore, our proactive system's early detection can bring 23% additional coverage (68 of the 298 samples) against this campaign's C2 traffic. Apart from connections from observed malware, we found more than 1,000 DNS requests resolving minorleage[.]top to the C2 addresses as early as Dec. 16 from passive DNS. This suggests the threat actors had deployed the attacking infrastructure and started penetration activities even earlier.
Another real-world example is a phishing domain called penguinsac[.]com. The attacker registered this domain on Dec. 2, 2020. The proactive detector blocked it because the registrant is recognized as a dedicated threat actor. The domain hosts two fake login pages trying to steal victims' credentials for Microsoft OneDrive (Figure 5a) and Office (Figure 5b). It was labeled as a phishing domain by two vendors on Dec. 23 and by three other vendors in the public-scanner. However, the earliest passive DNS traffic was traced back to Dec. 15. We found that 10% of the total malicious DNS requests happened before any vendor's detection provided by the public-scanner.
Higher Coverage for Malicious Domain Registration Campaigns
To attract more traffic and avoid being blocked, gray service launchers usually buy hundreds of domains in a short period with the same registration information. Therefore, large clusters of similar NRDs could be indicators of network abuse. With comprehensive visibility on NRDs' WHOIS records, our system has the advantage of recognizing this suspicious behavior and achieves a higher coverage rate for malicious domain registration campaigns.
Figure 6 compares the coverage rate for different sizes of registration campaigns between our proactive detector and the public-scanner. Our pipeline groups NRDs with identical registrant, registrar and NS information into the same cluster. This figure displays the percentage of domains labeled as will-be-malicious by our detector at the time of registration. For comparison, we calculate the rate of domains that are detected by at least one vendor in the public-scanner one month after their registration.
While the coverage rates are similar for small clusters, our detector significantly improves coverage for campaigns with more than 500 domains. On average, our detection rate for NRDs belonging to these large registration campaigns is 21.44%, which is about five times higher than the public-scanner. This advantage appears for two reasons: first, the proactive system continuously scans daily NRDs, giving it broad visibility into suspicious domains, and second, our method calculates the correlation between NRDs to identify registration campaigns and considers this feature during abuse recognition.
Our detector captured a phishing campaign that registered four domains (kelvinso412[.]com, kelvinso45[.]com, kelvinso4[.]com and kelvinsoirnt98[.]com) on Dec. 30, 2020. These domains were bought from the same registrar simultaneously and use nameserver websiteserverbox[.]com from the same hosting service. All of them started pointing to the same phishing page three days after registration, receiving the highest daily visit count on Jan. 6, 2021. As shown in Figure 7, the attacker tried to steal the victims' Square credentials. We didn't see any vendors in the public-scanner that managed to block this attack completely. Although two of them labeled kelvinso45[.]com as phishing once it hosted malicious content, they didn't enforce the label consistently for the other three domains.
Unlike phishing campaigns that only involve a limited number of domains, gambling and adult campaigns are more likely to distribute through thousands of domains. These grayware websites usually employ automatic scripts to generate arbitrary domain names and register them in bulk. Our system captured one of these abuse campaigns during October and November 2020. Out of 11,831 NRDs with the campaign's WHOIS profile created during the same period, the proactive detector labeled 9,544 (80.67%) domains as suspicious, while the public-scanner only covered 498 (4.21%). In this campaign, we observed many Chinese adult domains, such as 99s13[.]xyz and fs10[.]xyz with one malicious count and one suspicious count in the public-scanner. However, we also observed many more NRDs with top-level domain (TLD) .xyz, such as 69av19[.]xyz and theav9[.]xyz, hosting similar content, despite being considered clean in the public-scanner.
Besides detecting sites involved in malicious activities directly, the proactive detector also provides innovative coverage for network abuse entry points. To maximize profits, darknet market actors, especially adult and gambling website operators, employ various methods to improve visibility and increase visits. One of the adversaries' strategies is redirecting traffic from many gateway domains, which they control either by registering the domains themselves or by purchasing traffic from the domain owners. These gateway websites aim to guide visitors to malicious landing sites, either by displaying deceptive links or redirecting visitors automatically.
It's not straightforward to detect domains used as gray service entrances. First of all, the launchers usually fill these websites with meaningless content or text crawled from legitimate publications such as news outlets. Furthermore, the attackers employ more sophisticated methods to conceal their intentions, such as hiding the malicious links in pictures and requiring a CAPTCHA before redirection. This makes it more challenging for content-based abuse detectors to trigger the suspicious redirection and observe the gateways' relationship to the underground services.
Instead of digging for deceptive content or links, our detector investigates these darknet market gateways from their registration information and discovers suspicious indicators. For example, the proactive system captured a gambling campaign registering hundreds of gateway domains on Dec. 10, 2020. The threat operators filled all their websites (e.g., hobbytoypark[.]com, jemstutoring[.]com, krk13pearland[.]com) with arbitrary articles copied from popular news outlets and cover images of best-selling books (Figure 8a). The text is meaningless and irrelevant to the pictures, so it does not directly indicate the hidden shady services. The landing domain, cc222[.]com, is not introduced explicitly but is attached to all images (Figure 8b). Since there is neither deceptive content nor malicious links on the page, these domains can escape content-based detectors that don't recognize the text in an image.
Our detector labeled this gray service campaign as suspicious based on predictive features at the time of registration. First, its WHOIS reputation score is low. The registrant information is redacted for privacy, while the registrar, conbin[.]com, has had 45.12% of its historical NRDs labeled as malicious. Furthermore, the NRD cluster algorithm grouped 842 domains serving this campaign that were registered within the same hour of the same day. This abnormal registration behavior is also a strong indicator of questionable activities.
At Palo Alto Networks, we keep close track of newly registered domains and proactively dig for potential cybercriminal activities, including C2, phishing and grayware hosting, as the majority of network attacks happen within a short period after malicious domain registration. Our system can protect patient zero, detect more suspicious domains from attackers' registration campaigns compared to public-scanner, and discover innovative malicious indicators.
Palo Alto Networks categorizes the detected domains as grayware through our security subscriptions for Next-Generation Firewalls, including URL Filtering and DNS Security. Our customers are protected against any damage from the risky domains mentioned in this blog as well as others captured by our system. Other malicious indicators (IP, URL, SHA256) are covered via the Next-Generation Firewall, URL Filtering, and WildFire, where applicable.
Special thanks to Laura Novak, Eddy Rivera, Jun Javier Wang, and Arun Kumar for their help with improving the blog.