Palo Alto Networks customers with the Next-Generation Firewall and the Advanced URL Filtering security subscription are protected against the sophisticated types of phishing attacks discussed here.
Many existing phishing detection systems rely on the presence of login forms, brand logos and similar signals within the HTML of a webpage to determine whether the page is a phishing page.
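To make this concrete, here is a minimal Python sketch of the kind of static, signal-based check described above. It is an illustrative assumption, not any particular vendor's engine. The brand keyword list and the helper names (StaticSignalParser, looks_like_phishing) are hypothetical, and the check only ever sees the raw HTML, never the rendered page.

```python
from html.parser import HTMLParser

# Hypothetical brand keywords; a real system would use a much larger, curated list.
BRAND_KEYWORDS = ("apple id", "dropbox", "microsoft", "paypal")


class StaticSignalParser(HTMLParser):
    """Collects simple signals from the raw (unrendered) HTML of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.has_password_field = False
        self.text_chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Attribute values can be None (e.g. <input disabled>), hence the `or ""`.
        if tag == "input" and (attributes.get("type") or "").lower() == "password":
            self.has_password_field = True

    def handle_data(self, data):
        # Note: html.parser also passes <script> bodies here as raw text.
        self.text_chunks.append(data.lower())


def looks_like_phishing(html: str) -> bool:
    """Naive static check: flag pages that have a password field AND mention a brand."""
    parser = StaticSignalParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_chunks)
    mentions_brand = any(keyword in text for keyword in BRAND_KEYWORDS)
    return parser.has_password_field and mentions_brand


plain_page = """
<html><body>
  <h1>Sign in to your Apple ID</h1>
  <form><input type="email"><input type="password"></form>
</body></html>
"""

# Cloaked variant: the form only exists after a script decodes and writes it.
cloaked_page = """
<html><body>
  <div id="app"></div>
  <script>document.write(decode("Q2lwaGVydGV4dA=="));</script>
</body></html>
"""

print(looks_like_phishing(plain_page))    # True
print(looks_like_phishing(cloaked_page))  # False
```

Because the cloaked page ships only an encoded payload and a document.write(...) call, the naive check finds neither a password field nor a brand name in the raw HTML, which is the gap the pages discussed in this post exploit.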
In Figures 1-2, we can see an example of a phishing webpage employing client-side cloaking techniques. This page claims that there has been some unusual activity on the user’s Apple ID account and that the user needs to process a refund related to the activity.
Upon first visiting the webpage, there is no phishing form immediately apparent. Only after the user clicks the “Confirm Refund Request” button is the credential-stealing form revealed. Since many crawler-based phishing detection systems cannot handle these sorts of interactions, these types of phishing pages can often pass undetected.
After investigating the source code behind the page, we see that most of the page content does not exist directly within the main body of the HTML. Rather, there is a large script tag at the bottom of the HTML source that uses the document.write(...) API to inject the bulk of the page content into the HTML document; this happens only after the page is rendered in the browser.
Note that this script is also highly obfuscated, likely in an attempt to evade phishing detection engines. The obfuscated payload is decrypted with AES before being passed to the document.write(...) call.
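One way to see why such pages trip up HTML-based engines, and what a script-aware model has to work with instead, is to pull a few simple statistics out of the inline JavaScript. The Python sketch below is a hypothetical illustration only, not the PhishingJS feature set. The indicator strings (for example, a CryptoJS-style AES.decrypt call, since the post does not say which AES library the page uses), the 40-character threshold for an "opaque blob" and all function names are assumptions.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical indicators of client-side decoding or decryption.
DECRYPT_HINTS = ("AES.decrypt", "atob(", "unescape(", "String.fromCharCode")
# Assumed heuristic: 40+ consecutive base64-alphabet characters look like an encoded payload.
OPAQUE_BLOB = re.compile(r"[A-Za-z0-9+/=]{40,}")


class InlineScriptCollector(HTMLParser):
    """Gathers the bodies of inline <script> tags from raw HTML."""

    def __init__(self) -> None:
        super().__init__()
        self._in_script = False
        self.scripts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data


def shannon_entropy(text: str) -> float:
    """Bits per character; encrypted or packed payloads tend to score high."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def script_injection_features(html: str) -> dict[str, float]:
    """Hand-picked statistics about inline JavaScript in a page's raw HTML."""
    collector = InlineScriptCollector()
    collector.feed(html)
    collector.close()
    code = "\n".join(collector.scripts)
    longest = max(OPAQUE_BLOB.findall(code), key=len, default="")
    return {
        "inline_script_chars": float(len(code)),
        "document_write_calls": float(code.count("document.write(")),
        "decrypt_hints": float(sum(code.count(hint) for hint in DECRYPT_HINTS)),
        "longest_blob_len": float(len(longest)),
        "longest_blob_entropy": shannon_entropy(longest),
    }


# Synthetic page mimicking the encrypt-then-inject pattern (made-up ciphertext).
payload = "U2FsdGVkX1" + "q8Zr3Lm0Pw7Yt2Hc5Vx9Nb4Ke6Jd1Fg" * 2
page = (
    f'<html><body><script>var p = "{payload}"; '
    "document.write(CryptoJS.AES.decrypt(p, k).toString(CryptoJS.enc.Utf8));"
    "</script></body></html>"
)
features = script_injection_features(page)
print(features)
```

A long, high-entropy base64-like string combined with a document.write(...) call and a decryption hint is a typical fingerprint of this pattern. Any single indicator on its own is weak, since legitimate sites also ship minified or encoded scripts, which is one reason a learned model is preferable to fixed rules like these.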
Sophisticated phishing pages like these may pose problems to traditional phishing detection engines. Therefore, we need to investigate additional machine learning (ML) techniques to classify pages like these as phishing.
Once we have a phishing score from the model, we apply various thresholds to reach a final verdict on whether to publish the given URL as phishing. The PhishingJS model contributes roughly 15,000 new phishing detections weekly, which means that Palo Alto Networks customers, specifically those who subscribe to Advanced URL Filtering, are protected from these sophisticated phishing attacks in the real world.
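The thresholding step can be sketched as follows. This is a hypothetical illustration: the two cut-off values, the intermediate REVIEW tier and the names Verdict and verdict_for are assumptions made for the example, since the production thresholds are not given in this post.

```python
from enum import Enum


class Verdict(Enum):
    PHISHING = "phishing"  # publish the URL as phishing
    REVIEW = "review"      # hypothetical middle tier, e.g. for manual review
    BENIGN = "benign"      # do not publish


# Hypothetical cut-offs, chosen only for illustration.
PUBLISH_THRESHOLD = 0.99
REVIEW_THRESHOLD = 0.90


def verdict_for(score: float) -> Verdict:
    """Map a model score in [0, 1] to a publishing decision."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score >= PUBLISH_THRESHOLD:
        return Verdict.PHISHING
    if score >= REVIEW_THRESHOLD:
        return Verdict.REVIEW
    return Verdict.BENIGN


print(verdict_for(0.99998))  # Verdict.PHISHING
print(verdict_for(0.95))     # Verdict.REVIEW
print(verdict_for(0.2))      # Verdict.BENIGN
```

Keeping the publish threshold high trades some recall for precision, which matters when a verdict is pushed to every customer; under these assumed cut-offs, a score like the 0.99998 reported for the Dropbox example would be published as phishing.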
Here are some sample detections that our model has generated.
First, we can see that the model can detect phishing pages employing client-side cloaking, such as when the page requires the user to click a button before actually revealing the credential-stealing form. In Figures 4-5, we see an example of a phishing webpage impersonating a Dropbox login page. The user must first click a “Sign in with Gmail” or “Sign in with Outlook” button before being presented with a modal asking for their login information. This particular URL was detected as phishing with a score of 0.99998, meaning that the model was very confident in marking this page as phishing.
We find that the model is capable of detecting highly interactive scam pages as well. In Figures 8-10, we show a scam page claiming that the user has won a free Samsung Galaxy S2; all the user needs to do to claim the prize is share the page with five groups or 20 friends on WhatsApp.
Each time the user clicks the “Share” button, the page opens WhatsApp and asks the user to share the page with another WhatsApp number. After each “share,” the blue progress bar advances farther to the right.
Once the user has shared the page with the requisite number of friends, the user is prompted to complete the final registration step. Presumably, after clicking “Complete registration,” the user will be shown a form asking for some sensitive information so that the scammers can “ship the free phone” (which, to be clear, does not exist) to the user.
Traditional phishing detection engines often struggle to detect the increasingly sophisticated phishing webpages that cybercriminals are now crafting. In particular, they often miss client-side cloaking, in which the phishing page either requires some user interaction before revealing the actual phishing content or waits until it is rendered in the browser before injecting that content into the HTML document.
The author would like to thank Wei Wang for helping to guide the PhishingJS project from start to finish; Wayne Xin and Jingwei Fan for helping to get the model into production; Brody Kutt for the original model architecture; and Seokkyung Chung, Yu Zhang, Zeyu You and Ziqi Dong for helping to review the model detections.