This post is also available in: 日本語 (Japanese)
Executive Summary
This article examines two specific issues in Google Kubernetes Engine (GKE). While each issue might not result in significant damage on its own, when combined they create an opportunity for an attacker who already has access to a Kubernetes cluster to escalate their privileges. This article serves as a crucial resource for Kubernetes users and administrators, offering insights on safeguarding their clusters from potential attacks.
The first issue we’ll discuss is the default configuration of GKE's logging agent FluentBit, which runs by default on all clusters. The second issue is the default privileges for Anthos Service Mesh (ASM), which is an optional add-on that customers can enable. ASM is Google's implementation of the Istio Service Mesh that controls service-to-service communication within a GKE environment.
If an attacker has the ability to execute in the FluentBit container (e.g., by discovering a remotely exploitable vulnerability in that component) and the cluster has ASM installed, they can create a single powerful chain to gain complete control of a Kubernetes cluster. Attackers can use this access to conduct data theft, deploy malicious pods and disrupt the cluster's operations.
Kubernetes is the most widely adopted open-source container platform used for application deployment and management. Due to its complexity, many Kubernetes environments are susceptible to security breaches due to misconfiguration and excessive privileges. In some cases, this can occur without any awareness from the customer, which can leave them exposed to security vulnerabilities.
Google fixed both configuration issues on Dec. 14, 2023, with GCP-2023-047.
Palo Alto Networks customers are better protected from the attacks described in this article in the following ways:
- Organizations can engage the Unit 42 Incident Response team for specific assistance.
- Prisma Cloud can help you protect your Kubernetes cluster from a variety of threats, including attacks that target system pods and add-on pods.
- Cortex XDR enhances the capabilities of SOC teams by providing a comprehensive incident narrative spanning the entire digital environment.
Related Unit 42 Topics | Kubernetes, Container Escape, Google Cloud |
Kubernetes and Google Kubernetes Engine (GKE)
Before delving into the issues and their exploitable nature, it is imperative to establish a common understanding of several fundamental Kubernetes concepts, as well as a basic overview of GKE capabilities.
Kubernetes Concepts
Kubernetes is a complex platform that is composed of a number of different concepts. These concepts are essential for understanding the attack scenario. For a better understanding of the attack scenario we will explain two of them: DaemonSets and role-based access control (RBAC).
DaemonSets
DaemonSets and Deployments are two Kubernetes controllers that are used to manage the creation and deployment of pods.
However, they have different purposes and use cases. Deployments are used to manage the creation and deployment of multiple instances of a pod across a Kubernetes cluster. This is useful for running stateless applications, such as web servers and database servers.
DaemonSets are used to ensure that a single instance of a pod is running on each node in a Kubernetes cluster. This is useful for running pods that provide essential cluster services, such as logging, monitoring and networking.
RBAC Permissions
Role-based access control (RBAC) is a security mechanism that provides a fine-grained access control mechanism for resources in a Kubernetes cluster. RBAC works by assigning users and groups to roles, and then granting those roles permissions to specific resources.
RBAC is an important security feature in Kubernetes because it helps prevent accidental privilege escalation and unauthorized access. By carefully configuring RBAC permissions, it is possible to ensure that users and groups only have the permissions that they need to perform their jobs.
GKE Features
GKE provides a number of features that are inherent to the platform, as well as some optional features that can be enabled. These features are designed to simplify the deployment and management of Kubernetes clusters. However, it is important to be aware of the potential security implications of these features.
FluentBit
FluentBit is a lightweight and efficient log processor and forwarder. Since Milestone 105 (March 2023), FluentBit has been the default logging agent in all GKE clusters, deployed by default as DaemonSets.
This means FluentBit is installed on each node in the cluster. It is enabled by default from Container-Optimized OS 109, and the legacy agent is disabled by default.
Anthos Service Mesh
Anthos Service Mesh is Google's implementation of the powerful Istio open-source project, allowing users to manage, observe and secure services without having to change application code.
The Next Generation of Second-Stage Cloud Attacks
Second-stage cloud attacks are a type of attack where the attacker has already gained some level of access to the Kubernetes cluster. The attacker will then look to spread into the cluster or escalate their privilege and they will search for misconfigurations or other vulnerabilities to do so.
In these cat and mouse games between attackers and defenders, container escape will continue to be a threat. Attackers will always try to find a way to escape and gain control over cloud environments. Cloud infrastructures and environments should be secure enough that even if an attacker succeeds in entering, they will not be able to do damage (or at least no significant damage).
Sometimes it will be possible to think that a certain misconfiguration is not necessarily a security matter or an issue that can affect your protected cloud environment. However, chaining several misconfigurations can lead to the creation of a strong exploit chain.
The two issues described in this post can be chained as a part of a second-stage attack to gain full control of a Kubernetes cluster.
The Privilege Escalation Finale
Prerequisite Step: FluentBit Exploit or Container to Node Escape
Since this is a second-stage attack, the attacker must first exploit the FluentBit container by discovering a remote code execution or arbitrary file read vulnerability, or otherwise breaking out of another container to gain access to the Node.
First Step: Exploit FluentBit Permissions To Read Projected Service Account Tokens
The first step of this chain exploits a misconfiguration, which is that the FluentBit container mounted the /var/lib/kubelet/pods volume. Below this directory, each pod running on a node has a kube-api-access volume that contains a projected service account token.
Figure 4 shows how this allows the FluentBit container to access any token of any of the pods on the node.
The kube-api-access volume contains the projected service account token for a pod to communicate with the Kubernetes API, which is a sensitive piece of information. If an attacker compromises the FluentBit pod, it would have access to its volume, and they could use any token of any pod on the node.
Using the pod token, the attacker can impersonate a pod with privileged access to the Kubernetes API server and gain unauthorized access to the cluster. In addition, the attacker could map the entire cluster, as they are able to list the running pods (using the get pods command).
Besides gaining unauthorized access to the cluster, an attacker can escalate their privilege or perform harmful actions. In fact, this gives the attacker a huge attack surface, depending on the permissions of the neighboring pods in the node.
The FluentBit container, by default and even generally, does not need direct access to the Kubernetes API server. This container can use either the Kubernetes infrastructure or its sidecar container. This is because the sidecar container’s primary purpose is to collect, parse and forward logs from the main application container. The sidecar container operates within the context of the pod and leverages the Kubernetes infrastructure to access log files and container runtime metadata.
Second Step: Exploit Istio Post-Installation Permissions
The second step of this chain exploits the fact that the ASM's Container Network Interface (CNI) DaemonSet retains excessive permissions post-installation. This allows an attacker to create a new pod with ASM's CNI DaemonSet permissions. When enabling Anthos Service Mesh, Istio-cni-node DaemonSet is installed in the cluster.
The Istio-cni-node DaemonSet is responsible for installing and configuring the Istio CNI plugin on each node in the cluster. As such, it has powerful permissions to perform these tasks. But once it's up and running, it won't need such extensive permissions.
The DaemonSet has two roles:
- Installing the CNI plugin. It does this by hostPath mounts and writing some files to the host FS (later read by Kubelet). This requires no RBAC, but does use a hostPath mount.
- A "repair" mode. This detects if pods were started without configuration and handles them. This requires some RBAC privileges to work.
The need for these permissions by ASM's CNI DaemonSet can allow an attacker to exploit the DaemonSet and gain unauthorized access to the cluster, for example, by creating a “powerful pod.”
Chaining the two issues we’ve discussed together allows an attacker to gain complete control over the Kubernetes cluster by escalating privileges to cluster admin.
Full Chain: Gaining Cluster Admin
After understanding the Kubernetes concepts and the issues, let’s see how we can leverage them to gain privileged access to the cluster as a cluster admin.
Prerequisite: Anthos Service Mesh Feature Is Enabled
Once the attacker has gained privileged access to the Kubernetes cluster – a task that can be done by taking control of the FluentBit container – an attacker can exploit the default configuration of a FluentBit container to mount the /var/lib/kubelet/pods volume. which has access to the kube-api-access-<random-suffix> directory. By doing so, the attacker will have all the tokens from all pods with a node.
The fact that FluentBit is a DaemonSet allows the attacker to search for any mounted tokens of any other pods in the cluster by repeating the initial compromise on each node. The attacker can map the entire cluster and find the Istio-Installer-container token.
The attacker will then take advantage of the ASM CNI DaemonSet's powerful permissions after the installation process is complete. The attacker will then create a new pod in the Kube-System namespace.
For this to be a meaningful privilege escalation, the attacker would need to target a powerful service account.
The Kube-System namespace offers a number of preinstalled, extremely powerful service accounts to choose from.
The clusterrole-aggregation-controller (CRAC) service account is probably the leading candidate, as it can add arbitrary permissions to existing cluster roles. The attacker can update the cluster role bound to CRAC to possess all privileges.
Figure 6 shows how the attacker will grant the CRAC’s service account in the pod’s YAML file and they will finally save the token in one of their own volume folders.
The CRAC token is now mounted to the new pod the attacker just created. The attacker can once again exploit the FluentBit misconfiguration and take the CRAC token, which has permissions to operate as a cluster admin by itself.
Fixes and Mitigations
FluentBit uses a hostPath volume mount of the /var/lib/kubelet/pods directory to let it read certain logs, which it needs to do its job. Before the fix, it was clear that the volume mount configuration included unnecessary access to the pod directory (including the projected service account tokens). The Google Security Team fixed and reduced FluentBit's access to only the logs required.
Regarding Anthos Service Mesh, Google was already aware of the high privileges associated with its CNI DaemonSet from an internal report. They were already in the process of fixing it and reducing its permissions when we reported it to them and this has now been fixed.
Unit 42 researchers were the first to combine the FluentBit vulnerability with ASM's CNI DaemonSet privileges to an attack chain that eventually allows escalating to cluster admin privileges.
Following our advisory and over the course of the last several weeks, Google deployed fixes and mitigations to FluentBit volume mount and Anthos Service Mesh high-privilege permissions. These prevent the reported attack and harden the platform against similar exploits.
Google addressed these issues by doing the following:
- They removed the /var/lib/kubelet/pod volume mount from the Fluent Bit pod, eliminating its access to the projected service account tokens for other pods.
- They modified ASM's ClusterRole and re-architected some of its functionality to remove excessive RBAC permissions.
Conclusion
Kubernetes is a powerful container orchestration platform that organizations of all sizes use to run their applications. However, system pods in some circumstances can be an unguarded area of an organization’s security.
Cloud vendors automatically create system pods when your cluster is launched. They are built in your Kubernetes infrastructure, the same as add-on pods that have been created when you enable a feature. This is because cloud or application vendors typically create and manage them, and the user has no control over their configuration or permissions. This can also be extremely risky since these pods run with elevated privileges.
This post demonstrates how an attacker can use two issues in system pods and add-on pods to escalate privileges and gain admin permissions.
Palo Alto Networks Protections and Mitigations
Prisma Cloud
Palo Alto Networks Prisma Cloud is a cloud security platform that are designed to help you protect your Kubernetes cluster from a variety of threats, including attacks that target system pods and add-on pods.
Prisma Cloud provides a variety of features that can help you to:
- Monitor and detect suspicious activity in your cluster.
- Identify misconfigurations and excessive privileges in system pods and add-on pods.
- Prevent attackers from exploiting misconfigurations and vulnerabilities in system pods and add-on pods.
By using Prisma Cloud, you can improve the security of your Kubernetes cluster and protect it from a wide range of threats.
Cortex
Cortex XDR enhances the capabilities of SOC teams by providing a comprehensive incident narrative which may span their entire digital environment. This is achieved through the integration of activity data from Kubernetes Nodes and Kubernetes API Server audit logs, as well as endpoint and network data. Cortex effectively uses this information to identify anomalous Kubernetes actions that align with established TTPs, including Kubernetes credential theft, cryptojacking, container escapes and other security threats.
If you think you may have been compromised or have an urgent matter, get in touch with the Unit 42 Incident Response team or call:
- North America Toll-Free: 866.486.4842 (866.4.UNIT42)
- EMEA: +31.20.299.3130
- APAC: +65.6983.8730
- Japan: +81.50.1790.0200
Palo Alto Networks has shared these findings with our fellow Cyber Threat Alliance (CTA) members. CTA members use this intelligence to rapidly deploy protections to their customers and to systematically disrupt malicious cyber actors. Learn more about the Cyber Threat Alliance.
Disclosure Timeline
- Sep. 12, 2023: Palo Alto Networks submitted a report Dual Privilege Escalation Chain to Google
Cloud Vulnerability Reward Program regarding this issue. - Sep. 13, 2023: Google Security Team accepted the report as a security issue.
- Oct. 24, 2023: Google Security Team accepted the report as a bypass of significant security controls.
- Nov. 24, 2023: Palo Alto Networks notified Google of the intention to publish an article and offered the opportunity for fixes and input on the article.
- Dec. 11, 2023: Google Security Team sent inputs on the article.
- Dec. 14, 2023: Google Security Team fixed the security issues.
Additional Resources
- Kubernetes Privilege Escalation: Excessive Permissions in Popular Platforms – Unit 42, Palo Alto Networks
- Container Escape to Shadow Admin: GKE Autopilot Vulnerabilities – Unit 42, Palo Alto Networks