Gaining Visibility Within Container Clusters


Executive Summary

A service mesh platform is a dedicated infrastructure layer that allows for the granular control of how applications share data. A standard use case could be to control the flow and rate of network traffic to a new version of a production web-based application. When the new web application is brought online, it is important to test the application to ensure it can handle a particular level of performance without failing. Service mesh platforms can be leveraged to facilitate how network traffic flows within a network and can be used to load balance the network traffic for a set of web-based applications.

However, service mesh platforms can also provide more insight into what has been a relative black hole for security practitioners within microservice environments: the dynamic monitoring of container processes and their network operations within Kubernetes (K8s) clusters. This blog will shed light on the threats facing cloud microservice architectures and how service mesh platforms like Istio, Linkerd and Consul can enhance the visibility of security incidents occurring within K8s clusters.

Development operations (DevOps) teams design containerized applications to be dynamically scalable, stateless, immutable and ephemeral, as is required to operate and maintain profitability within today’s evolving cloud computing landscape.

However, malicious attacker groups like TeamTNT, WatchDog and Nobelium (the group behind the SolarWinds Orion supply chain attack) have been observed directly targeting containerized applications. These attacks have resulted in the successful compromise of these ephemeral and immutable instances and have allowed the actors to escape compromised containers and launch additional containers designed to perform other nefarious actions.

Sadly, security operations teams are often none the wiser to these activities, as visibility within the K8s cluster is frequently lacking.

TeamTNT is a prime example of a threat actor targeting K8s applications, specifically in its approach to Atlassian’s Confluence application. TeamTNT actors are actively using an exploit for CVE-2021-26084, but they are not the only actor group targeting this application. As of this writing, Shodan lists 976 individual internet-facing Atlassian Confluence servers around the world (see Figure 1), giving TeamTNT and other actor groups several opportunities to compromise cloud systems. While these servers are exposed, they may not be vulnerable to the given exploit.

Figure 1. Shodan results for Atlassian’s Confluence. Total results: 976. Top countries: United States (466), Germany (142), Ireland (78), China (65), Australia (50).

The Atlassian Confluence example paints a vivid picture of active cryptojacking attacks against cloud K8s containers, at least from the perspective of TeamTNT. However, exposed Atlassian Confluence applications running within cloud containers only represent one application.

Can more popular containerized applications present similar risks? Researchers analyzed the top 20 downloaded container images as listed on Docker Hub. Notable in this analysis, Atlassian’s Confluence is not on that list. But little is known about whether attacks are being performed against any of the more popular and potentially more exposed cloud-based applications such as web service applications or database applications. This is the foundation for stating that more visibility is required in order for security resources to detect the threats facing microservice infrastructure.

Prisma Cloud’s Defender agent offers one step toward the type of visibility discussed.

Service Mesh Architecture

Service mesh platforms, like those provided by Istio, Linkerd and Consul, have been primarily used by DevOps to provide a means to secure the network communication pathways between interconnected microservices hosted within a K8s cluster. Service mesh platforms provide features such as reliability, security and observability and make the process of organizing, monitoring and securing each individual microservice more maintainable and manageable, and thus more secure.

Large enterprise organizations already use service mesh architecture within their K8s environments. This can be seen from HG Insights, which records and displays a relatively anonymous view of the various technologies that organizations employ. As of this writing, the data shows:

Consul – 5,588 organizations
Istio – 3,473 organizations
Linkerd – 643 organizations

While organizations are using service mesh architectures, it is unclear if or how these organizations are integrating the telemetry insight these mesh architectures offer with their already existing security operations center (SOC) tool sets.

What is known is that service mesh platforms can provide a pathway for security teams to gain visibility within their K8s clusters, by leveraging the same features already used to mirror traffic while performing load balancing operations.

Before we dive into how to leverage service meshes for more insight, I want to talk about why we as an industry need to gain additional insight in the first place.

The Wider Ecosystem

Service mesh platforms can assist security teams in monitoring the actions of a container after it has been compromised. So how can an ephemeral, immutable and stateless container be compromised?

TeamTNT targeting Atlassian Confluence applications hosted within K8s clusters using known exploits targeting that application is a great example. However, are there other known exploits that can be leveraged to target more popular containerized applications?

The following figures illustrate the number of known exploits associated with each of the top 20 containerized applications (Figure 2), and the number of known exposed instances associated with that application (Figure 3). Exploit statistics were pulled from ExploitDB, and the number of exposed instances was determined from Shodan. The listed values for each finding were as of the date of this writing.

Figure 2. The number of known exploits and the number of known exposed instances associated with each of the top 20 containerized applications. ExploitDB counts are shown as blue lines and Shodan totals as red lines. The chart covers alpine, ubuntu, nginx, python, busybox, postgres, redis, httpd, node, mongo, mysql, memcached, traefik, mariadb, rabbitmq, openjdk, golang, wordpress, centos and influxdb.

In total, 1,989 exploits are listed within the top 20 containerized applications alone. (Additional research was reported on the number of vulnerabilities within the infrastructure as code (IaC) templates and was published in the 2H 2021 Unit 42 Cloud Threat Report.)

Of these top 20 applications, 13 contain at least one known exploit and four contain at least 90 (see Table 1). It is extremely important to deploy the most up-to-date versions of applications within K8s clusters so that they are not vulnerable to known exploits.

Container Image	Number of Vulnerabilities	Exploit Versions Identified in This Analysis
WordPress	1321	0.6 < 5.7.x
Apache	279	0.8.x < 1.0x
MySQL	158	3.20.32 < 6.0.9
Ubuntu	90	5.10 < 19.10
Python	39	1.5.2 < 3.5

Table 1. Top five downloaded container images with the highest number of known exploits.

Figure 3 presents another view of the known exposed instances for each of the top 20 images, broken down by the cloud service provider (CSP) environment that hosts them. Some interesting takeaways: Amazon Web Services hosts the vast majority of exposed python, httpd and node instances. Google Cloud hosts the majority of exposed PostgreSQL and MySQL instances. Perhaps the most interesting finding is that DigitalOcean hosted a significant portion of the publicly exposed instances in nearly every one of the top 20 analyzed container images, as of April 4, 2022. It is currently unclear why DigitalOcean users appear to expose more applications than other CSP users, but it could be due to several factors, including price, which lets users spin up cloud instances quickly and cheaply. This could attract a user base that may also neglect security best practices.

While some applications are designed to be exposed to the public (namely web services like nginx, httpd, etc.), database applications such as postgres, mongo, mysql, mariadb, etc., should never be directly exposed to the public as a matter of security best practice. Users and cloud architects can dramatically improve their cloud security by simply ensuring that only those applications that require public exposure are exposed, while preventing all other applications from being exposed.
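The exposure rule above can be sketched as a small audit routine. This is only an illustration: the `Workload` record, the `audit` helper and the two image sets are hypothetical, and the image names are limited to the examples named in this post rather than an exhaustive policy.

```python
from dataclasses import dataclass

# Image names taken from the examples in this post. Treating them as an
# allow list and a deny list is an illustrative assumption.
PUBLIC_BY_DESIGN = {"nginx", "httpd"}
NEVER_PUBLIC = {"postgres", "mongo", "mysql", "mariadb"}


@dataclass(frozen=True)
class Workload:
    """Hypothetical inventory record for one deployed container."""
    name: str
    image: str    # e.g. "postgres:14" or "library/nginx"
    public: bool  # True if the workload is reachable from the internet


def base_image(image: str) -> str:
    """Strip registry path, tag and digest: 'library/postgres:14' -> 'postgres'."""
    return image.rsplit("/", 1)[-1].split(":", 1)[0].split("@", 1)[0]


def audit(workloads):
    """Return (name, reason) pairs for public workloads that need review."""
    findings = []
    for w in workloads:
        if not w.public:
            continue
        image = base_image(w.image)
        if image in NEVER_PUBLIC:
            findings.append((w.name, "database image exposed to the public"))
        elif image not in PUBLIC_BY_DESIGN:
            findings.append((w.name, "public exposure not on the allow list"))
    return findings


if __name__ == "__main__":
    inventory = [
        Workload("web", "nginx:1.21", public=True),
        Workload("orders-db", "library/postgres:14", public=True),
        Workload("reports-db", "mysql:8", public=False),
    ]
    for name, reason in audit(inventory):
        print(f"{name}: {reason}")
```

In practice the inventory would come from the cloud provider or cluster API rather than a hand-written list, and anything not explicitly allowed is flagged for review instead of being silently accepted.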

Unit 42 researchers recommend that organizations hosting cloud instances on any cloud platform audit their cloud resources to ensure that users of those platforms are adequately securing that cloud infrastructure. Researchers also recommend implementing a Cloud Native Application Protection Platform (CNAPP) designed to detect and prevent the unwanted public exposure of cloud-based applications.

Figure 3. Distribution of exposed instances by cloud service provider. The chart shows the distribution of exposed instances for alpine, ubuntu, nginx, python, busybox, postgres, redis, httpd, node, mongo, mysql, memcached, traefik, mariadb, rabbitmq, openjdk, golang and wordpress. CSPs include AWS, Google Cloud, Azure, Alibaba, Oracle, IBM Cloud and DigitalOcean.

Hypothetical questions for the readers: If Atlassian’s Confluence is being widely targeted by malicious actors and has yet to crack the top 50 most popular container images, what other containerized applications, each with several hundred known vulnerabilities, are being targeted by malicious actors? If and when these containers are compromised, are they being monitored for malicious operations by security teams or their tools?

Example Security-Focused Service Mesh Architecture

So what does a security-focused service mesh architecture look like? See Figure 4. This architecture leverages the network traffic monitoring and mirror functionalities which are packaged within service mesh architectures to allow for traffic to be mirrored to standard security tools that are common within SOCs. This provides network traffic analysis capabilities within K8s clusters.

Additionally, by using cloud-native technologies like a Cloud Workload Protection Platform (CWPP), e.g. Prisma Cloud Compute, users can monitor runtime container operations within their K8s clusters, to detect and mitigate malicious processes, network traffic and API connections. Cloud-native technologies like CWPPs can work hand in hand with service mesh platforms, extending the service mesh’s capabilities to gather and collect auditable events from containers directly from the container’s host system without generating a significant amount of processing resource overhead.

Figure 4. Example of a security-focused service mesh architecture, showing how it can increase visibility within K8s clusters. The image shows the relationships of the cloud service provider (CSP virtual private network, internet gateway, load balancer) to Kubernetes and a service mesh platform (IngressGateway, Network Policy, Destination Rule, RBAC, Load Balancer). It also shows how security tools and production apps could fit into the picture.

Implementing a Security-Focused Service Mesh

CWPPs like Prisma Cloud Compute provide process monitoring within a K8s cluster and also incorporate robust shift-left scanning functionality for DevOps continuous integration and continuous delivery (CI/CD) pipelines. Implementing shift-left scanning practices will dramatically reduce an organization’s security risk exposure by making the process of compromising the organization’s cloud infrastructure more difficult for would-be attackers.

However, organizations should not be lulled into a false sense of security by thinking that scanning alone will prevent compromise post-deployment. Scanning alone can only assist in making the cloud infrastructure more difficult to initially compromise. It does not provide visibility, monitoring capabilities or alerting after a successful compromise or during an active security incident.

This is where a service mesh security solution comes into play. The service mesh can increase visibility and enhance the security of microservice infrastructure. It can work in tandem with the intelligence gathered from CWPP and cloud security posture management (CSPM) solutions.

Where CWPP products like Prisma Cloud Compute provide runtime monitoring in a container, firewall-as-a-platform (FWaaP) solutions like Palo Alto Networks CN-Series provide URL filtering and Layer 7 detection mechanisms. A service mesh solution is able to integrate with both of these solutions and help increase security protection by providing mutual transport layer security (mTLS) encryption between microservices, as well as the ability to mirror network traffic to dedicated network traffic analysis applications, which allows for greater insight into network-based attacks targeting microservices.

Runtime Monitoring

A CWPP runtime monitoring application or agent, e.g. Prisma Cloud’s Defender agent, is installed within each K8s cluster, where it collects auditable data from the containers and reports that data back to a centralized security application.

There are a number of security tools that are not cloud-native and may not be able to be installed on the K8s cluster itself, such as security information and event management (SIEM) or endpoint detection and response (EDR) agents, or various open-source log collection tools. If non-cloud-native tools such as these are being used, there is still a good chance they can be installed on a cloud-based virtual machine, and most service mesh platforms will allow for a sidecar application to be installed. Sidecars are common architectural components within service mesh platforms and allow a single container to collect telemetry from the containers within a K8s cluster. Sidecars traditionally have small resource requirements: on average, less than 1 vCPU and less than 40 MB of RAM. The sidecars do not need a significant amount of resources as they are solely relaying container data back to a centralized security application. For more information on sidecar performance, see the Istio sidecar and Consul server performance metrics.

To be clear, any tool that provides runtime monitoring will incur a resource utilization requirement. However, the amount of security visibility gathered from the container operations is a key incentive, especially when considering the need to gather security incident data in the event of a compromise.

This level of insight could also allow internal threat hunters to detect the next wave of cutting-edge cloud-focused malware targeting your organization’s cloud infrastructure. Having detailed runtime monitoring capabilities provides a defensive measure to ensure cloud infrastructure remains secure, dynamic and scalable.

Network Traffic Monitoring

Network traffic monitoring is where a service mesh platform truly shines. Service mesh platforms dramatically increase the security hygiene of a K8s cluster by providing mTLS for all microservices running within the cluster. This means that all inter-network communications between services within the cluster will be encrypted.
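As one concrete illustration, Istio expresses this requirement through a PeerAuthentication resource. The sketch below builds such a manifest in Python. The `strict_mtls_policy` helper is hypothetical, and it assumes an Istio installation that uses the default root namespace, `istio-system`. Note that Istio's default mTLS mode is permissive (plaintext is still accepted), so a STRICT policy like this one is what actually rejects unencrypted traffic. Linkerd and Consul configure mTLS differently.

```python
import json


def strict_mtls_policy(namespace: str) -> dict:
    """Build an Istio PeerAuthentication manifest that requires mTLS.

    Applied to Istio's root namespace the policy is mesh-wide; applied to
    any other namespace it covers only the workloads in that namespace.
    """
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "PeerAuthentication",
        "metadata": {"name": "default", "namespace": namespace},
        "spec": {"mtls": {"mode": "STRICT"}},
    }


if __name__ == "__main__":
    # JSON is also valid YAML, so the output can be piped to
    # `kubectl apply -f -`.
    print(json.dumps(strict_mtls_policy("istio-system"), indent=2))
```

Rolling out STRICT mode namespace by namespace, rather than mesh-wide in one step, is a common way to avoid breaking workloads that do not yet have a sidecar.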

Additionally, service mesh platforms provide the ability to customize the traffic load balancing operations within a K8s cluster itself by means of an integrated sidecar proxy. Instead of load-balancing network traffic at the CSP level, service mesh platforms can load-balance network traffic within the cluster itself, allowing DevOps teams to roll out new versions of applications incrementally. It also allows them to perform A/B testing without requiring redundant K8s clusters to be created or complex cloud platform load balancing operations to be performed.
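An incremental rollout of this kind can be sketched with Istio's VirtualService resource, which splits traffic between versions by weight. The helper name, the host `web` and the subset names `v1` and `v2` are hypothetical, and the subsets would need to be defined in a separate DestinationRule. Other mesh platforms expose the same idea through different APIs.

```python
import json


def canary_virtual_service(host: str, stable_subset: str,
                           canary_subset: str, canary_percent: int) -> dict:
    """Build an Istio VirtualService that sends canary_percent of HTTP
    traffic to the canary subset and the remainder to the stable subset."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": f"{host}-canary"},
        "spec": {
            "hosts": [host],
            "http": [{
                "route": [
                    {"destination": {"host": host, "subset": stable_subset},
                     "weight": 100 - canary_percent},
                    {"destination": {"host": host, "subset": canary_subset},
                     "weight": canary_percent},
                ],
            }],
        },
    }


if __name__ == "__main__":
    # Start by sending 10% of requests to the new version.
    print(json.dumps(canary_virtual_service("web", "v1", "v2", 10), indent=2))
```

Raising `canary_percent` in small steps while watching error rates gives the incremental rollout described above without a second cluster.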

K8s cluster network monitoring via service mesh platforms can also provide a needed security operation. Using the service mesh’s native Layer 7 visibility, the mesh can mirror any or all traffic to one or more systems, namespaces, or entire K8s clusters.
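To make the mirroring idea concrete, here is a minimal sketch using Istio's VirtualService, whose HTTP routes accept `mirror` and `mirrorPercentage` fields. The helper name and the hosts `web` and `traffic-analyzer.security.svc.cluster.local` are hypothetical stand-ins for a production service and a SOC analysis tool. Istio mirrors HTTP requests on a fire-and-forget basis and discards the mirrored responses, so the production path is not affected by the analyzer.

```python
import json


def mirrored_virtual_service(host: str, analyzer_host: str,
                             percent: float = 100.0) -> dict:
    """Build an Istio VirtualService that serves traffic for `host` as usual
    and sends a copy of `percent` of the requests to `analyzer_host`."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must be between 0 and 100")
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": f"{host}-mirror"},
        "spec": {
            "hosts": [host],
            "http": [{
                # Live traffic still goes to the production service.
                "route": [{"destination": {"host": host}}],
                # A copy of each selected request goes to the analyzer.
                "mirror": {"host": analyzer_host},
                "mirrorPercentage": {"value": percent},
            }],
        },
    }


if __name__ == "__main__":
    manifest = mirrored_virtual_service(
        "web", "traffic-analyzer.security.svc.cluster.local", percent=100.0)
    print(json.dumps(manifest, indent=2))
```

Lowering `percent` is one way to limit the load on the analysis tool when full mirroring is too expensive.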

The ability to mirror network traffic based upon any layer of the open systems interconnection (OSI) model allows for the integration of security monitoring tools. Integrating URL filtering and DNS monitoring services can help detect and prevent network traffic injection, man-in-the-middle (MITM), distributed denial of service (DDoS) and other network-based attacks. Furthermore, the service mesh platform provides an avenue to integrate CSP identity and access management (IAM) tools, allowing security teams to apply access controls to specific microservices within a K8s cluster. By using the same IAM policies designated for user and service accounts within the hosted cloud platform, the security team can manage user access within the K8s cluster at a more granular level.

Conclusion

Service mesh platforms like Consul, Istio, and Linkerd are paramount to maintaining security best practices within containerized environments. However, their role within the security-focused operations in cloud environments has not been fully developed. Specifically, service mesh platforms can integrate network monitoring tools across all K8s pods and namespaces into a single third-party security-focused application, which dramatically eases security monitoring tasks for security teams.

Prisma Cloud’s Defender agent can be installed on K8s clusters, allowing security teams to monitor the runtime operations of containers hosted on a K8s cluster. Service mesh platforms focus on the network mirroring and monitoring components of data within the K8s cluster. With the ability to monitor, investigate and respond to alerts centered around network traffic within K8s clusters, service mesh platforms provide much-needed visibility into K8s clusters. This visibility can shine a light on the threats that face cloud microservices today and gives security operations teams the ability to truly monitor and respond to threats facing cloud infrastructure.