During a recent red-team engagement against a self-managed Kubernetes cluster hosted on E2E Networks in India, I observed that traditional signature-based EDRs failed to catch a container escape leveraging CVE-2024-21626. The attacker utilized a file descriptor leakage in runc to gain host-level access, a technique increasingly common in Docker-based infostealer worms. Standard audit logs showed the container started, but the subsequent fchdir syscalls targeting host-relative paths were invisible to the application-layer monitoring. This gap highlights why runtime security must move from the user space into the kernel.
Understanding Kubernetes Runtime Security
Runtime security focuses on the active execution phase of a container's lifecycle. While image scanning and static analysis catch known vulnerabilities in libraries, they cannot predict the behavior of a compromised process. We define runtime security as the continuous monitoring and active restriction of binary execution, network calls, and file system modifications once a Pod is in the Running state.
The primary challenge in Kubernetes is the shared kernel architecture. Unlike virtual machines (VMs), which provide hardware-level isolation, containers rely on Linux namespaces and cgroups. If an attacker finds a syscall that bypasses these abstractions, they gain access to the underlying host. Runtime security closes this gap by hooking into the kernel directly, and vulnerability data from the NIST NVD then helps us prioritize which kernel and runtime patches to apply first.
Build-time vs. Runtime Security: The Visibility Gap
Build-time security is deterministic. We check for CVEs in apt packages or npm dependencies. However, runtime is dynamic. A "clean" image can still be used to download a malicious binary into a /tmp directory (if writable) and execute it. I have seen environments where kubectl exec was used to install nmap inside a production pod to scan the internal VPC. Build-time scanners will never see this.
Runtime security tools provide the "who, what, where, and when" of an incident. While build-time tools reduce the attack surface, runtime tools provide the defensive depth required to stop zero-day exploits. In the context of the DPDP Act 2023, Indian firms acting as "Data Fiduciaries" must prove they have taken "reasonable security safeguards." Static scanning alone does not satisfy the requirement for active breach prevention and auditability.
Common Runtime Threats and Attack Vectors
Most Kubernetes attacks follow a predictable pattern: initial access (via a vulnerable web app, often exploiting flaws listed in the OWASP Top 10), execution of a reverse shell, and lateral movement. We frequently see curl | sh patterns used to pull post-exploitation toolkits. Another common vector is theft of the service account token mounted at /var/run/secrets/kubernetes.io/serviceaccount/token, which the attacker then replays against the API server.
Container Escapes and Kernel Exploits
CVE-2022-0492 demonstrated how a missing capability check in cgroups v1 allowed escape via the release_agent file. An attacker with CAP_SYS_ADMIN (often granted by overly permissive security contexts), or one who can acquire it inside a new user namespace on an unpatched kernel, can write a path to release_agent and have the kernel execute that script on the host when the cgroup empties. To detect this, we monitor mount syscalls targeting the cgroup filesystem.
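As a rough illustration of that mount monitoring, the Falco rule below flags any mount syscall issued from inside a container. The rule name, priority, and output text are illustrative assumptions rather than part of a shipped ruleset. A production rule would need exceptions for workloads that legitimately mount filesystems and could be narrowed to cgroup mounts.

```yaml
# Hypothetical rule: name, priority, and output text are illustrative.
- rule: Mount syscall inside container
  desc: >
    A process inside a container called mount(2). Application containers
    rarely need this, and release_agent-style escapes begin by mounting
    a cgroup v1 hierarchy.
  condition: >
    evt.type = mount
    and container.id != host
  output: >
    mount called inside a container
    (proc=%proc.name cmdline=%proc.cmdline args=%evt.args
    container_id=%container.id image=%container.image.repository)
  priority: WARNING
```

Starting with a broad condition and tightening it after reviewing the alerts is usually safer than guessing the exact argument pattern up front.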
Zero-Day Vulnerabilities and eBPF
eBPF (Extended Berkeley Packet Filter) has changed how we handle zero-days. Instead of waiting for a vendor patch, we can deploy a kernel-level probe to block specific syscall patterns. For CVE-2024-21626, we can monitor sys_openat2 and sys_fchdir to detect if a process is attempting to navigate into the host's root filesystem from a containerized environment.
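A minimal sketch of such a probe, written as a Tetragon TracingPolicy, might look like the following. It only reports fchdir calls made by processes outside the host PID namespace and does not block anything. The policy name and the use of the "host_ns" selector value are assumptions to verify against your Tetragon version, and this is a starting point for observation rather than a complete detector for CVE-2024-21626.

```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "observe-container-fchdir"   # hypothetical name
spec:
  kprobes:
  - call: "sys_fchdir"
    syscall: true
    args:
    - index: 0          # fchdir(int fd): the descriptor being entered
      type: "int"
    selectors:
    # Assumption: only report processes that are not in the host PID namespace
    - matchNamespaces:
      - namespace: Pid
        operator: NotIn
        values:
        - "host_ns"
      matchActions:
      - action: Post    # emit an event only, no enforcement
```

Once the event volume is understood, the Post action could be replaced with Sigkill for enforcement, at the cost of possibly killing legitimate processes.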
The Role of the Container Runtime Interface (CRI)
The Container Runtime Interface is the bridge between the kubelet and the actual container execution, and it is implemented by runtimes such as containerd and CRI-O. Security at this layer is critical because the runtime manages the lifecycle of the container namespaces. By monitoring access to the runtime sockets (e.g., /run/containerd/containerd.sock), we can detect unauthorized container creations or modifications that bypass the Kubernetes API server.
Implementing the Principle of Least Privilege
We must start by hardening the securityContext of every deployment. Most developers default to running as root, which is a significant risk in Indian FinTech environments where a single compromised pod could lead to unauthorized access to UPI transaction logs.
apiVersion: v1
kind: Pod
metadata:
  name: secure-service
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
  containers:
  - name: app
    image: my-app:v1
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: true
By setting readOnlyRootFilesystem: true, we stop the large class of commodity malware that attempts to write to /usr/bin or /etc. Any temporary data should be handled via emptyDir volumes with size limits.
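To make the emptyDir suggestion concrete, here is a minimal sketch that gives the same pod a writable /tmp while the root filesystem stays read-only. The volume name and the 64Mi limit are assumptions, so size the limit to the workload.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secure-service
spec:
  containers:
  - name: app
    image: my-app:v1
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
    - name: scratch            # hypothetical volume name
      mountPath: /tmp          # the only writable path in the container
  volumes:
  - name: scratch
    emptyDir:
      sizeLimit: 64Mi          # assumed limit; the kubelet evicts the pod if exceeded
```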
Hardening the Underlying Host Operating System
Kubernetes security is only as strong as the node it runs on. In unmanaged clusters on providers like CtrlS or Netmagic, I often find nodes running unnecessary services like rpcbind or outdated ssh versions. We use CIS Benchmarks to audit host configurations.
- Disable SSH root login and use secure SSH access for teams to ensure all remote sessions are authenticated and logged.
- Ensure auditd is configured to capture changes to /etc/kubernetes/manifests.
- Use seccomp profiles to restrict the syscalls available to the container.
- Apply AppArmor or SELinux profiles to restrict file access, even for the root user.
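For the seccomp item, the lowest-effort starting point is the container runtime's default profile, which can be requested in the pod spec. The sketch below assumes a hypothetical pod name and reuses the image name from the earlier example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-demo           # hypothetical name
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault     # apply the container runtime's default syscall filter
  containers:
  - name: app
    image: my-app:v1
    securityContext:
      allowPrivilegeEscalation: false
```

A custom profile can be referenced instead with type: Localhost and a localhostProfile path relative to the kubelet's seccomp directory, once you have a profile tailored to the workload.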
Continuous Monitoring and Real-time Threat Detection
Monitoring must be granular. Standard Prometheus metrics tell you when CPU usage is high, but they won't tell you that a python process just spawned a /bin/bash child process. We need kernel-level observability feeding into a centralized SIEM for real-time alerting. We use bpftool to verify which eBPF programs are currently loaded on a node to ensure our security agents are active.
$ sudo bpftool prog list
124: kprobe  name tetragon_kprobe  tag 4f8a8e12  gpl
        loaded_at 2024-05-20T10:15:22+0530  uid 0
        xlated 512B  jited 312B  memlock 4096B
This output confirms that the Tetragon kprobe is active and monitoring the kernel. If no program belonging to your security agent appears in this list, the tool is likely failing to hook into the kernel, leaving you blind.
Implementing Effective Kubernetes Security Policies
Network policies are the primary defense against lateral movement. By default, Kubernetes allows all pods to talk to all other pods. In a microservices architecture, this is a disaster waiting to happen. If your frontend pod is compromised, the attacker can directly query your auth-db.
Leveraging Network Policies for Microsegmentation
We implement a "Default Deny" policy for every namespace. This forces developers to explicitly define which connections are allowed.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
After applying this, we add specific "Allow" rules. For Indian compliance, ensure that egress traffic to external APIs (like GST or Aadhaar verification endpoints) is restricted to specific destinations. Standard NetworkPolicy can only match CIDR ranges (via ipBlock), so restricting egress by DNS name requires a CNI such as Cilium or a service mesh such as Istio.
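A minimal sketch of one such "Allow" rule, using only standard NetworkPolicy fields, is shown here. The policy name, the app label, and the CIDR are placeholders (203.0.113.0/24 is a documentation range, not a real API endpoint), so substitute the ranges published by the provider you actually call.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-verification-egress    # hypothetical name
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: kyc-service               # hypothetical label
  policyTypes:
  - Egress
  egress:
  # Allow HTTPS to the external API range only
  - to:
    - ipBlock:
        cidr: 203.0.113.0/24         # placeholder documentation range
    ports:
    - protocol: TCP
      port: 443
  # Allow DNS lookups to the cluster DNS pods in kube-system
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```

The DNS rule matters because a default-deny egress policy also blocks name resolution, which is a common source of confusing failures right after rollout.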
Transitioning from PSP to Pod Security Standards (PSS)
Pod Security Policies (PSP) were deprecated in Kubernetes 1.21 and removed in 1.25. We now use the built-in Pod Security Admission (PSA) controller. It operates at three levels: Privileged, Baseline, and Restricted. I recommend enforcing Restricted on all non-system namespaces.
$ kubectl label --overwrite namespace prod-apps \
    pod-security.kubernetes.io/enforce=restricted \
    pod-security.kubernetes.io/enforce-version=v1.28
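If namespaces are managed declaratively, the same labels can live in the Namespace manifest. The sketch below mirrors the command above and additionally sets the warn and audit modes, which is an assumption about how you might want to surface violations before they are rejected.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: prod-apps
  labels:
    # Reject pods that violate the Restricted profile
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.28
    # Assumed extras: warn the client and record violations in the audit log
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```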
Role-Based Access Control (RBAC) Best Practices
RBAC misconfigurations are the leading cause of privilege escalation. Avoid using * in apiGroups or resources. I frequently see service accounts with cluster-admin privileges used for simple monitoring agents.
- Use Role and RoleBinding instead of ClusterRole whenever possible to limit the blast radius to a single namespace.
- Audit permissions regularly using tools like kubectl-who-can.
- Remove the system:unauthenticated group from all bindings.
- Ensure the default service account has no permissions and automountServiceAccountToken: false.
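The checklist above can be sketched for a monitoring agent as follows. All names and the monitoring namespace are hypothetical. The point is that the agent gets its own service account with a namespaced, read-only Role, while the default service account mounts no token.

```yaml
# Default service account: no token is mounted into pods that use it
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: monitoring            # hypothetical namespace
automountServiceAccountToken: false
---
# Dedicated identity for the agent
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-reader             # hypothetical name
  namespace: monitoring
---
# Read-only access to pods, scoped to one namespace, no wildcards
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: monitoring
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: metrics-reader-pod-reader
  namespace: monitoring
subjects:
- kind: ServiceAccount
  name: metrics-reader
  namespace: monitoring
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
```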
Top Kubernetes Runtime Security Tools
We rely on eBPF-based tools because they provide deep visibility with minimal performance overhead. Unlike the old sysdig kernel modules, eBPF programs are checked by the kernel verifier before they are loaded, which makes it far less likely that a monitoring agent will crash the host.
Open Source Solutions: Falco and Tetragon
Falco is the de-facto standard for runtime alerting. It uses a set of rules to detect suspicious activity. For example, to detect when a shell is run inside a container:
- rule: Terminal shell in container
  desc: A shell was used as the entrypoint or spawned in a container
  condition: >
    spawned_process and container
    and shell_procs
    and not user_expected_terminal_shell_executions
  output: >
    A shell was spawned in a container
    (user=%user.name user_loginuid=%user.loginuid %container.info
    shell=%proc.name parent=%proc.pname cmdline=%proc.cmdline
    terminal=%proc.tty container_id=%container.id
    image=%container.image.repository)
  priority: WARNING
Tetragon, part of the Cilium project, goes beyond alerting. It can perform real-time enforcement. Using a TracingPolicy, we can instruct the kernel to SIGKILL any process that attempts to access sensitive files like /etc/shadow or .ssh folders.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "block-ssh-and-shadow-access"
spec:
  kprobes:
  - call: "sys_openat"
    syscall: true
    args:
    - index: 1
      type: "string"
    selectors:
    - matchArgs:
      - index: 1
        operator: "Prefix"
        values:
        - "/etc/shadow"
        - "/root/.ssh/"
      matchActions:
      - action: Sigkill
Comparing Signature-based vs. Behavioral Detection Tools
Signature-based tools look for known malicious hashes or IP addresses. They are useful for blocking known botnets but useless against custom exploits. Behavioral tools (like those using eBPF) look for "anomalies" such as a web server process suddenly calling ptrace. In a Kubernetes environment, behavioral detection is superior because container workloads are typically highly predictable; any deviation is a high-fidelity signal of compromise.
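As a small behavioral example, the Falco rule sketch below alerts when a web server process inside a container calls ptrace. The rule name and the list of process names are assumptions, so adjust the list to the servers you actually run.

```yaml
# Hypothetical rule: name and process list are illustrative.
- rule: Web server process calls ptrace
  desc: >
    A web server process inside a container invoked ptrace, which a
    normally behaving server has no reason to do.
  condition: >
    evt.type = ptrace
    and container.id != host
    and proc.name in (nginx, httpd, apache2)
  output: >
    ptrace called by a web server process
    (proc=%proc.name cmdline=%proc.cmdline
    container_id=%container.id image=%container.image.repository)
  priority: WARNING
```

No hash or IP list is involved here: the rule encodes what the workload should never do, which is why it still fires for a previously unseen exploit.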
Integrating Security Tools into the CI/CD Pipeline
Runtime security starts at the pipeline. We use falcoctl to manage rulesets as code. During the CI phase, we can validate our TracingPolicy YAMLs against the cluster schema.
# Install Falco driver on the node
$ falcoctl driver install
$ falcoctl driver load

# Verify the driver is active
$ falco --version
For Indian organizations using Jenkins or GitHub Actions runners hosted on-premise, ensure that the runner itself is hardened. A compromised CI runner can inject malicious code into your production images, bypassing all runtime protections by "living off the land" with legitimate binaries.
Managing Secrets and Sensitive Data Safely
Under the DPDP Act 2023, the mishandling of personal data (PII) carries heavy penalties. Storing secrets in Kubernetes Secret objects is not enough on its own, because by default they are only Base64 encoded, not encrypted. We must use a dedicated KMS (Key Management Service) or an external secrets manager.
Integrating with External Vaults
We use the Secrets Store CSI Driver to mount secrets from HashiCorp Vault or AWS Secrets Manager directly as volumes. The driver mounts them as in-memory volumes inside the pod, so the secret values are never written to the node's disk in unencrypted form.
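A minimal sketch of this setup for HashiCorp Vault follows. The Vault address, role name, secret path, and object names are all placeholders, and the pod additionally needs a service account that the Vault role is bound to.

```yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: vault-db-creds                          # hypothetical name
  namespace: production
spec:
  provider: vault
  parameters:
    vaultAddress: "https://vault.example.internal:8200"   # placeholder address
    roleName: "app-role"                        # placeholder Vault role
    objects: |
      - objectName: "db-password"
        secretPath: "secret/data/db-pass"       # placeholder KV v2 path
        secretKey: "password"
---
apiVersion: v1
kind: Pod
metadata:
  name: secure-service
  namespace: production
spec:
  containers:
  - name: app
    image: my-app:v1
    volumeMounts:
    - name: secrets-store
      mountPath: /mnt/secrets                   # the secret appears as a file here
      readOnly: true
  volumes:
  - name: secrets-store
    csi:
      driver: secrets-store.csi.k8s.io
      readOnly: true
      volumeAttributes:
        secretProviderClass: "vault-db-creds"
```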
# Example: Fetching the Tetragon pod to observe events
$ TETRAGON_POD=$(kubectl get pods -n kube-system -l app.kubernetes.io/name=tetragon -o jsonpath='{.items[0].metadata.name}')
$ kubectl exec -it $TETRAGON_POD -n kube-system -- tetra observe --namespace production
By observing the output of tetra observe, we can see real-time file access events. If a pod attempts to access a mounted secret volume it shouldn't, we can trigger an immediate alert to the SOC team.
The Future of Cloud-Native Security
The industry is moving toward "Security as Code" where the kernel itself enforces the business logic of the application. We are seeing a shift from reactive alerting to proactive kernel-level enforcement. As Indian cloud infrastructure matures, the reliance on eBPF for both networking (Cilium) and security (Tetragon/Falco) will become the standard architecture.
The next step for security researchers is to automate the generation of seccomp and AppArmor profiles based on actual container behavior. By running a container in a staging environment for 24 hours, we can capture every syscall it makes and generate a profile that blocks everything else. This "Zero Trust" approach at the syscall level is the only way to effectively mitigate the next generation of container escape exploits.
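To show what the end product of that process could look like, here is a sketch of an allow-list profile expressed as a SeccompProfile resource. It assumes the Security Profiles Operator is installed, which the workflow above does not require, and the syscall list is a deliberately truncated illustration rather than a recorded profile. A real workload needs many more entries.

```yaml
apiVersion: security-profiles-operator.x-k8s.io/v1beta1
kind: SeccompProfile
metadata:
  name: recorded-app-profile      # hypothetical name
  namespace: production
spec:
  defaultAction: SCMP_ACT_ERRNO   # deny every syscall not listed below
  architectures:
  - SCMP_ARCH_X86_64
  syscalls:
  - action: SCMP_ACT_ALLOW
    names:                        # truncated example list, not a complete profile
    - read
    - write
    - openat
    - close
    - epoll_wait
    - exit_group
```

Running the profile with a logging default action first, then switching to SCMP_ACT_ERRNO once nothing new is being logged, reduces the risk of breaking code paths that were not exercised during the recording window.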
To begin auditing your own cluster's runtime visibility, run the following command to see if your nodes support the necessary BPF features:
$ bpftool feature probe | grep eBPF
