Karina Babcock for Causely

Tackling CPU Throttling in Kubernetes for Better Application Performance

CPU throttling is a frequent challenge in containerized environments, particularly for resource-intensive applications. It happens when a container surpasses its allocated CPU limits, prompting the scheduler to restrict CPU usage. While this mechanism ensures fair resource sharing, it can significantly impact performance if not properly managed. CPU throttling can be a major obstacle for applications like web APIs, video streaming platforms, and gaming servers. Addressing this issue involves two key steps: identifying throttling and implementing effective solutions.

What is CPU Throttling?

CPU throttling in containers occurs due to resource constraints set by control groups (cgroups). Kubernetes and other container orchestrators rely on cgroups to enforce resource limits. When a container attempts to use more CPU than its assigned quota, it gets throttled, delaying execution of tasks. (When containers have CPU limits defined, they will be converted to a cgroup CPU quota.)
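To make that conversion concrete, here is a minimal sketch of the arithmetic, assuming the default 100ms CFS period that Kubernetes uses (the limit value is illustrative):

```shell
# Sketch: how a Kubernetes CPU limit becomes a cgroup CFS quota.
# With the default 100ms period:
#   cpu.cfs_quota_us = limit_in_millicores * period_us / 1000
period_us=100000        # default cpu.cfs_period_us (100ms)
limit_millicores=500    # a container limit of cpu: "500m"
quota_us=$(( limit_millicores * period_us / 1000 ))
echo "cpu.cfs_quota_us = $quota_us"   # 50ms of CPU time allowed per 100ms period
```

Once the container burns through those 50ms within a period, the scheduler pauses it until the next period starts, and that pause is what you observe as throttling.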

Working on the vendor side for over a decade, I have seen the impact CPU throttling can have on different services across many industries. Here are three top-of-mind examples, both from my days at Turbonomic and from recent conversations with customers at Causely:

Financial Systems

  • Example: A stock trading platform uses containers to handle real-time market data feeds and execute trades. Throttling during peak trading hours delays data processing, potentially causing missed opportunities or incorrect order placements.
  • Impact: Missed deadlines for transaction processing.

Gaming Servers

  • Example: Online multiplayer games hosted in containers experience throttling, leading to delayed responses (lag) during gameplay. Players may experience slow rendering of in-game actions or disconnections during high traffic.
  • Impact: Latency and poor user experience.

Video Streaming Platforms

  • Example: A video-on-demand service runs encoding jobs in containers to transcode videos. Throttling increases encoding times, leading to delayed content availability or poor streaming quality for users.
  • Impact: Degraded video quality and buffering issues.

How to Identify CPU Throttling

It’s often difficult to catch CPU throttling because it can happen even when the host CPU usage is low. It’s critical to have the right level of monitoring set up in order to see CPU throttling when it happens, or even better, before it becomes a problem.

Monitor Your cgroup Metrics

Linux cgroups provide detailed metrics about CPU usage and throttling. Look for the cpu.stat file within the container’s cgroup directory (usually under /sys/fs/cgroup). Within that file there are three key metrics:

  • nr_throttled: Number of times the container was throttled.
  • throttled_time: Total time spent throttled, in nanoseconds. (On cgroup v2 this field is throttled_usec, reported in microseconds.)
  • nr_periods: Total CPU allocation periods.

Example:
cat /sys/fs/cgroup/cpu/cpu.stat

Output:

nr_periods 12345
nr_throttled 543
throttled_time 987654321

If nr_throttled or throttled_time is high relative to nr_periods, then you have CPU throttling on your container.
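A quick sketch of that check, using the sample cpu.stat numbers from the output above (in practice you would read them from the file itself):

```shell
# Sample values from the cpu.stat output above
nr_periods=12345
nr_throttled=543
# Percentage of scheduling periods in which the container was throttled
ratio=$(awk -v t="$nr_throttled" -v p="$nr_periods" 'BEGIN { printf "%.1f", 100 * t / p }')
echo "${ratio}% of periods were throttled"
```

There is no universal threshold, but a sustained single-digit percentage on a latency-sensitive service is usually already visible in response times.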

Monitor Container Orchestration Metrics

If you’re running Kubernetes, you can use the kubectl top pod command to get metric data on the highest utilized pods. Try the command below to get metrics for a pod and all the associated containers:
kubectl top pod --containers

This is a very manual process: you will need to compare the CPU usage against the limits defined in the pod’s resource config (running a describe command on the pod will show you this information), and you need to already know which pod is having the issue. When problems arise in an application, it usually takes some time to drill down to the component that is performing poorly. Note that commands like kubectl top require the Kubernetes Metrics Server to be installed. If you also scrape the kubelet’s cAdvisor endpoint (for example with Prometheus), metrics like container_cpu_cfs_throttled_periods_total and container_cpu_cfs_periods_total offer valuable insight into CPU usage and throttling.
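If those metrics land in Prometheus, a sketch of a per-pod throttling ratio query might look like the following (this assumes the standard cAdvisor metric names and a `pod` label, which can vary with your scrape config):

```
sum(rate(container_cpu_cfs_throttled_periods_total[5m])) by (pod)
/
sum(rate(container_cpu_cfs_periods_total[5m])) by (pod)
```

The result is the fraction of CFS periods in which each pod was throttled over the last five minutes, which makes a useful alerting signal.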

Application Performance Metrics

Although they can be expensive, application performance monitoring (APM) tools provide invaluable insights into CPU throttling, offering detailed visibility that can help uncover the issue. These tools can often track throttling over time, identify exactly when it first occurred, and, in some cases, even predict future throttling trends based on usage patterns. Many organizations use a combination of monitoring tools to get a comprehensive view of their systems. APM tools also highlight the symptoms of CPU throttling, which may manifest as:

  • Prolonged request durations, leading to slower application response times.
  • Decreased throughput, resulting in fewer transactions or tasks processed within a given timeframe.
  • Irregular CPU usage patterns, which can signal performance instability or inefficiencies.

By combining the capabilities of APM tools with metrics collected from Kubernetes, teams can proactively address CPU throttling and ensure optimal application performance.

Best Practices to Manage CPU Throttling in Kubernetes

There are many ways to fix CPU throttling and even a few ways you can prevent it. Most CPU throttling traces back to overcommitted nodes or misconfigured CPU limits. Below are some ways to fix CPU throttling when it occurs and some best practices to avoid it in the future.

Adjust CPU Limits

Update resource limits in your container or pod configuration; Kubernetes resource specs can be updated as in the example below. What I usually see from the customers I have worked with is that they set the limit just above the peak usage over the last 30, 60, or even 90 days. For non-critical workloads I have seen a few companies cap the limit at 80% of max usage, and a few use more advanced techniques like calculating percentiles:

resources:
  requests:
    cpu: "500m"
  limits:
    cpu: "1000m"

  • Increase the limits.cpu value to reduce throttling frequency.
  • Set requests.cpu to ensure better performance during contention. Note that if you do not set the request, Kubernetes will automatically set the request to the limit.
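The sizing heuristics above come down to quick arithmetic; here is a sketch with a hypothetical peak value (the 10% headroom for "just above the peak" is my assumption, not a rule from the text):

```shell
# Hypothetical peak CPU usage over the lookback window, in millicores
peak_m=800
# "Just above the peak": peak plus a modest headroom (10% assumed here)
echo "critical workload limit: $(( peak_m * 110 / 100 ))m"
# Non-critical workloads capped at 80% of peak, as described above
echo "non-critical workload limit: $(( peak_m * 80 / 100 ))m"
```

Whichever heuristic you use, revisit it periodically: peak usage drifts, and a limit sized from last quarter's traffic is a common source of surprise throttling.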

Use Autoscaling like Horizontal Pod Autoscaler

The Horizontal Pod Autoscaler (HPA) in Kubernetes helps address CPU throttling by dynamically adjusting the number of pods in a deployment based on real-time resource usage. Resources like CPU and memory are monitored, and when certain thresholds are met, HPA kicks in to provision more pods. In more idle periods it will also scale down the number of pods to help you run more efficiently. By distributing the workload across more pods, HPA reduces the CPU demands on individual pods, thereby mitigating CPU throttling.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa        # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app          # hypothetical target deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80

In this example, if average CPU utilization across all pods exceeds 80% then it will add more pods as necessary within the bounds of 2-10 (Min and Max Replicas).

Analyze Node Resource Allocation

Check overall CPU availability on nodes using a describe command:
kubectl describe node <node-name>
Ensure nodes aren’t overcommitted. Use taints and tolerations to control scheduling and ensure high-priority workloads run on dedicated nodes. Overcommitted nodes run the risk of not having CPU available: if containers’ requests exceed the node’s available CPU, you are going to run into scheduling problems; and if limits are set too high relative to the node’s CPU and the workload suddenly increases, you are going to have contention.
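The overcommitment check is simple arithmetic on the numbers kubectl describe node reports under "Allocated resources" (the values below are hypothetical):

```shell
allocatable_m=4000   # node allocatable CPU, in millicores
requests_m=3500      # sum of container CPU requests scheduled on the node
limits_m=6000        # sum of container CPU limits on the node
echo "requests: $(( 100 * requests_m / allocatable_m ))% of allocatable"
echo "limits:   $(( 100 * limits_m / allocatable_m ))% of allocatable"
```

Requests above 100% block scheduling outright; limits well above 100%, as in this sketch, are fine while the node is quiet but turn into contention and throttling the moment several containers burst at once.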

Tweak CPU CFS Settings

Containers use the Completely Fair Scheduler (CFS) by default. The CFS in Kubernetes is a mechanism inherited from the Linux kernel that enforces CPU usage limits on containers. It works by using two key parameters from Linux cgroups: cpu.cfs_quota_us and cpu.cfs_period_us. These parameters allow Kubernetes to control the amount of CPU time a container can use over a specific period:

  • cpu.cfs_quota_us: Maximum microseconds of CPU time allowed per period.
  • cpu.cfs_period_us: Length of a scheduling period in microseconds.
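
The two values together determine the container's effective CPU entitlement; a quick sketch using the same 200000µs quota as the adjustment below:

```shell
quota_us=200000    # cpu.cfs_quota_us
period_us=100000   # cpu.cfs_period_us
# quota / period = number of full CPUs the container may consume per period
echo "effective CPUs: $(( quota_us / period_us ))"
```

In other words, a 200000µs quota over a 100000µs period is equivalent to a CPU limit of two full cores.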

To reduce CPU throttling, increase cpu.cfs_quota_us to provide more CPU time (note that on Kubernetes nodes the kubelet derives these values from the pod’s CPU limit, so direct edits like this are temporary):
echo 200000 > /sys/fs/cgroup/cpu/cpu.cfs_quota_us

I have seen this create issues before, though, so be careful with this adjustment: it can lead to overcommitment. In other words, if too many tasks are scheduled and you increase the amount of CPU time each container can use, you will create delays and throttling anyway. Start by playing around with this in dev or test clusters before you make any changes to prod… duh.

Use CPU Pinning

This is more of an edge case, but instead of using CPU shares and limits, you can pin containers to specific CPUs for predictable performance. The Kubernetes CPU Manager controls how CPUs are allocated to containers. To enable CPU pinning, the static CPU Manager policy must be used, which provides exclusive CPU allocation. Just enable the policy in the kubelet configuration file:
cpuManagerPolicy: static
With the flag set to “static,” containers are allocated exclusive CPUs on the nodes in a cluster. Kubernetes assigns the container specific CPUs and it runs only on those cores. The big challenges with CPU pinning are overhead and scalability. When you manage pinned workloads it requires detailed planning to avoid fragmentation and underutilization. CPU pinning is good for workloads that are sensitive to CPU throttling but not ideal for volatile and dynamic workloads.
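Note that the static policy only pins CPUs for pods in the Guaranteed QoS class with integer CPU requests equal to their limits; a sketch of a qualifying pod spec (the name and image are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pinned-workload        # hypothetical name
spec:
  containers:
    - name: app
      image: example/app:1.0   # hypothetical image
      resources:
        requests:
          cpu: "2"             # integer CPU count is required for pinning
          memory: "1Gi"
        limits:
          cpu: "2"             # must equal the request (Guaranteed QoS)
          memory: "1Gi"
```

Pods that don't meet these criteria fall back to the shared CPU pool, so a fractional request like "2500m" quietly disables pinning for that container.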

CPU Throttling is a Double-Edged Sword in Kubernetes

While CPU throttling plays a crucial role in resource management and stability, it can also hinder application performance if not managed correctly. By understanding how CPU throttling works and implementing best practices, you can optimize your Kubernetes environment, ensuring efficient resource use and enhanced application performance. As Kubernetes continues to grow and evolve, keeping a close eye on resource management will be key to maintaining robust and responsive applications.
