DEV Community

Kubernetes Cost-Saving Secrets: A 50% Workload Cost Reduction Story

Kubernetes scaling and cost optimization

In my current role, we ran into significant API latency during peak traffic. Investigation showed that the bottleneck was capacity: the cluster could not add nodes quickly enough to handle the increased load.
To address this, I implemented Karpenter, an open-source Kubernetes cluster autoscaler, to dynamically scale nodes based on workload demands. This solution not only resolved the latency issue by ensuring sufficient resources during high-traffic periods but also optimized resource usage, leading to significant cost savings during low-traffic times.

What is Karpenter?
Karpenter is a CNCF (Cloud Native Computing Foundation) project designed to dynamically provision and scale Kubernetes nodes based on workload demands.

How Karpenter Works 

Karpenter uses Custom Resource Definitions (CRDs) and cloud provider APIs to provision and scale nodes in Kubernetes clusters. A CRD extends the Kubernetes API with your own resource types. Karpenter uses this to add objects such as the Provisioner, which you create and edit with kubectl like any built-in resource.
Here's a step-by-step explanation of how Karpenter operates, illustrated with configuration examples:

1. Installation

Install Karpenter using Helm or YAML manifests. For example:

```bash
# Newer Karpenter releases publish the chart to an OCI registry; the older
# https://charts.karpenter.sh repository only hosts early releases.
# The settings.aws.* values apply to the v1alpha5-era releases used in this post.
helm install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version <karpenter-version> \
  --namespace karpenter --create-namespace \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=<karpenter-role-arn> \
  --set settings.aws.clusterName=<cluster-name> \
  --set settings.aws.clusterEndpoint=<cluster-endpoint> \
  --set settings.aws.defaultInstanceProfile=KarpenterNodeInstanceProfile
```

2. Provisioner Configuration

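The core of Karpenter's configuration is the Provisioner, a CRD that defines which instance types and zones Karpenter may use, how much capacity it may add, and when it removes nodes. The example uses the `karpenter.sh/v1alpha5` API from older Karpenter releases; starting with v0.32, Karpenter introduced the NodePool and EC2NodeClass resources to replace the Provisioner. Here's an example configuration: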
Provisioner YAML

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  # Cap the total CPU of all nodes this Provisioner launches (not the node count)
  limits:
    resources:
      cpu: 1000
  # Constrain which instance types and zones Karpenter may launch
  requirements:
    - key: "node.kubernetes.io/instance-type"
      operator: In
      values: ["m5.large", "m5.xlarge"]
    - key: "topology.kubernetes.io/zone"
      operator: In
      values: ["us-east-1a", "us-east-1b"]
  # AWS-specific settings: instance profile plus tag selectors for subnets and security groups
  provider:
    instanceProfile: "KarpenterNodeInstanceProfile"
    subnetSelector:
      kubernetes.io/cluster/<cluster-name>: "owned"
    securityGroupSelector:
      karpenter.sh/discovery: "<cluster-name>"
  # Terminate a node 30 seconds after its last non-DaemonSet pod is gone
  ttlSecondsAfterEmpty: 30
```

Key Configurations:

  1. limits.resources: Caps the total resources Karpenter may provision (here, 1000 vCPUs summed across all nodes launched by this Provisioner).
  2. requirements: Restricts the instance types and zones Karpenter can choose from.
  3. provider: Configures AWS-specific settings such as the instance profile, subnets, and security groups.
  4. ttlSecondsAfterEmpty: Terminates a node 30 seconds after it becomes empty (no pods other than DaemonSet pods).
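To make the CPU limit concrete, here is a small arithmetic sketch. It assumes the limit is compared against the summed vCPU capacity of the nodes this Provisioner has launched, uses the published vCPU counts for the two allowed instance types, and ignores Karpenter's exact enforcement details, so treat the results as approximate ceilings. `max_nodes_under_cpu_limit` is a hypothetical helper, not part of Karpenter.

```python
# Hypothetical helper illustrating limits.resources.cpu; not Karpenter code.
# Assumption: the limit applies to the summed vCPU capacity of launched nodes.

# Published vCPU counts for the instance types allowed by the Provisioner
VCPUS = {"m5.large": 2, "m5.xlarge": 4}


def max_nodes_under_cpu_limit(cpu_limit: int, instance_type: str) -> int:
    """Approximate node ceiling if every node were of one instance type."""
    return cpu_limit // VCPUS[instance_type]


if __name__ == "__main__":
    for instance_type in VCPUS:
        print(instance_type, max_nodes_under_cpu_limit(1000, instance_type))
```

With `cpu: 1000`, that works out to roughly 500 m5.large nodes or 250 m5.xlarge nodes; a mix of both lands somewhere in between. The limit bounds CPU, so the node count it allows depends on which instance types Karpenter picks.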

3. Triggering Node Provisioning

Karpenter watches for pods that the Kubernetes scheduler has marked unschedulable (they stay in Pending). For example, this pod requests one full CPU:
Pod Spec

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: compute-intensive
spec:
  containers:
    - name: busybox
      image: busybox
      # Keep the container running; busybox exits immediately without a command
      command: ["sleep", "3600"]
      resources:
        requests:
          memory: "512Mi"
          cpu: "1"
```

When this pod cannot be scheduled due to insufficient resources, Karpenter:

  1. Detects the pending, unschedulable pod.
  2. Checks the pod's resource requests and scheduling constraints against the Provisioner's requirements and limits.
  3. Launches a new node that meets the criteria (e.g., m5.large in us-east-1a).
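The matching and launch steps above can be sketched in Python as a simplified model. It is not Karpenter's actual scheduler: it assumes only the instance types and zones from the example Provisioner, compares requests against published instance specs rather than allocatable capacity (the kubelet, OS, and DaemonSets reserve part of every node), and simply picks the smallest fitting type in the first allowed zone. The real controller also weighs price and availability across candidate types. All names here (`pick_node`, `REQUIREMENTS`, `CATALOG`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpu: float
    memory_gib: float


# Allowed values per label, mirroring the example Provisioner's "In" requirements
REQUIREMENTS: Dict[str, List[str]] = {
    "node.kubernetes.io/instance-type": ["m5.large", "m5.xlarge"],
    "topology.kubernetes.io/zone": ["us-east-1a", "us-east-1b"],
}

# Published specs; real allocatable capacity on a node is somewhat lower
CATALOG: Dict[str, InstanceType] = {
    "m5.large": InstanceType("m5.large", 2, 8),
    "m5.xlarge": InstanceType("m5.xlarge", 4, 16),
}


def parse_cpu(quantity: str) -> float:
    """'500m' -> 0.5 cores, '1' -> 1.0 core."""
    return float(quantity[:-1]) / 1000 if quantity.endswith("m") else float(quantity)


def parse_mem_gib(quantity: str) -> float:
    """'512Mi' -> 0.5 GiB, '2Gi' -> 2.0 GiB (only Mi/Gi handled in this sketch)."""
    if quantity.endswith("Mi"):
        return float(quantity[:-2]) / 1024
    if quantity.endswith("Gi"):
        return float(quantity[:-2])
    raise ValueError(f"unsupported memory quantity: {quantity}")


def pick_node(cpu: str, memory: str,
              node_selector: Optional[Dict[str, str]] = None) -> Optional[Tuple[str, str]]:
    """Return (instance type, zone) for a new node, or None if the pod stays Pending."""
    allowed = {key: list(values) for key, values in REQUIREMENTS.items()}
    # Narrow the allowed values using the pod's nodeSelector
    for key, value in (node_selector or {}).items():
        if key not in allowed:
            return None  # simplification: labels the Provisioner doesn't define are unsatisfiable
        allowed[key] = [v for v in allowed[key] if v == value]
    # Keep instance types whose capacity covers the pod's requests
    cpu_needed, mem_needed = parse_cpu(cpu), parse_mem_gib(memory)
    fitting = [CATALOG[name] for name in allowed["node.kubernetes.io/instance-type"]
               if CATALOG[name].vcpu >= cpu_needed and CATALOG[name].memory_gib >= mem_needed]
    zones = allowed["topology.kubernetes.io/zone"]
    if not fitting or not zones:
        return None
    # Launch the smallest fitting type in the first allowed zone
    smallest = min(fitting, key=lambda t: (t.vcpu, t.memory_gib))
    return smallest.name, zones[0]
```

For the `compute-intensive` pod (`cpu: "1"`, `memory: "512Mi"`), `pick_node("1", "512Mi")` returns `("m5.large", "us-east-1a")`, matching the example in step 3; a pod requesting 8 CPUs gets `None` because neither allowed type is large enough.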

4. Scaling Down Idle Nodes

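Karpenter continuously monitors node utilization. When nodes are no longer required, it:

  1. Consolidates workloads onto fewer or cheaper nodes, if consolidation is enabled (`consolidation.enabled: true` in a v1alpha5 Provisioner; this setting cannot be combined with `ttlSecondsAfterEmpty`).
  2. Terminates empty nodes once `ttlSecondsAfterEmpty` expires, and replaces aging nodes once `ttlSecondsUntilExpired` expires, if that field is set.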
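The `ttlSecondsAfterEmpty` behavior can be modeled as a countdown that starts when a node is running only DaemonSet pods and resets if a workload pod lands on it again. This Python sketch is a hypothetical simplification: the `Node` class and `nodes_to_terminate` function are not Karpenter APIs, and the sketch does not reflect how Karpenter is implemented internally.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    # Owner kind of each pod on the node, e.g. "ReplicaSet" or "DaemonSet"
    pod_owner_kinds: List[str] = field(default_factory=list)
    # Timestamp (seconds) when the node was first seen empty; None while busy
    empty_since: Optional[float] = None


def is_empty(node: Node) -> bool:
    """A node counts as empty when every pod on it belongs to a DaemonSet."""
    return all(kind == "DaemonSet" for kind in node.pod_owner_kinds)


def nodes_to_terminate(nodes: List[Node], now: float,
                       ttl_seconds_after_empty: int = 30) -> List[str]:
    """Return names of nodes that have been empty for at least the TTL."""
    expired = []
    for node in nodes:
        if not is_empty(node):
            node.empty_since = None  # a workload pod is present: reset the countdown
        elif node.empty_since is None:
            node.empty_since = now  # first time seen empty: start the countdown
        elif now - node.empty_since >= ttl_seconds_after_empty:
            expired.append(node.name)
    return expired
```

Calling `nodes_to_terminate` periodically with the current time mimics a reconcile loop: a node running only a DaemonSet pod is first marked empty, then reported for termination once 30 seconds have passed.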

5. Observing Metrics and Logs

Monitor Karpenter using tools like Prometheus or CloudWatch. Example commands:
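Check node provisioning (the `-L` flags add columns showing which Provisioner and instance type each node came from):
```bash
kubectl get nodes -L karpenter.sh/provisioner-name -L node.kubernetes.io/instance-type
```

View Karpenter logs:
```bash
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller
```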
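For a per-instance-type tally instead of scanning `kubectl get nodes` output by eye, a small Python script can summarize the node list. It assumes the v1alpha5 behavior of labeling Karpenter-launched nodes with `karpenter.sh/provisioner-name` and reads the standard `node.kubernetes.io/instance-type` label; the script name `karpenter_nodes.py` and the function `summarize_karpenter_nodes` are hypothetical.

```python
import json
import sys
from collections import Counter

# Label that v1alpha5 Karpenter sets on the nodes it launches
PROVISIONER_LABEL = "karpenter.sh/provisioner-name"
# Well-known Kubernetes label carrying the cloud instance type
INSTANCE_TYPE_LABEL = "node.kubernetes.io/instance-type"


def summarize_karpenter_nodes(node_list: dict) -> Counter:
    """Count Karpenter-launched nodes per (provisioner, instance type)."""
    counts: Counter = Counter()
    for node in node_list.get("items", []):
        labels = node.get("metadata", {}).get("labels", {})
        provisioner = labels.get(PROVISIONER_LABEL)
        if provisioner is None:
            continue  # not launched by Karpenter, e.g. a managed node group node
        counts[(provisioner, labels.get(INSTANCE_TYPE_LABEL, "unknown"))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: kubectl get nodes -o json > nodes.json
    #        python3 karpenter_nodes.py nodes.json
    with open(sys.argv[1]) as f:
        summary = summarize_karpenter_nodes(json.load(f))
    for (provisioner, instance_type), count in sorted(summary.items()):
        print(f"{provisioner}\t{instance_type}\t{count}")
```

Running it before and after a traffic spike gives a quick view of how many nodes Karpenter added and which instance types it chose.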

Conclusion

Karpenter simplifies dynamic scaling in Kubernetes clusters. By reacting to pending pods in real time, guided by declarative Provisioner configuration, it can:
Match workload demands.
Optimize resource usage.
Reduce costs.
Minimize operational overhead.

Because Provisioners are ordinary Kubernetes resources, you can adjust instance types, zones, and limits as application requirements change, which helps keep workloads available and responsive during traffic spikes.

