Migrating from EKS Cluster Autoscaler to Karpenter

Amazon Elastic Kubernetes Service (EKS) provides a powerful and flexible platform for running containerized applications, and a key component in ensuring your cluster scales appropriately is the autoscaler. Traditionally, EKS has relied on the Cluster Autoscaler (CA) to dynamically adjust node capacity based on the demand for resources. However, Karpenter, an open-source, high-performance Kubernetes node autoscaler developed by AWS, is gaining traction due to its enhanced capabilities and efficiency.

In this blog, we will guide you through the process of migrating from EKS Cluster Autoscaler to Karpenter and explore the benefits of making the switch.

Why Migrate from EKS Cluster Autoscaler to Karpenter?

1. Improved Node Scheduling Efficiency
Karpenter automatically optimizes the type, size, and number of nodes required for your workloads. Unlike the Cluster Autoscaler, which operates on a fixed set of predefined node groups, Karpenter provides greater flexibility by dynamically selecting the most appropriate instance types and scaling in real-time.

2. Faster Scaling
Karpenter scales faster than Cluster Autoscaler. It responds to changes in your cluster within seconds, compared to the Cluster Autoscaler’s typically slower reactions to scaling events. This is especially helpful for workloads that need to scale quickly in response to demand spikes.

3. Cost Optimization
Karpenter is designed to maximize the use of available resources by selecting the most cost-efficient instance types and ensuring that only the resources necessary for your workload are provisioned. This makes Karpenter particularly beneficial for cost-conscious organizations.

4. Simpler Configuration
With Karpenter, you don’t have to manage separate node groups. Karpenter automatically selects the instance sizes and types needed based on your workloads, which simplifies configuration and makes it more developer-friendly.

Step-by-Step Guide to Migrating from EKS Cluster Autoscaler to Karpenter

Step 1: Prerequisites

Before you start the migration process, ensure that the following prerequisites are met:

  • You are running an EKS Cluster.
  • You have kubectl access to your cluster.
  • You have AWS CLI configured with the necessary permissions to manage your EKS cluster and resources.
  • You are familiar with the basics of both Cluster Autoscaler and Karpenter.
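
The snippets in the following steps reference a few shell variables. A minimal sketch of setting them (values assume the demo "ashish" cluster in ap-south-1; adjust for your environment):

export CLUSTER_NAME="ashish"
export AWS_REGION="ap-south-1"
export AWS_PARTITION="aws"   # use aws-cn or aws-us-gov for those partitions
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"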

Step 2: Create a Node IAM Role

  • Create a role and select the EC2 service as the trusted entity

Role name: KarpenterNodeRole-ashish

  • Attach the following policies to the KarpenterNodeRole-ashish role:
    • AmazonEKSWorkerNodePolicy
    • AmazonEKS_CNI_Policy
    • AmazonEC2ContainerRegistryReadOnly
    • AmazonSSMManagedInstanceCore

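If you prefer the CLI over the console for this step, a sketch of equivalent commands (same role and policy names as above):

cat << EOF > node-trust-policy.json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": { "Service": "ec2.amazonaws.com" },
            "Action": "sts:AssumeRole"
        }
    ]
}
EOF
aws iam create-role --role-name "KarpenterNodeRole-ashish" \
    --assume-role-policy-document file://node-trust-policy.json
# Attach the four managed policies listed above
for POLICY in AmazonEKSWorkerNodePolicy AmazonEKS_CNI_Policy \
    AmazonEC2ContainerRegistryReadOnly AmazonSSMManagedInstanceCore; do
    aws iam attach-role-policy --role-name "KarpenterNodeRole-ashish" \
        --policy-arn "arn:aws:iam::aws:policy/${POLICY}"
done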

Step 3: Create a Controller IAM Role

  • Create a role

  • Select Web identity and choose your cluster's OIDC identity provider

  • Role name: KarpenterControllerRole-ashish

  • Modify the trust relationship so that only the Karpenter service account in your cluster can assume this role:
    • Add your cluster's OIDC provider
    • Add your account ID
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/oidc.eks.ap-south-1.amazonaws.com/id/6B407ED9BFC9CE681546033D7AD4156A"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "oidc.eks.ap-south-1.amazonaws.com/id/6B407ED9BFC9CE681546033D7AD4156A:aud": "sts.amazonaws.com",
                    "oidc.eks.ap-south-1.amazonaws.com/id/6B407ED9BFC9CE681546033D7AD4156A:sub": "system:serviceaccount:karpenter:karpenter"
                }
            }
        }
    ]
}
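
The OIDC provider ID shown in this trust policy belongs to the demo cluster. You can look up your own cluster's issuer URL (its trailing ID is what goes in the policy) with:

aws eks describe-cluster --name "ashish" \
    --query "cluster.identity.oidc.issuer" --output text
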
  • Create a KarpenterControllerPolicy-ashish policy:
cat << EOF > controller-policy.json
{
    "Statement": [
        {
            "Action": [
                "ssm:GetParameter",
                "ec2:DescribeImages",
                "ec2:RunInstances",
                "ec2:DescribeSubnets",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeLaunchTemplates",
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceTypes",
                "ec2:DescribeInstanceTypeOfferings",
                "ec2:DeleteLaunchTemplate",
                "ec2:CreateTags",
                "ec2:CreateLaunchTemplate",
                "ec2:CreateFleet",
                "ec2:DescribeSpotPriceHistory",
                "pricing:GetProducts"
            ],
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "Karpenter"
        },
        {
            "Action": "ec2:TerminateInstances",
            "Condition": {
                "StringLike": {
                    "ec2:ResourceTag/karpenter.sh/nodepool": "*"
                }
            },
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "ConditionalEC2Termination"
        },
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-ashish",
            "Sid": "PassNodeIAMRole"
        },
        {
            "Effect": "Allow",
            "Action": "eks:DescribeCluster",
            "Resource": "arn:${AWS_PARTITION}:eks:${AWS_REGION}:${AWS_ACCOUNT_ID}:cluster/ashish",
            "Sid": "EKSClusterEndpointLookup"
        },
        {
            "Sid": "AllowScopedInstanceProfileCreationActions",
            "Effect": "Allow",
            "Resource": "*",
            "Action": [
            "iam:CreateInstanceProfile"
            ],
            "Condition": {
            "StringEquals": {
                "aws:RequestTag/kubernetes.io/cluster/ashish": "owned",
                "aws:RequestTag/topology.kubernetes.io/region": "${AWS_REGION}"
            },
            "StringLike": {
                "aws:RequestTag/karpenter.k8s.aws/ec2nodeclass": "*"
            }
            }
        },
        {
            "Sid": "AllowScopedInstanceProfileTagActions",
            "Effect": "Allow",
            "Resource": "*",
            "Action": [
            "iam:TagInstanceProfile"
            ],
            "Condition": {
            "StringEquals": {
                "aws:ResourceTag/kubernetes.io/cluster/ashish": "owned",
                "aws:ResourceTag/topology.kubernetes.io/region": "${AWS_REGION}",
                "aws:RequestTag/kubernetes.io/cluster/ashish": "owned",
                "aws:RequestTag/topology.kubernetes.io/region": "${AWS_REGION}"
            },
            "StringLike": {
                "aws:ResourceTag/karpenter.k8s.aws/ec2nodeclass": "*",
                "aws:RequestTag/karpenter.k8s.aws/ec2nodeclass": "*"
            }
            }
        },
        {
            "Sid": "AllowScopedInstanceProfileActions",
            "Effect": "Allow",
            "Resource": "*",
            "Action": [
            "iam:AddRoleToInstanceProfile",
            "iam:RemoveRoleFromInstanceProfile",
            "iam:DeleteInstanceProfile"
            ],
            "Condition": {
            "StringEquals": {
                "aws:ResourceTag/kubernetes.io/cluster/ashish": "owned",
                "aws:ResourceTag/topology.kubernetes.io/region": "${AWS_REGION}"
            },
            "StringLike": {
                "aws:ResourceTag/karpenter.k8s.aws/ec2nodeclass": "*"
            }
            }
        },
        {
            "Sid": "AllowInstanceProfileReadActions",
            "Effect": "Allow",
            "Resource": "*",
            "Action": "iam:GetInstanceProfile"
        }
    ],
    "Version": "2012-10-17"
}
EOF
  • Attach KarpenterControllerPolicy-ashish to the controller role.
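
If you'd rather do this from the CLI than the console, a sketch using the controller-policy.json written above and the variables from Step 1:

aws iam create-policy --policy-name "KarpenterControllerPolicy-ashish" \
    --policy-document file://controller-policy.json
aws iam attach-role-policy --role-name "KarpenterControllerRole-ashish" \
    --policy-arn "arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy-ashish"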

Step 4: Add tags to subnets and security groups

  • Collect the subnet details
[ec2-user@ip-172-31-0-244 ~]$ aws eks describe-nodegroup --cluster-name "ashish" --nodegroup-name "ashish-workers" --query 'nodegroup.subnets' --output text
subnet-0a968db0a4c73858d        subnet-0bcd684f5878c3282        subnet-061e107c1f8ebc361

  • Collect the cluster security group

[ec2-user@ip-172-31-0-244 ~]$ aws eks describe-cluster --name "ashish" --query "cluster.resourcesVpcConfig.clusterSecurityGroupId" --output text
sg-0e0ac4fa44824e1aa
  • Create the karpenter.sh/discovery tags for the security group and subnets (subnets are tagged with the loop shown after the command below).
aws ec2 create-tags --tags "Key=karpenter.sh/discovery,Value=ashish" --resources "sg-0e0ac4fa44824e1aa"
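
The command above tags only the cluster security group; the subnets Karpenter should launch into need the same discovery tag. A sketch that tags every subnet used by your existing node groups:

for NODEGROUP in $(aws eks list-nodegroups --cluster-name "ashish" \
    --query 'nodegroups' --output text); do
    aws ec2 create-tags \
        --tags "Key=karpenter.sh/discovery,Value=ashish" \
        --resources $(aws eks describe-nodegroup --cluster-name "ashish" \
            --nodegroup-name "${NODEGROUP}" --query 'nodegroup.subnets' --output text)
done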

Step 5: Update aws-auth ConfigMap

We need to allow nodes using the node IAM role we just created to join the cluster. To do that, modify the aws-auth ConfigMap in the cluster:

kubectl edit configmap aws-auth -n kube-system

You will need to add a section to mapRoles that looks something like this:

- groups:
  - system:bootstrappers
  - system:nodes
  # - eks:kube-proxy-windows
  rolearn: arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-ashish
  username: system:node:{{EC2PrivateDNSName}}
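
Alternatively, if you use eksctl, the same mapping can be added without hand-editing the ConfigMap (a sketch, assuming eksctl is installed and AWS_ACCOUNT_ID is set):

eksctl create iamidentitymapping --cluster "ashish" \
    --arn "arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-ashish" \
    --username "system:node:{{EC2PrivateDNSName}}" \
    --group system:bootstrappers --group system:nodes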

The full aws-auth ConfigMap should now contain two role mappings: one for your Karpenter node role and one for your existing node group.

Step 6: Deploy Karpenter

helm install karpenter oci://public.ecr.aws/karpenter/karpenter  --namespace "karpenter" --create-namespace \
    --set "settings.clusterName=ashish" \
    --set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterControllerRole-ashish" \
    --set controller.resources.requests.cpu=1 \
    --set controller.resources.requests.memory=1Gi \
    --set controller.resources.limits.cpu=1 \
    --set controller.resources.limits.memory=1Gi

Output:

[ec2-user@ip-172-31-0-244 ~]$ helm install karpenter oci://public.ecr.aws/karpenter/karpenter  --namespace "karpenter" --create-namespace \
    --set "settings.clusterName=ashish" \
    --set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=arn:aws:iam::256050093938:role/KarpenterControllerRole-ashish" \
    --set controller.resources.requests.cpu=1 \
    --set controller.resources.requests.memory=1Gi \
    --set controller.resources.limits.cpu=1 \
    --set controller.resources.limits.memory=1Gi
Pulled: public.ecr.aws/karpenter/karpenter:1.1.1
Digest: sha256:b42c6d224e7b19eafb65e2d440734027a8282145569d4d142baf10ba495e90d0
NAME: karpenter
LAST DEPLOYED: Sat Jan 18 01:51:41 2025
NAMESPACE: karpenter
STATUS: deployed
REVISION: 1
TEST SUITE: None

[ec2-user@ip-172-31-0-244 ~]$  kubectl get po -A
NAMESPACE     NAME                         READY   STATUS    RESTARTS   AGE
karpenter     karpenter-7d4c9cbd84-vpbfw   1/1     Running   0          29m
karpenter     karpenter-7d4c9cbd84-zjwz4   1/1     Running   0          29m
kube-system   aws-node-889mt               2/2     Running   0          16m
kube-system   aws-node-rnzsk               2/2     Running   0          51m
kube-system   coredns-6c55b85fbb-4cj87     1/1     Running   0          54m
kube-system   coredns-6c55b85fbb-nxwrg     1/1     Running   0          54m
kube-system   kube-proxy-8jmbr             1/1     Running   0          16m
kube-system   kube-proxy-mt4nt             1/1     Running   0          51m
kube-system   metrics-server-5-4zwff       1/1     Running   0          54m
kube-system   cluster-autoscaler-lb7cw     1/1     Running   0          54m

Step 7: Create a NodePool

We need to create a default NodePool so Karpenter knows what types of nodes we want for unschedulable pods.

You can retrieve the image IDs of the latest recommended Amazon EKS optimized Amazon Linux AMIs with the following commands:

aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.30/amazon-linux-2-arm64/recommended/image_id --query Parameter.Value --output text
aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.30/amazon-linux-2/recommended/image_id --query Parameter.Value --output text
aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.30/amazon-linux-2-gpu/recommended/image_id --query Parameter.Value --output text
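
To feed these IDs into the EC2NodeClass manifest below, export them as variables (a sketch; set K8S_VERSION to match your cluster's Kubernetes version):

export K8S_VERSION="1.30"
export ARM_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2-arm64/recommended/image_id --query Parameter.Value --output text)"
export AMD_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2/recommended/image_id --query Parameter.Value --output text)"
export GPU_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2-gpu/recommended/image_id --query Parameter.Value --output text)"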

Note:

  • Please change the instance types as per your requirements.
  • This is unsafe for production workloads. Validate AMIs in lower environments before deploying them to production.

Save the following two manifests as nodepool.yaml:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["t","m"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      expireAfter: 720h # 30 * 24h = 720h
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2 # Amazon Linux 2
  role: "KarpenterNodeRole-ashish" # replace with your cluster name
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "ashish" # replace with your cluster name
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "ashish" # replace with your cluster name
  amiSelectorTerms:
    - id: "${ARM_AMI_ID}"
    - id: "${AMD_AMI_ID}"
#   - id: "${GPU_AMI_ID}" # <- GPU Optimized AMD AMI 
#   - name: "amazon-eks-node-${K8S_VERSION}-*" # <- automatically upgrade when a new AL2 EKS Optimized AMI is released. 

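Because the manifests reference ${ARM_AMI_ID} and ${AMD_AMI_ID}, substitute real values before applying, either by hand or with envsubst (assumes the variables exported earlier):

envsubst < nodepool.yaml | kubectl apply -f -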

Output:

[ec2-user@ip-172-31-0-244 ~]$ vim nodepool.yaml
[ec2-user@ip-172-31-0-244 ~]$ kubectl apply -f nodepool.yaml
nodepool.karpenter.sh/default created
ec2nodeclass.karpenter.k8s.aws/default created

Next, test scaling by deploying Nginx and increasing its load.

  • Deploy Nginx
[ec2-user@ip-172-31-0-244 ~]$ kubectl create deploy nginx --image=nginx:1.7.8 --replicas=2
deployment.apps/nginx created
[ec2-user@ip-172-31-0-244 ~]$ kubectl edit deployment nginx
deployment.apps/nginx edited
[ec2-user@ip-172-31-0-244 ~]$ kubectl get po -A
NAMESPACE     NAME                         READY   STATUS    RESTARTS   AGE
default       nginx-65757d685b-74h84       1/1     Running   0          78s
default       nginx-65757d685b-jzxrv       1/1     Running   0          78s
default       nginx-65757d685b-lqwf4       1/1     Running   0          3m11s
default       nginx-65757d685b-qfgcq       1/1     Running   0          78s
default       nginx-65757d685b-ssrrk       1/1     Running   0          78s
karpenter     karpenter-7d4c9cbd84-vpbfw   1/1     Running   0          25m
karpenter     karpenter-7d4c9cbd84-zjwz4   1/1     Running   0          25m
kube-system   aws-node-889mt               2/2     Running   0          11m
kube-system   aws-node-rnzsk               2/2     Running   0          46m
kube-system   coredns-6c55b85fbb-4cj87     1/1     Running   0          50m
kube-system   coredns-6c55b85fbb-nxwrg     1/1     Running   0          50m
kube-system   kube-proxy-8jmbr             1/1     Running   0          11m
kube-system   kube-proxy-mt4nt             1/1     Running   0          46m
kube-system   metrics-server-5-4zwff       1/1     Running   0          32m
kube-system   cluster-autoscaler-lb7cw     1/1     Running   0          35m
[ec2-user@ip-172-31-0-244 ~]$ kubectl get nodes
NAME                                            STATUS   ROLES    AGE   VERSION
ip-192-168-54-99.ap-south-1.compute.internal    Ready    <none>   51m   v1.30.8-eks-aeac579
ip-192-168-72-178.ap-south-1.compute.internal   Ready    <none>   17m   v1.30.8-eks-aeac579
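
Rather than editing the Deployment by hand each time, you can also bump the replica count directly (the count here is arbitrary):

kubectl scale deployment nginx --replicas=20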

Increase the Nginx load further:

[ec2-user@ip-172-31-0-244 ~]$ kubectl edit deployment nginx
deployment.apps/nginx edited
[ec2-user@ip-172-31-0-244 ~]$ kubectl get po -A
NAMESPACE     NAME                         READY   STATUS    RESTARTS   AGE
default       nginx-65757d685b-5q9dn       1/1     Running   0          2m12s
default       nginx-65757d685b-6twr7       0/1     Pending   0          4s
default       nginx-65757d685b-74h84       1/1     Running   0          10m
default       nginx-65757d685b-7fqwh       0/1     Pending   0          4s
default       nginx-65757d685b-8s4vx       1/1     Running   0          4s
default       nginx-65757d685b-b46x9       1/1     Running   0          2m12s
default       nginx-65757d685b-b4vxx       1/1     Running   0          2m12s
default       nginx-65757d685b-c9xk2       0/1     Pending   0          4s
default       nginx-65757d685b-cfsg9       0/1     Pending   0          4s
default       nginx-65757d685b-cwcz4       0/1     Pending   0          4s
default       nginx-65757d685b-f9z6f       1/1     Running   0          3m38s
default       nginx-65757d685b-gprq7       0/1     Pending   0          4s
default       nginx-65757d685b-hcqlq       0/1     Pending   0          4s
default       nginx-65757d685b-jcd2b       0/1     Pending   0          4s
default       nginx-65757d685b-m6kbf       1/1     Running   0          3m38s
default       nginx-65757d685b-mvpcf       0/1     Pending   0          4s
default       nginx-65757d685b-nshbx       1/1     Running   0          2m12s
default       nginx-65757d685b-pt7fj       1/1     Running   0          2m12s
default       nginx-65757d685b-q6vnq       0/1     Pending   0          4s
default       nginx-65757d685b-qcx94       0/1     Pending   0          4s
default       nginx-65757d685b-qfgcq       1/1     Running   0          10m
default       nginx-65757d685b-sfhsn       0/1     Pending   0          4s
default       nginx-65757d685b-sj9vd       1/1     Running   0          3m38s
default       nginx-65757d685b-sk74g       0/1     Pending   0          4s
default       nginx-65757d685b-vptn5       1/1     Running   0          4s
karpenter     karpenter-7d4c9cbd84-74527   0/1     Pending   0          2m12s
karpenter     karpenter-7d4c9cbd84-zjwz4   1/1     Running   0          34m
kube-system   aws-node-rnzsk               2/2     Running   0          55m
kube-system   coredns-6c55b85fbb-4cj87     1/1     Running   0          59m
kube-system   coredns-6c55b85fbb-nxwrg     1/1     Running   0          59m
kube-system   kube-proxy-mt4nt             1/1     Running   0          55m
kube-system   metrics-server-5-4zwff       1/1     Running   0          54m
kube-system   cluster-autoscaler-lb7cw     1/1     Running   0          54m
  • New node created:
NAME                                            STATUS   ROLES    AGE   VERSION
ip-192-168-54-99.ap-south-1.compute.internal    Ready    <none>   57m   v1.30.8-eks-aeac579
ip-192-168-75-159.ap-south-1.compute.internal   Ready    <none>   95s   v1.30.8-eks-aeac579

Step 8: Remove Cluster Autoscaler

Now that Karpenter is running, we can disable the Cluster Autoscaler by scaling its Deployment down to zero replicas.

kubectl scale deploy/cluster-autoscaler -n kube-system --replicas=0

If you have a single multi-AZ node group, we suggest a minimum of 2 instances.

aws eks update-nodegroup-config --cluster-name "ashish" \
    --nodegroup-name "ashish-workers" \
    --scaling-config "minSize=2,maxSize=2,desiredSize=2"

If you have multiple single-AZ node groups, we suggest a minimum of 1 instance each.

for NODEGROUP in $(aws eks list-nodegroups --cluster-name "ashish" \
    --query 'nodegroups' --output text); do
    aws eks update-nodegroup-config --cluster-name "ashish" \
        --nodegroup-name "${NODEGROUP}" \
        --scaling-config "minSize=1,maxSize=1,desiredSize=1"
done

Step 9: Verify Karpenter

kubectl logs -f -n karpenter -c controller -l app.kubernetes.io/name=karpenter

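You can also confirm that Karpenter registered the NodePool and is managing nodes (assuming the v1 CRDs installed by the chart above):

kubectl get nodepools,nodeclaims
kubectl get nodes -L karpenter.sh/nodepool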

Pricing

EKS Management Fees:
  • Both EKS Cluster Autoscaler and Karpenter incur the same $0.10 per hour fee for EKS cluster management.
  • EKS Management Fee = $0.10 per hour

EC2 instance cost:

  • Assume you're running the m5.large instance type at $0.096 per hour for 5 nodes.
  • Cluster Autoscaler EC2 Cost = $0.096 * 5 * 24 hours = $11.52 per day
  • Karpenter EC2 Cost (optimized with Spot Instances): Assume a 60% discount on Spot pricing, so the cost would be $0.0384 per hour per instance.
    • Karpenter EC2 Cost = $0.0384 * 5 * 24 hours = $4.61 per day

Cost Comparison for 30 Days:

  • Cluster Autoscaler (with On-Demand EC2 instances):
    • $11.52 * 30 = $345.60 per month
  • Karpenter (with Spot Instances):
    • $4.61 * 30 = $138.30 per month

Total Cost Calculation (for one month):

  • Cluster Autoscaler Total:
    • EKS Fee ($0.10 * 24 hours * 30 days) = $72
    • EC2 Cost = $345.60
    • $72 + $345.60 = $417.60 per month
  • Karpenter Total:
    • EKS Fee ($0.10 * 24 hours * 30 days) = $72
    • EC2 Cost = $138.30
    • Total = $72 + $138.30 = $210.30 per month

Cost Savings with Karpenter:
By migrating to Karpenter, you can save approximately:

  • $417.60 - $210.30 = $207.30 per month
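
For a quick sanity check of the monthly arithmetic above, you can run the rounded per-day figures through bc (a sketch):

echo "scale=2
eks=0.10*24*30
ca=11.52*30
kp=4.61*30
ca+eks
kp+eks
(ca+eks)-(kp+eks)" | bc
# prints 417.60, 210.30 and 207.30, one per line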

Conclusion

Migrating from EKS Cluster Autoscaler to Karpenter offers several benefits, including improved scaling speed, cost efficiency, and simplified management. By following the steps outlined in this blog, you should be able to successfully migrate your cluster to use Karpenter, enhancing both performance and scalability.

Remember, while Karpenter provides a more dynamic scaling solution, it's important to continuously monitor your cluster to ensure it is optimizing resources effectively and to make adjustments as your workloads evolve.

Ref:

  1. https://karpenter.sh/docs
  2. https://docs.aws.amazon.com/eks/latest/userguide/autoscaling.html
