Amazon Elastic Kubernetes Service (EKS) provides a powerful and flexible platform for running containerized applications, and a key component in ensuring your cluster scales appropriately is the autoscaler. Traditionally, EKS has relied on the Cluster Autoscaler (CA) to adjust node capacity dynamically based on resource demand. However, Karpenter — an open-source, high-performance Kubernetes node autoscaler developed by AWS — is gaining traction due to its enhanced capabilities and efficiency.
In this blog, we will guide you through the process of migrating from EKS Cluster Autoscaler to Karpenter and explore the benefits of making the switch.
Why Migrate from EKS Cluster Autoscaler to Karpenter?
1. Improved Node Scheduling Efficiency
Karpenter automatically optimizes the type, size, and number of nodes required for your workloads. Unlike the Cluster Autoscaler, which operates on a fixed set of predefined node groups, Karpenter provides greater flexibility by dynamically selecting the most appropriate instance types and scaling in real-time.
2. Faster Scaling
Karpenter scales faster than Cluster Autoscaler. It responds to changes in your cluster within seconds, compared to the Cluster Autoscaler’s typically slower reactions to scaling events. This is especially helpful for workloads that need to scale quickly in response to demand spikes.
3. Cost Optimization
Karpenter is designed to maximize the use of available resources by selecting the most cost-efficient instance types and ensuring that only the resources necessary for your workload are provisioned. This makes Karpenter particularly beneficial for cost-conscious organizations.
4. Simpler Configuration
With Karpenter, you don’t have to manage separate node groups. Karpenter automatically adjusts the instance size and types needed based on your workloads. It simplifies the configuration process, making it more developer-friendly.
Step-by-Step Guide to Migrating from EKS Cluster Autoscaler to Karpenter
Step 1: Pre-Requisites
Before you start the migration process, ensure that the following pre-requisites are met:
- You are running an EKS Cluster.
- You have kubectl access to your cluster.
- You have AWS CLI configured with the necessary permissions to manage your EKS cluster and resources.
- You are familiar with the basics of both Cluster Autoscaler and Karpenter.
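The commands in this post use the placeholder cluster name "ashish" in region ap-south-1, and shell variables such as ${AWS_ACCOUNT_ID}, ${AWS_REGION}, and ${AWS_PARTITION}. A minimal setup, assuming the standard AWS partition and that your CLI is already pointed at the right account, looks like this:
export AWS_REGION="ap-south-1"          # replace with your region
export AWS_PARTITION="aws"              # standard AWS partition
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
export K8S_VERSION="1.30"               # used later for the AMI lookup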
Step 2: Create a Node IAM Role
- Create an IAM role with EC2 as the trusted service (a CLI sketch follows this list).
- Role name: KarpenterNodeRole-ashish
- Attach the following policies to the KarpenterNodeRole-ashish role:
- AmazonEKSWorkerNodePolicy
- AmazonEKS_CNI_Policy
- AmazonEC2ContainerRegistryReadOnly
- AmazonSSMManagedInstanceCore
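If you prefer the CLI over the console, a rough equivalent (assuming the role does not already exist) is:
cat << EOF > node-trust-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
aws iam create-role --role-name "KarpenterNodeRole-ashish" \
  --assume-role-policy-document file://node-trust-policy.json
for POLICY in AmazonEKSWorkerNodePolicy AmazonEKS_CNI_Policy AmazonEC2ContainerRegistryReadOnly AmazonSSMManagedInstanceCore; do
  aws iam attach-role-policy --role-name "KarpenterNodeRole-ashish" \
    --policy-arn "arn:aws:iam::aws:policy/${POLICY}"
done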
Step 3: Create a Controller IAM Role
- Role name: KarpenterControllerRole-ashish
- Modify the trust relationship so the Karpenter service account can assume the role through the cluster's OIDC provider:
- Add the OIDC provider ID
- Add the Account ID
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/oidc.eks.ap-south-1.amazonaws.com/id/6B407ED9BFC9CE681546033D7AD4156A"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.ap-south-1.amazonaws.com/id/6B407ED9BFC9CE681546033D7AD4156A:aud": "sts.amazonaws.com",
          "oidc.eks.ap-south-1.amazonaws.com/id/6B407ED9BFC9CE681546033D7AD4156A:sub": "system:serviceaccount:karpenter:karpenter"
        }
      }
    }
  ]
}
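Assuming the trust policy above is saved as controller-trust-policy.json (the filename is arbitrary), the role can be created from the CLI like this:
aws iam create-role --role-name "KarpenterControllerRole-ashish" \
  --assume-role-policy-document file://controller-trust-policy.json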
- Create a KarpenterControllerPolicy-ashish policy:
cat << EOF > controller-policy.json
{
  "Statement": [
    {
      "Action": [
        "ssm:GetParameter",
        "ec2:DescribeImages",
        "ec2:RunInstances",
        "ec2:DescribeSubnets",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeLaunchTemplates",
        "ec2:DescribeInstances",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeInstanceTypeOfferings",
        "ec2:DeleteLaunchTemplate",
        "ec2:CreateTags",
        "ec2:CreateLaunchTemplate",
        "ec2:CreateFleet",
        "ec2:DescribeSpotPriceHistory",
        "pricing:GetProducts"
      ],
      "Effect": "Allow",
      "Resource": "*",
      "Sid": "Karpenter"
    },
    {
      "Action": "ec2:TerminateInstances",
      "Condition": {
        "StringLike": {
          "ec2:ResourceTag/karpenter.sh/nodepool": "*"
        }
      },
      "Effect": "Allow",
      "Resource": "*",
      "Sid": "ConditionalEC2Termination"
    },
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-ashish",
      "Sid": "PassNodeIAMRole"
    },
    {
      "Effect": "Allow",
      "Action": "eks:DescribeCluster",
      "Resource": "arn:${AWS_PARTITION}:eks:${AWS_REGION}:${AWS_ACCOUNT_ID}:cluster/ashish",
      "Sid": "EKSClusterEndpointLookup"
    },
    {
      "Sid": "AllowScopedInstanceProfileCreationActions",
      "Effect": "Allow",
      "Resource": "*",
      "Action": [
        "iam:CreateInstanceProfile"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestTag/kubernetes.io/cluster/ashish": "owned",
          "aws:RequestTag/topology.kubernetes.io/region": "${AWS_REGION}"
        },
        "StringLike": {
          "aws:RequestTag/karpenter.k8s.aws/ec2nodeclass": "*"
        }
      }
    },
    {
      "Sid": "AllowScopedInstanceProfileTagActions",
      "Effect": "Allow",
      "Resource": "*",
      "Action": [
        "iam:TagInstanceProfile"
      ],
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/kubernetes.io/cluster/ashish": "owned",
          "aws:ResourceTag/topology.kubernetes.io/region": "${AWS_REGION}",
          "aws:RequestTag/kubernetes.io/cluster/ashish": "owned",
          "aws:RequestTag/topology.kubernetes.io/region": "${AWS_REGION}"
        },
        "StringLike": {
          "aws:ResourceTag/karpenter.k8s.aws/ec2nodeclass": "*",
          "aws:RequestTag/karpenter.k8s.aws/ec2nodeclass": "*"
        }
      }
    },
    {
      "Sid": "AllowScopedInstanceProfileActions",
      "Effect": "Allow",
      "Resource": "*",
      "Action": [
        "iam:AddRoleToInstanceProfile",
        "iam:RemoveRoleFromInstanceProfile",
        "iam:DeleteInstanceProfile"
      ],
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/kubernetes.io/cluster/ashish": "owned",
          "aws:ResourceTag/topology.kubernetes.io/region": "${AWS_REGION}"
        },
        "StringLike": {
          "aws:ResourceTag/karpenter.k8s.aws/ec2nodeclass": "*"
        }
      }
    },
    {
      "Sid": "AllowInstanceProfileReadActions",
      "Effect": "Allow",
      "Resource": "*",
      "Action": "iam:GetInstanceProfile"
    }
  ],
  "Version": "2012-10-17"
}
EOF
- Attach KarpenterControllerPolicy-ashish to the controller role
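A CLI sketch for this step, using the controller-policy.json file written above:
aws iam create-policy --policy-name "KarpenterControllerPolicy-ashish" \
  --policy-document file://controller-policy.json
aws iam attach-role-policy --role-name "KarpenterControllerRole-ashish" \
  --policy-arn "arn:aws:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy-ashish"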
Step 4: Add tags to subnets and security groups
- Collect the subnet details
[ec2-user@ip-172-31-0-244 ~]$ aws eks describe-nodegroup --cluster-name "ashish" --nodegroup-name "ashish-workers" --query 'nodegroup.subnets' --output text
subnet-0a968db0a4c73858d subnet-0bcd684f5878c3282 subnet-061e107c1f8ebc361
- Collect the security group
[ec2-user@ip-172-31-0-244 ~]$ aws eks describe-cluster --name "ashish" --query "cluster.resourcesVpcConfig.clusterSecurityGroupId" --output text
sg-0e0ac4fa44824e1aa
- Create the karpenter.sh/discovery tag on the security group and subnets so Karpenter can discover them (the subnet tagging command follows the security-group example below).
aws ec2 create-tags --tags "Key=karpenter.sh/discovery,Value=ashish" --resources "sg-0e0ac4fa44824e1aa"
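Tag the subnets collected above the same way (the subnet IDs here are the ones returned earlier; use your own):
aws ec2 create-tags --tags "Key=karpenter.sh/discovery,Value=ashish" \
  --resources "subnet-0a968db0a4c73858d" "subnet-0bcd684f5878c3282" "subnet-061e107c1f8ebc361"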
Step 5: Update aws-auth ConfigMap
We need to allow nodes that are using the node IAM role we just created to join the cluster. To do that we have to modify the aws-auth ConfigMap in the cluster.
kubectl edit configmap aws-auth -n kube-system
You will need to add a section to mapRoles that looks something like this:
- groups:
  - system:bootstrappers
  - system:nodes
  # - eks:kube-proxy-windows
  rolearn: arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-ashish
  username: system:node:{{EC2PrivateDNSName}}
The full aws-auth configmap should have two groups. One for your Karpenter node role and one for your existing node group.
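For reference, the complete mapRoles section looks roughly like the sketch below. The first entry is your existing node group role (the role name here is a placeholder — keep whatever is already in your ConfigMap), and the second is the Karpenter node role created earlier:
data:
  mapRoles: |
    - groups:
      - system:bootstrappers
      - system:nodes
      rolearn: arn:aws:iam::${AWS_ACCOUNT_ID}:role/<your-existing-node-group-role>   # placeholder, already present
      username: system:node:{{EC2PrivateDNSName}}
    - groups:
      - system:bootstrappers
      - system:nodes
      rolearn: arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-ashish
      username: system:node:{{EC2PrivateDNSName}}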
Step 6: Deploy Karpenter
helm install karpenter oci://public.ecr.aws/karpenter/karpenter --namespace "karpenter" --create-namespace \
--set "settings.clusterName=ashish" \
--set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterControllerRole-ashish" \
--set controller.resources.requests.cpu=1 \
--set controller.resources.requests.memory=1Gi \
--set controller.resources.limits.cpu=1 \
--set controller.resources.limits.memory=1Gi
output:
[ec2-user@ip-172-31-0-244 ~]$ helm install karpenter oci://public.ecr.aws/karpenter/karpenter --namespace "karpenter" --create-namespace \
--set "settings.clusterName=ashish" \
--set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=arn:aws:iam::256050093938:role/KarpenterControllerRole-ashish" \
--set controller.resources.requests.cpu=1 \
--set controller.resources.requests.memory=1Gi \
--set controller.resources.limits.cpu=1 \
--set controller.resources.limits.memory=1Gi
Pulled: public.ecr.aws/karpenter/karpenter:1.1.1
Digest: sha256:b42c6d224e7b19eafb65e2d440734027a8282145569d4d142baf10ba495e90d0
NAME: karpenter
LAST DEPLOYED: Sat Jan 18 01:51:41 2025
NAMESPACE: karpenter
STATUS: deployed
REVISION: 1
TEST SUITE: None
[ec2-user@ip-172-31-0-244 ~]$ kubectl get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
karpenter karpenter-7d4c9cbd84-vpbfw 1/1 Running 0 29m
karpenter karpenter-7d4c9cbd84-zjwz4 1/1 Running 0 29m
kube-system aws-node-889mt 2/2 Running 0 16m
kube-system aws-node-rnzsk 2/2 Running 0 51m
kube-system coredns-6c55b85fbb-4cj87 1/1 Running 0 54m
kube-system coredns-6c55b85fbb-nxwrg 1/1 Running 0 54m
kube-system kube-proxy-8jmbr 1/1 Running 0 16m
kube-system kube-proxy-mt4nt 1/1 Running 0 51m
kube-system metrics-server-5-4zwff 1/1 Running 0 54m
kube-system cluster-autoscaler-lb7cw 1/1 Running 0 54m
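Note: the install command above does not pin the chart version, so Helm pulls the latest release (1.1.1 at the time of this run). For repeatable installs you can pin the version that was pulled, for example:
helm install karpenter oci://public.ecr.aws/karpenter/karpenter --version "1.1.1" \
  --namespace "karpenter" --create-namespace \
  --set "settings.clusterName=ashish" \
  --set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterControllerRole-ashish" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi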
Step 7: Create a NodePool
We need to create a default NodePool so Karpenter knows what types of nodes we want for unscheduled workloads.
You can retrieve the image IDs of the latest recommended Amazon EKS optimized Amazon Linux AMIs, and store them in the variables referenced by the EC2NodeClass below, with the following commands. Because nodepool.yaml is applied as a plain file later, substitute the resulting values into the ${ARM_AMI_ID}/${AMD_AMI_ID} placeholders (for example with envsubst) before running kubectl apply.
ARM_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.30/amazon-linux-2-arm64/recommended/image_id --query Parameter.Value --output text)"
AMD_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.30/amazon-linux-2/recommended/image_id --query Parameter.Value --output text)"
GPU_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.30/amazon-linux-2-gpu/recommended/image_id --query Parameter.Value --output text)"
Note:
- Please change the instance types as per your requirements.
- Automatically picking up the latest recommended AMI like this is unsafe for production workloads. Validate AMIs in lower environments before deploying them to production.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["t", "m"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      expireAfter: 720h # 30 * 24h = 720h
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2 # Amazon Linux 2
  role: "KarpenterNodeRole-ashish" # replace with your cluster name
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "ashish" # replace with your cluster name
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "ashish" # replace with your cluster name
  amiSelectorTerms:
    - id: "${ARM_AMI_ID}"
    - id: "${AMD_AMI_ID}"
    # - id: "${GPU_AMI_ID}" # <- GPU Optimized AMD AMI
    # - name: "amazon-eks-node-${K8S_VERSION}-*" # <- automatically upgrade when a new AL2 EKS Optimized AMI is released.
output:
[ec2-user@ip-172-31-0-244 ~]$ vim nodepool.yaml
[ec2-user@ip-172-31-0-244 ~]$ kubectl apply -f nodepool.yaml
nodepool.karpenter.sh/general-purpose created
ec2nodeclass.karpenter.k8s.aws/default created
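To confirm that both resources exist and are ready, you can list them:
kubectl get nodepools,ec2nodeclasses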
Test scaling by increasing the Nginx load.
- Deploy Nginx:
[ec2-user@ip-172-31-0-244 ~]$ kubectl create deploy nginx --image=nginx:1.7.8 --replicas=2
deployment.apps/nginx created
Edit the deployment to raise the replica count (increased to 5 here, matching the pod list below):
[ec2-user@ip-172-31-0-244 ~]$ kubectl edit deployment nginx
deployment.apps/nginx edited
[ec2-user@ip-172-31-0-244 ~]$ kubectl get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default nginx-65757d685b-74h84 1/1 Running 0 78s
default nginx-65757d685b-jzxrv 1/1 Running 0 78s
default nginx-65757d685b-lqwf4 1/1 Running 0 3m11s
default nginx-65757d685b-qfgcq 1/1 Running 0 78s
default nginx-65757d685b-ssrrk 1/1 Running 0 78s
karpenter karpenter-7d4c9cbd84-vpbfw 1/1 Running 0 25m
karpenter karpenter-7d4c9cbd84-zjwz4 1/1 Running 0 25m
kube-system aws-node-889mt 2/2 Running 0 11m
kube-system aws-node-rnzsk 2/2 Running 0 46m
kube-system coredns-6c55b85fbb-4cj87 1/1 Running 0 50m
kube-system coredns-6c55b85fbb-nxwrg 1/1 Running 0 50m
kube-system kube-proxy-8jmbr 1/1 Running 0 11m
kube-system kube-proxy-mt4nt 1/1 Running 0 46m
kube-system metrics-server-5-4zwff 1/1 Running 0 32m
kube-system cluster-autoscaler-lb7cw 1/1 Running 0 35m
[ec2-user@ip-172-31-0-244 ~]$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-192-168-54-99.ap-south-1.compute.internal Ready <none> 51m v1.30.8-eks-aeac579
ip-192-168-72-178.ap-south-1.compute.internal Ready <none> 17m v1.30.8-eks-aeac579
Increase the Nginx load further by editing the deployment again:
[ec2-user@ip-172-31-0-244 ~]$ kubectl edit deployment nginx
deployment.apps/nginx edited
[ec2-user@ip-172-31-0-244 ~]$ kubectl get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default nginx-65757d685b-5q9dn 1/1 Running 0 2m12s
default nginx-65757d685b-6twr7 0/1 Pending 0 4s
default nginx-65757d685b-74h84 1/1 Running 0 10m
default nginx-65757d685b-7fqwh 0/1 Pending 0 4s
default nginx-65757d685b-8s4vx 1/1 Running 0 4s
default nginx-65757d685b-b46x9 1/1 Running 0 2m12s
default nginx-65757d685b-b4vxx 1/1 Running 0 2m12s
default nginx-65757d685b-c9xk2 0/1 Pending 0 4s
default nginx-65757d685b-cfsg9 0/1 Pending 0 4s
default nginx-65757d685b-cwcz4 0/1 Pending 0 4s
default nginx-65757d685b-f9z6f 1/1 Running 0 3m38s
default nginx-65757d685b-gprq7 0/1 Pending 0 4s
default nginx-65757d685b-hcqlq 0/1 Pending 0 4s
default nginx-65757d685b-jcd2b 0/1 Pending 0 4s
default nginx-65757d685b-m6kbf 1/1 Running 0 3m38s
default nginx-65757d685b-mvpcf 0/1 Pending 0 4s
default nginx-65757d685b-nshbx 1/1 Running 0 2m12s
default nginx-65757d685b-pt7fj 1/1 Running 0 2m12s
default nginx-65757d685b-q6vnq 0/1 Pending 0 4s
default nginx-65757d685b-qcx94 0/1 Pending 0 4s
default nginx-65757d685b-qfgcq 1/1 Running 0 10m
default nginx-65757d685b-sfhsn 0/1 Pending 0 4s
default nginx-65757d685b-sj9vd 1/1 Running 0 3m38s
default nginx-65757d685b-sk74g 0/1 Pending 0 4s
default nginx-65757d685b-vptn5 1/1 Running 0 4s
karpenter karpenter-7d4c9cbd84-74527 0/1 Pending 0 2m12s
karpenter karpenter-7d4c9cbd84-zjwz4 1/1 Running 0 34m
kube-system aws-node-rnzsk 2/2 Running 0 55m
kube-system coredns-6c55b85fbb-4cj87 1/1 Running 0 59m
kube-system coredns-6c55b85fbb-nxwrg 1/1 Running 0 59m
kube-system kube-proxy-mt4nt 1/1 Running 0 55m
kube-system metrics-server-5-4zwff 1/1 Running 0 54m
kube-system cluster-autoscaler-lb7cw 1/1 Running 0 54m
- **New node created**
NAME STATUS ROLES AGE VERSION
ip-192-168-54-99.ap-south-1.compute.internal Ready <none> 57m v1.30.8-eks-aeac579
ip-192-168-75-159.ap-south-1.compute.internal Ready <none> 95s v1.30.8-eks-aeac579
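To confirm the new node was provisioned by Karpenter (rather than the managed node group), you can list the NodeClaims Karpenter created:
kubectl get nodeclaims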
Step 8: Remove Cluster Autoscaler
Now that Karpenter is running, we can disable the Cluster Autoscaler. To do that, scale its deployment to zero replicas.
kubectl scale deploy/cluster-autoscaler -n kube-system --replicas=0
If you have a single multi-AZ node group, we suggest a minimum of 2 instances.
aws eks update-nodegroup-config --cluster-name "ashish" \
--nodegroup-name "ashish-workers" \
--scaling-config "minSize=2,maxSize=2,desiredSize=2"
If you have multiple single-AZ node groups, we suggest a minimum of 1 instance each.
for NODEGROUP in $(aws eks list-nodegroups --cluster-name "ashish" \
  --query 'nodegroups' --output text); do
  aws eks update-nodegroup-config --cluster-name "ashish" \
    --nodegroup-name "${NODEGROUP}" \
    --scaling-config "minSize=1,maxSize=1,desiredSize=1"
done
Step 9: Verify Karpenter
kubectl logs -f -n karpenter -c controller -l app.kubernetes.io/name=karpenter
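In addition to the controller logs, you can check which nodes Karpenter is managing via the karpenter.sh/nodepool label it adds to the nodes it provisions:
kubectl get nodes -l karpenter.sh/nodepool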
Pricing
EKS Management Fees:
- With both Cluster Autoscaler and Karpenter, the standard EKS cluster management fee of $0.10 per hour applies.
- EKS Management Fee = $0.10 per hour
EC2 instance cost:
- Assume you're running an instance type m5.large at $0.096 per hour for 5 nodes.
- Cluster Autoscaler EC2 Cost = $0.096 * 5 * 24 hours = $11.52 per day
- Karpenter EC2 Cost (optimized with Spot Instances): Assume a 60% discount on Spot pricing, so the cost would be $0.0384 per hour per instance.
- Karpenter EC2 Cost = $0.0384 * 5 * 24 hours = $4.61 per day
Cost Comparison for 30 Days:
- Cluster Autoscaler (with On-Demand EC2 instances):
- $11.52 * 30 = $345.60 per month
- Karpenter (with Spot Instances):
- $4.61 * 30 = $138.30 per month
Total Cost Calculation (for one month):
- Cluster Autoscaler Total:
- EKS Fee ($0.10 * 24 hours * 30 days) = $72
- EC2 Cost = $345.60
- $72 + $345.60 = $417.60 per month
- Karpenter Total:
- EKS Fee ($0.10 * 24 hours * 30 days) = $72
- EC2 Cost = $138.30
- Total = $72 + $138.30 = $210.30 per month
Cost Savings with Karpenter:
By migrating to Karpenter, you can save approximately:
- $417.60 - $210.30 = $207.30 per month (roughly a 50% reduction)
Conclusion
Migrating from EKS Cluster Autoscaler to Karpenter offers several benefits, including improved scaling speed, cost efficiency, and simplified management. By following the steps outlined in this blog, you should be able to successfully migrate your cluster to use Karpenter, enhancing both performance and scalability.
Remember, while Karpenter provides a more dynamic scaling solution, it's important to continuously monitor your cluster to ensure it is optimizing resources effectively and to make adjustments as your workloads evolve.