John Ajera

ECS Task Debugging Checklist

🔍 ECS Task Not Running - Debugging Checklist

If an ECS service is deployed but no tasks are running, follow this step-by-step checklist to systematically identify and fix the issue.


1️⃣ Check ECS Service Deployment Status

Run:

aws ecs describe-services --cluster <your-cluster-name> --services <your-service-name> \
  --query "services[0].{Status:status, DesiredCount:desiredCount, RunningCount:runningCount, PendingCount:pendingCount, Events:events}"

Expected Output:

{
  "Status": "ACTIVE",
  "DesiredCount": 1,
  "RunningCount": 1,
  "PendingCount": 0,
  "Events": []
}

🔴 If RunningCount is 0 and PendingCount is 0 while DesiredCount is greater than 0, continue debugging.
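
The check on the three counts can be sketched as a small shell helper. This is a minimal sketch, not part of the original checklist: the function name `classify_service` and its three labels are hypothetical, and it assumes you pass the counts in yourself.

```shell
# Hypothetical helper: classify a service from its three task counts.
# Usage: classify_service DESIRED RUNNING PENDING
classify_service() {
  local desired=$1 running=$2 pending=$3
  if [ "$running" -ge "$desired" ]; then
    echo "satisfied"   # running tasks meet the desired count
  elif [ "$pending" -gt 0 ]; then
    echo "starting"    # tasks were placed and are still starting
  else
    echo "stuck"       # nothing running or pending: keep debugging
  fi
}

# Example wiring (commented out because it needs AWS credentials):
# counts=$(aws ecs describe-services --cluster <your-cluster-name> --services <your-service-name> \
#   --query "services[0].[desiredCount,runningCount,pendingCount]" --output text)
# classify_service $counts
```

A result of `stuck` is the case this checklist is about: the scheduler wants tasks but cannot place any.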

🔍 If the Events list contains a message like:

{
  "message": "(service <service-name>) was unable to place a task because no container instance met all of its requirements. Reason: No Container Instances were found in your cluster."
}

This event means no container instances are registered with the cluster. To confirm, run:

aws ecs list-container-instances --cluster <your-cluster-name>

Expected: List of registered container instances.

🔴 If containerInstanceArns is empty ([]), no EC2 instances have joined the ECS cluster.

Fix: Proceed to the next steps to resolve.


2️⃣ Check If EC2 Instances Exist

Run:

aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names <asg-name> \
  --query "AutoScalingGroups[0].Instances[*].InstanceId"

Expected: List of EC2 instance IDs.

🔴 If empty, the ASG is not launching instances.

Fix: Scale up the ASG (the desired capacity must fall within the group's MinSize and MaxSize):

aws autoscaling update-auto-scaling-group --auto-scaling-group-name <asg-name> --desired-capacity 2
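
Setting a desired capacity outside the group's size limits is rejected, so it can help to check the bounds first. The helper below is a minimal sketch; `capacity_in_bounds` is a hypothetical name, and the limits are assumed to be passed in as plain integers.

```shell
# Hypothetical helper: succeed only if DESIRED lies within [MIN, MAX].
# Usage: capacity_in_bounds DESIRED MIN MAX
capacity_in_bounds() {
  local desired=$1 min=$2 max=$3
  [ "$desired" -ge "$min" ] && [ "$desired" -le "$max" ]
}

# Example wiring (commented out because it needs AWS credentials):
# bounds=$(aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names <asg-name> \
#   --query "AutoScalingGroups[0].[MinSize,MaxSize]" --output text)
# capacity_in_bounds 2 $bounds || echo "desired capacity 2 is outside MinSize/MaxSize"
```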

🔴 If instances exist but are not registered in ECS, continue with the steps below.


3️⃣ Ensure EC2 Instances Have Internet Access (NAT Gateway Check)

Run:

aws ec2 describe-route-tables --filters "Name=vpc-id,Values=<your-vpc-id>" --query "RouteTables[*].Routes[*].{Destination:DestinationCidrBlock, Target:NatGatewayId}"

Expected: At least one route with NatGatewayId set.

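🔴 If missing, instances in private subnets cannot reach the ECS API to register. (Instances in a public subnet with a public IP and an internet gateway route, or in subnets with ECS VPC endpoints, do not need a NAT Gateway.)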

Fix: Ensure NAT Gateway is correctly configured:

  1. Check if a NAT Gateway exists:
   aws ec2 describe-nat-gateways --query "NatGateways[*].NatGatewayId"
  2. Ensure the private subnets' route tables have a default route (0.0.0.0/0) to the NAT Gateway.
  3. If missing, create a NAT Gateway and add that route to the private subnets.
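
The route check above can be automated with a small filter. This is a sketch under assumptions: `has_nat_default_route` is a hypothetical name, and it expects the tab-separated rows that the AWS CLI prints with `--output text`, where a missing NAT Gateway ID shows up as `None`.

```shell
# Hypothetical helper: read "destination<TAB>nat-gateway-id" rows on stdin and
# succeed if any row is a default route (0.0.0.0/0) that targets a NAT Gateway.
has_nat_default_route() {
  awk '$1 == "0.0.0.0/0" && $2 ~ /^nat-/ { found = 1 } END { exit !found }'
}

# Example wiring (commented out because it needs AWS credentials):
# aws ec2 describe-route-tables --filters "Name=vpc-id,Values=<your-vpc-id>" \
#   --query "RouteTables[].Routes[].[DestinationCidrBlock,NatGatewayId]" --output text \
#   | has_nat_default_route || echo "no default route through a NAT Gateway"
```

Note that this looks at every route table in the VPC at once; it does not verify that the table with the NAT route is the one associated with the instances' subnets.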

4️⃣ Ensure EC2 Instances Join ECS Cluster

Run on the EC2 instance (for example over SSH or SSM Session Manager):

cat /etc/ecs/ecs.config

Expected Output:

ECS_CLUSTER=<your-cluster-name>

🔴 If missing, manually set it:

echo "ECS_CLUSTER=<your-cluster-name>" | sudo tee -a /etc/ecs/ecs.config
sudo systemctl restart ecs
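
Appending with `tee -a` adds a second ECS_CLUSTER line if one already exists with a different value. A minimal sketch of an idempotent alternative follows; `set_ecs_cluster` is a hypothetical helper, and it would need to run from a root shell to edit /etc/ecs/ecs.config.

```shell
# Hypothetical helper: make FILE contain exactly one ECS_CLUSTER line.
# Usage: set_ecs_cluster FILE CLUSTER_NAME
set_ecs_cluster() {
  local file=$1 cluster=$2 tmp
  tmp=$(mktemp) || return 1
  # Keep every existing line except old ECS_CLUSTER entries
  # (grep exits non-zero when nothing is left, which is fine here).
  if [ -f "$file" ]; then
    grep -v '^ECS_CLUSTER=' "$file" > "$tmp" || true
  fi
  echo "ECS_CLUSTER=$cluster" >> "$tmp"
  # Overwrite in place so the original file's ownership and mode are kept.
  cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example (commented out; run as root on the instance, then restart the agent):
# set_ecs_cluster /etc/ecs/ecs.config <your-cluster-name>
```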

Fix: Ensure the Launch Template includes user data that sets the cluster name (Terraform example):

user_data = base64encode(<<EOF
#!/bin/bash
echo "ECS_CLUSTER=<your-cluster-name>" >> /etc/ecs/ecs.config
EOF
)

Update the Auto Scaling Group to use the latest launch template version (quote the argument so the shell does not expand $Latest as a variable):

aws autoscaling update-auto-scaling-group --auto-scaling-group-name <asg-name> --launch-template 'LaunchTemplateName=<lt-name>,Version=$Latest'

5️⃣ Verify ECS Agent is Running on EC2 Instances

Run on the EC2 instance:

sudo systemctl status ecs

Expected: active (running)

🔴 If not running, restart ECS agent:

sudo systemctl restart ecs

Fix: If the ECS agent is missing, install it (the commands below assume a yum-based distribution such as Amazon Linux):

sudo yum install -y ecs-init
sudo systemctl enable ecs
sudo systemctl start ecs
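
A running service does not prove the agent joined the right cluster. The agent exposes a local introspection endpoint on port 51678, and the sketch below pulls the cluster name out of its metadata response. `agent_cluster` is a hypothetical helper, and it assumes the response is a single-line JSON object with a `Cluster` field and the default agent configuration.

```shell
# Hypothetical helper: print the value of the "Cluster" field from the agent's
# metadata JSON read on stdin (simple sed extraction, not a full JSON parser).
agent_cluster() {
  sed -n 's/.*"Cluster"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example wiring on the instance (commented out because it needs a running agent):
# curl -s http://localhost:51678/v1/metadata | agent_cluster
```

If the printed name differs from `<your-cluster-name>`, revisit the ECS_CLUSTER setting from the previous step.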

6️⃣ Verify IAM Role for ECS Instances

Run:

aws iam list-attached-role-policies --role-name <ecs-instance-role>

Expected: AmazonEC2ContainerServiceforEC2Role attached.

🔴 If missing, attach manually:

aws iam attach-role-policy --role-name <ecs-instance-role> --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role
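
To avoid reading the policy list by eye, the check can be scripted. This is a minimal sketch: `has_policy` is a hypothetical helper that expects whitespace-separated policy names on stdin, which is how the AWS CLI prints a list of strings with `--output text`.

```shell
# Hypothetical helper: succeed if POLICY_NAME appears as a whole word
# among the whitespace-separated names read on stdin.
# Usage: ... | has_policy POLICY_NAME
has_policy() {
  tr -s '[:space:]' '\n' | grep -qxF "$1"
}

# Example wiring (commented out because it needs AWS credentials):
# aws iam list-attached-role-policies --role-name <ecs-instance-role> \
#   --query "AttachedPolicies[].PolicyName" --output text \
#   | has_policy AmazonEC2ContainerServiceforEC2Role || echo "policy not attached"
```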

7️⃣ Force ECS to Re-Register Instances

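Note: this command only moves already-registered container instances from DRAINING back to ACTIVE; it does not register new instances. Pass the container instance ID or ARN returned by list-container-instances, not the EC2 instance ID.

Run:

aws ecs update-container-instances-state --cluster <your-cluster-name> --container-instances <container-instance-id> --status ACTIVE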

Expected: Instances move to ACTIVE state.

🔴 If instances are ACTIVE but tasks still do not start, force a new deployment:

aws ecs update-service --cluster <your-cluster-name> --service <your-service-name> --force-new-deployment

Final Steps

  1. Check service status:
   aws ecs describe-services --cluster <your-cluster-name> --services <your-service-name>
  2. Check registered EC2 instances:
   aws ecs list-container-instances --cluster <your-cluster-name>
  3. List the log streams of the failing task (assuming the task definition logs to the CloudWatch log group /ecs/<your-service-name>):
   aws logs describe-log-streams --log-group-name "/ecs/<your-service-name>"

Once instances are properly registered, ECS should start running tasks. 🚀