DEV Community

Sadeek M

Debugging Kubernetes cluster part 2

Debugging a Kubernetes cluster requires a solid understanding of its components and their interdependencies. Here's Part 2 of the guide, covering advanced debugging techniques for common cluster issues:

  1. Node Issues

A. Node Not Ready

Check Node Status:

kubectl get nodes
kubectl describe node <node-name>

Inspect Kubelet Logs:
SSH into the node and review the logs for errors:

journalctl -u kubelet -l

Possible Causes:
Resource exhaustion (e.g., CPU, memory, disk).
Misconfigured networking (e.g., unable to reach the API server).
Issues with container runtime (Docker, containerd).
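
On larger clusters it helps to filter the node list down to the unhealthy entries. The following is a minimal sketch, not part of the original guide: the function name not_ready_nodes is hypothetical, and it assumes the default column layout of kubectl get nodes (NAME, STATUS, ...).

```bash
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and prints the name and status of every node whose STATUS does not start
# with "Ready" (for example NotReady or NotReady,SchedulingDisabled).
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1, $2 }'
}

# Usage (assumes kubectl is already configured for the cluster):
#   kubectl get nodes --no-headers | not_ready_nodes
```

Cordoned but healthy nodes (Ready,SchedulingDisabled) are deliberately left out, since they still report a Ready condition.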

B. Node Disk Pressure or Memory Pressure

Check Allocations:

kubectl describe node <node-name> | grep -A 8 "Allocated resources"

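Clean Up Disk Space:
Remove stopped containers, dangling images, and unused networks. This applies to nodes that use Docker as the container runtime; on containerd nodes, crictl rmi --prune removes unused images instead:

docker system prune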

Reconfigure Resource Limits:
Adjust resource requests and limits for pods.
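
Before cleaning up, it can be useful to see which filesystems on the node are actually close to full. This is a rough sketch under stated assumptions: the function name disk_over_threshold is hypothetical, the 85% default is an arbitrary value chosen for illustration (it is not the kubelet's own eviction threshold), and mount points are assumed to contain no spaces.

```bash
# Hypothetical helper: reads `df -P` output on stdin and prints the mount
# point and usage of every filesystem at or above the given percentage.
# Relies on the POSIX `df -P` layout, where column 5 is the capacity in
# percent and column 6 is the mount point.
disk_over_threshold() {
  awk -v t="${1:-85}" 'NR > 1 {
    use = $5
    sub(/%/, "", use)              # strip the trailing percent sign
    if (use + 0 >= t + 0) print $6, $5
  }'
}

# Usage on the node:
#   df -P | disk_over_threshold 85
```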

  2. Pod Issues

A. Pod Stuck in Pending

Inspect Events:

kubectl describe pod <pod-name>

Possible Causes:
Insufficient resources: Check node capacity and pod requests.
Scheduling constraints: Inspect nodeSelector, taints, and tolerations.
Networking issues: Ensure the CNI plugin is functioning correctly.
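
When many pods are pending, the scheduler's own events are often the fastest way to see why. The sketch below wraps a single kubectl call; the function name scheduling_failures is hypothetical, and it assumes the scheduler records its failures with the event reason FailedScheduling.

```bash
# Hypothetical helper: lists scheduler failure events in one namespace,
# oldest first. Argument: <namespace>.
scheduling_failures() {
  kubectl get events -n "$1" \
    --field-selector reason=FailedScheduling \
    --sort-by=.lastTimestamp
}

# Usage:
#   scheduling_failures <namespace>
```

The event messages typically state how many nodes were rejected and for which reason (insufficient CPU or memory, untolerated taints, unmatched node selectors), which maps directly onto the causes listed above.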

B. CrashLoopBackOff

View Logs:

kubectl logs <pod-name> --previous

Check Events:

kubectl describe pod <pod-name>

Debugging Steps:
Ensure the container's entrypoint is correct.
Verify environment variables and mounted volumes.
Test locally using the same image.
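
The exit code of the last crashed container often narrows down which of these steps to start with. The following is a small sketch, assuming the usual shell convention that codes above 128 mean "128 plus the signal number"; the function name explain_exit_code and the wording of the hints are illustrative only.

```bash
# Hypothetical helper: maps a container exit code to a likely cause.
explain_exit_code() {
  code="$1"
  case "$code" in
    0)   echo "exited cleanly; check whether the process is meant to keep running" ;;
    126) echo "command found but not executable; check permissions on the entrypoint" ;;
    127) echo "command not found; check the entrypoint or command path" ;;
    137) echo "killed by SIGKILL (9); often OOMKilled, check memory limits" ;;
    143) echo "terminated by SIGTERM (15)" ;;
    *)
      # Non-numeric input fails the test quietly and falls through to else.
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "killed by signal $((code - 128))"
      else
        echo "application error; check the container logs"
      fi ;;
  esac
}

# Usage (reads the exit code of the first container's previous run):
#   explain_exit_code "$(kubectl get pod <pod-name> \
#     -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}')"
```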

C. Container Image Pull Issues

Inspect Events:

kubectl describe pod <pod-name>

Common Errors:
Unauthorized: Verify image pull secrets.
Image not found: Confirm the image exists in the registry.

  3. Networking Issues

A. Pods Can't Communicate

Ping Other Pods:

kubectl exec -it <pod-name> -- ping <pod-ip>

Check Network Policies:

kubectl get networkpolicy -n <namespace>

Debugging CNI Plugins:
Inspect the CNI plugin's logs on the node:

cat /var/log/containers/<cni-plugin-name>*.log

B. Service Not Accessible

Check Service Description:

kubectl describe svc <service-name>

Inspect Endpoints:

kubectl get endpoints <service-name>

Test Connectivity:
From within a pod:

curl http://<service-name>.<namespace>:<port>
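
An empty ENDPOINTS column is a frequent reason for an unreachable Service, because it usually means the Service selector matches no ready pods. The endpoints check can be scripted roughly as follows; the function name has_endpoints is hypothetical, and the sketch assumes the default NAME, ENDPOINTS, AGE column layout.

```bash
# Hypothetical helper: reads `kubectl get endpoints <service-name> --no-headers`
# output on stdin and reports whether the Service has backing addresses.
# Exits with status 1 if any Service in the input has no endpoints.
has_endpoints() {
  awk '
    $2 == "<none>" { print $1 ": no endpoints"; bad = 1; next }
    { print $1 ": " $2 }
    END { exit bad }
  '
}

# Usage:
#   kubectl get endpoints <service-name> --no-headers | has_endpoints
```

If the check reports no endpoints, compare the selector shown by kubectl describe svc with the labels on the pods.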

  4. API Server Issues

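Inspect Logs:
If the API server runs as a systemd unit:

journalctl -u kube-apiserver

On kubeadm clusters the API server runs as a static pod, so use kubectl logs -n kube-system kube-apiserver-<node-name> instead.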

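Test API Server Availability:

kubectl get --raw /healthz

The /healthz endpoint has been deprecated since Kubernetes v1.16; /livez and /readyz (for example, kubectl get --raw '/readyz?verbose') report the result of each individual check.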

Common Causes:
SSL/TLS issues: Check certificates and CA bundle.
Resource bottlenecks: Monitor CPU/memory usage.
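
The verbose health output marks passing checks with [+] and failing ones with [-], so the failing checks can be isolated with a simple filter. This is a sketch only: the function name failed_health_checks is hypothetical, and the curl usage line assumes the health endpoints are reachable without credentials (the default in many clusters, but not guaranteed).

```bash
# Hypothetical helper: reads verbose /livez or /readyz output on stdin and
# prints only the checks marked as failed. Returns a non-zero status when
# no failed check is found, which is grep's normal behavior.
failed_health_checks() {
  grep '^\[-\]'
}

# Usage (host and port are placeholders for your API server address):
#   curl -sk "https://<api-server-host>:<port>/readyz?verbose" | failed_health_checks
```

curl is used here because it prints the response body even when the server answers with an error status, which is exactly the case of interest.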

  5. Persistent Volume Issues

A. PVC Pending

Inspect Events:

kubectl describe pvc <pvc-name>

Common Causes:
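No matching StorageClass, or no default StorageClass when the PVC does not name one.
No available PersistentVolume with sufficient capacity, or the provisioner cannot allocate storage.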

B. PV Bound But Pod Can't Mount

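Inspect Events:
Mount failures usually appear as FailedMount or FailedAttachVolume events on the pod; container logs are typically empty because the container has not started yet:

kubectl describe pod <pod-name>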

Debugging Steps:
Verify volume permissions.
Test mounting the volume manually on a node.

  6. Cluster DNS Issues

Test DNS Resolution:

kubectl exec -it <pod-name> -- nslookup <service-name>

Inspect CoreDNS Logs:

kubectl logs -n kube-system <coredns-pod-name>

Common Fixes:
Restart CoreDNS pods if unresponsive.
Validate ConfigMap for CoreDNS (kubectl get cm -n kube-system coredns).
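
When a short Service name fails to resolve, repeating the lookup with the fully qualified name helps separate CoreDNS problems from search-path problems in the pod's /etc/resolv.conf. A minimal sketch, assuming the common default cluster domain cluster.local (clusters can be configured with a different one); the function name svc_fqdn is hypothetical.

```bash
# Hypothetical helper: builds the fully qualified DNS name of a Service.
# Arguments: <service-name> [namespace] [cluster-domain].
# The defaults ("default" namespace, "cluster.local" domain) are assumptions.
svc_fqdn() {
  printf '%s.%s.svc.%s\n' "$1" "${2:-default}" "${3:-cluster.local}"
}

# Usage:
#   kubectl exec -it <pod-name> -- nslookup "$(svc_fqdn <service-name> <namespace>)"
```

If the fully qualified name resolves but the short name does not, the search domains or ndots setting in the pod are the more likely culprit than CoreDNS itself.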

  7. Troubleshooting Tools

A. kubectl Debugging Tools

Debug running pods:

kubectl exec -it <pod-name> -- /bin/sh

Debug pods with ephemeral containers (enabled by default since Kubernetes v1.23, stable in v1.25):

kubectl debug -it <pod-name> --image=busybox

B. Third-Party Tools

Lens: GUI for Kubernetes cluster monitoring.
K9s: Terminal-based cluster management.
kubectl-trace: System-level tracing for Kubernetes.

C. Log Aggregation

Use tools like Fluentd, ELK Stack, or Loki for centralized logging.

  8. Proactive Cluster Monitoring

Implement monitoring systems like Prometheus, Grafana, or Datadog.
Set up alerting for critical metrics (e.g., node health, pod restarts).
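
Until a full monitoring stack is in place, an ad-hoc check for restart-heavy pods can serve as a stopgap. The sketch below is illustrative only: the function name high_restart_pods is hypothetical, the default threshold of 5 is arbitrary, and it assumes the column layout of kubectl get pods -A (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE).

```bash
# Hypothetical helper: reads `kubectl get pods -A --no-headers` output on
# stdin and prints namespace/name and restart count for every pod whose
# RESTARTS value is at or above the threshold (default 5).
high_restart_pods() {
  awk -v t="${1:-5}" '$5 + 0 >= t + 0 { print $1 "/" $2, $5 }'
}

# Usage:
#   kubectl get pods -A --no-headers | high_restart_pods 5
```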

Example: Debugging Workflow for a Non-Responsive Service

Check Pod Status:

kubectl get pods -n <namespace>

Describe the Service:

kubectl describe svc <service-name> -n <namespace>

Inspect Logs:

kubectl logs <pod-name> -n <namespace>

Test Connectivity:
From within the cluster:

curl http://<service-name>.<namespace>:<port>

From outside the cluster:

curl http://<external-ip>:<port>
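
The read-only steps of this workflow can be collected into one function so they run in a fixed order. This is a sketch under assumptions, not a definitive tool: the function name debug_service is hypothetical, and it adds the endpoints check from the Service section to the pod and Service checks of the workflow.

```bash
# Hypothetical helper: runs the read-only checks for one Service.
# Arguments: <namespace> <service-name>.
debug_service() {
  ns="$1"
  svc="$2"
  echo "== pods in $ns =="
  kubectl get pods -n "$ns"
  echo "== service $svc =="
  kubectl describe svc "$svc" -n "$ns"
  echo "== endpoints for $svc =="
  kubectl get endpoints "$svc" -n "$ns"
}

# Usage:
#   debug_service <namespace> <service-name>
```

Log inspection and the curl tests are left out because they need a specific pod name or a shell inside the cluster.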

This deeper dive equips you to troubleshoot and resolve complex Kubernetes issues effectively. Let me know if you'd like specific scenarios or additional examples!
