# Common Kubernetes Resource Management Issues

This guide covers frequently encountered issues when managing Kubernetes resources and provides practical solutions to resolve them.
## Pod Issues

### Pods Stuck in Pending State
**Symptoms:**
- Pods remain in `Pending` status indefinitely
- Events show resource constraints or scheduling issues
**Diagnostic Commands:**

```bash
# Check pod status and events
kubectl describe pod <pod-name> -n <namespace>

# Check node resource availability
kubectl describe nodes | grep -A 5 "Allocated resources"
```
**Common Causes and Solutions:**

- **Insufficient Resources**: Nodes don't have enough CPU/memory to schedule the pod

  ```bash
  # Check resource requests vs node capacity
  kubectl describe nodes | grep -A 10 "Capacity"

  # Adjust pod resource requests
  kubectl patch deployment <deployment-name> -n <namespace> --type=json -p='[{"op":"replace","path":"/spec/template/spec/containers/0/resources/requests","value":{"cpu":"100m","memory":"256Mi"}}]'
  ```

- **Node Selector/Affinity Constraints**: Pod has node selectors that can't be satisfied

  ```bash
  # Check node labels
  kubectl get nodes --show-labels

  # Modify node selector if needed
  kubectl patch deployment <deployment-name> -n <namespace> --type=json -p='[{"op":"remove","path":"/spec/template/spec/nodeSelector"}]'
  ```

- **PVC Binding Issues**: Pod requires a PVC that can't be bound

  ```bash
  # Check PVC status
  kubectl get pvc -n <namespace>
  kubectl describe pvc <pvc-name> -n <namespace>
  ```
### Pods Stuck in Terminating State
**Symptoms:**
- Pod shows `Terminating` status for an extended period
- `kubectl delete pod` hangs
**Solutions:**

```bash
# Force delete the pod
kubectl delete pod <pod-name> -n <namespace> --force --grace-period=0

# If the pod has finalizers, remove them
kubectl patch pod <pod-name> -n <namespace> --type='json' -p='[{"op":"remove","path":"/metadata/finalizers"}]'
```
### CrashLoopBackOff Errors
**Symptoms:**
- Pod status shows `CrashLoopBackOff`
- Container repeatedly restarts
**Diagnostic Commands:**

```bash
# Check logs from the previous (crashed) container instance
kubectl logs <pod-name> -n <namespace> --previous

# Check pod events
kubectl describe pod <pod-name> -n <namespace>
```
**Solutions:**
- Fix application errors shown in the logs
- Ensure resource limits are adequate:

  ```bash
  kubectl patch deployment <deployment-name> -n <namespace> --type=json -p='[{"op":"replace","path":"/spec/template/spec/containers/0/resources/limits","value":{"cpu":"1","memory":"1Gi"}}]'
  ```

- Check for volume mount issues:

  ```bash
  kubectl describe pod <pod-name> -n <namespace> | grep -A 10 "Volumes:"
  ```
## Deployment Issues

### Deployment Not Creating Pods
**Symptoms:**
- Deployment exists but no pods are created
- Replica count shows 0/N available
**Diagnostic Commands:**

```bash
# Check deployment status
kubectl describe deployment <deployment-name> -n <namespace>

# Check replica sets
kubectl get rs -n <namespace> -l app=<deployment-selector>
```
**Solutions:**
- Check for admission controller issues:

  ```bash
  kubectl get validatingwebhookconfigurations
  kubectl get mutatingwebhookconfigurations
  ```

- Verify the pod spec is valid:

  ```bash
  kubectl apply --validate=true --dry-run=client -f deployment.yaml
  ```

- Check for PodDisruptionBudget conflicts:

  ```bash
  kubectl get pdb -n <namespace>
  ```
### Deployment Stuck on Rolling Update
**Symptoms:**
- Deployment shows a partial rollout
- New pods don't become ready
**Solutions:**
- Check for readiness probe failures:

  ```bash
  kubectl describe pod <new-pod-name> -n <namespace>
  ```

- Adjust the rollout strategy:

  ```bash
  kubectl patch deployment <deployment-name> -n <namespace> --type=json -p='[{"op":"replace","path":"/spec/strategy","value":{"type":"Recreate"}}]'
  ```

- Roll back to the previous version:

  ```bash
  kubectl rollout undo deployment/<deployment-name> -n <namespace>
  ```
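When the readiness probe itself is the blocker, loosening its timing often unblocks the rollout. A sketch of a typical probe stanza in the container spec; the `/healthz` path and port `8080` are hypothetical placeholders for your application:

```yaml
# Hypothetical readiness probe; adjust path, port, and timing for your app
readinessProbe:
  httpGet:
    path: /healthz   # placeholder health endpoint
    port: 8080       # placeholder container port
  initialDelaySeconds: 10   # give the app time to start before probing
  periodSeconds: 5
  failureThreshold: 3
```

A probe that fires before the application has finished starting will keep new pods `NotReady` and stall the rollout indefinitely.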
## Service and Networking Issues

### Service Not Routing Traffic
**Symptoms:**
- Pods are running but the service doesn't route traffic
- Endpoint connections time out
**Diagnostic Commands:**

```bash
# Check service and endpoints
kubectl describe service <service-name> -n <namespace>
kubectl get endpoints <service-name> -n <namespace>

# Verify label selectors
kubectl get pods -n <namespace> -l <service-selector>
```
**Solutions:**
- Fix a selector mismatch:

  ```bash
  # Update the service selector to match the pod labels
  kubectl patch service <service-name> -n <namespace> --type=json -p='[{"op":"replace","path":"/spec/selector","value":{"app":"<correct-label>"}}]'
  ```

- Check pod readiness:

  ```bash
  kubectl get pods -n <namespace> -o wide
  ```

- Test network connectivity:

  ```bash
  kubectl run test-$RANDOM --rm -it --image=busybox -n <namespace> -- wget -O- <service-name>:<port>
  ```
### Ingress Not Working
**Symptoms:**
- Service works internally but the Ingress doesn't route external traffic
- Ingress controller logs show errors
**Solutions:**
- Verify the Ingress resource:

  ```bash
  kubectl describe ingress <ingress-name> -n <namespace>
  ```

- Check the Ingress controller logs:

  ```bash
  kubectl logs -n <ingress-controller-namespace> -l app=<ingress-controller> --tail=100
  ```

- Verify the TLS certificate if using HTTPS:

  ```bash
  kubectl get secret <tls-secret-name> -n <namespace>
  ```
## Volume and Storage Issues

### PersistentVolumeClaim Stuck in Pending
**Symptoms:**
- PVC remains in `Pending` state
- Pods requiring the PVC also stay in `Pending`
**Diagnostic Commands:**

```bash
# Check PVC status
kubectl describe pvc <pvc-name> -n <namespace>

# Check storage classes
kubectl get storageclass
```
**Solutions:**
- Verify the storage class exists and is the default:

  ```bash
  kubectl get sc -o yaml
  ```

- Check that the storage provisioner is running:

  ```bash
  kubectl get pods -n kube-system | grep provisioner
  ```

- Create a PV manually if using static provisioning:

  ```bash
  kubectl apply -f persistent-volume.yaml
  ```
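For the static-provisioning case, `persistent-volume.yaml` might look like the following sketch. The name, capacity, `storageClassName`, and `hostPath` are hypothetical; for the PVC to bind, its requested size and `storageClassName` must be compatible with what the PV offers:

```yaml
# Hypothetical static PV; adjust capacity, access modes, and path for your cluster
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv          # placeholder name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual  # PVC must request the same storageClassName
  hostPath:
    path: /mnt/data         # placeholder path on the node
```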
### Volume Mount Failures
**Symptoms:**
- Pods fail to start with volume-related errors
- Events show "unable to mount volume"
**Solutions:**
- Check volume types and paths:

  ```bash
  kubectl describe pod <pod-name> -n <namespace> | grep -A 15 "Volumes:"
  ```

- Verify permissions on host paths:

  ```bash
  # For hostPath volumes on a specific node
  kubectl debug node/<node-name> -it --image=ubuntu -- bash
  # Inside the debug container the node's filesystem is mounted under /host
  ls -la /host/path/on/host
  ```

- Check whether the PV was deleted:

  ```bash
  kubectl get pv | grep <pv-name>
  ```
## ConfigMap and Secret Issues

### ConfigMap or Secret Changes Not Reflected in Pods
**Symptoms:**
- An updated ConfigMap or Secret doesn't affect running pods
- Applications still use the old configuration
**Solutions:**
- Restart dependent pods:

  ```bash
  kubectl rollout restart deployment <deployment-name> -n <namespace>
  ```

- Trigger restarts automatically when the configuration changes:

  ```bash
  # Add a checksum annotation so the pod template changes with the ConfigMap
  # (assumes the pod template already has an annotations map)
  CHECKSUM=$(kubectl get cm <configmap-name> -n <namespace> -o yaml | sha256sum | cut -d' ' -f1)
  kubectl patch deployment <deployment-name> -n <namespace> --type=json -p="[{\"op\":\"add\",\"path\":\"/spec/template/metadata/annotations/checksum\",\"value\":\"${CHECKSUM}\"}]"
  ```

- Use ConfigMap `subPath` mounts with caution: files mounted via `subPath` are not updated automatically when the ConfigMap changes
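To illustrate the `subPath` caveat, here is a sketch of a volume mount that will not pick up ConfigMap updates; the container, file, and ConfigMap names are hypothetical:

```yaml
# Hypothetical container spec fragment: subPath mounts a single file
# and that file is NOT refreshed when the ConfigMap is updated
containers:
  - name: app
    image: example/app:latest   # placeholder image
    volumeMounts:
      - name: config
        mountPath: /etc/app/app.conf
        subPath: app.conf       # single-file mount: frozen at pod start
volumes:
  - name: config
    configMap:
      name: app-config          # placeholder ConfigMap name
```

Mounting the ConfigMap as a whole directory (no `subPath`) restores the kubelet's automatic update behavior, at the cost of shadowing anything else at that mount path.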
## Resource Quota and Limit Issues

### Namespace Resource Quota Exceeded
**Symptoms:**
- New resources can't be created
- Events show "exceeded quota" errors
**Diagnostic Commands:**

```bash
# Check resource quota usage
kubectl describe resourcequota -n <namespace>
```
**Solutions:**
- Identify resource hogs:

  ```bash
  kubectl top pod -n <namespace>
  ```

- Adjust quota limits:

  ```bash
  kubectl edit resourcequota <quota-name> -n <namespace>
  ```

- Clean up unused resources:

  ```bash
  kubectl get all -n <namespace>
  ```
### LimitRange Conflicts
**Symptoms:**
- Pods fail validation
- Events show limit range errors
**Solutions:**

```bash
# Check limit range settings
kubectl get limitrange -n <namespace> -o yaml

# Adjust deployment resources to fit within the LimitRange
kubectl patch deployment <deployment-name> -n <namespace> --type=json -p='[{"op":"replace","path":"/spec/template/spec/containers/0/resources","value":{"requests":{"memory":"64Mi","cpu":"50m"},"limits":{"memory":"128Mi","cpu":"100m"}}}]'
```
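For reference, a LimitRange that would reject containers outside its bounds might look like this sketch; the name and values are hypothetical:

```yaml
# Hypothetical LimitRange; containers exceeding "max" are rejected at admission,
# and containers that set no resources get the defaults filled in
apiVersion: v1
kind: LimitRange
metadata:
  name: example-limits   # placeholder name
spec:
  limits:
    - type: Container
      default:           # applied as limits when a container sets none
        cpu: 100m
        memory: 128Mi
      defaultRequest:    # applied as requests when a container sets none
        cpu: 50m
        memory: 64Mi
      max:
        cpu: "1"
        memory: 1Gi
```

Comparing a failing pod's resources against the `max` and `min` fields here usually pinpoints the validation error.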