Common Kubernetes Resource Management Issues

This guide covers issues frequently encountered when managing Kubernetes resources and gives practical steps to resolve them.

Pod Issues

Pods Stuck in Pending State

Symptoms:

Pods remain in Pending status indefinitely
Events show resource constraints or scheduling issues

Diagnostic Commands:

# Check pod status and events
kubectl describe pod <pod-name> -n <namespace>

# Check node resource availability
kubectl describe nodes | grep -A 5 "Allocated resources"
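
The scheduler records the reason it could not place a pod as FailedScheduling events, so filtering events is often quicker than reading the full describe output. A minimal sketch, assuming bash and a configured kubectl; `pending_events` is a hypothetical helper name, not a kubectl command:

```bash
# Hypothetical helper: list scheduler failures recorded for one pod.
# Usage: pending_events <namespace> <pod-name>
pending_events() {
  local namespace="$1" pod="$2"
  kubectl get events -n "$namespace" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=${pod},reason=FailedScheduling" \
    --sort-by=.metadata.creationTimestamp
}
```

Call it as `pending_events <namespace> <pod-name>`; the event message usually names the unsatisfied constraint (insufficient CPU or memory, unmatched node selector, unbound PVC).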

Common Causes and Solutions:

Insufficient Resources

Nodes don’t have enough CPU/memory to schedule the pod

```bash
# Check resource requests vs node capacity
kubectl describe nodes | grep -A 10 "Capacity"

# Adjust pod resource requests
kubectl patch deployment <deployment-name> -n <namespace> --type=json -p='[{"op":"replace","path":"/spec/template/spec/containers/0/resources/requests","value":{"cpu":"100m","memory":"256Mi"}}]'
```

Node Selector/Affinity Constraints

Pod has node selectors that can’t be satisfied

```bash
# Check node labels
kubectl get nodes --show-labels

# Modify node selector if needed
kubectl patch deployment <deployment-name> -n <namespace> --type=json -p='[{"op":"remove","path":"/spec/template/spec/nodeSelector"}]'
```

PVC Binding Issues

Pod requires PVC that can’t be bound

# Check PVC status
kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>

Pods Stuck in Terminating State

Symptoms:

Pod shows Terminating status for an extended period
kubectl delete pod hangs

Solutions:

# Force delete the pod
kubectl delete pod <pod-name> -n <namespace> --force --grace-period=0

# If pod has finalizers, remove them
kubectl patch pod <pod-name> -n <namespace> --type='json' -p='[{"op":"remove","path":"/metadata/finalizers"}]'
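
Force deletion removes the pod object from the API without waiting for the kubelet to confirm that the containers have stopped, and removing finalizers skips whatever cleanup they were guarding. It is worth looking at what is holding the pod first. A minimal sketch, assuming bash and a configured kubectl; `terminating_info` is a hypothetical helper name:

```bash
# Hypothetical helper: show what may be holding a Terminating pod.
# Usage: terminating_info <namespace> <pod-name>
terminating_info() {
  local namespace="$1" pod="$2"
  # Prints when deletion was requested, any finalizers still set,
  # and the node the pod runs on (check whether that node is NotReady).
  kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath='deletionTimestamp: {.metadata.deletionTimestamp}{"\n"}finalizers: {.metadata.finalizers}{"\n"}node: {.spec.nodeName}{"\n"}'
}
```

An empty finalizers line suggests the delay lies with the node or the container runtime rather than with a finalizer.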

CrashLoopBackOff Errors

Symptoms:

Pod status shows CrashLoopBackOff
Container repeatedly restarts

Diagnostic Commands:

# Check logs from the previous (crashed) container instance
kubectl logs <pod-name> -n <namespace> --previous

# Check pod events
kubectl describe pod <pod-name> -n <namespace>

Solutions:

Fix application errors shown in logs

Ensure resource limits are adequate:

kubectl patch deployment <deployment-name> -n <namespace> --type=json -p='[{"op":"replace","path":"/spec/template/spec/containers/0/resources/limits","value":{"cpu":"1","memory":"1Gi"}}]'

Check for volume mount issues:

kubectl describe pod <pod-name> -n <namespace> | grep -A 10 "Volumes:"
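
The last termination state often tells you whether the container failed on its own or was killed from outside. The sketch below assumes bash and a configured kubectl; both function names are hypothetical helpers, and the exit-code reading is only the usual shell convention (codes above 128 mean "terminated by signal code minus 128"):

```bash
# Hypothetical helper: print each container's last termination reason and exit code.
# Usage: last_termination <namespace> <pod-name>
last_termination() {
  local namespace="$1" pod="$2"
  kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}

# Hypothetical helper: give a rough reading of a container exit code.
explain_exit_code() {
  local code="$1"
  if [ "$code" -eq 0 ]; then
    echo "exited normally"
  elif [ "$code" -gt 128 ] && [ "$code" -le 192 ]; then
    # Conventional encoding: 128 + signal number
    echo "terminated by signal $((code - 128))"
  else
    echo "application error (exit code $code)"
  fi
}
```

For example, `explain_exit_code 137` prints `terminated by signal 9` (SIGKILL). When the reason column shows OOMKilled alongside that code, raising the memory limit is the relevant fix.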

Deployment Issues

Deployment Not Creating Pods

Symptoms:

Deployment exists but no pods are created
Replica count shows 0/N available

Diagnostic Commands:

# Check deployment status
kubectl describe deployment <deployment-name> -n <namespace>

# Check replica sets
kubectl get rs -n <namespace> -l app=<deployment-selector>

Solutions:

Check for admission controller issues:

kubectl get validatingwebhookconfigurations
kubectl get mutatingwebhookconfigurations

Verify pod spec is valid:

kubectl apply --validate=true --dry-run=client -f deployment.yaml

Check for PodDisruptionBudget conflicts:
```
kubectl get pdb -n <namespace>
```
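
When an admission webhook, quota, or LimitRange rejects pods, the rejection is typically recorded as a FailedCreate event on the ReplicaSet rather than on the Deployment. A minimal sketch, assuming bash and a configured kubectl; `replicaset_create_failures` is a hypothetical helper name:

```bash
# Hypothetical helper: show why ReplicaSets in a namespace could not create pods.
# Usage: replicaset_create_failures <namespace>
replicaset_create_failures() {
  local namespace="$1"
  kubectl get events -n "$namespace" \
    --field-selector "involvedObject.kind=ReplicaSet,reason=FailedCreate" \
    --sort-by=.metadata.creationTimestamp
}
```

If the manifest itself is in question, `kubectl apply --dry-run=server -f deployment.yaml` sends it through the API server's validation and admission chain without persisting it.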

Deployment Stuck on Rolling Update

Symptoms:

Deployment shows partial rollout
New pods don’t become ready

Solutions:

Check readiness probe failures:

kubectl describe pod <new-pod-name> -n <namespace>

Adjust rollout strategy (Recreate terminates all old pods before starting new ones, so expect brief downtime):

kubectl patch deployment <deployment-name> -n <namespace> --type=json -p='[{"op":"replace","path":"/spec/strategy","value":{"type":"Recreate"}}]'

Rollback to previous version:

kubectl rollout undo deployment/<deployment-name> -n <namespace>
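
The check and the rollback can be combined so that a stalled rollout is undone automatically. A minimal sketch, assuming bash and a configured kubectl; `rollout_or_undo` is a hypothetical helper name and the 120s default timeout is an arbitrary choice:

```bash
# Hypothetical helper: wait for a rollout and undo it if it does not finish in time.
# Usage: rollout_or_undo <namespace> <deployment-name> [timeout]
rollout_or_undo() {
  local namespace="$1" deployment="$2" timeout="${3:-120s}"
  # `kubectl rollout status` exits non-zero when the timeout expires
  if kubectl rollout status "deployment/${deployment}" -n "$namespace" --timeout="$timeout"; then
    echo "rollout complete"
  else
    echo "rollout did not complete within ${timeout}; rolling back" >&2
    kubectl rollout undo "deployment/${deployment}" -n "$namespace"
  fi
}
```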

Service and Networking Issues

Service Not Routing Traffic

Symptoms:

Pods are running but service doesn’t route traffic
Endpoint connections time out

Diagnostic Commands:

# Check service and endpoints
kubectl describe service <service-name> -n <namespace>
kubectl get endpoints <service-name> -n <namespace>

# Verify label selectors
kubectl get pods -n <namespace> -l <service-selector>

Solutions:

Fix selector mismatch:

# Update service selector to match pod labels
kubectl patch service <service-name> -n <namespace> --type=json -p='[{"op":"replace","path":"/spec/selector","value":{"app":"<correct-label>"}}]'

Check pod readiness:

kubectl get pods -n <namespace> -o wide

Test network connectivity:

kubectl run test-$RANDOM --rm -it --restart=Never --image=busybox -n <namespace> -- wget -O- <service-name>:<port>
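
A selector mismatch can be confirmed by feeding the service's own selector back into a pod query. The sketch below assumes bash, a configured kubectl, and a recent kubectl release that prints map fields as JSON in jsonpath output (older releases print `map[...]` instead); both function names are hypothetical helpers:

```bash
# Hypothetical helper: turn a selector printed as JSON, e.g. {"app":"web","tier":"api"},
# into the comma-separated form that `kubectl get pods -l` expects.
selector_to_label_query() {
  printf '%s\n' "$1" | sed -e 's/[{}"]//g' -e 's/:/=/g'
}

# Hypothetical helper: list the pods that a service's selector currently matches.
# Usage: pods_for_service <namespace> <service-name>
pods_for_service() {
  local namespace="$1" service="$2" selector
  selector=$(kubectl get service "$service" -n "$namespace" -o jsonpath='{.spec.selector}')
  kubectl get pods -n "$namespace" -l "$(selector_to_label_query "$selector")" --show-labels
}
```

If `pods_for_service` returns no pods while the application pods are running, the selector and the pod labels disagree.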

Ingress Not Working

Symptoms:

Service works internally but Ingress doesn’t route external traffic
Ingress controller logs show errors

Solutions:

Verify Ingress resource:

kubectl describe ingress <ingress-name> -n <namespace>

Check Ingress controller logs:

kubectl logs -n <ingress-controller-namespace> -l app=<ingress-controller> --tail=100

Verify TLS certificate if using HTTPS:

kubectl get secret <tls-secret-name> -n <namespace>
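
Confirming that the secret exists does not show whether the certificate inside it is still valid. A minimal sketch, assuming bash, a configured kubectl, and `base64` and `openssl` on the local machine; `tls_secret_expiry` is a hypothetical helper name:

```bash
# Hypothetical helper: print the subject and expiry date of the certificate in a TLS secret.
# Usage: tls_secret_expiry <namespace> <tls-secret-name>
tls_secret_expiry() {
  local namespace="$1" secret="$2"
  # The certificate is stored base64-encoded under the key "tls.crt"
  kubectl get secret "$secret" -n "$namespace" -o jsonpath='{.data.tls\.crt}' \
    | base64 -d \
    | openssl x509 -noout -subject -enddate
}
```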

Volume and Storage Issues

PersistentVolumeClaim Stuck in Pending

Symptoms:

PVC remains in Pending state
Pods requiring the PVC also stay in Pending

Diagnostic Commands:

# Check PVC status
kubectl describe pvc <pvc-name> -n <namespace>

# Check storage classes
kubectl get storageclass

Solutions:

Verify storage class exists and is default:
```
kubectl get sc -o yaml
```

Check storage provisioner is running:

kubectl get pods -n kube-system | grep provisioner

Create PV manually if using static provisioning:
```
kubectl apply -f persistent-volume.yaml
```
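
A PVC whose storage class uses the WaitForFirstConsumer binding mode stays Pending until a pod that uses it is scheduled, which is expected behavior rather than a fault. A minimal sketch for checking this, assuming bash and a configured kubectl; `storage_class_summary` is a hypothetical helper name:

```bash
# Hypothetical helper: summarize storage classes with provisioner, binding mode,
# and the value of the default-class annotation (<none> when it is not set).
storage_class_summary() {
  kubectl get storageclass \
    -o custom-columns='NAME:.metadata.name,PROVISIONER:.provisioner,BINDING:.volumeBindingMode,DEFAULT:.metadata.annotations.storageclass\.kubernetes\.io/is-default-class'
}
```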

Volume Mount Failures

Symptoms:

Pods fail to start with volume-related errors
Events show “unable to mount volume”

Solutions:

Check volume types and paths:

kubectl describe pod <pod-name> -n <namespace> | grep -A 15 "Volumes:"

Verify permissions on host paths:

# For hostPath volumes on a specific node; the debug pod mounts the node's root filesystem at /host
kubectl debug node/<node-name> -it --image=ubuntu -- bash
ls -la /host/path/on/host

Check if the PV was deleted:
```
kubectl get pv | grep <pv-name>
```
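
When only the PVC name is known, the bound PV can be looked up from the claim instead of searched for by name. A minimal sketch, assuming bash and a configured kubectl; `pv_for_pvc` is a hypothetical helper name:

```bash
# Hypothetical helper: resolve a PVC to its bound PV and print the PV's phase
# and reclaim policy.
# Usage: pv_for_pvc <namespace> <pvc-name>
pv_for_pvc() {
  local namespace="$1" pvc="$2" pv
  pv=$(kubectl get pvc "$pvc" -n "$namespace" -o jsonpath='{.spec.volumeName}')
  if [ -z "$pv" ]; then
    echo "PVC ${pvc} is not bound to any PV" >&2
    return 1
  fi
  kubectl get pv "$pv" \
    -o jsonpath='{.metadata.name}{"\t"}{.status.phase}{"\t"}{.spec.persistentVolumeReclaimPolicy}{"\n"}'
}
```

A NotFound error from the second command means the claim still references a PV that no longer exists.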

ConfigMap and Secret Issues

ConfigMap or Secret Changes Not Reflected in Pods

Symptoms:

Updated ConfigMap or Secret doesn’t affect running pods
Applications still use old configurations

Solutions:

Restart dependent pods:

kubectl rollout restart deployment <deployment-name> -n <namespace>

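Trigger a rollout whenever the ConfigMap content changes by recording a checksum on the pod template:

# Hash only the ConfigMap data (cut drops the trailing "-" that sha256sum prints)
# and store it as a pod template annotation; a changed value starts a new rollout.
# A merge patch is used so the annotations map is created if it does not exist yet.
CHECKSUM=$(kubectl get cm <configmap-name> -n <namespace> -o jsonpath='{.data}' | sha256sum | cut -d' ' -f1)
kubectl patch deployment <deployment-name> -n <namespace> --type=merge -p="{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"checksum\":\"${CHECKSUM}\"}}}}}"

Use ConfigMap subPath mounts with caution: files mounted with subPath are not updated when the ConfigMap changes.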

Resource Quota and Limit Issues

Namespace Resource Quota Exceeded

Symptoms:

New resources can’t be created
Events show “exceeded quota” errors

Diagnostic Commands:

# Check resource quota usage
kubectl describe resourcequota -n <namespace>

Solutions:

Identify resource hogs:
```
kubectl top pod -n <namespace>
```

Adjust quota limits:

kubectl edit resourcequota <quota-name> -n <namespace>

Clean up unused resources:
```
kubectl get all -n <namespace>
```
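
Compute quotas such as requests.cpu and requests.memory are charged against the requests that pods declare, not against measured usage, and `kubectl top` needs the metrics API to be available. Listing declared requests therefore shows more directly what is consuming the quota. A minimal sketch, assuming bash and a configured kubectl; `pod_requests` is a hypothetical helper name:

```bash
# Hypothetical helper: list the CPU and memory requests declared by each pod's containers.
# Usage: pod_requests <namespace>
pod_requests() {
  local namespace="$1"
  kubectl get pods -n "$namespace" \
    -o custom-columns='NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'
}
```

Note also that `kubectl get all` covers only a subset of resource types; ConfigMaps, Secrets, and PVCs, for example, have to be listed separately.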

LimitRange Conflicts

Symptoms:

Pods fail validation
Events show limit range errors

Solutions:

# Check limit range settings
kubectl get limitrange -n <namespace> -o yaml

# Adjust deployment resources
kubectl patch deployment <deployment-name> -n <namespace> --type=json -p='[{"op":"replace","path":"/spec/template/spec/containers/0/resources","value":{"requests":{"memory":"64Mi","cpu":"50m"},"limits":{"memory":"128Mi","cpu":"100m"}}}]'
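
Before patching, it can help to check whether the intended values fall inside the min and max reported by the LimitRange. The sketch below assumes bash with awk available and covers CPU quantities only (memory suffixes such as Mi and Gi are not handled); both function names are hypothetical helpers:

```bash
# Hypothetical helper: convert a CPU quantity ("250m", "1", "0.5") to millicores.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v v="$1" 'BEGIN { printf "%d\n", v * 1000 + 0.5 }' ;;
  esac
}

# Hypothetical helper: succeed only if min <= request <= max.
# Usage: cpu_within_range <request> <min> <max>
cpu_within_range() {
  local request min max
  request=$(cpu_to_millicores "$1")
  min=$(cpu_to_millicores "$2")
  max=$(cpu_to_millicores "$3")
  [ "$request" -ge "$min" ] && [ "$request" -le "$max" ]
}
```

With illustrative bounds of 100m and 2, `cpu_within_range 50m 100m 2` fails because the 50m request is below the minimum, while `cpu_within_range 500m 100m 2` succeeds.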
