AKS Pods crashing after turning on load balancer service

Mir Majeed 20 Reputation points
2025-08-12T16:53:55.2433333+00:00

We are encountering an issue with our AKS workload where all pods for the WebUI service enter a CrashLoopBackOff state after configuring a Horizontal Pod Autoscaler and a LoadBalancer service.

Please see the attached AKS Ticket.pdf for implementation details.

Azure Kubernetes Service

Accepted answer
  1. Himanshu Shekhar 255 Reputation points Microsoft External Staff Moderator
    2025-08-12T21:00:30.9233333+00:00

    Hi Mir Majeed

    Welcome to Microsoft Q&A Platform. Thank you for reaching out & hope you are doing well. 

    Please begin by gathering information about the failing pods:

    1. Check overall pod status: kubectl get pods -n <namespace> -o wide
    2. Get detailed pod information: kubectl describe pod <pod-name> -n <namespace>
    3. Check pod events for specific error messages: kubectl get events --sort-by=.metadata.creationTimestamp -n <namespace>

    Collect and analyze logs from both current and previous container instances:

    1. Current container logs: kubectl logs <pod-name> -n <namespace>
    2. Previous container logs (crucial for crashed pods): kubectl logs <pod-name> --previous -n <namespace>
    3. Real-time log monitoring: kubectl logs -f <pod-name> -n <namespace>
    4. For multi-container pods: kubectl logs <pod-name> -c <container-name> -n <namespace>

    Verify resource consumption and limits:

    1. Check current resource usage: kubectl top pods -n <namespace>
    2. Examine resource requests and limits: kubectl describe pod <pod-name> -n <namespace> | grep -E -A 5 -B 5 "Limits|Requests"
    3. Check node capacity: kubectl describe node <node-name>

    Inspect the Load Balancer service configuration:

    1. Check service status: kubectl get svc -n <namespace> -o wide
    2. Get detailed service information: kubectl describe svc <service-name> -n <namespace>
    3. Verify service endpoints: kubectl get endpoints -n <namespace>
    4. Check external IP assignment: kubectl get svc <service-name> -n <namespace> -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
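    For reference, a minimal Service manifest of type LoadBalancer (the name, selector, and ports below are illustrative, not taken from your workload). The targetPort must match the container's listening port, or the Azure health probes will fail and traffic will never reach the pods:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: webui              # illustrative name
spec:
  type: LoadBalancer
  selector:
    app: webui             # must match the pod labels of the WebUI deployment
  ports:
    - port: 80             # port exposed on the load balancer
      targetPort: 80       # must match the container's listening port
```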

    Examine health probe settings and responses:

    1. Check for health probe annotations: kubectl get svc <service-name> -n <namespace> -o yaml | grep -i probe
    2. Test health probe endpoints manually: curl -I http://<pod-ip>:<port>/<health-path>
    3. Validate probe configuration in deployment: kubectl get deployment <deployment-name> -n <namespace> -o yaml | grep -A 10 -B 10 "Probe"

    Analyze HPA configuration and metrics collection:

    1. Check HPA status: kubectl get hpa -n <namespace>
    2. Get detailed HPA information: kubectl describe hpa <hpa-name> -n <namespace>
    3. Verify metrics server functionality:
    • kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
    • kubectl get --raw "/apis/metrics.k8s.io/v1beta1/pods"

    Check for HPA events:

    1. kubectl get events --field-selector involvedObject.kind=HorizontalPodAutoscaler -n <namespace>
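    For reference, a minimal autoscaling/v2 HPA manifest to compare against your configuration (the target name and thresholds are illustrative; the target Deployment must declare CPU requests for a utilization metric to work):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webui-hpa          # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webui            # must reference the WebUI deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # illustrative threshold
```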

    Verify network security group (NSG) and firewall configurations:

    1. Check NSG rules affecting load balancer traffic: az network nsg rule list --resource-group <resource-group> --nsg-name <nsg-name>
    2. Verify that the Azure Load Balancer health probe IP (168.63.129.16) is allowed.

    Please check internal connectivity:

    1. kubectl exec -it <pod-name> -n <namespace> -- netstat -an

    Test service connectivity from within the cluster:

    1. kubectl run test-pod --image=busybox --rm -it -- wget -O- <service-name>.<namespace>:80

    Check for pods failing due to insufficient CPU/memory resources:

    1. kubectl describe pod <pod-name> | grep -E -i "insufficient|exceeded|limit"
    2. kubectl top pods --sort-by=cpu
    3. kubectl top pods --sort-by=memory
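    Note that alternation patterns such as "insufficient|exceeded|limit" require grep -E; with grep's default basic-regex mode the | is treated as a literal character and the filter silently matches nothing. A quick local check:

```shell
# Default (BRE) grep treats | literally; -E enables extended-regex alternation
printf 'Insufficient cpu\nok line\nmemory limit exceeded\n' \
  | grep -E -i 'insufficient|exceeded|limit'
# matches the first and third lines only
```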

    Reference YAML file:

    resources:
      requests:
        cpu: "250m"
        memory: "512Mi"
      limits:
        cpu: "500m"
        memory: "1Gi"
    

    Missing or Misconfigured Health Probes

    1. kubectl describe pod <pod-name> | grep -A 5 -B 5 "Probe"
    2. kubectl logs <pod-name> | grep -E -i "probe|health"

    Reference YAML file:

    spec:
      containers:
        - name: webui
          livenessProbe:
            httpGet:
              path: /health
              port: 80
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: 80
            initialDelaySeconds: 15
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
    
    

    HPA Metrics Collection Failures (HPA unable to scale due to missing metrics)

    1. kubectl describe hpa <hpa-name> | grep -E -i "fail|error|unable"
    2. kubectl get deployment <deployment-name> -o yaml | grep resources

    For reference, ensure resource requests are properly configured:

    spec:
      containers:
        - name: webui
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
    
    

     Load Balancer Health Probe Conflicts

    1. kubectl get svc <service-name> -o yaml | grep annotations
    2. kubectl describe endpoints <service-name>

    For reference, disable health probes for specific ports if needed:

    metadata:
      annotations:
        service.beta.kubernetes.io/port_8080_no_probe_rule: "true"
    
    
    

    Refer to the documentation below for details:

    1. Health Probes: Configure comprehensive liveness and readiness probes : https://learn.microsoft.com/en-us/azure/aks/best-practices-app-cluster-reliability
    2. Gradual Scaling: Use HPA behavior policies to control scaling velocity : https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/
    3. Monitoring: Implement comprehensive monitoring and alerting : https://learn.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-livedata-overview
    4. Pod is stuck in CrashLoopBackOff mode : https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/create-upgrade-delete/pod-stuck-crashloopbackoff-mode
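    As an example of point 2 above, autoscaling/v2 behavior policies can limit scaling velocity so that the HPA does not add or remove replicas faster than the load balancer can adjust; the windows and values below are illustrative:

```yaml
spec:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Pods
          value: 2               # add at most 2 pods...
          periodSeconds: 60      # ...per minute
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50              # remove at most half the current replicas per minute
          periodSeconds: 60
```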

    Please let us know if you need any further assistance.

    Regards

    Himanshu

    1 person found this answer helpful.

1 additional answer

  1. hossein jalilian 11,905 Reputation points Volunteer Moderator
    2025-08-12T17:45:25.69+00:00

    Thanks for posting your question in the Microsoft Q&A forum.

    CrashLoopBackOff usually means the containers keep failing right after starting, so Kubernetes keeps restarting them with increasingly long waits in between. Common causes include application misconfiguration, missing environment variables, bad ConfigMaps/Secrets, low resource limits, probe failures, or networking issues after exposing the service. The quickest way to troubleshoot is to check the pod logs and kubectl describe output for errors, review the probes, verify the configs, and make sure resource limits and dependencies match the new scaling and traffic setup.


    Please don't forget to close the thread here by upvoting and accepting this as the answer if it is helpful.

