AKS cluster is in Failed state. How can I revert the changes?

Urmila Purohit 100 Reputation points
2025-08-11T07:31:46.18+00:00

Hi Team,
I created an AKS cluster. When I tried to change the cluster's VNet, the cluster went into a Failed state. However, the app inside is still working. The AKS cluster and node pool are in a Failed state, while the nodes are in a Ready state. How can I revert this? Will a revert operation fix the issue caused by the VNet change, or will a revert attempt break the working app inside the cluster? Please let me know.

Azure Kubernetes Service
An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.

Accepted answer
  1. Satish Mada 350 Reputation points Microsoft External Staff Moderator
    2025-08-11T11:17:17.22+00:00

    Hi @Urmila Purohit

    AKS (Azure Kubernetes Service) does not support changing or replacing the virtual network (VNet) or subnet of a cluster after it has been created. If you attempt to modify the VNet settings post-deployment, it can break the connection between the control plane and the cluster nodes, leading to mismatched network configurations. This disruption causes operations like cluster updates and node pool management to fail, even though the underlying virtual machines (VMSS instances) may still be healthy enough to continue running workloads.

    Because reverting a VNet change on an existing cluster is unsupported and will not reliably restore cluster health, the recommended path is:

    1. Verify Cluster and Node Pool State  
      Azure Portal: Navigate to Kubernetes services > [YourCluster] and check the Overview and Node pools blades.
      Azure CLI:
      az aks show --resource-group MyRG --name MyAKSCluster --query provisioningState
      az aks nodepool list --resource-group MyRG --cluster-name MyAKSCluster
    2. Create a New AKS Cluster in the Correct VNet  
      Ensure you have a subnet delegated to Microsoft.ContainerService/workload and NSGs/UDRs that allow outbound HTTPS to required Azure endpoints (e.g., mcr.microsoft.com:443).
    3. Migrate Your Workloads  
      Export your Kubernetes manifests or Helm charts, or use a backup/restore tool such as Velero. Then deploy or restore your workloads into the new cluster.

    4. Validate Application Functionality
      Test connectivity, service endpoints, and app functionality in the new cluster.
      Confirm pods, services, and ingress (if used) are operating as expected.
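For reference, steps 2 to 4 above could look roughly like the following. This is only a sketch under assumptions: the new cluster name, VNet, subnet, namespace, and kubectl context are hypothetical placeholders, the cluster is assumed to use the Azure CNI plugin, and the raw manifest export skips Secrets and still contains cluster-specific fields that should be reviewed before applying. The commands are wrapped in functions so nothing runs until you call them.

```shell
#!/usr/bin/env bash
# Sketch only: every name below is a hypothetical placeholder.
RG="MyRG"                      # resource group, as in the commands in step 1
OLD_CONTEXT="MyAKSCluster"     # assumed kubectl context of the failed cluster
NEW_CLUSTER="MyAKSClusterNew"  # assumed name for the replacement cluster
VNET_NAME="MyVNet"             # assumed target VNet
SUBNET_NAME="MyAksSubnet"      # assumed target subnet
APP_NAMESPACE="my-app"         # assumed namespace of the workload

# Step 2: create the new cluster in the intended subnet.
create_cluster_in_subnet() {
  local subnet_id
  # Look up the full resource ID of the target subnet.
  subnet_id=$(az network vnet subnet show \
    --resource-group "$RG" --vnet-name "$VNET_NAME" --name "$SUBNET_NAME" \
    --query id --output tsv) || return 1
  # Assumes Azure CNI; adjust --network-plugin to your design.
  az aks create --resource-group "$RG" --name "$NEW_CLUSTER" \
    --network-plugin azure --vnet-subnet-id "$subnet_id" \
    --generate-ssh-keys
}

# Step 3: export workload manifests from the old cluster.
# The output needs review before applying (status fields, cluster IPs, and so on).
export_manifests() {
  kubectl --context "$OLD_CONTEXT" get deployments,services,configmaps,ingresses \
    --namespace "$APP_NAMESPACE" --output yaml > "${APP_NAMESPACE}-export.yaml"
}

# Step 4: point kubectl at the new cluster and check that things are running.
validate_new_cluster() {
  az aks get-credentials --resource-group "$RG" --name "$NEW_CLUSTER" || return 1
  kubectl get nodes
  kubectl get pods --all-namespaces
  kubectl get svc,ingress --all-namespaces
}
```

Call `create_cluster_in_subnet`, `export_manifests`, and `validate_new_cluster` in that order once the placeholders match your environment. If you use Velero or Helm instead, only the middle step changes.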

    Once everything is confirmed to be working as expected, you should decommission the old, failed cluster to avoid unnecessary resource consumption.
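A cautious way to do that cleanup is to delete the old cluster only after the new one reports a Succeeded provisioning state. The sketch below assumes hypothetical cluster names and leaves out `--yes`, so the CLI still asks for confirmation before deleting anything.

```shell
#!/usr/bin/env bash
# Sketch only: names are hypothetical placeholders.
RG="MyRG"
OLD_CLUSTER="MyAKSCluster"     # the failed cluster to remove
NEW_CLUSTER="MyAKSClusterNew"  # assumed name of the replacement cluster

decommission_old_cluster() {
  local state
  # Check the replacement cluster before touching the old one.
  state=$(az aks show --resource-group "$RG" --name "$NEW_CLUSTER" \
    --query provisioningState --output tsv) || return 1
  if [ "$state" != "Succeeded" ]; then
    echo "New cluster state is '$state'; keeping $OLD_CLUSTER" >&2
    return 1
  fi
  # No --yes here, so the CLI prompts before deleting.
  az aks delete --resource-group "$RG" --name "$OLD_CLUSTER" --no-wait
}
```

Run `decommission_old_cluster` only after your own application checks have passed; the provisioning state alone does not prove the workloads are healthy.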

    Please refer to:
    Concepts - CNI networking in AKS - Azure Kubernetes Service | Microsoft Learn
    Azure Kubernetes Service cluster/node is in a failed state - Azure | Microsoft Learn


1 additional answer

  1. Shikha Ghildiyal 6,715 Reputation points Microsoft Employee Moderator
    2025-08-11T08:43:42.7333333+00:00

    Hi Urmila Purohit

    Thanks for reaching out to Microsoft Q&A.

    Please go through this documentation for failure scenarios and troubleshooting steps: https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/cluster-node-virtual-machine-failed-state

    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you, as this can benefit other community members.

