Azure AKS - Intermittent Image pull error with - "net/http: TLS handshake timeout"

Nikhil Agarwal 20 Reputation points
2025-08-22T03:40:28.8933333+00:00

I have a public Azure Container Registry (Premium tier) with geo-replication enabled. When I pull images in an AKS cluster, I get the cluster events below. The error is intermittent, but it occurs very frequently.

Failed to pull image "turboscale-bzb3ecasb8btg3c7.azurecr.io/browserstack/node-cypress:node-20-cypress-13.10.0-chrome-131-1.0.3": failed to pull and unpack image "turboscale-bzb3ecasb8btg3c7.azurecr.io/browserstack/node-cypress:node-20-cypress-13.10.0-chrome-131-1.0.3": failed to resolve reference "turboscale-bzb3ecasb8btg3c7.azurecr.io/browserstack/node-cypress:node-20-cypress-13.10.0-chrome-131-1.0.3": failed to do request: Head "https://turboscale-bzb3ecasb8btg3c7.azurecr.io/v2/browserstack/node-cypress/manifests/node-20-cypress-13.10.0-chrome-131-1.0.3": net/http: TLS handshake timeout

Back-off pulling image "turboscale-bzb3ecasb8btg3c7.azurecr.io/browserstack/node-cypress:node-20-cypress-13.10.0-chrome-131-1.0.3"

Azure Container Registry
An Azure service that provides a registry of Docker and Open Container Initiative images.

2 answers

  1. SUNOJ KUMAR YELURU 15,811 Reputation points MVP Volunteer Moderator
    2025-08-22T04:58:05.58+00:00

    Hello @Nikhil Agarwal

    The intermittent "net/http: TLS handshake timeout" error during image pulls in AKS usually has one of these causes: network connectivity problems between the nodes and the registry, ACR authentication problems, TLS/SSL negotiation issues, resource pressure on the AKS nodes, problems in the node's container runtime (the "failed to pull and unpack image" wording in your events comes from containerd), or ACR performance limits and throttling. The steps below help narrow down the root cause.

    Step 1: Check Network Connectivity - Verify that the AKS nodes can resolve the ACR's DNS name and can connect to the ACR endpoint.

    Step 2: Verify ACR Authentication - Ensure AKS has the necessary permissions to pull images from ACR using a Managed Identity or Service Principal.

    Step 3: Investigate TLS/SSL Issues - Check for issues with TLS/SSL negotiation between AKS and ACR.

    Step 4: Monitor AKS Node Resources - Check if AKS nodes are under resource pressure.
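
To make Steps 1 and 3 concrete, here is a minimal sketch that times the TCP connect plus TLS handshake against the registry endpoint. It uses only the Python standard library. The function names `time_tls_handshake` and `summarize`, the 10-second timeout, and the idea of running it from a debug pod on an affected node are assumptions for illustration, not part of any Azure tooling.

```python
import socket
import ssl
import time
from typing import List, Optional


def time_tls_handshake(host: str, port: int = 443, timeout: float = 10.0) -> Optional[float]:
    """Return seconds taken for DNS lookup, TCP connect and TLS handshake, or None on failure."""
    context = ssl.create_default_context()
    start = time.monotonic()
    try:
        # create_connection resolves the name and opens the TCP socket.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # wrap_socket performs the TLS handshake before returning.
            with context.wrap_socket(sock, server_hostname=host):
                return time.monotonic() - start
    except OSError as exc:
        # Covers DNS failures, timeouts, refused connections and TLS errors.
        print(f"{host}: handshake failed: {exc!r}")
        return None


def summarize(samples: List[Optional[float]]) -> dict:
    """Summarize repeated attempts: how many failed and how slow the slowest success was."""
    ok = [s for s in samples if s is not None]
    return {
        "attempts": len(samples),
        "failures": len(samples) - len(ok),
        "slowest_s": max(ok) if ok else None,
    }
```

For example, `summarize([time_tls_handshake("turboscale-bzb3ecasb8btg3c7.azurecr.io") for _ in range(20)])` gives a rough failure count and worst-case handshake time. If failures appear only on some nodes, that points toward node-level or routing problems rather than the registry itself.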


    If the Answer is helpful, please click Accept Answer and Up-Vote, so that it can help others in the community looking for help on similar topics.


  2. Anusree Nashetty 6,225 Reputation points Microsoft External Staff Moderator
    2025-08-22T05:59:50.0166667+00:00

    Hi Nikhil Agarwal,

    You are seeing the intermittent TLS handshake timeout while pulling images from a geo-replicated Azure Container Registry (ACR).

    An intermittent "TLS handshake timeout" can occur for the following reasons:

    • Network Latency & Routing: With geo-replication enabled, Azure Traffic Manager routes traffic to the network-closest registry replica. Sometimes, DNS on your client/cluster nodes may resolve to a distant replica, causing significant network latency and slow TLS handshakes, especially if your cluster is far from the resolved ACR instance.
    • DNS Server Location: The DNS server used by your AKS nodes might not be geographically close to your AKS cluster, resulting in requests being routed to a distant container registry replica.
    • Resource Overhead/Throttling: High cluster/node load or registry resource issues can intermittently affect performance and connectivity.
    • Proxy/Firewall Issues: If the container runtime proxy settings on the AKS nodes or your outbound firewall rules are misconfigured, secure connections can intermittently fail.
    • Image/Geo-replication Split: When you push to a geo-replicated ACR, layers and manifests may be distributed across multiple regions, leading to validation or pull issues if the manifest isn't accessible in the closest replica.
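
One way to check the DNS-related causes above is to look at what the registry name resolves to from an affected node. The sketch below is a rough illustration using the Python standard library; the helper names `resolve_once` and `count_targets` are hypothetical. It assumes the canonical name returned by the resolver hints at which regional endpoint was chosen, which may not hold in every setup, and OS-level DNS caching can hide changes between lookups.

```python
import socket
from collections import Counter
from typing import Dict, List, Tuple


def resolve_once(host: str) -> Tuple[str, List[str]]:
    """Return (canonical name, IPv4 addresses) as seen by this machine's resolver."""
    canonical, _aliases, addresses = socket.gethostbyname_ex(host)
    return canonical, sorted(addresses)


def count_targets(results: List[Tuple[str, List[str]]]) -> Dict[str, int]:
    """Count how often each canonical name was returned across repeated lookups."""
    return dict(Counter(canonical for canonical, _ in results))
```

Calling `resolve_once("turboscale-bzb3ecasb8btg3c7.azurecr.io")` several times from a node in the cluster and passing the results to `count_targets` shows whether lookups consistently land on the same endpoint. If the answers vary, or point somewhere far from the cluster's region, the DNS server location is worth investigating.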

    For detailed information, please check: https://learn.microsoft.com/en-us/azure/container-registry/container-registry-troubleshoot-performance

    Please check: https://collabnix.com/error-docker-pull-intermittent-tls-handshake-timeout/

    For troubleshooting, please check: https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/extensions/cannot-pull-image-from-acr-to-aks-cluster

