SQL cluster failover is slow when only the 'Cluster and Client' network fails, but much faster when all networks fail

Pal Ban
2025-08-12T14:58:24.01+00:00

We have set up a simple two-node SQL Server cluster using SQL cluster (not a WSFC Windows cluster) with a witness disk. Each node has two network adapters: Cluster network 1 = 'Cluster and Client', Cluster network 2 = 'Cluster only'.

SQL Server version: 2022 Enterprise

Windows Server version: 2022 Standard

While testing this very simple setup, we noticed the following. When we disconnect the network cable of the NIC on 'Cluster network 1' (the 'Cluster and Client' network), failover to the second node takes approximately 1 minute 45 seconds. If, however, we disconnect both network adapters on node 1 at the same time (or power off the server), failover to node 2 takes approximately 20-25 seconds.

We measure the time from the failure event described above until all of the resources shown in the screenshot are online, which is when clients can access the database again.

[Screenshot of the clustered resources]
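To make the timing described above repeatable, the outage can also be measured from a client machine. The following is a minimal sketch in Python, assuming the SQL Server network name resolves from the client and listens on TCP port 1433; the host name `sqlfci-vnn` and all helper names are hypothetical. A successful TCP connect only shows that the listener is accepting connections, not that the database has finished recovery, so treat the result as an approximation of client-visible downtime.

```python
import socket
import time
from typing import Iterable, List, Tuple


def tcp_reachable(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def outage_windows(samples: Iterable[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Given time-ordered (timestamp, reachable) samples, return (start, end)
    pairs where start is the first failed sample and end is the next
    successful one. An outage still ongoing at the last sample is not reported."""
    windows = []
    start = None
    for t, ok in samples:
        if not ok and start is None:
            start = t
        elif ok and start is not None:
            windows.append((start, t))
            start = None
    return windows


def monitor(host: str, port: int = 1433, duration_s: float = 300.0,
            interval_s: float = 0.5) -> List[Tuple[float, float]]:
    """Probe host:port every interval_s seconds for duration_s seconds and
    return the observed outage windows in monotonic-clock seconds."""
    samples = []
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        samples.append((time.monotonic(), tcp_reachable(host, port)))
        time.sleep(interval_s)
    return outage_windows(samples)
```

For example, start `monitor("sqlfci-vnn", 1433, duration_s=300)` on a client, unplug the cable on node 1, and print `end - start` for each returned window. The resolution is limited by `interval_s` plus the connect timeout, so repeat each scenario a few times before comparing them.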

The problem is that in the first scenario the database is offline for this whole period, and I don't know why it takes so long or where to start diagnosing the issue.

  1. Nam Bui (WICLOUD CORPORATION), Microsoft External Staff Moderator
    2025-08-28T07:03:27.9666667+00:00

    Hi Pal Ban,

    Here are a few suggestions to help you investigate and potentially improve failover performance:

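    Network configuration: Review your NIC setup and the cluster network settings, including the role assigned to each cluster network. If the cluster is configured conservatively, it may take longer to react to a partial network failure, such as the loss of only the 'Cluster and Client' network.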

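    Cluster network thresholds: Consider adjusting the cluster properties SameSubnetDelay (the interval between heartbeats, in milliseconds) and SameSubnetThreshold (the number of heartbeats that can be missed before a node is considered down). Together they control how quickly the cluster detects that a node has become unreachable. Note that when only the 'Cluster and Client' NIC fails, heartbeats can still flow over the 'Cluster only' network, so these settings mainly affect the both-NICs and power-off scenarios.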
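As a rough worked example of how the two settings combine, the time to declare a node down after its heartbeats stop is approximately SameSubnetDelay × SameSubnetThreshold. The sketch below assumes the defaults commonly documented for Windows Server 2016 and later (1000 ms and 10) and a hypothetical, more aggressive threshold of 5; check the actual values on your cluster, for example with `Get-Cluster | Format-List *Subnet*`, before relying on them.

```python
def node_down_detection_s(same_subnet_delay_ms: int, same_subnet_threshold: int) -> float:
    """Approximate seconds between the last received heartbeat and the node
    being declared down: one heartbeat every `same_subnet_delay_ms`, and
    `same_subnet_threshold` missed heartbeats are tolerated."""
    return same_subnet_delay_ms * same_subnet_threshold / 1000.0


# Assumed defaults (verify on your cluster): 1000 ms delay, threshold 10.
default_s = node_down_detection_s(1000, 10)
# Hypothetical aggressive setting: 1000 ms delay, threshold 5.
aggressive_s = node_down_detection_s(1000, 5)
print(f"default: {default_s:.1f} s, aggressive: {aggressive_s:.1f} s")
```

This detection window is only part of the total failover time; after it, the resources still have to be taken offline and brought online on the other node.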

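    Heartbeat settings: Check how often heartbeats are sent between nodes and how many can be missed before a node is considered down. Lenient settings delay the detection of a real failure, while overly aggressive settings can trigger unnecessary failovers on brief network glitches.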
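The trade-off between detection speed and tolerance to brief glitches can be illustrated with a simplified model in which a node is declared down once a given number of consecutive heartbeats are missed. This is an assumption-level sketch, not the cluster service's actual algorithm; the function name and the heartbeat patterns are hypothetical.

```python
from typing import Optional, Sequence


def declared_down_at(heartbeats: Sequence[bool], threshold: int) -> Optional[int]:
    """Return the index of the heartbeat interval at which `threshold`
    consecutive misses are reached, or None if that never happens.
    True means a heartbeat was received in that interval."""
    misses = 0
    for i, received in enumerate(heartbeats):
        misses = 0 if received else misses + 1
        if misses >= threshold:
            return i
    return None


# A 6-interval network blip followed by recovery.
transient_blip = [True] * 5 + [False] * 6 + [True] * 5
# A real failure: heartbeats stop for good.
real_failure = [True] * 5 + [False] * 20

print(declared_down_at(transient_blip, threshold=5))   # 9: aggressive setting fails over on a blip
print(declared_down_at(transient_blip, threshold=10))  # None: lenient setting tolerates the blip
print(declared_down_at(real_failure, threshold=10))    # 14: the real failure is still detected
```

A lower threshold reacts sooner to a real failure but also reacts to the transient blip, which is why thresholds are usually lowered only after checking how stable the heartbeat networks are.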

    Client connection settings: Make sure your client connection strings include MultiSubnetFailover=True. This lets the client try all IP addresses registered for the network name in parallel, which shortens client reconnection after a failover. It does not make the cluster bring the resources online any faster.
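For a Python client using ODBC, the corresponding keyword in the Microsoft ODBC Driver for SQL Server connection string is MultiSubnetFailover=Yes. The helper below is a minimal sketch that only builds the string; the server name `sqlfci-vnn`, the database name `AppDb`, the driver version, and the use of Windows authentication (Trusted_Connection) are assumptions for illustration.

```python
def build_odbc_connection_string(server: str, database: str, port: int = 1433,
                                 driver: str = "ODBC Driver 18 for SQL Server",
                                 multi_subnet_failover: bool = True) -> str:
    """Build an ODBC connection string for SQL Server with MultiSubnetFailover
    enabled by default (Windows authentication is assumed)."""
    parts = {
        "Driver": "{" + driver + "}",       # braces protect spaces in the driver name
        "Server": f"tcp:{server},{port}",   # connect to the network name over TCP
        "Database": database,
        "Trusted_Connection": "yes",        # integrated (Windows) authentication
        "MultiSubnetFailover": "Yes" if multi_subnet_failover else "No",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


conn_str = build_odbc_connection_string("sqlfci-vnn", "AppDb")
print(conn_str)
```

With the pyodbc package installed, the string would be passed as `pyodbc.connect(conn_str, timeout=30)`, where `timeout` is the login timeout in seconds.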

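    Log review: Generate the cluster log (for example with the Get-ClusterLog PowerShell cmdlet) and examine it together with the SQL Server error log for warnings or errors during the slow failover scenario. Comparing the timestamps of the resource state changes should show where most of the 1 minute 45 seconds is being spent.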
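Once the cluster log has been generated (for example with `Get-ClusterLog -Destination C:\Temp -TimeSpan 15`, where -TimeSpan is in minutes), the resource state changes can be extracted and lined up on a timeline. The sketch below assumes that resource transitions appear as `[RCM]` lines containing `TransitionToState(<resource>) <from>--><to>` after a `::YYYY/MM/DD-HH:MM:SS.mmm` timestamp; adjust the regular expression if your log lines look different. The sample lines are invented for illustration and are not real log output.

```python
import re
from datetime import datetime
from typing import Iterable, List, Tuple

# Assumed line shape, e.g.
# 00000f2c.00001a3c::2025/08/12-14:58:24.010 INFO  [RCM] TransitionToState(SQL Server (MSSQLSERVER)) OnlinePending-->Online.
TRANSITION_RE = re.compile(
    r"::(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+\w+\s+\[RCM\]\s+"
    r"TransitionToState\((?P<res>.*)\)\s+(?P<frm>\w+)-->(?P<to>\w+)"
)

Transition = Tuple[datetime, str, str, str]


def parse_transitions(lines: Iterable[str]) -> List[Transition]:
    """Extract (timestamp, resource, from_state, to_state) from cluster log lines."""
    found = []
    for line in lines:
        m = TRANSITION_RE.search(line)
        if m:
            ts = datetime.strptime(m["ts"], "%Y/%m/%d-%H:%M:%S.%f")
            found.append((ts, m["res"], m["frm"], m["to"]))
    return found


def timeline(transitions: List[Transition]) -> List[str]:
    """Format transitions as offsets in seconds from the first one."""
    if not transitions:
        return []
    t0 = transitions[0][0]
    return [f"+{(ts - t0).total_seconds():7.3f}s  {res}: {frm} -> {to}"
            for ts, res, frm, to in transitions]


# Invented sample lines; only the [RCM] TransitionToState lines are kept.
sample = [
    "00000f2c.00001a3c::2025/08/12-14:58:24.010 INFO  [RCM] TransitionToState(SQL IP Address 1 (SQLFCI)) Online-->ProcessingFailure.",
    "00000f2c.00001a3c::2025/08/12-14:58:24.020 INFO  [NM] unrelated line",
    "00000f2c.00001a3c::2025/08/12-15:00:09.010 INFO  [RCM] TransitionToState(SQL Server (MSSQLSERVER)) OnlinePending-->Online.",
]
for row in timeline(parse_transitions(sample)):
    print(row)
```

On a real log, open the file produced by Get-ClusterLog (for example a hypothetical `node1_cluster.log`) with `open(path, encoding="utf-8", errors="replace")` and pass the file object to `parse_transitions`. Large gaps between consecutive transitions point to the step that is consuming the time.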

    I hope this gives you a solid starting point for troubleshooting.
