We have deployed an Azure Local cluster (three nodes, HPE DL380 Gen11)
All of a suddent the storage network in Failover Cluster Manager is reporting as "Partitioned" and we're seeing errors in the Windows Admin Centre console related to a loss of quorum.
If I disallow the cluster to use the storage network the error disappears but we also lose a good amount of performance, when I enable it, we get some performance back (not all) and it very quickly transitions to partitioned.
The storage network is two different physical Mellanox NIC's connected to an Aruba switch via 25Gb DAC cables. All communication would appear to be fine, pings are fine between the IP's on the storage network, the interfaces are trunk ports allowing both Storage VLAN's to communicate accordingly.
On Windows Admin Centre we get these errors:
The cluster detected network connectivity issues that prevent Storage Spaces Direct from working properly.
If a maintenance operation is ongoing, please suspend it and restore access to all storage until the storage stabilizes.
The switch configuration is matched across all storage NIC ports and the settings across the nodes look to be identical for internal PFC/RDMA etc.