Hello Juan!
Thank you for posting on Microsoft Learn.
I don't think there is a mismatch; the two numbers are simply expressed in different units.
In ARM/SDK, capacity is counted in units, and for o3-mini one unit = 10,000 TPM (and 1 RPM).
Your quota category for o3-mini DataZoneStandard is measured in units of ten thousand tokens per minute, so when ARM says you have 200 available, that means 200 × 10,000 TPM = 2,000,000 TPM, which is exactly what the portal displays.
https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/quota
What happened in your deployment?
You asked for capacity=1000, which is 1000 × 10,000 TPM = 10,000,000 TPM, but your approved quota is 200 units = 2,000,000 TPM. That is why the SDK/ARM error says you need 1000 units but only 200 are available. https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/quota
So you need to set capacity to the number of 10k TPM chunks you want, and make sure the sum of capacity across deployments is ≤ your quota:
If you want the full 2M TPM the capacity should be 200.
If you want 500k TPM the capacity should be 50.
If you want 2 deployments sharing the quota, their capacities must add up to no more than 200 (for example, 100 + 100). https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-models/quotas-limits
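If it helps, the arithmetic above can be sketched as a small Python helper. This is only an illustration, not part of any Azure SDK: the names `TPM_PER_UNIT`, `capacity_for_tpm` and `fits_quota` are made up for this example, and the 10,000 TPM per unit value is the o3-mini figure from this answer (other models may use a different ratio).

```python
# Assumption from this answer: for o3-mini, one capacity unit = 10,000 TPM.
TPM_PER_UNIT = 10_000


def capacity_for_tpm(target_tpm: int) -> int:
    """Return the capacity units needed for a target TPM, rounding up."""
    if target_tpm <= 0:
        raise ValueError("target_tpm must be positive")
    # Ceiling division without floats.
    return -(-target_tpm // TPM_PER_UNIT)


def fits_quota(capacities: list[int], quota_units: int) -> bool:
    """Check that the capacities of all deployments together stay within the quota."""
    return sum(capacities) <= quota_units


print(capacity_for_tpm(2_000_000))   # 200
print(capacity_for_tpm(500_000))     # 50
print(fits_quota([1000], 200))       # False: the request that failed
print(fits_quota([100, 100], 200))   # True: two deployments sharing the quota
```

The value returned by `capacity_for_tpm` is what you would pass as the capacity in your deployment, as long as `fits_quota` stays true for all deployments of that model and deployment type.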
If you need more than 2M TPM, you’ll need to submit a quota increase request via the Foundry quotas page (or support). For this specific issue, though, just scale capacity using the 10k TPM per unit rule. https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-models/quotas-limits