Azure OpenAI – Quota mismatch: Portal shows 2M TPM but ARM API only allows 200 for o3-mini

Juan Ayala 0 Reputation points
2025-08-20T00:04:18.53+00:00

Hello,

I am working with Azure OpenAI deployments through the Python SDK (CognitiveServicesManagementClient). When I try to deploy o3-mini with capacity=1000, I get the following error:
This operation require 1000 new capacity in quota Ten Thousand Tokens Per Minute - o3-mini - DataZoneStandard, which is bigger than the current available capacity 200. The current quota usage is 0 and the quota limit is 200 for quota Ten Thousand Tokens Per Minute - o3-mini - DataZoneStandard.

However, in the Azure Portal → Quotas tab, the limit for o3-mini is 2M TPM, not 200. It seems that the ARM/SDK API is still validating against the default quota, not the approved quota.

Could you confirm if this is a known limitation? Do I need to open a support ticket to synchronize the quota between ARM and the actual applied quota in the portal?

Thanks in advance.

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

1 answer

Sort by: Most helpful
  1. Amira Bedhiafi 36,716 Reputation points Volunteer Moderator
    2025-08-29T18:38:30.63+00:00

    Hello Juan!

    Thank you for posting on Microsoft Learn.

    I don't think there is a mismatch; you are dealing with two different units.

    In ARM/SDK, capacity is counted in units, and for o3-mini one unit = 10,000 TPM (and 1 RPM).

    https://learn.microsoft.com/en-us/azure/ai-foundry/openai/quotas-limits

    Your quota category is "Ten Thousand Tokens Per Minute - o3-mini - DataZoneStandard", so when ARM says you have 200 available, that means 200 × 10,000 TPM = 2,000,000 TPM, which is exactly what the portal displays.
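
    To make the unit arithmetic concrete, here is a small sketch (the helper names are mine, not part of any SDK), assuming 1 capacity unit = 10,000 TPM for this quota class:

    ```python
    # 1 ARM capacity unit for o3-mini DataZoneStandard = 10,000 tokens
    # per minute (TPM). Helper names below are illustrative only.
    TPM_PER_UNIT = 10_000

    def units_to_tpm(units: int) -> int:
        """Convert ARM capacity units to tokens per minute."""
        return units * TPM_PER_UNIT

    def tpm_to_units(tpm: int) -> int:
        """Convert a desired TPM figure to the ARM capacity units to
        request (rounded up, since capacity is granted in whole units)."""
        return -(-tpm // TPM_PER_UNIT)  # ceiling division

    print(units_to_tpm(200))        # 2000000 -> the portal's "2M TPM"
    print(tpm_to_units(2_000_000))  # 200     -> the ARM quota limit
    ```

    So the ARM limit of 200 and the portal's 2M TPM are the same quota expressed in different units.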

    https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/quota

    What happened in your deployment?

    You asked for capacity=1000, which is 10,000,000 TPM, but your approved quota is 200 units = 2,000,000 TPM. That is why the SDK/ARM error says you need 1000 units when only 200 are available. https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/quota

    So you need to set capacity to the number of 10,000-TPM units you want, and make sure the sum across deployments stays within your quota:

    If you want the full 2M TPM the capacity should be 200.

    If you want 500k TPM the capacity should be 50.

    If you want two deployments sharing the quota, their capacities must sum to at most 200 (for example, 150 + 50). https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-models/quotas-limits

    If you need more than 2M TPM, you’ll need a quota increase request via the Foundry quotas page (or support), but for this specific issue, just scale capacity using the 10k TPM per unit rule. https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-models/quotas-limits
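
    Putting this together, here is a minimal sketch of the deployment call with the Python management SDK. The resource group, account name, subscription ID and model version are placeholders, and `capacity_for`/`deploy` are my own names; the Azure call itself needs real credentials, so it is wrapped in a function that is defined but not invoked here:

    ```python
    from math import ceil

    TPM_PER_UNIT = 10_000  # 1 capacity unit = 10,000 TPM for this quota class

    def capacity_for(tpm: int) -> int:
        """Capacity units to request for a desired TPM (illustrative helper)."""
        return ceil(tpm / TPM_PER_UNIT)

    def deploy() -> None:
        """Sketch of the ARM deployment call; not invoked here because it
        requires real Azure credentials and resource names."""
        from azure.identity import DefaultAzureCredential
        from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
        from azure.mgmt.cognitiveservices.models import (
            Deployment, DeploymentModel, DeploymentProperties, Sku,
        )

        client = CognitiveServicesManagementClient(
            credential=DefaultAzureCredential(),
            subscription_id="<subscription-id>",
        )
        poller = client.deployments.begin_create_or_update(
            resource_group_name="<resource-group>",
            account_name="<aoai-account>",
            deployment_name="o3-mini",
            deployment=Deployment(
                # 2,000,000 TPM -> capacity=200, which fits the approved quota
                sku=Sku(name="DataZoneStandard", capacity=capacity_for(2_000_000)),
                properties=DeploymentProperties(
                    model=DeploymentModel(
                        format="OpenAI", name="o3-mini", version="<model-version>",
                    ),
                ),
            ),
        )
        poller.result()

    print(capacity_for(2_000_000))  # 200, the maximum your quota allows today
    ```

    With capacity=200 instead of 1000, the request fits inside the 200-unit quota and the error above should go away.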

