Azure OpenAI – Quota mismatch: Portal shows 2M TPM but ARM API only allows 200 for o3-mini

Juan Ayala 0 Reputation points
2025-08-20T00:04:18.53+00:00

Hello,

I am working with Azure OpenAI deployments through the Python SDK (CognitiveServicesManagementClient). When I try to deploy o3-mini with capacity=1000, I get the following error:
This operation require 1000 new capacity in quota Ten Thousand Tokens Per Minute - o3-mini - DataZoneStandard, which is bigger than the current available capacity 200. The current quota usage is 0 and the quota limit is 200 for quota Ten Thousand Tokens Per Minute - o3-mini - DataZoneStandard.

However, in the Azure Portal → Quotas tab, the limit for o3-mini is 2M TPM, not 200. It seems that the ARM/SDK API is still validating against the default quota, not the approved quota.

Could you confirm if this is a known limitation? Do I need to open a support ticket to synchronize the quota between ARM and the actual applied quota in the portal?

Thanks in advance.

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

1 answer

Sort by: Most helpful
  1. Amira Bedhiafi 36,716 Reputation points Volunteer Moderator
    2025-08-29T18:38:30.63+00:00

    Hello Juan!

    Thank you for posting on Microsoft Learn.

    I don't think there is a mismatch; you are dealing with two different units.

    In ARM/SDK, capacity is counted in units, and for o3-mini one unit = 10,000 TPM (and 1 RPM).

    https://learn.microsoft.com/en-us/azure/ai-foundry/openai/quotas-limits

    Your quota category is "Ten Thousand Tokens Per Minute - o3-mini - DataZoneStandard", so when ARM says you have 200 available, that means 200 × 10,000 TPM = 2,000,000 TPM, which is exactly what the portal displays.
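
    To make the unit arithmetic concrete, here is a small sketch (the helper names are mine, not part of any SDK), assuming 1 capacity unit = 10,000 TPM for this quota class:

    ```python
    # 1 ARM capacity unit for o3-mini DataZoneStandard = 10,000 tokens
    # per minute (TPM). Helper names below are illustrative only.
    TPM_PER_UNIT = 10_000

    def units_to_tpm(units: int) -> int:
        """Convert ARM capacity units to tokens per minute."""
        return units * TPM_PER_UNIT

    def tpm_to_units(tpm: int) -> int:
        """Convert a desired TPM figure to the ARM capacity units to
        request (rounded up, since capacity is granted in whole units)."""
        return -(-tpm // TPM_PER_UNIT)  # ceiling division

    print(units_to_tpm(200))        # 2000000 -> the portal's "2M TPM"
    print(tpm_to_units(2_000_000))  # 200     -> the ARM quota limit
    ```

    So the ARM limit of 200 and the portal's 2M TPM are the same quota expressed in different units.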

    https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/quota

    What happened in your deployment?

    You asked for capacity=1000, which is 10,000,000 TPM, but your approved quota is 200 units = 2,000,000 TPM. That is why the SDK/ARM error says you need 1000 units when only 200 are available. https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/quota

    So you need to set capacity to the number of 10,000-TPM units you want, and make sure the sum across deployments stays within your quota:

    If you want the full 2M TPM the capacity should be 200.

    If you want 500k TPM the capacity should be 50.

    If you want two deployments sharing the quota, their capacities must sum to at most 200 (for example, 150 + 50). https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-models/quotas-limits

    If you need more than 2M TPM, you’ll need a quota increase request via the Foundry quotas page (or support), but for this specific issue, just scale capacity using the 10k TPM per unit rule. https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-models/quotas-limits
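
    Putting this together, here is a minimal sketch of the deployment call with the Python management SDK. The resource group, account name, subscription ID and model version are placeholders, and `capacity_for`/`deploy` are my own names; the Azure call itself needs real credentials, so it is wrapped in a function that is defined but not invoked here:

    ```python
    from math import ceil

    TPM_PER_UNIT = 10_000  # 1 capacity unit = 10,000 TPM for this quota class

    def capacity_for(tpm: int) -> int:
        """Capacity units to request for a desired TPM (illustrative helper)."""
        return ceil(tpm / TPM_PER_UNIT)

    def deploy() -> None:
        """Sketch of the ARM deployment call; not invoked here because it
        requires real Azure credentials and resource names."""
        from azure.identity import DefaultAzureCredential
        from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
        from azure.mgmt.cognitiveservices.models import (
            Deployment, DeploymentModel, DeploymentProperties, Sku,
        )

        client = CognitiveServicesManagementClient(
            credential=DefaultAzureCredential(),
            subscription_id="<subscription-id>",
        )
        poller = client.deployments.begin_create_or_update(
            resource_group_name="<resource-group>",
            account_name="<aoai-account>",
            deployment_name="o3-mini",
            deployment=Deployment(
                # 2,000,000 TPM -> capacity=200, which fits the approved quota
                sku=Sku(name="DataZoneStandard", capacity=capacity_for(2_000_000)),
                properties=DeploymentProperties(
                    model=DeploymentModel(
                        format="OpenAI", name="o3-mini", version="<model-version>",
                    ),
                ),
            ),
        )
        poller.result()

    print(capacity_for(2_000_000))  # 200, the maximum your quota allows today
    ```

    With capacity=200 instead of 1000, the request fits inside the 200-unit quota and the error above should go away.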

