Thank you for your feedback and for adopting the GPT-5 models early. We understand how important throughput is for your workloads, particularly when using MCP tools, RAG, and web search features.
The reduction in rate limits you’ve observed reflects temporary quota adjustments made to keep the service stable and ensure equitable access for all customers during the initial launch phase. Although your deployments may have achieved much higher throughput on launch day, limits are currently applied per region and per deployment type while we monitor performance and scale out infrastructure.
For context, the current default limits for the GPT-5 models are as follows:

- gpt-5 and gpt-5-mini: 1M tokens per minute (TPM) for Global deployments by default, up to 10M TPM for Global Enterprise/MCA-E customers; Data Zone defaults are 300K TPM, or 3M TPM for Data Zone Enterprise/MCA-E.
- gpt-5-nano: 5M TPM Global, 150M TPM for Global Enterprise/MCA-E; Data Zone defaults are 2M TPM, or 50M TPM for Enterprise/MCA-E.
- gpt-5-chat: 1M TPM Global, 5M TPM for Global Enterprise/MCA-E; no Data Zone deployment option is available at this time.
If your workload requires higher throughput immediately, you can submit a quota increase request via the Azure portal or AI Foundry, selecting the relevant model, region, and deployment type. Approved requests can restore capacity above your current limits. While we cannot confirm an exact date for full restoration to launch-day limits, our engineering teams are actively increasing capacity, and we expect to raise regional quotas over the coming weeks. We remain committed to enabling the scale you need while ensuring platform reliability for all customers.