GPT-5 Rate Limits Decreased!

WanisElabbar-4383 210 Reputation points
2025-08-09T09:48:12.1666667+00:00

Hello,

I was quite happy to find the GPT-5 models available on launch day and rolled them out to the company. However, the following day we noticed the service degrading, and I was shocked to find that all models have had their rate limits capped at 20K tokens per minute, down from 1M.

I deployed in the Sweden Central region using Data Zone Standard which worked well on launch day. Now all deployment types have very low rate limits.

When will you properly restore the GPT-5 service? Those token limits will not suffice given our heavy use of MCP tools, RAG, and web searches.

Azure OpenAI Service

1 answer

  1. Pavankumar Purilla 10,680 Reputation points Microsoft External Staff Moderator
    2025-08-12T09:09:14.1066667+00:00

    Hi WanisElabbar-4383,

    Thank you for your feedback and for adopting the GPT-5 models early. We understand how important throughput is for your workloads, particularly when using MCP tools, RAG, and web search features.

    The recent reduction in rate limits you’ve observed is due to temporary quota adjustments made to ensure stable service and equitable access for all customers during the initial launch phase. While on launch day your deployments may have achieved much higher throughput, limits are currently being applied regionally and by deployment type while we monitor performance and scale infrastructure.

    For context, the current default limits for GPT-5 models are as follows:

    | Model | Global default | Global Enterprise/MCA-E | Data Zone default | Data Zone Enterprise/MCA-E |
    | --- | --- | --- | --- | --- |
    | gpt-5 | 1M TPM | 10M TPM | 300K TPM | 3M TPM |
    | gpt-5-mini | 1M TPM | 10M TPM | 300K TPM | 3M TPM |
    | gpt-5-nano | 5M TPM | 150M TPM | 2M TPM | 50M TPM |
    | gpt-5-chat | 1M TPM | 5M TPM | n/a | n/a |

    Note that gpt-5-chat has no Data Zone deployment option at this time.

    If your workload requires higher throughput immediately, you can submit a quota increase request via the Azure portal or AI Foundry, selecting the relevant model, region, and deployment type. Approved requests can restore capacity above your current limits. While we cannot confirm an exact date for full restoration to launch-day limits, our engineering teams are actively increasing capacity, and we expect to raise regional quotas over the coming weeks. We remain committed to enabling the scale you need while ensuring platform reliability for all customers.
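    While waiting for a quota increase, a common client-side mitigation is to retry rate-limited (HTTP 429) requests with exponential backoff and jitter. Below is a minimal, SDK-agnostic sketch: `request` and `is_rate_limited` are placeholder callables you would wire up to your own client (for example, catching the Azure OpenAI SDK's rate-limit exception); the delay parameters are illustrative, not recommended values.

    ```python
    import random
    import time


    def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 60.0):
        """Yield exponential backoff delays (seconds) with full jitter.

        The delay for attempt n is drawn uniformly from [0, min(cap, base * 2**n)].
        """
        for attempt in range(max_retries):
            yield random.uniform(0, min(cap, base * 2 ** attempt))


    def call_with_backoff(request, is_rate_limited,
                          max_retries: int = 5, base: float = 1.0, cap: float = 60.0):
        """Call request() and retry while is_rate_limited(exc) reports a 429.

        Any other exception is re-raised immediately; after max_retries
        rate-limited attempts, the last rate-limit error is re-raised.
        """
        last_exc = None
        for delay in backoff_delays(max_retries, base, cap):
            try:
                return request()
            except Exception as exc:
                if not is_rate_limited(exc):
                    raise
                last_exc = exc
                time.sleep(delay)
        raise last_exc
    ```

    If the service returns a `Retry-After` header, honoring it instead of (or as a floor for) the computed delay is usually the better choice.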

