Thank you for your feedback and for adopting the GPT-5 models early. We understand how important throughput is for your workloads, particularly when using MCP tools, RAG, and web search features.
The reduction in rate limits you’ve observed reflects temporary quota adjustments made to keep the service stable and ensure equitable access for all customers during the initial launch phase. Although your deployments may have achieved much higher throughput on launch day, limits are currently applied per region and per deployment type while we monitor performance and scale out infrastructure.
For context, the current default limits for the GPT-5 models are as follows:

- gpt-5 and gpt-5-mini: 1M tokens per minute (TPM) for Global deployments by default, up to 10M TPM for Global Enterprise/MCA-E customers; Data Zone defaults are 300K TPM, or 3M TPM for Data Zone Enterprise/MCA-E.
- gpt-5-nano: 5M TPM Global, 150M TPM for Global Enterprise/MCA-E; Data Zone defaults are 2M TPM, or 50M TPM for Enterprise/MCA-E.
- gpt-5-chat: 1M TPM Global, 5M TPM for Global Enterprise/MCA-E; no Data Zone deployment option is available at this time.
If your workload requires higher throughput immediately, you can submit a quota increase request via the Azure portal or AI Foundry, selecting the relevant model, region, and deployment type. Approved requests can restore capacity above your current limits. While we cannot confirm an exact date for full restoration to launch-day limits, our engineering teams are actively increasing capacity, and we expect to raise regional quotas over the coming weeks. We remain committed to enabling the scale you need while ensuring platform reliability for all customers.