Azure AI Foundry Models quotas and limits

2025-08-15

This article provides a quick reference and detailed description of the quotas and limits for Azure AI Foundry Models. For quotas and limits specific to Azure OpenAI in Foundry Models, see Quota and limits in Azure OpenAI.

Quotas and limits reference

Azure uses quotas and limits to prevent budget overruns due to fraud and to honor Azure capacity constraints. Consider these limits as you scale for production workloads. The following sections provide a quick guide to the default quotas and limits that apply to the Azure AI model inference service in Azure AI Foundry.

Resource limits

Limit name	Limit value
Azure AI Foundry resources per region per Azure subscription	100
Max projects per resource	250
Max deployments per resource	32

Rate limits

The following table lists the limits for Foundry Models on these rates:

Tokens per minute
Requests per minute
Concurrent requests

Models	Tokens per minute	Requests per minute	Concurrent requests
Azure OpenAI models	Varies per model and SKU. See limits for Azure OpenAI.	Varies per model and SKU. See limits for Azure OpenAI.	not applicable
DeepSeek-R1, DeepSeek-V3-0324	5,000,000	5,000	300
Llama 3.3 70B Instruct, Llama-4-Maverick-17B-128E-Instruct-FP8, Grok 3, Grok 3 mini	400,000	1,000	300
Flux-Pro 1.1, Flux.1-Kontext Pro	not applicable	2 capacity units (6 requests per minute)	not applicable
Rest of models	400,000	1,000	300
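As a rough illustration of how a client might stay under the tokens-per-minute and requests-per-minute limits in the table, the following Python sketch tracks both budgets over a rolling 60-second window. The class name `SlidingWindowLimiter`, its parameters, and the rolling-window accounting are assumptions made for this example. This article doesn't describe how the service itself measures the rates, so treat this as a client-side guard rather than a mirror of server behavior.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical client-side guard for request and token budgets."""

    def __init__(self, max_requests, max_tokens, window_seconds=60.0,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._events = deque()  # (timestamp, tokens) per admitted request
        self._tokens_in_window = 0

    def _evict_expired(self, now):
        # Drop requests that have aged out of the rolling window.
        while self._events and now - self._events[0][0] >= self.window_seconds:
            _, tokens = self._events.popleft()
            self._tokens_in_window -= tokens

    def try_acquire(self, tokens):
        """Record the request and return True if it fits both budgets."""
        now = self.clock()
        self._evict_expired(now)
        if len(self._events) + 1 > self.max_requests:
            return False
        if self._tokens_in_window + tokens > self.max_tokens:
            return False
        self._events.append((now, tokens))
        self._tokens_in_window += tokens
        return True


# Defaults from the "Rest of models" row: 1,000 RPM and 400,000 TPM.
limiter = SlidingWindowLimiter(max_requests=1_000, max_tokens=400_000)
```

Before sending a request, a caller would estimate its token count and call `limiter.try_acquire(estimated_tokens)`, then delay or queue the request when the call returns `False`.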

To increase your quota:

For Azure OpenAI, use Azure AI Foundry Service: Request for Quota Increase to submit your request.
For other models, see request increases to the default limits.

Due to high demand, we evaluate each limit increase request individually.

Other limits

Limit name	Limit value
Max number of custom headers in API requests¹	10

¹ Our current APIs allow up to 10 custom headers, which the pipeline passes through and returns. If you exceed this header count, your request results in an HTTP 431 error. To resolve this error, reduce the number of custom headers. Future API versions won't pass through custom headers. We recommend that you don't depend on custom headers in future system architectures.
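A client can check the custom header count before sending a request instead of waiting for an HTTP 431 response. The Python sketch below shows one way to do that. The `STANDARD_HEADERS` set is an assumption for illustration only: this article doesn't define which headers count as custom, so adjust the set to match your own client. The helper names are likewise hypothetical.

```python
MAX_CUSTOM_HEADERS = 10  # limit from the "Other limits" table

# Assumption: headers treated as standard (not custom) for this example.
STANDARD_HEADERS = frozenset({
    "authorization", "api-key", "content-type", "content-length",
    "accept", "user-agent", "host",
})


def custom_headers(headers):
    """Return the header names that are not in the assumed standard set."""
    return [name for name in headers if name.lower() not in STANDARD_HEADERS]


def check_custom_headers(headers):
    """Raise ValueError when the custom header count exceeds the limit."""
    custom = custom_headers(headers)
    if len(custom) > MAX_CUSTOM_HEADERS:
        raise ValueError(
            f"{len(custom)} custom headers exceed the limit of "
            f"{MAX_CUSTOM_HEADERS}; the request would likely fail with HTTP 431"
        )
```

Calling `check_custom_headers(request_headers)` just before the request is sent turns a server-side rejection into a local error that is easier to diagnose.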

Usage tiers

Global Standard deployments use Azure's global infrastructure to dynamically route customer traffic to the data center with the best availability for the customer's inference requests. This infrastructure enables more consistent latency for customers with low to medium levels of traffic. Customers with high sustained levels of usage might see more variability in response latency.

The Usage Limit determines the level of usage above which customers might see larger variability in response latency. A customer's usage is defined per model and is the total tokens consumed across all deployments in all subscriptions in all regions for a given tenant.
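To make the usage definition concrete, the following Python sketch sums tokens per tenant and model, ignoring which deployment, subscription, or region each record came from. The record fields (`tenant`, `model`, `subscription`, `region`, `tokens`) are hypothetical and are not an actual Azure export format.

```python
from collections import defaultdict


def usage_per_model(records):
    """Total tokens per (tenant, model) across all other dimensions."""
    totals = defaultdict(int)
    for record in records:
        # Subscription, region, and deployment are deliberately ignored:
        # usage is defined per model for the whole tenant.
        totals[(record["tenant"], record["model"])] += record["tokens"]
    return dict(totals)


# Hypothetical sample data: one tenant, two subscriptions, two regions.
sample = [
    {"tenant": "t1", "model": "DeepSeek-R1", "subscription": "sub-a",
     "region": "eastus", "tokens": 100},
    {"tenant": "t1", "model": "DeepSeek-R1", "subscription": "sub-b",
     "region": "westeurope", "tokens": 50},
    {"tenant": "t1", "model": "Grok 3", "subscription": "sub-a",
     "region": "eastus", "tokens": 7},
]
totals = usage_per_model(sample)
```

Here both DeepSeek-R1 records count toward the same total even though they come from different subscriptions and regions.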

Request increases to the default limits

You can submit limit increase requests, which we evaluate one at a time. To request an endpoint limit increase, open an online customer support request and complete the following steps:

Select Service and subscription limits (quotas) as the Issue type when you open the support request.
Select the subscription you want to use.
Select Cognitive Services as Quota type.
Select Next.
On the Additional details tab, provide detailed reasons for the limit increase so that your request can be processed. Be sure to add the following information to the reason for limit increase:
- Model name, model version (if applicable), and deployment type (SKU).
- Description of your scenario and workload.
- Rationale for the requested increase.
- Target throughput: Tokens per minute, requests per minute, and other relevant metrics.
- Planned timeline (the date by which you need the increased limits).
Select Save and continue.

General best practices to stay within rate limits

To minimize issues related to rate limits, use the following techniques:

Implement retry logic in your application.
Avoid sharp changes in the workload. Increase the workload gradually.
Test different load increase patterns.
Increase the quota assigned to your deployment. Move quota from another deployment, if necessary.
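The first technique, retry logic, is often implemented as exponential backoff with jitter. The Python sketch below is one minimal way to do it, under stated assumptions: `RateLimitError` is a hypothetical exception that your own client code raises on a rate-limit response (typically HTTP 429), and `retry_after` is an optional wait time in seconds that your client extracts from the response, if the service provides one. Neither name comes from an Azure SDK.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error raised by the caller's client when rate limited."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds to wait, if known


def call_with_retry(send, max_attempts=5, base_delay=1.0, max_delay=30.0,
                    sleep=time.sleep, rng=random.random):
    """Call send() and retry on RateLimitError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimitError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            if err.retry_after is not None:
                # Prefer the wait time supplied with the error.
                delay = err.retry_after
            else:
                # Full jitter: random wait up to the capped exponential delay.
                delay = rng() * min(max_delay, base_delay * 2 ** attempt)
            sleep(delay)
```

A caller wraps each request in a function and passes it in, for example `call_with_retry(lambda: client.complete(prompt))`, where `client.complete` stands in for whatever request function your application uses. The jitter spreads retries from many clients over time, which also helps avoid the sharp workload changes mentioned in the list.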

Next steps

Learn more about the models available in Azure AI Foundry Models