Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
This article provides a summary of the latest releases and major documentation updates for Azure OpenAI.
August 2025
Provisioned spillover General Availability (GA)
Spillover is now Generally Available. Spillover manages traffic fluctuations on provisioned deployments by routing overages to a designated standard deployment. To learn more about how to maximize utilization for your provisioned deployments with spillover, see Manage traffic with spillover for provisioned deployments.
GPT-5 models available
gpt-5
,gpt-5-mini
,gpt-5-nano
To learn more, see the getting started with reasoning models page.gpt-5-chat
is now available. To learn more, see the models pagegpt-5
is now available for Provisioned Throughput Units (PTU).gpt-5-mini
,gpt-5-nano
, andgpt-5-chat
do not require registration.
New version of model-router
Model router now supports GPT-5 series models.
The latest version of model router is currently limited access only. You can request access using the
gpt-5 access
form: gpt-5 limited access model application. If you already haveo3 access
no request is required.Model router for Azure AI Foundry is a deployable AI chat model that automatically selects the best underlying chat model to respond to a given prompt. For more information on how model router works and its advantages and limitations, see the Model router concepts guide. To use model router with the Completions API, follow the How-to guide.
July 2025
GPT-image-1 update (preview)
Input fidelity parameter: The
input_fidelity
parameter in the image edits API lets you control how closely the model conveys the style and features of the subjects in the original (input) image. This is useful for:- Editing photos while preserving facial features; creating avatars that look like original person across different styles; combining faces from multiple people into one image.
- Maintaining brand identity in generated images for marketing assets, mockups, product photography.
- E-commerce and fashion, where you need to edit images of outfits or product details without compromising realism.
Partial image streaming: The image generation and image edits APIs support partial image streaming, where they return images with partially rendered content throughout the image generation process. Display these images to the user to provide earlier visual feedback and show the progress of the image generation operation.
June 2025
codex-mini & o3-pro models released
codex-mini
ando3-pro
are now available. To learn more, see the getting started with reasoning models page
May 2025
Sora video generation released (preview)
Sora (2025-05-02) is a video generation model from OpenAI that can create realistic and imaginative video scenes from text instructions.
Follow the Video generation quickstart to get started. For more information, see the Video generation concepts guide.
Spotlighting for prompt shields (preview)
Spotlighting is a sub-feature of prompt shields that enhances protection against indirect (embedded document) attacks by tagging input documents with special formatting to indicate lower trust to the model. For more information, see the Prompt shields filter documentation.
Model router (preview)
Model router for Azure AI Foundry is a deployable AI chat model that automatically selects the best underlying chat model to respond to a given prompt. For more information on how model router works and its advantages and limitations, see the Model router concepts guide. To use model router with the Completions API, follow the How-to guide.
April 2025
Realtime API (preview) support for WebRTC
The Realtime API (preview) now supports WebRTC, enabling real-time audio streaming and low-latency interactions. This feature is ideal for applications requiring immediate feedback, such as live customer support or interactive voice assistants. For more information, see the Realtime API (preview) documentation.
GPT-image-1 released (preview, limited access)
GPT-image-1 (2025-04-15
) is the latest image generation model from Azure OpenAI. It features major improvements over DALL-E, including:
- Better at responding to precise instructions.
- Reliably renders text.
- Accepts images as input, which enables the new capabilities of image editing and inpainting.
Request access: Limited access model application
Follow the image generation how-to guide to get started with the new model.
o4-mini and o3 models released
o4-mini
and o3
models are now available. These are the latest reasoning models from Azure OpenAI offering significantly enhanced reasoning, quality, and performance. For more information, see the getting started with reasoning models page.
GPT-4.1 released
GPT 4.1 and GPT 4.1-nano are now available. These are the latest models from Azure OpenAI. GPT 4.1 has a 1 million token context limit. For more information, see the models page.
gpt-4o audio models released
New audio models powered by GPT-4o are now available.
The
gpt-4o-transcribe
andgpt-4o-mini-transcribe
speech to text models are released. Use these models via the/audio
and/realtime
APIs.The
gpt-4o-mini-tts
text to speech model is released. Use thegpt-4o-mini-tts
model for text to speech generation via the/audio
API.
For more information about available models, see the models and versions documentation.
March 2025
Responses API & computer-use-preview model
The Responses API is a new stateful API from Azure OpenAI. It brings together the best capabilities from the chat completions and assistants API in one unified experience. The Responses API also adds support for the new computer-use-preview
model which powers the Computer use capability.
For access to computer-use-preview
registration is required, and access will be granted based on Microsoft's eligibility criteria. Customers who have access to other limited access models will still need to request access for this model.
Request access: computer-use-preview
limited access model application
For more information on model capabilities, and region availability see the models documentation.
Playwright integration demo code.
Provisioned spillover (preview)
Spillover manages traffic fluctuations on provisioned deployments by routing overages to a designated standard deployment. To learn more about how to maximize utilization for your provisioned deployments with spillover, see Manage traffic with spillover for provisioned deployments (preview).
Specify content filtering configurations
In addition to the deployment-level content filtering configuration, we now also provide a request header that allows you specify your custom configuration at request time for every API call. For more information, see Use content filters (preview).
February 2025
GPT-4.5 Preview
The latest GPT model that excels at diverse text and image tasks is now available on Azure OpenAI.
For more information on model capabilities, and region availability see the models documentation.
Stored completions API
Stored completions allow you to capture the conversation history from chat completions sessions to use as datasets for evaluations and fine-tuning.
o3-mini data zone standard deployments
o3-mini
is now available for global standard, and data zone standard deployments for registered limited access customers.
For more information, see our reasoning model guide.
gpt-4o mini audio released
The gpt-4o-mini-audio-preview
(2024-12-17
) model is the latest audio completions model. For more information, see the audio generation quickstart.
The gpt-4o-mini-realtime-preview
(2024-12-17
) model is the latest real-time audio model. The real-time models use the same underlying GPT-4o audio model as the completions API, but is optimized for low-latency, real-time audio interactions. For more information, see the real-time audio quickstart.
For more information about available models, see the models and versions documentation.
January 2025
o3-mini released
o3-mini
(2025-01-31
) is the latest reasoning model, offering enhanced reasoning abilities. For more information, see our reasoning model guide.
GPT-4o audio completions
The gpt-4o-audio-preview
model is now available for global deployments in East US 2 and Sweden Central regions. Use the gpt-4o-audio-preview
model for audio generation.
The gpt-4o-audio-preview
model introduces the audio modality into the existing /chat/completions
API. The audio model expands the potential for AI applications in text and voice-based interactions and audio analysis. Modalities supported in gpt-4o-audio-preview
model include: text, audio, and text + audio. For more information, see the audio generation quickstart.
Note
The Realtime API uses the same underlying GPT-4o audio model as the completions API, but is optimized for low-latency, real-time audio interactions.
GPT-4o Realtime API 2024-12-17
The gpt-4o-realtime-preview
model version 2024-12-17 is available for global deployments in East US 2 and Sweden Central regions. Use the gpt-4o-realtime-preview
version 2024-12-17 model instead of the gpt-4o-realtime-preview
version 2024-10-01-preview model for real-time audio interactions.
- Added support for prompt caching with the
gpt-4o-realtime-preview
model. - Added support for new voices. The
gpt-4o-realtime-preview
models now support the following voices:alloy
,ash
,ballad
,coral
,echo
,sage
,shimmer
,verse
. - Rate limits are no longer based on connections per minute. Rate limiting is now based on RPM (requests per minute) and TPM (tokens per minute) for the
gpt-4o-realtime-preview
model. The rate limits for eachgpt-4o-realtime-preview
model deployment are 100K TPM and 1K RPM. During the preview, Azure AI Foundry portal and APIs might inaccurately show different rate limits. Even if you try to set a different rate limit, the actual rate limit will be 100K TPM and 1K RPM.
For more information, see the GPT-4o real-time audio quickstart and the how-to guide.
December 2024
o1 reasoning model released for limited access
The latest o1
model is now available for API access and model deployment. Registration is required, and access will be granted based on Microsoft's eligibility criteria. Customers who previously applied and received access to o1-preview
, don't need to reapply as they're automatically on the wait-list for the latest model.
Request access: limited access model application
To learn more about the advanced o1
series models see, getting started with o1 series reasoning models.
Region availability
Model | Region |
---|---|
o1 (Version: 2024-12-17) |
East US2 (Global Standard) Sweden Central (Global Standard) |
Preference fine-tuning (preview)
Direct preference optimization (DPO) is a new alignment technique for large language models, designed to adjust model weights based on human preferences. Unlike reinforcement learning from human feedback (RLHF), DPO doesn't require fitting a reward model and uses simpler data (binary preferences) for training. This method is computationally lighter and faster, making it equally effective at alignment while being more efficient. DPO is especially useful in scenarios where subjective elements like tone, style, or specific content preferences are important. We’re excited to announce the public preview of DPO in Azure OpenAI, starting with the gpt-4o-2024-08-06
model.
For fine-tuning model region availability, see the models page.
Stored completions & distillation
Stored completions allow you to capture the conversation history from chat completions sessions to use as datasets for evaluations and fine-tuning.
GPT-4o 2024-11-20
gpt-4o-2024-11-20
is now available for global standard deployment in:
- East US
- East US 2
- North Central US
- South Central US
- West US
- West US 3
- Sweden Central
NEW data zone provisioned deployment type
Data zone provisioned deployments are available in the same Azure OpenAI resource as all other Azure OpenAI deployment types but allow you to leverage Azure global infrastructure to dynamically route traffic to the data center within the Microsoft defined data zone with the best availability for each request. Data zone provisioned deployments provide reserved model processing capacity for high and predictable throughput using Azure infrastructure within Microsoft specified data zones. Data zone provisioned deployments are supported on gpt-4o-2024-08-06
, gpt-4o-2024-05-13
, and gpt-4o-mini-2024-07-18
models.
For more information, see the deployment types guide.
Next steps
Learn more about the underlying models that power Azure OpenAI.