Edit

Share via


How to deploy Azure OpenAI models with Azure AI Foundry

In this article, you learn how to create deployments for Azure OpenAI in Azure AI Foundry Models, using the Azure AI Foundry portal.

Azure OpenAI in Foundry Models offers a diverse set of models with different capabilities and price points. When you deploy Azure OpenAI models in the Azure AI Foundry portal, you can consume the deployments by using prompt flow or another tool. Model availability varies by region. For more information about the details of each model, see Azure OpenAI models.

To modify and interact with an Azure OpenAI model in the Azure AI Foundry playground, you first need to deploy a base Azure OpenAI model to your project. After you deploy the model and make it available in your project, you can consume its REST API endpoint as-is or customize it further with your own data and other components, such as embeddings and indexes.

Prerequisites

Deploy an Azure OpenAI model from the model catalog

Follow the steps in this section to deploy an Azure OpenAI model, such as gpt-4o-mini, to a real-time endpoint from the Azure AI Foundry portal model catalog:

  1. Sign in to Azure AI Foundry.
  2. If you’re not already in your project, select it.
  3. Select Model catalog from the left pane.
  1. In the Collections filter, select Azure OpenAI.

    A screenshot showing how to filter by Azure OpenAI models in the catalog.

  2. Select a model such as gpt-4o-mini from the Azure OpenAI collection.

  3. Select Use this model to open the deployment window.

  4. Select the resource that you want to deploy the model to. If you don't have a resource, create one.

  5. Specify the deployment name and modify other default settings depending on your requirements.

  6. Select Deploy.

  7. Go to the deployment details page. Select Open in playground.

  8. Select View Code to get code samples that you can use to consume the deployed model in your application.

Deploy an Azure OpenAI model from your project

You can also start deployment from your project in Azure AI Foundry portal.

Tip

Because you can customize the left pane in the Azure AI Foundry portal, you might see different items than shown in these steps. If you don't see what you're looking for, select ... More at the bottom of the left pane.

  1. Go to your project in Azure AI Foundry portal.
  2. From the left sidebar of your project, go to My assets > Models + endpoints.
  3. Select + Deploy model > Deploy base model.
  4. Search for and select a model such as gpt-4o-mini from the list of models.
  5. Select Confirm to open the deployment window.
  6. Specify the deployment name and modify other default settings depending on your requirements.
  7. Select Deploy.
  8. Go to the deployment details page. Select Open in playground.
  9. Select View Code to get code samples that you can use to consume the deployed model in your application.

Inferencing the Azure OpenAI model

To perform inferencing on the deployed model, use the playground or code samples. The playground is a web-based interface that lets you interact with the model in real-time. Use the playground to test the model with different prompts and see the model's responses.

For more examples of how to consume the deployed model in your application, see the Get started using chat completions with Azure OpenAI in Azure AI Foundry Models quickstart.

Regional availability and quota limits of a model

For Azure OpenAI models, the default quota for models varies by model and region. Certain models might only be available in some regions. For more information on availability and quota limits, see Azure OpenAI quotas and limits.

Quota for deploying and inferencing a model

For Azure OpenAI models, deploying and inferencing consume quota that Azure assigns to your subscription on a per-region, per-model basis in units of Tokens-per-Minute (TPM). When you sign up for Azure AI Foundry, you receive default quota for most of the available models. Then, you assign TPM to each deployment as you create it, which reduces the available quota for that model. You can continue to create deployments and assign them TPMs until you reach your quota limit.

When you reach your quota limit, you can only create new deployments of that model if you:

For more information about quota, see Azure AI Foundry quota and Manage Azure OpenAI quota.