Endpoints for Azure AI Foundry Models

2025-08-26

Azure AI Foundry Models lets you access the most powerful models from leading model providers through a single endpoint and a single set of credentials. Because of this, you can switch between models in your application by changing only the model deployment name you pass in requests, not the rest of your code.

This article explains how the Azure AI Foundry services (formerly known as Azure AI Services) organize models and how to use the inference endpoint to access them.

Deployments

Azure AI Foundry uses deployments to make models available. A deployment gives a model a name and applies a specific configuration to it. To access a model, use its deployment name in your requests.

A deployment includes:

A model name
A model version
A provisioning or capacity type¹
A content filtering configuration¹
A rate limiting configuration¹

¹ These configurations can change depending on the selected model.

An Azure AI Foundry resource can have many model deployments. You only pay for inference performed on model deployments. Deployments are Azure resources, so they're subject to Azure policies.

For more information about creating deployments, see Add and configure model deployments.

Endpoints

Azure AI Foundry services provide multiple endpoints depending on the type of work you want to perform:

Azure AI inference endpoint
Azure OpenAI inference endpoint

Azure AI inference endpoint

The Azure AI inference endpoint, usually of the form https://<resource-name>.services.ai.azure.com/models, lets you use a single endpoint, with the same authentication and schema, to run inference on the models deployed in the resource. All Foundry Models support this capability. This endpoint follows the Azure AI Model Inference API, which supports the following modalities:

Text embeddings
Image embeddings
Chat completions
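As a rough illustration of the single-endpoint design, the following sketch builds the request URL for each modality under the shared /models root. Only the /chat/completions route appears in this article; the /embeddings and /images/embeddings routes, the helper name inference_url, and the placeholder resource name are assumptions, so check them against the Azure AI Model Inference API reference.

```python
from urllib.parse import urlencode

# Hypothetical modality-to-route table. Only /chat/completions is shown in
# this article; the two embeddings routes are assumptions to verify against
# the Azure AI Model Inference API reference.
ROUTES = {
    "chat_completions": "/chat/completions",
    "text_embeddings": "/embeddings",
    "image_embeddings": "/images/embeddings",
}


def inference_url(resource: str, modality: str,
                  api_version: str = "2024-05-01-preview") -> str:
    """Build the URL for one modality on the shared /models endpoint."""
    base = f"https://{resource}.services.ai.azure.com/models"
    query = urlencode({"api-version": api_version})
    return f"{base}{ROUTES[modality]}?{query}"


# Every modality shares the same host, /models root, and credentials.
for name in ROUTES:
    print(name, inference_url("my-resource", name))
```

The point of the sketch is that only the route changes between modalities; the host, the /models root, and the credentials stay the same.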

Routing

The inference endpoint routes each request to a deployment by matching the model parameter in the request to the deployment name. In effect, a deployment works as an alias for a model with a given configuration, so you can deploy the same model several times in one resource, each time with a different configuration.
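The following sketch models this routing idea in plain Python. It isn't the service implementation: the Deployment class, its fields, the route function, and all deployment names and values are hypothetical, chosen to show how a deployment name acts as an alias for a model plus a configuration, and how an unknown name leads to an error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Hypothetical stand-in for a deployment: a name plus a configuration."""
    name: str
    model_name: str
    model_version: str
    capacity_type: str


# Two deployments (aliases) of the same model with different configurations.
# All names and values here are placeholders.
DEPLOYMENTS = {
    d.name: d
    for d in (
        Deployment("mistral-large", "Mistral-Large", "1", "standard"),
        Deployment("mistral-large-ptu", "Mistral-Large", "1", "provisioned"),
    )
}


def route(request_body: dict) -> Deployment:
    """Return the deployment whose name equals the request's "model" field."""
    name = request_body.get("model")
    if name not in DEPLOYMENTS:
        # The real service answers with an HTTP error; this only mimics the idea.
        raise LookupError(f"The model '{name}' doesn't exist.")
    return DEPLOYMENTS[name]


print(route({"model": "mistral-large-ptu", "messages": []}).capacity_type)
```

Both entries point at the same underlying model, so choosing a configuration is just a matter of choosing which name to send.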

For example, if you create a deployment named mistral-large, you can invoke that deployment as follows:

For Python, install the package azure-ai-inference using your package manager, like pip:

pip install azure-ai-inference

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

import os
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<resource>.services.ai.azure.com/models",
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
)

Explore our samples and read the API reference documentation to get yourself started.

For JavaScript, install the package @azure-rest/ai-inference using npm:

npm install @azure-rest/ai-inference

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

import ModelClient from "@azure-rest/ai-inference";
import { isUnexpected } from "@azure-rest/ai-inference";
import { AzureKeyCredential } from "@azure/core-auth";

const client = new ModelClient(
    "https://<resource>.services.ai.azure.com/models", 
    new AzureKeyCredential(process.env.AZURE_INFERENCE_CREDENTIAL)
);

Explore our samples and read the API reference documentation to get yourself started.

For C#, install the Azure AI inference library with the following command:

dotnet add package Azure.AI.Inference --prerelease

Import the following namespaces:

using Azure;
using Azure.Identity;
using Azure.AI.Inference;

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

ChatCompletionsClient client = new ChatCompletionsClient(
    new Uri("https://<resource>.services.ai.azure.com/models"),
    new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_INFERENCE_CREDENTIAL"))
);

Explore our samples and read the API reference documentation to get yourself started.

For Java, add the package to your project:

<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-ai-inference</artifactId>
    <version>1.0.0-beta.1</version>
</dependency>

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

ChatCompletionsClient client = new ChatCompletionsClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("https://<resource>.services.ai.azure.com/models")
    .buildClient();

Explore our samples and read the API reference documentation to get yourself started.

For REST, use the reference section to explore the API design and the available parameters. For example, the reference section for Chat completions details how to use the route /chat/completions to generate predictions based on chat-formatted instructions. Notice that the path /models is appended to the root of the URL:

Request

POST https://<resource>.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview
api-key: <api-key>
Content-Type: application/json

For a chat model, you can create a request as follows (examples in Python, JavaScript, C#, Java, and REST, in that order):

from azure.ai.inference.models import SystemMessage, UserMessage

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Explain Riemann's conjecture in 1 paragraph"),
    ],
    model="mistral-large"
)

print(response.choices[0].message.content)

const messages = [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "Explain Riemann's conjecture in 1 paragraph" },
];

const response = await client.path("/chat/completions").post({
    body: {
        messages: messages,
        model: "mistral-large"
    }
});

// Throw if the service returned an error, for example an unknown deployment name
if (isUnexpected(response)) {
    throw response.body.error;
}

console.log(response.body.choices[0].message.content);

ChatCompletionsOptions requestOptions = new ChatCompletionsOptions()
{
    Messages = {
        new ChatRequestSystemMessage("You are a helpful assistant."),
        new ChatRequestUserMessage("Explain Riemann's conjecture in 1 paragraph")
    },
    Model = "mistral-large"
};

Response<ChatCompletions> response = client.Complete(requestOptions);
Console.WriteLine($"Response: {response.Value.Content}");

List<ChatRequestMessage> chatMessages = new ArrayList<>();
chatMessages.add(new ChatRequestSystemMessage("You are a helpful assistant"));
chatMessages.add(new ChatRequestUserMessage("Explain Riemann's conjecture in 1 paragraph"));

// Route the request to the "mistral-large" deployment
ChatCompletionsOptions options = new ChatCompletionsOptions(chatMessages);
options.setModel("mistral-large");

ChatCompletions chatCompletions = client.complete(options);

for (ChatChoice choice : chatCompletions.getChoices()) {
    ChatResponseMessage message = choice.getMessage();
    System.out.println("Response: " + message.getContent());
}

Request

POST https://<resource>.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview
api-key: <api-key>
Content-Type: application/json

{
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant"
        },
        {
            "role": "user",
            "content": "Explain Riemann's conjecture in 1 paragraph"
        }
    ],
    "model": "mistral-large"
}
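If you'd rather not depend on an SDK, you can assemble the same REST call with the Python standard library. The following sketch only builds the request object and doesn't send it, so the URL, headers, and body are easy to inspect. The function name build_chat_request, the resource name my-resource, and the placeholder key are illustrative assumptions. Passing the object to urllib.request.urlopen would send it and raise urllib.error.HTTPError on an error response.

```python
import json
import urllib.request


def build_chat_request(resource: str, api_key: str, deployment: str,
                       messages: list) -> urllib.request.Request:
    """Build (but don't send) a chat completions POST for the /models endpoint."""
    # The /models path sits directly under the resource root.
    url = (
        f"https://{resource}.services.ai.azure.com/models/chat/completions"
        "?api-version=2024-05-01-preview"
    )
    # The deployment name travels in the body's "model" field, which the
    # endpoint uses to route the request.
    body = json.dumps({"messages": messages, "model": deployment}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "my-resource",
    "<api-key>",
    "mistral-large",
    [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Explain Riemann's conjecture in 1 paragraph"},
    ],
)
print(request.get_method(), request.full_url)
```

Note that urllib normalizes header names with str.capitalize, so the key header is stored as "Api-key". HTTP header names are case-insensitive, so this doesn't change what the service receives.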

If you specify a model name that doesn't match any model deployment, the request fails with an error stating that the model doesn't exist. You control which models are available to users by creating model deployments. For more information, see Add and configure model deployments.

Azure OpenAI inference endpoint

The Azure OpenAI API exposes the full capabilities of OpenAI models and supports additional features such as assistants, threads, files, and batch inference. Some non-OpenAI models might also be available through this route.

Azure OpenAI endpoints, usually of the form https://<resource-name>.openai.azure.com, work at the deployment level: each deployment has its own URL. However, all deployments share the same authentication mechanism. For more information, see the reference page for Azure OpenAI API.

Each deployment's URL is formed by appending the route /openai/deployments/<model-deployment-name> to the resource's base URL, for example https://<resource>.services.ai.azure.com/openai/deployments/deepseek-v3-0324.
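The URL rule above can be sketched as a small helper. It follows the shape of this article's REST examples, where /openai/deployments/<deployment>/<operation> sits under the resource root. The function name, its parameters, and the placeholder resource name are hypothetical.

```python
from urllib.parse import quote, urlencode


def deployment_url(resource: str, deployment: str,
                   operation: str = "chat/completions",
                   api_version: str = "2024-10-21") -> str:
    """Return a per-deployment Azure OpenAI URL.

    Unlike the /models endpoint, the deployment name is part of the path
    rather than a field in the request body.
    """
    root = f"https://{resource}.services.ai.azure.com/openai"
    path = f"/deployments/{quote(deployment)}/{operation}"
    return f"{root}{path}?{urlencode({'api-version': api_version})}"


print(deployment_url("my-resource", "deepseek-v3-0324"))
```

Because the deployment is already encoded in the path, a request body sent to this URL doesn't need a model field to be routed.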

For Python, install the package openai using your package manager, like pip:

pip install openai --upgrade

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<resource>.services.ai.azure.com",
    api_key=os.getenv("AZURE_INFERENCE_CREDENTIAL"),
    api_version="2024-10-21",
)

For JavaScript, install the package openai using npm:

npm install openai

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

import { AzureOpenAI } from "openai";

const endpoint = "https://<resource>.services.ai.azure.com";
const apiKey = process.env.AZURE_INFERENCE_CREDENTIAL;
const apiVersion = "2024-10-21";
const deployment = "deepseek-v3-0324";

const client = new AzureOpenAI({
    endpoint,
    apiKey,
    apiVersion,
    deployment
});

Here, deepseek-v3-0324 is the name of a model deployment in the Azure AI Foundry resource.

For C#, install the Azure OpenAI library with the following command:

dotnet add package Azure.AI.OpenAI --prerelease

You can use the package to consume the model. The following example shows how to create a client to consume chat completions:

using System.ClientModel;
using Azure.AI.OpenAI;
using OpenAI.Chat;

AzureOpenAIClient client = new(
    new Uri("https://<resource>.services.ai.azure.com"),
    new ApiKeyCredential(Environment.GetEnvironmentVariable("AZURE_INFERENCE_CREDENTIAL"))
);

For Java, add the package to your project:

<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-ai-openai</artifactId>
    <version>1.0.0-beta.16</version>
</dependency>

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

OpenAIClient client = new OpenAIClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("https://<resource>.services.ai.azure.com")
    .buildClient();

Request

POST https://<resource>.services.ai.azure.com/openai/deployments/deepseek-v3-0324/chat/completions?api-version=2024-10-21
api-key: <api-key>
Content-Type: application/json

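Here, deepseek-v3-0324 is the name of a model deployment in the Azure AI Foundry resource.

Then, you can create a chat completions request as follows (examples in Python, JavaScript, C#, Java, and REST, in that order):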

response = client.chat.completions.create(
    model="deepseek-v3-0324",  # Replace with your model deployment name.
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain Riemann's conjecture in 1 paragraph"}
    ]
)

print(response.model_dump_json(indent=2))

const messages = [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "Explain Riemann's conjecture in 1 paragraph" },
];

const response = await client.chat.completions.create({ messages, model: "deepseek-v3-0324" });

console.log(response.choices[0].message.content);

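// Get a chat client bound to the deepseek-v3-0324 deployment
ChatClient chatClient = client.GetChatClient("deepseek-v3-0324");

ChatCompletion response = chatClient.CompleteChat(
    [
        new SystemChatMessage("You are a helpful assistant."),
        new UserChatMessage("Explain Riemann's conjecture in 1 paragraph"),
    ]);

Console.WriteLine($"{response.Role}: {response.Content[0].Text}");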

List<ChatRequestMessage> chatMessages = new ArrayList<>();
chatMessages.add(new ChatRequestSystemMessage("You are a helpful assistant"));
chatMessages.add(new ChatRequestUserMessage("Explain Riemann's conjecture in 1 paragraph"));

ChatCompletions chatCompletions = client.getChatCompletions("deepseek-v3-0324",
    new ChatCompletionsOptions(chatMessages));

System.out.printf("Completion ID=%s created at %s.%n", chatCompletions.getId(), chatCompletions.getCreatedAt());
for (ChatChoice choice : chatCompletions.getChoices()) {
    ChatResponseMessage message = choice.getMessage();
    System.out.printf("Index: %d, Chat Role: %s.%n", choice.getIndex(), message.getRole());
    System.out.println("Message:");
    System.out.println(message.getContent());
}

Here, deepseek-v3-0324 is the name of a model deployment in the Azure AI Foundry resource.

Request

POST https://<resource>.services.ai.azure.com/openai/deployments/deepseek-v3-0324/chat/completions?api-version=2024-10-21
api-key: <api-key>
Content-Type: application/json

{
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant"
        },
        {
            "role": "user",
            "content": "Explain Riemann's conjecture in 1 paragraph"
        }
    ]
}

Here, deepseek-v3-0324 is the name of a model deployment in the Azure AI Foundry resource.

For more information about how to use the Azure OpenAI endpoint, see Azure OpenAI in Azure AI Foundry Models documentation.

Keyless authentication

Models deployed to Azure AI Foundry Models support keyless authorization by using Microsoft Entra ID. Keyless authorization removes the need to store, distribute, and rotate API keys, because access is granted to Entra ID identities instead. This improves security, reduces operational complexity, and supports compliance requirements, which makes it a strong choice for organizations adopting identity-based access management.

To use keyless authentication, configure your resource and grant access to users to perform inference. After you configure the resource and grant access, authenticate as follows:

For Python, install the package azure-ai-inference using your package manager, like pip:

pip install azure-ai-inference

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions with Entra ID:

import os
from azure.ai.inference import ChatCompletionsClient
from azure.identity import DefaultAzureCredential

client = ChatCompletionsClient(
    endpoint="https://<resource>.services.ai.azure.com/models",
    credential=DefaultAzureCredential(),
    credential_scopes=["https://cognitiveservices.azure.com/.default"],
)

For JavaScript, install the package @azure-rest/ai-inference using npm:

npm install @azure-rest/ai-inference

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions with Entra ID:

import ModelClient from "@azure-rest/ai-inference";
import { isUnexpected } from "@azure-rest/ai-inference";
import { DefaultAzureCredential } from "@azure/identity";

const clientOptions = { credentials: { scopes: ["https://cognitiveservices.azure.com/.default"] } };

const client = new ModelClient(
    "https://<resource>.services.ai.azure.com/models", 
    new DefaultAzureCredential(),
    clientOptions,
);

For C#, install the Azure AI inference library with the following command:

dotnet add package Azure.AI.Inference --prerelease

Install the Azure.Identity package:

dotnet add package Azure.Identity

Import the following namespaces:

using Azure;
using Azure.Core;
using Azure.Core.Pipeline;
using Azure.Identity;
using Azure.AI.Inference;

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions with Entra ID:

TokenCredential credential = new DefaultAzureCredential();
AzureAIInferenceClientOptions clientOptions = new AzureAIInferenceClientOptions();
BearerTokenAuthenticationPolicy tokenPolicy = new BearerTokenAuthenticationPolicy(credential, new string[] { "https://cognitiveservices.azure.com/.default" });
clientOptions.AddPolicy(tokenPolicy, HttpPipelinePosition.PerRetry);

ChatCompletionsClient client = new ChatCompletionsClient(
    new Uri("https://<resource>.services.ai.azure.com/models"),
    credential,
    clientOptions
);

For Java, add the packages to your project:

<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-ai-inference</artifactId>
    <version>1.0.0-beta.4</version>
</dependency>
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-identity</artifactId>
    <version>1.15.3</version>
</dependency>

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

TokenCredential defaultCredential = new DefaultAzureCredentialBuilder().build();
ChatCompletionsClient client = new ChatCompletionsClientBuilder()
    .credential(defaultCredential)
    .endpoint("https://<resource>.services.ai.azure.com/models")
    .buildClient();

Explore our samples and read the API reference documentation to get yourself started.

For REST, use the reference section to explore the API design and the available parameters, and pass the authentication token in the Authorization header. For example, the reference section for Chat completions details how to use the route /chat/completions to generate predictions based on chat-formatted instructions. Notice that the path /models is appended to the root of the URL:

Request

POST https://<resource>.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json

Tokens have to be issued with scope https://cognitiveservices.azure.com/.default.

For testing purposes, the easiest way to get a valid token for your user account is to use the Azure CLI. In a console, run the following Azure CLI command:

az account get-access-token --resource https://cognitiveservices.azure.com --query "accessToken" --output tsv
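To use that token from a script, you can run the same Azure CLI command and place its output in the Authorization header. The following sketch assumes the Azure CLI is installed and you're signed in with az login; the helper names are hypothetical. It also shows that the CLI's --resource value and the SDKs' scope differ only by the /.default suffix.

```python
import subprocess

RESOURCE = "https://cognitiveservices.azure.com"


def scope_for(resource: str) -> str:
    # SDK credentials take a scope; the CLI takes the bare resource URI.
    return resource.rstrip("/") + "/.default"


def cli_token_command(resource: str = RESOURCE) -> list[str]:
    # Same arguments as the az command shown above.
    return ["az", "account", "get-access-token", "--resource", resource,
            "--query", "accessToken", "--output", "tsv"]


def get_cli_token(resource: str = RESOURCE) -> str:
    # Requires the Azure CLI on PATH and an active sign-in. On Windows the
    # entry point is az.cmd; resolve it with shutil.which if needed.
    result = subprocess.run(cli_token_command(resource), capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()


def bearer_headers(token: str) -> dict[str, str]:
    # Keyless requests send a bearer token instead of an api-key header.
    return {"Authorization": f"Bearer {token}",
            "Content-Type": "application/json"}
```

For example, bearer_headers(get_cli_token()) produces headers suitable for the request shown above. CLI tokens expire, so this approach suits testing rather than long-running services.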

Limitations

You can't use Azure OpenAI Batch with the Foundry Models endpoint. You have to use the dedicated deployment URL as explained in Batch API support in Azure OpenAI documentation.
Real-time API isn't supported in the inference endpoint. Use the dedicated deployment URL.
