Supported metrics for Microsoft.CognitiveServices/accounts/projects

The following table lists the metrics available for the Microsoft.CognitiveServices/accounts/projects resource type.

Table headings

- Metric - The metric display name as it appears in the Azure portal.
- Name in REST API - The metric name as referred to in the REST API.
- Unit - Unit of measure.
- Aggregation - The default aggregation type. Valid values: Average, Minimum, Maximum, Total, Count.
- Dimensions - Dimensions available for the metric.
- Time Grains - Intervals at which the metric is sampled. For example, PT1M indicates that the metric is sampled every minute, PT30M every 30 minutes, PT1H every hour, and so on.
- DS Export - Whether the metric is exportable to Azure Monitor Logs via diagnostic settings.
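The values in the Time Grains column are ISO 8601 durations. As an illustration only (a minimal helper sketch, not part of Azure Monitor or any Azure SDK), the common PT-prefixed grains can be converted to a `timedelta` like this:

```python
import re
from datetime import timedelta

def parse_time_grain(grain: str) -> timedelta:
    """Convert an ISO 8601 time grain such as PT1M, PT30M, or PT1H
    into a datetime.timedelta. Supports hours, minutes, and seconds only."""
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", grain)
    if match is None or not any(match.groups()):
        raise ValueError(f"Unsupported time grain: {grain}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

print(parse_time_grain("PT1M"))   # 0:01:00
print(parse_time_grain("PT30M"))  # 0:30:00
print(parse_time_grain("PT1H"))   # 1:00:00
```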

For information on exporting metrics, see Metrics export using data collection rules and Create diagnostic settings in Azure Monitor.

For information on metric retention, see Azure Monitor Metrics overview.

Category: AI Agents

| Metric | Name in REST API | Unit | Aggregation | Dimensions | Time Grains | DS Export |
|---|---|---|---|---|---|---|
| Agent Events (Preview)<br><br>Number of events for AI Agents in this project. | AgentEvents | Count | Total (Sum), Average, Maximum, Minimum | EventType | PT1M | No |
| Agent Input Tokens (Preview)<br><br>Number of input tokens for AI Agents in this project. | AgentInputTokens | Count | Total (Sum), Average, Maximum, Minimum | AgentId, TokenType | PT1M | No |
| Agent User Messages (Preview)<br><br>Number of events for AI Agent user messages in this project. | AgentMessages | Count | Total (Sum), Average, Maximum, Minimum | EventType, ThreadId | PT1M | No |
| Agent Output Tokens (Preview)<br><br>Number of output tokens for AI Agents in this project. | AgentOutputTokens | Count | Total (Sum), Average, Maximum, Minimum | AgentId, TokenType | PT1M | No |
| Agent Runs (Preview)<br><br>Number of runs by AI Agents in this project. | AgentRuns | Count | Total (Sum), Average, Maximum, Minimum | AgentId, RunStatus, StatusCode, ThreadId, StreamType | PT1M | No |
| Agent Threads (Preview)<br><br>Number of events for AI Agent threads in this project. | AgentThreads | Count | Total (Sum), Average, Maximum, Minimum | EventType | PT1M | No |
| Agent Tool Calls (Preview)<br><br>Number of tool calls made by AI Agents in this project. | AgentToolCalls | Count | Total (Sum), Average, Maximum, Minimum | AgentId, ToolName | PT1M | No |
| Agent Usage Indexed Files (Preview)<br><br>Number of files indexed for AI Agent usage, such as retrieval, in this project. | AgentUsageIndexedFiles | Count | Total (Sum), Average, Maximum, Minimum | ErrorCode, Status, VectorStoreId | PT1M | No |

Category: Models - HTTP Requests

| Metric | Name in REST API | Unit | Aggregation | Dimensions | Time Grains | DS Export |
|---|---|---|---|---|---|---|
| Model Availability Rate<br><br>Availability percentage, calculated as (Total Calls - Server Errors) / Total Calls. Server errors include any HTTP responses >= 500. | ModelAvailabilityRate | Percent | Minimum, Maximum, Average | Region, ModelDeploymentName, ModelName, ModelVersion | PT1M | No |
| Model Requests<br><br>Number of calls made to the model API over a period of time. Applies to PTU, PTU-managed, and Pay-as-you-go deployments. | ModelRequests | Count | Total (Sum) | ApiName, OperationName, Region, StreamType, ModelDeploymentName, ModelName, ModelVersion, StatusCode | PT1M | Yes |
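The Model Availability Rate formula above, (Total Calls - Server Errors) / Total Calls, can be sketched in plain Python. This is an illustration of the documented calculation only, not an Azure SDK call, and the call counts are hypothetical:

```python
def model_availability_rate(total_calls: int, server_errors: int) -> float:
    """Availability percentage per the documented formula:
    (Total Calls - Server Errors) / Total Calls, where server errors
    are any HTTP responses with status >= 500."""
    if total_calls == 0:
        return 100.0  # assumption: an interval with no traffic counts as fully available
    return (total_calls - server_errors) / total_calls * 100

# Hypothetical counts over one PT1M interval:
print(round(model_availability_rate(total_calls=2000, server_errors=5), 2))  # 99.75
```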

Category: Models - Latency

| Metric | Name in REST API | Unit | Aggregation | Dimensions | Time Grains | DS Export |
|---|---|---|---|---|---|---|
| Time Between Tokens<br><br>For streaming requests: model token generation rate, measured in milliseconds. Applies to PTU and PTU-managed deployments. | NormalizedTimeBetweenTokens | MilliSeconds | Maximum, Minimum, Average | ApiName, OperationName, Region, StreamType, ModelDeploymentName, ModelName, ModelVersion | PT1M | Yes |
| Normalized Time to First Byte<br><br>For streaming and non-streaming requests: the time it takes for the first byte of response data to be received after the request is made to the model, normalized by token. Applies to PTU, PTU-managed, and Pay-as-you-go deployments. | NormalizedTimeToFirstToken | MilliSeconds | Maximum, Minimum, Average | ApiName, OperationName, Region, StreamType, ModelDeploymentName, ModelName, ModelVersion | PT1M | Yes |
| Time to Last Byte<br><br>For streaming and non-streaming requests: the time it takes for the last byte of response data to be received after the request is made to the model. Applies to PTU, PTU-managed, and Pay-as-you-go deployments. | TimeToLastByte | MilliSeconds | Maximum, Minimum, Average | ApiName, OperationName, Region, StreamType, ModelDeploymentName, ModelName, ModelVersion | PT1M | Yes |
| Time to Response<br><br>Recommended latency (responsiveness) measure for streaming requests. Applies to PTU and PTU-managed deployments. Calculated as the time taken for the first response to appear after a user sends a prompt, as measured by the API gateway. This number increases as the prompt size increases and/or the cache hit size decreases. To break down the time-to-response metric, you can add a filter or apply splitting by the following dimensions: ModelDeploymentName, ModelName, and ModelVersion.<br><br>Note: this metric is an approximation, because measured latency depends heavily on multiple factors, including concurrent calls and overall workload pattern. It also does not account for any client-side latency between your client and the API endpoint. Refer to your own logging for optimal latency tracking. | TimeToResponse | MilliSeconds | Minimum, Maximum, Average | ApiName, OperationName, Region, StreamType, ModelDeploymentName, ModelName, ModelVersion, StatusCode | PT1M | Yes |
| Tokens Per Second<br><br>Measures the generation speed for a given model response: the total tokens generated divided by the time taken to generate them, in seconds. Applies to PTU and PTU-managed deployments. | TokensPerSecond | Count | Maximum, Minimum, Average | ApiName, OperationName, Region, StreamType, ModelDeploymentName, ModelName, ModelVersion | PT1M | Yes |
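The Tokens Per Second definition above, total tokens generated divided by the generation time in seconds, can be sketched as follows. This is an illustrative helper with hypothetical numbers, not part of any Azure SDK:

```python
def tokens_per_second(generated_tokens: int, generation_seconds: float) -> float:
    """Generation speed: total tokens generated divided by the time
    taken to generate them, in seconds."""
    if generation_seconds <= 0:
        raise ValueError("generation time must be positive")
    return generated_tokens / generation_seconds

# Hypothetical streamed response: 512 tokens generated over 4 seconds.
print(tokens_per_second(512, 4.0))  # 128.0
```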

Category: Models - Usage

| Metric | Name in REST API | Unit | Aggregation | Dimensions | Time Grains | DS Export |
|---|---|---|---|---|---|---|
| Audio Output Tokens<br><br>Number of audio tokens generated (output) on an OpenAI model. Applies to PTU-managed model deployments. | AudioOutputTokens | Count | Total (Sum) | ModelDeploymentName, ModelName, ModelVersion, Region | PT1M | Yes |
| Input Tokens<br><br>Number of prompt tokens processed (input) on a model. Applies to PTU, PTU-managed, and Pay-as-you-go deployments. | InputTokens | Count | Total (Sum) | ApiName, Region, ModelDeploymentName, ModelName, ModelVersion | PT1M | Yes |
| Output Tokens<br><br>Number of tokens generated (output) from an OpenAI model. Applies to PTU, PTU-managed, and Pay-as-you-go deployments. | OutputTokens | Count | Total (Sum) | ApiName, Region, ModelDeploymentName, ModelName, ModelVersion | PT1M | Yes |
| Provisioned Utilization<br><br>Utilization % for a provisioned-managed deployment, calculated as (PTUs consumed / PTUs deployed) x 100. When utilization is greater than or equal to 100%, calls are throttled and error code 429 is returned. | ProvisionedUtilization | Percent | Minimum, Maximum, Average | Region, ModelDeploymentName, ModelName, ModelVersion | PT1M | No |
| Total Tokens<br><br>Number of inference tokens processed on a model. Calculated as prompt tokens (input) plus generated tokens (output). Applies to PTU, PTU-managed, and Pay-as-you-go deployments. | TotalTokens | Count | Total (Sum) | ApiName, Region, ModelDeploymentName, ModelName, ModelVersion | PT1M | Yes |
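The Provisioned Utilization and Total Tokens calculations above can be sketched together. The PTU and token counts below are hypothetical, and the helpers simply restate the documented formulas:

```python
def provisioned_utilization(ptus_consumed: float, ptus_deployed: float) -> float:
    """Utilization % for a provisioned-managed deployment:
    (PTUs consumed / PTUs deployed) x 100."""
    return ptus_consumed / ptus_deployed * 100

def total_tokens(input_tokens: int, output_tokens: int) -> int:
    """Total inference tokens: prompt tokens (input) plus generated tokens (output)."""
    return input_tokens + output_tokens

utilization = provisioned_utilization(ptus_consumed=75, ptus_deployed=100)
print(utilization)              # 75.0
print(utilization >= 100)       # False: below 100%, calls are not throttled with 429
print(total_tokens(1200, 300))  # 1500
```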

Next steps