Supported metrics for Microsoft.CognitiveServices/accounts/projects

The following table lists the metrics available for the Microsoft.CognitiveServices/accounts/projects resource type.

Table headings

- Metric - The metric display name as it appears in the Azure portal.
- Name in REST API - The metric name as referred to in the REST API.
- Unit - Unit of measure.
- Aggregation - The default aggregation type. Valid values: Average, Minimum, Maximum, Total, Count.
- Dimensions - Dimensions available for the metric.
- Time Grains - Intervals at which the metric is sampled. For example, PT1M indicates that the metric is sampled every minute, PT30M every 30 minutes, PT1H every hour, and so on.
- DS Export - Whether the metric is exportable to Azure Monitor Logs via diagnostic settings.
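The values in the Time Grains column are ISO 8601 durations. As an illustration only (a minimal helper sketch, not part of Azure Monitor or any Azure SDK), the common PT-prefixed grains can be converted to a `timedelta` like this:

```python
import re
from datetime import timedelta

def parse_time_grain(grain: str) -> timedelta:
    """Convert an ISO 8601 time grain such as PT1M, PT30M, or PT1H
    into a datetime.timedelta. Supports hours, minutes, and seconds only."""
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", grain)
    if match is None or not any(match.groups()):
        raise ValueError(f"Unsupported time grain: {grain}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

print(parse_time_grain("PT1M"))   # 0:01:00
print(parse_time_grain("PT30M"))  # 0:30:00
print(parse_time_grain("PT1H"))   # 1:00:00
```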

For information on exporting metrics, see Metrics export using data collection rules and Create diagnostic settings in Azure Monitor.

For information on metric retention, see Azure Monitor Metrics overview.

Category: AI Agents

| Metric | Name in REST API | Unit | Aggregation | Dimensions | Time Grains | DS Export |
|---|---|---|---|---|---|---|
| Agent Events (Preview)<br><br>Number of events for AI Agents in this project. | AgentEvents | Count | Total (Sum), Average, Maximum, Minimum | EventType | PT1M | No |
| Agent Input Tokens (Preview)<br><br>Number of input tokens for AI Agents in this project. | AgentInputTokens | Count | Total (Sum), Average, Maximum, Minimum | AgentId, TokenType | PT1M | No |
| Agent User Messages (Preview)<br><br>Number of events for AI Agent user messages in this project. | AgentMessages | Count | Total (Sum), Average, Maximum, Minimum | EventType, ThreadId | PT1M | No |
| Agent Output Tokens (Preview)<br><br>Number of output tokens for AI Agents in this project. | AgentOutputTokens | Count | Total (Sum), Average, Maximum, Minimum | AgentId, TokenType | PT1M | No |
| Agent Runs (Preview)<br><br>Number of runs by AI Agents in this project. | AgentRuns | Count | Total (Sum), Average, Maximum, Minimum | AgentId, RunStatus, StatusCode, ThreadId, StreamType | PT1M | No |
| Agent Threads (Preview)<br><br>Number of events for AI Agent threads in this project. | AgentThreads | Count | Total (Sum), Average, Maximum, Minimum | EventType | PT1M | No |
| Agent Tool Calls (Preview)<br><br>Number of tool calls made by AI Agents in this project. | AgentToolCalls | Count | Total (Sum), Average, Maximum, Minimum | AgentId, ToolName | PT1M | No |
| Agent Usage Indexed Files (Preview)<br><br>Number of files indexed for AI Agent usage, such as retrieval, in this project. | AgentUsageIndexedFiles | Count | Total (Sum), Average, Maximum, Minimum | ErrorCode, Status, VectorStoreId | PT1M | No |

Category: Models - HTTP Requests

| Metric | Name in REST API | Unit | Aggregation | Dimensions | Time Grains | DS Export |
|---|---|---|---|---|---|---|
| Model Availability Rate<br><br>Availability percentage, calculated as (Total Calls - Server Errors) / Total Calls. Server errors include any HTTP responses >= 500. | ModelAvailabilityRate | Percent | Minimum, Maximum, Average | Region, ModelDeploymentName, ModelName, ModelVersion | PT1M | No |
| Model Requests<br><br>Number of calls made to the model API over a period of time. Applies to PTU, PTU-managed, and Pay-as-you-go deployments. | ModelRequests | Count | Total (Sum) | ApiName, OperationName, Region, StreamType, ModelDeploymentName, ModelName, ModelVersion, StatusCode | PT1M | Yes |
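The Model Availability Rate formula above, (Total Calls - Server Errors) / Total Calls, can be sketched in plain Python. This is an illustration of the documented calculation only, not an Azure SDK call, and the call counts are hypothetical:

```python
def model_availability_rate(total_calls: int, server_errors: int) -> float:
    """Availability percentage per the documented formula:
    (Total Calls - Server Errors) / Total Calls, where server errors
    are any HTTP responses with status >= 500."""
    if total_calls == 0:
        return 100.0  # assumption: an interval with no traffic counts as fully available
    return (total_calls - server_errors) / total_calls * 100

# Hypothetical counts over one PT1M interval:
print(round(model_availability_rate(total_calls=2000, server_errors=5), 2))  # 99.75
```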

Category: Models - Latency

| Metric | Name in REST API | Unit | Aggregation | Dimensions | Time Grains | DS Export |
|---|---|---|---|---|---|---|
| Time Between Tokens<br><br>For streaming requests: model token generation rate, measured in milliseconds. Applies to PTU and PTU-managed deployments. | NormalizedTimeBetweenTokens | MilliSeconds | Maximum, Minimum, Average | ApiName, OperationName, Region, StreamType, ModelDeploymentName, ModelName, ModelVersion | PT1M | Yes |
| Normalized Time to First Byte<br><br>For streaming and non-streaming requests: the time it takes for the first byte of response data to be received after the request is made to the model, normalized by token. Applies to PTU, PTU-managed, and Pay-as-you-go deployments. | NormalizedTimeToFirstToken | MilliSeconds | Maximum, Minimum, Average | ApiName, OperationName, Region, StreamType, ModelDeploymentName, ModelName, ModelVersion | PT1M | Yes |
| Time to Last Byte<br><br>For streaming and non-streaming requests: the time it takes for the last byte of response data to be received after the request is made to the model. Applies to PTU, PTU-managed, and Pay-as-you-go deployments. | TimeToLastByte | MilliSeconds | Maximum, Minimum, Average | ApiName, OperationName, Region, StreamType, ModelDeploymentName, ModelName, ModelVersion | PT1M | Yes |
| Time to Response<br><br>Recommended latency (responsiveness) measure for streaming requests. Applies to PTU and PTU-managed deployments. Calculated as the time taken for the first response to appear after a user sends a prompt, as measured by the API gateway. This number increases as the prompt size increases and/or the cache hit size decreases. To break down the time-to-response metric, you can add a filter or apply splitting by the following dimensions: ModelDeploymentName, ModelName, and ModelVersion.<br><br>Note: this metric is an approximation, because measured latency depends heavily on multiple factors, including concurrent calls and overall workload pattern. It also does not account for any client-side latency between your client and the API endpoint. Refer to your own logging for optimal latency tracking. | TimeToResponse | MilliSeconds | Minimum, Maximum, Average | ApiName, OperationName, Region, StreamType, ModelDeploymentName, ModelName, ModelVersion, StatusCode | PT1M | Yes |
| Tokens Per Second<br><br>Measures the generation speed for a given model response: the total tokens generated divided by the time taken to generate them, in seconds. Applies to PTU and PTU-managed deployments. | TokensPerSecond | Count | Maximum, Minimum, Average | ApiName, OperationName, Region, StreamType, ModelDeploymentName, ModelName, ModelVersion | PT1M | Yes |
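The Tokens Per Second definition above, total tokens generated divided by the generation time in seconds, can be sketched as follows. This is an illustrative helper with hypothetical numbers, not part of any Azure SDK:

```python
def tokens_per_second(generated_tokens: int, generation_seconds: float) -> float:
    """Generation speed: total tokens generated divided by the time
    taken to generate them, in seconds."""
    if generation_seconds <= 0:
        raise ValueError("generation time must be positive")
    return generated_tokens / generation_seconds

# Hypothetical streamed response: 512 tokens generated over 4 seconds.
print(tokens_per_second(512, 4.0))  # 128.0
```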

Category: Models - Usage

| Metric | Name in REST API | Unit | Aggregation | Dimensions | Time Grains | DS Export |
|---|---|---|---|---|---|---|
| Audio Output Tokens<br><br>Number of audio tokens generated (output) on an OpenAI model. Applies to PTU-managed model deployments. | AudioOutputTokens | Count | Total (Sum) | ModelDeploymentName, ModelName, ModelVersion, Region | PT1M | Yes |
| Input Tokens<br><br>Number of prompt tokens processed (input) on a model. Applies to PTU, PTU-managed, and Pay-as-you-go deployments. | InputTokens | Count | Total (Sum) | ApiName, Region, ModelDeploymentName, ModelName, ModelVersion | PT1M | Yes |
| Output Tokens<br><br>Number of tokens generated (output) from an OpenAI model. Applies to PTU, PTU-managed, and Pay-as-you-go deployments. | OutputTokens | Count | Total (Sum) | ApiName, Region, ModelDeploymentName, ModelName, ModelVersion | PT1M | Yes |
| Provisioned Utilization<br><br>Utilization % for a provisioned-managed deployment, calculated as (PTUs consumed / PTUs deployed) x 100. When utilization is greater than or equal to 100%, calls are throttled and error code 429 is returned. | ProvisionedUtilization | Percent | Minimum, Maximum, Average | Region, ModelDeploymentName, ModelName, ModelVersion | PT1M | No |
| Total Tokens<br><br>Number of inference tokens processed on a model. Calculated as prompt tokens (input) plus generated tokens (output). Applies to PTU, PTU-managed, and Pay-as-you-go deployments. | TotalTokens | Count | Total (Sum) | ApiName, Region, ModelDeploymentName, ModelName, ModelVersion | PT1M | Yes |
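The Provisioned Utilization and Total Tokens calculations above can be sketched together. The PTU and token counts below are hypothetical, and the helpers simply restate the documented formulas:

```python
def provisioned_utilization(ptus_consumed: float, ptus_deployed: float) -> float:
    """Utilization % for a provisioned-managed deployment:
    (PTUs consumed / PTUs deployed) x 100."""
    return ptus_consumed / ptus_deployed * 100

def total_tokens(input_tokens: int, output_tokens: int) -> int:
    """Total inference tokens: prompt tokens (input) plus generated tokens (output)."""
    return input_tokens + output_tokens

utilization = provisioned_utilization(ptus_consumed=75, ptus_deployed=100)
print(utilization)              # 75.0
print(utilization >= 100)       # False: below 100%, calls are not throttled with 429
print(total_tokens(1200, 300))  # 1500
```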

Next steps