The following table lists the metrics available for the Microsoft.CognitiveServices/accounts/projects resource type.
Table headings

- Metric - The metric display name as it appears in the Azure portal.
- Name in REST API - The metric name as referred to in the REST API.
- Unit - Unit of measure.
- Aggregation - The default aggregation type. Valid values: Average, Minimum, Maximum, Total, Count.
- Dimensions - Dimensions available for the metric.
- Time Grains - Intervals at which the metric is sampled. For example, `PT1M` indicates that the metric is sampled every minute, `PT30M` every 30 minutes, `PT1H` every hour, and so on.
- DS Export - Whether the metric is exportable to Azure Monitor Logs via Diagnostic Settings.

For information on exporting metrics, see Metrics export using data collection rules and Create diagnostic settings in Azure Monitor.
For information on metric retention, see Azure Monitor Metrics overview.
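As a brief illustration of how the aggregation types in these tables relate to the `PT1M` time grain, the sketch below (plain Python, no Azure SDK; the sample values are hypothetical) buckets raw samples into one-minute grains and computes each aggregation:

```python
from collections import defaultdict
from datetime import datetime, timedelta

def aggregate_pt1m(samples):
    """Bucket (timestamp, value) samples into one-minute (PT1M) grains and
    compute the aggregation types listed in the tables below."""
    buckets = defaultdict(list)
    for ts, value in samples:
        grain = ts.replace(second=0, microsecond=0)  # truncate to the minute
        buckets[grain].append(value)
    return {
        grain: {
            "Total": sum(vals),
            "Average": sum(vals) / len(vals),
            "Minimum": min(vals),
            "Maximum": max(vals),
            "Count": len(vals),
        }
        for grain, vals in buckets.items()
    }

# Hypothetical samples: three values in the first minute, one in the next.
t0 = datetime(2025, 1, 1, 12, 0, 0)
samples = [
    (t0, 2),
    (t0 + timedelta(seconds=20), 5),
    (t0 + timedelta(seconds=40), 3),
    (t0 + timedelta(minutes=1), 7),
]
result = aggregate_pt1m(samples)
```

The same raw samples yield a different series per aggregation type, which is why each metric lists its supported aggregations separately.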
Category: AI Agents
| Metric | Name in REST API | Unit | Aggregation | Dimensions | Time Grains | DS Export |
|---|---|---|---|---|---|---|
| **Agent Events** (Preview)<br>Number of events for AI Agents in this project. | `AgentEvents` | Count | Total (Sum), Average, Maximum, Minimum | `EventType` | PT1M | No |
| **Agent Input Tokens** (Preview)<br>Number of input tokens for AI Agents in this project. | `AgentInputTokens` | Count | Total (Sum), Average, Maximum, Minimum | `AgentId`, `TokenType` | PT1M | No |
| **Agent User Messages** (Preview)<br>Number of events for AI Agent user messages in this project. | `AgentMessages` | Count | Total (Sum), Average, Maximum, Minimum | `EventType`, `ThreadId` | PT1M | No |
| **Agent Output Tokens** (Preview)<br>Number of output tokens for AI Agents in this project. | `AgentOutputTokens` | Count | Total (Sum), Average, Maximum, Minimum | `AgentId`, `TokenType` | PT1M | No |
| **Agent Runs** (Preview)<br>Number of runs by AI Agents in this project. | `AgentRuns` | Count | Total (Sum), Average, Maximum, Minimum | `AgentId`, `RunStatus`, `StatusCode`, `ThreadId`, `StreamType` | PT1M | No |
| **Agent Threads** (Preview)<br>Number of events for AI Agent threads in this project. | `AgentThreads` | Count | Total (Sum), Average, Maximum, Minimum | `EventType` | PT1M | No |
| **Agent Tool Calls** (Preview)<br>Number of tool calls made by AI Agents in this project. | `AgentToolCalls` | Count | Total (Sum), Average, Maximum, Minimum | `AgentId`, `ToolName` | PT1M | No |
| **Agent Usage Indexed Files** (Preview)<br>Number of files indexed for AI Agent usage, such as retrieval, in this project. | `AgentUsageIndexedFiles` | Count | Total (Sum), Average, Maximum, Minimum | `ErrorCode`, `Status`, `VectorStoreId` | PT1M | No |
Category: Models - HTTP Requests
| Metric | Name in REST API | Unit | Aggregation | Dimensions | Time Grains | DS Export |
|---|---|---|---|---|---|---|
| **Model Availability Rate**<br>Availability percentage, calculated as (Total Calls - Server Errors) / Total Calls. Server errors include any HTTP responses >= 500. | `ModelAvailabilityRate` | Percent | Minimum, Maximum, Average | `Region`, `ModelDeploymentName`, `ModelName`, `ModelVersion` | PT1M | No |
| **Model Requests**<br>Number of calls made to the model API over a period of time. Applies to PTU, PTU-managed, and Pay-as-you-go deployments. | `ModelRequests` | Count | Total (Sum) | `ApiName`, `OperationName`, `Region`, `StreamType`, `ModelDeploymentName`, `ModelName`, `ModelVersion`, `StatusCode` | PT1M | Yes |
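Taking the Model Availability Rate formula above literally, a minimal sketch of the calculation (plain Python; the status codes are hypothetical):

```python
def model_availability_rate(status_codes):
    """(Total Calls - Server Errors) / Total Calls, as a percentage.
    Server errors are HTTP responses >= 500."""
    total = len(status_codes)
    if total == 0:
        return None  # no calls in the window; the rate is undefined
    server_errors = sum(1 for code in status_codes if code >= 500)
    return (total - server_errors) / total * 100

# 8 successful calls, 1 throttled call (429, not a server error), 1 server error.
codes = [200] * 8 + [429, 503]
rate = model_availability_rate(codes)  # 90.0
```

Note that throttled (429) responses reduce `ModelRequests` success counts but do not count against availability, since only >= 500 responses are treated as server errors.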
Category: Models - Latency
| Metric | Name in REST API | Unit | Aggregation | Dimensions | Time Grains | DS Export |
|---|---|---|---|---|---|---|
| **Time Between Tokens**<br>For streaming requests: the model's token generation rate, measured in milliseconds. Applies to PTU and PTU-managed deployments. | `NormalizedTimeBetweenTokens` | MilliSeconds | Maximum, Minimum, Average | `ApiName`, `OperationName`, `Region`, `StreamType`, `ModelDeploymentName`, `ModelName`, `ModelVersion` | PT1M | Yes |
| **Normalized Time to First Byte**<br>For streaming and non-streaming requests: the time it takes for the first byte of response data to be received after the request is made to the model, normalized by token. Applies to PTU, PTU-managed, and Pay-as-you-go deployments. | `NormalizedTimeToFirstToken` | MilliSeconds | Maximum, Minimum, Average | `ApiName`, `OperationName`, `Region`, `StreamType`, `ModelDeploymentName`, `ModelName`, `ModelVersion` | PT1M | Yes |
| **Time to Last Byte**<br>For streaming and non-streaming requests: the time it takes for the last byte of response data to be received after the request is made to the model. Applies to PTU, PTU-managed, and Pay-as-you-go deployments. | `TimeToLastByte` | MilliSeconds | Maximum, Minimum, Average | `ApiName`, `OperationName`, `Region`, `StreamType`, `ModelDeploymentName`, `ModelName`, `ModelVersion` | PT1M | Yes |
| **Time to Response**<br>Recommended latency (responsiveness) measure for streaming requests. Applies to PTU and PTU-managed deployments. Calculated as the time taken for the first response to appear after a user sends a prompt, as measured by the API gateway. This number increases as prompt size increases and/or cache hit size decreases. To break down the time to response metric, you can add a filter or apply splitting by the following dimensions: `ModelDeploymentName`, `ModelName`, and `ModelVersion`. Note: this metric is an approximation, because measured latency depends heavily on multiple factors, including concurrent calls and overall workload pattern. In addition, it doesn't account for any client-side latency between your client and the API endpoint. Refer to your own logging for optimal latency tracking. | `TimeToResponse` | MilliSeconds | Minimum, Maximum, Average | `ApiName`, `OperationName`, `Region`, `StreamType`, `ModelDeploymentName`, `ModelName`, `ModelVersion`, `StatusCode` | PT1M | Yes |
| **Tokens Per Second**<br>The generation speed for a given model response: the total tokens generated divided by the time taken to generate them, in seconds. Applies to PTU and PTU-managed deployments. | `TokensPerSecond` | Count | Maximum, Minimum, Average | `ApiName`, `OperationName`, `Region`, `StreamType`, `ModelDeploymentName`, `ModelName`, `ModelVersion` | PT1M | Yes |
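The Tokens Per Second description above defines the metric as total tokens generated divided by the generation time in seconds; a small worked sketch with hypothetical numbers:

```python
def tokens_per_second(generated_tokens, generation_time_seconds):
    """Generation speed: total tokens generated / time to generate them (seconds)."""
    if generation_time_seconds <= 0:
        raise ValueError("generation time must be positive")
    return generated_tokens / generation_time_seconds

# A response of 480 generated tokens produced over 6 seconds:
speed = tokens_per_second(480, 6.0)  # 80.0 tokens per second
```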
Category: Models - Usage
| Metric | Name in REST API | Unit | Aggregation | Dimensions | Time Grains | DS Export |
|---|---|---|---|---|---|---|
| **Audio Output Tokens**<br>Number of audio tokens generated (output) by an OpenAI model. Applies to PTU-managed model deployments. | `AudioOutputTokens` | Count | Total (Sum) | `ModelDeploymentName`, `ModelName`, `ModelVersion`, `Region` | PT1M | Yes |
| **Input Tokens**<br>Number of prompt tokens processed (input) on a model. Applies to PTU, PTU-managed, and Pay-as-you-go deployments. | `InputTokens` | Count | Total (Sum) | `ApiName`, `Region`, `ModelDeploymentName`, `ModelName`, `ModelVersion` | PT1M | Yes |
| **Output Tokens**<br>Number of tokens generated (output) from an OpenAI model. Applies to PTU, PTU-managed, and Pay-as-you-go deployments. | `OutputTokens` | Count | Total (Sum) | `ApiName`, `Region`, `ModelDeploymentName`, `ModelName`, `ModelVersion` | PT1M | Yes |
| **Provisioned Utilization**<br>Utilization percentage for a provisioned-managed deployment, calculated as (PTUs consumed / PTUs deployed) x 100. When utilization is greater than or equal to 100%, calls are throttled and error code 429 is returned. | `ProvisionedUtilization` | Percent | Minimum, Maximum, Average | `Region`, `ModelDeploymentName`, `ModelName`, `ModelVersion` | PT1M | No |
| **Total Tokens**<br>Number of inference tokens processed on a model, calculated as prompt tokens (input) plus generated tokens (output). Applies to PTU, PTU-managed, and Pay-as-you-go deployments. | `TotalTokens` | Count | Total (Sum) | `ApiName`, `Region`, `ModelDeploymentName`, `ModelName`, `ModelVersion` | PT1M | Yes |
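As a hedged sketch of the two formulas stated above (the PTU and token figures are hypothetical): Provisioned Utilization is (PTUs consumed / PTUs deployed) x 100, and Total Tokens is input tokens plus output tokens:

```python
def provisioned_utilization(ptus_consumed, ptus_deployed):
    """(PTUs consumed / PTUs deployed) x 100.
    At >= 100%, calls are throttled and 429 is returned."""
    return ptus_consumed / ptus_deployed * 100

def total_tokens(input_tokens, output_tokens):
    """Prompt tokens (input) plus generated tokens (output)."""
    return input_tokens + output_tokens

utilization = provisioned_utilization(150, 200)  # 75.0
throttled = utilization >= 100  # False: below capacity, no 429s expected
tokens = total_tokens(1200, 300)  # 1500
```

Monitoring the Average or Maximum of `ProvisionedUtilization` against the 100% threshold is a practical way to anticipate 429 throttling on provisioned deployments.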