Hi Ben Sherrod, thank you for your question on the Microsoft Q&A forum.
To help you identify the specific error that caused your Azure Monitor Alert, please follow the solution approach recommended below.
Identify the type of alert:
- Navigate to Azure Monitor → Alerts → [Your Alert] → Alert Details.
- Check the Condition: determine whether it's a Metric alert or a Log alert.
  - Metric alert: tracks counts or thresholds (e.g., request failures).
  - Log alert: triggered based on a Log Analytics query.
Review the source logs:
If it’s a metric alert, determine which table or service generates the metric (e.g., Application Insights, Azure Functions).
- Go to Application Insights (if your API or app is instrumented) or the relevant Log Analytics workspace.
- Query for exceptions or failed requests. For example, in Application Insights:

Failed requests:
requests
| where success == false
| order by timestamp desc
| take 50

Exceptions:
exceptions
| order by timestamp desc
| take 50
- This will provide the actual error message, stack trace, or API response code that caused the alert to trigger.
However, based on the provided screenshot, we can conclude that the error shown in your Azure alert comes from a metric: the alert summary states that it was triggered because the total errors crossed the threshold.
Check Diagnostic Settings
- Go to your Azure OpenAI resource → Monitoring → Diagnostic settings.
- Ensure logs are being sent to Log Analytics, Storage, or Event Hub and enable all log categories related to requests and responses.
Refer to this document: https://learn.microsoft.com/en-us/azure/azure-monitor/platform/diagnostic-settings?tabs=portal
Run Queries in Log Analytics:
Here's a sample query you can use to find the errors:
union isfuzzy=true AzureOpenAIServiceLogs, AzureDiagnostics
| where _ResourceId has "/providers/Microsoft.CognitiveServices/accounts/your_resource_name"
| where TimeGenerated > ago(24h)
| where tostring(ResponseCode) startswith "4" or tostring(ResponseCode) startswith "5"
| project TimeGenerated, OperationName, ApiName, DeploymentName, ResponseCode, ErrorCode, ErrorMessage, RequestId, ClientIp
| order by TimeGenerated desc
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| where Category == "RequestLogs"
| where statusCode >= 400
| project TimeGenerated, operationName, statusCode, resultSignature, callerIpAddress, requestUri_s, message
| order by TimeGenerated desc
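If you want to reuse the first query above for different resources or lookback windows, it can be parameterized with a small helper. This is a minimal sketch: build_error_query is a hypothetical helper name, the resource name is a placeholder, and the function only does local string templating (it mirrors the union query above, it does not call Azure).

```python
from datetime import timedelta

def build_error_query(resource_name: str,
                      lookback: timedelta = timedelta(hours=24)) -> str:
    """Build the KQL error query above for a given Azure OpenAI resource.

    `resource_name` is your Cognitive Services account name. The returned
    string mirrors the union query in this answer; nothing is executed here.
    """
    hours = int(lookback.total_seconds() // 3600)
    return (
        "union isfuzzy=true AzureOpenAIServiceLogs, AzureDiagnostics\n"
        f'| where _ResourceId has "/providers/Microsoft.CognitiveServices/accounts/{resource_name}"\n'
        f"| where TimeGenerated > ago({hours}h)\n"
        '| where tostring(ResponseCode) startswith "4" or tostring(ResponseCode) startswith "5"\n'
        "| project TimeGenerated, OperationName, ApiName, DeploymentName, "
        "ResponseCode, ErrorCode, ErrorMessage, RequestId, ClientIp\n"
        "| order by TimeGenerated desc"
    )

# Paste the result into Log Analytics, or pass it to a query client such as
# azure.monitor.query's LogsQueryClient if you use the Python SDK.
print(build_error_query("my-openai-account"))
```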
Check Application Insights (if enabled)
- Go to Application Insights > Failures.
- Filter by optibot and the same time window.
- Look for exceptions or failed dependencies.
Review API Usage Metrics
- Go to Azure OpenAI > Metrics.
- Add a chart for Total Errors and filter by the affected resource.
- This can help correlate spikes with specific operations.
For example, to narrow the diagnostic logs to the alert window:
AzureDiagnostics
| where TimeGenerated between (datetime(2025-08-21 12:00) .. datetime(2025-08-21 12:10))
| where statusCode >= 400
Make sure your account has Monitoring Reader or Log Analytics Reader roles for the resource group or subscription.
Match logs to the alert
Use the alert timestamp to filter logs. Often the reason you see "nothing" is that the queried logs don't cover the exact time when the alert fired.
Example: filter logs for the alert timeframe
exceptions
| where timestamp between(datetime(2025-08-21 12:00) .. datetime(2025-08-21 12:10))
| order by timestamp desc
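To avoid missing the firing window, the between(...) filter can be derived directly from the alert's fired time. A minimal sketch (the helper name, the default 5-minute evaluation window, and the latency padding are illustrative assumptions, not Azure Monitor behavior):

```python
from datetime import datetime, timedelta

def alert_window_filter(fired_at: datetime,
                        evaluation_window: timedelta = timedelta(minutes=5),
                        padding: timedelta = timedelta(minutes=5)) -> str:
    """Return a KQL time filter covering the alert's evaluation window.

    A metric/log alert evaluates data *before* the fired time, so look back
    over the evaluation window and pad both sides for ingestion latency.
    """
    start = fired_at - evaluation_window - padding
    end = fired_at + padding
    fmt = "%Y-%m-%d %H:%M"
    return f"| where timestamp between (datetime({start:{fmt}}) .. datetime({end:{fmt}}))"

# Example: an alert fired at 2025-08-21 12:05 with a 5-minute window
print(alert_window_filter(datetime(2025, 8, 21, 12, 5)))
# -> | where timestamp between (datetime(2025-08-21 11:55) .. datetime(2025-08-21 12:10))
```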
Enable alert details in email
- Some alerts can include a link to the query results in the email. Enable this in the alert rule to quickly jump to the raw logs next time.
To know more about Alerts you can refer https://learn.microsoft.com/en-us/azure/azure-monitor/alerts/alerts-manage-alert-instances
I hope the provided steps help resolve your issue; please let me know if you have any further queries.