Instability around Azure AI Foundry Agent thread runs with a deployment using the gpt-4o base model

dillon.bailey 25 Reputation points
2025-08-27T05:32:48.5933333+00:00

We have noticed instability around Azure AI Foundry Agent thread runs with a deployment using the gpt-4o base model in the US East region. Between agents.runs.create() and agentRun.stream(), there is a delay of at least 120 s before the run is created and processed. There is no way for us to monitor this in Azure AI Foundry or through other observability services on Azure.

const client = new AIProjectClient(endpoint, new DefaultAzureCredential());

const agentRun: AgentRunResponse = client.agents.runs.create(thread.id, agentId);

const streamMessages = await agentRun.stream();

The endpoint would occasionally return a REST API error:

/Users/bedatse/Developer/ennie/healthis_be/node_modules/@typespec/ts-http-runtime/src/client/restError.ts:27
  return new RestError(message, {
         ^


RestError: Service invocation timed out. 
Request: POST obotoken.vienna-eastus.svc/obotoken/v1.0/saveusertoken/v2 
 Message: Unable to read data from the transport connection: Operation canceled. Time waited: 00:00:09.9986723
    at createRestError (/Users/bedatse/Developer/ennie/healthis_be/node_modules/@typespec/ts-http-runtime/src/client/restError.ts:27:10)
    at createRestError (/Users/bedatse/Developer/ennie/healthis_be/node_modules/@azure-rest/core-client/src/restError.ts:27:30)
    at _createRunDeserialize (/Users/bedatse/Developer/ennie/healthis_be/node_modules/@azure/ai-agents/src/api/runs/operations.ts:360:26)
    at executeCreateRun (/Users/bedatse/Developer/ennie/healthis_be/node_modules/@azure/ai-agents/src/api/runs/operations.ts:375:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async testAilmentExtractionAgent (/Users/bedatse/Developer/ennie/healthis_be/agents/evaluation/en-395-performance-issue.ts:93:24)
    at async <anonymous> (/Users/bedatse/Developer/ennie/healthis_be/agents/evaluation/en-395-performance-issue.ts:130:18) {
  code: 'SystemError.ServiceTimeoutException',
  statusCode: 500
}
  return new RestError(message, {
         ^


RestError: Unexpected status code: 500
    at createRestError (/Users/bedatse/Developer/ennie/healthis_be/node_modules/@typespec/ts-http-runtime/src/client/restError.ts:27:10)
    at createRestError (/Users/bedatse/Developer/ennie/healthis_be/node_modules/@azure-rest/core-client/src/restError.ts:27:30)
    at processStream (/Users/bedatse/Developer/ennie/healthis_be/node_modules/@azure/ai-agents/src/api/operations.ts:453:26)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async testAilmentExtractionAgent (/Users/bedatse/Developer/ennie/healthis_be/agents/evaluation/en-395-performance-issue.ts:96:26)
    at async <anonymous> (/Users/bedatse/Developer/ennie/healthis_be/agents/evaluation/en-395-performance-issue.ts:131:18) {
  code: undefined,
  statusCode: 500
}
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

1 answer

  1. Sina Salam 23,931 Reputation points Volunteer Moderator
    2025-08-27T21:03:29.7166667+00:00

    Hello dillon.bailey,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you noticed instability around Azure AI Foundry Agent.

    Yes, there are several reports of this, but from what I can see your scenario can be managed. The following steps should help:

    1. Use single-call streaming or createAndPoll / createThreadAndRun. This eliminates the race where the run exists server-side but is not yet streamable; the SDK provides helpers that either stream the run in one call or poll on your behalf. Use these methods instead of create() followed immediately by .stream(). Here is a JS/TS example (based on Microsoft's docs):
         // Install:
         // npm install @azure/ai-projects @azure/identity
         import { AIProjectClient } from "@azure/ai-projects";
         import { DefaultAzureCredential } from "@azure/identity";
         const endpoint = process.env.PROJECT_ENDPOINT!;
         const client = new AIProjectClient(endpoint, new DefaultAzureCredential(), {
           // optional: tune client retry policy
           retryOptions: { maxRetries: 5, retryDelayInMs: 1000, maxRetryDelayInMs: 10000 }
         });
         // Preferred: create + stream in one operation (SDK example)
         async function runAgentAndStream(threadId: string, agentId: string) {
           try {
             // SDK sample pattern (create then stream) that avoids the manual race when used as shown in docs:
             const stream = await client.agents.runs.create(threadId, agentId).stream();
             for await (const event of stream) {
               // event is one of the RunStreamEvent types (message.delta, error, done, etc.)
               // handle messages and tool calls here
               console.log("Stream event:", event);
               if (event.type === "error") {
                 console.error("Run error:", event);
               }
               if (event.type === "done") {
                 console.log("Run finished.");
               }
             }
           } catch (err) {
             console.error("Stream failed:", err);
             throw err;
           }
         }
      
      This exact approach (client.runs.create(...).stream() / createAndPoll(...) / createThreadAndRun(...)) is present in Microsoft’s JS docs and samples. Use createAndPoll if you want the SDK to poll for readiness for you. - https://learn.microsoft.com/en-us/javascript/api/overview/azure/ai-agents-readme
    2. A robust fallback pattern is another option if you must separate create from stream. If your runtime forces you to call create() separately (legacy code or tooling), use:
      1. Idempotency metadata: before create, generate a client requestId and include it in the run metadata (the REST/SDK create-run operation supports metadata). If create() times out, call listRuns(threadId) and search for a run whose metadata includes your requestId; that tells you the server actually created the run. - https://learn.microsoft.com/en-us/rest/api/aifoundry/aiagents/runs/create-run?view=rest-aifoundry-aiagents-v1
      2. Poll for run readiness with exponential backoff rather than an immediate blind .stream() call. TS/Node sample (robust):
              import { AIProjectClient } from "@azure/ai-projects";
              import { DefaultAzureCredential } from "@azure/identity";
              import { setTimeout as sleep } from "node:timers/promises";
              import { v4 as uuidv4 } from "uuid";
              const client = new AIProjectClient(process.env.PROJECT_ENDPOINT!, new DefaultAzureCredential(), {
                retryOptions: { maxRetries: 5, retryDelayInMs: 1000, maxRetryDelayInMs: 10000 }
              });
              async function createRunWithIdempotency(threadId: string, agentId: string, payload?: any) {
                const requestId = uuidv4();
                // attach metadata.requestId so we can detect a server-created run
                const createBody = {
                  ...payload,
                  metadata: { clientRequestId: requestId }
                };
                let run: any = undefined;
                try {
                   // SDK call - the exact parameter shape depends on the SDK version; many accept a body or options object
                   run = await client.agents.runs.create(threadId, agentId, { body: createBody });
                } catch (err) {
                  // If create timed out or errored, try to find a run with our requestId
                  console.warn("create() failed — checking if run was created anyway:", err);
                   const runsIter = client.agents.runs.list(threadId, { order: "desc", limit: 10 }); // page through recent runs
                  for await (const r of runsIter) {
                    if (r.metadata?.clientRequestId === requestId) {
                      run = r;
                      console.log("Found server-side run created despite create() error:", run.id);
                      break;
                    }
                  }
                  if (!run) throw err; // rethrow original error if no server-side run found
                }
                // Poll run readiness with exponential backoff
                 const deadline = Date.now() + 5 * 60_000; // 5 minutes
                let delay = 1000;
                while (Date.now() < deadline) {
                   const current = await client.agents.runs.get(threadId, run.id);
                  if (current.status === "in_progress" || current.status === "queued") {
                    // not ready yet
                  } else {
                    // if terminal (completed/failed/etc.) break to process
                    break;
                  }
                  await sleep(delay);
                   delay = Math.min(8000, Math.floor(delay * 1.8));
                }
                 // Now stream events (or fall back to reading messages if streaming is unavailable)
                 try {
                   // NOTE: an events() helper like this may not exist in your SDK version;
                   // check the @azure/ai-agents reference before relying on it
                   const eventStream = await client.agents.runs.events(threadId, run.id);
                   for await (const ev of eventStream) {
                     console.log("event:", ev);
                   }
                 } catch (evErr) {
                   // fallback: read the final messages on the thread (a paged async iterable)
                   console.warn("Streaming events failed, falling back to messages list:", evErr);
                   const messages = client.agents.messages.list(threadId);
                   for await (const m of messages) {
                     console.log("message:", m);
                   }
                 }
              }
        
        Check the following links - https://learn.microsoft.com/en-us/rest/api/aifoundry/aiagents/runs/list-runs?view=rest-aifoundry-aiagents-v1 and https://learn.microsoft.com/en-us/azure/ai-foundry/agents/how-to/tools/function-calling
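         The exponential backoff schedule used in the sketch above (start at 1 s, grow by a factor of 1.8, cap at 8 s) can be pulled out into a pure helper so it is unit-testable on its own. The name backoffDelays is mine for illustration, not an SDK API:

```typescript
// Pure helper mirroring the polling loop above: start at `initialMs`,
// grow by `factor` on each attempt, and cap every delay at `capMs`.
// (backoffDelays is a made-up name for illustration, not an SDK API.)
function backoffDelays(
  initialMs: number,
  factor: number,
  capMs: number,
  count: number
): number[] {
  const delays: number[] = [];
  let delay = initialMs;
  for (let i = 0; i < count; i++) {
    delays.push(delay);
    delay = Math.min(capMs, Math.floor(delay * factor));
  }
  return delays;
}
```

         With the sample's values, the first five delays are 1000, 1800, 3240, 5832, and 8000 ms, so a five-minute deadline allows roughly 40 polls in the worst case.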
    3. Set client retry options when constructing AIProjectClient, e.g. retryOptions: { maxRetries: 5, retryDelayInMs: 1000, maxRetryDelayInMs: 10000 }. Use them to safely retry transient 5xx/408/504 failures. - https://learn.microsoft.com/en-us/javascript/api/%40azure/ai-projects/aiprojectclientoptionalparams?view=azure-node-preview Do not blindly retry create() on client-side timeouts; use the metadata requestId + listRuns approach to detect whether the server already created the run (de-duplication). - https://learn.microsoft.com/en-us/rest/api/aifoundry/aiagents/runs/create-run?view=rest-aifoundry-aiagents-v1
    4. In the Azure AI Foundry project portal, go to Tracing and connect an Application Insights resource, then instrument your Node.js app with the Azure Monitor OpenTelemetry distro. - https://learn.microsoft.com/en-us/azure/ai-foundry/agents/concepts/tracing
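       As a sketch of the instrumentation step, assuming the @azure/monitor-opentelemetry package and that the connection string of the Application Insights resource connected in the portal is exposed through the standard APPLICATIONINSIGHTS_CONNECTION_STRING environment variable:

```typescript
// Sketch: enable Azure Monitor OpenTelemetry before the rest of the app loads.
// Assumes APPLICATIONINSIGHTS_CONNECTION_STRING points at the Application
// Insights resource connected to the Foundry project under Tracing.
// npm install @azure/monitor-opentelemetry
import { useAzureMonitor } from "@azure/monitor-opentelemetry";

useAzureMonitor({
  azureMonitorExporterOptions: {
    connectionString: process.env.APPLICATIONINSIGHTS_CONNECTION_STRING,
  },
});

// Import the AI Projects client (and the rest of the app) only after
// useAzureMonitor() has run, so outgoing HTTP calls to the agent service
// are picked up by the automatic instrumentation.
```

       Calling useAzureMonitor() first matters because the distro patches the HTTP modules at load time; modules imported before it will not be traced.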
    5. Create alerts for spikes in 5xx status codes or run failures. - https://learn.microsoft.com/en-us/azure/ai-foundry/agents/how-to/metrics
    6. Enable continuous evaluation and use the Foundry dashboards to watch run success rate, latency, and tool failures. - https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/continuous-evaluation-agents
    7. If you are using Managed Identity, ensure the identity has the right roles on the target resource. If the error is a 403/500 from the internal token service, gather tracing/request IDs and file a support ticket with Microsoft via the portal.

    I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.


    Please don't forget to close the thread by upvoting and accepting this as an answer if it is helpful.

