Instability around Azure AI Foundry Agent thread runs with a deployment using the gpt-4o base model

dillon.bailey 25 Reputation points
2025-08-27T05:32:48.5933333+00:00

We have noticed instability around Azure AI Foundry Agent thread runs with a deployment using the gpt-4o base model in the US East region. Between agents.runs.create() and agentRun.stream(), there is a delay of at least 120 s before the run is created and processed. There is no way for us to monitor this in Azure AI Foundry or through other observability services on Azure.

const client = new AIProjectClient(endpoint, new DefaultAzureCredential());

const agentRun: AgentRunResponse = client.agents.runs.create(thread.id, agentId);

const streamMessages = await agentRun.stream();

The endpoint would occasionally return a REST API error:

/Users/bedatse/Developer/ennie/healthis_be/node_modules/@typespec/ts-http-runtime/src/client/restError.ts:27
  return new RestError(message, {
         ^


RestError: Service invocation timed out. 
Request: POST obotoken.vienna-eastus.svc/obotoken/v1.0/saveusertoken/v2 
 Message: Unable to read data from the transport connection: Operation canceled. Time waited: 00:00:09.9986723
    at createRestError (/Users/bedatse/Developer/ennie/healthis_be/node_modules/@typespec/ts-http-runtime/src/client/restError.ts:27:10)
    at createRestError (/Users/bedatse/Developer/ennie/healthis_be/node_modules/@azure-rest/core-client/src/restError.ts:27:30)
    at _createRunDeserialize (/Users/bedatse/Developer/ennie/healthis_be/node_modules/@azure/ai-agents/src/api/runs/operations.ts:360:26)
    at executeCreateRun (/Users/bedatse/Developer/ennie/healthis_be/node_modules/@azure/ai-agents/src/api/runs/operations.ts:375:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async testAilmentExtractionAgent (/Users/bedatse/Developer/ennie/healthis_be/agents/evaluation/en-395-performance-issue.ts:93:24)
    at async <anonymous> (/Users/bedatse/Developer/ennie/healthis_be/agents/evaluation/en-395-performance-issue.ts:130:18) {
  code: 'SystemError.ServiceTimeoutException',
  statusCode: 500
}
  return new RestError(message, {
         ^


RestError: Unexpected status code: 500
    at createRestError (/Users/bedatse/Developer/ennie/healthis_be/node_modules/@typespec/ts-http-runtime/src/client/restError.ts:27:10)
    at createRestError (/Users/bedatse/Developer/ennie/healthis_be/node_modules/@azure-rest/core-client/src/restError.ts:27:30)
    at processStream (/Users/bedatse/Developer/ennie/healthis_be/node_modules/@azure/ai-agents/src/api/operations.ts:453:26)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async testAilmentExtractionAgent (/Users/bedatse/Developer/ennie/healthis_be/agents/evaluation/en-395-performance-issue.ts:96:26)
    at async <anonymous> (/Users/bedatse/Developer/ennie/healthis_be/agents/evaluation/en-395-performance-issue.ts:131:18) {
  code: undefined,
  statusCode: 500
}
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

1 answer

  1. Sina Salam 23,931 Reputation points Volunteer Moderator
    2025-08-27T21:03:29.7166667+00:00

    Hello dillon.bailey,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you noticed instability around Azure AI Foundry Agent.

    Yes, there are several reports of this, but from what I can see your scenario can be managed. The following steps should help:

    1. Use single-call streaming or createAndPoll / createThreadAndRun. This eliminates the race where the run exists server-side but is not yet streamable; the SDK provides helpers that either stream the run in one call or poll on your behalf. Use these methods instead of create() followed immediately by .stream(). Here is a JS/TS example (based on Microsoft's docs):
         // Install:
         // npm install @azure/ai-projects @azure/identity
         import { AIProjectClient } from "@azure/ai-projects";
         import { DefaultAzureCredential } from "@azure/identity";
         const endpoint = process.env.PROJECT_ENDPOINT!;
         const client = new AIProjectClient(endpoint, new DefaultAzureCredential(), {
           // optional: tune client retry policy
           retryOptions: { maxRetries: 5, retryDelayInMs: 1000, maxRetryDelayInMs: 10000 }
         });
         // Preferred: create + stream in one operation (SDK example)
         async function runAgentAndStream(threadId: string, agentId: string) {
           try {
             // SDK sample pattern (create then stream) that avoids the manual race when used as shown in docs:
             const stream = await client.agents.runs.create(threadId, agentId).stream();
             for await (const event of stream) {
               // event is one of the RunStreamEvent types (message.delta, error, done, etc.)
               // handle messages and tool calls here
               console.log("Stream event:", event);
               if (event.type === "error") {
                 console.error("Run error:", event);
               }
               if (event.type === "done") {
                 console.log("Run finished.");
               }
             }
           } catch (err) {
             console.error("Stream failed:", err);
             throw err;
           }
         }
      
      This exact approach (client.runs.create(...).stream() / createAndPoll(...) / createThreadAndRun(...)) is present in Microsoft’s JS docs and samples. Use createAndPoll if you want the SDK to poll for readiness for you. - https://learn.microsoft.com/en-us/javascript/api/overview/azure/ai-agents-readme
    2. A robust fallback pattern is another option if you must separate create from stream. If your runtime forces you to call create() separately (legacy code or tooling), use:
      1. Idempotency metadata: before create, generate a client requestId and include it in the run metadata (the REST/SDK create-run operation supports metadata). If create() times out, call listRuns(threadId) and search for a run whose metadata includes your requestId; that tells you the server actually created the run. - https://learn.microsoft.com/en-us/rest/api/aifoundry/aiagents/runs/create-run?view=rest-aifoundry-aiagents-v1
      2. Poll for run readiness with exponential backoff rather than an immediate blind .stream() call. TS/Node sample (robust):
              import { AIProjectClient } from "@azure/ai-projects";
              import { DefaultAzureCredential } from "@azure/identity";
              import { setTimeout as sleep } from "node:timers/promises";
              import { v4 as uuidv4 } from "uuid";
              const client = new AIProjectClient(process.env.PROJECT_ENDPOINT!, new DefaultAzureCredential(), {
                retryOptions: { maxRetries: 5, retryDelayInMs: 1000, maxRetryDelayInMs: 10000 }
              });
              async function createRunWithIdempotency(threadId: string, agentId: string, payload?: any) {
                const requestId = uuidv4();
                // attach metadata.requestId so we can detect a server-created run
                const createBody = {
                  ...payload,
                  metadata: { clientRequestId: requestId }
                };
                let run: any = undefined;
                try {
                   // SDK call - the exact parameter shape depends on the SDK version; many accept a body or options object
                   run = await client.agents.runs.create(threadId, agentId, { body: createBody });
                } catch (err) {
                  // If create timed out or errored, try to find a run with our requestId
                  console.warn("create() failed — checking if run was created anyway:", err);
                   const runsIter = client.agents.runs.list(threadId, { order: "desc", limit: 10 }); // page through recent runs
                  for await (const r of runsIter) {
                    if (r.metadata?.clientRequestId === requestId) {
                      run = r;
                      console.log("Found server-side run created despite create() error:", run.id);
                      break;
                    }
                  }
                  if (!run) throw err; // rethrow original error if no server-side run found
                }
                // Poll run readiness with exponential backoff
                 const deadline = Date.now() + 5 * 60_000; // 5 minutes
                let delay = 1000;
                while (Date.now() < deadline) {
                   const current = await client.agents.runs.get(threadId, run.id);
                  if (current.status === "in_progress" || current.status === "queued") {
                    // not ready yet
                  } else {
                    // if terminal (completed/failed/etc.) break to process
                    break;
                  }
                  await sleep(delay);
                   delay = Math.min(8000, Math.floor(delay * 1.8));
                }
                 // Now stream events (or fall back to reading messages if streaming is unavailable)
                 try {
                   // NOTE: an events() helper like this may not exist in your SDK version;
                   // check the @azure/ai-agents reference before relying on it
                   const eventStream = await client.agents.runs.events(threadId, run.id);
                   for await (const ev of eventStream) {
                     console.log("event:", ev);
                   }
                 } catch (evErr) {
                   // fallback: read the final messages on the thread (a paged async iterable)
                   console.warn("Streaming events failed, falling back to messages list:", evErr);
                   const messages = client.agents.messages.list(threadId);
                   for await (const m of messages) {
                     console.log("message:", m);
                   }
                 }
              }
        
        Check the following links - https://learn.microsoft.com/en-us/rest/api/aifoundry/aiagents/runs/list-runs?view=rest-aifoundry-aiagents-v1 and https://learn.microsoft.com/en-us/azure/ai-foundry/agents/how-to/tools/function-calling
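         The exponential backoff schedule used in the sketch above (start at 1 s, grow by a factor of 1.8, cap at 8 s) can be pulled out into a pure helper so it is unit-testable on its own. The name backoffDelays is mine for illustration, not an SDK API:

```typescript
// Pure helper mirroring the polling loop above: start at `initialMs`,
// grow by `factor` on each attempt, and cap every delay at `capMs`.
// (backoffDelays is a made-up name for illustration, not an SDK API.)
function backoffDelays(
  initialMs: number,
  factor: number,
  capMs: number,
  count: number
): number[] {
  const delays: number[] = [];
  let delay = initialMs;
  for (let i = 0; i < count; i++) {
    delays.push(delay);
    delay = Math.min(capMs, Math.floor(delay * factor));
  }
  return delays;
}
```

         With the sample's values, the first five delays are 1000, 1800, 3240, 5832, and 8000 ms, so a five-minute deadline allows roughly 40 polls in the worst case.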
    3. Set client retry options when constructing AIProjectClient, e.g. retryOptions: { maxRetries: 5, retryDelayInMs: 1000, maxRetryDelayInMs: 10000 }. Use them to safely retry transient 5xx/408/504 failures. - https://learn.microsoft.com/en-us/javascript/api/%40azure/ai-projects/aiprojectclientoptionalparams?view=azure-node-preview Do not blindly retry create() on client-side timeouts; use the metadata requestId + listRuns approach to detect whether the server already created the run (de-duplication). - https://learn.microsoft.com/en-us/rest/api/aifoundry/aiagents/runs/create-run?view=rest-aifoundry-aiagents-v1
    4. In the Azure AI Foundry project portal, go to Tracing and connect an Application Insights resource, then instrument your Node.js app with the Azure Monitor OpenTelemetry distro. - https://learn.microsoft.com/en-us/azure/ai-foundry/agents/concepts/tracing
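       As a sketch of the instrumentation step, assuming the @azure/monitor-opentelemetry package and that the connection string of the Application Insights resource connected in the portal is exposed through the standard APPLICATIONINSIGHTS_CONNECTION_STRING environment variable:

```typescript
// Sketch: enable Azure Monitor OpenTelemetry before the rest of the app loads.
// Assumes APPLICATIONINSIGHTS_CONNECTION_STRING points at the Application
// Insights resource connected to the Foundry project under Tracing.
// npm install @azure/monitor-opentelemetry
import { useAzureMonitor } from "@azure/monitor-opentelemetry";

useAzureMonitor({
  azureMonitorExporterOptions: {
    connectionString: process.env.APPLICATIONINSIGHTS_CONNECTION_STRING,
  },
});

// Import the AI Projects client (and the rest of the app) only after
// useAzureMonitor() has run, so outgoing HTTP calls to the agent service
// are picked up by the automatic instrumentation.
```

       Calling useAzureMonitor() first matters because the distro patches the HTTP modules at load time; modules imported before it will not be traced.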
    5. Create alerts for spikes in 5xx status codes or run failures. - https://learn.microsoft.com/en-us/azure/ai-foundry/agents/how-to/metrics
    6. Enable continuous evaluation and use the Foundry dashboards to watch run success rate, latency, and tool failures. - https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/continuous-evaluation-agents
    7. If you are using Managed Identity, ensure the identity has the right roles on the target resource. If the error is a 403/500 from the internal token service, gather tracing/request IDs and file a support ticket with Microsoft via the portal.

    I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.


    Please don't forget to close the thread by upvoting and accepting this as an answer if it is helpful.

