Intermittent CPU spikes and metric gaps

Olof Montin
2025-08-26T12:48:47.72+00:00

We have a Function App that runs two timer triggers: one every 24 hours and one every five minutes. It does a lot of I/O against Azure resources and remote APIs, with a timeout of 10 minutes. The service plan is EP2, and the app is written in F#/.NET.

We see two problems with this Function App.

  1. Intermittent CPU spikes, where CPU usage stays at 80-100% for 4 to 24 hours at a time. This sometimes affects the function duration, but mostly it doesn't.
  2. The Function App is instrumented with custom metrics using OpenTelemetry, which are emitted to Application Insights. These metrics suddenly stop being emitted until the Function App is restarted manually. There's no clear pattern to when these stops occur, and they don't seem to be related to the CPU spikes.
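One way to make the "no clear pattern" observation in point 2 checkable is to export the custom metric's timestamps and the CPU samples (for example as CSV from the Azure portal) and compare them offline. The following is a minimal, hypothetical Python sketch, not part of the app: `find_metric_gaps` flags any stretch where consecutive data points are further apart than a multiple of the expected interval, and `overlapping` lists which gaps intersect which CPU-spike windows. It assumes each five-minute run records at least one data point, so the expected interval is five minutes; both helper names and the sample data are illustrative only.

```python
from datetime import datetime, timedelta


def find_metric_gaps(timestamps, expected_interval, tolerance=2.0):
    """Return (last_point, next_point) pairs where consecutive metric data
    points are more than tolerance * expected_interval apart."""
    ordered = sorted(timestamps)
    limit = expected_interval * tolerance  # timedelta * float -> timedelta
    return [(prev, curr) for prev, curr in zip(ordered, ordered[1:])
            if curr - prev > limit]


def overlapping(intervals_a, intervals_b):
    """Return every (a, b) pair of closed (start, end) intervals that overlap."""
    return [(a, b) for a in intervals_a for b in intervals_b
            if a[0] <= b[1] and b[0] <= a[1]]


if __name__ == "__main__":
    start = datetime(2025, 8, 26, 12, 0)
    # Hypothetical data: points every 5 minutes until 12:15, then nothing
    # until emission resumes at 15:00 (e.g. after a manual restart).
    points = [start + timedelta(minutes=5 * i) for i in range(4)]
    points.append(start + timedelta(hours=3))
    gaps = find_metric_gaps(points, timedelta(minutes=5))
    # Hypothetical CPU-spike window from 14:00 to 20:00.
    spikes = [(start + timedelta(hours=2), start + timedelta(hours=8))]
    print(gaps)
    print(overlapping(gaps, spikes))
```

An empty result from `overlapping` across several weeks of data would support the impression that the metric gaps and the CPU spikes are independent.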

Screenshots of graphs

Things we've tried, to no avail:

  • Reduced concurrency and threading. However, we can't correlate the spikes with either the amount of work or the timer triggers.
  • Reduced the amount of data, and thereby the I/O, processed on each trigger. The CPU spikes come and go even when very little compute is being done.
  • Removed OpenTelemetry, in case it was causing the CPU spikes. This had no effect.
  • Recreated the Azure resources and tried different plans. Interestingly, when running the function on a Y1 plan, we didn't see these problems. We ran the same workload on a Y1 and an EP1 plan, and we only saw increased durations and metric gaps on EP1. EP1 had CPU spikes as described above, but since Y1 plans don't provide that metric, we couldn't compare CPU usage. I assume that on Y1 the runtime isn't always kept alive between executions, similar to AWS Lambda functions, whereas EPx plans keep the runtime alive for as long as they like.
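To test whether the spikes line up with the timer triggers, rather than judging from the graphs, a small offline check could group exported CPU samples into spike windows and measure how long after the most recent timer fire each window starts. This is a hypothetical Python sketch over (timestamp, CPU %) samples; the 80% threshold mirrors point 1, and the daily fire time used below is an assumption, since the post doesn't say when the 24-hour trigger runs. The helper names are illustrative only.

```python
from datetime import datetime, timedelta


def spike_windows(samples, threshold=80.0):
    """Group consecutive (timestamp, cpu_percent) samples at or above
    threshold into (first_spike_sample, last_spike_sample) windows."""
    windows = []
    start = end = None
    for ts, cpu in sorted(samples):
        if cpu >= threshold:
            if start is None:
                start = ts
            end = ts
        elif start is not None:
            windows.append((start, end))
            start = end = None
    if start is not None:  # spike still ongoing at the last sample
        windows.append((start, end))
    return windows


def offset_since_last_fire(ts, first_fire, period):
    """Time elapsed between ts and the most recent fire of a timer that fires
    at first_fire + k * period for every integer k."""
    return (ts - first_fire) % period  # timedelta % timedelta -> timedelta


if __name__ == "__main__":
    # Hypothetical: assumes the daily trigger fires at 00:00 UTC.
    daily_fire = datetime(2025, 8, 26)
    hourly_cpu = [(1, 20), (2, 95), (3, 90), (4, 85), (5, 30), (6, 25)]
    samples = [(daily_fire + timedelta(hours=h), cpu) for h, cpu in hourly_cpu]
    for start, end in spike_windows(samples):
        print(start, end, offset_since_last_fire(start, daily_fire, timedelta(days=1)))
```

With this made-up data, the script reports one spike from 02:00 to 04:00, starting two hours after the assumed daily fire. Spikes lasting 4 to 24 hours will always contain many five-minute fires, so the start offset says more than a simple overlap check; if the daily trigger were the cause, offsets across many spikes would cluster near zero.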

Unfortunately, we can't use Y1 long term, as it doesn't have the capacity to run the workload we need per execution. We also really want to understand why the EPx service plans behave this way.

Can we somehow rule out the code or the service plan? We've done what we can on our end and want to know what's going on behind the scenes.
