🔍 Root Cause Analysis
The likely issue is that when the Azure Function times out, the messages it was processing are never completed. With autoComplete set to true (the default), the runtime only completes a message after the function returns successfully, so on a timeout the message goes back to the queue (it is abandoned or its lock expires) and its delivery count increases. If the redelivered message keeps landing on an instance that is timing out or shutting down, it is never actually reprocessed, quickly reaches MaxDeliveryCount, and is dead-lettered.
This is especially problematic when:
- The function processes messages in a loop (not one per invocation).
- The function timeout is reached mid-batch.
- autoComplete is enabled (default behavior).
✅ Recommended Fixes
- Set autoComplete to false
This gives you manual control over when a message is marked as completed. You can then explicitly call CompleteAsync() only after successful processing.
{
  "extensions": {
    "serviceBus": {
      "messageHandlerOptions": {
        "autoComplete": false,
        "maxConcurrentCalls": 16,
        "maxAutoRenewDuration": "00:20:00"
      }
    }
  }
}
In your function code, you’ll need to explicitly complete the message:
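using System;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;
using Microsoft.Azure.ServiceBus.Core;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

// The function name is a placeholder; use your own.
[FunctionName("ProcessQueueMessage")]
public async Task Run(
    [ServiceBusTrigger("queue-name", Connection = "ServiceBusConnection")] Message message,
    MessageReceiver messageReceiver,
    ILogger log)
{
    try
    {
        // ProcessMessageAsync stands in for your own processing logic.
        await ProcessMessageAsync(message);

        // Complete only after processing has succeeded.
        await messageReceiver.CompleteAsync(message.SystemProperties.LockToken);
    }
    catch (Exception ex)
    {
        // Pass the exception itself so the stack trace is logged, not just the message text.
        log.LogError(ex, "Error processing message {MessageId}", message.MessageId);

        // Abandon so the message is redelivered; dead-letter it manually instead if it can never succeed.
        await messageReceiver.AbandonAsync(message.SystemProperties.LockToken);
    }
}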
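- Increase Function Timeout
If your function is expected to run longer, increase the timeout in host.json:
{
  "functionTimeout": "00:10:00"
}
Also make sure your hosting plan supports the timeout you need. On the Consumption plan the default is 5 minutes and the maximum is 10 minutes; Premium and Dedicated (App Service) plans default to 30 minutes and allow longer values.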
- Avoid Long-Running Loops in a Single Function Instance
Instead of processing hundreds of messages in one function instance, consider:
- One message per function invocation (default behavior).
- Or batch processing with a cap and checkpointing logic.
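As a rough illustration of "batch processing with a cap and checkpointing", here is a minimal sketch that stops a loop once it reaches either an item cap or a time budget, and reports how far it got. The names BudgetedBatch, ProcessAsync, maxItems and budget are hypothetical and not part of any Azure SDK; the handler stands in for your own per-message logic.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class BudgetedBatch
{
    // Processes items in order until maxItems have been handled or the time
    // budget is spent. Returns the number of items completed (the checkpoint),
    // so the caller can leave the remainder for a later invocation.
    public static async Task<int> ProcessAsync<T>(
        IReadOnlyList<T> items,
        Func<T, Task> handler,
        int maxItems,
        TimeSpan budget)
    {
        var clock = Stopwatch.StartNew();
        int completed = 0;

        while (completed < items.Count && completed < maxItems)
        {
            // Do not start new work once the budget is used up.
            if (clock.Elapsed >= budget)
                break;

            await handler(items[completed]);
            completed++; // advance the checkpoint only after the handler succeeded
        }

        return completed;
    }
}
```

The budget should sit comfortably below functionTimeout so the function returns on its own instead of being terminated mid-message. Note that the check happens between items, so a single slow handler call can still overrun the budget.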
- Investigate Token Caching Logic
If your token caching logic is shared across invocations and fails after timeout, it could cause a cascade of failures. Ensure:
- Token refresh is thread-safe and fault-tolerant.
- Failures in token acquisition don’t affect unrelated messages.
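One way to meet both points is to serialize refreshes behind a SemaphoreSlim and only update the cached value after a successful fetch. This is a sketch under assumptions, not your actual caching code: the TokenCache class and the fetch delegate are hypothetical, and the delegate stands in for whatever call acquires your token.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class TokenCache
{
    private readonly Func<Task<(string Token, DateTimeOffset ExpiresAt)>> _fetch;
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private string _token = "";
    private DateTimeOffset _expiresAt = DateTimeOffset.MinValue;

    public TokenCache(Func<Task<(string Token, DateTimeOffset ExpiresAt)>> fetch)
    {
        _fetch = fetch;
    }

    public async Task<string> GetTokenAsync()
    {
        // Only one caller at a time may read or refresh the cached token.
        await _gate.WaitAsync();
        try
        {
            if (DateTimeOffset.UtcNow < _expiresAt)
                return _token;

            // If the fetch throws, the cached fields stay untouched, so the
            // failure reaches only this caller and the next caller retries.
            var fresh = await _fetch();
            _token = fresh.Token;
            _expiresAt = fresh.ExpiresAt;
            return _token;
        }
        finally
        {
            // Always release, even on failure, so one error cannot block later messages.
            _gate.Release();
        }
    }
}
```

A single shared instance (for example, a static field or a singleton registered with dependency injection) would be used by all invocations. Taking the gate on every call keeps the sketch simple; a lock-free fast path is possible but needs more care.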
🧪 Next Steps
Would you like help:
- Refactoring your function to use manual message completion?
- Reviewing your token caching logic?
- Setting up a retry policy or dead-letter monitoring?