Azure Fast Transcription API: InvalidAudioStream Error for M4A Files Over 10MB
Issue Summary
The Azure Speech Service Fast Transcription API returns an “InvalidAudioStream” error for M4A audio files larger than ~10MB, while smaller M4A files from the same source transcribe without problems. This affects mobile app users uploading voice recordings from iOS devices.
Environment Details
- Service: Azure Speech Service Fast Transcription API
- Region: Australia East
- Pricing Tier: Standard (S0) - recently upgraded from Free (F0)
- API Version: 2024-05-15-preview (also tested with 2024-11-15)
- Audio Source: iPhone Voice Memo recordings (M4A format)
- Implementation: Azure Functions (Flex Consumption) via REST API
Problem Description
What Works ✅
- M4A files under ~10MB transcribe successfully
- Same audio source, same encoding parameters
- No issues with smaller recordings
What Fails ❌
- M4A files over ~10-15MB consistently fail
- Error: "Reason InvalidAudioStream, Details: The audio stream could not be decoded with the provided configuration"
- HTTP 400 BadRequest response
Complete Error Log:
[Error] Fast Transcription API returned error: BadRequest
[Error] Error response body: Reason InvalidAudioStream, Details: The audio stream could not be decoded with the provided configuration.
[Error] Error in Fast Transcription: Fast Transcription API returned error: BadRequest - Reason InvalidAudioStream, Details: The audio stream could not be decoded with the provided configuration.
[Error] AUDIT: Function error - RequestId: [REDACTED], Error: Fast Transcription API returned error: BadRequest - Reason InvalidAudioStream, Details: The audio stream could not be decoded with the provided configuration.
Audio File Details
- Format: M4A (iPhone Voice Memo)
- Size: 15.3MB (15,319,603 bytes)
- Header Analysis: 0000001C667479704D344120000000004D344120 (confirms M4A format)
- Duration: Under 2 hours (and the file size is well within the 300MB limit)
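For anyone who wants to compare a working file against a failing one, the header bytes above can be decoded by walking the top-level MP4 boxes. The sketch below is a minimal diagnostic, not part of our production code: `ReadTopLevelBoxes` is a hypothetical helper name, it assumes the whole file is already in memory (as with `audioData`), and it only reads box sizes and types. One thing it makes easy to check is whether `moov` appears before or after `mdat` in small versus large recordings. Whether that ordering matters to Fast Transcription is an assumption on my part, not something the error message confirms.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: lists the top-level ISO BMFF (MP4/M4A) boxes in a buffer.
// Each box starts with a 4-byte big-endian size followed by a 4-byte ASCII type.
static List<(string Type, long Offset, long Size)> ReadTopLevelBoxes(ReadOnlySpan<byte> data)
{
    var boxes = new List<(string, long, long)>();
    long offset = 0;
    while (offset + 8 <= data.Length)
    {
        var header = data.Slice((int)offset, 8);
        long size = BinaryPrimitives.ReadUInt32BigEndian(header);
        string type = Encoding.ASCII.GetString(header.Slice(4, 4));

        if (size == 1)
        {
            // A size of 1 means a 64-bit "largesize" follows the type field.
            if (offset + 16 > data.Length) break;
            size = (long)BinaryPrimitives.ReadUInt64BigEndian(data.Slice((int)offset + 8, 8));
        }
        else if (size == 0)
        {
            // A size of 0 means the box runs to the end of the file.
            size = data.Length - offset;
        }

        if (size < 8) break; // malformed box, stop rather than loop forever
        boxes.Add((type, offset, size));
        offset += size;
    }
    return boxes;
}

// Decode the 20 header bytes quoted in this post.
byte[] headerBytes = Convert.FromHexString("0000001C667479704D344120000000004D344120");
foreach (var box in ReadTopLevelBoxes(headerBytes))
{
    Console.WriteLine($"{box.Type} at offset {box.Offset}, size {box.Size}");
}
// prints "ftyp at offset 0, size 28"
```

Running the same helper over the full `audioData` of a small and a large recording would show whether the two differ in box layout.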
Current Implementation
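    // Requires: System.Net.Http, System.Net.Http.Headers, System.Text, Newtonsoft.Json, Microsoft.Extensions.Logging
    public async Task<string> TranscribeAudioAsync(byte[] audioData, ILogger log)
    {
        var handler = new HttpClientHandler();
        using var httpClient = new HttpClient(handler);
        httpClient.Timeout = TimeSpan.FromMinutes(10);
        httpClient.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", _subscriptionKey);

        // Multipart body: the audio file plus a JSON "definition" part
        using var content = new MultipartFormDataContent();
        var audioContent = new StreamContent(new MemoryStream(audioData));
        audioContent.Headers.ContentType = new MediaTypeHeaderValue("audio/mp4"); // Tried various types
        content.Add(audioContent, "audio", "audio.m4a");

        var definitionObject = new
        {
            locales = new[] { "en-US" },
            profanityFilterMode = "Masked"
        };
        var definitionJson = JsonConvert.SerializeObject(definitionObject);
        content.Add(new StringContent(definitionJson, Encoding.UTF8, "application/json"), "definition");

        var response = await httpClient.PostAsync(_transcriptionEndpoint, content);
        // Fails with 400 BadRequest for large M4A files
        var responseBody = await response.Content.ReadAsStringAsync();
        if (!response.IsSuccessStatusCode)
        {
            log.LogError("Error response body: {Body}", responseBody);
            throw new HttpRequestException($"Fast Transcription API returned error: {response.StatusCode} - {responseBody}");
        }

        // Returns the raw JSON response body
        return responseBody;
    }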
Attempted Solutions
- Content-Type variations: audio/wav, audio/mp4, application/octet-stream
- API version updates: 2024-05-15-preview → 2024-11-15
- Configuration simplification: Removed channels array, profanity filter
- Timeout increases: Extended to 10 minutes
- HttpClient optimization: Added proper handlers and buffer settings
- Tier upgrade: Free (F0) → Standard (S0)
Questions
- Is there an undocumented size limit for M4A files in Fast Transcription API beyond the stated 300MB?
- Are there specific M4A encoding requirements (sample rate, channels, codec) for larger files?
- Is this a known issue with iPhone Voice Memo M4A format and Fast Transcription?
- Should we convert M4A to WAV client-side, or is there a server-side solution?
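On the last question, the conversion we are considering would look roughly like this. It is a sketch under assumptions, not something we have deployed: it assumes an `ffmpeg` binary is available on the PATH of the host, and `BuildFfmpegStartInfo` and `ConvertToWavAsync` are hypothetical helper names. The 16 kHz mono 16-bit PCM target is a guess at a safe format, not a documented Fast Transcription requirement.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper: builds the ffmpeg invocation that decodes the input,
// downmixes to mono, resamples to 16 kHz, and writes 16-bit PCM WAV.
static ProcessStartInfo BuildFfmpegStartInfo(string inputPath, string outputPath, string ffmpegPath = "ffmpeg")
{
    var startInfo = new ProcessStartInfo(ffmpegPath)
    {
        RedirectStandardError = true,
        UseShellExecute = false,
    };
    var args = new[]
    {
        "-y",               // overwrite the output file if it exists
        "-i", inputPath,    // source M4A
        "-ac", "1",         // one audio channel
        "-ar", "16000",     // 16 kHz sample rate
        "-c:a", "pcm_s16le", // 16-bit little-endian PCM
        outputPath,
    };
    foreach (var arg in args)
    {
        startInfo.ArgumentList.Add(arg);
    }
    return startInfo;
}

// Hypothetical helper: runs ffmpeg and throws if it reports a failure.
// Assumes ffmpeg is installed on the host; this is not verified here.
static async Task ConvertToWavAsync(string inputPath, string outputPath)
{
    using var process = Process.Start(BuildFfmpegStartInfo(inputPath, outputPath))
        ?? throw new InvalidOperationException("Could not start ffmpeg.");
    string stderr = await process.StandardError.ReadToEndAsync();
    await process.WaitForExitAsync();
    if (process.ExitCode != 0)
    {
        throw new InvalidOperationException($"ffmpeg exited with code {process.ExitCode}: {stderr}");
    }
}
```

The trade-off is size: uncompressed 16 kHz mono 16-bit audio is about 32 KB per second, so a two-hour recording comes to roughly 230 MB, which is much closer to the 300MB limit than the original M4A.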
Expected Behavior
According to the documentation, the M4A format is supported and files under 300MB should work. The inconsistent behavior based on file size suggests one of the following:
- Hidden size thresholds for M4A format
- Different validation logic for larger files
- M4A-specific channel/encoding limitations
Any guidance on M4A file requirements or alternative approaches would be greatly appreciated.
Thanks
Tags: azure-speech-service, fast-transcription, m4a, audio-format, mobile-app