Azure Fast Transcription API: InvalidAudioStream Error for M4A Files Over 10MB

Ali Uthuman 0 Reputation points
2025-08-21T13:15:32.0633333+00:00

Issue Summary

The Azure Speech Service Fast Transcription API returns an “InvalidAudioStream” error for M4A audio files larger than ~10MB, while smaller M4A files from the same source transcribe without problems. This affects mobile app users uploading voice recordings from iOS devices.

Environment Details

  • Service: Azure Speech Service Fast Transcription API
  • Region: Australia East
  • Pricing Tier: Standard (S0) - recently upgraded from Free (F0)
  • API Version: 2024-05-15-preview (also tested with 2024-11-15)
  • Audio Source: iPhone Voice Memo recordings (M4A format)
  • Implementation: Azure Functions (Flex Consumption) via REST API

Problem Description

What Works ✅

  • M4A files under ~10MB transcribe successfully
  • Same audio source, same encoding parameters
  • No issues with smaller recordings

What Fails ❌

  • M4A files over ~10-15MB consistently fail
  • Error: "Reason InvalidAudioStream, Details: The audio stream could not be decoded with the provided configuration"
  • HTTP 400 BadRequest response

Complete Error Log:


[Error] Fast Transcription API returned error: BadRequest
[Error] Error response body: Reason InvalidAudioStream, Details: The audio stream could not be decoded with the provided configuration.
[Error] Error in Fast Transcription: Fast Transcription API returned error: BadRequest - Reason InvalidAudioStream, Details: The audio stream could not be decoded with the provided configuration.
[Error] AUDIT: Function error - RequestId: [REDACTED], Error: Fast Transcription API returned error: BadRequest - Reason InvalidAudioStream, Details: The audio stream could not be decoded with the provided configuration.

Audio File Details

  • Format: M4A (iPhone Voice Memo)
  • Size: 15.3MB (15,319,603 bytes)
  • Header Analysis: 0000001C667479704D344120000000004D344120 (confirms M4A format)
  • Duration: Under 2 hours; file size well within the 300MB limit
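
For reference, the header bytes listed above can be decoded field by field. The following is a minimal sketch that treats the first 20 bytes as the start of an MP4 `ftyp` box (big-endian size, box type, major brand, minor version, first compatible brand). `ParseFtyp` is a hypothetical helper name used only for illustration; it is not part of any Azure SDK.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

// Header bytes quoted in this post (first 20 bytes of the failing file).
byte[] header = Convert.FromHexString("0000001C667479704D344120000000004D344120");

// Hypothetical helper: splits the leading 'ftyp' box into its fields.
static (uint Size, string Type, string MajorBrand, uint MinorVersion, string FirstCompatibleBrand)
    ParseFtyp(ReadOnlySpan<byte> data)
{
    if (data.Length < 20)
        throw new ArgumentException("Need at least 20 bytes.", nameof(data));

    uint size = BinaryPrimitives.ReadUInt32BigEndian(data.Slice(0, 4));   // box length in bytes
    string type = Encoding.ASCII.GetString(data.Slice(4, 4));             // four-character box type
    string major = Encoding.ASCII.GetString(data.Slice(8, 4));            // major brand
    uint minor = BinaryPrimitives.ReadUInt32BigEndian(data.Slice(12, 4)); // minor version
    string compat = Encoding.ASCII.GetString(data.Slice(16, 4));          // first compatible brand
    return (size, type, major, minor, compat);
}

var ftyp = ParseFtyp(header);
Console.WriteLine(
    $"size={ftyp.Size} type={ftyp.Type} major='{ftyp.MajorBrand}' " +
    $"minor={ftyp.MinorVersion} compat='{ftyp.FirstCompatibleBrand}'");
```

For the bytes above this reports a 28-byte `ftyp` box with major brand `M4A `. That only identifies the container brand; it says nothing about the codec, sample rate, or how the rest of the file is laid out.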

Current Implementation


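// Namespaces this snippet relies on
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json;

public async Task<string> TranscribeAudioAsync(byte[] audioData, ILogger log)
{
    var handler = new HttpClientHandler();
    using var httpClient = new HttpClient(handler);
    httpClient.Timeout = TimeSpan.FromMinutes(10);
    httpClient.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", _subscriptionKey);

    // Multipart body: the audio file plus a JSON "definition" part
    using var content = new MultipartFormDataContent();
    var audioContent = new StreamContent(new MemoryStream(audioData));
    audioContent.Headers.ContentType = new MediaTypeHeaderValue("audio/mp4"); // Tried various types
    content.Add(audioContent, "audio", "audio.m4a");

    var definitionObject = new
    {
        locales = new[] { "en-US" },
        profanityFilterMode = "Masked"
    };
    var definitionJson = JsonConvert.SerializeObject(definitionObject);
    content.Add(new StringContent(definitionJson, Encoding.UTF8, "application/json"), "definition");

    var response = await httpClient.PostAsync(_transcriptionEndpoint, content);
    // Fails with 400 BadRequest for large M4A files

    // The rest of the method is omitted from this post; returning the raw
    // response body here is a placeholder so the snippet compiles.
    return await response.Content.ReadAsStringAsync();
}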

Attempted Solutions

  1. Content-Type variations: audio/wav, audio/mp4, application/octet-stream
  2. API version updates: 2024-05-15-preview → 2024-11-15
  3. Configuration simplification: Removed channels array, profanity filter
  4. Timeout increases: Extended to 10 minutes
  5. HttpClient optimization: Added proper handlers and buffer settings
  6. Tier upgrade: Free (F0) → Standard (S0)

Questions

  1. Is there an undocumented size limit for M4A files in Fast Transcription API beyond the stated 300MB?
  2. Are there specific M4A encoding requirements (sample rate, channels, codec) for larger files?
  3. Is this a known issue with iPhone Voice Memo M4A format and Fast Transcription?
  4. Should we convert M4A to WAV client-side, or is there a server-side solution?

Expected Behavior

According to documentation, M4A format is supported and files under 300MB should work. The inconsistent behavior based on file size suggests either:

  • Hidden size thresholds for M4A format
  • Different validation logic for larger files
  • M4A-specific channel/encoding limitations
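
One way to check whether the failing files differ from the working ones in structure, and not only in size, is to list the top-level MP4 boxes of each and compare the order and sizes. The following is a minimal sketch under that assumption. `ListTopLevelBoxes` is a hypothetical helper, and the sample buffer is synthetic (it reuses the header bytes from this post followed by made-up `mdat` and `moov` boxes). Whether the service is sensitive to box layout is not confirmed anywhere in the documentation we have seen.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.Text;

// Hypothetical diagnostic helper: walks the top-level boxes of an MP4/M4A buffer.
static List<(string Type, long Offset, long Size)> ListTopLevelBoxes(ReadOnlySpan<byte> data)
{
    var result = new List<(string Type, long Offset, long Size)>();
    long offset = 0;

    while (offset + 8 <= data.Length)
    {
        ReadOnlySpan<byte> head = data.Slice((int)offset);
        long size = BinaryPrimitives.ReadUInt32BigEndian(head);      // 32-bit box size
        string type = Encoding.ASCII.GetString(head.Slice(4, 4));    // four-character type

        if (size == 1)
        {
            // A size of 1 means a 64-bit size follows the type field.
            if (head.Length < 16) break;
            size = (long)BinaryPrimitives.ReadUInt64BigEndian(head.Slice(8));
        }
        else if (size == 0)
        {
            // A size of 0 means the box runs to the end of the file.
            size = data.Length - offset;
        }

        if (size < 8) break;                      // malformed box, stop walking
        result.Add((type, offset, size));
        if (size > data.Length - offset) break;   // box claims more bytes than are present
        offset += size;
    }
    return result;
}

// Synthetic sample: 28-byte ftyp, 16-byte mdat, 8-byte moov (payload bytes left as zero).
byte[] sample = new byte[28 + 16 + 8];
Convert.FromHexString("0000001C667479704D344120000000004D344120").CopyTo(sample, 0);
Convert.FromHexString("000000106D646174").CopyTo(sample, 28);
Convert.FromHexString("000000086D6F6F76").CopyTo(sample, 44);

var boxes = ListTopLevelBoxes(sample);
foreach (var box in boxes)
    Console.WriteLine($"{box.Type} offset={box.Offset} size={box.Size}");
```

In the function from this post, the same helper could be called on `audioData` before the upload and the result logged, so that a working file and a failing file can be compared side by side.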

Any guidance on M4A file requirements or alternative approaches would be greatly appreciated.

Thanks


Tags: azure-speech-service, fast-transcription, m4a, audio-format, mobile-app
