Failed to upload data cnv_training_package_root_flat_TAB.zip. Error: Status: 400. We cannot pair your audio files with the transcripts. Make sure in the transcript file you have included the name of your audios correctly. Try again in a few moments.

Craig Lowther 0 Reputation points
2025-08-26T15:06:58.62+00:00

Hello. I have been trying for two hours to upload some .wav files and the .txt files that accompany them. The descriptions match exactly, and I have tried as many variations as I can think of. It is very frustrating, having had the voice talent do the recordings. The transcripts match exactly, but I keep getting the above error message no matter what I try.

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. Amira Bedhiafi 36,716 Reputation points Volunteer Moderator
    2025-08-29T18:51:59.34+00:00

    Hello Craig!

    Thank you for posting on Microsoft Learn.

    For short clips (under 15 seconds each), choose Individual utterances + matching transcript. You upload one ZIP of .wav files and one .txt transcript file (not zipped). https://learn.microsoft.com/en-us/azure/ai-services/speech-service/professional-voice-create-training-set

    For long recordings, choose Long audio + transcript. You upload one ZIP of audio and a second ZIP of per-file .txt transcripts (each .txt has the same filename as its audio). https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-voice-training-data

    If you’re using Individual utterances + matching transcript:

    Audio ZIP: put only .wav files at the root (no subfolders) and give them unique filenames made of Windows-safe characters (none of \ / : * ? " < > |, and no leading or trailing space). https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-voice-training-data

    Audio format: PCM WAV, ≥16 kHz (24 kHz recommended), 16-bit, <15s per file. https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-voice-training-data
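
The format limits above can be checked locally before uploading. The following is only a minimal sketch using Python's standard `wave` module; the function name `check_wav` and its default thresholds are illustrative assumptions taken from the numbers quoted in this answer, not part of any Azure tooling.

```python
import wave


def check_wav(path, min_rate=16000, max_seconds=15.0):
    """Return a list of problems for one WAV file (an empty list means none found).

    Thresholds mirror the limits quoted in the answer; adjust them if the
    official documentation says otherwise.
    """
    try:
        with wave.open(path, "rb") as w:
            rate = w.getframerate()
            width = w.getsampwidth()   # bytes per sample: 2 means 16-bit
            frames = w.getnframes()
    except (wave.Error, EOFError) as exc:
        # The wave module only reads uncompressed PCM WAV files.
        return [f"not a readable PCM WAV: {exc}"]

    problems = []
    if width != 2:
        problems.append(f"sample width is {width * 8}-bit, expected 16-bit")
    if rate < min_rate:
        problems.append(f"sample rate is {rate} Hz, expected at least {min_rate} Hz")
    duration = frames / rate if rate else 0.0
    if duration >= max_seconds:
        problems.append(f"duration is {duration:.1f}s, expected under {max_seconds:.0f}s")
    return problems
```

Calling `check_wav("0000000001.wav")` on each recording and printing any non-empty result is enough to spot a file that was exported in the wrong format.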

    Transcript .txt (a single file): one line per utterance, for example:

    0000000001<TAB>This is the waistline, and it's falling.
    0000000002<TAB>We have trouble scoring.
    0000000003<TAB>It was Janet Maslin.
    

    Use a real tab character between the audio ID and the text (spaces or commas will fail). https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-voice-training-data

    The audio ID must match the .wav filename (usually the base name, without .wav) and is expected to be numeric. Duplicates are rejected. https://learn.microsoft.com/en-us/azure/ai-services/speech-service/professional-voice-create-training-set
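
Since the error is about pairing, it may help to compare the transcript IDs with the ZIP contents locally. This is a rough sketch, not the service's actual validation: the helper names `wav_ids_in_zip` and `pairing_problems` are hypothetical, and it assumes the transcript ID is the .wav base name without the extension, as described above.

```python
import zipfile


def wav_ids_in_zip(zip_path):
    """Return (ids, nested): base names of root-level .wav entries,
    plus any entries that sit inside a subfolder."""
    ids, nested = [], []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # directory entry, not a file
            if "/" in name:
                nested.append(name)
            elif name.lower().endswith(".wav"):
                ids.append(name[:-4])
    return ids, nested


def pairing_problems(transcript_text, wav_ids):
    """Return a list of mismatches between transcript lines and audio IDs."""
    problems = []
    seen = set()
    text = transcript_text.lstrip("\ufeff")  # ignore a leading byte-order mark
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        audio_id, sep, sentence = line.partition("\t")
        if not sep:
            problems.append(f"line {lineno}: no tab between ID and text")
            continue
        if audio_id in seen:
            problems.append(f"line {lineno}: duplicate ID {audio_id!r}")
        seen.add(audio_id)
        if not sentence.strip():
            problems.append(f"line {lineno}: empty transcript for {audio_id!r}")
    wav_set = set(wav_ids)
    for audio_id in sorted(seen - wav_set):
        problems.append(f"ID {audio_id!r} has no matching .wav")
    for audio_id in sorted(wav_set - seen):
        problems.append(f".wav {audio_id!r} has no transcript line")
    return problems
```

A typical use would be to read the transcript with the encoding it was saved in, pass the text together with the first value returned by `wav_ids_in_zip`, and print whatever comes back. A line that was typed with spaces instead of a tab shows up as "no tab between ID and text".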

    Allowed encodings: UTF-8, UTF-8-BOM, UTF-16-LE, UTF-16-BE, ANSI, and ASCII (zh-CN doesn’t allow ANSI/ASCII). https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-voice-training-data
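
If the encoding of the transcript file is in doubt, a quick look at its first bytes can narrow it down. The sketch below is a heuristic only: `guess_encoding` is a hypothetical helper, the labels simply reuse the names from the list above, and a file without a byte-order mark cannot always be identified with certainty.

```python
import codecs


def guess_encoding(data):
    """Guess the encoding of raw transcript bytes from the BOM or by trial decoding."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8-BOM"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16-LE"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16-BE"
    try:
        data.decode("ascii")
        return "ASCII"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        # Not valid UTF-8 and no BOM: possibly an ANSI code page.
        return "unknown (possibly ANSI)"
```

Reading the file in binary mode (`open(path, "rb").read()`) and passing the bytes to `guess_encoding` gives a label to compare against the allowed list.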
