Failed to upload data cnv_training_package_root_flat_TAB.zip. Error: Status: 400. We cannot pair your audio files with the transcripts. Make sure in the transcript file you have included the name of your audios correctly. Try again in a few moments.

Question

Craig Lowther

Hello. I have been trying for two hours to upload some .wav files and the .txt files that accompany them. The descriptions match exactly. I have tried every variation I can think of. It is very frustrating, having already had the voice talent do the recordings. The transcripts match exactly, but I keep getting the above error message no matter what I try.

Answer 1

Hello Craig!

Thank you for posting on Microsoft Learn.

For short clips (<15s each), you need to choose Individual utterances + matching transcript. You upload one ZIP of .wav files and one .txt transcript file (not zipped). https://learn.microsoft.com/en-us/azure/ai-services/speech-service/professional-voice-create-training-set

For long recordings, choose Long audio + transcript. You upload one ZIP of audio and a second ZIP of per-file .txt transcripts (each .txt has the same filename as its audio). https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-voice-training-data
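A quick way to check the long-audio pairing before uploading is to compare the base names in the two ZIPs. This Python sketch assumes the audio is .wav and that matching is by exact, case-sensitive base name; the helper names (`unpaired_long_audio`, `unpaired_long_audio_zips`) are hypothetical, and this is a local sanity check, not the service's own validation.

```python
import zipfile
from pathlib import PurePosixPath

# Hypothetical helpers: a local pre-upload check, not Azure's validator.
def unpaired_long_audio(audio_names, transcript_names):
    """Return (audio base names with no .txt, .txt base names with no audio)."""
    audio = {PurePosixPath(n).stem for n in audio_names if n.lower().endswith(".wav")}
    text = {PurePosixPath(n).stem for n in transcript_names if n.lower().endswith(".txt")}
    return sorted(audio - text), sorted(text - audio)

def unpaired_long_audio_zips(audio_zip_path, transcript_zip_path):
    """Same check, reading the entry names straight from the two ZIP files."""
    with zipfile.ZipFile(audio_zip_path) as a, zipfile.ZipFile(transcript_zip_path) as t:
        return unpaired_long_audio(a.namelist(), t.namelist())
```

Two empty lists mean every audio file has a same-named .txt and vice versa.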

If you’re using Individual utterances + matching transcript:

Audio ZIP: put only .wav files at the root (no subfolders), and give them unique filenames that use only Windows-safe characters (no \ / : * ? " < > |, and not starting or ending with a space). https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-voice-training-data
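The ZIP rules above can be checked locally before uploading. This sketch lists the entries that break them, including folder entries such as the `__MACOSX/` directory some macOS archivers add. The helper names are hypothetical, and treating duplicates case-insensitively is a conservative assumption (Windows filenames are case-insensitive).

```python
import zipfile
from collections import Counter

# Characters Windows does not allow in filenames.
FORBIDDEN = set('\\/:*?"<>|')

# Hypothetical helper: checks ZIP entry names against the rules quoted above.
def audio_name_problems(names):
    problems = []
    for name in names:
        if "/" in name or "\\" in name:
            problems.append(f"{name!r}: not at the ZIP root")
            continue
        if not name.lower().endswith(".wav"):
            problems.append(f"{name!r}: not a .wav file")
            continue
        stem = name[:-4]  # name without the ".wav" extension
        if any(ch in FORBIDDEN for ch in name):
            problems.append(f"{name!r}: contains a character Windows forbids")
        if stem != stem.strip():
            problems.append(f"{name!r}: name starts or ends with a space")
    # Assumption: compare case-insensitively, as Windows would.
    counts = Counter(n.lower() for n in names)
    problems += [f"{n!r}: duplicate filename" for n, c in counts.items() if c > 1]
    return problems

def audio_zip_problems(zip_path):
    """Run the name checks on every entry in an audio ZIP."""
    with zipfile.ZipFile(zip_path) as zf:
        return audio_name_problems(zf.namelist())
```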

Audio format: PCM WAV, at least 16 kHz sample rate (24 kHz recommended), 16-bit, and under 15 s per file. https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-voice-training-data
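To confirm each clip meets those format limits, Python's standard `wave` module can read the WAV header. A minimal sketch, assuming the limits quoted above (at least 16 kHz, 16-bit, under 15 s); `wav_problems` is a hypothetical name, and files that `wave` cannot parse (for example, compressed or floating-point WAVs) are simply reported as not readable rather than diagnosed further.

```python
import io
import wave

MIN_RATE_HZ = 16_000   # minimum sample rate quoted above
MAX_SECONDS = 15.0     # each utterance must be shorter than this

# Hypothetical helper: checks one WAV file's bytes against the limits above.
def wav_problems(data: bytes) -> list[str]:
    try:
        with wave.open(io.BytesIO(data), "rb") as w:
            rate = w.getframerate()
            width = w.getsampwidth()  # bytes per sample
            seconds = w.getnframes() / rate if rate else 0.0
    except (wave.Error, EOFError) as exc:
        return [f"not readable as PCM WAV: {exc}"]
    problems = []
    if rate < MIN_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is below 16 kHz")
    if width != 2:
        problems.append(f"sample width is {8 * width}-bit, expected 16-bit")
    if seconds >= MAX_SECONDS:
        problems.append(f"duration {seconds:.1f}s is not under 15s")
    return problems
```

For a file on disk, pass `pathlib.Path(path).read_bytes()`; an empty list means the clip passed these checks.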

Transcript .txt (a single file): one line per utterance, in the form audio ID, tab, text:

0000000001<TAB>This is the waistline, and it's falling.
0000000002<TAB>We have trouble scoring.
0000000003<TAB>It was Janet Maslin.

Use a real tab character between the audio ID and the text; spaces or commas will fail. https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-voice-training-data
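Because a space or comma in place of a tab is easy to miss in a text editor, a line-by-line check helps. This hypothetical `separator_problems` function only looks at the separator and whether each side is non-empty; flagging spaces around the ID is a conservative assumption, and it does not compare IDs against the audio files.

```python
# Hypothetical helper: flags transcript lines with a missing tab or empty fields.
def separator_problems(transcript_text: str) -> list[str]:
    problems = []
    for lineno, line in enumerate(transcript_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        audio_id, tab, text = line.partition("\t")
        if not tab:
            problems.append(f"line {lineno}: no tab between audio ID and text")
        elif not audio_id or not text.strip():
            problems.append(f"line {lineno}: empty audio ID or empty text")
        elif audio_id != audio_id.strip():
            problems.append(f"line {lineno}: audio ID has leading/trailing spaces")
    return problems
```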

The audio ID must match the .wav filename (usually the base name without .wav, e.g. 0000000001 for 0000000001.wav) and is expected to be numeric. Duplicate IDs are rejected. https://learn.microsoft.com/en-us/azure/ai-services/speech-service/professional-voice-create-training-set
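The error in the question says the service cannot pair audio files with transcripts, so directly comparing the ZIP's .wav names with the transcript IDs is the most useful check. A sketch under these assumptions: IDs are taken verbatim as everything before the first tab, .wav entries are compared by base name, and the helper names are hypothetical.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical helper: compares .wav base names with transcript audio IDs.
def pairing_report(wav_names, transcript_text):
    # "0000000001.wav" -> "0000000001"
    wav_ids = {PurePosixPath(n).stem for n in wav_names if n.lower().endswith(".wav")}
    # IDs kept verbatim (no stripping), so stray spaces show up as mismatches.
    ids = [line.split("\t", 1)[0] for line in transcript_text.splitlines() if line.strip()]
    counts = Counter(ids)
    return {
        "duplicate_ids": sorted(i for i, c in counts.items() if c > 1),
        "missing_audio": sorted(set(ids) - wav_ids),
        "missing_transcript": sorted(wav_ids - set(ids)),
    }

def pairing_report_from_files(zip_path, transcript_path, encoding="utf-8-sig"):
    # "utf-8-sig" reads UTF-8 with or without a BOM; pass another codec if needed.
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    with open(transcript_path, encoding=encoding) as f:
        return pairing_report(names, f.read())
```

If a transcript line uses a space instead of a tab, the whole line becomes its "ID" and shows up under `missing_audio`, which points straight at the separator problem.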

The encodings allowed are UTF-8/UTF-8-BOM/UTF-16-LE/UTF-16-BE/ANSI/ASCII (zh-CN doesn’t allow ANSI/ASCII). https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-voice-training-data
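If you're unsure which encoding your editor saved, the byte-order mark (BOM) at the start of the file is the usual tell. This hypothetical helper checks for UTF-8 and UTF-16 BOMs, then tries ASCII and UTF-8. It cannot identify BOM-less UTF-16 or tell which ANSI code page was used, so treat "unknown" as a prompt to re-save the file as UTF-8.

```python
import codecs

# Hypothetical helper: a rough guess at a transcript file's encoding.
def guess_transcript_encoding(raw: bytes) -> str:
    if raw.startswith(codecs.BOM_UTF8):
        return "UTF-8-BOM"
    if raw.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16-LE"
    if raw.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16-BE"
    for label, codec in (("ASCII", "ascii"), ("UTF-8", "utf-8")):
        try:
            raw.decode(codec)
            return label
        except UnicodeDecodeError:
            pass
    return "unknown (possibly ANSI; not valid UTF-8)"
```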
