Azure en-US neural voices read large numbers incorrectly

Question

Jason Horner

We are having a problem with the way Azure TTS reads out numbers with United States English neural voices. The Azure TTS engine appears to read 32,768 as "thirty-two thousand seven hundred and sixty-eight" for all English locales. That reading is correct for the United Kingdom and Australian English locales, but not for the United States, where it should be "thirty-two thousand seven hundred sixty-eight" (with no "and"). Attempts to use SSML to force a particular pronunciation have not helped. For example:
<say-as interpret-as="number">39873</say-as> --> reads "thirty nine thousand eight hundred and seventy three"
<say-as interpret-as="number">39,873</say-as> --> reads "thirty nine (pause) eight seventy three"

Using interpret-as values of cardinal, math, fraction, and alphanumeric produces similarly unhelpful results.

We know we can programmatically replace the digits with words before sending the text to Azure TTS, but that's a suboptimal solution, at best. That would increase our billable character counts, plus it could cause problems with years, ZIP (postal) codes, and phone numbers -- things that Azure reads correctly now.

Is there any way this can be fixed for United States English neural voices? Or a way to configure this with an option?

1 answer

Answer 1

Hello Jason!

Thank you for posting on Microsoft Learn.

There isn’t a switch today that tells Azure en-US neural voices to drop the "and" in cardinals. The US/UK preference here is controlled by the service's text-normalization rules, and the SSML <say-as> element with interpret-as="number" or interpret-as="cardinal" doesn’t expose an option to change that behavior. https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-synthesis-markup-pronunciation

This same issue has been raised before and there isn’t an official toggle exposed today. If this is critical for you, open a support ticket and reference the US cardinal formatting request so it can be prioritized by the Speech team.

For other numeric types, use the dedicated interpret-as values to avoid odd readings: number_digit (reads the digits one by one), telephone, currency, address, date, time, fraction, and so on. These work as intended in en-US. https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-synthesis-markup-pronunciation

Avoid commas in plain numerals when you want the value read as a single number, since commas can cause segmentation or pauses (“39,873” may be split). Prefer 39873 wrapped in <say-as interpret-as="cardinal">.
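
As a rough illustration of the comma advice above, here is a minimal Python sketch that strips thousands separators and wraps the digits in a cardinal say-as tag. The function name wrap_grouped_numbers and the regular expression are hypothetical, not part of any Azure SDK, and the sketch assumes that any comma-grouped token in your text is meant as a plain number. Note that this only removes the pause; it does not change the "and" behavior.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical pattern: comma-grouped integers such as 39,873 or 1,234,567.
# Ungrouped digit runs (ZIP codes, years, phone numbers) are left alone.
_GROUPED = re.compile(r"(?<![\d.,])\d{1,3}(?:,\d{3})+(?!\d|[.,]\d)")


def wrap_grouped_numbers(text: str) -> str:
    """Return an SSML fragment with comma-grouped numbers de-grouped and tagged."""
    out = []
    pos = 0
    for m in _GROUPED.finditer(text):
        # Escape the plain text between matches so it is safe inside SSML.
        out.append(escape(text[pos:m.start()]))
        digits = m.group(0).replace(",", "")
        out.append(f'<say-as interpret-as="cardinal">{digits}</say-as>')
        pos = m.end()
    out.append(escape(text[pos:]))
    return "".join(out)


print(wrap_grouped_numbers("The total is 39,873."))
```

The returned string is only a fragment; it still has to be placed inside the usual <speak> and <voice> elements before it is sent to the service.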

For specific troublesome numbers, wrap the original digits in <sub alias="thirty-two thousand seven hundred sixty-eight">32768</sub>. This forces the US style. (Trade-off: more billable characters because the alias text counts.)
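
If there are too many such numbers to alias by hand, the <sub> approach could be automated. The following is a minimal Python sketch, not an official method: us_cardinal and alias_grouped_numbers are hypothetical helper names, and the sketch assumes that only comma-grouped numbers (such as 32,768) should be rewritten, so that years, ZIP codes, and phone numbers keep the readings Azure already gets right. The billable-character trade-off mentioned above still applies.

```python
import re
from xml.sax.saxutils import escape

_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]
_SCALES = ["", "thousand", "million", "billion", "trillion"]

# Hypothetical pattern: one to four comma groups, so values stay below 10**15.
_GROUPED = re.compile(r"(?<![\d.,])\d{1,3}(?:,\d{3}){1,4}(?!\d|[.,]\d)")


def _below_thousand(n: int) -> str:
    """Spell out 1..999 in US style (no "and" after "hundred")."""
    parts = []
    if n >= 100:
        parts.append(_ONES[n // 100] + " hundred")
        n %= 100
    if n >= 20:
        word = _TENS[n // 10]
        if n % 10:
            word += "-" + _ONES[n % 10]
        parts.append(word)
    elif n > 0:
        parts.append(_ONES[n])
    return " ".join(parts)


def us_cardinal(n: int) -> str:
    """Spell out a non-negative integer below 10**15 without "and"."""
    if n < 0 or n >= 1000 ** len(_SCALES):
        raise ValueError("value out of supported range")
    if n == 0:
        return "zero"
    groups = []
    scale = 0
    while n:
        n, chunk = divmod(n, 1000)
        if chunk:
            words = _below_thousand(chunk)
            if _SCALES[scale]:
                words += " " + _SCALES[scale]
            groups.append(words)
        scale += 1
    return " ".join(reversed(groups))


def alias_grouped_numbers(text: str) -> str:
    """Wrap comma-grouped numbers in <sub alias="..."> with US-style wording."""
    out = []
    pos = 0
    for m in _GROUPED.finditer(text):
        out.append(escape(text[pos:m.start()]))
        value = int(m.group(0).replace(",", ""))
        # The alias contains only letters, spaces, and hyphens, so it needs
        # no attribute escaping.
        out.append(f'<sub alias="{us_cardinal(value)}">{m.group(0)}</sub>')
        pos = m.end()
    out.append(escape(text[pos:]))
    return "".join(out)


print(us_cardinal(32768))  # thirty-two thousand seven hundred sixty-eight
print(alias_grouped_numbers("Buffer size: 32,768 bytes"))
```

Restricting the rewrite to comma-grouped tokens is a deliberate design choice here: a bare run of digits such as 98052 is ambiguous, so the sketch leaves it for the service's own normalization.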
