Azure en-US neural voices read large numbers incorrectly

Jason Horner 0 Reputation points
2025-08-21T18:17:54.5133333+00:00

We are having a problem with the way Azure TTS reads out numbers for United States English neural voices. The Azure TTS engine appears to read 32,768 as "thirty-two thousand seven hundred and sixty-eight" for all English locales. This is the correct reading for the United Kingdom and Australian English locales, but it is not correct for the United States, where it should say "thirty-two thousand seven hundred sixty-eight" (with no "and"). Attempts to use SSML to force the number to be pronounced a certain way have not helped. For example:
<say-as interpret-as="number">39873</say-as> --> reads "thirty nine thousand eight hundred and seventy three"
<say-as interpret-as="number">39,873</say-as> --> reads "thirty nine (pause) eight seventy three"

Using interpret-as values of cardinal, math, fraction, and alphanumeric produces similarly unhelpful results.

We know we can programmatically replace the digits with words before sending the text to Azure TTS, but that's a suboptimal solution, at best. That would increase our billable character counts, plus it could cause problems with years, ZIP (postal) codes, and phone numbers -- things that Azure reads correctly now.

Is there any way this can be fixed for United States English neural voices? Or a way to configure this with an option?

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. Amira Bedhiafi 36,716 Reputation points Volunteer Moderator
    2025-08-25T16:10:10.78+00:00

    Hello Jason!

    Thank you for posting on Microsoft Learn.

    There isn’t a switch today that tells Azure en-US neural voices to drop the "and" in cardinals. The US/UK preference here is controlled by the service's text-normalization rules, and the SSML <say-as interpret-as="number|cardinal"> element doesn’t expose an option to change that behavior. https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-synthesis-markup-pronunciation

    This same issue has been raised before and there isn’t an official toggle exposed today. If this is critical for you, open a support ticket and reference the US cardinal formatting request so it can be prioritized by the Speech team.

    For other numeric types, use the dedicated modes to avoid odd readings: number_digit (read digits), telephone, currency, address, date/time, fraction... These do work as intended in en-US. https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-synthesis-markup-pronunciation

    Avoid commas in plain numerals when you want a cardinal reading, since commas can cause segmentation or pauses (“39,873” may split). Prefer 39873 wrapped in <say-as interpret-as="cardinal">.
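
As a rough illustration of the comma advice above, here is a minimal Python sketch that strips thousands separators and wraps the digits in a say-as cardinal element before the text is sent to the service. The function name `cardinal_say_as` is hypothetical and not part of any Azure SDK. Based on the behavior reported in this thread, this should avoid the pause but the en-US voice may still insert "and".

```python
import re


def cardinal_say_as(number_text: str) -> str:
    """Strip thousands separators and wrap the digits in a say-as cardinal tag.

    Hypothetical helper for illustration only; it builds an SSML fragment
    and does not call the Speech service.
    """
    digits = number_text.replace(",", "")
    # Only accept plain ASCII integers so nothing unexpected lands in the SSML.
    if not re.fullmatch(r"[0-9]+", digits):
        raise ValueError(f"not a plain integer: {number_text!r}")
    return f'<say-as interpret-as="cardinal">{digits}</say-as>'


if __name__ == "__main__":
    print(cardinal_say_as("39,873"))
```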

    For specific troublesome numbers, wrap the original digits in <sub alias="thirty-two thousand seven hundred sixty-eight">32768</sub>. This forces the US style. (Trade-off: more billable characters because the alias text counts.)
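
If you need the alias workaround for more than a handful of numbers, it can be generated in code. The sketch below is one possible approach, not an official one: `us_cardinal` and `add_us_aliases` are hypothetical names, and the number-to-words logic is hand-written here rather than taken from the Speech service. It assumes that only comma-grouped integers such as "32,768" should be rewritten, so bare digit runs like years, ZIP codes, and phone numbers are left for Azure to handle as it does now.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
SCALES = ["", "thousand", "million", "billion", "trillion"]

# Assumption: only comma-grouped integers are treated as cardinals.
# Decimals such as "1,234.56" and bare digit runs are skipped.
GROUPED = re.compile(r"(?<![\d,.])\d{1,3}(?:,\d{3})+(?![\d,]|\.\d)")


def _below_thousand(n: int) -> str:
    """Spell out 0..999 in US style (no "and" after "hundred")."""
    parts = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        parts.append(f"{ONES[hundreds]} hundred")
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        parts.append(TENS[tens] + (f"-{ONES[ones]}" if ones else ""))
    elif rest or not hundreds:
        parts.append(ONES[rest])
    return " ".join(parts)


def us_cardinal(n: int) -> str:
    """Spell out a non-negative integer below 10**15 in US style."""
    if n < 0 or n >= 1000 ** len(SCALES):
        raise ValueError(f"out of supported range: {n}")
    if n == 0:
        return "zero"
    groups = []
    scale = 0
    while n:
        n, group = divmod(n, 1000)
        if group:
            groups.append(f"{_below_thousand(group)} {SCALES[scale]}".strip())
        scale += 1
    return " ".join(reversed(groups))


def add_us_aliases(text: str) -> str:
    """Wrap comma-grouped integers in <sub alias="..."> with US wording."""
    def repl(match: re.Match) -> str:
        digits = match.group(0).replace(",", "")
        try:
            words = us_cardinal(int(digits))
        except ValueError:
            return match.group(0)  # too large for this sketch; leave as-is
        return f'<sub alias="{words}">{digits}</sub>'
    return GROUPED.sub(repl, text)


if __name__ == "__main__":
    print(add_us_aliases("The buffer holds 32,768 samples."))
```

The function only touches the numbers; the surrounding text still has to be XML-escaped and placed inside a full SSML document by the caller. As noted above, the alias text adds to the billable character count.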

