Configure Conversation – Clean Speech 🗣️💬

Normalize Text for Speech Synthesis

The Normalize Text for Speech Synthesis feature converts certain parts of the text (such as numbers, currencies, or dates) into their spoken form. This makes speech synthesis more consistent, because TTS models can misread text that has not been normalized.

For example, before generating the audio, the phrase:

“Call my number 2137112342 on Jul 5th, 2024 for the $24.12 payment”

will be transformed into:

“Call my number two one three seven one one two three four two on July fifth, twenty twenty four for the twenty four dollars twelve cents payment”
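
To make the transformation concrete, here is a minimal Python sketch of rule-based normalization for two of the cases in the example: long digit runs and small dollar amounts. It is an illustration only, not the product's implementation. The function names, the regular expressions, and the choice to read runs of seven or more digits one digit at a time are all assumptions, and date handling is left out for brevity.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def two_digits(n: int) -> str:
    """Spell out an integer in the range 0-99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def spell_digits(match: re.Match) -> str:
    """Read a long digit run (e.g. a phone number) one digit at a time."""
    return " ".join(ONES[int(d)] for d in match.group(0))


def spell_dollars(match: re.Match) -> str:
    """Read $D.CC as 'D dollars CC cents' (amounts under $100 only)."""
    dollars, cents = int(match.group(1)), int(match.group(2))
    return f"{two_digits(dollars)} dollars {two_digits(cents)} cents"


def normalize(text: str) -> str:
    # Currency first, so its digits are never seen by the digit-run rule.
    text = re.sub(r"\$(\d{1,2})\.(\d{2})(?!\d)", spell_dollars, text)
    # Assumption: runs of 7+ digits are phone-number-like sequences.
    text = re.sub(r"\b\d{7,}\b", spell_digits, text)
    return text


print(normalize("Call my number 2137112342 for the $24.12 payment"))
```

Applying the rules in a fixed order matters in a sketch like this: if the digit-run rule ran first on a larger amount, it could consume digits that belong to a currency expression.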

Note that this feature adds a small amount of latency (approximately 100 ms) to the overall process.


Language Configuration

Currently, speech normalization is supported in the following languages:

  • English

  • Spanish

  • French

  • German

For other languages, this feature does not modify the text. If you select a specific language rather than the multilingual option, normalization uses that language’s rules (for example, “1” is normalized to “one” if English is selected). If you select the multilingual option, the system automatically detects the language of the generated text and normalizes it accordingly.
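
The selection logic described above can be sketched in a few lines of Python. This is a hedged illustration, not the actual configuration API: the language codes, the "multilingual" setting value, the function names, and the stand-in detector are all hypothetical, and the only normalization rule shown is spelling out a standalone "1".

```python
from typing import Callable, Optional

# Assumed codes for the four supported languages, with their word for "1".
ONE_IN = {"en": "one", "es": "uno", "fr": "un", "de": "eins"}


def resolve_language(selected: str, text: str,
                     detect: Callable[[str], str]) -> Optional[str]:
    """Pick the language whose rules apply, or None if unsupported."""
    # The hypothetical "multilingual" setting defers to detection on the text.
    lang = detect(text) if selected == "multilingual" else selected
    return lang if lang in ONE_IN else None


def normalize_one(selected: str, text: str,
                  detect: Callable[[str], str]) -> str:
    """Toy rule: spell out a standalone "1" using the resolved language."""
    lang = resolve_language(selected, text, detect)
    if lang is None:
        return text  # unsupported language: leave the text unchanged
    return " ".join(ONE_IN[lang] if w == "1" else w for w in text.split())


def fake_detect(text: str) -> str:
    # Stand-in detector for the demo; a real one would inspect the text.
    return "es"


print(normalize_one("en", "press 1", fake_detect))            # press one
print(normalize_one("multilingual", "pulse 1", fake_detect))  # pulse uno
print(normalize_one("it", "premi 1", fake_detect))            # premi 1
```

Passing the detector in as a parameter keeps the sketch independent of any particular language-detection library.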
