Configure Conversation – Clean Speech 🗣️💬

Normalize Text for Speech Synthesis

The Normalize Text for Speech Synthesis feature converts parts of the text, such as numbers, currencies, and dates, into their spoken form before the audio is generated. This makes speech synthesis more consistent, because TTS models can misread text that has not been normalized.

For example, before generating the audio, the phrase:

“Call my number 2137112342 on Jul 5th, 2024 for the $24.12 payment”

will be transformed into:

“Call my number two one three seven one one two three four two on July fifth, twenty twenty four for the twenty four dollars twelve cents payment”
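The transformation above can be pictured as a series of pattern-based rewrites. The following is a minimal sketch in Python, not the product's actual implementation: the function names (`normalize_sketch`, `spell_digits`, `two_digit_words`) are hypothetical, it covers only long digit runs and dollar amounts under 100 dollars, and it leaves dates and every other case untouched.

```python
import re

# Word tables for English; names and structure are illustrative assumptions.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def two_digit_words(n: int) -> str:
    """Spell out an integer in the range 0-99."""
    if not 0 <= n <= 99:
        raise ValueError("only 0-99 is supported in this sketch")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def spell_digits(digits: str) -> str:
    """Read a digit string one digit at a time, as for a phone number."""
    return " ".join(ONES[int(d)] for d in digits)


def _money(match: re.Match) -> str:
    """Rewrite a matched dollar amount as spoken dollars and cents."""
    dollars, cents = int(match.group(1)), int(match.group(2))
    dollar_unit = "dollar" if dollars == 1 else "dollars"
    cent_unit = "cent" if cents == 1 else "cents"
    return (f"{two_digit_words(dollars)} {dollar_unit} "
            f"{two_digit_words(cents)} {cent_unit}")


def normalize_sketch(text: str) -> str:
    """Apply the two illustrative rules; everything else passes through."""
    # Currency first, so the digits inside "$24.12" are handled as money.
    text = re.sub(r"\$(\d{1,2})\.(\d{2})\b", _money, text)
    # Runs of seven or more digits are read digit by digit.
    text = re.sub(r"\b\d{7,}\b", lambda m: spell_digits(m.group(0)), text)
    return text


print(normalize_sketch("Call my number 2137112342 for the $24.12 payment"))
```

The currency rule runs before the digit rule so that each span of text is rewritten by the most specific pattern that matches it. A production normalizer would need many more rules (dates, ordinals, larger amounts, other currencies) and per-language word tables.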

Note that this feature adds a small amount of latency (approximately 100 ms) to the overall process.

Language Configuration

Currently, speech normalization is supported in the following languages:

English
Spanish
French
German

For any other language, this feature leaves the text unchanged. If you select a specific language rather than the multilingual option, normalization follows that language’s rules (for example, “1” is normalized to “one” when English is selected). If you select the multilingual option, the system automatically detects the language of the generated text and normalizes it accordingly.
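The language selection described above can be summarized as a small decision function. This is a sketch under stated assumptions: the language codes, the `"multi"` value, and the function name `resolve_normalization_language` are hypothetical, and the language detector is passed in by the caller because the detection method is not described here. It also assumes that a detected but unsupported language results in no normalization, in line with the behavior for unsupported languages.

```python
from typing import Callable, Optional

# Assumed codes for the four supported languages: English, Spanish, French, German.
SUPPORTED = frozenset({"en", "es", "fr", "de"})
# Assumed identifier for the multilingual option.
MULTILINGUAL = "multi"


def resolve_normalization_language(
    configured: str,
    text: str,
    detect: Callable[[str], str],
) -> Optional[str]:
    """Return the language whose rules apply, or None to leave text unchanged."""
    # Multilingual: detect from the generated text; otherwise use the setting.
    language = detect(text) if configured == MULTILINGUAL else configured
    # Unsupported languages get no normalization.
    return language if language in SUPPORTED else None


# Usage with a stand-in detector that always reports French.
always_french = lambda _text: "fr"
print(resolve_normalization_language("en", "1", always_french))     # en
print(resolve_normalization_language("multi", "1", always_french))  # fr
print(resolve_normalization_language("ja", "1", always_french))     # None
```

Returning `None` for the unsupported case keeps the "no modification" behavior explicit for the caller, rather than silently falling back to a default language.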


Last updated 8 months ago
