Configure Conversation - Clean Speech 🗣️💬
Normalize Text for Speech Synthesis
The Normalize Text for Speech Synthesis feature converts parts of the text, such as numbers, currencies, and dates, into their spoken form. This makes speech synthesis more consistent, because TTS models can misread text that has not been normalized.
For example, before generating the audio, the phrase:
“Call my number 2137112342 on Jul 5th, 2024 for the $24.12 payment”
will be transformed into:
“Call my number two one three seven one one two three four two on July fifth, twenty twenty four for the twenty four dollars twelve cents payment”
Note that this feature adds a small amount of latency (approximately 100 ms) to the overall process.
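To make the transformation above concrete, here is a minimal rule-based sketch for English only. It is not how the feature is implemented. The rules (reading runs of seven or more digits one digit at a time, spelling out `$` amounts up to 999,999 as dollars and cents, and expanding abbreviated "Month Day, Year" dates) and every function name, such as `normalize_en`, are assumptions chosen so that the example phrase comes out as shown.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
MONTHS = {"Jan": "January", "Feb": "February", "Mar": "March", "Apr": "April",
          "May": "May", "Jun": "June", "Jul": "July", "Aug": "August",
          "Sep": "September", "Oct": "October", "Nov": "November",
          "Dec": "December"}
# Ordinals that are not formed by simply appending "th".
IRREGULAR_ORDINALS = {"one": "first", "two": "second", "three": "third",
                      "five": "fifth", "eight": "eighth", "nine": "ninth",
                      "twelve": "twelfth"}


def cardinal(n: int) -> str:
    """Spell out 0-999,999 without hyphens, e.g. 24 -> 'twenty four'."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        head = f"{ONES[hundreds]} hundred"
        return head if rest == 0 else f"{head} {cardinal(rest)}"
    thousands, rest = divmod(n, 1000)
    head = f"{cardinal(thousands)} thousand"
    return head if rest == 0 else f"{head} {cardinal(rest)}"


def ordinal(n: int) -> str:
    """Spell out an ordinal day number, e.g. 5 -> 'fifth', 20 -> 'twentieth'."""
    words = cardinal(n).split()
    last = words[-1]
    if last in IRREGULAR_ORDINALS:
        words[-1] = IRREGULAR_ORDINALS[last]
    elif last.endswith("y"):
        words[-1] = last[:-1] + "ieth"
    else:
        words[-1] = last + "th"
    return " ".join(words)


def year(n: int) -> str:
    """Read a four-digit year in pairs, e.g. 2024 -> 'twenty twenty four'."""
    if 2000 <= n <= 2009:
        return "two thousand" + ("" if n == 2000 else f" {ONES[n - 2000]}")
    high, low = divmod(n, 100)
    if low == 0:
        return f"{cardinal(high)} hundred"
    if low < 10:
        return f"{cardinal(high)} oh {ONES[low]}"
    return f"{cardinal(high)} {cardinal(low)}"


def _date(m: re.Match) -> str:
    return f"{MONTHS[m.group(1)]} {ordinal(int(m.group(2)))}, {year(int(m.group(3)))}"


def _money(m: re.Match) -> str:
    dollars, cents = int(m.group(1)), int(m.group(2) or 0)
    text = f"{cardinal(dollars)} dollar{'' if dollars == 1 else 's'}"
    if cents:
        text += f" {cardinal(cents)} cent{'' if cents == 1 else 's'}"
    return text


def _digits(m: re.Match) -> str:
    # Long digit runs (phone numbers, IDs) are read one digit at a time.
    return " ".join(ONES[int(d)] for d in m.group(0))


DATE_RE = re.compile(r"\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.? "
                     r"(\d{1,2})(?:st|nd|rd|th)?, (\d{4})\b")
MONEY_RE = re.compile(r"\$(\d{1,6})(?!\d)(?:\.(\d{2}))?")  # up to $999,999.99
DIGITS_RE = re.compile(r"\b\d{7,}\b")


def normalize_en(text: str) -> str:
    """Apply the rules in order: dates, then currency, then long digit runs."""
    text = DATE_RE.sub(_date, text)
    text = MONEY_RE.sub(_money, text)
    return DIGITS_RE.sub(_digits, text)
```

Calling `normalize_en` on the example phrase returns the normalized sentence shown above. A production normalizer also has to resolve ambiguity from context (for example, whether a number is a quantity, a year, or an identifier), which fixed regular expressions like these cannot do reliably.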
Language Configuration
Currently, speech normalization is supported in the following languages:
English
Spanish
French
German
For any other language, the feature leaves the text unchanged. If you select a specific language rather than the multilingual option, normalization follows that language's rules (for example, with English selected, “1” is normalized to “one”). If you select the multilingual option, the system detects the language of the generated text and normalizes it accordingly.
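The language rules above amount to a small dispatch step, sketched here. It assumes ISO 639-1 codes for the four supported languages, a hypothetical `"multilingual"` identifier for the multilingual option, and a caller-supplied `detect` function standing in for whatever language detection the system actually uses; none of these names come from the product.

```python
from typing import Callable, Dict

# Assumed ISO 639-1 codes for English, Spanish, French, and German.
SUPPORTED = {"en", "es", "fr", "de"}
# Hypothetical identifier for the multilingual option.
MULTILINGUAL = "multilingual"


def apply_normalization(selected: str, text: str,
                        detect: Callable[[str], str],
                        normalizers: Dict[str, Callable[[str], str]]) -> str:
    """Normalize `text` following the language rules described above.

    selected:    the configured language code, or MULTILINGUAL
    detect:      hypothetical language detector, used only for MULTILINGUAL
    normalizers: per-language normalization functions, keyed by language code
    """
    # A fixed language uses its own rules; the multilingual option detects
    # the language from the generated text instead.
    language = detect(text) if selected == MULTILINGUAL else selected
    normalizer = normalizers.get(language) if language in SUPPORTED else None
    # Unsupported languages (or a missing normalizer) leave the text unchanged.
    return normalizer(text) if normalizer else text
```

Keeping detection separate from the per-language rules mirrors the description: a fixed language skips detection entirely, and anything outside the supported set passes through untouched.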