Text-to-speech service implementation using NVIDIA Riva
Set your NVIDIA API key in the `NVIDIA_API_KEY` environment variable.
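
As a minimal sketch of reading that variable before constructing the service, the helper below fails fast when the key is missing. The function name `load_nvidia_api_key` is hypothetical and not part of any library.

```python
import os


def load_nvidia_api_key(env=os.environ):
    """Return the NVIDIA API key from the environment, or raise if it is unset."""
    # Strip whitespace so a value like " abc " or an empty string is handled cleanly.
    key = env.get("NVIDIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError("NVIDIA_API_KEY is not set")
    return key
```

Passing a plain dict as `env` makes the helper easy to exercise without touching the real environment.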
Input frames:

- `TextFrame` - Text content to synthesize into speech
- `TTSSpeakFrame` - Text that should be spoken immediately
- `TTSUpdateSettingsFrame` - Runtime configuration updates
- `LLMFullResponseStartFrame` / `LLMFullResponseEndFrame` - LLM response boundaries

Output frames:

- `TTSStartedFrame` - Signals start of synthesis
- `TTSAudioRawFrame` - Generated audio data chunks (streaming)
- `TTSStoppedFrame` - Signals completion of synthesis
- `ErrorFrame` - API or processing errors

Model | Description | Best For |
---|---|---|
magpie-tts-multilingual | Multilingual model with natural voices | Conversational AI, multiple languages |
fastpitch-hifigan-tts | High-quality English synthesis | English-only applications |

The `magpie-tts-multilingual` model is the default and is recommended for most use cases because of its multilingual capabilities and natural voice quality.

The `magpie-tts-multilingual` model supports:

Language Code | Description | Service Code |
---|---|---|
Language.EN_US | English (US) | en-US |
Language.ES_US | Spanish (US) | es-US |
Language.FR_FR | French (France) | fr-FR |
Language.DE_DE | German (Germany) | de-DE |
Language.IT_IT | Italian (Italy) | it-IT |
Language.ZH_CN | Chinese (China) | zh-CN |
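
The mapping in the table above can be sketched as a plain lookup. This is an illustration only: the dictionary is copied from the table, and `SUPPORTED_LANGUAGES` and `to_service_code` are hypothetical names, not part of the service's API.

```python
# Service codes copied from the table above, keyed by the Language member name.
SUPPORTED_LANGUAGES = {
    "EN_US": "en-US",
    "ES_US": "es-US",
    "FR_FR": "fr-FR",
    "DE_DE": "de-DE",
    "IT_IT": "it-IT",
    "ZH_CN": "zh-CN",
}


def to_service_code(language_name: str) -> str:
    """Translate a Language member name (e.g. "FR_FR") into its service code."""
    try:
        return SUPPORTED_LANGUAGES[language_name]
    except KeyError:
        raise ValueError(f"Unsupported language: {language_name}") from None
```

Raising on unknown names makes an unsupported language visible early, before any synthesis request is sent.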

Settings can be updated at runtime by pushing a `TTSUpdateSettingsFrame` for the `RivaTTSService`.
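
A rough sketch of building such a frame follows. It assumes the frame accepts a `settings` mapping and that a `"voice"` key is honored; neither is confirmed by this page, and the voice ID is a placeholder. The fallback class is a stand-in so the sketch runs without the library installed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

try:
    from pipecat.frames.frames import TTSUpdateSettingsFrame
except ImportError:
    # Stand-in with the assumed shape, used only when pipecat is not installed.
    @dataclass
    class TTSUpdateSettingsFrame:
        settings: Dict[str, Any] = field(default_factory=dict)


def make_voice_update(voice_id: str) -> TTSUpdateSettingsFrame:
    """Build a settings-update frame that asks the TTS service to switch voices."""
    # The "voice" key is an assumption; check which keys the service accepts.
    return TTSUpdateSettingsFrame(settings={"voice": voice_id})


frame = make_voice_update("your-new-voice-id")
```

In a running pipeline the frame would typically be queued on the pipeline task, for example with `await task.queue_frame(frame)`, where `task` is assumed to be the application's existing pipeline task.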

- The model is selected through `model_function_map` during construction
- `FastPitchTTSService` is deprecated - use `RivaTTSService` instead
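
As an illustration of preparing a `model_function_map` before construction, the sketch below validates the model name against the two models listed on this page. The key names `function_id` and `model_name` are an assumption about the map's shape, `build_model_function_map` is a hypothetical helper, and the function ID must come from your own NVIDIA deployment.

```python
# Model names taken from the model table on this page.
KNOWN_MODELS = ("magpie-tts-multilingual", "fastpitch-hifigan-tts")


def build_model_function_map(
    function_id: str, model_name: str = "magpie-tts-multilingual"
) -> dict:
    """Return a model_function_map dict, rejecting unknown models and empty IDs."""
    if model_name not in KNOWN_MODELS:
        raise ValueError(f"Unknown model: {model_name}")
    if not function_id:
        raise ValueError("function_id must not be empty")
    # Key names are assumed; confirm them against the service's constructor.
    return {"function_id": function_id, "model_name": model_name}
```

The resulting dict would then be passed as the `model_function_map` argument when the service is constructed, since the notes above say the model is chosen at that point.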