AWS Nova Sonic
Real-time speech-to-speech service implementation using AWS Nova Sonic
The AWSNovaSonicLLMService enables natural, real-time conversations with AWS Nova Sonic. It provides built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences:
- Real-time Interaction: stream audio in real time with low-latency response times
- Speech Processing: built-in speech-to-text and text-to-speech capabilities with multiple voice options
- Voice Activity Detection: automatic detection of speech start/stop for natural conversations
- Context Management: intelligent handling of conversation history and system instructions
Installation
To use AWSNovaSonicLLMService, install the required dependencies:
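For example, with pip (the extra name here is an assumption; check the installation docs for your Pipecat version):

```shell
pip install "pipecat-ai[aws-nova-sonic]"
```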
We recommend setting up your AWS credentials as environment variables, as you’ll need them to initialize AWSNovaSonicLLMService:
AWS_SECRET_ACCESS_KEY
AWS_ACCESS_KEY_ID
AWS_REGION
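For example, in a shell session (the values below are placeholders):

```shell
export AWS_ACCESS_KEY_ID="..."       # your AWS access key ID
export AWS_SECRET_ACCESS_KEY="..."   # your AWS secret access key
export AWS_REGION="us-east-1"        # a region where Nova Sonic is available
```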
Basic Usage
Here’s a simple example of setting up a conversational AI bot with AWS Nova Sonic:
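A sketch of the setup is shown below. The import path, constructor parameter names, and the helper objects `transport` and `context_aggregator` are assumptions based on common Pipecat conventions; verify them against your installed version.

```python
import os

# Import path and parameter names are assumptions; check your Pipecat version.
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.aws_nova_sonic.aws import AWSNovaSonicLLMService

llm = AWSNovaSonicLLMService(
    secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    region=os.getenv("AWS_REGION"),
    voice_id="matthew",  # or "tiffany", "amy"
)

# `transport` and `context_aggregator` come from your transport and context
# setup, omitted here for brevity.
pipeline = Pipeline(
    [
        transport.input(),               # audio in from the user
        context_aggregator.user(),       # add user turns to the context
        llm,                             # speech-to-speech processing
        transport.output(),              # audio back to the user
        context_aggregator.assistant(),  # record assistant turns
    ]
)
```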
Configuration
Constructor Parameters
Your AWS secret access key
Your AWS access key ID
Specify the AWS region for the service (e.g., "us-east-1"). Note that the service may not be available in all AWS regions: check the support table in the AWS Bedrock User Guide.
AWS Nova Sonic model to use. Note that "amazon.nova-sonic-v1:0" is the only supported model as of 2025-05-08.
Voice for text-to-speech (options: "matthew", "tiffany", "amy")
Configuration for model parameters
High-level instructions that guide the model’s behavior. Note that more commonly these instructions will be included as part of the context provided to kick off the conversation.
List of function definitions for tool/function calling. Note that more commonly tools will be included as part of the context provided to kick off the conversation.
Whether to emit transcription frames
Model Parameters
The Params object configures the behavior of the AWS Nova Sonic model.
It is strongly recommended to stick with the default values (most easily by omitting params when constructing AWSNovaSonicLLMService) unless you have a good understanding of the parameters and their impact. Deviating from the defaults may lead to unexpected behavior.
Controls randomness in responses. Range: 0.0 to 2.0
Maximum number of tokens to generate
Cumulative probability cutoff for token selection. Range: 0.0 to 1.0
Sample rate for input audio
Sample rate for output audio
Bit depth for input audio
Number of channels for input audio
Bit depth for output audio
Number of channels for output audio
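If you do need to override the defaults, a sketch might look like the following. The Params import path and field names are assumptions mirroring the parameter list above, and the values shown are illustrative, not recommended settings:

```python
import os

# Import path and field names are assumptions; check your Pipecat version.
from pipecat.services.aws_nova_sonic.aws import AWSNovaSonicLLMService, Params

# Omitting `params` entirely keeps the recommended defaults.
params = Params(
    temperature=0.7,  # randomness, range 0.0 to 2.0
    max_tokens=1024,  # cap on generated tokens
    top_p=0.9,        # cumulative probability cutoff, range 0.0 to 1.0
)

llm = AWSNovaSonicLLMService(
    secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    region=os.getenv("AWS_REGION"),
    params=params,
)
```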
Frame Types
Input Frames
Raw audio data for speech input
Contains conversation context
Signals the bot has stopped speaking
Output Frames
Generated speech audio
Signals the start of a response from the LLM
Signals the end of a response from the LLM
Signals start of speech synthesis (coincides with the start of the LLM response, as this is a speech-to-speech model)
Signals end of speech synthesis (coincides with the end of the LLM response, as this is a speech-to-speech model)
Generated text responses from the LLM
Generated text responses
Speech transcriptions. Only output if send_transcription_frames is True.
Function Calling
This service supports function calling (also known as tool calling), which allows the LLM to request information from external services and APIs. For example, you can enable your bot to:
- Check current weather conditions
- Query databases
- Access external APIs
- Perform custom actions
See the Function Calling guide for:
- Detailed implementation instructions
- Provider-specific function definitions
- Handler registration examples
- Control over function call behavior
- Complete usage examples
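As a rough sketch, a weather handler could look like the code below. The shape of the `params` object (an `arguments` dict plus an async `result_callback`) is modeled on Pipecat's function-calling pattern, and the names `fetch_weather`, `get_current_weather`, and the `register_function` call are illustrative assumptions, not the service's exact API:

```python
import asyncio
from types import SimpleNamespace

# Hypothetical handler: reads the LLM-supplied arguments and hands a result
# back so the model can speak a response.
async def fetch_weather(params):
    location = params.arguments.get("location", "unknown")
    # A real bot would call an external weather API here.
    await params.result_callback({"conditions": "sunny", "location": location})

# Registration with the service would look roughly like:
#   llm.register_function("get_current_weather", fetch_weather)

# Exercise the handler with a minimal stand-in for the params object:
results = []

async def _collect(result):
    results.append(result)

asyncio.run(
    fetch_weather(
        SimpleNamespace(arguments={"location": "Paris"}, result_callback=_collect)
    )
)
print(results[0])
```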
Next Steps
Examples
- Foundational Example: basic implementation showing core features
- Persistent Content Example: implementation showing saving and loading conversation history