Overview
AWSNovaSonicLLMService enables natural, real-time conversations with AWS Nova Sonic. It provides built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences with bidirectional audio streaming, text generation, and function calling capabilities.
AWS Nova Sonic API Reference
Pipecat’s API methods for AWS Nova Sonic integration
Example Implementation
Complete AWS Nova Sonic conversation example
AWS Bedrock Documentation
Official AWS Bedrock and Nova Sonic documentation
AWS Console
Access AWS Bedrock and manage Nova Sonic models
Installation
To use AWS Nova Sonic services, install the required dependencies:
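A typical install uses Pipecat's AWS Nova Sonic extra (the extra name below is taken from the Pipecat documentation; confirm it against your Pipecat version):

```shell
pip install "pipecat-ai[aws-nova-sonic]"
```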
Prerequisites

AWS Account Setup
Before using AWS Nova Sonic services, you need:

- AWS Account: Set up at the AWS Console
- Bedrock Access: Enable AWS Bedrock service in your region
- Model Access: Request access to Nova Sonic models in Bedrock
- IAM Credentials: Configure AWS access keys with Bedrock permissions
Required Environment Variables
- AWS_SECRET_ACCESS_KEY: Your AWS secret access key
- AWS_ACCESS_KEY_ID: Your AWS access key ID
- AWS_REGION: AWS region where Bedrock is available
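For example, the variables can be exported in your shell before starting the application (the values below are placeholders):

```shell
export AWS_SECRET_ACCESS_KEY="your-secret-access-key"
export AWS_ACCESS_KEY_ID="your-access-key-id"
export AWS_REGION="us-east-1"
```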
Key Features
- Real-time Speech-to-Speech: Direct audio input to audio output processing
- Built-in Transcription: Automatic speech-to-text with real-time streaming
- Voice Activity Detection: Automatic detection of speech start/stop
- Function Calling: Support for external function and API integration
- Multiple Voices: Choose from matthew, tiffany, and amy voice options
Configuration
AWSNovaSonicLLMService
AWS secret access key for authentication.
AWS access key ID for authentication.
AWS session token for temporary credentials (e.g., when using AWS STS).
AWS region where the service is hosted. Supported regions for Nova 2 Sonic (default): "us-east-1", "us-west-2", "ap-northeast-1". Supported regions for Nova Sonic (older model): "us-east-1", "ap-northeast-1".

Model identifier. Use "amazon.nova-2-sonic-v1:0" for the latest model or "amazon.nova-sonic-v1:0" for the older model.

Voice ID for speech synthesis. Some voices are designed for specific languages. See AWS Nova 2 Sonic voice support for available voices.
Model parameters for audio configuration and inference. See Params below.
System-level instruction for the model.
Available tools/functions for the model to use.
Params
Audio and inference parameters that can be set at initialization via the params constructor argument.
| Parameter | Type | Default | Description |
|---|---|---|---|
| input_sample_rate | int | 16000 | Audio input sample rate in Hz. |
| input_sample_size | int | 16 | Audio input sample size in bits. |
| input_channel_count | int | 1 | Number of input audio channels. |
| output_sample_rate | int | 24000 | Audio output sample rate in Hz. |
| output_sample_size | int | 16 | Audio output sample size in bits. |
| output_channel_count | int | 1 | Number of output audio channels. |
| max_tokens | int | 1024 | Maximum number of tokens to generate. |
| top_p | float | 0.9 | Nucleus sampling parameter. |
| temperature | float | 0.7 | Sampling temperature for text generation. |
| endpointing_sensitivity | str | None | Controls how quickly Nova Sonic decides the user has stopped speaking. Values: "LOW", "MEDIUM", or "HIGH". Only supported with Nova 2 Sonic (the default model). |
Usage
Basic Setup
With Custom Parameters
With Function Calling
Notes
- Model versions: Nova 2 Sonic (amazon.nova-2-sonic-v1:0) is the default and recommended model. The older Nova Sonic (amazon.nova-sonic-v1:0) has fewer features and requires an assistant response trigger mechanism.
- Endpointing sensitivity: Only supported with Nova 2 Sonic. Controls how quickly the model decides the user has stopped speaking; "HIGH" causes the model to respond most quickly.
- Transcription frames: User speech transcription frames are always emitted upstream.
- Connection resilience: If a connection error occurs while the service wants to stay connected, it automatically resets the conversation and reconnects.
- System instruction and tools precedence: Instructions and tools provided in the LLM context take precedence over those provided at initialization time.
- Audio format: Uses LPCM (Linear PCM) audio format for both input and output. Input defaults to 16kHz and output defaults to 24kHz.