The AWSNovaSonicLLMService enables natural, real-time conversations with AWS Nova Sonic, with built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences. It provides:
  • Real-time interaction: stream audio in real time with low-latency responses
  • Speech processing: built-in speech-to-text and text-to-speech capabilities with multiple voice options
  • Voice activity detection: automatic detection of speech start/stop for natural conversations
  • Context management: intelligent handling of conversation history and system instructions

Installation

To use AWSNovaSonicLLMService, install the required dependencies:
pip install "pipecat-ai[aws-nova-sonic]"
We recommend setting up your AWS credentials as environment variables, as you’ll need them to initialize AWSNovaSonicLLMService:
  • AWS_SECRET_ACCESS_KEY
  • AWS_ACCESS_KEY_ID
  • AWS_REGION

Basic Usage

Here’s a simple example of setting up a conversational AI bot with AWS Nova Sonic:
import os

from pipecat.services.aws_nova_sonic.aws import AWSNovaSonicLLMService

llm = AWSNovaSonicLLMService(
    secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    region=os.getenv("AWS_REGION"),
    voice_id="tiffany",  # Voices: matthew, tiffany, amy
)
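
In a full application, the service sits in a Pipecat pipeline between a transport's input and output, with a context aggregator maintaining conversation history. The following is a minimal sketch of that wiring, assuming a transport object created elsewhere; the system message and pipeline order follow common Pipecat examples and should be adapted to your setup:

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext

# Conversation context; system instructions are most commonly provided here.
context = OpenAILLMContext(
    [{"role": "system", "content": "You are a friendly, concise voice assistant."}]
)
context_aggregator = llm.create_context_aggregator(context)

pipeline = Pipeline(
    [
        transport.input(),               # audio from the user
        context_aggregator.user(),       # record user turns in the context
        llm,                             # AWS Nova Sonic speech-to-speech
        transport.output(),              # audio back to the user
        context_aggregator.assistant(),  # record assistant turns in the context
    ]
)

task = PipelineTask(pipeline)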

Configuration

Constructor Parameters

secret_access_key (str, required)
  Your AWS secret access key.

access_key_id (str, required)
  Your AWS access key ID.

region (str, required)
  The AWS region for the service (e.g., "us-east-1"). The service may not be available in all AWS regions; check the support table in the Amazon Bedrock User Guide.

model (str, default: "amazon.nova-sonic-v1:0")
  AWS Nova Sonic model to use. Note that "amazon.nova-sonic-v1:0" is the only supported model as of 2025-05-08.

voice_id (str, default: "matthew")
  Voice for text-to-speech. Options: "matthew", "tiffany", "amy".

params (Params, optional)
  Configuration for model parameters. See Model Parameters below.

system_instruction (str, optional)
  High-level instructions that guide the model's behavior. More commonly, these instructions are included as part of the context provided to kick off the conversation.

tools (ToolsSchema, optional)
  Function definitions for tool/function calling. More commonly, tools are included as part of the context provided to kick off the conversation.

send_transcription_frames (bool, default: True)
  Whether to emit TranscriptionFrames.
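
Putting a few of the optional parameters together, a constructor call might look like the following sketch (the values are illustrative only):

import os

from pipecat.services.aws_nova_sonic.aws import AWSNovaSonicLLMService

llm = AWSNovaSonicLLMService(
    secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    region=os.getenv("AWS_REGION"),
    voice_id="amy",
    system_instruction="You are a concise, friendly voice assistant.",
    send_transcription_frames=True,  # emit TranscriptionFrames (the default)
)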

Model Parameters

The Params object configures the behavior of the AWS Nova Sonic model.
It is strongly recommended to stick with default values (most easily by omitting params when constructing AWSNovaSonicLLMService) unless you have a good understanding of the parameters and their impact. Deviating from the defaults may lead to unexpected behavior.
temperature (float, default: 0.7)
  Controls randomness in responses. Range: 0.0 to 2.0.

max_tokens (int, default: 1024)
  Maximum number of tokens to generate.

top_p (float, default: 0.9)
  Cumulative probability cutoff for token selection. Range: 0.0 to 1.0.

input_sample_rate (int, default: 16000)
  Sample rate for input audio, in Hz.

output_sample_rate (int, default: 24000)
  Sample rate for output audio, in Hz.

input_sample_size (int, default: 16)
  Bit depth for input audio.

input_channel_count (int, default: 1)
  Number of channels for input audio.

output_sample_size (int, default: 16)
  Bit depth for output audio.

output_channel_count (int, default: 1)
  Number of channels for output audio.
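
If you do need to override a value, construct a Params object and pass it as params. A minimal sketch, assuming Params is exported from pipecat.services.aws_nova_sonic.aws alongside the service (check your Pipecat version):

import os

from pipecat.services.aws_nova_sonic.aws import AWSNovaSonicLLMService, Params

# Override only what you need; unspecified fields keep their defaults.
params = Params(temperature=0.5, max_tokens=2048)

llm = AWSNovaSonicLLMService(
    secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    region=os.getenv("AWS_REGION"),
    params=params,
)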

Frame Types

Input Frames

InputAudioRawFrame
  Raw audio data for speech input.

OpenAILLMContextFrame
  Contains conversation context.

BotStoppedSpeakingFrame
  Signals that the bot has stopped speaking.

Output Frames

TTSAudioRawFrame
  Generated speech audio.

LLMFullResponseStartFrame
  Signals the start of a response from the LLM.

LLMFullResponseEndFrame
  Signals the end of a response from the LLM.

TTSStartedFrame
  Signals the start of speech synthesis (coincides with the start of the LLM response, as this is a speech-to-speech model).

TTSStoppedFrame
  Signals the end of speech synthesis (coincides with the end of the LLM response, as this is a speech-to-speech model).

LLMTextFrame
  Generated text responses from the LLM.

TTSTextFrame
  Generated text corresponding to the speech output.

TranscriptionFrame
  Speech transcriptions. Only emitted if send_transcription_frames is True.
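
To observe these frames, add a processor downstream of the service. The following is a sketch of a custom FrameProcessor (a hypothetical processor, shown only to illustrate frame flow) that logs TranscriptionFrames as they pass through the pipeline:

from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

class TranscriptionLogger(FrameProcessor):
    # Logs user speech transcriptions; place it after the LLM service.
    # Requires send_transcription_frames=True (the default).
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, TranscriptionFrame):
            print(f"User said: {frame.text}")
        # Always forward frames so downstream processors still receive them.
        await self.push_frame(frame, direction)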

Function Calling

This service supports function calling (also known as tool calling), which allows the LLM to request information from external services and APIs. For example, you can enable your bot to do things like the following (see the sketch after this list):
  • Check current weather conditions
  • Query databases
  • Access external APIs
  • Perform custom actions
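
As a brief sketch of the shape this takes (the weather tool, its schema values, and the handler body are hypothetical; the Function Calling guide is the authoritative reference), you define tools with Pipecat's ToolsSchema and register a handler on the service:

from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema

# Hypothetical weather tool, described with Pipecat's standard schema types.
weather_function = FunctionSchema(
    name="get_current_weather",
    description="Get the current weather for a location",
    properties={
        "location": {"type": "string", "description": "City and state"},
    },
    required=["location"],
)
tools = ToolsSchema(standard_tools=[weather_function])
# Pass `tools` via the constructor or, more commonly, in the conversation context.

async def fetch_weather(params):
    # Call your weather API here, then hand the result back to the LLM.
    await params.result_callback({"conditions": "sunny", "temperature": "75"})

llm.register_function("get_current_weather", fetch_weather)
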
See the Function Calling guide for:
  • Detailed implementation instructions
  • Provider-specific function definitions
  • Handler registration examples
  • Control over function call behavior
  • Complete usage examples

Next Steps

Examples