The AWSNovaSonicLLMService enables natural, real-time conversations with AWS Nova Sonic. It provides built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences, including:

Real-time Interaction

Stream audio in real time with low-latency responses

Speech Processing

Built-in speech-to-text and text-to-speech capabilities with multiple voice options

Voice Activity Detection

Automatic detection of speech start/stop for natural conversations

Context Management

Intelligent handling of conversation history and system instructions

Installation

To use AWSNovaSonicLLMService, install the required dependencies:

pip install "pipecat-ai[aws-nova-sonic]"

We recommend setting up your AWS credentials as environment variables, as you’ll need them to initialize AWSNovaSonicLLMService:

  • AWS_SECRET_ACCESS_KEY
  • AWS_ACCESS_KEY_ID
  • AWS_REGION
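
For local development, a common pattern is to keep these in a .env file and verify them at startup. A minimal sketch, assuming the optional python-dotenv package is installed (pip install python-dotenv):

import os

from dotenv import load_dotenv

load_dotenv()  # reads AWS_* variables from a local .env file, if present

# Fail fast if any required credential is missing
for var in ("AWS_SECRET_ACCESS_KEY", "AWS_ACCESS_KEY_ID", "AWS_REGION"):
    if not os.getenv(var):
        raise RuntimeError(f"Missing required environment variable: {var}")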

Basic Usage

Here’s a simple example of setting up a conversational AI bot with AWS Nova Sonic:

import os

from pipecat.services.aws_nova_sonic.aws import AWSNovaSonicLLMService

llm = AWSNovaSonicLLMService(
    secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    region=os.getenv("AWS_REGION"),
    voice_id="tiffany",  # Voices: matthew, tiffany, amy
)
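
In a full application, the service sits in a Pipecat pipeline between a transport and a pair of context aggregators. A rough sketch of that wiring, assuming transport is an already-configured Pipecat transport created elsewhere and omitting runner setup:

from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext

# Seed the conversation; system instructions typically live here rather
# than in the constructor's system_instruction parameter.
context = OpenAILLMContext(
    [{"role": "system", "content": "You are a friendly voice assistant."}]
)
context_aggregator = llm.create_context_aggregator(context)

pipeline = Pipeline([
    transport.input(),               # audio in from the user
    context_aggregator.user(),       # add user turns to the context
    llm,                             # AWS Nova Sonic (speech-to-speech)
    transport.output(),              # audio out to the user
    context_aggregator.assistant(),  # add bot turns to the context
])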

Configuration

Constructor Parameters

secret_access_key
str
required

Your AWS secret access key

access_key_id
str
required

Your AWS access key ID

region
str
required

The AWS region for the service (e.g., "us-east-1"). Note that the service may not be available in all AWS regions; check the supported-regions table in the AWS Bedrock User Guide.

model
str
default:"amazon.nova-sonic-v1:0"

AWS Nova Sonic model to use. Note that "amazon.nova-sonic-v1:0" is the only supported model as of 2025-05-08.

voice_id
str
default:"matthew"

Voice for text-to-speech (options: "matthew", "tiffany", "amy")

params
Params

Configuration for model parameters

system_instruction
str

High-level instructions that guide the model’s behavior. More commonly, these instructions are included as part of the context provided to kick off the conversation.

tools
ToolsSchema

List of function definitions for tool/function calling. More commonly, tools are included as part of the context provided to kick off the conversation.

send_transcription_frames
bool
default:"True"

Whether to emit transcription frames
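
If you prefer to configure the model directly rather than through the context, system_instruction and tools can be passed to the constructor. A hedged sketch (the get_current_weather tool is a made-up example for illustration):

import os

from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.services.aws_nova_sonic.aws import AWSNovaSonicLLMService

# Hypothetical tool definition, purely for illustration
weather_function = FunctionSchema(
    name="get_current_weather",
    description="Get the current weather for a location",
    properties={"location": {"type": "string"}},
    required=["location"],
)

llm = AWSNovaSonicLLMService(
    secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    region=os.getenv("AWS_REGION"),
    system_instruction="You are a concise, friendly voice assistant.",
    tools=ToolsSchema(standard_tools=[weather_function]),
    send_transcription_frames=True,
)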

Model Parameters

The Params object configures the behavior of the AWS Nova Sonic model.

It is strongly recommended to stick with default values (most easily by omitting params when constructing AWSNovaSonicLLMService) unless you have a good understanding of the parameters and their impact. Deviating from the defaults may lead to unexpected behavior.
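
If you do need to override a value, construct a Params object and pass it in. A minimal sketch, assuming Params is importable from the same module as the service:

import os

from pipecat.services.aws_nova_sonic.aws import AWSNovaSonicLLMService, Params

llm = AWSNovaSonicLLMService(
    secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    region=os.getenv("AWS_REGION"),
    params=Params(temperature=0.5, max_tokens=512),  # unset fields keep their defaults
)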

temperature
float
default:"0.7"

Controls randomness in responses. Range: 0.0 to 2.0

max_tokens
int
default:"1024"

Maximum number of tokens to generate

top_p
float
default:"0.9"

Cumulative probability cutoff for token selection. Range: 0.0 to 1.0

input_sample_rate
int
default:"16000"

Sample rate for input audio

output_sample_rate
int
default:"24000"

Sample rate for output audio

input_sample_size
int
default:"16"

Bit depth for input audio

input_channel_count
int
default:"1"

Number of channels for input audio

output_sample_size
int
default:"16"

Bit depth for output audio

output_channel_count
int
default:"1"

Number of channels for output audio
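
The audio fields above describe what the model expects: 16 kHz, 16-bit, mono input and 24 kHz, 16-bit, mono output. You’ll typically want your transport’s audio settings to line up with these defaults. One way to do that, sketched with Pipecat’s generic TransportParams (exact options vary by transport):

from pipecat.transports.base_transport import TransportParams

transport_params = TransportParams(
    audio_in_enabled=True,
    audio_out_enabled=True,
    audio_in_sample_rate=16000,   # matches input_sample_rate
    audio_out_sample_rate=24000,  # matches output_sample_rate
)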

Frame Types

Input Frames

InputAudioRawFrame
Frame

Raw audio data for speech input

OpenAILLMContextFrame
Frame

Contains conversation context

BotStoppedSpeakingFrame
Frame

Signals the bot has stopped speaking

Output Frames

TTSAudioRawFrame
Frame

Generated speech audio

LLMFullResponseStartFrame
Frame

Signals the start of a response from the LLM

LLMFullResponseEndFrame
Frame

Signals the end of a response from the LLM

TTSStartedFrame
Frame

Signals start of speech synthesis (coincides with the start of the LLM response, as this is a speech-to-speech model)

TTSStoppedFrame
Frame

Signals end of speech synthesis (coincides with the end of the LLM response, as this is a speech-to-speech model)

LLMTextFrame
Frame

Generated text responses from the LLM

TTSTextFrame
Frame

Generated text corresponding to the synthesized speech

TranscriptionFrame
Frame

Speech transcriptions. Only output if send_transcription_frames is True.
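
To observe these frames, for logging or analytics, you can place a small custom processor downstream of the service. A minimal sketch using Pipecat’s FrameProcessor base class:

from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

class TranscriptLogger(FrameProcessor):
    """Logs user speech transcriptions as they flow through the pipeline."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, TranscriptionFrame):
            print(f"Transcription: {frame.text}")
        await self.push_frame(frame, direction)

Add TranscriptLogger() to the pipeline after llm to see user transcriptions as they arrive.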

Function Calling

This service supports function calling (also known as tool calling), which allows the LLM to request information from external services and APIs. For example, you can enable your bot to:

  • Check current weather conditions
  • Query databases
  • Access external APIs
  • Perform custom actions
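
A hedged sketch of registering a handler for a tool like the get_current_weather schema shown earlier, assuming a recent Pipecat version where handlers receive a FunctionCallParams object:

from pipecat.services.llm_service import FunctionCallParams

async def fetch_weather(params: FunctionCallParams):
    location = params.arguments.get("location", "unknown")
    # ... call a real weather API here; this result is a stand-in ...
    await params.result_callback({"location": location, "conditions": "sunny"})

llm.register_function("get_current_weather", fetch_weather)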

See the Function Calling guide for:

  • Detailed implementation instructions
  • Provider-specific function definitions
  • Handler registration examples
  • Control over function call behavior
  • Complete usage examples

Next Steps

Examples