Overview

AWSNovaSonicLLMService enables natural, real-time conversations with AWS Nova Sonic. It provides built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences with bidirectional audio streaming, text generation, and function calling capabilities.

Installation

To use AWS Nova Sonic services, install the required dependencies:
pip install "pipecat-ai[aws-nova-sonic]"

Prerequisites

AWS Account Setup

Before using AWS Nova Sonic services, you need:
  1. AWS Account: Set up at AWS Console
  2. Bedrock Access: Enable AWS Bedrock service in your region
  3. Model Access: Request access to Nova Sonic models in Bedrock
  4. IAM Credentials: Configure AWS access keys with Bedrock permissions

Required Environment Variables

  • AWS_SECRET_ACCESS_KEY: Your AWS secret access key
  • AWS_ACCESS_KEY_ID: Your AWS access key ID
  • AWS_REGION: AWS region where Bedrock is available
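
Since a missing variable otherwise surfaces only as an authentication failure at connect time, it can help to fail fast at startup. A minimal sketch (the helper name is illustrative, not part of Pipecat):

```python
import os

REQUIRED_VARS = ("AWS_SECRET_ACCESS_KEY", "AWS_ACCESS_KEY_ID", "AWS_REGION")

def missing_aws_vars(environ=os.environ):
    """Return the names of required AWS variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]

# Example: a partially configured environment
example_env = {"AWS_ACCESS_KEY_ID": "AKIA-EXAMPLE", "AWS_REGION": "us-east-1"}
print(missing_aws_vars(example_env))  # ['AWS_SECRET_ACCESS_KEY']
```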

Key Features

  • Real-time Speech-to-Speech: Direct audio input to audio output processing
  • Built-in Transcription: Automatic speech-to-text with real-time streaming
  • Voice Activity Detection: Automatic detection of speech start/stop
  • Function Calling: Support for external function and API integration
  • Multiple Voices: Choose from matthew, tiffany, and amy voice options

Configuration

AWSNovaSonicLLMService

secret_access_key (str, required)
    AWS secret access key for authentication.

access_key_id (str, required)
    AWS access key ID for authentication.

session_token (str, default: None)
    AWS session token for temporary credentials (e.g., when using AWS STS).

region (str, required)
    AWS region where the service is hosted. Supported regions for Nova 2 Sonic (the default model): "us-east-1", "us-west-2", "ap-northeast-1". Supported regions for Nova Sonic (older model): "us-east-1", "ap-northeast-1".

model (str, default: "amazon.nova-2-sonic-v1:0")
    Model identifier. Use "amazon.nova-2-sonic-v1:0" for the latest model or "amazon.nova-sonic-v1:0" for the older model.

voice_id (str, default: "matthew")
    Voice ID for speech synthesis. Some voices are designed for specific languages. See the AWS Nova 2 Sonic voice support documentation for available voices.

params (Params, default: Params())
    Model parameters for audio configuration and inference. See Params below.

system_instruction (str, default: None)
    System-level instruction for the model.

tools (ToolsSchema, default: None)
    Available tools/functions for the model to use.

Params

Audio and inference parameters that can be set at initialization via the params constructor argument.
Parameter                Type   Default  Description
input_sample_rate        int    16000    Audio input sample rate in Hz.
input_sample_size        int    16       Audio input sample size in bits.
input_channel_count      int    1        Number of input audio channels.
output_sample_rate       int    24000    Audio output sample rate in Hz.
output_sample_size       int    16       Audio output sample size in bits.
output_channel_count     int    1        Number of output audio channels.
max_tokens               int    1024     Maximum number of tokens to generate.
top_p                    float  0.9      Nucleus sampling parameter.
temperature              float  0.7      Sampling temperature for text generation.
endpointing_sensitivity  str    None     Controls how quickly Nova Sonic decides the user has stopped speaking. Values: "LOW", "MEDIUM", or "HIGH". Only supported with Nova 2 Sonic (the default model).
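
The audio defaults above imply fixed raw-audio bandwidths, which is useful when sizing buffers or network budgets. A back-of-the-envelope sketch (this helper is illustrative, not part of the library):

```python
def pcm_byte_rate(sample_rate, sample_size_bits, channels):
    """Raw LPCM throughput in bytes per second."""
    return sample_rate * (sample_size_bits // 8) * channels

# Using the Params defaults above:
print(pcm_byte_rate(16000, 16, 1))  # input: 32000 bytes/s
print(pcm_byte_rate(24000, 16, 1))  # output: 48000 bytes/s
```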

Usage

Basic Setup

import os
from pipecat.services.aws.nova_sonic import AWSNovaSonicLLMService

llm = AWSNovaSonicLLMService(
    secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    region=os.getenv("AWS_REGION"),
    voice_id="matthew",
    system_instruction="You are a helpful assistant.",
)

With Custom Parameters

import os

from pipecat.services.aws.nova_sonic import AWSNovaSonicLLMService, Params

llm = AWSNovaSonicLLMService(
    secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    region="us-east-1",
    model="amazon.nova-2-sonic-v1:0",
    voice_id="tiffany",
    system_instruction="You are a helpful assistant.",
    params=Params(
        temperature=0.5,
        max_tokens=2048,
        input_sample_rate=16000,
        output_sample_rate=24000,
        endpointing_sensitivity="MEDIUM",
    ),
)

With Function Calling

import os

from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.services.aws.nova_sonic import AWSNovaSonicLLMService

# Describe the function the model may call
weather_function = FunctionSchema(
    name="get_weather",
    description="Get the current weather for a location",
    properties={"location": {"type": "string", "description": "City name"}},
    required=["location"],
)
tools = ToolsSchema(standard_tools=[weather_function])

llm = AWSNovaSonicLLMService(
    secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    region="us-east-1",
    voice_id="matthew",
    system_instruction="You are a helpful assistant that can check the weather.",
    tools=tools,
)

async def get_weather(function_name, tool_call_id, args, llm, context, result_callback):
    location = args.get("location", "unknown")
    await result_callback({"temperature": 72, "condition": "sunny", "location": location})

llm.register_function("get_weather", get_weather)
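
The handler contract can be exercised locally by stubbing result_callback. This snippet mimics how the service would invoke a registered handler; the stub and the hard-coded arguments are illustrative, not part of Pipecat:

```python
import asyncio

# Same handler shape as above, reproduced so this snippet is self-contained
async def get_weather(function_name, tool_call_id, args, llm, context, result_callback):
    location = args.get("location", "unknown")
    await result_callback({"temperature": 72, "condition": "sunny", "location": location})

async def main():
    results = []

    async def result_callback(payload):  # stub standing in for the service
        results.append(payload)

    await get_weather("get_weather", "tool-1", {"location": "Austin"}, None, None, result_callback)
    return results[0]

print(asyncio.run(main()))  # {'temperature': 72, 'condition': 'sunny', 'location': 'Austin'}
```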

Notes

  • Model versions: Nova 2 Sonic (amazon.nova-2-sonic-v1:0) is the default and recommended model. The older Nova Sonic (amazon.nova-sonic-v1:0) has fewer features and requires an assistant response trigger mechanism.
  • Endpointing sensitivity: Only supported with Nova 2 Sonic. Controls how quickly the model decides the user has stopped speaking — "HIGH" causes the model to respond most quickly.
  • Transcription frames: User speech transcription frames are always emitted upstream.
  • Connection resilience: If a connection error occurs while the service is expected to remain connected, it automatically resets the conversation and reconnects.
  • System instruction and tools precedence: Instructions and tools provided in the LLM context take precedence over those provided at initialization time.
  • Audio format: Uses LPCM (Linear PCM) audio format for both input and output. Input defaults to 16kHz and output defaults to 24kHz.
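
A practical consequence of these audio defaults is that per-chunk buffer sizes are fixed and easy to precompute. A quick sketch (20 ms is an assumed chunk duration for illustration, not a service requirement):

```python
def lpcm_chunk_bytes(sample_rate, sample_size_bits, channels, chunk_ms):
    """Bytes in one LPCM chunk of the given duration."""
    samples = sample_rate * chunk_ms // 1000
    return samples * (sample_size_bits // 8) * channels

print(lpcm_chunk_bytes(16000, 16, 1, 20))  # 640 bytes per 20 ms of input audio
print(lpcm_chunk_bytes(24000, 16, 1, 20))  # 960 bytes per 20 ms of output audio
```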