# AWS Nova Sonic

> Real-time speech-to-speech service implementation using AWS Nova Sonic

## Overview

`AWSNovaSonicLLMService` enables natural, real-time conversations with AWS Nova Sonic. It provides built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences with bidirectional audio streaming, text generation, and function calling capabilities.

<CardGroup cols={2}>
  <Card title="AWS Nova Sonic API Reference" icon="code" href="https://reference-server.pipecat.ai/en/latest/api/pipecat.services.aws.nova_sonic.llm.html">
    Pipecat's API methods for AWS Nova Sonic integration
  </Card>

  <Card title="Example Implementation" icon="play" href="https://github.com/pipecat-ai/pipecat/blob/main/examples/realtime/realtime-aws-nova-sonic.py">
    Complete AWS Nova Sonic conversation example
  </Card>

  <Card title="AWS Bedrock Documentation" icon="book" href="https://docs.aws.amazon.com/bedrock/">
    Official AWS Bedrock and Nova Sonic documentation
  </Card>

  <Card title="AWS Console" icon="external-link" href="https://console.aws.amazon.com/bedrock/">
    Access AWS Bedrock and manage Nova Sonic models
  </Card>
</CardGroup>

## Installation

To use AWS Nova Sonic services, install the required dependencies:

```bash theme={null}
uv add "pipecat-ai[aws-nova-sonic]"
```

## Prerequisites

### AWS Account Setup

Before using AWS Nova Sonic services, you need:

1. **AWS Account**: Set up at [AWS Console](https://console.aws.amazon.com/)
2. **Bedrock Access**: Enable AWS Bedrock service in your region
3. **Model Access**: Request access to Nova Sonic models in Bedrock
4. **IAM Credentials**: Configure AWS access keys with Bedrock permissions

### Required Environment Variables

* `AWS_SECRET_ACCESS_KEY`: Your AWS secret access key
* `AWS_ACCESS_KEY_ID`: Your AWS access key ID
* `AWS_REGION`: AWS region where Bedrock is available
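
A minimal sketch of a startup check for these variables, so that a missing credential fails fast with a clear message instead of surfacing later as a connection error. The helper name `missing_aws_env` is hypothetical and not part of Pipecat.

```python
import os

REQUIRED_AWS_ENV_VARS = ("AWS_SECRET_ACCESS_KEY", "AWS_ACCESS_KEY_ID", "AWS_REGION")


def missing_aws_env(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_ENV_VARS if not env.get(name)]
```

Calling `missing_aws_env()` before constructing the service and raising if the returned list is non-empty is one way to use it.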

### Key Features

* **Real-time Speech-to-Speech**: Direct audio input to audio output processing
* **Built-in Transcription**: Automatic speech-to-text with real-time streaming
* **Voice Activity Detection**: Automatic detection of speech start/stop
* **Function Calling**: Support for external function and API integration
* **Multiple Voices**: Voices include `matthew`, `tiffany`, and `amy`

## Configuration

### AWSNovaSonicLLMService

<ParamField path="secret_access_key" type="str" required>
  AWS secret access key for authentication.
</ParamField>

<ParamField path="access_key_id" type="str" required>
  AWS access key ID for authentication.
</ParamField>

<ParamField path="session_token" type="str" default="None">
  AWS session token for temporary credentials (e.g., when using AWS STS).
</ParamField>

<ParamField path="region" type="str" required>
  AWS region where the service is hosted. Supported regions for Nova 2 Sonic
  (default): `"us-east-1"`, `"us-west-2"`, `"ap-northeast-1"`. Supported regions
  for Nova Sonic (older model): `"us-east-1"`, `"ap-northeast-1"`.
</ParamField>

<ParamField path="model" type="str" default="amazon.nova-2-sonic-v1:0" deprecated>
  Model identifier. Use `"amazon.nova-2-sonic-v1:0"` for the latest model or
  `"amazon.nova-sonic-v1:0"` for the older model.

  *Deprecated in v0.0.105. Use `settings=AWSNovaSonicLLMService.Settings(model=...)` instead.*
</ParamField>

<ParamField path="voice_id" type="str" default="matthew" deprecated>
  Voice ID for speech synthesis. Some voices are designed for specific
  languages. See [AWS Nova 2 Sonic voice
  support](https://docs.aws.amazon.com/nova/latest/nova2-userguide/sonic-language-support.html)
  for available voices.

  *Deprecated in v0.0.105. Use `settings=AWSNovaSonicLLMService.Settings(voice=...)` instead.*
</ParamField>

<ParamField path="params" type="Params" default="Params()" deprecated>
  Model parameters for audio configuration and inference. See [Params](#params)
  below.

  *Deprecated in v0.0.105. Use `settings=AWSNovaSonicLLMService.Settings(...)` for inference settings and `audio_config=AudioConfig(...)` for audio configuration.*
</ParamField>

<ParamField path="audio_config" type="AudioConfig" default="None">
  Audio configuration (sample rates, sample sizes, channel counts). If not
  provided, defaults are used (16kHz input, 24kHz output, 16-bit, mono). See
  [AudioConfig](#audioconfig) below.
</ParamField>

<ParamField path="settings" type="AWSNovaSonicLLMService.Settings" default="None">
  Runtime-configurable settings. See [Settings](#settings) below.
</ParamField>

<ParamField path="system_instruction" type="str" default="None" deprecated>
  System-level instruction for the model.

  *Deprecated in v0.0.105. Use `settings=AWSNovaSonicLLMService.Settings(system_instruction=...)` instead.*
</ParamField>

<ParamField path="tools" type="ToolsSchema" default="None">
  Available tools/functions for the model to use.
</ParamField>

<ParamField path="session_continuation" type="SessionContinuationParams" default="None">
  Configuration for automatic session continuation. When enabled (the default),
  sessions are seamlessly rotated before the AWS time limit (\~8 minutes) with no
  user-perceptible interruption. See
  [SessionContinuationParams](#sessioncontinuationparams) below.
</ParamField>

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `AWSNovaSonicLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter                 | Type          | Default     | Description                                                                                                                                                        |
| ------------------------- | ------------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `model`                   | `str`         | `NOT_GIVEN` | Model identifier. *(Inherited from base settings.)*                                                                                                                |
| `system_instruction`      | `str`         | `NOT_GIVEN` | System instruction/prompt. *(Inherited from base settings.)*                                                                                                       |
| `temperature`             | `float`       | `NOT_GIVEN` | Sampling temperature for text generation. *(Inherited from base settings.)*                                                                                        |
| `max_tokens`              | `int`         | `NOT_GIVEN` | Maximum number of tokens to generate. *(Inherited from base settings.)*                                                                                            |
| `top_p`                   | `float`       | `NOT_GIVEN` | Nucleus sampling parameter. *(Inherited from base settings.)*                                                                                                      |
| `voice`                   | `str`         | `NOT_GIVEN` | Voice ID for speech synthesis.                                                                                                                                     |
| `endpointing_sensitivity` | `str \| None` | `NOT_GIVEN` | Controls how quickly Nova Sonic decides the user has stopped speaking. Values: `"LOW"`, `"MEDIUM"`, or `"HIGH"`. Only supported with Nova 2 Sonic (default model). |

<Note>
  `NOT_GIVEN` values are omitted, letting the service use its own defaults (e.g.
  `"amazon.nova-2-sonic-v1:0"` for model, `"matthew"` for voice, `0.7` for
  temperature, `1024` for max\_tokens). Only parameters that are explicitly set
  are included.
</Note>
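
The omission behavior described in the note can be illustrated with a generic sentinel pattern. This is a sketch of the idea only, not Pipecat's implementation: the `_NotGiven` class and `explicit_settings` helper are hypothetical stand-ins. The point is that an unset value is dropped, while an explicit `None` is kept.

```python
class _NotGiven:
    """Hypothetical sentinel that distinguishes 'unset' from an explicit None."""

    def __repr__(self):
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


def explicit_settings(**settings):
    """Keep only the settings the caller actually set."""
    return {key: value for key, value in settings.items() if value is not NOT_GIVEN}


payload = explicit_settings(
    voice="tiffany",
    temperature=NOT_GIVEN,          # omitted: the service default applies
    endpointing_sensitivity=None,   # kept: None was set explicitly
)
print(payload)  # {'voice': 'tiffany', 'endpointing_sensitivity': None}
```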

### AudioConfig

Audio configuration passed via the `audio_config` constructor argument.

| Parameter              | Type  | Default | Description                       |
| ---------------------- | ----- | ------- | --------------------------------- |
| `input_sample_rate`    | `int` | `16000` | Audio input sample rate in Hz.    |
| `input_sample_size`    | `int` | `16`    | Audio input sample size in bits.  |
| `input_channel_count`  | `int` | `1`     | Number of input audio channels.   |
| `output_sample_rate`   | `int` | `24000` | Audio output sample rate in Hz.   |
| `output_sample_size`   | `int` | `16`    | Audio output sample size in bits. |
| `output_channel_count` | `int` | `1`     | Number of output audio channels.  |
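
To make the defaults concrete, the arithmetic below works out how many bytes of uncompressed linear PCM the default configuration produces. The 20 ms chunk duration is an assumption chosen for illustration, and `lpcm_bytes` is a hypothetical helper, not a Pipecat function.

```python
def lpcm_bytes(sample_rate, sample_size_bits, channels, duration_ms):
    """Bytes of uncompressed linear PCM audio for a given duration."""
    bytes_per_frame = (sample_size_bits // 8) * channels
    return sample_rate * bytes_per_frame * duration_ms // 1000


# Default input: 16000 Hz, 16-bit, mono
input_chunk = lpcm_bytes(16000, 16, 1, duration_ms=20)         # 640 bytes
input_per_second = lpcm_bytes(16000, 16, 1, duration_ms=1000)  # 32000 bytes

# Default output: 24000 Hz, 16-bit, mono
output_chunk = lpcm_bytes(24000, 16, 1, duration_ms=20)        # 960 bytes
```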

### SessionContinuationParams

Configuration for automatic session continuation, passed via the `session_continuation` constructor argument. Nova Sonic sessions have an AWS-imposed time limit (\~8 minutes). When enabled, session continuation proactively creates a new session in the background before the limit is reached, buffers user audio during the transition, and hands off to the new session, preserving conversation context with no user-perceptible gap.

| Parameter                       | Type    | Default | Description                                                                                                                                                                                                                 |
| ------------------------------- | ------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `enabled`                       | `bool`  | `True`  | Whether automatic session continuation is enabled.                                                                                                                                                                          |
| `transition_threshold_seconds`  | `float` | `360.0` | How many seconds into a session to begin monitoring for a transition opportunity. The transition will occur when the assistant next starts speaking after this threshold.                                                   |
| `audio_buffer_duration_seconds` | `float` | `3.0`   | Duration of the rolling audio buffer (in seconds) that captures user audio during the transition window. This audio is replayed into the new session so no user input is lost.                                              |
| `audio_start_timeout_seconds`   | `float` | `80.0`  | Maximum time to wait for the assistant to start speaking after the threshold is reached. If no assistant audio arrives within this window, the transition is forced. Set to `0` to disable the timeout (wait indefinitely). |
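
A quick way to sanity-check custom values is to compute the latest point at which a transition can begin and compare it to the session limit. The sketch below assumes the limit is exactly 480 seconds, although this page only states roughly 8 minutes. `latest_transition_seconds` is a hypothetical helper.

```python
APPROX_SESSION_LIMIT_SECONDS = 8 * 60  # assumption: "~8 minutes" taken as 480 s


def latest_transition_seconds(transition_threshold_seconds, audio_start_timeout_seconds):
    """Latest time a transition begins, or None if it may wait indefinitely."""
    if audio_start_timeout_seconds == 0:
        return None  # timeout disabled, so there is no upper bound
    return transition_threshold_seconds + audio_start_timeout_seconds


latest = latest_transition_seconds(360.0, 80.0)
margin = APPROX_SESSION_LIMIT_SECONDS - latest
print(latest, margin)  # 440.0 40.0
```

With the defaults, a forced transition starts no later than 440 seconds into the session, which leaves about 40 seconds before the approximate limit. Raising either value shrinks that margin.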

## Usage

### Basic Setup

```python theme={null}
import os
from pipecat.services.aws.nova_sonic import AWSNovaSonicLLMService

llm = AWSNovaSonicLLMService(
    secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    region=os.getenv("AWS_REGION"),
    settings=AWSNovaSonicLLMService.Settings(
        voice="matthew",
        system_instruction="You are a helpful assistant.",
    ),
)
```

### With Settings

```python theme={null}
import os

from pipecat.services.aws.nova_sonic import AWSNovaSonicLLMService, AudioConfig

llm = AWSNovaSonicLLMService(
    secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    region="us-east-1",
    audio_config=AudioConfig(
        input_sample_rate=16000,
        output_sample_rate=24000,
    ),
    settings=AWSNovaSonicLLMService.Settings(
        model="amazon.nova-2-sonic-v1:0",
        voice="tiffany",
        system_instruction="You are a helpful assistant.",
        temperature=0.5,
        max_tokens=2048,
        endpointing_sensitivity="MEDIUM",
    ),
)
```

### With Function Calling

```python theme={null}
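import os

from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.services.aws.nova_sonic import AWSNovaSonicLLMService

# Describe the function the model is allowed to call.
weather_function = FunctionSchema(
    name="get_weather",
    description="Get the current weather for a location",
    properties={
        "location": {"type": "string", "description": "City name"},
    },
    required=["location"],
)
tools = ToolsSchema(standard_tools=[weather_function])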

llm = AWSNovaSonicLLMService(
    secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    region="us-east-1",
    settings=AWSNovaSonicLLMService.Settings(
        voice="matthew",
        system_instruction="You are a helpful assistant that can check the weather.",
    ),
    tools=tools,  # ToolsSchema instance
)

@llm.function("get_weather")
async def get_weather(function_name, tool_call_id, args, llm, context, result_callback):
    location = args.get("location", "unknown")
    await result_callback({"temperature": 72, "condition": "sunny", "location": location})
```

### With Session Continuation

```python theme={null}
import os

from pipecat.services.aws.nova_sonic import AWSNovaSonicLLMService
from pipecat.services.aws.nova_sonic.session_continuation import SessionContinuationParams

llm = AWSNovaSonicLLMService(
    secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    region="us-east-1",
    settings=AWSNovaSonicLLMService.Settings(
        voice="tiffany",
        system_instruction="You are a helpful assistant.",
    ),
    # Session continuation is enabled by default. You can customize the behavior:
    session_continuation=SessionContinuationParams(
        enabled=True,
        transition_threshold_seconds=360,  # Start transition after 6 minutes
        audio_buffer_duration_seconds=3.0,  # Buffer 3 seconds of audio during transition
        audio_start_timeout_seconds=80.0,  # Force transition if no response within 80s
    ),
)

# To disable session continuation:
# session_continuation=SessionContinuationParams(enabled=False)
```

<Tip>
  The `Params` / `params=` pattern is deprecated as of v0.0.105. Use `Settings`
  / `settings=` for inference settings and `AudioConfig` / `audio_config=` for
  audio configuration instead. See the [Service Settings
  guide](/pipecat/fundamentals/service-settings) for migration details.
</Tip>

## Notes

* **Model versions**: Nova 2 Sonic (`amazon.nova-2-sonic-v1:0`) is the default and recommended model. The older Nova Sonic (`amazon.nova-sonic-v1:0`) has fewer features and requires an assistant response trigger mechanism.
* **Session continuation**: Enabled by default to handle AWS's \~8-minute session limit. The service automatically rotates sessions in the background with no user-perceptible interruption, preserving conversation context and buffering user audio during the transition. You can tune the threshold or disable it via the `session_continuation` parameter.
* **Endpointing sensitivity**: Only supported with Nova 2 Sonic. Controls how quickly the model decides the user has stopped speaking; `"HIGH"` makes the model respond most quickly.
* **Transcription frames**: User speech transcription frames are always emitted upstream. Assistant text transcripts are delivered in real-time using speculative text events, providing text synchronized with audio output for responsive client UIs.
* **Connection resilience**: If a connection error occurs while the service is meant to stay connected, it automatically resets the conversation and reconnects.
* **System instruction precedence**: The `system_instruction` from service settings takes precedence over an initial system message in the LLM context. A warning is logged when both are set. Tools provided in the LLM context take precedence over those provided at initialization time.
* **Audio format**: Uses LPCM (Linear PCM) audio format for both input and output. Input defaults to 16kHz and output defaults to 24kHz.
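
The rolling audio buffer mentioned under session continuation can be pictured with the sketch below. It is an illustration under stated assumptions (16-bit mono input at 16 kHz and a 3-second window, matching the defaults on this page), not the service's actual implementation. `RollingAudioBuffer` is a hypothetical class.

```python
from collections import deque


class RollingAudioBuffer:
    """Keeps at most the most recent `duration_seconds` of PCM audio."""

    def __init__(self, duration_seconds=3.0, sample_rate=16000, bytes_per_sample=2):
        self._max_bytes = int(duration_seconds * sample_rate * bytes_per_sample)
        self._chunks = deque()
        self._size = 0

    def append(self, chunk: bytes) -> None:
        self._chunks.append(chunk)
        self._size += len(chunk)
        # Evict whole chunks from the front until the buffer fits the window.
        while self._size > self._max_bytes:
            self._size -= len(self._chunks.popleft())

    def snapshot(self) -> bytes:
        """Return the buffered audio, oldest first, for replay."""
        return b"".join(self._chunks)
```

Audio captured this way could be replayed into a new session from `snapshot()`, which is the role the `audio_buffer_duration_seconds` setting describes.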
