
# Smart Turn Overview

> Advanced conversational turn detection powered by the smart-turn model

## Overview

Smart Turn Detection is an advanced feature in Pipecat that determines when a user has finished speaking and the bot should respond. Unlike basic Voice Activity Detection (VAD) which only detects speech vs. non-speech, Smart Turn Detection uses a machine learning model to recognize natural conversational cues like intonation patterns and linguistic signals.

<CardGroup cols={3}>
  <Card title="Smart Turn Model" icon="github" href="https://github.com/pipecat-ai/smart-turn">
    Open source model for advanced conversational turn detection. Contribute to
    model training and development.
  </Card>

  <Card title="Data Collector" icon="microphone" href="https://turn-training.pipecat.ai/">
    Contribute conversational data to improve the smart-turn model
  </Card>

  <Card title="Data Classifier" icon="check-circle" href="https://smart-turn-dataset.pipecat.ai/">
    Help classify turn completion patterns in conversations
  </Card>
</CardGroup>

Pipecat provides `LocalSmartTurnAnalyzerV3` which runs inference locally using ONNX. This is the recommended approach due to the fast CPU inference times in Smart Turn v3.

<Note>
  As of v0.0.102, `TurnAnalyzerUserTurnStopStrategy` with
  `LocalSmartTurnAnalyzerV3` is the **default** user turn stop strategy in
  Pipecat. You no longer need to explicitly configure it unless you want to
  customize its parameters.
</Note>

## Installation

Smart Turn dependencies (`transformers`, `onnxruntime`) are included with the core `pipecat-ai` package — no extra installation is needed.

```bash theme={null}
uv add pipecat-ai
```

The Smart Turn model weights are also bundled with Pipecat, so you don't need to download them separately.

## Integration with User Turn Strategies

Smart Turn Detection is integrated into your application by configuring a `TurnAnalyzerUserTurnStopStrategy` with `LocalSmartTurnAnalyzerV3` in your context aggregator:

```python theme={null}
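from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.smallwebrtc.transport import SmallWebRTCTransport
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies

# `webrtc_connection` is the SmallWebRTCConnection for the current session,
# typically provided by your connection handler.
transport = SmallWebRTCTransport(
    webrtc_connection=webrtc_connection,
    params=TransportParams(
        audio_in_enabled=True,
    ),
)

# Conversation context shared by the user and assistant aggregators
context = LLMContext()

user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=UserTurnStrategies(
            # Let the Smart Turn model decide when the user has finished speaking
            stop=[TurnAnalyzerUserTurnStopStrategy(
                turn_analyzer=LocalSmartTurnAnalyzerV3()
            )]
        ),
        # Smart Turn relies on VAD; a short stop_secs triggers analysis soon after a pause
        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
    ),
)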
```

<Tip>
  Smart Turn Detection requires VAD to be enabled and works best when the VAD
  analyzer is set to a short `stop_secs` value. We recommend 0.2 seconds, which
  is the default value.
</Tip>

## Configuration

The `SmartTurnParams` class configures turn detection behavior:

<ParamField path="stop_secs" type="float" default="3.0">
  Duration of silence, in seconds, after which the turn is ended even if the
  model classified it as incomplete (the silence-based fallback)
</ParamField>

<ParamField path="pre_speech_ms" type="float" default="0.0">
  Amount of audio (in milliseconds) to include before speech is detected
</ParamField>

<ParamField path="max_duration_secs" type="float" default="8.0">
  Maximum allowed segment duration in seconds. For segments longer than this
  value, a rolling window is used.
</ParamField>
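The arithmetic behind `pre_speech_ms` and `max_duration_secs` can be sketched in plain Python. This is an illustration only, not Pipecat's implementation: `TurnParamsSketch`, `pre_speech_samples`, and `apply_rolling_window` are hypothetical names, the defaults simply mirror the parameters listed above, and the 16 kHz sample rate in the usage lines is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TurnParamsSketch:
    """Hypothetical stand-in mirroring the documented SmartTurnParams defaults."""
    stop_secs: float = 3.0
    pre_speech_ms: float = 0.0
    max_duration_secs: float = 8.0


def pre_speech_samples(params: TurnParamsSketch, sample_rate: int) -> int:
    # Number of samples of audio kept from before speech was detected.
    return round(params.pre_speech_ms / 1000 * sample_rate)


def apply_rolling_window(
    samples: List[float], params: TurnParamsSketch, sample_rate: int
) -> List[float]:
    # Keep only the most recent max_duration_secs of audio for analysis.
    max_samples = round(params.max_duration_secs * sample_rate)
    if len(samples) <= max_samples:
        return samples
    return samples[len(samples) - max_samples:]


# Usage, assuming 16 kHz mono audio
params = TurnParamsSketch(pre_speech_ms=500)
print(pre_speech_samples(params, 16000))  # 8000
audio = [0.0] * (10 * 16000)  # 10 seconds of audio
print(len(apply_rolling_window(audio, params, 16000)))  # 128000 (8 seconds)
```

In this sketch, a turn longer than `max_duration_secs` is analyzed using only its tail, which bounds the amount of audio the model must process per decision.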

## Local Smart Turn

The `LocalSmartTurnAnalyzerV3` runs inference locally. Version 3 of the model supports fast CPU inference on ordinary cloud instances.

### Constructor Parameters

<ParamField path="smart_turn_model_path" type="str | None" default="None">
  Path to a Smart Turn v3 ONNX file containing the model weights. This parameter
  is optional: if it is unset, Pipecat uses the copy of the model bundled with the
  package. To load the weights from a specific file instead, download it from
  [https://huggingface.co/pipecat-ai/smart-turn-v3/tree/main](https://huggingface.co/pipecat-ai/smart-turn-v3/tree/main).
</ParamField>

<ParamField path="sample_rate" type="int | None" default="None">
  Audio sample rate (will be set by the transport if not provided)
</ParamField>

<ParamField path="params" type="SmartTurnParams" default="SmartTurnParams()">
  Configuration parameters for turn detection
</ParamField>

### Example

```python theme={null}
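from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.smallwebrtc.transport import SmallWebRTCTransport
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies

# Create the transport (`webrtc_connection` is the SmallWebRTCConnection
# for the current session, provided by your connection handler)
transport = SmallWebRTCTransport(
    webrtc_connection=webrtc_connection,
    params=TransportParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
    ),
)

# Conversation context shared by both aggregators
context = LLMContext()

# Configure Smart Turn Detection via user turn strategies
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=UserTurnStrategies(
            stop=[TurnAnalyzerUserTurnStopStrategy(
                turn_analyzer=LocalSmartTurnAnalyzerV3(
                    # Optional: these values match the SmartTurnParams defaults
                    params=SmartTurnParams(stop_secs=3.0, max_duration_secs=8.0),
                )
            )]
        ),
        # Short VAD stop_secs so the model runs soon after the user pauses
        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
    ),
)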
```

## How It Works

Smart Turn Detection continuously analyzes audio streams to identify natural turn completion points:

1. **Audio Buffering**: The system continuously buffers audio with timestamps, maintaining a small buffer of pre-speech audio.
2. **VAD Processing**: Voice Activity Detection (using the Silero model) detects when there is a pause in the user's speech.
3. **Smart Turn Analysis**: When VAD detects a pause in speech, the Smart Turn model analyzes the most recent `max_duration_secs` of the user's turn (8 seconds by default) and classifies the turn as complete or incomplete.

The system includes a fallback mechanism: if a turn is classified as incomplete but silence continues for longer than `stop_secs`, the turn is automatically marked as complete.
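The pause, classify, and fallback flow described above can be sketched as a small state machine. This is a hypothetical illustration, not Pipecat's internal code: `TurnDetectorSketch` and its methods are invented names, and the `classify` callable stands in for the Smart Turn model, returning `True` when it judges the turn complete.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TurnDetectorSketch:
    """Hypothetical sketch of VAD pause -> model decision -> silence fallback."""
    classify: Callable[[List[float]], bool]  # stand-in for the Smart Turn model
    stop_secs: float = 3.0
    buffer: List[float] = field(default_factory=list, init=False)
    silence_secs: float = field(default=0.0, init=False)
    awaiting_fallback: bool = field(default=False, init=False)

    def on_speech(self, chunk: List[float]) -> None:
        # Speech (re)started: buffer the audio and reset the silence timer.
        self.buffer.extend(chunk)
        self.silence_secs = 0.0
        self.awaiting_fallback = False

    def on_vad_pause(self) -> bool:
        # VAD reported a pause: ask the model whether the turn is complete.
        complete = self.classify(self.buffer)
        self.awaiting_fallback = not complete
        return complete

    def on_silence(self, secs: float) -> bool:
        # Silence keeps accumulating; end an "incomplete" turn once it reaches stop_secs.
        self.silence_secs += secs
        return self.awaiting_fallback and self.silence_secs >= self.stop_secs


# Usage: a model stub that always answers "incomplete"
detector = TurnDetectorSketch(classify=lambda audio: False)
detector.on_speech([0.1] * 160)
print(detector.on_vad_pause())   # False: the model says the turn is incomplete
print(detector.on_silence(2.0))  # False: only 2.0 s of silence so far
print(detector.on_silence(1.5))  # True: 3.5 s of silence triggers the fallback
```

In this sketch, the model runs only when VAD reports a pause rather than on every audio chunk, and the silence timer guarantees that an "incomplete" classification cannot hold the turn open indefinitely.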

## Notes

* The model supports 23 languages; see the [source repository](https://github.com/pipecat-ai/smart-turn) for more details
* Smart Turn generally provides a more natural conversational experience but is computationally more intensive than simple VAD
* `LocalSmartTurnAnalyzerV3` is designed to run on CPU, and inference can be performed on low-cost cloud instances in under 100ms. For higher performance, install the `onnxruntime-gpu` dependency to enable GPU inference.
