# agent
Source: https://docs.pipecat.ai/api-reference/cli/cloud/agent
Manage agent deployments
The `agent` command provides sub-commands for managing your deployed agents. These commands allow you to view status, start agents, see logs, and manage deployments.
## start
Start a deployed agent instance, creating an active session.
**Usage:**
```shell theme={null}
pipecat cloud agent start [ARGS] [OPTIONS]
```
**Arguments:**
Unique string identifier for the agent deployment. Must not contain spaces.
**Options:**
Path to an alternate deploy config file. Defaults to `pcc-deploy.toml`.
Public API key to authenticate the agent deployment. Will default to any key
set in your config.
For more information, see [API keys](/pipecat-cloud/fundamentals/accounts-and-organizations#api-keys).
Stringified JSON object to pass to the agent deployment. This data will be
available to the agent as a `data` parameter in your `bot()` method.
More information [here](/pipecat-cloud/fundamentals/active-sessions#running-an-agent).
Skip summary confirmation before issuing start request.
Create a Daily WebRTC session for the agent.
Stringified JSON object with Daily room properties to customize the WebRTC
session. Only used when `--use-daily` is set to true.
See [Daily API
documentation](https://docs.daily.co/reference/rest-api/rooms/config) for
available properties.
Organization to start the agent for. If not provided, uses the current
organization from your configuration.
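**Example:**
Start a session for an agent named `my-first-agent` (a placeholder name) and have Pipecat Cloud create a Daily room for it:
```shell theme={null}
pipecat cloud agent start my-first-agent --use-daily
```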
## stop
Stop an active agent session and clean up its resources.
**Usage:**
```shell theme={null}
pipecat cloud agent stop [ARGS] [OPTIONS]
```
**Arguments:**
Name of the agent. Must not contain spaces.
**Options:**
Path to an alternate deploy config file. Defaults to `pcc-deploy.toml`.
ID of the session to stop.
Organization which the agent belongs to. If not provided, uses the current
organization from your configuration.
Bypass prompt for confirmation before stopping the session.
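**Example:**
Stop an active session for an agent named `my-first-agent` (a placeholder name):
```shell theme={null}
pipecat cloud agent stop my-first-agent
```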
## status
Shows the current status of an agent deployment, including health and conditions.
**Usage:**
```shell theme={null}
pipecat cloud agent status [ARGS]
```
**Arguments:**
Unique string identifier for the agent deployment. Must not contain spaces.
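**Example:**
Check the status of an agent named `my-first-agent` (a placeholder name):
```shell theme={null}
pipecat cloud agent status my-first-agent
```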
## deployments
Lists deployment history for an agent, including image versions and timestamps.
**Usage:**
```shell theme={null}
pipecat cloud agent deployments [ARGS]
```
**Arguments:**
Unique string identifier for the agent deployment. Must not contain spaces.
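**Example:**
List the deployment history for an agent named `my-first-agent` (a placeholder name):
```shell theme={null}
pipecat cloud agent deployments my-first-agent
```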
## logs
Displays combined logs from all agent instances, useful for debugging issues.
**Usage:**
```shell theme={null}
pipecat cloud agent logs [ARGS] [OPTIONS]
```
**Arguments:**
Unique string identifier for the agent deployment. Must not contain spaces.
**Options:**
Filter logs by severity: `ALL`, `DEBUG`, `INFO`, `WARNING`, `ERROR`,
`CRITICAL`.
Limit the number of log lines to display.
Filter results for specific agent deployment ID (obtainable from `pipecat
cloud agent deployments [agent-name]`).
Filter results for specific session ID (obtainable from `pipecat cloud agent
sessions [agent-name]`).
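**Example:**
View logs for an agent named `my-first-agent` (a placeholder name):
```shell theme={null}
pipecat cloud agent logs my-first-agent
```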
## list
Lists all agents in an organization with their details.
**Usage:**
```shell theme={null}
pipecat cloud agent list [OPTIONS]
```
**Options:**
Organization to list agents for. If not provided, uses the current
organization from your configuration.
Filter agents by region. Only agents deployed in the specified region will be
shown. If not provided, agents from all regions are listed.
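**Example:**
List agents deployed in the `us-east` region:
```shell theme={null}
pipecat cloud agent list --region us-east
```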
## sessions
Lists active sessions for a specified agent. When there are no active sessions, it suggests how to start a new session.
When used with the `--id` option, displays detailed information about a specific session including CPU and memory usage with sparkline visualizations and percentile summaries.
**Usage:**
```shell theme={null}
pipecat cloud agent sessions [ARGS] [OPTIONS]
```
**Arguments:**
Name of the agent to list active sessions for.
**Options:**
Path to an alternate deploy config file. Defaults to `pcc-deploy.toml`.
Session ID to view detailed metrics for. When provided, displays CPU and
memory usage statistics including sparkline visualizations and percentile
summaries (p50, p90, p99).
Organization to list sessions for. If not provided, uses the current
organization from your configuration.
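**Example:**
List active sessions for an agent named `my-first-agent` (a placeholder name), then drill into one of them with `--id`:
```shell theme={null}
# List active sessions
pipecat cloud agent sessions my-first-agent
# View detailed metrics for a specific session
pipecat cloud agent sessions my-first-agent --id <session-id>
```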
## delete
Deletes an agent deployment. This will prevent starting new agents and remove all associated data.
This action is irreversible. All data will be lost.
**Usage:**
```shell theme={null}
pipecat cloud agent delete [ARGS] [OPTIONS]
```
**Arguments:**
Unique string identifier for the agent deployment. Must not contain spaces.
**Options:**
Do not prompt for confirmation before deleting the agent.
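**Example:**
Delete an agent named `my-first-agent` (a placeholder name):
```shell theme={null}
pipecat cloud agent delete my-first-agent
```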
# auth
Source: https://docs.pipecat.ai/api-reference/cli/cloud/auth
Authentication and authorization commands
The `auth` command group manages authentication with Pipecat Cloud, allowing you to login, logout, and check your current account identity.
## login
Begins an authorization flow to authenticate with Pipecat Cloud.
**Usage:**
```shell theme={null}
pipecat cloud auth login [OPTIONS]
```
**Options:**
Skip opening a browser window for authentication and print the URL instead.
Useful for remote or containerized environments.
This command initiates the authentication process by:
1. Generating a unique authentication URL
2. Opening your default web browser (if available)
3. Waiting for you to complete the sign-in process in the browser
4. Retrieving and storing your access token locally
If the browser doesn't open automatically, or if you use the `--headless` option, you can copy and paste the displayed URL into your browser manually.
On successful login, the CLI will store the access token in the local configuration file (defaults to `~/.config/pipecatcloud/pipecatcloud.toml`). This token will be used for all subsequent requests to the Pipecat Cloud API.
You can override the default location by setting the `PIPECAT_CONFIG_PATH` environment variable.
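**Example:**
In a remote or containerized environment, run the headless flow and open the printed URL in a browser on another machine:
```shell theme={null}
pipecat cloud auth login --headless
```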
## logout
Signs out of the current Pipecat Cloud account by removing the access token from your local configuration file.
**Usage:**
```shell theme={null}
pipecat cloud auth logout
```
## use-pat
Authenticates with a [Personal Access Token](/pipecat-cloud/guides/personal-access-tokens) instead of interactive OAuth login. Validates the token against the API and stores it in your local config file.
**Usage:**
```shell theme={null}
pipecat cloud auth use-pat
```
**Arguments:**
Personal Access Token (must start with `pcc_pat_`).
You can also set the `PIPECAT_TOKEN` environment variable instead of storing
the token locally. See the [PAT guide](/pipecat-cloud/guides/personal-access-tokens)
for details.
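**Example:**
Authenticate with a token created in the dashboard (the token value shown is a placeholder), or export it as an environment variable instead of storing it locally:
```shell theme={null}
# Store the token in your local config
pipecat cloud auth use-pat pcc_pat_xxxxxxxxxxxx
# Or keep it out of the config file entirely
export PIPECAT_TOKEN=pcc_pat_xxxxxxxxxxxx
```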
## whoami
Displays information about the currently authenticated user, including user ID, active organization, auth method, and Daily API key.
**Usage:**
```shell theme={null}
pipecat cloud auth whoami
```
## Configuration
CLI configuration is stored in `~/.config/pipecatcloud/pipecatcloud.toml`. You can override this location by setting the `PIPECAT_CONFIG_PATH` environment variable.
The configuration stores your access token, active organization, and default API keys.
**View current configuration:**
```shell theme={null}
pipecat cloud --show-cli-config
```
We do not recommend manually editing the configuration file. If the file becomes malformed, run `pipecat cloud auth login` to regenerate it.
```toml theme={null}
token = "..."
org = "user-namespace"
[another-user-org]
default_public_key = "pk_..."
default_public_key_name = "Test Key"
[further-user-org]
default_public_key = "pk_..."
default_public_key_name = "Pipecat Cloud Public Key"
```
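For example, to point the CLI at an alternate configuration file for the current shell session (the path shown is hypothetical):
```shell theme={null}
export PIPECAT_CONFIG_PATH=~/.config/pipecatcloud/staging.toml
pipecat cloud auth whoami
```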
***
Managing your account and collaborating on agents as part of a team
# build
Source: https://docs.pipecat.ai/api-reference/cli/cloud/build
Manage cloud builds
The `build` command provides tools for managing cloud builds on Pipecat Cloud. You can view build logs, check build status, and list recent builds.
Cloud builds are typically triggered automatically when you run [`pipecat cloud deploy`](/api-reference/cli/cloud/deploy) without specifying an image. These commands help you monitor and manage those builds.
## logs
View logs for a cloud build.
**Usage:**
```shell theme={null}
pipecat cloud build logs [OPTIONS] BUILD_ID
```
**Arguments:**
The ID of the build to get logs for. You can find build IDs using `pipecat cloud build list`.
**Options:**
Number of log lines to retrieve. Maximum is 10000.
Organization to use. If not provided, uses the current organization from your configuration.
## status
Get the current status of a cloud build.
**Usage:**
```shell theme={null}
pipecat cloud build status [OPTIONS] BUILD_ID
```
**Arguments:**
The ID of the build to check.
**Options:**
Organization to use. If not provided, uses the current organization from your configuration.
The status command displays detailed build information including:
* Current status (`pending`, `building`, `success`, `failed`, `timeout`)
* Region, context hash, and Dockerfile path
* Timestamps for creation, start, and completion
* Build duration and context/image sizes
* Error messages (if the build failed)
## list
List recent cloud builds.
**Usage:**
```shell theme={null}
pipecat cloud build list [OPTIONS]
```
**Options:**
Number of builds to list. Maximum is 100.
Filter by build status. Valid values: `pending`, `building`, `success`, `failed`, `timeout`.
Filter by region.
Organization to use. If not provided, uses the current organization from your configuration.
## Examples
**List all recent builds:**
```shell theme={null}
pipecat cloud build list
```
**List only failed builds:**
```shell theme={null}
pipecat cloud build list --status failed
```
**View logs for a specific build:**
```shell theme={null}
pipecat cloud build logs abc12345-6789-0abc-def0-123456789abc
```
**Check status of a build:**
```shell theme={null}
pipecat cloud build status abc12345-6789-0abc-def0-123456789abc
```
# deploy
Source: https://docs.pipecat.ai/api-reference/cli/cloud/deploy
Create or modify an agent deployment
The `deploy` command creates a new agent deployment or updates an existing one. It builds a deployment manifest with the provided parameters and monitors the deployment status until the agent is ready.
When no image is specified, the CLI will offer to build your agent using [Pipecat Cloud Build](/pipecat-cloud/guides/cloud-builds). This handles building and deploying without requiring you to manage a container registry. A `Dockerfile` must be present in your build context directory.
If the agent name already exists, you'll be prompted to confirm the update unless the `--force` flag is used.
This command will wait for the active deployment / revision to enter a ready state before returning. If the deployment fails, the command will exit with an [error](/pipecat-cloud/fundamentals/error-codes) that provides more information.
## Usage
```shell theme={null}
pipecat cloud deploy [ARGS] [OPTIONS]
```
**Arguments:**
Unique string identifier for the agent deployment. Must not contain spaces.
URL of the Docker image to deploy. Must be a valid Docker image URL. For
example: `docker.io/my-repo/my-image:latest`. Not required when using [cloud
builds](/pipecat-cloud/guides/cloud-builds) or `--build-id`.
**Options:**
Path to an alternate deploy config file. Defaults to `pcc-deploy.toml` in the
current directory. Can also be set via the `PIPECAT_DEPLOY_CONFIG_PATH`
environment variable, with the flag taking precedence.
Name of the image pull secret to use for accessing private repositories. The
secret must be previously created using the `pipecat cloud secrets
image-pull-secret` command.
Organization to deploy the agent to. If not provided, uses the current
organization from your configuration.
Name of the secret set to use for the deployment. The secret set must exist in
the specified organization.
Minimum number of agent instances to keep warm at all times. Default is 0,
which means the agent will scale down to zero when not in use. Setting this to
1 or higher avoids cold starts.
Maximum number of allowed agent instances. Must be between 1 and 50. If you
need more agents, please contact us at [help@daily.co](mailto:help@daily.co) or via
[Discord](https://discord.gg/dailyco).
Enable Krisp VIVA noise cancellation with the specified audio filter model.
Valid values are:
* `tel`: Telephony model (up to 16kHz)
* `pro`: WebRTC model (up to 32kHz)
In addition to this flag, you also need to enable the `KrispVivaFilter()` for your transport. See the [Krisp VIVA](/api-reference/server/utilities/audio/krisp-viva-filter) documentation for more information.
The agent profile to use for resource allocation. Valid values are:
`agent-1x`, `agent-2x`, `agent-3x`.
See [Agent Profiles](/pipecat-cloud/fundamentals/deploy#agent-profiles) for more information.
Region where the agent will be deployed. If not specified, uses your
organization's default region (typically `us-west`). Choose a region close to
your users for optimal latency.
Force deployment and skip confirmation prompts. Use with caution.
Skip all confirmation prompts, including cloud build prompts. Useful for CI/CD
pipelines. When used without an image, automatically triggers a cloud build.
Deploy using an existing cloud build ID instead of building a new image.
Cannot be used together with `--image`. You can find build IDs using `pipecat
cloud build list`.
Build context directory for cloud builds. Defaults to the current directory.
Path to Dockerfile for cloud builds. Defaults to `Dockerfile` in the build
context directory.
## Examples
**Deploy a new agent:**
```shell theme={null}
pipecat cloud deploy my-first-agent your-docker-repository/my-first-agent:0.1
```
**Update an existing agent with a new image:**
```shell theme={null}
pipecat cloud deploy my-first-agent your-docker-repository/my-first-agent:0.2
```
**Deploy with a specific secret set:**
```shell theme={null}
pipecat cloud deploy my-first-agent your-docker-repository/my-first-agent:0.1 --secrets my-secret-set
```
**Deploy a private image using image pull credentials:**
```shell theme={null}
pipecat cloud deploy my-first-agent your-docker-repository/my-first-agent:0.1 --credentials dockerhub-creds
```
**Keep one instance always warm to avoid cold starts:**
```shell theme={null}
pipecat cloud deploy my-first-agent your-docker-repository/my-first-agent:0.1 --min-agents 1
```
**Limit the maximum number of agent instances:**
```shell theme={null}
pipecat cloud deploy my-first-agent your-docker-repository/my-first-agent:0.1 --max-agents 5
```
**Deploy to a specific region:**
```shell theme={null}
pipecat cloud deploy my-first-agent your-docker-repository/my-first-agent:0.1 --region eu-central
```
**Deploy with Krisp VIVA noise cancellation:**
```shell theme={null}
pipecat cloud deploy my-first-agent your-docker-repository/my-first-agent:0.1 --krisp-viva-audio-filter tel
```
**Deploy using cloud build (interactive):**
```shell theme={null}
pipecat cloud deploy
```
**Deploy using cloud build in CI/CD (no prompts):**
```shell theme={null}
pipecat cloud deploy --yes
```
**Deploy using an existing build:**
```shell theme={null}
pipecat cloud deploy --build-id abc12345-6789-0abc-def0-123456789abc
```
**Deploy using an alternate config file:**
```shell theme={null}
pipecat cloud deploy --config-file pcc-deploy.staging.toml
```
## Configuration File (pcc-deploy.toml)
The `deploy` command supports a configuration file for repeatable deployments. Create a `pcc-deploy.toml` file in your project root to define deployment settings that can be shared across your team and version controlled.
### File Location
Place `pcc-deploy.toml` in the same directory where you run the `pipecat cloud deploy` command. The CLI will automatically detect and use this file.
To use an alternate config file (e.g. for different environments), pass `--config-file`:
```shell theme={null}
pipecat cloud deploy --config-file pcc-deploy.staging.toml
```
You can also set the `PIPECAT_DEPLOY_CONFIG_PATH` environment variable, with the `--config-file` flag taking precedence.
### Precedence
Values are applied with the following order of precedence:
1. CLI arguments (highest priority)
2. `pcc-deploy.toml` values
3. Default values (lowest priority)
This allows you to define defaults in the config file while still overriding specific values via CLI flags when needed.
### Configuration Options
#### Required Fields
Name of the agent to deploy. Must start with a lowercase letter or number, can
include hyphens, and must end with a lowercase letter or number.
```toml theme={null}
agent_name = "my-voice-agent"
```
Docker image URL with tag. Required when not using cloud builds or `build_id`.
```toml theme={null}
image = "your-dockername/my-agent:0.1"
```
You must specify either `image` or `build_id`, or omit both to trigger a cloud
build at deploy time.
#### Optional Fields
An existing cloud build ID to deploy. Cannot be used together with `image`.
```toml theme={null}
build_id = "abc12345-6789-0abc-def0-123456789abc"
```
Region where the agent will be deployed. If not specified, uses your
organization's default region (typically `us-west`).
```toml theme={null}
region = "us-east"
```
Name of the secret set to use for environment variables. The secret set must
exist in the same region as the agent.
```toml theme={null}
secret_set = "my-agent-secrets"
```
Name of the image pull secret for private registries. The image pull secret must exist in the same region as the agent.
```toml theme={null}
image_credentials = "dockerhub-credentials"
```
Agent profile for resource allocation. Valid values: `agent-1x`, `agent-2x`, `agent-3x`.
```toml theme={null}
agent_profile = "agent-2x"
```
**Deprecated:** Enable legacy Krisp noise cancellation. Use `krisp_viva` instead.
```toml theme={null}
enable_krisp = false
```
#### Scaling Configuration
Define auto-scaling behavior in a `[scaling]` section:
Minimum number of agent instances to keep warm. Setting to 0 allows scaling to zero but may result in cold starts.
```toml theme={null}
[scaling]
min_agents = 1
```
Maximum number of agent instances allowed.
```toml theme={null}
[scaling]
max_agents = 20
```
#### Krisp VIVA Configuration
Configure Krisp VIVA noise cancellation in a `[krisp_viva]` section:
Krisp VIVA audio filter model. Valid values: `tel` (telephony, up to 16kHz) or `pro` (WebRTC, up to 32kHz). Omit or set to `null` to disable.
```toml theme={null}
[krisp_viva]
audio_filter = "tel"
```
#### Cloud Build Configuration
Configure cloud build behavior in a `[build]` section:
Directory to use as the build context. Defaults to the current directory.
```toml theme={null}
[build]
context_dir = "."
```
Path to the Dockerfile within the build context.
```toml theme={null}
[build]
dockerfile = "Dockerfile"
```
Additional file patterns to exclude from the build context. Common patterns like `.git`, `.env`, `__pycache__`, and `.venv` are excluded automatically.
```toml theme={null}
[build.exclude]
patterns = ["*.md", "tests/", "docs/"]
```
### Complete Example
```toml theme={null}
# Basic agent configuration
agent_name = "my-voice-agent"
region = "us-west"
# Secrets
secret_set = "my-agent-secrets"
# Resource allocation
agent_profile = "agent-1x"
# Auto-scaling configuration
[scaling]
min_agents = 1
max_agents = 20
# Cloud build configuration (optional, shown with defaults)
[build]
context_dir = "."
dockerfile = "Dockerfile"
[build.exclude]
patterns = ["*.md", "tests/"]
```
```toml theme={null}
# Basic agent configuration
agent_name = "my-voice-agent"
image = "your-dockername/my-voice-agent:0.1"
region = "us-west"
# Secrets and credentials
secret_set = "my-agent-secrets"
image_credentials = "dockerhub-credentials"
# Resource allocation
agent_profile = "agent-1x"
# Auto-scaling configuration
[scaling]
min_agents = 1
max_agents = 20
# Krisp VIVA noise cancellation
[krisp_viva]
audio_filter = "tel"
```
### Using the Configuration File
Once you have a `pcc-deploy.toml` file, simply run:
```shell theme={null}
pipecat cloud deploy
```
The CLI will automatically load your configuration. You can still override any value using CLI flags:
```shell theme={null}
# Use config file but override the region
pipecat cloud deploy --region eu-central
# Use config file but force update without confirmation
pipecat cloud deploy --force
```
# docker
Source: https://docs.pipecat.ai/api-reference/cli/cloud/docker
Build and push Docker images for agent deployments
The `docker` command provides utilities for building, tagging, and pushing Docker images to container registries. This command automatically parses registry information from your deployment configuration and supports both Docker Hub and custom registries.
## build-push
Build, tag, and push a Docker image for your agent deployment. This command reads configuration from your `pcc-deploy.toml` file to automatically determine registry settings, image names, and versions.
**Usage:**
```shell theme={null}
pipecat cloud docker build-push [ARGS] [OPTIONS]
```
**Arguments:**
Name of the agent to build the image for. If not provided, uses the `agent_name`
from your `pcc-deploy.toml` file.
**Options:**
Path to an alternate deploy config file. Defaults to `pcc-deploy.toml`.
**Registry Configuration:**
Registry type to push to. Supported values: `dockerhub`, `custom`. When not
specified, automatically detected from the `image` field in your
`pcc-deploy.toml` file.
Registry username for authentication. When not specified, automatically parsed
from the `image` field in your `pcc-deploy.toml` file (e.g., `myusername` from
`myusername/app:1.0`).
Custom registry URL (required for custom registries). When not specified,
automatically parsed from the `image` field for custom registries (e.g.,
`gcr.io` from `gcr.io/project/app:1.0`).
**Build Configuration:**
Version tag for the image. When not specified, automatically extracted from
the `image` field in your `pcc-deploy.toml` file (e.g., `1.0` from
`myusername/app:1.0`).
Build and tag only, do not push to registry. Useful for local testing or when
you want to push manually later.
Do not tag the image as `latest`. By default, images are tagged with both the
specified version and `latest`.
## Configuration
The `docker build-push` command reads configuration from your `pcc-deploy.toml` file to minimize required command-line arguments. Here's how different registry setups work:
### Docker Hub (Default)
For Docker Hub repositories, the minimal configuration is:
```toml theme={null}
agent_name = "my-agent"
image = "myusername/my-agent:1.0"
```
This automatically configures:
* Registry: `dockerhub`
* Username: `myusername`
* Agent name: `my-agent`
* Version: `1.0`
### Custom Registry
For custom registries like Google Container Registry, AWS ECR, or private registries:
```toml theme={null}
agent_name = "my-agent"
image = "gcr.io/my-project/my-agent:1.0"
```
This automatically configures:
* Registry: `custom`
* Registry URL: `gcr.io`
* Username/Project: `my-project`
* Agent name: `my-agent`
* Version: `1.0`
### Docker Configuration Section
For advanced configuration, add a `[docker]` section:
```toml theme={null}
agent_name = "my-agent"
image = "myusername/my-agent:1.0"
[docker]
auto_latest = false # Don't tag as 'latest'
```
Available `[docker]` options:
Whether to automatically tag the image as `latest` in addition to the
specified version.
## Examples
### Basic Usage (Recommended)
With a properly configured `pcc-deploy.toml`:
```shell theme={null}
# Build and push using all configuration from pcc-deploy.toml
pipecat cloud docker build-push
```
### Override Version
```shell theme={null}
# Use a different version than what's in pcc-deploy.toml
pipecat cloud docker build-push --version 2.0
```
### Build Only
```shell theme={null}
# Build and tag locally without pushing
pipecat cloud docker build-push --no-push
```
### Different Registry
```shell theme={null}
# Override registry settings for one-time builds
pipecat cloud docker build-push --registry custom --registry-url my-registry.com --username myuser
```
### Skip Latest Tag
```shell theme={null}
# Only tag with the specific version, not 'latest'
pipecat cloud docker build-push --no-latest
```
## Platform Support
All images are built for the `linux/arm64` platform, which is required for Pipecat Cloud deployments. This is automatically configured and cannot be changed.
## Error Handling
The command provides helpful error messages for common issues:
* **Authentication errors**: Suggests the appropriate `docker login` command
* **Missing Dockerfile**: Indicates that a Dockerfile must be present in the current directory
* **Registry access issues**: Provides guidance on checking permissions and authentication
# organizations
Source: https://docs.pipecat.ai/api-reference/cli/cloud/organizations
Organization and API key management commands
The `organizations` command group helps you manage your Pipecat Cloud organizations and API keys. You can list and select organizations, as well as create, list, and manage API keys for use with the platform.
Organization and user management is not available via the CLI. Please use the
[Pipecat Cloud Dashboard](https://pipecat.daily.co) to manage organizations
and users.
## list
Lists all organizations that your account has access to, highlighting the currently active one used for CLI operations.
**Usage:**
```shell theme={null}
pipecat cloud organizations list
```
## select
Changes your active organization for CLI operations.
This command either presents an interactive selection menu or directly sets a specified organization as your default. The selection is stored in your local configuration file (defaults to `~/.config/pipecatcloud/pipecatcloud.toml`) and used for all subsequent CLI commands.
**Usage:**
```shell theme={null}
pipecat cloud organizations select [OPTIONS]
```
**Options:**
Bypass prompt by directly specifying a namespace / organization string.
## keys
The `keys` sub-commands manage API keys for authenticating with Pipecat Cloud services.
### keys list
Lists all API keys for the current organization.
**Usage:**
```shell theme={null}
pipecat cloud organizations keys list [OPTIONS]
```
**Options:**
Organization ID to list keys for. If not provided, the default organization
will be used.
### keys create
Create a new public API key for your account / organization. The command will prompt you to enter a human-readable name for the key.
**Usage:**
```shell theme={null}
pipecat cloud organizations keys create [OPTIONS]
```
**Options:**
Organization ID to create key for. If not provided, the default organization
will be used.
### keys delete
Delete an API key from your organization. The command will prompt you to select which key to delete.
**Usage:**
```shell theme={null}
pipecat cloud organizations keys delete [OPTIONS]
```
**Options:**
Organization ID to delete key for. If not provided, the default organization
will be used.
### keys use
Sets a specific API key as your default for CLI operations.
The selected key is stored in the local configuration file (defaults to `~/.config/pipecatcloud/pipecatcloud.toml`) and will be used for all subsequent requests to the Pipecat Cloud API.
Note that the key must belong to the same user account or organization you are making requests to.
If the public key is revoked or deleted via the dashboard, re-run this command to select a new key.
**Usage:**
```shell theme={null}
pipecat cloud organizations keys use [OPTIONS]
```
**Options:**
Organization ID to select default key from. If not provided, the default
organization will be used.
## properties
The `properties` sub-commands manage organization properties such as default region settings.
### properties list
Lists all current property values for your organization.
**Usage:**
```shell theme={null}
pipecat cloud organizations properties list [OPTIONS]
```
**Options:**
Organization ID to list properties for. If not provided, the default
organization will be used.
### properties schema
Shows available properties with detailed metadata including type information, current values, default values, and available values.
**Usage:**
```shell theme={null}
pipecat cloud organizations properties schema [OPTIONS]
```
**Options:**
Organization ID to show properties schema for. If not provided, the default
organization will be used.
### properties set
Updates a specific organization property.
**Usage:**
```shell theme={null}
pipecat cloud organizations properties set PROPERTY_NAME VALUE [OPTIONS]
```
**Arguments:**
Name of the property to set (e.g., `defaultRegion`)
Value to set for the property
**Options:**
Organization ID to update property for. If not provided, the default
organization will be used.
**Example:**
```shell theme={null}
pipecat cloud organizations properties set defaultRegion eu-central
```
## default-region
Convenience command to get or set the default region for your organization.
**Usage:**
```shell theme={null}
pipecat cloud organizations default-region [REGION] [OPTIONS]
```
**Arguments:**
Region to set as default. If omitted, displays the current default region and
available regions.
**Options:**
Organization ID to configure. If not provided, the default organization will
be used.
**Examples:**
View current default region:
```shell theme={null}
pipecat cloud organizations default-region
```
Set default region:
```shell theme={null}
pipecat cloud organizations default-region eu-central
```
***
Managing your account and collaborating on agents as part of a team
# regions
Source: https://docs.pipecat.ai/api-reference/cli/cloud/regions
View available deployment regions
The `regions` command helps you discover which regions are available for deploying agents and storing secrets in Pipecat Cloud.
## list
List all available regions with their codes and display names.
**Usage:**
```shell theme={null}
pipecat cloud regions list
```
This command displays a table of all regions where you can deploy agents and store secrets. Use the region codes shown in this list when specifying the `--region` flag in other commands.
**Example output:**
```
Code Name
us-west US West (Oregon)
us-east US East (Virginia)
eu-central Europe (Frankfurt)
ap-south Asia Pacific (Mumbai)
```
## Using regions
Once you know the available region codes, you can use them with other commands:
**Deploy an agent to a specific region:**
```shell theme={null}
pipecat cloud deploy my-agent my-image:latest --region eu-central
```
**Create secrets in a specific region:**
```shell theme={null}
pipecat cloud secrets set my-secrets API_KEY=abc123 --region eu-central
```
**List agents in a specific region:**
```shell theme={null}
pipecat cloud agent list --region us-east
```
Secrets and image pull secrets must be in the same region as the agents that
use them. When deploying multi-region applications, create separate secret
sets for each region.
Choose a region close to your users for optimal latency and performance.
# secrets
Source: https://docs.pipecat.ai/api-reference/cli/cloud/secrets
Secret sets and secret management commands
The `secrets` command group helps you manage sensitive information for your agent deployments. You can create and manage secret sets (key-value pairs) and image pull secrets (for private Docker registries).
## list
List secret sets and image pull secrets for the active namespace / organization.
If a valid secret set name is provided, shows the keys of that set (values are hidden).
**Usage:**
```shell theme={null}
pipecat cloud secrets list [ARGS] [OPTIONS]
```
**Arguments:**
Name of the secret set to list keys for. Must be a valid string identifier.
**Options:**
Show secret sets only. Filter out image pull secrets from the results.
Organization to list secrets for. If not provided, uses the current
organization from your configuration.
Filter secrets by region. Only secrets in the specified region will be shown.
If not provided, secrets from all regions are listed.
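**Example:**
List everything in the active organization, then inspect the keys of a set named `my-secret-set` (a placeholder name):
```shell theme={null}
# List all secret sets and image pull secrets
pipecat cloud secrets list
# List the keys of a specific secret set (values stay hidden)
pipecat cloud secrets list my-secret-set
```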
## set
Create or update a secret set with the given name and key-value pairs. Secrets can be passed directly as key-value pairs or loaded from a file.
**Usage:**
```shell theme={null}
pipecat cloud secrets set [ARGS] [OPTIONS]
```
**Arguments:**
Name of the secret set to create or modify. Must be a valid string identifier
containing only characters, numbers, and hyphens.
List of secret key-value pairs e.g. `KEY1=value1 KEY2="value with spaces"`.
See [this note](/pipecat-cloud/fundamentals/secrets#special-characters) on using special
characters in secret values.
Example:
```shell theme={null}
pipecat cloud secrets set my-secrets 'API_KEY=123 API_KEY_2="value with spaces"'
```
**Options:**
Relative path to a file with a list of secret key-value pairs. Each line in
the file should be in the format `KEY=value`.
Example:
```shell theme={null}
pipecat cloud secrets set my-secrets --file .env
```
Skip confirmations and proceed with the operation.
Organization to create/update the secret set in. If not provided, uses the
current organization from your configuration.
Region where the secret set will be stored. If not specified, uses your
organization's default region (typically `us-west`). Secrets must be in the
same region as the agents that use them.
**Example:**
Create a secret set in a specific region:
```shell theme={null}
pipecat cloud secrets set my-secrets API_KEY=abc123 --region eu-central
```
## unset
Removes a specific secret key from a secret set.
**Usage:**
```shell theme={null}
pipecat cloud secrets unset [ARGS] [OPTIONS]
```
**Arguments:**
Name of the secret set to remove the secret from.
The key of the secret to remove from the set.
Example:
```shell theme={null}
pipecat cloud secrets unset my-secret-set SOME_KEY
```
**Options:**
Skip confirmations and proceed with the operation.
Organization containing the secret set. If not provided, uses the current
organization from your configuration.
## delete
Deletes an entire secret set.
**Usage:**
```shell theme={null}
pipecat cloud secrets delete [ARGS] [OPTIONS]
```
**Arguments:**
Name of the secret set to delete. This action is irreversible.
**Options:**
Skip confirmations and proceed with the operation.
Organization containing the secret set. If not provided, uses the current
organization from your configuration.
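Example:
```shell theme={null}
pipecat cloud secrets delete my-secret-set
```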
## image-pull-secret
Creates or updates credentials for pulling images from private Docker registries.
This command encodes and securely stores your image repository credentials. These credentials are used with the [deploy](./deploy) command when pulling images from private repositories. If a secret with the same name already exists, its credentials will be updated in place. If you don't provide credentials directly, the command will prompt you for input and offer the option to encode them in base64 format for additional security.
**Usage:**
```shell theme={null}
pipecat cloud secrets image-pull-secret [ARGS] [OPTIONS]
```
**Arguments:**
Name of the credentials set to create or modify. Must be a valid string
identifier.
Host address of the image repository e.g. `https://index.docker.io/v1/`.
Credentials for the image repository in the form of `username:password`.
Will prompt you for the value if not provided.
Example:
```shell theme={null}
pipecat cloud secrets image-pull-secret my-registry-creds https://index.docker.io/v1/ my-username:my-password
```
**Options:**
Encode the credentials in base64 format.
Organization to create the image pull secret in. If not provided, uses the
current organization from your configuration.
Region where the image pull secret will be stored. If not specified, uses your
organization's default region (typically `us-west`). Image pull secrets must
be in the same region as the agents that use them.
**Examples:**
Create an image pull secret in a specific region:
```shell theme={null}
pipecat cloud secrets image-pull-secret my-registry-creds https://index.docker.io/v1/ --region eu-central
```
***
Learn more about managing application secrets
# init
Source: https://docs.pipecat.ai/api-reference/cli/init
Scaffold a new Pipecat project with an interactive wizard or CLI flags
Create a new Pipecat project with guided setup for bot type, transport, AI services, and deployment options. Supports both an interactive wizard and a fully non-interactive mode for automation.
**Usage:**
```shell theme={null}
pipecat init [OPTIONS]
```
**Options:**
Output directory where files will be saved. Defaults to current directory.
Project name. Providing this flag triggers non-interactive mode.
Bot type: `web` or `telephony`.
Transport provider. Repeatable for multiple transports (e.g. `-t daily -t
smallwebrtc`). Valid values: `daily`, `smallwebrtc`, `twilio`, `telnyx`,
`plivo`, `exotel`, `daily_pstn`, `twilio_daily_sip`.
Pipeline mode: `cascade` or `realtime`.
Speech-to-Text service (cascade mode). e.g. `deepgram_stt`, `openai_stt`.
Language model service (cascade mode). e.g. `openai_llm`, `anthropic_llm`.
Text-to-Speech service (cascade mode). e.g. `cartesia_tts`, `elevenlabs_tts`.
Realtime service (realtime mode). e.g. `openai_realtime`,
`gemini_live_realtime`.
Video avatar service (web bots only). e.g. `heygen_video`, `tavus_video`,
`simli_video`.
Client framework (web bots only): `react`, `vanilla`, or `none`.
Client dev server (when using `--client-framework react`): `vite` or `nextjs`.
Daily PSTN mode (required when transport is `daily_pstn`): `dial-in` or
`dial-out`.
Twilio + Daily SIP mode (required when transport is `twilio_daily_sip`):
`dial-in` or `dial-out`.
Enable audio recording.
Enable transcription logging.
Enable video input (web bots only).
Enable video output (web bots only).
Generate Pipecat Cloud deployment files (Dockerfile, pcc-deploy.toml).
Enable Krisp noise cancellation (requires cloud deployment).
Enable observability.
Path to a JSON config file. Triggers non-interactive mode. CLI flags override
file values.
Print the resolved configuration as JSON without generating any files.
Print all available service options as JSON and exit. Useful for CI scripts
and coding agents that need to discover valid values at runtime.
## Interactive Setup
When run without `--name` or `--config`, the CLI guides you through selecting:
* **Bot type and client framework** - Phone, web (Next.js, Vite, Vanilla JS), or mobile
* **Transport provider** - Daily, Twilio, Telnyx, Plivo, Exotel
* **Pipeline mode** - Cascade or Realtime
* **AI services** - STT, LLM, and TTS providers
* **Optional features** - Additional capabilities for your bot
* **Deployment target** - Local development or Pipecat Cloud
## Non-Interactive Mode
When `--name` or `--config` is provided, all configuration is taken from CLI flags or a JSON config file with no interactive prompts. This is useful for automation, scripting, and coding agents.
All required fields must be specified or the command exits with a list of all missing/invalid fields.
## Examples
### Interactive Wizard
```shell theme={null}
pipecat init
```
### Non-Interactive (Cascade)
```shell theme={null}
pipecat init --name my-bot --bot-type web --transport daily \
--mode cascade --stt deepgram_stt --llm openai_llm --tts cartesia_tts
```
### Non-Interactive (Realtime)
```shell theme={null}
pipecat init --name rt-bot --bot-type web --transport smallwebrtc \
--mode realtime --realtime openai_realtime
```
### Multiple Transports
```shell theme={null}
pipecat init --name my-bot --bot-type web \
--transport daily --transport smallwebrtc \
--mode cascade --stt deepgram_stt --llm openai_llm --tts cartesia_tts
```
### With React Client
```shell theme={null}
pipecat init --name my-bot --bot-type web --transport daily \
--mode cascade --stt deepgram_stt --llm openai_llm --tts cartesia_tts \
--client-framework react --client-server vite
```
### Telephony
```shell theme={null}
pipecat init --name call-bot --bot-type telephony --transport twilio \
--mode cascade --stt deepgram_stt --llm openai_llm --tts cartesia_tts
```
### Discover Available Options
```shell theme={null}
# List all valid service values as JSON
pipecat init --list-options
```
Output:
```json theme={null}
{
"bot_type": ["web", "telephony"],
"transports": {
"web": ["daily", "smallwebrtc"],
"telephony": ["twilio", "twilio_daily_sip_dialin", "twilio_daily_sip_dialout", ...]
},
"stt": ["deepgram_stt", "openai_stt", ...],
"llm": ["openai_llm", "anthropic_llm", ...],
"tts": ["cartesia_tts", "elevenlabs_tts", ...],
"realtime": ["openai_realtime", "gemini_live_realtime", ...],
"video": ["heygen_video", "tavus_video", "simli_video"]
}
```
This is useful for scripting. For example, to pick the first listed TTS provider:
```shell theme={null}
options=$(pipecat init --list-options)
tts=$(echo "$options" | jq -r '.tts[0]')
```
### Dry Run
```shell theme={null}
# Preview resolved config as JSON without generating files
pipecat init --name my-bot --bot-type web --transport daily \
--mode cascade --stt deepgram_stt --llm openai_llm --tts cartesia_tts \
--dry-run
```
### From Config File
```shell theme={null}
pipecat init --config project-config.json
```
Sample `project-config.json`:
```json theme={null}
{
"project_name": "my-bot",
"bot_type": "web",
"transports": ["daily"],
"mode": "cascade",
"stt_service": "deepgram_stt",
"llm_service": "openai_llm",
"tts_service": "cartesia_tts",
"recording": false,
"transcription": false,
"deploy_to_cloud": true,
"enable_krisp": false,
"enable_observability": false
}
```
CLI flags override any values in the file, so you can use a base config and customize per-run:
```shell theme={null}
pipecat init --config base-config.json --name custom-name --no-deploy-to-cloud
```
### Specify Output Directory
```shell theme={null}
pipecat init --output my-bot
```
## Generated Project Structure
```
mybot/
├── server/ # Python bot server
│ ├── bot.py # Main bot implementation
│ ├── pyproject.toml # Python dependencies
│ ├── .env.example # Environment variables template
│ ├── Dockerfile # Container image (if cloud enabled)
│ └── pcc-deploy.toml # Deployment config (if cloud enabled)
├── client/ # Web client (if generated)
│ ├── src/
│ ├── package.json
│ └── ...
├── .gitignore
└── README.md # Project setup instructions
```
# CLI Overview
Source: https://docs.pipecat.ai/api-reference/cli/overview
Command-line tool for scaffolding, deploying, and monitoring Pipecat bots
Create new phone or web/mobile bots with interactive setup
Push your bots to production with one command
Watch real-time logs, conversations, and metrics
## Requirements
* Python 3.10 or later
## Installation
Install the CLI globally using [uv](https://docs.astral.sh/uv/) (recommended) or [pipx](https://pipx.pypa.io/):
```bash theme={null}
uv tool install pipecat-ai-cli
# or
pipx install pipecat-ai-cli
```
Verify installation:
```bash theme={null}
pipecat --version
```
All commands can use either `pipecat` or the shorter `pc` alias.
## Commands
**[`pipecat init`](/api-reference/cli/init)** - Scaffold new projects with interactive setup
**[`pipecat tail`](/api-reference/cli/tail)** - Monitor sessions in real-time with a terminal dashboard
**[`pipecat cloud`](/api-reference/cli/cloud/auth)** - Deploy and manage bots on Pipecat Cloud
## Getting Help
View help for any command:
```bash theme={null}
pipecat --help
pipecat init --help
pipecat tail --help
pipecat cloud --help
```
## Next Steps
Scaffold a new project with pipecat init
# tail
Source: https://docs.pipecat.ai/api-reference/cli/tail
A terminal dashboard for monitoring Pipecat sessions in real-time
**Tail** is a terminal dashboard for monitoring your Pipecat sessions in real-time with logs, conversations, metrics, and audio levels all in one place.
With Tail you can:
* 📜 Follow system logs in real time
* 💬 Track conversations as they happen
* 🔊 Monitor user and agent audio levels
* 📈 Keep an eye on service metrics and usage
* 🖥️ Run locally as a pipeline runner or connect to a remote session
**Usage:**
```shell theme={null}
pipecat tail [OPTIONS]
```
**Options:**
WebSocket URL to connect to. Defaults to `ws://localhost:9292`.
## How to Use Tail
* Add `pipecat-ai-cli` to your project's dependencies.
* Update your Pipecat code to include the `TailObserver`:
```python theme={null}
from pipecat.pipeline.task import PipelineTask

from pipecat_cli.tail import TailObserver

task = PipelineTask(
    pipeline,
    observers=[TailObserver()],
)
```
* Start the Tail app separately:
```bash theme={null}
# Connect to local session (default)
pipecat tail
# Connect to remote session
pipecat tail --url wss://my-bot.example.com
```
# Create an Agent
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/rest-reference/endpoint/agent-create
POST /agents
Create a new agent with a container image and configuration settings.
# Delete an Agent
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/rest-reference/endpoint/agent-delete
DELETE /agents/{agentName}
Permanently delete an agent and its associated resources.
# Retrieve Agent Logs
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/rest-reference/endpoint/agent-get-logs
GET /agents/{agentName}/logs
Get execution logs for an agent with filtering and pagination options.
# Get Session Details
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/rest-reference/endpoint/agent-get-session
GET /agents/{agentName}/sessions/{sessionId}
Retrieve detailed information about a specific session including resource metrics and meeting IDs.
# List All Sessions
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/rest-reference/endpoint/agent-get-sessions
GET /agents/{agentName}/sessions
Get sessions for an agent with filtering and pagination options.
# List All Agents
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/rest-reference/endpoint/agent-list-all
GET /agents
Retrieve a list of all agents in your organization with their status and configuration.
# Get Agent Details
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/rest-reference/endpoint/agent-list-one
GET /agents/{agentName}
Retrieve detailed information about a specific agent including its deployment status.
# Update an Agent
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/rest-reference/endpoint/agent-update
POST /agents/{agentName}
Update an existing agent's configuration or deploy a new version.
# Create Build
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/rest-reference/endpoint/build-create
POST /builds
Create a new build or return a cached build for your uploaded context
## Build Caching
Pipecat Cloud automatically caches successful builds based on three factors:
1. **Context hash** - A hash of your uploaded context archive
2. **Region** - The target deployment region
3. **Dockerfile path** - The path to your Dockerfile
If a successful build already exists with the same combination, you'll receive the cached build with `cached: true`. This makes repeated deployments much faster.
## Build Flow
```mermaid theme={null}
sequenceDiagram
participant Client
participant API
participant Storage
participant CodeBuild
Client->>API: POST /builds/upload-url
API-->>Client: uploadUrl, uploadId
Client->>Storage: Upload context.tar.gz
Client->>API: POST /builds
API->>Storage: Get context hash
API->>API: Check for cached build
alt Cached build exists
API-->>Client: 200 OK (cached: true)
else No cached build
API->>CodeBuild: Trigger build
API-->>Client: 201 Created (cached: false)
end
```
## Example: Complete Build Flow
```bash theme={null}
# 1. Get upload URL
UPLOAD_RESPONSE=$(curl -s -X POST "https://api.pipecat.daily.co/v1/builds/upload-url" \
-H "Authorization: Bearer $PIPECAT_API_KEY" \
-H "Content-Type: application/json" \
-d '{"region": "us-west"}')
UPLOAD_ID=$(echo $UPLOAD_RESPONSE | jq -r '.uploadId')
# 2. Upload context (see upload-url endpoint for details)
# 3. Create the build
BUILD_RESPONSE=$(curl -s -X POST "https://api.pipecat.daily.co/v1/builds" \
-H "Authorization: Bearer $PIPECAT_API_KEY" \
-H "Content-Type: application/json" \
-d "{
\"uploadId\": \"$UPLOAD_ID\",
\"region\": \"us-west\",
\"dockerfilePath\": \"Dockerfile\"
}")
BUILD_ID=$(echo $BUILD_RESPONSE | jq -r '.build.id')
CACHED=$(echo $BUILD_RESPONSE | jq -r '.cached')
if [ "$CACHED" = "true" ]; then
echo "Using cached build!"
IMAGE_URI=$(echo $BUILD_RESPONSE | jq -r '.build.imageUri')
else
echo "Build started: $BUILD_ID"
echo "Poll GET /builds/$BUILD_ID for status"
fi
```
## Build Statuses
| Status | Description |
| ---------- | ----------------------------------------------------- |
| `pending` | Build created, waiting to start |
| `building` | Build is in progress |
| `success` | Build completed successfully, `imageUri` is available |
| `failed` | Build failed, check `errorMessage` for details |
| `timeout` | Build exceeded the time limit |
# Get Build
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/rest-reference/endpoint/build-get
GET /builds/{buildId}
Get the current status of a build
## Status Reconciliation
When you poll a build that's in progress (`pending` or `building`), Pipecat Cloud automatically reconciles the status with the underlying build system. This means you always get the latest status without any delay.
## Polling for Completion
After creating a build, poll this endpoint until the build completes:
```bash theme={null}
BUILD_ID="123e4567-e89b-12d3-a456-426614174000"
while true; do
RESPONSE=$(curl -s "https://api.pipecat.daily.co/v1/builds/$BUILD_ID" \
-H "Authorization: Bearer $PIPECAT_API_KEY")
STATUS=$(echo $RESPONSE | jq -r '.build.status')
case $STATUS in
"success")
IMAGE_URI=$(echo $RESPONSE | jq -r '.build.imageUri')
echo "Build complete! Image: $IMAGE_URI"
break
;;
"failed"|"timeout")
ERROR=$(echo $RESPONSE | jq -r '.build.errorMessage')
echo "Build failed: $ERROR"
exit 1
;;
*)
echo "Status: $STATUS - waiting..."
sleep 10
;;
esac
done
```
Once your build succeeds, use the Pipecat CLI to deploy your agent. The CLI will automatically use the built image.
# Get Build Logs
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/rest-reference/endpoint/build-get-logs
GET /builds/{buildId}/logs
Retrieve the logs for a specific build
## Streaming Logs During Build
You can poll this endpoint during a build to stream logs in real-time. The endpoint returns the latest logs each time you call it:
```bash theme={null}
BUILD_ID="123e4567-e89b-12d3-a456-426614174000"
while true; do
RESPONSE=$(curl -s "https://api.pipecat.daily.co/v1/builds/$BUILD_ID/logs?limit=100" \
-H "Authorization: Bearer $PIPECAT_API_KEY")
echo "$RESPONSE" | jq -r '.logs[]'
# Check if build is still in progress
BUILD_STATUS=$(curl -s "https://api.pipecat.daily.co/v1/builds/$BUILD_ID" \
-H "Authorization: Bearer $PIPECAT_API_KEY" | jq -r '.build.status')
if [[ "$BUILD_STATUS" == "success" || "$BUILD_STATUS" == "failed" || "$BUILD_STATUS" == "timeout" ]]; then
echo "Build finished with status: $BUILD_STATUS"
break
fi
sleep 5
done
```
## Pagination
Use the `limit` query parameter to control how many log lines are returned:
| Parameter | Default | Min | Max | Description |
| --------- | ------- | --- | ------ | ------------------------------ |
| `limit` | 500 | 1 | 10,000 | Number of log events to return |
Logs may be empty if the build was just created and hasn't started executing yet. Continue polling until logs appear or the build completes.
## Debugging Failed Builds
When a build fails, use this endpoint to diagnose the issue:
```bash theme={null}
BUILD_ID="123e4567-e89b-12d3-a456-426614174000"
# Get the full build logs
curl -s "https://api.pipecat.daily.co/v1/builds/$BUILD_ID/logs?limit=10000" \
-H "Authorization: Bearer $PIPECAT_API_KEY" | jq -r '.logs[]'
```
Common issues visible in build logs include:
* Missing dependencies in `requirements.txt`
* Dockerfile syntax errors
* Failed `pip install` commands
* Missing files referenced in `COPY` instructions
# List Builds
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/rest-reference/endpoint/build-list
GET /builds
List all builds for your organization with optional filters
## Filtering Builds
Use query parameters to filter and paginate your build list:
```bash theme={null}
# List all successful builds
curl "https://api.pipecat.daily.co/v1/builds?status=success" \
-H "Authorization: Bearer $PIPECAT_API_KEY"
# List builds in a specific region
curl "https://api.pipecat.daily.co/v1/builds?region=us-west&limit=10" \
-H "Authorization: Bearer $PIPECAT_API_KEY"
# Find builds with a specific context hash
curl "https://api.pipecat.daily.co/v1/builds?contextHash=a1b2c3d4e5f6a7b8" \
-H "Authorization: Bearer $PIPECAT_API_KEY"
```
## Pagination
Results are paginated with `limit` and `offset` parameters:
```bash theme={null}
# Get page 2 (items 21-40)
curl "https://api.pipecat.daily.co/v1/builds?limit=20&offset=20" \
-H "Authorization: Bearer $PIPECAT_API_KEY"
```
The response includes `total` to help calculate pagination:
```json theme={null}
{
"builds": [...],
"total": 142,
"limit": 20,
"offset": 20
}
```
# Get Upload URL
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/rest-reference/endpoint/build-upload-url
POST /builds/upload-url
Get a pre-signed URL to upload your Docker build context
## Uploading Your Context
After receiving the upload URL and fields, upload your context archive using a multipart form POST request:
```bash theme={null}
# Get the upload URL
RESPONSE=$(curl -s -X POST "https://api.pipecat.daily.co/v1/builds/upload-url" \
-H "Authorization: Bearer $PIPECAT_API_KEY" \
-H "Content-Type: application/json" \
-d '{"region": "us-west"}')
# Extract fields from response
UPLOAD_URL=$(echo $RESPONSE | jq -r '.uploadUrl')
UPLOAD_ID=$(echo $RESPONSE | jq -r '.uploadId')
# Upload the context archive (must be gzipped tar)
# Note: Field names are case-sensitive (X-Amz-*, not x-amz-*)
curl -X POST "$UPLOAD_URL" \
-F "key=$(echo $RESPONSE | jq -r '.uploadFields.key')" \
-F "bucket=$(echo $RESPONSE | jq -r '.uploadFields.bucket')" \
-F "Content-Type=application/gzip" \
-F "Policy=$(echo $RESPONSE | jq -r '.uploadFields.Policy')" \
-F "X-Amz-Algorithm=$(echo $RESPONSE | jq -r '.uploadFields[\"X-Amz-Algorithm\"]')" \
-F "X-Amz-Credential=$(echo $RESPONSE | jq -r '.uploadFields[\"X-Amz-Credential\"]')" \
-F "X-Amz-Date=$(echo $RESPONSE | jq -r '.uploadFields[\"X-Amz-Date\"]')" \
-F "X-Amz-Security-Token=$(echo $RESPONSE | jq -r '.uploadFields[\"X-Amz-Security-Token\"]')" \
-F "X-Amz-Signature=$(echo $RESPONSE | jq -r '.uploadFields[\"X-Amz-Signature\"]')" \
-F "file=@context.tar.gz"
echo "Upload ID: $UPLOAD_ID"
```
The context archive must be a gzipped tar file (`.tar.gz`). The upload URL validates both the content type and file size.
**Field names are case-sensitive.** When uploading to S3, you must use the exact field names returned in `uploadFields` (e.g., `X-Amz-Algorithm`, not `x-amz-algorithm`). Using incorrect casing will result in authentication errors.
## Creating the Context Archive
Your context archive should contain your Dockerfile and all files needed for the build:
```bash theme={null}
# Create a tar.gz of your project
tar -czvf context.tar.gz \
--exclude='.git' \
--exclude='node_modules' \
--exclude='__pycache__' \
--exclude='.venv' \
.
```
The maximum context size is **500MB**. Use a `.dockerignore` file to exclude unnecessary files and keep your context small for faster uploads and builds.
# Get Organization Properties
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/rest-reference/endpoint/properties-get
GET /properties
Retrieve current values of configurable properties for your organization.
# Update Organization Properties
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/rest-reference/endpoint/properties-update
PATCH /properties
Update configurable properties for your organization, such as the default deployment region.
# List Regions
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/rest-reference/endpoint/regions-list
GET /regions
Retrieve a list of all available regions for deploying agents and storing secrets.
# Create or Update Secrets
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/rest-reference/endpoint/secret-create-update
PUT /secrets/{setName}
Create or update a secret set and its values
# Delete Entire Secret Set
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/rest-reference/endpoint/secret-delete-secret-set
DELETE /secrets/{setName}
Delete an entire secret set by its name. This operation removes the secret set and all associated key-value pairs.
# Delete Specific Secret From Set
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/rest-reference/endpoint/secret-delete-specific-secret
DELETE /secrets/{setName}/{secretKey}
Delete a specific secret from a set by its key. This operation removes the key-value pair from the specified secret set.
# List All Secret Sets
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/rest-reference/endpoint/secret-list-all
GET /secrets
Retrieve a list of all secret sets in your organization with their name and secret type.
# Get Secret Set Details
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/rest-reference/endpoint/secret-list-one
GET /secrets/{setName}
Retrieve key and value pairs for a specific secret set.
# Session API
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/rest-reference/endpoint/session-proxy
GET /{serviceName}/sessions/{sessionId}/{path}
Send HTTP requests directly to your running Pipecat Cloud agent sessions.
Send HTTP requests to endpoints defined in your running bot. Supports `GET`, `POST`, `PUT`, `PATCH`, `DELETE`, `OPTIONS`, and `HEAD` methods.
## Request Headers
Headers are forwarded to your bot with these exceptions:
* `host` - Excluded
* `content-length` - Excluded
* `authorization` - Excluded (authentication is handled by the API gateway)
Requires base image version `0.1.2` or later. See the [Session API
guide](/pipecat-cloud/guides/session-api) for setup instructions
and examples.
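As a rough sketch, a proxied request from Python might look like the following. The base URL, the `/health` path, and the bearer-token header are assumptions here — substitute your deployment details and an endpoint your bot actually defines:
```python theme={null}
import requests

API_KEY = "pk_..."  # your Pipecat Cloud API key (assumption: bearer auth at the gateway)
PCC_API_URL = "https://api.pipecat.daily.co/v1"  # assumed base URL
AGENT_NAME = "my-first-agent"
SESSION_ID = "session-id-from-the-start-response"

resp = requests.get(
    f"{PCC_API_URL}/{AGENT_NAME}/sessions/{SESSION_ID}/health",
    # The authorization header is consumed by the API gateway and not forwarded to your bot.
    headers={"Authorization": f"Bearer {API_KEY}"},
)
print(resp.status_code, resp.text)
```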
# Start an Agent session
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/rest-reference/endpoint/start
POST /{agentName}/start
Start a new session with a deployed agent.
This endpoint starts a new instance of a deployed agent. You can optionally create a Daily room for the service to connect to.
# Stop an Agent session
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/rest-reference/endpoint/stop
DELETE /agents/{agentName}/sessions/{sessionId}
Stop a running agent session and clean up its resources.
# Examples
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/sdk-reference/examples
Common usage patterns for the Pipecat Cloud Python SDK
This page provides examples of common tasks with the Pipecat Cloud Python SDK.
## Starting an Agent Session
This example shows how to start a session with various configurations:
```python theme={null}
import asyncio
from pipecatcloud.exception import AgentStartError
from pipecatcloud.session import Session, SessionParams
async def main():
try:
# Create session object
session = Session(
agent_name="my-first-agent",
api_key=API_KEY, # Replace with your actual API key
params=SessionParams(
use_daily=True, # Optional: Creates a Daily room
daily_room_properties={"start_video_off": False},
data={"key": "value"},
),
)
# Start the session
response = await session.start()
# Get Daily room URL
daily_url = f"{response['dailyRoom']}?t={response['dailyToken']}"
print(f"Join Daily room: {daily_url}")
except AgentStartError as e:
print(f"Error starting agent: {e}")
except Exception as e:
print(f"Unexpected error: {e}")
# Run the async function
if __name__ == "__main__":
asyncio.run(main())
```
## Building a Bot Entry Point with Daily Arguments
```python theme={null}
from loguru import logger
from pipecat.runner.types import DailyRunnerArguments
async def bot(args: DailyRunnerArguments):
"""Main bot entry point compatible with the FastAPI route handler.
    Args:
        args: DailyRunnerArguments containing the room URL, token, and request body.
"""
logger.info(f"Bot process initialized {args.room_url} {args.token}")
try:
await main(args.room_url, args.token)
logger.info("Bot process completed")
except Exception as e:
logger.exception(f"Error in bot process: {str(e)}")
raise
```
## Building a Bot Entry Point with WebSocket Arguments
```python theme={null}
from loguru import logger
from pipecat.runner.types import WebSocketRunnerArguments
async def bot(args: WebSocketRunnerArguments):
"""Main bot entry point for WebSocket connections.
    Args:
        args: WebSocketRunnerArguments containing the WebSocket connection.
"""
logger.info("WebSocket bot process initialized")
try:
await main(args.websocket)
logger.info("WebSocket bot process completed")
except Exception as e:
logger.exception(f"Error in WebSocket bot process: {str(e)}")
raise
```
# Error Handling
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/sdk-reference/exceptions
Managing errors with the Pipecat Cloud Python SDK
The Pipecat Cloud SDK defines several exception classes to help you handle different error conditions.
## Base Exception Class
### Error
Base class for all Pipecat Cloud exceptions.
```python theme={null}
from pipecatcloud.exception import Error
try:
    # SDK operation
    ...
except Error as e:
print(f"A Pipecat Cloud error occurred: {e}")
```
## Session Errors
### AgentStartError
Raised when an agent fails to start.
```python theme={null}
from pipecatcloud.exception import AgentStartError
try:
response = await session.start()
except AgentStartError as e:
print(f"Failed to start agent: {e}")
if e.error_code == "429":
print("Agent pool at capacity. Try again later.")
elif e.error_code == "404":
print("Agent not found.")
```
#### Properties
Error message with details about the failure.
Error code that can be used for conditional handling of different error types.
### AgentNotHealthyError
Raised when attempting to interact with an agent that is not in a ready state.
```python theme={null}
from pipecatcloud.exception import AgentNotHealthyError
try:
response = await session.start()
except AgentNotHealthyError as e:
print(f"Agent is not ready: {e}")
print("Check agent status with: pipecat cloud agent status my-agent")
```
#### Properties
Error message with details about the agent's status.
Error code identifying the specific issue with the agent.
## Authentication Errors
### AuthError
Raised when authentication fails or token has expired.
```python theme={null}
from pipecatcloud.exception import AuthError
try:
    # Operation requiring authentication
    ...
except AuthError:
print("Your session has expired. Please log in again.")
# Prompt user to reauthenticate
```
#### Properties
Message explaining the authentication failure.
## Configuration Errors
### ConfigError
Raised when there are issues with configuration storage or retrieval.
```python theme={null}
from pipecatcloud.exception import ConfigError
try:
    # Operation requiring config
    ...
except ConfigError as e:
print(f"Configuration error: {e.message}")
# Guide user to fix configuration
```
#### Properties
Message explaining the configuration issue.
### ConfigFileError
Raised when the configuration file is malformed.
```python theme={null}
from pipecatcloud.exception import ConfigFileError
try:
    # Operation requiring config file
    ...
except ConfigFileError:
print("Your configuration file is invalid or corrupted.")
print("Try recreating it with: pipecat cloud auth login")
```
### InvalidError
Raised when an invalid operation is attempted.
```python theme={null}
from pipecatcloud.exception import InvalidError
try:
    # Potentially invalid operation
    ...
except InvalidError as e:
print(f"Invalid operation: {e}")
```
# Python SDK Overview
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/sdk-reference/overview
Introduction to the Pipecat Cloud Python SDK
The Pipecat Cloud Python SDK provides a programmatic interface for managing and interacting with your agents. It allows you to start and manage agent sessions, handle different session types, and respond to various error conditions.
## Installation
Install the SDK using pip:
```bash theme={null}
pip install pipecatcloud
```
## Key Components
The SDK contains several main components:
* [**Session Management**](./sessions) - Start and interact with agent sessions
* [**Session Arguments**](./session-arguments) - Types of arguments received by `bot()` entry points
* [**Error Handling**](./exceptions) - Exception classes for handling different error scenarios
## Quick Start
Here's a simple example to get started:
```python theme={null}
import asyncio
from pipecatcloud.exception import AgentStartError
from pipecatcloud.session import Session, SessionParams
async def main():
try:
# Create session object
session = Session(
agent_name="my-first-agent",
api_key=API_KEY, # Replace with your actual API key
params=SessionParams(
use_daily=True, # Optional: Creates a Daily room
daily_room_properties={"start_video_off": False},
data={"key": "value"},
),
)
# Start the session
response = await session.start()
# Get Daily room URL
daily_url = f"{response['dailyRoom']}?t={response['dailyToken']}"
print(f"Join Daily room: {daily_url}")
except AgentStartError as e:
print(f"Error starting agent: {e}")
except Exception as e:
print(f"Unexpected error: {e}")
# Run the async function
if __name__ == "__main__":
asyncio.run(main())
```
For more detailed examples and use cases, see the [Examples](./examples) section.
# Session Arguments
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/sdk-reference/session-arguments
Understanding the arguments received by bot entry points
When creating agents with Pipecat Cloud, your `bot()` entry point function receives different types of arguments depending on the session type. These classes represent the structure of those arguments.
## PipecatSessionArguments
Standard Pipecat Cloud agent session arguments, used for basic sessions.
```python theme={null}
from pipecatcloud.agent import PipecatSessionArguments
def bot(args: PipecatSessionArguments):
print(f"Session ID: {args.session_id}")
print(f"Custom data: {args.body}")
```
### Properties
The unique identifier for the current session.
The custom data passed to the agent via the session parameters.
## DailySessionArguments
Arguments for sessions that involve Daily WebRTC rooms for voice/video interaction.
```python theme={null}
from pipecat.runner.types import DailyRunnerArguments
def bot(args: DailyRunnerArguments):
print(f"Session ID: {args.session_id}")
print(f"Daily room URL: {args.room_url}")
print(f"Daily token: {args.token}")
print(f"Custom data: {args.body}")
```
### Properties
The unique identifier for the current session.
The URL for the Daily room.
The authentication token for the Daily room.
The custom data passed to the agent via the session parameters.
## WebSocketSessionArguments
Arguments for sessions that use WebSocket connections for real-time communication.
```python theme={null}
from pipecat.runner.types import WebSocketRunnerArguments
async def bot(args: WebSocketRunnerArguments):
print(f"Session ID: {args.session_id}")
await args.websocket.send_text("Hello from the agent!")
```
### Properties
The unique identifier for the current session.
The FastAPI WebSocket connection used to communicate with the client.
# Session Management
Source: https://docs.pipecat.ai/api-reference/pipecat-cloud/sdk-reference/sessions
Managing agent sessions with the Pipecat Cloud Python SDK
The session management classes allow you to start and interact with agent sessions.
## Session
The `Session` class is the primary way to start and interact with agent sessions.
```python theme={null}
from pipecatcloud.session import Session, SessionParams
session = Session(
agent_name="my-agent",
api_key="pk_...",
params=SessionParams(use_daily=True)
)
```
### Constructor Parameters
Name of the deployed agent to interact with.
Public API key for authentication.
Optional parameters to configure the session.
### Methods
Starts a new session with the specified agent.
**Returns**
A dictionary containing session information. If `use_daily` is True, includes `dailyRoom` URL and `dailyToken`.
**Raises**
`AgentStartError`: If the session fails to start due to missing API key, agent not found, agent not ready, or capacity limits.
## SessionParams
The `SessionParams` class allows you to configure a session.
```python theme={null}
from pipecatcloud.session import SessionParams
params = SessionParams(
data={"custom_field": "value"},
use_daily=True,
daily_room_properties={"enable_recording": "cloud"}
)
```
### Parameters
Optional dictionary of data to pass to the agent. Must be JSON-serializable.
If True, creates a Daily WebRTC room for the session, enabling voice
interaction.
Optional dictionary of properties to configure the Daily room. Only used when
`use_daily=True`.
See [Daily API
documentation](https://docs.daily.co/reference/rest-api/rooms/config) for
available properties.
# Exceptions
Source: https://docs.pipecat.ai/api-reference/pipecat-flows/exceptions
Error handling hierarchy for Pipecat Flows
## Overview
Pipecat Flows defines a hierarchy of exceptions for handling errors during flow execution. All exceptions inherit from `FlowError`, making it possible to catch all flow-related errors with a single handler.
```python theme={null}
from pipecat_flows import (
FlowError,
FlowInitializationError,
FlowTransitionError,
InvalidFunctionError,
ActionError,
)
```
## Exception Hierarchy
```
FlowError
├── FlowInitializationError
├── FlowTransitionError
├── InvalidFunctionError
└── ActionError
```
## FlowError
```python theme={null}
class FlowError(Exception)
```
Base exception for all flow-related errors. Use this for generic flow errors or as a catch-all for any flow exception.
```python theme={null}
try:
await flow_manager.initialize(initial_node)
except FlowError as e:
logger.error(f"Flow error: {e}")
```
## FlowInitializationError
```python theme={null}
class FlowInitializationError(FlowError)
```
Raised when flow manager initialization fails. Common causes include invalid configuration, missing dependencies, or calling `initialize()` with an invalid node config.
**Raised by:** `FlowManager.initialize()`
## FlowTransitionError
```python theme={null}
class FlowTransitionError(FlowError)
```
Raised when a node transition fails. This typically occurs when attempting to transition before the flow manager is initialized, or when a target node configuration is invalid.
**Raised by:** `FlowManager.set_node_from_config()`, internal node transition logic
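For example, a manual transition can be guarded with a narrow handler (a sketch; `next_node_config` is assumed to be a `NodeConfig` you built elsewhere):
```python theme={null}
from loguru import logger
from pipecat_flows import FlowTransitionError

try:
    await flow_manager.set_node_from_config(next_node_config)  # assumed NodeConfig built elsewhere
except FlowTransitionError as e:
    logger.error(f"Node transition failed: {e}")
```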
## InvalidFunctionError
```python theme={null}
class InvalidFunctionError(FlowError)
```
Raised when a function cannot be registered or executed. Common causes include functions not found in the main module, invalid function signatures, direct functions that don't return a tuple, or missing docstrings on direct functions.
**Raised by:** Function registration, `FlowsDirectFunctionWrapper.validate_function()`
## ActionError
```python theme={null}
class ActionError(FlowError)
```
Raised when an action execution fails. This includes both built-in actions (`tts_say`, `end_conversation`, `function`) and custom registered actions. Common causes include missing required fields (e.g., `text` for `tts_say`), unregistered action types, or handler execution errors.
**Raised by:** `ActionManager.execute_actions()`, action handler registration
# FlowManager
Source: https://docs.pipecat.ai/api-reference/pipecat-flows/flow-manager
Core orchestration class for managing conversation flows
## Overview
`FlowManager` orchestrates conversation flows by managing state transitions, function registration, and message handling across different LLM providers.
## Configuration
All parameters are keyword-only.
Pipeline task instance used for queueing frames into the pipeline.
LLM service instance or an `LLMSwitcher` for switching between LLM providers
at runtime. Supports `OpenAILLMService` and any service that extends it (Groq,
Together, Cerebras, DeepSeek, etc.), `AnthropicLLMService`,
`GoogleLLMService`, and `AWSBedrockLLMService`.
Context aggregator for managing conversation context. Typically obtained from
`create_context_aggregator()` on the LLM service.
Default context strategy for managing conversation context during node
transitions. Can be overridden per-node via
[`NodeConfig.context_strategy`](/api-reference/pipecat-flows/types#nodeconfig).
See
[ContextStrategyConfig](/api-reference/pipecat-flows/types#contextstrategyconfig).
Transport instance for communication (e.g., `DailyTransport`). When provided,
accessible via the `transport` property in function and action handlers.
Functions that will be available at every node. These are registered once
during initialization and automatically included alongside node-specific
functions. Useful for capabilities like "transfer to human" that should be
accessible from any conversation state.
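Putting the parameters together, a constructor call might look like the sketch below. The `context_strategy` keyword name is an assumption based on the per-node override described above; the remaining names follow the usage examples later on this page:
```python theme={null}
from pipecat_flows import ContextStrategy, ContextStrategyConfig, FlowManager

flow_manager = FlowManager(
    task=task,                              # PipelineTask used to queue frames
    llm=llm,                                # LLM service or LLMSwitcher
    context_aggregator=context_aggregator,  # from create_context_aggregator() on the LLM service
    context_strategy=ContextStrategyConfig(strategy=ContextStrategy.APPEND),  # assumed keyword name
    transport=transport,                    # e.g., DailyTransport
    global_functions=[transfer_function],   # functions available in every node
)
```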
## Properties
### state
```python theme={null}
flow_manager.state -> Dict[str, Any]
```
Shared state dictionary that persists across node transitions. Use this to store and retrieve conversation data such as user preferences, collected information, or any data that needs to be accessible across different nodes.
```python theme={null}
# Store data
flow_manager.state["user_name"] = "Alice"
# Retrieve data
name = flow_manager.state.get("user_name", "Unknown")
```
### transport
```python theme={null}
flow_manager.transport -> Optional[BaseTransport]
```
The transport instance provided during initialization, or `None` if not set. Use this to interact with the communication platform (e.g., mute participants, access room info).
```python theme={null}
async def my_handler(args, flow_manager):
transport = flow_manager.transport
if transport:
participants = transport.participants()
```
### current\_node
```python theme={null}
flow_manager.current_node -> Optional[str]
```
The identifier of the currently active conversation node. Returns `None` before initialization or if no node has been set.
```python theme={null}
async def my_handler(args, flow_manager):
if flow_manager.current_node == "collecting_payment":
await setup_secure_session(flow_manager)
```
### task
```python theme={null}
flow_manager.task -> PipelineTask
```
The pipeline task instance used for frame queueing. Use this for advanced flow control such as queuing custom frames.
```python theme={null}
async def my_handler(args, flow_manager):
from pipecat.frames.frames import TTSUpdateSettingsFrame
await flow_manager.task.queue_frame(
TTSUpdateSettingsFrame(settings={"voice": "new-voice-id"})
)
```
## Methods
### initialize
```python theme={null}
await flow_manager.initialize(initial_node: Optional[NodeConfig] = None) -> None
```
Initialize the flow manager. Must be called before any node transitions can occur.
| Parameter | Type | Default | Description |
| -------------- | ------------ | ------- | ------------------------------------------------------------------------------- |
| `initial_node` | `NodeConfig` | `None` | Initial node configuration. Can also be set later via `set_node_from_config()`. |
**Raises:** `FlowInitializationError` if initialization fails.
```python theme={null}
flow_manager = FlowManager(task=task, llm=llm, context_aggregator=context_aggregator)
await flow_manager.initialize(initial_node=create_initial_node())
```
### set\_node\_from\_config
```python theme={null}
await flow_manager.set_node_from_config(node_config: NodeConfig) -> None
```
Transition to a new conversation node. Used to manually trigger node transitions. The node name is taken from the `name` field in the config, or a UUID is generated if not provided.
| Parameter | Type | Description |
| ------------- | ------------ | ------------------------------- |
| `node_config` | `NodeConfig` | Configuration for the new node. |
**Raises:** `FlowTransitionError` if the manager is not initialized. `FlowError` if node setup fails.
In most cases, prefer returning the next node from a consolidated function
handler instead of calling this method directly.
```python theme={null}
await flow_manager.set_node_from_config({
"name": "collect_email",
"task_messages": [{"role": "system", "content": "Ask the user for their email."}],
"functions": [collect_email_function],
})
```
### get\_current\_context
```python theme={null}
flow_manager.get_current_context() -> List[dict]
```
Get the current conversation context as a list of messages, including system messages, user messages, and assistant responses.
**Raises:** `FlowError` if the context aggregator is not available.
```python theme={null}
messages = flow_manager.get_current_context()
```
### register\_action
```python theme={null}
flow_manager.register_action(action_type: str, handler: Callable) -> None
```
Register a handler for a custom action type. The handler can be either a legacy handler `(action)` or a modern handler `(action, flow_manager)`.
| Parameter | Type | Description |
| ------------- | ---------- | ---------------------------------------------------------- |
| `action_type` | `str` | String identifier for the action (e.g., `"notify_slack"`). |
| `handler` | `Callable` | Async function that handles the action. |
```python theme={null}
async def notify_slack(action: dict, flow_manager: FlowManager):
channel = action.get("channel", "#general")
text = action.get("text", "")
await slack_client.post_message(channel=channel, text=text)
flow_manager.register_action("notify_slack", notify_slack)
```
Once registered, the action can be used in node `pre_actions` or `post_actions`:
```python theme={null}
node_config: NodeConfig = {
"task_messages": [...],
"pre_actions": [{"type": "notify_slack", "channel": "#support", "text": "New session started"}],
}
```
## Usage
### Basic Setup
```python theme={null}
from pipecat_flows import FlowManager, FlowResult, NodeConfig
async def create_initial_node() -> NodeConfig:
return {
"task_messages": [
{"role": "system", "content": "Greet the user and ask how you can help."}
],
"functions": [help_function],
}
flow_manager = FlowManager(
task=task,
llm=llm,
context_aggregator=context_aggregator,
transport=transport,
)
await flow_manager.initialize(initial_node=await create_initial_node())
```
### Using Global Functions
```python theme={null}
from pipecat_flows import FlowManager, FlowsFunctionSchema
transfer_function = FlowsFunctionSchema(
name="transfer_to_human",
description="Transfer the conversation to a human agent",
properties={},
required=[],
handler=handle_transfer,
)
flow_manager = FlowManager(
task=task,
llm=llm,
context_aggregator=context_aggregator,
global_functions=[transfer_function],
)
```
# Pipecat Flows Overview
Source: https://docs.pipecat.ai/api-reference/pipecat-flows/overview
Reference docs for Pipecat's conversation flow system
New to Pipecat Flows? Check out the
[introduction](/pipecat-flows/introduction) and
[guides](/pipecat-flows/guides/quickstart) first.
Pipecat Flows is an add-on framework for Pipecat that allows you to build structured conversations in your AI applications. It enables you to define conversation paths while handling the complexities of state management and LLM interactions.
Complete API documentation and method details
Source code, examples, and issue tracking
Working example with basic conversation flow
## Installation
### Pipecat Flows
To use Pipecat Flows, install the required dependency:
```bash theme={null}
pip install pipecat-ai-flows
```
### Pipecat Dependencies
For fresh installations, you'll need to install Pipecat with dependencies for your Transport, STT, LLM, and TTS providers.
For example, to use Daily, OpenAI, Deepgram, Cartesia, and Silero:
```bash theme={null}
pip install "pipecat-ai[daily,openai,deepgram,cartesia,silero]"
```
## Reference Pages
* [**FlowManager**](/api-reference/pipecat-flows/flow-manager) - Core orchestration class: constructor, properties, and methods
* [**Types**](/api-reference/pipecat-flows/types) - NodeConfig, FlowsFunctionSchema, ActionConfig, context strategies, and type aliases
* [**Exceptions**](/api-reference/pipecat-flows/exceptions) - Error handling hierarchy for flow management
## Function Types
### Node Functions
Execute operations within a single conversation state without switching nodes. Return `(FlowResult, None)`.
### Edge Functions
Create transitions between conversation states, optionally processing data first. Return `(FlowResult, NodeConfig)`.
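A minimal sketch of both shapes, where `create_summary_node()` is a hypothetical node factory:
```python theme={null}
from pipecat_flows import FlowManager

# Node function: does work but stays on the current node — returns (FlowResult, None).
async def record_name(args, flow_manager: FlowManager):
    flow_manager.state["name"] = args["name"]
    return {"status": "success"}, None

# Edge function: does work, then names the next node — returns (FlowResult, NodeConfig).
async def finish_intake(args, flow_manager: FlowManager):
    return {"status": "success"}, create_summary_node()  # hypothetical node factory
```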
### Direct Functions
Functions passed directly to NodeConfig with automatic metadata extraction from signatures and docstrings. See [`flows_direct_function`](/api-reference/pipecat-flows/types#flows_direct_function-decorator) and [`FlowsDirectFunction`](/api-reference/pipecat-flows/types#flowsdirectfunction).
## LLM Provider Support
Pipecat Flows automatically handles format differences between providers:
| Provider | Format Support | Installation |
| ----------------- | --------------------- | ------------------------------------- |
| OpenAI | Function calling | `pip install "pipecat-ai[openai]"` |
| OpenAI-compatible | Function calling | Provider-specific (see below) |
| Anthropic | Native tools | `pip install "pipecat-ai[anthropic]"` |
| Google Gemini | Function declarations | `pip install "pipecat-ai[google]"` |
| AWS Bedrock | Anthropic-compatible | `pip install "pipecat-ai[aws]"` |
Any LLM service that extends `OpenAILLMService` is automatically supported. This includes services like Groq, Together, Cerebras, DeepSeek, and others that use the OpenAI-compatible API format.
## Additional Notes
* **State Management**: Use `flow_manager.state` dictionary for persistent conversation data
* **Automatic Function Call Registration and Validation**: All functions are automatically registered and validated at run-time
* **Provider Compatibility**: Format differences handled automatically via adapter system
# Types
Source: https://docs.pipecat.ai/api-reference/pipecat-flows/types
Type definitions and configuration schemas for Pipecat Flows
## NodeConfig
Configuration for a single node in a conversation flow. `task_messages` is the only required field.
List of message dicts defining the current node's objectives. These tell the
LLM what to do in this conversation state.
```python theme={null}
"task_messages": [
{"role": "system", "content": "Ask the user for their name and email address."}
]
```
Identifier for the node. Useful for debug logging. If not provided, a UUID is
generated automatically.
The bot's role and personality as a plain string. Sent as the LLM's system
instruction via `LLMUpdateSettingsFrame`. Once set, the system instruction
persists across node transitions until a new node explicitly sets
`role_message` again.
```python theme={null}
"role_message": "You are a friendly customer service agent."
```
Deprecated list-of-dicts format for the bot's role and personality. Use
`role_message` (`str`) instead. Will be removed in 1.0.0.
```python theme={null}
"role_messages": [
{"role": "system", "content": "You are a friendly customer service agent."}
]
```
List of function definitions available in this node. Accepts provider-specific
dict format, [`FlowsFunctionSchema`](#flowsfunctionschema) objects, or [direct
functions](#flowsdirectfunction). See [Function
Types](/api-reference/pipecat-flows/overview#function-types).
Actions to execute before LLM inference when transitioning to this node. See
[ActionConfig](#actionconfig).
Actions to execute after LLM inference when transitioning to this node. If
`respond_immediately` is `False`, post-actions are deferred until after the
first LLM response in this node. See [ActionConfig](#actionconfig).
Strategy for managing conversation context when transitioning to this node.
Overrides the default strategy set on
[FlowManager](/api-reference/pipecat-flows/flow-manager). See
[ContextStrategyConfig](#contextstrategyconfig).
Whether to trigger LLM inference immediately upon entering the node. Set to
`False` when you want to wait for user input before the LLM responds (e.g.,
after a `tts_say` pre-action that asks a question).
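Putting these fields together, a node configuration might look like this sketch (`collect_contact_info` is assumed to be a function schema or direct function defined elsewhere):
```python theme={null}
from pipecat_flows import ContextStrategy, ContextStrategyConfig, NodeConfig

node: NodeConfig = {
    "name": "collect_contact_info",
    "role_message": "You are a friendly customer service agent.",
    "task_messages": [
        {"role": "system", "content": "Ask the user for their name and email address."}
    ],
    "functions": [collect_contact_info],  # assumed to be defined elsewhere
    "pre_actions": [{"type": "tts_say", "text": "What's your name and email address?"}],
    "context_strategy": ContextStrategyConfig(strategy=ContextStrategy.APPEND),
    "respond_immediately": False,  # wait for the user's answer before running LLM inference
}
```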
## FlowsFunctionSchema
Dataclass for defining function call schemas with Flows-specific properties. Provides a uniform way to define functions that works across all LLM providers.
Name of the function. This is used to identify the function in LLM tool calls.
Description of what the function does. The LLM uses this to decide when to
call the function.
Dictionary defining the function's parameters using JSON Schema format.
```python theme={null}
"properties": {
"city": {
"type": "string",
"description": "The city to get weather for"
},
"units": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature units"
}
}
```
List of required parameter names from `properties`.
Function handler to process the function call. Can be a legacy handler
`(args)` or modern handler `(args, flow_manager)`. The handler should return a
[`FlowResult`](#flowresult) or a
[`ConsolidatedFunctionResult`](#consolidatedfunctionresult) tuple.
Whether to cancel this function call when the user interrupts. Set to `True`
if you want the function call to be cancelled when the user speaks.
Optional per-tool timeout in seconds. Overrides the global
`function_call_timeout_secs` set on the LLM service. When `None`, the global
timeout is used.
### Methods
#### to\_function\_schema
```python theme={null}
schema.to_function_schema() -> FunctionSchema
```
Convert to a standard `FunctionSchema` for use with LLMs. Strips Flows-specific fields (`handler`, `cancel_on_interruption`, `timeout_secs`).
### Example
```python theme={null}
from pipecat_flows import FlowsFunctionSchema, FlowResult
async def handle_weather(args, flow_manager):
city = args["city"]
weather = await get_weather(city)
return {"status": "success", "weather": weather}
weather_function = FlowsFunctionSchema(
name="get_weather",
description="Get current weather for a city",
properties={
"city": {"type": "string", "description": "City name"}
},
required=["city"],
handler=handle_weather,
)
```
## ActionConfig
TypedDict for configuring actions that execute during node transitions.
Action type identifier. Must match a registered action handler. Built-in types
are `"tts_say"`, `"end_conversation"`, and `"function"`.
Action handler function. Required for custom action types if not previously
registered via
[`FlowManager.register_action()`](/api-reference/pipecat-flows/flow-manager#register_action).
Can be a legacy handler `(action)` or modern handler `(action, flow_manager)`.
Text content used by `tts_say` and optionally by `end_conversation` (as a
goodbye message).
Additional fields are allowed and passed through to the handler. For example,
a `"notify_slack"` action could include `"channel"` and `"text"` fields.
### Built-in Action Types
| Type | Description | Required Fields |
| ------------------ | ----------------------------------------------------------- | ----------------- |
| `tts_say` | Speak text using the pipeline's TTS service | `text` |
| `end_conversation` | End the conversation, optionally speaking a goodbye message | `text` (optional) |
| `function` | Execute a function inline in the pipeline | `handler` |
### Example
```python theme={null}
node_config: NodeConfig = {
"task_messages": [{"role": "system", "content": "Help the user."}],
"pre_actions": [
{"type": "tts_say", "text": "Welcome! Let me help you with that."},
],
"post_actions": [
{"type": "end_conversation", "text": "Goodbye!"},
],
}
```
## ContextStrategy
Enum defining strategies for managing conversation context during node transitions.
| Value | Description |
| -------------------- | -------------------------------------------------------------------------------------------------------------------------------------- |
| `APPEND` | Append new messages to existing context. This is the default behavior. |
| `RESET` | Reset context with new messages only. Previous conversation history is discarded. |
| `RESET_WITH_SUMMARY` | Reset context but include an LLM-generated summary of the previous conversation. Requires `summary_prompt` in `ContextStrategyConfig`. |
```python theme={null}
from pipecat_flows import ContextStrategy
strategy = ContextStrategy.APPEND
strategy = ContextStrategy.RESET
strategy = ContextStrategy.RESET_WITH_SUMMARY
```
## ContextStrategyConfig
Dataclass for configuring context management behavior.
The context management strategy to use. See
[ContextStrategy](#contextstrategy).
Prompt text for generating a conversation summary. Required when using
`RESET_WITH_SUMMARY`. The LLM uses this prompt to summarize the conversation
before resetting context.
**Raises:** `ValueError` if `summary_prompt` is not provided when using `RESET_WITH_SUMMARY`.
### Example
```python theme={null}
from pipecat_flows import ContextStrategy, ContextStrategyConfig
# Append (default)
config = ContextStrategyConfig(strategy=ContextStrategy.APPEND)
# Reset
config = ContextStrategyConfig(strategy=ContextStrategy.RESET)
# Reset with summary
config = ContextStrategyConfig(
strategy=ContextStrategy.RESET_WITH_SUMMARY,
summary_prompt="Summarize the key information collected so far.",
)
```
## flows\_direct\_function Decorator
Decorator that attaches metadata to a Pipecat direct function for use in Flows.
```python theme={null}
@flows_direct_function(*, cancel_on_interruption: bool = False, timeout_secs: Optional[float] = None)
```
| Parameter | Type | Default | Description |
| ------------------------ | ------- | ------- | ----------------------------------------------------------------------------------------- |
| `cancel_on_interruption` | `bool` | `False` | Whether to cancel the function call when the user interrupts. |
| `timeout_secs` | `float` | `None` | Optional per-tool timeout in seconds, overriding the global `function_call_timeout_secs`. |
Direct functions have their schema automatically extracted from the function signature and docstring. The first parameter must be `flow_manager: FlowManager`, and all other parameters become the function's properties. The docstring provides the function description and parameter descriptions (Google-style).
Direct functions must return a [`ConsolidatedFunctionResult`](#consolidatedfunctionresult) tuple.
### Example
```python theme={null}
from pipecat_flows import FlowManager, flows_direct_function, ConsolidatedFunctionResult
@flows_direct_function(cancel_on_interruption=False)
async def lookup_order(
flow_manager: FlowManager, order_id: str
) -> ConsolidatedFunctionResult:
"""Look up an order by its ID.
Args:
order_id: The order ID to look up.
"""
order = await db.get_order(order_id)
flow_manager.state["order"] = order
result = {"status": "success", "order": order}
next_node = create_order_details_node(order)
return result, next_node
```
The function can then be passed directly in a node's `functions` list:
```python theme={null}
node_config: NodeConfig = {
"task_messages": [{"role": "system", "content": "Ask for the order ID."}],
"functions": [lookup_order],
}
```
## FlowsDirectFunction
Protocol defining the interface for direct functions. Any async callable matching this signature can be used as a direct function in node configurations.
```python theme={null}
class FlowsDirectFunction(Protocol):
def __call__(
self, flow_manager: FlowManager, **kwargs: Any
) -> Awaitable[ConsolidatedFunctionResult]: ...
```
## Type Aliases
### FlowResult
```python theme={null}
class FlowResult(TypedDict, total=False):
status: str
error: str
```
Base return type for function results. The `status` field indicates the outcome. The optional `error` field contains an error message if execution failed. Additional fields are allowed and passed through to the LLM.
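For example, a weather handler might return extra fields alongside `status`:
```python theme={null}
from pipecat_flows import FlowResult

result: FlowResult = {
    "status": "success",
    "temperature": 72,        # extra fields are passed through to the LLM
    "conditions": "sunny",
}
```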
### FlowArgs
```python theme={null}
FlowArgs = Dict[str, Any]
```
Type alias for function handler arguments. Contains the parameters extracted from the LLM's function call.
### ConsolidatedFunctionResult
```python theme={null}
ConsolidatedFunctionResult = Tuple[Optional[FlowResult], Optional[NodeConfig]]
```
Return type for consolidated function handlers that both do work and specify the next node:
* First element: The function result (or `None` for transition-only functions)
* Second element: The next node as a `NodeConfig`, or `None` for node functions
### FlowFunctionHandler
```python theme={null}
FlowFunctionHandler = Callable[
[FlowArgs, FlowManager], Awaitable[FlowResult | ConsolidatedFunctionResult]
]
```
Type for modern function handlers that receive both arguments and the `FlowManager` instance.
### LegacyFunctionHandler
```python theme={null}
LegacyFunctionHandler = Callable[
[FlowArgs], Awaitable[FlowResult | ConsolidatedFunctionResult]
]
```
Type for legacy function handlers that only receive arguments. Both legacy and modern handlers are supported; the flow manager detects the signature automatically.
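A sketch of both handler shapes side by side:
```python theme={null}
# Legacy handler: receives only the function-call arguments.
async def get_hours(args):
    return {"status": "success", "hours": "9am-5pm"}

# Modern handler: also receives the FlowManager instance.
async def get_hours_modern(args, flow_manager):
    flow_manager.state["asked_about_hours"] = True
    return {"status": "success", "hours": "9am-5pm"}
```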
# FrameProcessor Events
Source: https://docs.pipecat.ai/api-reference/server/events/frame-processor-events
Handle errors and monitor frame processing with events available on every processor
## Overview
Every frame processor in Pipecat — including all services, transports, and custom processors — inherits from `FrameProcessor` and exposes a set of events for error handling and frame monitoring. These events are **synchronous**, meaning handlers run inline and must execute quickly to avoid blocking the pipeline.
## Events Summary
| Event | Description |
| ------------------------- | --------------------------------------------------- |
| `on_error` | An error occurred in this processor |
| `on_before_process_frame` | A frame is about to be processed |
| `on_after_process_frame` | A frame has just been processed |
| `on_before_push_frame` | A frame is about to be pushed to the next processor |
| `on_after_push_frame` | A frame has just been pushed to the next processor |
## Error Handling
### on\_error
Fired when an error occurs in this processor. This is called before the `ErrorFrame` is pushed upstream through the pipeline, giving you an opportunity to handle errors at the processor level.
```python theme={null}
@tts.event_handler("on_error")
async def on_tts_error(processor, error):
print(f"Error in {processor}: {error.error}")
if error.exception:
print(f"Exception: {error.exception}")
if error.fatal:
print("This is a fatal error — pipeline will be cancelled")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | ---------------- | ---------------------------------------- |
| `processor` | `FrameProcessor` | The processor where the error occurred |
| `error` | `ErrorFrame` | The error frame containing error details |
The `ErrorFrame` provides:
* `error` (str): The error message
* `exception` (Optional\[Exception]): The underlying exception, if any
* `fatal` (bool): Whether this is a fatal error that will cancel the pipeline
* `processor` (Optional\[FrameProcessor]): The processor that originated the error
Since `on_error` is synchronous, the handler runs before the error frame
propagates upstream. Keep your handler fast — use it for logging or setting
flags, not for I/O operations.
### Example: Error handling across multiple processors
```python theme={null}
async def handle_service_error(processor, error):
logger.error(f"Service error in {processor.name}: {error.error}")
# Notify monitoring system, set a flag, etc.
# Register the same handler on multiple services
stt.add_event_handler("on_error", handle_service_error)
tts.add_event_handler("on_error", handle_service_error)
llm.add_event_handler("on_error", handle_service_error)
```
## Frame Processing Hooks
These events let you observe frames as they flow through a processor. They're useful for debugging, logging, or lightweight monitoring.
### on\_before\_process\_frame
Fired immediately before a frame is processed by this processor's `process_frame()` method.
```python theme={null}
@stt.event_handler("on_before_process_frame")
async def on_before_process(processor, frame):
print(f"{processor} is about to process: {frame}")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | ---------------- | ---------------------------------------- |
| `processor` | `FrameProcessor` | The processor about to process the frame |
| `frame` | `Frame` | The frame about to be processed |
### on\_after\_process\_frame
Fired immediately after a frame has been processed by this processor's `process_frame()` method.
```python theme={null}
@stt.event_handler("on_after_process_frame")
async def on_after_process(processor, frame):
print(f"{processor} finished processing: {frame}")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | ---------------- | -------------------------------------- |
| `processor` | `FrameProcessor` | The processor that processed the frame |
| `frame` | `Frame` | The frame that was processed |
### on\_before\_push\_frame
Fired immediately before a frame is pushed to the next processor in the pipeline.
```python theme={null}
@llm.event_handler("on_before_push_frame")
async def on_before_push(processor, frame):
print(f"{processor} is about to push: {frame}")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | ---------------- | ------------------------------- |
| `processor` | `FrameProcessor` | The processor pushing the frame |
| `frame` | `Frame` | The frame about to be pushed |
### on\_after\_push\_frame
Fired immediately after a frame has been pushed to the next processor in the pipeline.
```python theme={null}
@llm.event_handler("on_after_push_frame")
async def on_after_push(processor, frame):
print(f"{processor} pushed: {frame}")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | ---------------- | ----------------------------------- |
| `processor` | `FrameProcessor` | The processor that pushed the frame |
| `frame` | `Frame` | The frame that was pushed |
All frame processing events are **synchronous** — handlers block the pipeline
until they complete. Avoid any I/O, `await` calls to external services, or
other slow operations in these handlers. Use them only for fast operations
like logging, counting, or setting flags.
## Related
* [Events](/api-reference/server/events/overview) - Overview of all events in Pipecat
* [PipelineTask](/api-reference/server/pipeline/pipeline-task#event-handlers) - Pipeline lifecycle events
* [Observer Pattern](/api-reference/server/utilities/observers/observer-pattern) - Non-intrusive pipeline monitoring
# Events Overview
Source: https://docs.pipecat.ai/api-reference/server/events/overview
Monitor and respond to lifecycle changes using Pipecat's event system
## Overview
Pipecat provides an event system that lets you hook into lifecycle changes across the framework — pipeline state transitions, service connections, transport activity, conversation turns, errors, and more.
Events are available on most Pipecat objects (anything that inherits from `BaseObject`). You register handlers using the `@event_handler` decorator, and Pipecat calls them automatically when things happen.
```python theme={null}
@task.event_handler("on_pipeline_started")
async def on_pipeline_started(task, frame):
print("Pipeline is running!")
@tts.event_handler("on_connected")
async def on_tts_connected(service):
print(f"TTS service connected: {service}")
```
## Registering Event Handlers
Use the `@event_handler` decorator on the object that emits the event:
```python theme={null}
@object.event_handler("event_name")
async def handler(object, ...):
# Handle the event
...
```
The first parameter is always the object that emitted the event. Additional parameters depend on the specific event.
You can also register handlers without the decorator:
```python theme={null}
async def my_handler(object, ...):
...
object.add_event_handler("event_name", my_handler)
```
Handlers can be either `async` or synchronous functions — Pipecat detects which and calls them appropriately.
### Multiple Handlers
You can register multiple handlers for the same event. They execute in the order they were registered:
```python theme={null}
@task.event_handler("on_pipeline_finished")
async def log_finished(task, frame):
print("Pipeline finished (handler 1)")
@task.event_handler("on_pipeline_finished")
async def cleanup(task, frame):
print("Cleaning up (handler 2)")
```
## Synchronous vs Asynchronous Events
Events are either **synchronous** or **asynchronous**, depending on how the component registered them internally:
* **Synchronous events** (`sync=True`) run the handler inline — the caller waits for the handler to complete before continuing. These are used for events that must complete immediately, such as frame processing hooks and error handling. Synchronous event handlers should execute fast to avoid blocking the pipeline.
* **Asynchronous events** (default) run the handler in a background `asyncio.Task`. The caller continues immediately. These are used for events where you might do I/O, logging, or other work that shouldn't block the pipeline.
You don't need to choose — the sync/async behavior is determined by the
component, not by your handler. Your handler can be an `async def` or a
regular `def` regardless.
## Events Reference
The table below lists all events available in Pipecat, grouped by component. Click through to the linked documentation for full details including handler signatures and usage examples.
### Pipeline
Events on [`PipelineTask`](/api-reference/server/pipeline/pipeline-task#event-handlers).
| Event | Description |
| ----------------------------- | ---------------------------------------------------------------- |
| `on_pipeline_started` | Pipeline has started processing |
| `on_pipeline_finished` | Pipeline reached a terminal state (stopped, ended, or cancelled) |
| `on_pipeline_error` | An error frame reached the pipeline task |
| `on_frame_reached_upstream` | A filtered frame type reached the pipeline source |
| `on_frame_reached_downstream` | A filtered frame type reached the pipeline sink |
| `on_idle_timeout` | No activity detected within the idle timeout period |
### Frame Processor
Events on every [`FrameProcessor`](/api-reference/server/events/frame-processor-events). Since all services, transports, and processors inherit from `FrameProcessor`, these events are available on any processor in your pipeline.
| Event | Description |
| ------------------------- | --------------------------------------------------- |
| `on_error` | An error occurred in this processor |
| `on_before_process_frame` | A frame is about to be processed |
| `on_after_process_frame` | A frame has just been processed |
| `on_before_push_frame` | A frame is about to be pushed to the next processor |
| `on_after_push_frame` | A frame has just been pushed to the next processor |
### Turn Management
Events on context aggregators from [`LLMContextAggregatorPair`](/api-reference/server/utilities/turn-management/turn-events).
| Event | Emitter | Description |
| --------------------------- | ---------------------- | --------------------------------------------------- |
| `on_user_turn_started` | `user_aggregator` | User begins speaking |
| `on_user_turn_stopped` | `user_aggregator` | User finishes speaking (includes transcript) |
| `on_user_turn_stop_timeout` | `user_aggregator` | User turn ended due to timeout |
| `on_user_turn_idle` | `user_aggregator` | User has been idle for configured timeout |
| `on_user_mute_started` | `user_aggregator` | User input was muted |
| `on_user_mute_stopped` | `user_aggregator` | User input was unmuted |
| `on_assistant_turn_started` | `assistant_aggregator` | Assistant begins responding |
| `on_assistant_turn_stopped` | `assistant_aggregator` | Assistant finishes responding (includes transcript) |
| `on_assistant_thought` | `assistant_aggregator` | Assistant produced a thought (reasoning models) |
### STT Services
Events on STT service instances. Available on all WebSocket-based STT services. See [Service Events](/api-reference/server/events/service-events) for details.
| Event | Description |
| --------------------- | ----------------------------------- |
| `on_connected` | WebSocket connection established |
| `on_disconnected` | WebSocket connection closed |
| `on_connection_error` | WebSocket connection error occurred |
Some STT services have additional events:
**Deepgram STT:**
| Event | Description |
| ------------------- | ------------------------------- |
| `on_speech_started` | Speech detected in audio stream |
| `on_utterance_end` | End of utterance detected |
**Deepgram Flux STT:**
| Event | Description |
| ---------------------- | ------------------------------------ |
| `on_start_of_turn` | Start of a new turn detected |
| `on_turn_resumed` | A previously paused turn has resumed |
| `on_end_of_turn` | End of turn detected |
| `on_eager_end_of_turn` | Early end-of-turn prediction |
| `on_update` | Transcript updated |
**Speechmatics STT:**
| Event | Description |
| -------------------- | -------------------------------------- |
| `on_speakers_result` | Speaker identification result received |
**Sarvam STT:**
| Event | Description |
| ------------------- | ------------------------- |
| `on_speech_started` | Speech detected |
| `on_speech_stopped` | Speech stopped |
| `on_utterance_end` | End of utterance detected |
### TTS Services
Events on TTS service instances. Available on all WebSocket-based TTS services. See [Service Events](/api-reference/server/events/service-events) for details.
| Event | Description |
| --------------------- | ------------------------------------- |
| `on_connected` | WebSocket connection established |
| `on_disconnected` | WebSocket connection closed |
| `on_connection_error` | WebSocket connection error occurred |
| `on_tts_request` | A TTS synthesis request is being sent |
### LLM Services
Events on LLM service instances. See [Service Events](/api-reference/server/events/service-events) for details.
| Event | Description |
| --------------------------- | ------------------------------------------ |
| `on_function_calls_started` | LLM has started making function/tool calls |
| `on_completion_timeout` | LLM response timed out |
**OpenAI Realtime / Grok Realtime:**
| Event | Description |
| ------------------------------ | ----------------------------------- |
| `on_conversation_item_created` | A new conversation item was created |
| `on_conversation_item_updated` | A conversation item was updated |
### Daily Transport
Events on [`DailyTransport`](/api-reference/server/services/transport/daily#event-handlers) instances.
| Event | Description |
| ----------------------------- | --------------------------------- |
| `on_joined` | Bot joined the room |
| `on_left` | Bot left the room |
| `on_error` | Transport error occurred |
| `on_call_state_updated` | Call state changed |
| `on_client_connected` | A participant connected |
| `on_client_disconnected` | A participant disconnected |
| `on_first_participant_joined` | First participant joined the room |
| `on_participant_joined` | A participant joined |
| `on_participant_left` | A participant left |
| `on_participant_updated` | A participant's state was updated |
| `on_active_speaker_changed` | Active speaker changed |
| `on_app_message` | App message received |
| `on_transcription_message` | Transcription message received |
| `on_recording_started` | Recording started |
| `on_recording_stopped` | Recording stopped |
| `on_recording_error` | Recording error occurred |
| `on_dialin_connected` | Dial-in call connected |
| `on_dialin_ready` | Dial-in SIP endpoint ready |
| `on_dialin_stopped` | Dial-in call stopped |
| `on_dialin_error` | Dial-in error occurred |
| `on_dialin_warning` | Dial-in warning |
| `on_dialout_answered` | Dial-out call answered |
| `on_dialout_connected` | Dial-out call connected |
| `on_dialout_stopped` | Dial-out call stopped |
| `on_dialout_error` | Dial-out error occurred |
| `on_dialout_warning` | Dial-out warning |
| `on_before_leave` | About to leave the room (sync) |
### LiveKit Transport
Events on [`LiveKitTransport`](/api-reference/server/services/transport/livekit#event-handlers) instances.
| Event | Description |
| ----------------------------- | -------------------------- |
| `on_connected` | Connected to the room |
| `on_disconnected` | Disconnected from the room |
| `on_participant_connected` | A participant connected |
| `on_participant_disconnected` | A participant disconnected |
| `on_first_participant_joined` | First participant joined |
| `on_audio_track_subscribed` | Audio track subscribed |
| `on_audio_track_unsubscribed` | Audio track unsubscribed |
| `on_video_track_subscribed` | Video track subscribed |
| `on_video_track_unsubscribed` | Video track unsubscribed |
| `on_data_received` | Data message received |
| `on_call_state_updated` | Call state changed |
| `on_before_disconnect` | About to disconnect (sync) |
### WebSocket Transports
Events on WebSocket transport instances.
**WebSocketServerTransport:**
| Event | Description |
| ------------------------ | ------------------------- |
| `on_client_connected` | Client connected |
| `on_client_disconnected` | Client disconnected |
| `on_session_timeout` | Session timed out |
| `on_websocket_ready` | WebSocket server is ready |
**FastAPIWebsocketTransport:**
| Event | Description |
| ------------------------ | ------------------- |
| `on_client_connected` | Client connected |
| `on_client_disconnected` | Client disconnected |
| `on_session_timeout` | Session timed out |
**WebSocketClientTransport:**
| Event | Description |
| ----------------- | ------------------------ |
| `on_connected` | Connected to server |
| `on_disconnected` | Disconnected from server |
### Other Transports
**SmallWebRTCTransport:**
| Event | Description |
| ------------------------ | -------------------- |
| `on_client_connected` | Client connected |
| `on_client_disconnected` | Client disconnected |
| `on_app_message` | App message received |
**HeyGenTransport / TavusTransport:**
| Event | Description |
| ------------------------ | ------------------- |
| `on_client_connected` | Client connected |
| `on_client_disconnected` | Client disconnected |
### Utilities
**ContextSummarizer:**
| Event | Description |
| -------------------- | ------------------------------------------------------- |
| `on_summary_applied` | A summary has been successfully applied to the context. |
**ServiceSwitcher:**
| Event | Description |
| --------------------- | --------------------------- |
| `on_service_switched` | Active service was switched |
**TranscriptProcessor:**
| Event | Description |
| ---------------------- | ---------------------- |
| `on_transcript_update` | Transcript was updated |
**AudioBufferProcessor:**
| Event | Description |
| ------------------------- | ------------------------------ |
| `on_audio_data` | Audio data available |
| `on_track_audio_data` | Per-track audio data available |
| `on_user_turn_audio_data` | User turn audio data available |
| `on_bot_turn_audio_data` | Bot turn audio data available |
**RTVIProcessor:**
| Event | Description |
| ------------------- | ----------------------- |
| `on_bot_started` | Bot started |
| `on_client_ready` | Client is ready |
| `on_client_message` | Client message received |
### Observers
**TurnTrackingObserver:**
| Event | Description |
| ----------------- | --------------------------- |
| `on_turn_started` | A conversation turn started |
| `on_turn_ended` | A conversation turn ended |
**UserBotLatencyObserver:**
| Event | Description |
| --------------------- | ----------------------------- |
| `on_latency_measured` | Latency measurement available |
### Extensions
**VoicemailDetector:**
| Event | Description |
| -------------------------- | ------------------------------------------ |
| `on_conversation_detected` | Live conversation detected (not voicemail) |
| `on_voicemail_detected` | Voicemail detected |
**IVRNavigator:**
| Event | Description |
| -------------------------- | ----------------------------- |
| `on_conversation_detected` | Live conversation detected |
| `on_ivr_status_changed` | IVR navigation status changed |
# Service Events
Source: https://docs.pipecat.ai/api-reference/server/events/service-events
Handle connection lifecycle and service-specific events for STT, TTS, and LLM services
## Overview
Pipecat services emit events for connection lifecycle management and service-specific activity. These events let you monitor WebSocket connections, handle errors, and react to service behavior.
## Connection Events
All WebSocket-based STT and TTS services share a common set of connection events. These are emitted by the base `STTService` and `TTSService` classes, so they work the same way regardless of which provider you use.
### Events Summary
| Event | Available On | Description |
| --------------------- | ------------ | ----------------------------------- |
| `on_connected` | STT, TTS | WebSocket connection established |
| `on_disconnected` | STT, TTS | WebSocket connection closed |
| `on_connection_error` | STT, TTS | WebSocket connection error occurred |
### on\_connected
Fired when the service's WebSocket connection is established. This is useful for logging, monitoring connection health, or triggering actions that depend on the service being ready.
```python theme={null}
@stt.event_handler("on_connected")
async def on_stt_connected(service):
print(f"STT connected: {service.name}")
@tts.event_handler("on_connected")
async def on_tts_connected(service):
print(f"TTS connected: {service.name}")
```
**Parameters:**
| Parameter | Type | Description |
| --------- | --------------------------- | -------------------- |
| `service` | `STTService` / `TTSService` | The service instance |
Not all STT and TTS services use WebSocket connections. HTTP-based services
(e.g., Azure TTS, Google TTS) do not emit connection events.
### on\_disconnected
Fired when the service's WebSocket connection is closed, whether due to normal shutdown or an error.
```python theme={null}
@stt.event_handler("on_disconnected")
async def on_stt_disconnected(service):
print(f"STT disconnected: {service.name}")
```
**Parameters:**
| Parameter | Type | Description |
| --------- | --------------------------- | -------------------- |
| `service` | `STTService` / `TTSService` | The service instance |
### on\_connection\_error
Fired when a WebSocket connection error occurs. The error is also pushed as an `ErrorFrame` through the pipeline.
```python theme={null}
@tts.event_handler("on_connection_error")
async def on_tts_connection_error(service, error):
print(f"TTS connection error: {error}")
```
**Parameters:**
| Parameter | Type | Description |
| --------- | --------------------------- | -------------------- |
| `service` | `STTService` / `TTSService` | The service instance |
| `error` | `str` | The error message |
WebSocket-based services automatically reconnect with exponential backoff (3
retries, 4-10s waits) when connection errors occur. The `on_connection_error`
event fires for each failed attempt.
## TTS Events
### on\_tts\_request
Fired just before a TTS synthesis request is sent to the service. This is useful for logging, monitoring, or modifying behavior based on what text is being synthesized.
```python theme={null}
@tts.event_handler("on_tts_request")
async def on_tts_request(service, context_id, text):
print(f"TTS synthesizing ({context_id}): {text}")
```
**Parameters:**
| Parameter | Type | Description |
| ------------ | ------------ | ----------------------------------------- |
| `service` | `TTSService` | The TTS service instance |
| `context_id` | `str` | The context ID for this TTS request |
| `text` | `str` | The prepared text about to be synthesized |
## LLM Events
### on\_function\_calls\_started
Fired when the LLM starts making function (tool) calls. This event is emitted before the function calls are executed.
```python theme={null}
@llm.event_handler("on_function_calls_started")
async def on_function_calls_started(service, function_calls):
    for call in function_calls:
        print(f"LLM calling function: {call.function_name}")
```
**Parameters:**
| Parameter | Type | Description |
| ---------------- | ------------ | ------------------------------------------ |
| `service` | `LLMService` | The LLM service instance |
| `function_calls` | `list` | List of function call objects from the LLM |
### on\_completion\_timeout
Fired when an LLM completion request times out. This can happen with slow models or large context windows. The timeout is also pushed as an error frame.
```python theme={null}
@llm.event_handler("on_completion_timeout")
async def on_completion_timeout(service):
print("LLM completion timed out — consider increasing timeout or reducing context")
```
**Parameters:**
| Parameter | Type | Description |
| --------- | ------------ | ------------------------ |
| `service` | `LLMService` | The LLM service instance |
## Related
* [Events Overview](/api-reference/server/events/overview) - Overview of all events in Pipecat
* [FrameProcessor Events](/api-reference/server/events/frame-processor-events) - Error handling and frame monitoring events
* [Turn Events](/api-reference/server/utilities/turn-management/turn-events) - User and assistant turn lifecycle events
# Control Frames
Source: https://docs.pipecat.ai/api-reference/server/frames/control-frames
Reference for ControlFrame types: pipeline lifecycle, response boundaries, service settings, and runtime configuration
ControlFrames signal boundaries, state changes, and configuration updates within the pipeline. They are queued and processed in order alongside DataFrames. ControlFrames are cancelled on `InterruptionFrame` unless combined with `UninterruptibleFrame`. See the [frames overview](/api-reference/server/frames/overview) for base class details and the full frame hierarchy.
## Pipeline Lifecycle
### EndFrame
Signals graceful pipeline shutdown. `EndFrame` is queued with other non-SystemFrames, so FrameProcessors shut down in order and any frames queued ahead of the `EndFrame` are processed first.
Inherits from `UninterruptibleFrame`, meaning it cannot be cancelled by `InterruptionFrame`.
Optional reason for the shutdown, passed along for logging or inspection.
### StopFrame
Stops the pipeline but keeps processors in a running state. Like `EndFrame`, `StopFrame` is queued with other non-SystemFrames allowing frames preceding it to be processed first. Useful when you need to halt frame flow without tearing down the entire processor graph.
Inherits from `UninterruptibleFrame`.
### OutputTransportReadyFrame
Indicates that the output transport is ready to receive frames. Processors waiting on transport availability can use this as their signal to begin sending.
### HeartbeatFrame
Used for pipeline health monitoring. Processors can observe these to detect stalls or measure latency.
Timestamp value for the heartbeat.
## Processor Pause/Resume
While a processor is paused, incoming frames accumulate in its internal queue rather than being dropped. Once the processor is resumed, it drains the queue and processes all buffered frames in the order they arrived.
For example, the TTS service pauses itself while synthesizing a `TTSSpeakFrame`. If new text frames arrive during synthesis, they queue up instead of producing overlapping audio. The TTS resumes when `BotStoppedSpeakingFrame` (a `SystemFrame`) arrives, and the buffered frames are processed in order.
Internally, each processor has two queues: a high-priority input queue for SystemFrames and a process queue for everything else. Pausing blocks the process queue, but SystemFrames continue to flow through the input queue. This is why the typical pattern is for a processor to pause itself and then resume in response to a `SystemFrame`.
`FrameProcessorResumeFrame` is a `ControlFrame`, which means it enters the
same process queue that pausing blocks. If DataFrames have already queued up
ahead of it, the resume frame will be stuck behind them and the processor will
stay paused. To resume a paused processor from outside, use the `SystemFrame`
variant `FrameProcessorResumeUrgentFrame` instead — it bypasses the process
queue entirely. See [System
Frames](/api-reference/server/frames/system-frames#processor-pauseresume-urgent).
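As a rough sketch (assuming a running `task`, a processor instance named `tts`, and that both frames take the target processor via a `processor` field), pausing and later resuming a processor from outside might look like this:
```python theme={null}
from pipecat.frames.frames import (
    FrameProcessorPauseFrame,
    FrameProcessorResumeUrgentFrame,
)

# Pause the processor; non-system frames arriving after this point buffer in its queue.
await task.queue_frame(FrameProcessorPauseFrame(processor=tts))

# ... later: resume with the SystemFrame variant so the request is not stuck
# behind any DataFrames that accumulated while the processor was paused.
await task.queue_frame(FrameProcessorResumeUrgentFrame(processor=tts))
```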
### FrameProcessorPauseFrame
Pauses a specific processor. Queued in order, so the processor finishes handling any frames ahead of it before pausing.
The processor to pause.
### FrameProcessorResumeFrame
Resumes a previously paused processor, releasing all buffered frames for processing.
The processor to resume.
Because this is a `ControlFrame`, it will be blocked behind any DataFrames
that queued up while the processor was paused. Use
`FrameProcessorResumeUrgentFrame` if the processor may have buffered frames.
## LLM Response Boundaries
These frames bracket LLM output, letting downstream processors (aggregators, TTS services, transports) know when a response starts and ends.
### LLMFullResponseStartFrame
Marks the beginning of an LLM response. Followed by one or more `TextFrame`s and terminated by `LLMFullResponseEndFrame`.
### LLMFullResponseEndFrame
Marks the end of an LLM response.
### VisionFullResponseStartFrame
Beginning of a vision model response. Inherits from `LLMFullResponseStartFrame`.
### VisionFullResponseEndFrame
End of a vision model response. Inherits from `LLMFullResponseEndFrame`.
### LLMAssistantPushAggregationFrame
Forces the assistant aggregator to commit its buffered text to context immediately, rather than waiting for the normal end-of-response boundary.
## LLM Context Summarization
Frames that coordinate context summarization: compressing conversation history to stay within token limits.
### LLMSummarizeContextFrame
Triggers manual context summarization. Push this frame to request that the LLM summarize the current conversation context.
Optional configuration controlling summarization behavior.
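As a minimal sketch (assuming a running `task` and that the optional configuration can be omitted), requesting a manual summarization looks roughly like this:
```python theme={null}
from pipecat.frames.frames import LLMSummarizeContextFrame

# Ask the aggregator/LLM to compress the current conversation history.
await task.queue_frame(LLMSummarizeContextFrame())
```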
### LLMContextSummaryRequestFrame
Internal request from the aggregator to the LLM service, asking it to produce a summary. You typically won't push this yourself — the aggregator creates it in response to `LLMSummarizeContextFrame` or automatic summarization triggers.
Unique identifier for this summarization request.
The conversation context to summarize.
Minimum number of recent messages to preserve after summarization.
Target token count for the summarized context.
Prompt instructing the LLM how to summarize.
Optional timeout in seconds for the summarization request.
### LLMContextSummaryResultFrame
The LLM's summarization result, sent back to the aggregator.
Inherits from `UninterruptibleFrame` to ensure the result is never dropped.
Matches the originating request.
The generated summary text.
Index of the last message included in the summary.
Error message if summarization failed, otherwise `None`.
## LLM Thought Frames
Bracket extended thinking output from LLMs that support it (e.g., Claude with extended thinking enabled).
### LLMThoughtStartFrame
Marks the beginning of LLM extended thinking content.
Whether to append thought content to the conversation context. Raises
`ValueError` if set to `True` without specifying `llm`.
Identifier for the LLM producing the thought. Required when
`append_to_context` is `True`.
### LLMThoughtEndFrame
Marks the end of LLM extended thinking content.
Thought signature, if provided by the LLM. Anthropic models include a
signature that must be preserved when appending thoughts back to context.
## Function Calling
### FunctionCallInProgressFrame
Indicates that a function call is currently executing.
Inherits from `UninterruptibleFrame`, ensuring it reaches downstream processors even during interruption.
Name of the function being called.
Unique identifier for this tool call.
Arguments passed to the function.
Whether the function call should be cancelled if the user interrupts.
## TTS State
### TTSStartedFrame
Signals the beginning of a TTS audio response.
Identifier linking this TTS output to its originating context.
### TTSStoppedFrame
Signals the end of a TTS audio response.
Identifier linking this TTS output to its originating context.
## Service Settings
Runtime settings updates for LLM, TTS, STT, and other services. These let you change service configuration mid-conversation without rebuilding the pipeline. Push an `LLMUpdateSettingsFrame`, `TTSUpdateSettingsFrame`, or `STTUpdateSettingsFrame` to update the corresponding service. See the [Changing Service Settings at Runtime](/api-reference/server/frames/overview#changing-service-settings-at-runtime) pattern for an example.
### ServiceUpdateSettingsFrame
Base frame for runtime service settings updates.
Inherits from `UninterruptibleFrame`.
Dictionary of settings to update.
Typed settings delta. Takes precedence over the `settings` dict when both are
provided.
Target a specific service instance. When `None`, the frame applies to the
first matching service in the pipeline.
### LLMUpdateSettingsFrame
Update LLM service settings at runtime. Inherits from `ServiceUpdateSettingsFrame`.
### TTSUpdateSettingsFrame
Update TTS service settings at runtime. Inherits from `ServiceUpdateSettingsFrame`.
### STTUpdateSettingsFrame
Update STT service settings at runtime. Inherits from `ServiceUpdateSettingsFrame`.
## Audio Processing
### VADParamsUpdateFrame
Update Voice Activity Detection parameters at runtime.
New VAD parameters to apply.
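A minimal sketch (assuming a running `task` and that the frame carries a `VADParams` instance in a `params` field):
```python theme={null}
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import VADParamsUpdateFrame

# Wait a little longer for end-of-speech before closing the user's turn.
await task.queue_frame(VADParamsUpdateFrame(params=VADParams(stop_secs=0.8)))
```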
### FilterControlFrame
Base frame for audio filter control. Subclass this for custom filter commands.
### FilterUpdateSettingsFrame
Update audio filter settings. Inherits from `FilterControlFrame`.
Filter settings to update.
### FilterEnableFrame
Enable or disable an audio filter. Inherits from `FilterControlFrame`.
`True` to enable the filter, `False` to disable it.
### MixerControlFrame
Base frame for audio mixer control.
### MixerUpdateSettingsFrame
Update audio mixer settings. Inherits from `MixerControlFrame`.
Mixer settings to update.
### MixerEnableFrame
Enable or disable an audio mixer. Inherits from `MixerControlFrame`.
`True` to enable the mixer, `False` to disable it.
## Service Switching
### ServiceSwitcherFrame
Base frame for service switching operations.
### ManuallySwitchServiceFrame
Request a manual switch to a different service instance. Inherits from `ServiceSwitcherFrame`.
The service to switch to.
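For illustration only, a manual switch might look like the sketch below; `backup_llm` is a hypothetical service registered with a service switcher, and the `service` field name is an assumption based on the description above:
```python theme={null}
from pipecat.frames.frames import ManuallySwitchServiceFrame

# Route subsequent work to the backup LLM instance managed by the switcher.
await task.queue_frame(ManuallySwitchServiceFrame(service=backup_llm))
```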
### ServiceSwitcherRequestMetadataFrame
Request that a service re-emit its metadata. Useful after switching services to ensure downstream processors have current configuration.
The service to request metadata from.
## Task Frames
Task frames are pushed upstream to the pipeline task, which converts them into the appropriate downstream frame. This indirection lets processors request pipeline-level actions without needing direct access to the pipeline task.
### TaskFrame
Base frame for task control.
### EndTaskFrame
Request graceful pipeline shutdown. The pipeline task converts this into an `EndFrame` and pushes it downstream. Inherits from `TaskFrame` and `UninterruptibleFrame`.
Optional reason for the shutdown request.
### StopTaskFrame
Request pipeline stop while keeping processors alive. Converted to a `StopFrame` downstream. Inherits from `TaskFrame` and `UninterruptibleFrame`.
# Data Frames
Source: https://docs.pipecat.ai/api-reference/server/frames/data-frames
Reference for DataFrame types: audio, image, text, transcription, and transport messages
## Overview
DataFrames carry the main content flowing through a pipeline: audio chunks, text, images, transcriptions, and messages. They are queued and processed in order with other DataFrames and ControlFrames, and any pending DataFrames are discarded when a user interrupts. See the [Frames overview](/api-reference/server/frames/overview) for base class details, mixin fields, and frame properties common to all frames.
## Audio Frames
These frames carry raw audio through the pipeline toward the output transport. Each inherits the `audio`, `sample_rate`, `num_channels`, and `num_frames` fields from the [`AudioRawFrame`](/api-reference/server/frames/overview#audiorawframe) mixin.
### OutputAudioRawFrame
A chunk of raw audio destined for the output transport. Use the inherited `transport_destination` field when your transport supports multiple audio tracks.
Inherits from `AudioRawFrame`.
### TTSAudioRawFrame
Audio generated by a TTS service, ready for playback.
Inherits from `OutputAudioRawFrame`.
Identifier for the TTS context that generated this audio.
### SpeechOutputAudioRawFrame
Audio from a continuous speech stream. The stream may contain silence frames intermixed with speech, so downstream processors may need to distinguish between the two.
Inherits from `OutputAudioRawFrame`.
## Image Frames
Frames for carrying image data to the output transport. Each inherits `image`, `size`, and `format` from the [`ImageRawFrame`](/api-reference/server/frames/overview#imagerawframe) mixin.
### OutputImageRawFrame
An image for display by the output transport. Supports the `transport_destination` field for transports with multiple video tracks.
Inherits from `ImageRawFrame`.
The `sync_with_audio` field (default `False`) is set internally, not via the
constructor. When `True`, the image is queued with audio frames so it displays
only after all preceding audio has been sent. When `False`, the transport
displays it immediately.
### URLImageRawFrame
An output image with an associated download URL, typically from a third-party image generation service.
Inherits from `OutputImageRawFrame`.
URL where the image can be downloaded.
### AssistantImageRawFrame
An image generated by the assistant for both display and inclusion in LLM context. The superclass handles display; the additional fields here carry the original image data in a format suitable for direct use in LLM context messages.
Inherits from `OutputImageRawFrame`.
Original image data for use in LLM context messages without further encoding.
MIME type of the original image data.
### SpriteFrame
An animated sprite composed of multiple image frames. The transport plays the images at the framerate specified by the transport's `camera_out_framerate` parameter.
Ordered list of image frames that make up the sprite animation.
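For example, a sketch that assembles a sprite from a set of PNG files (the file names are placeholders, and the `images` field name follows the description above):
```python theme={null}
from PIL import Image

from pipecat.frames.frames import OutputImageRawFrame, SpriteFrame

frames = []
for i in range(1, 26):
    with Image.open(f"assets/talking_{i:02d}.png") as img:
        rgb = img.convert("RGB")
        frames.append(OutputImageRawFrame(image=rgb.tobytes(), size=rgb.size, format="RGB"))

# The transport plays the images back at its camera_out_framerate.
talking = SpriteFrame(images=frames)
```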
## Text Frames
Text content at various stages of processing: raw text, LLM output, aggregated results, TTS input, and transcriptions.
### TextFrame
The fundamental text container. Emitted by LLM services, consumed by context aggregators, TTS services, and other processors.
The text content.
Several non-constructor fields control downstream behavior:

* `skip_tts` (default `None`): when set, tells the TTS service to skip this text
* `includes_inter_frame_spaces` (default `False`): indicates whether leading/trailing spaces are already included
* `append_to_context` (default `True`): whether this text should be appended to the LLM context
### LLMTextFrame
Text generated by an LLM service. Behaves like a `TextFrame` with `includes_inter_frame_spaces` set to `True`, since LLM services include all necessary spacing.
Inherits from `TextFrame`.
### AggregatedTextFrame
Multiple text frames combined into a single frame for processing or output.
Inherits from `TextFrame`.
Method used to aggregate the text frames.
Identifier for the TTS context associated with this text.
### VisionTextFrame
Text output from a vision model. Functionally identical to `LLMTextFrame` but distinguished by type for routing purposes.
Inherits from `LLMTextFrame`.
### TTSTextFrame
Text that has been sent to a TTS service for synthesis.
Inherits from `AggregatedTextFrame`.
Identifier for the TTS context that generated this text.
### Transcriptions
Frames produced by speech-to-text services at different stages of recognition. All inherit from `TextFrame`, so they flow through text aggregators and other `TextFrame` handlers.
#### TranscriptionFrame
A non-interim transcription result from an STT service: the service's best recognition of what the user said, as opposed to the streaming partial results in `InterimTranscriptionFrame`.
Identifier for the user who spoke.
When the transcription occurred.
Detected or specified language of the speech.
Raw result object from the STT service.
Whether the STT service has explicitly committed this transcription via a
finalize signal. Some services (AssemblyAI, Deepgram, Soniox, Speechmatics)
support this; others don't, so it defaults to `False`. Turn detection
strategies can use this flag to trigger the bot's response immediately rather
than waiting for a timeout.
#### InterimTranscriptionFrame
A partial, in-progress transcription. These frames update frequently while the user is still speaking, and are superseded by a `TranscriptionFrame` once the STT service produces its result.
The partial transcription text.
Identifier for the user who spoke.
When the interim transcription occurred.
Detected or specified language of the speech.
Raw result object from the STT service.
#### TranslationFrame
A translated transcription, typically placed in the transport's receive queue when a participant speaks in a different language.
Identifier for the user who spoke.
When the translation occurred.
Target language of the translation.
## TTS Frames
### TTSSpeakFrame
Sends text to the pipeline's TTS service as a standalone utterance, independent of any LLM response turn. The TTS service creates a fresh audio context for each `TTSSpeakFrame`, whereas `TextFrame`s produced during an LLM response are grouped under the same turn context.
The text to be spoken.
Whether to append the spoken text to the LLM context.
## Transport Message Frames
### OutputTransportMessageFrame
A transport-specific message payload for sending data through the output transport. The message format depends on the transport implementation.
The transport message payload.
## DTMF Frames
### OutputDTMFFrame
A DTMF (Dual-Tone Multi-Frequency) keypress queued for output. Inherits the `button` field from the `DTMFFrame` mixin, which holds the keypad entry that was pressed.
Inherits from `DTMFFrame`.
The DTMF keypad entry to send.
For transports that support multiple dial-out destinations, set the
`transport_destination` field (inherited from `Frame`) to specify which
destination receives the DTMF tone.
## LLM Context Management
Frames that modify or trigger processing of the LLM conversation context.
### LLMMessagesAppendFrame
Appends messages to the current conversation context without replacing existing ones.
List of message dictionaries to append.
Whether the LLM should process the updated context immediately. When `None`,
the default behavior of the context aggregator applies.
### LLMMessagesUpdateFrame
Replaces the current context messages entirely with a new set.
List of message dictionaries to replace the current context.
Whether the LLM should process the updated context immediately. When `None`,
the default behavior of the context aggregator applies.
### LLMRunFrame
Triggers LLM processing with the current context. Push this frame when you want the LLM to generate a response using whatever context has already been assembled.
### LLMContextAssistantTimestampFrame
Records when an assistant message was created. Used internally to track timing of assistant responses in the conversation context.
Timestamp when the assistant message was created.
## LLM Thinking
### LLMThoughtTextFrame
A chunk of thought or reasoning text from the LLM. This is a `DataFrame`, not a `TextFrame` subclass — TTS services and text aggregators will not process it.
The text (or text chunk) of the thought.
## LLM Tool Configuration
Frames for configuring LLM function calling behavior and output settings at runtime.
### LLMSetToolsFrame
Sets the available tools for LLM function calling. The format of tool definitions typically follows JSON Schema conventions, though the exact structure depends on the LLM provider.
List of tool/function definitions for the LLM.
### LLMSetToolChoiceFrame
Configures how the LLM selects tools during function calling.
Tool choice setting: `"none"` disables tool use, `"auto"` lets the LLM decide,
`"required"` forces a tool call, or a dict specifying a particular tool.
### LLMEnablePromptCachingFrame
Toggles prompt caching for LLMs that support it.
Whether to enable prompt caching.
### LLMConfigureOutputFrame
Configures how the LLM produces output. Useful for scenarios where you want the LLM to generate tokens that update context but should not be spoken aloud.
When `True`, LLM tokens are added to context but not passed to TTS.
## Function Call Results
### FunctionCallResultFrame
Contains the result of a completed function call execution.
Inherits from `UninterruptibleFrame` to ensure the result always reaches the context aggregator.
Name of the function that was executed.
Unique identifier for the function call.
Arguments that were passed to the function.
The result returned by the function.
Whether to run the LLM after this result. Overrides the default behavior.
Additional properties for result handling.
# LLM Frames
Source: https://docs.pipecat.ai/api-reference/server/frames/llm-frames
LLM context frame and function calling helper dataclasses
This page documents LLM-specific types that don't belong on a base-type page: `LLMContextFrame` (which inherits directly from `Frame`) and the helper dataclasses used by function calling frames. All other LLM-related frames are documented on their base-type pages. See [Related Frames](#related-frames) below for links.
## LLMContextFrame
Contains a complete LLM context. Acts as a signal to LLM services to ingest the provided context and generate a response.
Inherits directly from `Frame` (not `DataFrame`, `ControlFrame`, or `SystemFrame`).
The LLM context containing messages, tools, and configuration.
## Function Calling Helper Dataclasses
These are plain dataclasses used as fields within function calling frames, not frames themselves.
### FunctionCallFromLLM
Represents a function call returned by the LLM, ready for execution.
The name of the function to call.
A unique identifier for the function call.
The arguments to pass to the function.
The LLM context at the time the function call was made.
### FunctionCallResultProperties
Configures how a function call result is handled after execution.
Whether to run the LLM after receiving this result.
Async callback to execute when the context is updated with the result.
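As a sketch of how `FunctionCallResultProperties` is typically used inside a function call handler (the handler and its `params.result_callback(..., properties=...)` signature follow the common Pipecat function-calling pattern; the weather lookup itself is a placeholder):
```python theme={null}
from pipecat.frames.frames import FunctionCallResultProperties

async def fetch_weather(params):
    result = {"conditions": "sunny", "temperature_f": 72}  # placeholder lookup

    async def on_context_updated():
        # Runs once the result has been committed to the LLM context.
        print("weather result added to context")

    properties = FunctionCallResultProperties(
        run_llm=False,  # don't trigger a completion for this result
        on_context_updated=on_context_updated,
    )
    await params.result_callback(result, properties=properties)
```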
## Related Frames
LLM-related frames organized by base type:
* **Data Frames**: [Context Management](/api-reference/server/frames/data-frames#llm-context-management), [Thinking](/api-reference/server/frames/data-frames#llm-thinking), [Tool Configuration](/api-reference/server/frames/data-frames#llm-tool-configuration), [Function Call Results](/api-reference/server/frames/data-frames#function-call-results)
* **Control Frames**: [Response Boundaries](/api-reference/server/frames/control-frames#llm-response-boundaries), [Context Summarization](/api-reference/server/frames/control-frames#llm-context-summarization), [Thought Frames](/api-reference/server/frames/control-frames#llm-thought-frames), [Function Calling](/api-reference/server/frames/control-frames#function-calling), [Service Settings](/api-reference/server/frames/control-frames#service-settings)
* **System Frames**: [Function Calling](/api-reference/server/frames/system-frames#function-calling)
# Frames
Source: https://docs.pipecat.ai/api-reference/server/frames/overview
Frame categories, processing behavior, and common patterns for Pipecat pipelines
## Overview
Frames are the fundamental units of data in Pipecat. Every piece of information that moves through a pipeline — audio, text, images, control signals — is wrapped in a frame. Frame processors receive frames, act on them, and push new or modified frames along to the next processor.
All frames inherit from the base `Frame` class and are Python [dataclasses](https://docs.python.org/3/library/dataclasses.html).
## Frame Categories
Pipecat has three base frame types, each with different processing behavior:
| Base Type | Processing | Interruption Behavior |
| -------------- | ------------------------------------------------------------- | -------------------------------------- |
| `DataFrame` | Queued, processed in order with non-SystemFrames | Cancelled on user interruption |
| `ControlFrame` | Queued, processed in order with non-SystemFrames | Cancelled on user interruption |
| `SystemFrame` | Higher priority, queued, processed in order with SystemFrames | **Not** cancelled on user interruption |
### DataFrame
Data frames carry the main content flowing through a pipeline: audio chunks, text, images, and LLM messages. They are queued and processed in order with other DataFrames and ControlFrames. If a user interrupts (starts speaking while the bot is responding), any pending data frames are discarded so the new input can be handled immediately.
Examples: `TextFrame`, `OutputAudioRawFrame`, `LLMMessagesAppendFrame`, `TTSSpeakFrame`
### ControlFrame
ControlFrames signal processing boundaries and configuration changes: response start/end markers, settings updates, and state transitions. They are queued and processed in order alongside DataFrames, and like DataFrames, any pending ControlFrames are discarded when a user interrupts unless combined with `UninterruptibleFrame`.
Examples: `EndFrame`, `LLMFullResponseStartFrame`, `TTSStartedFrame`, `ServiceUpdateSettingsFrame`
### SystemFrame
SystemFrames are high-priority signals that must always be delivered: interruptions, user input, error notifications, and pipeline lifecycle events. They are queued and processed in order with other SystemFrames. Unlike DataFrames and ControlFrames, they are never discarded when a user interrupts.
Examples: `StartFrame`, `CancelFrame`, `InterruptionFrame`, `UserStartedSpeakingFrame`, `InputAudioRawFrame`
## Frame Properties
Every frame has these properties set automatically:
Unique identifier for the frame instance.
Human-readable name combining class name and instance count (e.g.,
`TextFrame#3`). Useful for debugging.
Presentation timestamp in nanoseconds. Used for audio/video synchronization.
Dictionary for arbitrary frame metadata.
Name of the transport source that created this frame.
Name of the transport destination for this frame. Used when a transport
supports multiple output tracks.
## Frame Direction
Frames flow through the pipeline in one of two directions:
```python theme={null}
from pipecat.processors.frame_processor import FrameDirection
class FrameDirection(Enum):
    DOWNSTREAM = 1  # Input → Output (default)
    UPSTREAM = 2    # Output → Input
```
**Downstream** is the default. In a typical voice AI pipeline, audio enters from the transport input, gets transcribed, runs through the LLM, converts to speech, and reaches the transport output.
**Upstream** lets processors send information back toward the start of the pipeline. The most common example: the assistant context aggregator at the end of the pipeline pushes context frames upstream so they flow back to the LLM.
### Pushing Frames
Within a frame processor, call `push_frame()` to send a frame to the next processor:
```python theme={null}
# Push downstream (default)
await self.push_frame(frame, FrameDirection.DOWNSTREAM)
# Push upstream
await self.push_frame(frame, FrameDirection.UPSTREAM)
```
### Broadcasting Frames
To send a frame in **both** directions simultaneously, use `broadcast_frame()`:
```python theme={null}
# Create and push instances upstream and downstream
await self.broadcast_frame(UserStartedSpeakingFrame)
```
Each direction receives its own frame instance, linked by `broadcast_sibling_id`.
To broadcast an existing frame instance (when you are not the original creator of the frame), use `broadcast_frame_instance()`:
```python theme={null}
# Broadcast an existing frame instance in both directions
await self.broadcast_frame_instance(frame)
```
This creates two new instances by shallow-copying all fields from the original frame except `id` and `name`, which get fresh values.
Prefer `broadcast_frame()` when possible, as it is more efficient.
## Mixins
Mixins add cross-cutting behavior or shared data fields to frames without changing their base type.
### UninterruptibleFrame
Occasionally a `DataFrame` or `ControlFrame` is too important to discard during an interruption. Adding the `UninterruptibleFrame` mixin protects it: the frame stays in internal queues and any task processing it will not be cancelled.
```python theme={null}
@dataclass
class FunctionCallResultFrame(DataFrame, UninterruptibleFrame):
"""Must be delivered even if the user interrupts."""
...
```
Examples: `EndFrame`, `StopFrame`, `FunctionCallResultFrame`, `FunctionCallInProgressFrame`
### AudioRawFrame
Carries raw audio fields shared by both input and output audio frames.
Raw audio bytes in PCM format.
Audio sample rate in Hz (e.g., 16000).
Number of audio channels (e.g., 1 for mono).
Number of audio frames. Calculated automatically from the audio data.
### ImageRawFrame
Carries raw image fields shared by both input and output image frames.
Raw image bytes.
Image dimensions as (width, height).
Image format (e.g., `"RGB"`, `"RGBA"`).
## Common Patterns
Pipecat prefers pushing frames over calling methods directly between processors. Routing data through the pipeline as frames ensures correct processing order, which is critical for real-time use cases.
Most frames are produced and consumed by Pipecat's built-in services. The patterns below cover the frames you're most likely to push yourself in application code.
### Starting a Conversation
Add an initial message to the context, then push `LLMRunFrame` to kick off processing:
```python theme={null}
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
    context.add_message({"role": "user", "content": "Please introduce yourself."})
    await task.queue_frames([LLMRunFrame()])
```
### Injecting a Prompt
`LLMMessagesAppendFrame` adds messages to the context without replacing what's already there. Set `run_llm=True` to trigger a response immediately:
```python theme={null}
message = {
"role": "user",
"content": "The user has been quiet. Ask if they're still there.",
}
await aggregator.push_frame(LLMMessagesAppendFrame([message], run_llm=True))
```
### Speaking Without the LLM
`TTSSpeakFrame` sends text directly to the TTS service as a standalone utterance, bypassing the LLM entirely:
```python theme={null}
@llm.event_handler("on_function_calls_started")
async def on_function_calls_started(service, function_calls):
    await tts.queue_frame(TTSSpeakFrame("Let me check on that."))
```
### Ending a Conversation
Push `EndTaskFrame` upstream to gracefully shut down the pipeline. Pair it with a `TTSSpeakFrame` to say goodbye first:
```python theme={null}
await aggregator.push_frame(
TTSSpeakFrame("It seems like you're busy. Have a nice day!")
)
await aggregator.push_frame(EndTaskFrame(), FrameDirection.UPSTREAM)
```
### Changing Service Settings at Runtime
Push settings frames to adjust LLM, TTS, or STT configuration mid-conversation:
```python theme={null}
await task.queue_frame(
    LLMUpdateSettingsFrame(delta=OpenAILLMService.Settings(temperature=0.1))
)
```
### Updating Tools at Runtime
Add or replace available function-calling tools while the conversation is active:
```python theme={null}
new_tools = ToolsSchema(
    standard_tools=[weather_function, restaurant_function]
)
await task.queue_frames([LLMSetToolsFrame(tools=new_tools)])
```
### Playing Sound Effects
Load audio files and push `OutputAudioRawFrame` directly from a custom processor:
```python theme={null}
with wave.open("ding.wav") as f:
    ding = OutputAudioRawFrame(f.readframes(-1), f.getframerate(), f.getnchannels())
class SoundEffect(FrameProcessor):
async def process_frame(self, frame, direction):
await super().process_frame(frame, direction)
if isinstance(frame, LLMFullResponseEndFrame):
await self.push_frame(ding)
await self.push_frame(frame, direction)
```
### Reacting to LLM Response Boundaries
`LLMFullResponseStartFrame` and `LLMFullResponseEndFrame` bracket every LLM response. Custom processors can watch for these to trigger side effects:
```python theme={null}
class ResponseLogger(FrameProcessor):
async def process_frame(self, frame, direction):
await super().process_frame(frame, direction)
if isinstance(frame, LLMFullResponseStartFrame):
logger.info("LLM response started")
elif isinstance(frame, LLMFullResponseEndFrame):
logger.info("LLM response finished")
await self.push_frame(frame, direction)
```
## Frame Type Reference
The individual reference pages below document every frame class, organized by function:
Audio, image, text, transcription, and transport message frames that carry
content through the pipeline.
Pipeline lifecycle, LLM response boundaries, TTS state, service settings,
and filter/mixer configuration.
Interruptions, user/bot speaking state, VAD events, errors, metrics, and raw
input frames.
LLM context frame, function calling helper dataclasses, and links to
LLM-related frames on other pages.
# System Frames
Source: https://docs.pipecat.ai/api-reference/server/frames/system-frames
Reference for SystemFrame types: pipeline lifecycle, interruptions, speaking state, input, and diagnostics
SystemFrames have higher priority than DataFrames and ControlFrames and are never cancelled during user interruptions. They are queued and processed in order with other SystemFrames. They carry signals that must always be delivered: pipeline startup and teardown, error notifications, user input, and speaking state changes. See the [frames overview](/api-reference/server/frames/overview) for base class details, mixin fields, and frame properties common to all frames.
## Pipeline Lifecycle
### StartFrame
The first frame pushed into a pipeline, initializing all processors. Every processor receives this before any DataFrames or ControlFrames arrive.
Input audio sample rate in Hz.
Output audio sample rate in Hz.
Whether user interruptions are allowed. Deprecated since 0.0.99: use
interruption strategies instead.
Enable performance metrics collection from processors.
Enable tracing for pipeline execution.
Enable usage metrics (token counts, API calls) from services.
List of interruption strategies for the pipeline. Deprecated since 0.0.99.
When `True`, only report time-to-first-byte for the initial response rather
than every response.
Optional tracing context for distributed tracing integration.
### CancelFrame
Stops the pipeline immediately, skipping any queued non-SystemFrames. Use this when you need to abort without waiting for pending work to drain, for example when the user has left the session.
Optional reason for the cancellation.
## Errors
### ErrorFrame
Carries an error notification, typically pushed upstream so earlier processors can react.
Human-readable error message.
Whether this error is fatal and requires the bot to shut down.
The processor that raised the error.
The underlying exception, if one was caught.
### FatalErrorFrame
An unrecoverable error requiring the bot to shut down. The `fatal` field is always `True`.
Inherits from `ErrorFrame`.
## Processor Pause/Resume (Urgent)
These are the `SystemFrame` variants of `FrameProcessorPauseFrame` and `FrameProcessorResumeFrame`. As SystemFrames, they flow through the high-priority input queue rather than the process queue, so they are not blocked by paused state or buffered frames. This makes `FrameProcessorResumeUrgentFrame` the correct way to resume a processor externally — the `ControlFrame` variant (`FrameProcessorResumeFrame`) would get stuck behind any DataFrames that queued up during the pause. See [Control Frames](/api-reference/server/frames/control-frames#processor-pauseresume) for the full explanation.
### FrameProcessorPauseUrgentFrame
Pauses a processor immediately, without waiting for queued frames to drain first.
The processor to pause.
### FrameProcessorResumeUrgentFrame
Resumes a paused processor immediately, releasing buffered frames. Use this instead of `FrameProcessorResumeFrame` when the processor may have frames queued up.
The processor to resume.
## Interruptions
### InterruptionFrame
Interrupts the pipeline, discarding pending DataFrames and ControlFrames. Typically triggered when the user starts speaking during a bot response.
## User Speaking State
### UserStartedSpeakingFrame
Indicates that a user turn has begun. By this point, transcriptions are usually already flowing through the pipeline.
Whether this event was emulated rather than detected by VAD. Deprecated since
0.0.99.
### UserStoppedSpeakingFrame
Marks the end of a user turn. The bot's response is triggered separately by the turn detection system.
Whether this event was emulated rather than detected by VAD. Deprecated since
0.0.99.
### UserSpeakingFrame
Emitted by the VAD processor while the user is actively speaking. Useful for UI feedback or suppressing idle timeouts.
### UserMuteStartedFrame
Broadcast when one or more [user mute strategies](/api-reference/server/utilities/turn-management/user-mute-strategies) activate. User mute temporarily suppresses user input while the bot is speaking to prevent interruptions. While muted, the `LLMUserAggregator` drops incoming user frames (`InputAudioRawFrame`, `TranscriptionFrame`, `InterimTranscriptionFrame`, `UserStartedSpeakingFrame`, `UserStoppedSpeakingFrame`, VAD signals, and `InterruptionFrame`). Lifecycle frames (`StartFrame`, `EndFrame`, `CancelFrame`) are never muted.
### UserMuteStoppedFrame
Broadcast when all active [user mute strategies](/api-reference/server/utilities/turn-management/user-mute-strategies) deactivate, allowing user input to be processed again.
## VAD Events
These frames are emitted directly by the Voice Activity Detection (VAD) processor and carry timing metadata. Higher-level speaking-state frames (`UserStartedSpeakingFrame`, `UserStoppedSpeakingFrame`) are derived from these.
### VADUserStartedSpeakingFrame
VAD confirmed that speech has started.
Timestamp in seconds when speech onset was detected.
Wall-clock time when the frame was created.
### VADUserStoppedSpeakingFrame
VAD confirmed that speech has ended.
Timestamp in seconds when speech ended.
Wall-clock time when the frame was created.
### SpeechControlParamsFrame
Notifies processors that VAD or turn detection parameters have changed at runtime.
Updated VAD parameters.
Updated turn detection parameters.
## Bot Speaking State
### BotStartedSpeakingFrame
Emitted by the output transport when the bot begins speaking. Broadcast in both directions so processors on either side of the transport can react.
### BotStoppedSpeakingFrame
Emitted by the output transport when the bot finishes speaking. Also broadcast in both directions.
### BotSpeakingFrame
Emitted continuously while the bot is speaking. Processors can use this to suppress idle timeouts or drive visual indicators.
## Connection Status
### BotConnectedFrame
The bot has joined the transport room. Only relevant for SFU-based transports: Daily, LiveKit, HeyGen, and Tavus.
### ClientConnectedFrame
A client or participant has connected to the transport.
## Input Frames
Input frames carry raw data from transport sources into the pipeline. As `SystemFrame`s, they are never discarded during interruptions. Incoming user data must always be processed.
### InputAudioRawFrame
Raw audio received from the transport. Inherits the `audio`, `sample_rate`, `num_channels`, and `num_frames` fields from the [`AudioRawFrame`](/api-reference/server/frames/overview#audiorawframe) mixin.
Inherits from `AudioRawFrame`.
### UserAudioRawFrame
Audio from a specific user in a multi-participant session.
Inherits from `InputAudioRawFrame`.
Identifier for the user who produced this audio.
### InputImageRawFrame
Raw image received from the transport. Inherits `image`, `size`, and `format` from the [`ImageRawFrame`](/api-reference/server/frames/overview#imagerawframe) mixin.
Inherits from `ImageRawFrame`.
### UserImageRawFrame
An image from a specific user, optionally tied to a pending image request.
Inherits from `InputImageRawFrame`.
Identifier for the user who produced this image.
Optional text associated with the image.
Whether to append this image to the LLM context.
The original request frame that triggered this image capture.
### InputTextRawFrame
Text received from the transport, such as a user typing in a chat interface. Inherits the `text` field from `TextFrame`.
Inherits from `TextFrame`.
## DTMF Input
### InputDTMFFrame
A DTMF keypress received from the transport. Inherits the `button` field from the `DTMFFrame` mixin.
Inherits from `DTMFFrame`.
### OutputDTMFUrgentFrame
A DTMF keypress for immediate output, bypassing the normal frame queue.
Inherits from `DTMFFrame`.
## Transport Messages
### InputTransportMessageFrame
A message received from an external transport. The message format is transport-specific.
The transport message payload.
### OutputTransportMessageUrgentFrame
An outbound transport message that bypasses the normal queue for immediate delivery.
The transport message payload.
## Function Calling
### FunctionCallsStartedFrame
Signals that one or more function calls are about to begin executing.
Sequence of function calls that will be executed.
### FunctionCallCancelFrame
Signals that a function call was cancelled, typically due to user interruption when the function's `cancel_on_interruption` flag is set.
Name of the function that was cancelled.
Unique identifier for the cancelled function call.
## User Interaction
### UserImageRequestFrame
Requests an image from a specific user, typically to capture a camera frame for vision processing.
Identifier for the user to capture from.
Optional text prompt associated with the image request.
Whether to append the resulting image to the LLM context.
Specific video source to capture from.
Function name if this request originated from a tool call.
Tool call identifier if this request originated from a tool call.
Callback to invoke with the captured image result.
### STTMuteFrame
Mutes or unmutes the STT service. While muted, incoming audio is not sent to the STT provider.
`True` to mute, `False` to unmute.
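A minimal sketch, assuming a running `task` and a boolean `mute` field:
```python theme={null}
from pipecat.frames.frames import STTMuteFrame

# Stop sending audio to the STT provider, e.g. while playing a long announcement...
await task.queue_frame(STTMuteFrame(mute=True))

# ...and resume transcription afterwards.
await task.queue_frame(STTMuteFrame(mute=False))
```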
### UserIdleTimeoutUpdateFrame
Updates the user idle timeout at runtime. Set to `0` to disable idle detection entirely.
New idle timeout in seconds. `0` disables detection.
## Diagnostics
### MetricsFrame
Performance metrics collected from processors. Emitted when metrics reporting is enabled via `StartFrame`.
List of metrics data entries.
## Service Metadata
### ServiceMetadataFrame
Base metadata frame broadcast by services at startup, providing information about service capabilities and configuration.
Name of the service that emitted this metadata.
### STTMetadataFrame
Metadata from an STT service, including latency characteristics used for turn detection tuning.
Inherits from `ServiceMetadataFrame`.
P99 latency in seconds for time-to-final-segment. Used by turn detectors to
calibrate wait times.
## RTVI
Frames for the [Real-Time Voice Interface (RTVI)](/api-reference/server/rtvi) protocol, which bridges clients and the pipeline. These frames handle custom messaging between the client and server.
### RTVIServerMessageFrame
Sends a server message to the connected client.
The message data to send to the client.
### RTVIClientMessageFrame
A message received from the client, expecting a server response via `RTVIServerResponseFrame`.
Unique identifier for the client message.
The message type.
Optional message data from the client.
### RTVIServerResponseFrame
Responds to an `RTVIClientMessageFrame`. Include the original client message frame to ensure the response is properly correlated. Set the `error` field to respond with an error instead of a normal response.
The original client message this response is for.
Response data to send to the client.
Error message. When set, the client receives an `error-response` instead of a
`server-response`.
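For illustration, responding to a client message from an `RTVIProcessor` `on_client_message` handler might look like the sketch below; the handler signature and the `client_msg`/`data` field names are assumptions, so check the RTVI reference for the exact API:
```python theme={null}
from pipecat.frames.frames import RTVIServerResponseFrame

@rtvi.event_handler("on_client_message")
async def on_client_message(processor, message):
    # Correlate the response with the original client message.
    await processor.push_frame(
        RTVIServerResponseFrame(client_msg=message, data={"status": "ok"})
    )
```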
## Task Frames
Task frames provide a system-priority mechanism for requesting pipeline actions from outside the normal frame flow. They are converted into their corresponding standard frames when processed.
### TaskSystemFrame
Base class for system-priority task frames.
### CancelTaskFrame
Requests immediate pipeline cancellation. Converted to a `CancelFrame` when processed by the pipeline.
Inherits from `TaskSystemFrame`.
Optional reason for the cancellation request.
### InterruptionTaskFrame
Requests a pipeline interruption. Converted to an `InterruptionFrame` when processed.
Inherits from `TaskSystemFrame`.
# Pipecat Server Overview
Source: https://docs.pipecat.ai/api-reference/server/introduction
API reference for the Pipecat Python framework.
This is the API reference for the server-side Pipecat Python framework. It covers the services, utilities, pipeline components, and frame types you use to build voice and multimodal AI agents.
## What's in This Reference
* **Services**: Integrations for STT, LLM, TTS, speech-to-speech, image generation, video, transports, and more. Over 100 providers supported.
* **Utilities**: Frame processors, audio filters, observers, turn detection, context summarization, MCP, and other helpers.
* **Pipeline**: Configuration, task management, idle detection, and parallel pipelines.
* **Frames**: Data frames, control frames, system frames, and LLM frames that flow through the pipeline.
* **Events**: Frame processor and service events for hooking into the pipeline lifecycle.
## Explore
Browse the full list of AI service integrations and their install commands
Understand the data, control, system, and LLM frames that flow through
pipelines
Hook into pipeline lifecycle with frame processor and service events
Auto-generated API reference with every class, method, and parameter
# Pipeline Heartbeats
Source: https://docs.pipecat.ai/api-reference/server/pipeline/heartbeats
Monitor pipeline health with heartbeat frames
## Overview
Pipeline heartbeats provide a way to monitor the health of your pipeline by sending periodic heartbeat frames through the system. When enabled, the pipeline will send heartbeat frames every second and monitor their progress through the pipeline.
## Enabling Heartbeats
Heartbeats can be enabled by setting `enable_heartbeats` to `True` in the `PipelineParams`:
```python theme={null}
from pipecat.pipeline.task import PipelineParams, PipelineTask
pipeline = Pipeline([...])
params = PipelineParams(enable_heartbeats=True)
task = PipelineTask(pipeline, params=params)
```
## How It Works
When heartbeats are enabled:
1. The pipeline sends a `HeartbeatFrame` every second
2. The frame traverses through all processors in the pipeline, from source to sink
3. The pipeline monitors how long it takes for heartbeat frames to complete their journey
4. If a heartbeat frame isn't received within 10 seconds, a warning is logged
## Monitoring Output
The system will log:
* Trace-level logs showing heartbeat processing time
* Warning messages if heartbeats aren't received within the monitoring window
Example warning message:
```
WARNING PipelineTask#1: heartbeat frame not received for more than 10.0 seconds
```
## Use Cases
Heartbeat monitoring is useful for:
* Detecting pipeline stalls or blockages
* Monitoring processing latency through the pipeline
* Identifying performance issues in specific processors
* Ensuring the pipeline remains responsive
## Configuration
The heartbeat system uses two timing values:
* **Interval** (default 1.0s) — how often heartbeat frames are sent. Configurable via `heartbeats_period_secs` in `PipelineParams`.
* **Monitor window** — how long to wait before logging a warning if no heartbeat is received. Always 10x the interval (10 seconds with the default interval).
# ParallelPipeline
Source: https://docs.pipecat.ai/api-reference/server/pipeline/parallel-pipeline
Run multiple pipeline branches in parallel, with synchronized inputs and outputs for complex flows
## Overview
`ParallelPipeline` allows you to create multiple independent processing branches that run simultaneously, sharing input and coordinating output. It's particularly useful for multi-agent systems, parallel stream processing, and creating redundant service paths.
Each branch receives the same downstream frames, processes them independently, and the results are merged back into a single stream. System frames (like `StartFrame` and `EndFrame`) are synchronized across all branches.
## Constructor Parameters
Multiple lists of processors, where each list defines a parallel branch. All
branches execute simultaneously when frames flow through the pipeline.
## Usage Examples
### Multi-Agent Conversation
Create a conversation with two AI agents that can interact with the user independently:
```python theme={null}
pipeline = Pipeline([
    transport.input(),
    ParallelPipeline(
        # Agent 1: Customer service representative
        [
            stt_1,
            context_aggregator.user_a(),
            llm_agent_1,
            tts_agent_1,
        ],
        # Agent 2: Technical specialist
        [
            stt_2,
            context_aggregator.user_b(),
            llm_agent_2,
            tts_agent_2,
        ],
    ),
    transport.output(),
])
```
### Redundant Services with Failover
Set up redundant services with automatic failover:
```python theme={null}
pipeline = Pipeline([
    transport.input(),
    stt,
    ParallelPipeline(
        # Primary LLM service
        [
            gate_primary,
            primary_llm,
            error_detector,
        ],
        # Backup LLM service (used only if primary fails)
        [
            gate_backup,
            backup_llm,
            fallback_processor,
        ],
    ),
    tts,
    transport.output(),
])
```
### Cross-Branch Communication
Using Producer/Consumer processors to share data between branches:
```python theme={null}
# Create producer/consumer pair for cross-branch communication
frame_producer = ProducerProcessor(filter=is_important_frame)
frame_consumer = ConsumerProcessor(producer=frame_producer)
pipeline = Pipeline([
    transport.input(),
    ParallelPipeline(
        # Branch that generates important frames
        [
            stt,
            llm,
            tts,
            frame_producer,  # Share frames with the other branch
        ],
        # Branch that consumes those frames
        [
            frame_consumer,  # Receive frames from the other branch
            s2s_llm,  # A separate speech-to-speech LLM instance (audio in)
        ],
    ),
    transport.output(),
])
```
## How It Works
1. `ParallelPipeline` adds special source and sink processors to each branch
2. System frames (like `StartFrame` and `EndFrame`) are sent to all branches
3. Other frames flow downstream to all branch sources
4. Results from each branch are collected at the sinks
5. The pipeline ensures proper frame ordering:
* `StartFrame` is pushed before any buffered frames (ensuring downstream processors initialize first)
* `EndFrame` and `CancelFrame` are pushed after buffered frames are flushed (ensuring pending work completes before shutdown)
# Pipeline Idle Detection
Source: https://docs.pipecat.ai/api-reference/server/pipeline/pipeline-idle-detection
Automatically detect and handle idle pipelines with no bot activity
## Overview
Pipeline idle detection monitors activity in your pipeline and can automatically cancel tasks when no meaningful interactions are occurring. This helps prevent pipelines from running indefinitely when a conversation has naturally ended but wasn't properly terminated.
## How It Works
The system monitors specific "activity frames" that indicate the bot is actively engaged in the conversation. By default, these are:
* `BotSpeakingFrame` - When the bot is speaking
* `UserSpeakingFrame` - When the user is speaking
If no activity frames are detected within the configured timeout period (5 minutes by default), the system considers the pipeline idle and can automatically terminate it.
Idle detection only starts after the pipeline has begun processing frames. The
idle timer resets whenever an activity frame (as specified in
`idle_timeout_frames`) is received.
## Configuration
You can configure idle detection behavior when creating a `PipelineTask`:
```python theme={null}
from pipecat.pipeline.task import PipelineParams, PipelineTask
# Default configuration - cancel after 5 minutes of inactivity
task = PipelineTask(pipeline)
# Custom configuration
task = PipelineTask(
    pipeline,
    idle_timeout_secs=600,  # 10 minute timeout
    idle_timeout_frames=(BotSpeakingFrame,),  # Only monitor bot speaking
    cancel_on_idle_timeout=False,  # Don't auto-cancel, just notify
)
```
## Configuration Parameters
Timeout in seconds before considering the pipeline idle. Set to `None` to
disable idle detection.
Frame types that should prevent the pipeline from being considered idle.
Whether to automatically cancel the pipeline task when idle timeout is
reached.
## Handling Idle Timeouts
You can respond to idle timeout events by adding an event handler:
```python theme={null}
@task.event_handler("on_idle_timeout")
async def on_idle_timeout(task):
logger.info("Pipeline has been idle for too long")
# Perform any custom cleanup or logging
# Note: If cancel_on_idle_timeout=True, the pipeline will be cancelled after this handler runs
```
## Example Implementation
Here's a complete example showing how to configure idle detection with custom handling:
```python theme={null}
from loguru import logger

from pipecat.frames.frames import EndFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
# Create pipeline
pipeline = Pipeline([...])
# Configure task with custom idle settings
task = PipelineTask(
    pipeline,
    idle_timeout_secs=180,  # 3 minutes
    cancel_on_idle_timeout=False  # Don't auto-cancel
)
# Add event handler for idle timeout
@task.event_handler("on_idle_timeout")
async def on_idle_timeout(task):
logger.info("Conversation has been idle for 3 minutes")
# Add a farewell message
await task.queue_frame(TTSSpeakFrame("I haven't heard from you in a while. Goodbye!"))
# Then end the conversation gracefully
await task.queue_frame(EndFrame())
runner = PipelineRunner()
await runner.run(task)
```
# PipelineParams
Source: https://docs.pipecat.ai/api-reference/server/pipeline/pipeline-params
Configure pipeline execution with PipelineParams
## Overview
The `PipelineParams` class provides a structured way to configure various aspects of pipeline execution. These parameters control behaviors like audio settings, metrics collection, heartbeat monitoring, and interruption handling.
## Basic Usage
```python theme={null}
from pipecat.pipeline.task import PipelineParams, PipelineTask
# Create with default parameters
params = PipelineParams()
# Or customize specific parameters
params = PipelineParams(
    audio_in_sample_rate=16000,
    enable_metrics=True
)
# Pass to PipelineTask
pipeline = Pipeline([...])
task = PipelineTask(pipeline, params=params)
```
## Available Parameters
DEPRECATED: Configure interruption behavior via [User Turn
Strategies](/api-reference/server/utilities/turn-management/user-turn-strategies) instead.
See the `enable_interruptions` parameter on start strategies.
Whether to allow pipeline interruptions. When enabled, a user's speech will
immediately interrupt the bot's response.
Input audio sample rate in Hz.
Setting the `audio_in_sample_rate` as a `PipelineParam` sets the input sample
rate for all corresponding services in the pipeline.
Output audio sample rate in Hz.
Setting the `audio_out_sample_rate` as a `PipelineParam` sets the output
sample rate for all corresponding services in the pipeline.
Whether to enable heartbeat monitoring to detect pipeline stalls. See
[Heartbeats](/api-reference/server/pipeline/heartbeats) for details.
Period between heartbeats in seconds (when heartbeats are enabled).
Whether to enable metrics collection for pipeline performance.
Whether to enable usage metrics tracking.
Whether to report only initial time to first byte metric.
Whether to send initial empty metrics frame at pipeline start.
Additional metadata to include in the StartFrame.
## Common Configurations
### Audio Processing Configuration
You can set the audio input and output sample rates in `PipelineParams` to apply a single sample rate to all input and output services in the pipeline, which saves you from configuring each service individually. Note that a sample rate set directly on an individual service supersedes the value set in `PipelineParams`.
```python theme={null}
params = PipelineParams(
    audio_in_sample_rate=8000,   # Lower quality input audio (e.g., telephony)
    audio_out_sample_rate=8000   # Lower quality output audio (e.g., telephony)
)
```
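To illustrate the precedence rule, here is a hedged sketch: it assumes a TTS service (the `SomeTTSService` placeholder is hypothetical) whose constructor accepts a `sample_rate` argument, in which case the service-level value wins over the pipeline-wide one.
```python theme={null}
params = PipelineParams(audio_out_sample_rate=8000)

# Hypothetical service: an explicitly set sample_rate supersedes the
# 8000 Hz value configured in PipelineParams for this service only.
tts = SomeTTSService(sample_rate=24000)

pipeline = Pipeline([..., tts, ...])
task = PipelineTask(pipeline, params=params)
```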
### Performance Monitoring Configuration
Pipeline heartbeats provide a way to monitor the health of your pipeline by sending periodic heartbeat frames through the system. When enabled, the pipeline will send heartbeat frames every second and monitor their progress through the pipeline.
```python theme={null}
params = PipelineParams(
    enable_heartbeats=True,
    heartbeats_period_secs=2.0,  # Send heartbeats every 2 seconds
    enable_metrics=True
)
```
## How Parameters Are Used
The parameters you set in `PipelineParams` are passed to various components of the pipeline:
1. **StartFrame**: Many parameters are included in the StartFrame that initializes the pipeline
2. **Metrics Collection**: Metrics settings configure what performance data is gathered
3. **Heartbeat Monitoring**: Controls the pipeline's health monitoring system
4. **Audio Processing**: Sample rates affect how audio is processed throughout the pipeline
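As a sketch of point 1, a custom processor could inspect the `StartFrame` as it passes through; the `metadata` attribute name is an assumption here rather than a documented contract, which is why the code reads it defensively.
```python theme={null}
from pipecat.frames.frames import StartFrame
from pipecat.processors.frame_processor import FrameProcessor


class StartFrameLogger(FrameProcessor):
    async def process_frame(self, frame, direction):
        await super().process_frame(frame, direction)
        if isinstance(frame, StartFrame):
            # Assumption: start_metadata from PipelineParams surfaces on the
            # StartFrame; read it defensively in case the attribute differs.
            print(f"Start metadata: {getattr(frame, 'metadata', {})}")
        await self.push_frame(frame, direction)
```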
## Complete Example
```python theme={null}
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.observers.file_observer import FileObserver
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask

# Create comprehensive parameters
params = PipelineParams(
    audio_in_sample_rate=8000,
    audio_out_sample_rate=8000,
    enable_heartbeats=True,
    enable_metrics=True,
    enable_usage_metrics=True,
    heartbeats_period_secs=1.0,
    report_only_initial_ttfb=False,
    start_metadata={
        "conversation_id": "conv-123",
        "session_data": {
            "user_id": "user-456",
            "start_time": "2023-10-25T14:30:00Z"
        }
    }
)

# Create pipeline and task
pipeline = Pipeline([...])
task = PipelineTask(
    pipeline,
    params=params,
    observers=[FileObserver("pipeline_logs.jsonl")]
)

# Run the pipeline
runner = PipelineRunner()
await runner.run(task)
```
## Additional Information
* Parameters are immutable once the pipeline starts
* The `start_metadata` dictionary can contain any serializable data
* For metrics collection to work properly, `enable_metrics` must be set to `True`
# PipelineTask
Source: https://docs.pipecat.ai/api-reference/server/pipeline/pipeline-task
Manage pipeline execution and lifecycle with PipelineTask
## Overview
`PipelineTask` is the central class for managing pipeline execution. It handles the lifecycle of the pipeline, processes frames in both directions, manages task cancellation, and provides event handlers for monitoring pipeline activity.
## Basic Usage
```python theme={null}
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask

# Create a pipeline
pipeline = Pipeline([...])

# Create a task with the pipeline
task = PipelineTask(pipeline)

# Queue frames for processing
await task.queue_frame(TTSSpeakFrame("Hello, how can I help you today?"))

# Run the pipeline
runner = PipelineRunner()
await runner.run(task)
```
## Constructor Parameters
The pipeline to execute.
Configuration parameters for the pipeline. See
[PipelineParams](/api-reference/server/pipeline/pipeline-params) for details.
List of observers for monitoring pipeline execution. See
[Observers](/api-reference/server/utilities/observers/observer-pattern) for details.
Clock implementation for timing operations.
Custom task manager for handling asyncio tasks. If None, a default TaskManager
is used.
Whether to check for processors' tasks finishing properly.
Timeout in seconds before considering the pipeline idle. Set to None to
disable idle detection. See [Pipeline Idle
Detection](/api-reference/server/pipeline/pipeline-idle-detection) for details.
Frame types that should prevent the pipeline from being considered idle. See
[Pipeline Idle Detection](/api-reference/server/pipeline/pipeline-idle-detection) for
details.
Whether to automatically cancel the pipeline task when idle timeout is
reached. See [Pipeline Idle
Detection](/api-reference/server/pipeline/pipeline-idle-detection) for details.
Whether to enable OpenTelemetry tracing. See [The OpenTelemetry
guide](/api-reference/server/utilities/opentelemetry) for details.
Whether to enable turn tracking. See [The OpenTelemetry
guide](/api-reference/server/utilities/opentelemetry) for details.
Custom ID for the conversation. If not provided, a UUID will be generated. See
[The OpenTelemetry guide](/api-reference/server/utilities/opentelemetry) for details.
Any additional attributes to add to top-level OpenTelemetry conversation span.
See [The OpenTelemetry guide](/api-reference/server/utilities/opentelemetry) for details.
## Methods
### Task Lifecycle Management
Starts and manages the pipeline execution until completion or cancellation. Typically called via `PipelineRunner` rather than directly:
```python theme={null}
runner = PipelineRunner()
await runner.run(task)
```
Sends an EndFrame to the pipeline to gracefully stop the task after all queued
frames have been processed.
```python theme={null}
await task.stop_when_done()
```
Stops the running pipeline immediately by sending a CancelFrame.
```python theme={null}
await task.cancel()
```
Returns whether the task has finished (all processors have stopped).
```python theme={null}
if task.has_finished():
    print("Task is complete")
```
### Frame Management
Queues a single frame to be pushed through the pipeline.
Downstream frames are pushed from the beginning of the pipeline. Upstream frames are pushed from the end of the pipeline.
**Parameters:**
| Parameter | Type | Default | Description |
| ----------- | ---------------- | --------------------------- | ------------------------------- |
| `frame` | `Frame` | (required) | The frame to be processed |
| `direction` | `FrameDirection` | `FrameDirection.DOWNSTREAM` | The direction to push the frame |
```python theme={null}
# Push a frame downstream (default behavior)
await task.queue_frame(TTSSpeakFrame("Hello!"))
# Push a frame upstream from the end of the pipeline
from pipecat.processors.frame_processor import FrameDirection
await task.queue_frame(UserStoppedSpeakingFrame(), direction=FrameDirection.UPSTREAM)
```
Queues multiple frames to be pushed through the pipeline.
Downstream frames are pushed from the beginning of the pipeline. Upstream frames are pushed from the end of the pipeline.
**Parameters:**
| Parameter | Type | Default | Description |
| ----------- | ----------------------------------------- | --------------------------- | --------------------------------------- |
| `frames` | `Iterable[Frame] \| AsyncIterable[Frame]` | (required) | An iterable or async iterable of frames |
| `direction` | `FrameDirection` | `FrameDirection.DOWNSTREAM` | The direction to push the frames |
```python theme={null}
# Push frames downstream (default behavior)
frames = [TTSSpeakFrame("Hello!"), TTSSpeakFrame("How are you?")]
await task.queue_frames(frames)
# Push frames upstream from the end of the pipeline
from pipecat.processors.frame_processor import FrameDirection
frames = [TranscriptionFrame("user input"), UserStoppedSpeakingFrame()]
await task.queue_frames(frames, direction=FrameDirection.UPSTREAM)
```
## Event Handlers
PipelineTask provides event handlers for monitoring pipeline lifecycle and frame flow. Register handlers using the `@event_handler` decorator.
| Event | Description |
| ----------------------------- | --------------------------------------------------- |
| `on_pipeline_started` | Pipeline has started processing |
| `on_pipeline_finished` | Pipeline reached a terminal state |
| `on_pipeline_error` | An error frame reached the pipeline task |
| `on_frame_reached_upstream` | A filtered frame type reached the pipeline source |
| `on_frame_reached_downstream` | A filtered frame type reached the pipeline sink |
| `on_idle_timeout` | No activity detected within the idle timeout period |
### on\_pipeline\_started
Fired when the `StartFrame` has been processed by all processors in the pipeline. This indicates the pipeline is fully initialized and running.
```python theme={null}
@task.event_handler("on_pipeline_started")
async def on_pipeline_started(task, frame):
    print("Pipeline is running!")
```
**Parameters:**
| Parameter | Type | Description |
| --------- | -------------- | ---------------------------------- |
| `task` | `PipelineTask` | The pipeline task instance |
| `frame` | `StartFrame` | The start frame that was processed |
### on\_pipeline\_finished
Fired after the pipeline reaches any terminal state. This includes normal completion (`EndFrame`), explicit stop (`StopFrame`), or cancellation (`CancelFrame`). Use this event for cleanup, logging, or post-processing.
```python theme={null}
@task.event_handler("on_pipeline_finished")
async def on_pipeline_finished(task, frame):
    if isinstance(frame, EndFrame):
        print("Pipeline ended normally")
    elif isinstance(frame, CancelFrame):
        print("Pipeline was cancelled")
    elif isinstance(frame, StopFrame):
        print("Pipeline was stopped")
```
**Parameters:**
| Parameter | Type | Description |
| --------- | -------------- | -------------------------------------------------------------- |
| `task` | `PipelineTask` | The pipeline task instance |
| `frame` | `Frame` | The terminal frame (`EndFrame`, `StopFrame`, or `CancelFrame`) |
The deprecated events `on_pipeline_ended`, `on_pipeline_stopped`, and
`on_pipeline_cancelled` still work but will emit a deprecation warning. Use
`on_pipeline_finished` and inspect the frame type if you need to distinguish
between terminal states.
### on\_pipeline\_error
Fired when an `ErrorFrame` reaches the pipeline task (upstream from a processor). If the error is fatal, the pipeline will be cancelled after this handler runs.
```python theme={null}
@task.event_handler("on_pipeline_error")
async def on_pipeline_error(task, frame):
    print(f"Pipeline error: {frame.error}")
    if frame.fatal:
        print("Fatal error — pipeline will be cancelled")
```
**Parameters:**
| Parameter | Type | Description |
| --------- | -------------- | ---------------------------------- |
| `task` | `PipelineTask` | The pipeline task instance |
| `frame` | `ErrorFrame` | The error frame with error details |
### on\_frame\_reached\_upstream
Fired when a frame of a registered type reaches the pipeline source (the start of the pipeline). You must configure which frame types trigger this event using `set_reached_upstream_filter()` or `add_reached_upstream_filter()`.
```python theme={null}
from pipecat.frames.frames import TranscriptionFrame

# Configure which frame types to monitor
task.set_reached_upstream_filter((TranscriptionFrame,))

@task.event_handler("on_frame_reached_upstream")
async def on_frame_reached_upstream(task, frame):
    print(f"Frame reached upstream: {frame}")
```
**Parameters:**
| Parameter | Type | Description |
| --------- | -------------- | ------------------------------------------ |
| `task` | `PipelineTask` | The pipeline task instance |
| `frame` | `Frame` | The frame that reached the pipeline source |
This event only fires for frame types you've explicitly registered. By
default, no frame types are monitored. This is for efficiency — checking every
frame would be wasteful when you typically only care about specific types.
### on\_frame\_reached\_downstream
Fired when a frame of a registered type reaches the pipeline sink (the end of the pipeline). You must configure which frame types trigger this event using `set_reached_downstream_filter()` or `add_reached_downstream_filter()`.
```python theme={null}
from pipecat.frames.frames import TTSAudioRawFrame

# Configure which frame types to monitor
task.set_reached_downstream_filter((TTSAudioRawFrame,))

@task.event_handler("on_frame_reached_downstream")
async def on_frame_reached_downstream(task, frame):
    print(f"Frame reached downstream: {frame}")
```
**Parameters:**
| Parameter | Type | Description |
| --------- | -------------- | ---------------------------------------- |
| `task` | `PipelineTask` | The pipeline task instance |
| `frame` | `Frame` | The frame that reached the pipeline sink |
### on\_idle\_timeout
Fired when no activity frames (as specified by `idle_timeout_frames`) have been received within the idle timeout period. See [Pipeline Idle Detection](/api-reference/server/pipeline/pipeline-idle-detection) for configuration details.
```python theme={null}
@task.event_handler("on_idle_timeout")
async def on_idle_timeout(task):
    print("Pipeline has been idle too long")
    await task.queue_frame(TTSSpeakFrame("Are you still there?"))
```
**Parameters:**
| Parameter | Type | Description |
| --------- | -------------- | -------------------------- |
| `task` | `PipelineTask` | The pipeline task instance |
If `cancel_on_idle_timeout` is `True` (the default), the pipeline will be
automatically cancelled after this handler runs. Set it to `False` if you want
to handle idle timeouts yourself.
# Google RTVI Observer
Source: https://docs.pipecat.ai/api-reference/server/rtvi/google-rtvi-observer
Adding support for sending search responses to RTVI clients
## Overview
The `GoogleRTVIObserver` extends the [base `RTVIObserver` type](./rtvi-observer) to add support for the `bot-llm-search-response` message type, providing clients with the search results from the `GoogleLLMService`. See [this section on Search Grounding](/api-reference/server/services/llm/google#search-grounding) for more details.
Complete API documentation and method details
Official Google Gemini API documentation and features
Working example using Google's search grounding to ask about current events
## Installation
To use `GoogleRTVIObserver`, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[google]"
```
You'll also need to follow setup instructions for the [Google LLM Service](/api-reference/server/services/llm/google#installation) to ensure the `GoogleLLMService` is properly configured.
## Frame Translation
The observer maps the `LLMSearchResponseFrame` to the `RTVIBotLLMSearchResponseMessage`. Check out the [RTVI Standard Reference](/client/rtvi-standard#bot-llm-search-response-🤖) for details on the message format.
## Usage Example
To use `GoogleRTVIObserver`, pass a `GoogleRTVIProcessor` to your `PipelineTask`. This automatically creates and attaches the Google-specific observer:
```python theme={null}
from pipecat.services.google.rtvi import GoogleRTVIProcessor

pipeline = Pipeline([
    transport.input(),
    stt,
    # Other processors...
])

task = PipelineTask(
    pipeline,
    rtvi_processor=GoogleRTVIProcessor(),
)
```
# RTVI (Real-Time Voice Interaction)
Source: https://docs.pipecat.ai/api-reference/server/rtvi/introduction
Build real-time voice and multimodal applications with Pipecat’s RTVI protocol
Pipecat's RTVI (Real-Time Voice Interaction) protocol provides a standardized communication layer between clients and servers for building real-time voice and multimodal applications. It handles the synchronization of user and bot interactions, transcriptions, LLM processing, and text-to-speech delivery.
This page provides an overview of RTVI from the server's perspective and how to use it in your bot applications.
A complete specification of the RTVI protocol for client-server communication.
## Architecture
RTVI operates with two primary components:
1. [**RTVIProcessor**](./rtvi-processor) - A frame processor residing in the pipeline that serves as the entry point for sending and receiving messages to/from the client.
2. [**RTVIObserver**](./rtvi-observer) - An observer that monitors pipeline events and translates them into client-compatible messages, handling:
* Speaking state changes
* Transcription updates
* LLM responses
* TTS events
* Performance metrics
RTVI is enabled by default. When you create a `PipelineTask`, it automatically
adds `RTVIProcessor` to the start of your pipeline and registers an
`RTVIObserver`. The default `on_client_ready` handler calls `set_bot_ready()`
automatically.
## Basic Example
With automatic RTVI setup, your pipeline code can focus on core functionality:
```python theme={null}
pipeline = Pipeline(
    [
        transport.input(),
        stt,
        context_aggregator.user(),
        llm,
        tts,
        transport.output(),
        context_aggregator.assistant(),
    ]
)

# Create the pipeline task (RTVIProcessor and RTVIObserver are added automatically)
task = PipelineTask(
    pipeline,
    params=PipelineParams(
        enable_metrics=True,
        enable_usage_metrics=True,
    ),
)

# Access the RTVI processor via task.rtvi
@task.rtvi.event_handler("on_client_ready")
async def on_client_ready(rtvi):
    # set_bot_ready() is called automatically, add custom logic here
    await task.queue_frames([LLMRunFrame()])

# Handle participant disconnection
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
    await task.cancel()

# Run the pipeline
runner = PipelineRunner()
await runner.run(task)
```
## Customizing RTVI
You can customize RTVI behavior through `PipelineTask` parameters:
```python theme={null}
from pipecat.processors.frameworks.rtvi import RTVIObserverParams, RTVIProcessor

task = PipelineTask(
    pipeline,
    rtvi_processor=RTVIProcessor(),  # Provide your own processor
    rtvi_observer_params=RTVIObserverParams(...),  # Customize observer
)
```
To disable RTVI entirely:
```python theme={null}
task = PipelineTask(pipeline, enable_rtvi=False)
```
## Protocol Flow
1. Client connects and sends a `client-ready` message
2. Server responds with `bot-ready` and initial configuration
3. Client and server exchange real-time events:
* Speaking state changes (`user/bot-started/stopped-speaking`)
* Transcriptions (`user-transcription/bot-output`)
* LLM processing (`bot-llm-started/stopped`, `bot-llm-text`, `llm-function-call`)
* TTS events (`bot-tts-started/stopped`, `bot-tts-text`, `bot-tts-audio`)
## Key Components
Configure and manage RTVI services, actions, and client communication
Translate internal pipeline events to standardized client messages
## Client Integration
RTVI is implemented in Pipecat client SDKs, providing a high-level API to interact with the protocol. Visit the Pipecat Client SDKs documentation:
Learn how to implement RTVI on the client-side with our JavaScript, React, and
mobile SDKs
# RTVI Observer
Source: https://docs.pipecat.ai/api-reference/server/rtvi/rtvi-observer
Converting pipeline frames to RTVI protocol messages
The `RTVIObserver` translates Pipecat's internal pipeline events into standardized RTVI protocol messages. It monitors frame flow through the pipeline and generates corresponding client messages based on event types.
## Purpose
The `RTVIObserver` primarily serves to convert internal pipeline frames into client-compatible RTVI messages. It is required for any application using RTVI as the client protocol to ensure proper communication of events such as speech start/stop, user transcripts, bot output, metrics, and server messages.
## Automatic Setup
`RTVIObserver` is automatically created and attached when you create a `PipelineTask`. No manual setup is required for standard usage.
To customize the observer's behavior, pass `RTVIObserverParams` to the task:
```python theme={null}
from pipecat.processors.frameworks.rtvi import RTVIObserverParams

task = PipelineTask(
    pipeline,
    rtvi_observer_params=RTVIObserverParams(
        bot_llm_enabled=False,
        metrics_enabled=False,
    ),
)
```
## Configuration
`RTVIObserverParams` accepts the following fields:
Indicates if bot output messages should be sent.
Indicates if the bot's LLM messages should be sent.
Indicates if the bot's TTS messages should be sent.
Indicates if the bot's started/stopped speaking messages should be sent.
Indicates if the bot's audio level messages should be sent.
Indicates if the user's LLM input messages should be sent.
Indicates if the user's started/stopped speaking messages should be sent.
Indicates if user mute started/stopped messages (`user-mute-started`,
`user-mute-stopped`) should be sent.
Indicates if the user's transcription messages should be sent.
Indicates if the user's audio level messages should be sent.
Indicates if metrics messages should be sent.
Indicates if system logs should be sent.
⚠️ **Deprecated**: Indicates if error messages should be sent.
List of aggregation types to skip sending as tts/output messages.
If you use this to avoid sending sensitive information, be sure to also disable
bot\_llm\_enabled so the same content does not leak through LLM messages.
A list of tuples to transform text just before sending it to TTS. Each tuple should be of the form `(aggregation_type, transform_function)`, where `aggregation_type` is a string (or `'*'` for all types), and `transform_function` is a callable that takes `(text, aggregation_type)` and returns the transformed text.
**Example:**
```python theme={null}
import re

from pipecat.processors.frameworks.rtvi import RTVIObserver, RTVIObserverParams


def redact_sensitive(text, agg_type):
    # Example: redact numbers
    return re.sub(r"\d+", "[REDACTED]", text)


bot_output_transforms = [
    ("credit_card", redact_sensitive),  # Only for 'credit_card' type
    ("*", lambda text, agg_type: text.upper()),  # For all types, make uppercase
]

observer = RTVIObserver(
    rtvi,
    params=RTVIObserverParams(bot_output_transforms=bot_output_transforms),
)
```
How often audio levels should be sent if enabled.
Controls what information is exposed in function call lifecycle events
(`llm-function-call-started`, `llm-function-call-in-progress`,
`llm-function-call-stopped`). Maps function names to security levels, where
`"*"` sets the default for unlisted functions.
**Levels:**
* `DISABLED`: No events emitted for this function
* `NONE`: Events with `tool_call_id` only (most secure when events are needed)
* `NAME`: Adds function name to events
* `FULL`: Adds function name, arguments, and results
```python theme={null}
from pipecat.processors.frameworks.rtvi import (
    RTVIFunctionCallReportLevel,
    RTVIObserverParams,
)

task = PipelineTask(
    pipeline,
    rtvi_observer_params=RTVIObserverParams(
        function_call_report_level={
            "*": RTVIFunctionCallReportLevel.NONE,
            "get_weather": RTVIFunctionCallReportLevel.FULL,
        },
    ),
)
```
## Frame Translation
The observer maps Pipecat's internal frames to RTVI protocol messages:
| Pipeline Frame | RTVI Message |
| ----------------------------- | ------------------------------------------- |
| **Speech Events** | |
| `UserStartedSpeakingFrame` | `RTVIUserStartedSpeakingMessage` |
| `UserStoppedSpeakingFrame` | `RTVIUserStoppedSpeakingMessage` |
| `BotStartedSpeakingFrame` | `RTVIBotStartedSpeakingMessage` |
| `BotStoppedSpeakingFrame` | `RTVIBotStoppedSpeakingMessage` |
| **User Mute** | |
| `UserMuteStartedFrame` | `RTVIUserMuteStartedMessage` |
| `UserMuteStoppedFrame` | `RTVIUserMuteStoppedMessage` |
| **Transcription** | |
| `TranscriptionFrame` | `RTVIUserTranscriptionMessage(final=true)` |
| `InterimTranscriptionFrame` | `RTVIUserTranscriptionMessage(final=false)` |
| **Bot Output** | |
| `AggregatedTextFrame` | `RTVIBotOutputMessage` |
| **LLM Processing** | |
| `LLMFullResponseStartFrame` | `RTVIBotLLMStartedMessage` |
| `LLMFullResponseEndFrame` | `RTVIBotLLMStoppedMessage` |
| `LLMTextFrame` | `RTVIBotLLMTextMessage` |
| **TTS Events** | |
| `TTSStartedFrame` | `RTVIBotTTSStartedMessage` |
| `TTSStoppedFrame` | `RTVIBotTTSStoppedMessage` |
| `TTSTextFrame` | `RTVIBotTTSTextMessage` |
| **Function Calls** | |
| `FunctionCallsStartedFrame` | `llm-function-call-started` |
| `FunctionCallInProgressFrame` | `llm-function-call-in-progress` |
| `FunctionCallResultFrame` | `llm-function-call-stopped` |
| **Context/Metrics** | |
| `LLMContextFrame` | `RTVIUserLLMTextMessage` |
| `MetricsFrame` | `RTVIMetricsMessage` |
| `RTVIServerMessageFrame` | `RTVIServerMessage` |
# RTVIProcessor
Source: https://docs.pipecat.ai/api-reference/server/rtvi/rtvi-processor
Core coordinator for RTVI protocol communication
The `RTVIProcessor` manages bidirectional communication between clients and your Pipecat application. It processes client messages, handles service configuration, executes actions, and coordinates function calls.
## Initialization
`RTVIProcessor` is automatically added to your pipeline when you create a `PipelineTask`. Access it via `task.rtvi`:
```python theme={null}
pipeline = Pipeline([
    transport.input(),
    stt,
    # ... other processors ...
    transport.output()
])

task = PipelineTask(pipeline)

# Access the processor
rtvi = task.rtvi
```
To provide a custom processor (e.g., for [Google RTVI](./google-rtvi-observer)):
```python theme={null}
from pipecat.services.google.rtvi import GoogleRTVIProcessor
task = PipelineTask(pipeline, rtvi_processor=GoogleRTVIProcessor())
```
To disable RTVI entirely, set `enable_rtvi=False`.
## Readiness Protocol
### Client Ready State
Clients indicate readiness by sending a `client-ready` message, triggering the `on_client_ready` event in the processor:
```python theme={null}
@rtvi.event_handler("on_client_ready")
async def on_client_ready(rtvi):
    # Handle client ready state
    await rtvi.set_bot_ready()

    # Initialize conversation
    await task.queue_frames([...])
```
### Bot Ready State
The server must mark the bot as ready before it can process client messages:
```python theme={null}
await rtvi.set_bot_ready()
```
When marked ready, the bot sends a response containing:
* RTVI protocol version
* Current service configuration
* Available actions
## Services
Services represent configurable components of your application that clients can interact with.
### Registering Services
```python theme={null}
# 1. Define option handler
async def handle_voice_option(processor, service, option):
    voice_id = option.value
    # Apply configuration change
    logger.info(f"Voice ID updated to: {voice_id}")

# 2. Create RTVIService
voice_service = RTVIService(
    name="voice",
    options=[
        RTVIServiceOption(
            name="voice_id",
            type="string",
            handler=handle_voice_option
        )
    ]
)

# 3. Register with processor
rtvi.register_service(voice_service)
```
### Option Types
Services support multiple data types for configuration:
```python theme={null}
RTVIServiceOption(
    name="temperature",
    type="number",  # number, string, bool, array, object
    handler=handle_temperature
)
```
Option handlers receive:
* The processor instance
* The service name
* The option configuration with new value
## Actions
Actions are server-side functions that clients can trigger with arguments.
### Registering Actions
```python theme={null}
# 1. Define handler function
async def handle_print_message(processor, service, arguments):
    message = arguments.get("message", "Default message")
    logger.info(f"Print action triggered with message: {message}")
    return True

# 2. Create and register RTVIAction
print_action = RTVIAction(
    service="conversation",
    action="print_message",
    arguments=[
        RTVIActionArgument(name="message", type="string")
    ],
    result="bool",
    handler=handle_print_message
)

rtvi.register_action(print_action)
```
### Action Arguments
Actions can accept typed arguments from clients:
```python theme={null}
search_action = RTVIAction(
    service="knowledge",
    action="search",
    arguments=[
        RTVIActionArgument(name="query", type="string"),
        RTVIActionArgument(name="limit", type="number")
    ],
    result="array",
    handler=handle_search
)
```
## Function Calls
Handle LLM function calls with client interaction:
```python theme={null}
await processor.handle_function_call(
    function_name=function_name,
    tool_call_id=tool_call_id,
    arguments=arguments,
)

await processor.handle_function_call(params)
```
The function call process:
1. LLM requests a function call
2. Processor notifies client with `llm-function-call` message
3. Client executes function and returns result
4. Result is passed back to LLM via `FunctionCallResultFrame`
5. Conversation continues
## Error Handling
Send error messages to clients:
```python theme={null}
# General error
await processor.send_error("Invalid configuration")
# Request-specific error
await processor._send_error_response(request_id, "Invalid action arguments")
```
Error categories:
* Configuration errors
* Action execution errors
* Function call errors
* Protocol errors
* Fatal and non-fatal errors
## Bot Control
Manage bot state and handle interruptions:
```python theme={null}
# Set bot as ready
await processor.set_bot_ready()
# Handle interruptions
await processor.interrupt_bot()
```
## Custom Messaging
Server messages let you push unsolicited data from the server to the client at any time — notifications, status updates, real-time results, etc. They are distinct from **server responses**, which reply to a specific client request (see [Requesting Information from the Server](/client/js/api-reference/messages#requesting-information-from-the-server)).
### Sending server messages
Any `FrameProcessor` in the pipeline can push an `RTVIServerMessageFrame`. The `RTVIObserver` picks it up and delivers it to the client:
```python theme={null}
from pipecat.processors.frame_processor import FrameProcessor
from pipecat.processors.frameworks.rtvi import RTVIServerMessageFrame


class MyProcessor(FrameProcessor):
    async def process_frame(self, frame, direction):
        await super().process_frame(frame, direction)

        if isinstance(frame, SomeEventFrame):
            await self.push_frame(
                RTVIServerMessageFrame(
                    data={"type": "event", "value": frame.value}
                )
            )

        await self.push_frame(frame, direction)
```
`RTVIServerMessageFrame` is a `SystemFrame`, so it propagates immediately
through the pipeline and is not affected by interruptions or queuing.
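As a hedged sketch, the same frame can also be queued from outside a processor, for example from application code that holds the task; this assumes frames queued on the task travel through the pipeline and are seen by the `RTVIObserver` just like frames pushed by a processor.
```python theme={null}
from pipecat.processors.frameworks.rtvi import RTVIServerMessageFrame

# Assumption: a frame queued on the task flows through the pipeline and is
# picked up by the RTVIObserver like any processor-pushed frame.
await task.queue_frame(
    RTVIServerMessageFrame(data={"type": "status", "value": "processing-started"})
)
```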
### Client-side handling
The message arrives at the client with the wire format `{ label: "rtvi-ai", type: "server-message", data: ... }`. Handle it with the `onServerMessage` callback:
```javascript theme={null}
pcClient.onServerMessage((message) => {
  console.log("Server message:", message);
  // message.data contains whatever you passed on the server
});
```
See [Handling Custom Messages from the Server](/client/js/api-reference/messages#handling-custom-messages-from-the-server) for more details and examples.
# Sentry Metrics
Source: https://docs.pipecat.ai/api-reference/server/services/analytics/sentry
Performance monitoring integration with Sentry for Pipecat frame processors
## Overview
`SentryMetrics` extends `FrameProcessorMetrics` to provide performance monitoring integration with Sentry. It tracks Time to First Byte (TTFB) and processing duration metrics for frame processors, enabling real-time performance monitoring and error tracking for your Pipecat applications.
Pipecat's API methods for Sentry metrics integration
Browse examples using Sentry metrics
Official Sentry Python SDK documentation
Access performance monitoring and error tracking
## Installation
To use Sentry analytics services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[sentry]"
```
## Prerequisites
### Sentry Account Setup
Before using Sentry metrics services, you need:
1. **Sentry Account**: Sign up at [Sentry Platform](https://sentry.io/)
2. **Project Setup**: Create a project and obtain your DSN
3. **SDK Initialization**: Configure Sentry SDK in your application
4. **Metrics Configuration**: Set up performance monitoring and error tracking
### Required Configuration
* **Sentry DSN**: Your project's Data Source Name for authentication
* **Traces Sample Rate**: Configure performance monitoring sampling
* **SDK Initialization**: Initialize Sentry before using metrics
### Key Features
* **Performance Monitoring**: Track TTFB and processing duration metrics
* **Error Tracking**: Automatic error capture and reporting
* **Frame Processor Metrics**: Monitor individual processor performance
* **Real-time Analytics**: Live performance data and alerting
## Configuration
`SentryMetrics` takes no constructor parameters. It automatically detects whether the Sentry SDK has been initialized and logs a warning if it has not.
You must initialize the Sentry SDK in your application before creating
`SentryMetrics`. The metrics collector checks `sentry_sdk.is_initialized()` at
construction time.
## Usage
### Basic Setup
```python theme={null}
import os

import sentry_sdk

from pipecat.processors.metrics.sentry import SentryMetrics

# Initialize Sentry SDK first
sentry_sdk.init(
    dsn=os.getenv("SENTRY_DSN"),
    traces_sample_rate=1.0,
)

# Create metrics and assign to a service
sentry = SentryMetrics()

tts = SomeTTSService(
    metrics=sentry,
)
```
### With Multiple Services
```python theme={null}
sentry_tts = SentryMetrics()
sentry_llm = SentryMetrics()
tts = SomeTTSService(metrics=sentry_tts)
llm = SomeLLMService(metrics=sentry_llm)
```
## Notes
* **SDK initialization required**: Sentry metrics are silently disabled if `sentry_sdk.init()` has not been called. A warning is logged in this case.
* **Transaction types**: The service creates two types of Sentry transactions: `ttfb` for time-to-first-byte tracking and `processing` for frame processing duration.
* **Background processing**: Transactions are completed in a background task to avoid blocking the pipeline.
* **Graceful shutdown**: On cleanup, the service flushes all pending transactions to Sentry with a 5-second timeout.
# Community Integrations
Source: https://docs.pipecat.ai/api-reference/server/services/community-integrations
Community-maintained service integrations for Pipecat
Community Integrations are service integrations built and maintained by developers in the Pipecat community. These are not officially supported by the Pipecat team but are listed here to help you discover what's available.
Want to add your integration? See our [Community Integrations Guide](https://github.com/pipecat-ai/pipecat/blob/main/COMMUNITY_INTEGRATIONS.md).
***
## Knowledge Retrieval
Semantic retrieval services enable context-aware search and retrieval of relevant information.
| Service | Repository | Maintainer(s) |
| -------------------------------- | ---------------------------------------------------------------------------------- | ---------------------------------- |
| [Moss](https://www.usemoss.dev/) | [https://github.com/usemoss/pipecat-moss](https://github.com/usemoss/pipecat-moss) | [Moss](https://github.com/usemoss) |
## Large Language Models
LLMs receive text or audio based input and output a streaming text response.
| Service | Repository | Maintainer(s) |
| -------------------------------- | ---------------------------------------------------------------------------------------------------------------------- | --------------------------------------- |
| [Anannas AI](https://anannas.ai) | [https://github.com/Anannas-AI/anannas-pipecat-integration](https://github.com/Anannas-AI/anannas-pipecat-integration) | [Haleshot](https://github.com/Haleshot) |
## Observability
Observability services enable telemetry and metrics data to be passed to an OpenTelemetry backend or another service.
| Service | Repository | Maintainer(s) |
| ---------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------- |
| [OpenInference](https://arize-ai.github.io/openinference/) | [openinference-instrumentation-pipecat](https://github.com/Arize-ai/openinference/tree/main/python/instrumentation/openinference-instrumentation-pipecat) | [Arize-ai](https://github.com/Arize-ai/openinference/graphs/contributors) |
| [Finchvox](https://finchvox.dev/) | [https://github.com/finchvox/finchvox](https://github.com/finchvox/finchvox) | [Finchvox](https://github.com/finchvox) |
## Speech-to-Text
Speech-to-Text services receive an audio input and output transcriptions.
| Service | Repository | Maintainer(s) |
| --------------------------------- | ------------------------------------------------------------------------------------------------------------ | ----------------------------------------------- |
| [Uplift AI](https://upliftai.org) | [https://github.com/havkerboi123/pipecat-upliftai-stt](https://github.com/havkerboi123/pipecat-upliftai-stt) | [havkerboi123](https://github.com/havkerboi123) |
## Translation
Translation services enable real-time speech-to-speech and speech-to-text translation.
| Service | Repository | Maintainer(s) |
| ------------------------------------ | -------------------------------------------------------------------------------------------------------- | ----------------------------------------- |
| [Pinch](https://www.startpinch.com/) | [https://github.com/pinch-eng/pipecat-plugins-pinch](https://github.com/pinch-eng/pipecat-plugins-pinch) | [pinch-eng](https://github.com/pinch-eng) |
## Telephony Serializers
Serializers convert between frames and media streams, enabling real-time communication over a websocket.
| Service | Repository | Maintainer(s) |
| -------------------------------- | ---------------------------------------------------------------------------------------- | ------------------------------------- |
| [AwaazAI](https://www.awaaz.ai/) | [https://github.com/awaazde/pipecat-awaazai](https://github.com/awaazde/pipecat-awaazai) | [AwaazAI](https://github.com/awaazde) |
| [Wavix](https://wavix.com/) | [https://github.com/wavix/pipecat-wavix](https://github.com/wavix/pipecat-wavix) | [Wavix](https://github.com/wavix) |
## Text-to-Speech
Text-to-Speech services receive text input and output audio streams or chunks.
| Service | Repository | Maintainer(s) |
| ---------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------ | ----------------------------------------------- |
| [Deepdub](https://deepdub.ai/) | [https://github.com/deepdub-ai/pipecat-deepdub-tts](https://github.com/deepdub-ai/pipecat-deepdub-tts) | [deepdub-ai](https://github.com/deepdub-ai) |
| [Murf AI](https://murf.ai/api) | [https://github.com/murf-ai/pipecat-murf-tts](https://github.com/murf-ai/pipecat-murf-tts) | [murf-ai](https://github.com/murf-ai) |
| [Pipecat TTS Cache](https://pypi.org/project/pipecat-tts-cache/) | [https://github.com/omChauhanDev/pipecat-tts-cache](https://github.com/omChauhanDev/pipecat-tts-cache) | [omChauhanDev](https://github.com/omChauhanDev) |
| [Respeecher](https://www.respeecher.com/real-time-tts-api) | [https://github.com/respeecher/pipecat-respeecher](https://github.com/respeecher/pipecat-respeecher) | [respeecher](https://github.com/respeecher) |
| [Typecast](https://typecast.ai/) | [https://github.com/neosapience/pipecat-typecast](https://github.com/neosapience/pipecat-typecast) | [neosapience](https://github.com/neosapience) |
| [Uplift AI](https://upliftai.org) | [https://github.com/havkerboi123/pipecat-upliftai-tts](https://github.com/havkerboi123/pipecat-upliftai-tts) | [havkerboi123](https://github.com/havkerboi123) |
| [Voice.ai](https://voice.ai/) | [https://github.com/voice-ai/voice-ai-pipecat-tts](https://github.com/voice-ai/voice-ai-pipecat-tts) | [voice-ai](https://github.com/voice-ai) |
## Video
Video services enable you to build an avatar where audio and video are synchronized.
| Service | Repository | Maintainer(s) |
| ------------------------------------------------- | ------------------------------------------------------------------------------------ | --------------------------------------- |
| [Anam](https://anam.ai) | [https://github.com/anam-org/pipecat-anam](https://github.com/anam-org/pipecat-anam) | [anam-org](https://github.com/anam-org) |
| [Beyond Presence](https://www.beyondpresence.ai/) | [https://github.com/bey-dev/pipecat-bey](https://github.com/bey-dev/pipecat-bey) | [bey-dev](https://github.com/bey-dev) |
## VAD
VAD services analyze audio input to detect when a user starts and stops speaking.
| Service | Repository | Maintainer(s) |
| ------- | -------------------------------------------------------------------------------------------------------- | --------------------------------------------------- |
| TEN VAD | [https://github.com/rahulsolanki001/pipecat-ten-vad](https://github.com/rahulsolanki001/pipecat-ten-vad) | [rahul solanki](https://github.com/rahulsolanki001) |
## Image Generation
Image generation services receive text inputs and output images.
| Service | Repository | Maintainer(s) |
| ------------------------------- | ---------- | ------------- |
| *No community integrations yet* | | |
## Vision
Vision services receive a streaming video input and output text describing the video input.
| Service | Repository | Maintainer(s) |
| ------------------------------- | ---------- | ------------- |
| *No community integrations yet* | | |
# Azure OpenAI Image Generation
Source: https://docs.pipecat.ai/api-reference/server/services/image-generation/azure
Image generation service implementation using Azure OpenAI's REST API
## Overview
`AzureImageGenServiceREST` provides image generation capabilities using Azure's OpenAI service via REST API. It supports asynchronous image generation with automatic polling for completion and image downloading.
Pipecat's API methods for Azure OpenAI image generation integration
Official Azure OpenAI DALL-E documentation and guides
## Installation
To use Azure OpenAI image generation services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[azure]"
```
## Prerequisites
### Azure Account Setup
Before using Azure OpenAI image generation services, you need:
1. **Azure Account**: Sign up at [Azure Portal](https://portal.azure.com/)
2. **Azure OpenAI Resource**: Create an Azure OpenAI resource
3. **API Key**: Get your API key from the Azure OpenAI resource
4. **Endpoint**: Note your resource endpoint URL
5. **HTTP Session**: Configure aiohttp session for image downloading
### Required Environment Variables
* `AZURE_API_KEY`: Your Azure OpenAI API key for authentication
* `AZURE_ENDPOINT`: Your Azure OpenAI endpoint URL
## Configuration
Azure OpenAI API key for authentication.
Azure OpenAI endpoint URL.
HTTP session for API requests and downloading generated images. You must
create and manage this yourself.
Target size for generated images (e.g., `"1024x1024"`). *Deprecated in
v0.0.105. Use `settings=AzureImageGenServiceREST.Settings(image_size=...)`
instead.*
Image generation model to use. *Deprecated in v0.0.105. Use
`settings=AzureImageGenServiceREST.Settings(model=...)` instead.*
Azure API version string.
Runtime-configurable generation settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `AzureImageGenServiceREST.Settings(...)`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ------------ | ------------- | ----------- | -------------------------------------------------------------------- |
| `model` | `str` | `NOT_GIVEN` | Image generation model identifier. *(Inherited from base settings.)* |
| `image_size` | `str \| None` | `NOT_GIVEN` | Target size for generated images (e.g., `"1024x1024"`). |
`NOT_GIVEN` values are omitted from the request, letting the service use its
own defaults. Only parameters that are explicitly set are included.
## Usage
### Basic Setup
```python theme={null}
import os

import aiohttp

from pipecat.services.azure.image import AzureImageGenServiceREST

async with aiohttp.ClientSession() as session:
    image_gen = AzureImageGenServiceREST(
        api_key=os.getenv("AZURE_API_KEY"),
        endpoint=os.getenv("AZURE_ENDPOINT"),
        aiohttp_session=session,
        settings=AzureImageGenServiceREST.Settings(
            image_size="1024x1024",
        ),
    )
```
The deprecated `model` and `image_size` constructor parameters are replaced by
`Settings` as of v0.0.105. Use `Settings` / `settings=` instead. See the
[Service Settings guide](/pipecat/fundamentals/service-settings) for migration
details.
## Notes
* **HTTP session required**: You must provide an `aiohttp.ClientSession` for both API requests and downloading the generated images.
* **Asynchronous generation**: Azure uses an asynchronous pattern where image generation is submitted and then polled for completion, with a timeout of 120 seconds.
* **REST API**: This service uses Azure's REST API directly (not the OpenAI SDK), requiring an explicit endpoint URL.
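As a rough, hedged sketch of where the service might sit, image generation is typically placed after a processor that produces the text prompt and before the output transport; the surrounding processors below are placeholders, not a prescribed layout.
```python theme={null}
# Placeholder layout: the exact position of image_gen depends on your app.
pipeline = Pipeline([
    transport.input(),
    llm,         # produces the text prompt for image generation
    image_gen,   # AzureImageGenServiceREST configured above
    transport.output(),
])
```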
# fal
Source: https://docs.pipecat.ai/api-reference/server/services/image-generation/fal
Image generation service implementation using fal's fast SDXL models
## Overview
`FalImageGenService` provides high-speed image generation capabilities using fal's optimized Stable Diffusion XL models. It supports various image sizes, formats, and generation parameters with a focus on fast inference and low-latency image creation.
Pipecat's API methods for fal image generation integration
Browse examples using fal image generation
Official fal API documentation and model guides
Access fast SDXL models and manage API keys
## Installation
To use fal image generation services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[fal]"
```
## Prerequisites
### fal Account Setup
Before using fal image generation services, you need:
1. **fal Account**: Sign up at [fal Platform](https://fal.ai/)
2. **API Key**: Generate an API key from your account dashboard
3. **Model Selection**: Choose from available fast SDXL models
4. **HTTP Session**: Configure aiohttp session for image downloading
### Required Environment Variables
* `FAL_KEY`: Your fal API key for authentication
## Configuration
Input parameters for image generation configuration. *Deprecated in v0.0.105.
Use `settings=FalImageGenService.Settings(...)` instead.*
HTTP client session for downloading generated images.
The fal model to use for image generation. *Deprecated in v0.0.105. Use
`settings=FalImageGenService.Settings(model=...)` instead.*
Optional API key for fal. If provided, sets the `FAL_KEY` environment
variable.
Runtime-configurable generation settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `FalImageGenService.Settings(...)`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ----------------------- | ------------- | ----------- | ----------------------------------------------------------------------- |
| `model` | `str` | `NOT_GIVEN` | Fal model identifier. *(Inherited from base settings.)* |
| `seed` | `int \| None` | `NOT_GIVEN` | Random seed for reproducible generation. If `None`, uses a random seed. |
| `num_inference_steps` | `int` | `NOT_GIVEN` | Number of inference steps for generation. |
| `num_images` | `int` | `NOT_GIVEN` | Number of images to generate. |
| `image_size` | `str \| dict` | `NOT_GIVEN` | Image dimensions as a string preset or dict with `width`/`height` keys. |
| `expand_prompt` | `bool` | `NOT_GIVEN` | Whether to automatically expand/enhance the prompt. |
| `enable_safety_checker` | `bool` | `NOT_GIVEN` | Whether to enable content safety filtering. |
| `format` | `str` | `NOT_GIVEN` | Output image format. |
`NOT_GIVEN` values are omitted from the request, letting the service use its
own defaults (`"fal-ai/fast-sdxl"` for model, `8` for num\_inference\_steps,
`"square_hd"` for image\_size, etc.). Only parameters that are explicitly set
are included.
## Usage
### Basic Setup
```python theme={null}
import os

import aiohttp

from pipecat.services.fal import FalImageGenService

async with aiohttp.ClientSession() as session:
    image_gen = FalImageGenService(
        aiohttp_session=session,
        key=os.getenv("FAL_KEY"),
        settings=FalImageGenService.Settings(
            image_size="landscape_16_9",
        ),
    )
```
### With Custom Settings
```python theme={null}
image_gen = FalImageGenService(
    aiohttp_session=session,
    key=os.getenv("FAL_KEY"),
    settings=FalImageGenService.Settings(
        model="fal-ai/fast-sdxl",
        image_size={"width": 1024, "height": 768},
        num_inference_steps=12,
        seed=42,
        enable_safety_checker=True,
    ),
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **Environment variable**: If the `key` constructor parameter is provided, it sets the `FAL_KEY` environment variable automatically.
* **HTTP session required**: You must provide an `aiohttp.ClientSession` for downloading the generated images from fal's URLs.
* **Image size presets**: The `image_size` parameter accepts string presets (e.g., `"square_hd"`, `"landscape_16_9"`) or a dictionary with explicit `width` and `height` values.
# Google Imagen
Source: https://docs.pipecat.ai/api-reference/server/services/image-generation/google
Image generation service implementation using Google's Imagen models
## Overview
`GoogleImageGenService` provides high-quality image generation capabilities using Google's Imagen models. It supports generating multiple images from text prompts with various customization options and advanced prompt understanding for photorealistic and artistic image creation.
Pipecat's API methods for Google Imagen integration
Browse examples using Google Imagen
Official Google Imagen API documentation and guides
Access Imagen models and manage API keys
## Installation
To use Google Imagen services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[google]"
```
## Prerequisites
### Google Cloud Setup
Before using Google Imagen services, you need:
1. **Google Cloud Account**: Set up at [Google Cloud Console](https://console.cloud.google.com/)
2. **API Key**: Generate a Google API key with Vertex AI access
3. **Project Configuration**: Enable Vertex AI API for your project
4. **Model Access**: Ensure access to Imagen generation models
### Required Environment Variables
* `GOOGLE_API_KEY`: Your Google API key for authentication
## Configuration
Google AI API key for authentication.
Configuration parameters for image generation. *Deprecated in v0.0.105. Use
`settings=GoogleImageGenService.Settings(...)` instead.*
HTTP options for the Google AI client.
Runtime-configurable generation settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `GoogleImageGenService.Settings(...)`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ------------------ | ------------- | ----------- | ----------------------------------------------------------------------------- |
| `model` | `str` | `NOT_GIVEN` | Google Imagen model identifier. *(Inherited from base settings.)* |
| `number_of_images` | `int` | `NOT_GIVEN` | Number of images to generate (1-8). |
| `negative_prompt` | `str \| None` | `NOT_GIVEN` | Optional negative prompt to guide what not to include in the generated image. |
`NOT_GIVEN` values are omitted from the request, letting the service use its
own defaults (`"imagen-3.0-generate-002"` for model, `1` for
number\_of\_images). Only parameters that are explicitly set are included.
## Usage
### Basic Setup
```python theme={null}
import os

from pipecat.services.google import GoogleImageGenService

image_gen = GoogleImageGenService(
    api_key=os.getenv("GOOGLE_API_KEY"),
)
```
### With Settings
```python theme={null}
image_gen = GoogleImageGenService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    settings=GoogleImageGenService.Settings(
        model="imagen-3.0-generate-002",
        number_of_images=2,
        negative_prompt="blurry, low quality",
    ),
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **No HTTP session needed**: Unlike OpenAI and fal, Google returns image data directly in the API response, so no separate HTTP session is required for downloading.
* **Negative prompts**: Use the `negative_prompt` parameter to specify what should not appear in the generated image, giving you more control over the output.
* **Metrics support**: Google Imagen supports TTFB metrics tracking.
# OpenAI Image Generation
Source: https://docs.pipecat.ai/api-reference/server/services/image-generation/openai
Image generation service implementation using OpenAI's DALL-E models
## Overview
`OpenAIImageGenService` provides high-quality image generation capabilities using OpenAI's DALL-E models. It transforms text prompts into images with various size options and model configurations, offering both artistic and photorealistic image creation capabilities.
Pipecat's API methods for OpenAI image generation integration
Official OpenAI DALL-E API documentation and guides
Access DALL-E models and manage API keys
## Installation
To use OpenAI image generation services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[openai]"
```
## Prerequisites
### OpenAI Account Setup
Before using OpenAI image generation services, you need:
1. **OpenAI Account**: Sign up at [OpenAI Platform](https://platform.openai.com/)
2. **API Key**: Generate an OpenAI API key from your account dashboard
3. **Model Access**: Ensure access to DALL-E models
4. **HTTP Session**: Configure aiohttp session for image downloading
### Required Environment Variables
* `OPENAI_API_KEY`: Your OpenAI API key for authentication
## Configuration
OpenAI API key for authentication.
HTTP session for downloading generated images. You must create and manage this
yourself.
Target size for generated images. *Deprecated in v0.0.105. Use
`settings=OpenAIImageGenService.Settings(image_size=...)` instead.*
Custom base URL for OpenAI API. If `None`, uses the default OpenAI endpoint.
DALL-E model to use for image generation. *Deprecated in v0.0.105. Use
`settings=OpenAIImageGenService.Settings(model=...)` instead.*
Runtime-configurable generation settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `OpenAIImageGenService.Settings(...)`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ------------ | ------------- | ----------- | ---------------------------------------------------------- |
| `model` | `str` | `NOT_GIVEN` | DALL-E model identifier. *(Inherited from base settings.)* |
| `image_size` | `str \| None` | `NOT_GIVEN` | Target size for generated images. |
`NOT_GIVEN` values are omitted from the request, letting the service use its
own defaults (`"dall-e-3"` for model). Only parameters that are explicitly set
are included.
## Usage
### Basic Setup
```python theme={null}
import os

import aiohttp

from pipecat.services.openai import OpenAIImageGenService

async with aiohttp.ClientSession() as session:
    image_gen = OpenAIImageGenService(
        api_key=os.getenv("OPENAI_API_KEY"),
        aiohttp_session=session,
        image_size="1024x1024",
    )
```
### With Settings
```python theme={null}
image_gen = OpenAIImageGenService(
    api_key=os.getenv("OPENAI_API_KEY"),
    aiohttp_session=session,
    settings=OpenAIImageGenService.Settings(
        model="dall-e-3",
        image_size="1792x1024",
    ),
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **HTTP session required**: You must provide an `aiohttp.ClientSession` for downloading the generated images from OpenAI's URLs.
* **Image sizes vary by model**: DALL-E 3 supports `1024x1024`, `1792x1024`, and `1024x1792`. DALL-E 2 supports `256x256`, `512x512`, and `1024x1024`.
# Anthropic
Source: https://docs.pipecat.ai/api-reference/server/services/llm/anthropic
Large Language Model service implementation using Anthropic's Claude API
## Overview
`AnthropicLLMService` provides integration with Anthropic's Claude models, supporting streaming responses, function calling, and prompt caching with specialized context handling for Anthropic's message format and advanced reasoning capabilities.
Pipecat's API methods for Anthropic Claude integration
Complete example with function calling
Official Anthropic API documentation and features
Access Claude models and API keys
## Installation
To use Anthropic services, install the required dependency:
```bash theme={null}
pip install "pipecat-ai[anthropic]"
```
## Prerequisites
### Anthropic Account Setup
Before using Anthropic LLM services, you need:
1. **Anthropic Account**: Sign up at [Anthropic Console](https://console.anthropic.com/)
2. **API Key**: Generate an API key from your console dashboard
3. **Model Selection**: Choose from available Claude models (Claude Sonnet 4.5, Claude Opus 4.6, etc.)
### Required Environment Variables
* `ANTHROPIC_API_KEY`: Your Anthropic API key for authentication
## Configuration
Anthropic API key for authentication.
Claude model name to use (e.g., `"claude-sonnet-4-5-20250929"`,
`"claude-opus-4-6-20250929"`). *Deprecated in v0.0.105. Use
`settings=AnthropicLLMService.Settings(...)` instead.*
Runtime-configurable model settings. See [Settings](#settings) below.
Runtime-configurable model settings. See [Settings](#settings) below.
*Deprecated in v0.0.105. Use `settings=AnthropicLLMService.Settings(...)`
instead.*
Optional custom Anthropic client instance. Useful for custom clients like
`AsyncAnthropicBedrock` or `AsyncAnthropicVertex`.
Request timeout in seconds. Used when `retry_on_timeout` is enabled to
determine when to retry.
Whether to retry the request once if it times out. The retry attempt has no
timeout limit.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `AnthropicLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ----------------------- | ------------------------- | ----------- | ----------------------------------------------------------------------------------------------- |
| `model` | `str` | `None` | Anthropic model identifier. *(Inherited from base settings.)* |
| `system_instruction` | `str` | `None` | System instruction/prompt for the model. *(Inherited from base settings.)* |
| `max_tokens` | `int` | `NOT_GIVEN` | Maximum tokens to generate. |
| `temperature` | `float` | `NOT_GIVEN` | Sampling temperature (0.0 to 1.0). Lower values are more focused, higher values more creative. |
| `top_k` | `int` | `NOT_GIVEN` | Top-k sampling parameter. Limits tokens to the top k most likely. |
| `top_p` | `float` | `NOT_GIVEN` | Top-p (nucleus) sampling (0.0 to 1.0). Controls diversity of output. |
| `enable_prompt_caching` | `bool` | `NOT_GIVEN` | Whether to enable Anthropic's prompt caching feature. Reduces costs for repeated context. |
| `thinking` | `AnthropicThinkingConfig` | `NOT_GIVEN` | Extended thinking configuration. See [AnthropicThinkingConfig](#anthropicthinkingconfig) below. |
`NOT_GIVEN` values are omitted from the API request entirely, letting the
Anthropic API use its own defaults.
### AnthropicThinkingConfig
Configuration for Anthropic's extended thinking feature, which causes the model to spend more time reasoning before responding.
| Parameter | Type | Default | Description |
| --------------- | --------------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
| `type` | `"enabled"` or `"disabled"` | | Whether extended thinking is enabled. |
| `budget_tokens` | `int` (optional) | `None` | Maximum number of tokens for thinking. Currently required when type is "enabled", minimum 1024 with today's models. Not allowed when "disabled". |
When extended thinking is enabled, the service emits `LLMThoughtStartFrame`, `LLMThoughtTextFrame`, and `LLMThoughtEndFrame` during response generation.
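As a rough illustration of consuming these frames downstream, the sketch below logs thought activity with a custom frame processor. The frame classes are the ones named above; their import path and payload attributes are assumptions to check against your Pipecat version.
```python theme={null}
from pipecat.frames.frames import (  # import path assumed
    Frame,
    LLMThoughtEndFrame,
    LLMThoughtStartFrame,
    LLMThoughtTextFrame,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

class ThoughtLogger(FrameProcessor):
    """Illustrative processor that logs extended-thinking frames."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, LLMThoughtStartFrame):
            print("Thinking started")
        elif isinstance(frame, LLMThoughtTextFrame):
            print("Thought frame:", frame)  # inspect the frame for the thought text
        elif isinstance(frame, LLMThoughtEndFrame):
            print("Thinking finished")
        await self.push_frame(frame, direction)
```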
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.anthropic import AnthropicLLMService
llm = AnthropicLLMService(
api_key=os.getenv("ANTHROPIC_API_KEY"),
model="claude-sonnet-4-5-20250929",
)
```
### With Custom Settings
```python theme={null}
from pipecat.services.anthropic import AnthropicLLMService
llm = AnthropicLLMService(
api_key=os.getenv("ANTHROPIC_API_KEY"),
settings=AnthropicLLMService.Settings(
model="claude-sonnet-4-5-20250929",
enable_prompt_caching=True,
max_tokens=2048,
temperature=0.7,
),
)
```
### With Extended Thinking
```python theme={null}
llm = AnthropicLLMService(
api_key=os.getenv("ANTHROPIC_API_KEY"),
settings=AnthropicLLMService.Settings(
model="claude-sonnet-4-5-20250929",
max_tokens=16384,
thinking=AnthropicLLMService.AnthropicThinkingConfig(
type="enabled",
budget_tokens=10000,
),
),
)
```
### Updating Settings at Runtime
Model settings can be changed mid-conversation using `LLMUpdateSettingsFrame`:
```python theme={null}
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.anthropic.llm import AnthropicLLMSettings
await task.queue_frame(
LLMUpdateSettingsFrame(
delta=AnthropicLLMSettings(
temperature=0.3,
max_tokens=1024,
)
)
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **Prompt caching**: When `enable_prompt_caching` is enabled, Anthropic caches repeated context to reduce costs. Cache control markers are automatically added to the most recent user messages. This is most effective for conversations with large system prompts or long conversation histories.
* **Extended thinking**: Enabling thinking increases response quality for complex tasks but adds latency. When `type="enabled"`, you must provide a `budget_tokens` value (minimum 1024 with current models). Extended thinking is disabled by default.
* **Custom clients**: You can pass custom Anthropic client instances (e.g., `AsyncAnthropicBedrock` or `AsyncAnthropicVertex`) via the `client` parameter to use Anthropic models through other cloud providers.
* **Retry behavior**: When `retry_on_timeout=True`, the first attempt uses the `retry_timeout_secs` timeout. If it times out, a second attempt is made with no timeout limit (see the sketch after this list).
* **System instruction precedence**: If both `system_instruction` (from the constructor) and a system message in the context are set, the constructor's `system_instruction` takes precedence and a warning is logged.
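A minimal sketch of the timeout and retry options described above, assuming the constructor keyword names `retry_timeout_secs` and `retry_on_timeout` referenced on this page:
```python theme={null}
llm = AnthropicLLMService(
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    settings=AnthropicLLMService.Settings(model="claude-sonnet-4-5-20250929"),
    retry_timeout_secs=30,  # first attempt times out after 30 seconds
    retry_on_timeout=True,  # one retry is made with no timeout limit
)
```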
## Event Handlers
`AnthropicLLMService` supports the following event handlers, inherited from [LLMService](/api-reference/server/events/service-events):
| Event | Description |
| --------------------------- | ----------------------------------------------------------------------- |
| `on_completion_timeout` | Called when an LLM completion request times out |
| `on_function_calls_started` | Called when function calls are received and execution is about to start |
```python theme={null}
@llm.event_handler("on_completion_timeout")
async def on_completion_timeout(service):
print("LLM completion timed out")
```
# AWS Bedrock
Source: https://docs.pipecat.ai/api-reference/server/services/llm/aws
Large Language Model service implementation using Amazon Bedrock API
## Overview
`AWSBedrockLLMService` provides access to Amazon's foundation models including Anthropic Claude and Amazon Nova, with streaming responses, function calling, and multimodal capabilities through Amazon's managed AI service for enterprise-grade LLM deployment.
Pipecat's API methods for AWS Bedrock integration
Complete example with function calling
Official AWS Bedrock documentation and features
Access foundation models and manage IAM
## Installation
To use AWS Bedrock services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[aws]"
```
## Prerequisites
### AWS Account Setup
Before using AWS Bedrock LLM services, you need:
1. **AWS Account**: Sign up at [AWS Console](https://console.aws.amazon.com/)
2. **IAM User**: Create an IAM user with Amazon Bedrock permissions
3. **Model Access**: Request access to foundation models in your AWS region
4. **Credentials**: Set up AWS access keys and region configuration
### Required Environment Variables
* `AWS_ACCESS_KEY_ID`: Your AWS access key ID
* `AWS_SECRET_ACCESS_KEY`: Your AWS secret access key
* `AWS_SESSION_TOKEN`: Session token (if using temporary credentials)
* `AWS_REGION`: AWS region (defaults to "us-east-1")
## Configuration
AWS Bedrock model identifier (e.g.,
`"us.anthropic.claude-sonnet-4-5-20250929-v1:0"`,
`"us.amazon.nova-pro-v1:0"`). *Deprecated in v0.0.105. Use
`settings=AWSBedrockLLMService.Settings(...)` instead.*
AWS access key ID. If `None`, uses the `AWS_ACCESS_KEY_ID` environment
variable or default credential chain.
AWS secret access key. If `None`, uses the `AWS_SECRET_ACCESS_KEY` environment
variable or default credential chain.
AWS session token for temporary credentials. If `None`, uses the
`AWS_SESSION_TOKEN` environment variable.
AWS region for the Bedrock service. If `None`, uses the `AWS_REGION`
environment variable, defaulting to `"us-east-1"`.
Runtime-configurable model settings. See [Settings](#settings) below.
Runtime-configurable model settings. See [Settings](#settings) below.
*Deprecated in v0.0.105. Use `settings=AWSBedrockLLMService.Settings(...)`
instead.*
List of strings that stop generation when encountered. *Deprecated in
v0.0.105. Use `settings=AWSBedrockLLMService.Settings(stop_sequences=...)`
instead.*
Custom boto3 client configuration. If `None`, uses defaults with 5-minute
connect/read timeouts and 3 retry attempts.
Request timeout in seconds. Used when `retry_on_timeout` is enabled to
determine when to retry.
Whether to retry the request once if it times out. The retry attempt has no
timeout limit.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `AWSBedrockLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| --------------------------------- | ----------- | ----------- | -------------------------------------------------------------------------------------------------- |
| `model` | `str` | `None` | AWS Bedrock model identifier. *(Inherited from base settings.)* |
| `system_instruction` | `str` | `None` | System instruction/prompt for the model. *(Inherited from base settings.)* |
| `max_tokens` | `int` | `NOT_GIVEN` | Maximum number of tokens to generate. |
| `temperature` | `float` | `NOT_GIVEN` | Sampling temperature (0.0 to 1.0). Lower values are more focused, higher values are more creative. |
| `top_p` | `float` | `NOT_GIVEN` | Top-p (nucleus) sampling (0.0 to 1.0). Controls diversity of output. |
| `top_k` | `int` | `NOT_GIVEN` | Top-k sampling parameter. |
| `seed` | `int` | `NOT_GIVEN` | Random seed for deterministic outputs. |
| `stop_sequences` | `List[str]` | `NOT_GIVEN` | List of strings that stop generation when encountered. |
| `latency` | `str` | `NOT_GIVEN` | Performance mode: `"standard"` or `"optimized"`. |
| `additional_model_request_fields` | `dict` | `NOT_GIVEN` | Additional model-specific parameters passed directly to the API. |
`NOT_GIVEN` values are omitted from the inference config, letting the Bedrock
API use its own defaults. Only parameters that are explicitly set are included
in the request. This avoids conflicts with models that don't allow certain
parameter combinations (e.g., `temperature` and `top_p` together).
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.aws import AWSBedrockLLMService
llm = AWSBedrockLLMService(
model="us.anthropic.claude-sonnet-4-5-20250929-v1:0",
aws_access_key=os.getenv("AWS_ACCESS_KEY_ID"),
aws_secret_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
aws_region=os.getenv("AWS_REGION", "us-east-1"),
)
```
### With Custom Settings
```python theme={null}
from pipecat.services.aws import AWSBedrockLLMService
llm = AWSBedrockLLMService(
model="us.amazon.nova-pro-v1:0",
aws_access_key=os.getenv("AWS_ACCESS_KEY_ID"),
aws_secret_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
aws_region="us-east-1",
settings=AWSBedrockLLMService.Settings(
max_tokens=2048,
temperature=0.7,
latency="optimized",
),
)
```
### Updating Settings at Runtime
Model settings can be changed mid-conversation using `LLMUpdateSettingsFrame`:
```python theme={null}
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.aws.llm import AWSBedrockLLMSettings
await task.queue_frame(
LLMUpdateSettingsFrame(
delta=AWSBedrockLLMSettings(
temperature=0.3,
max_tokens=1024,
)
)
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **Credential chain**: If `aws_access_key` and `aws_secret_key` are not provided, the service falls back to environment variables and then the standard AWS credential chain (IAM roles, instance profiles, etc.), as shown in the sketch after this list.
* **No-op tool handling**: AWS Bedrock requires at least one tool to be defined when tool content exists in the conversation. The service automatically adds a placeholder tool when needed to prevent API errors.
* **Model-specific parameters**: Some models (e.g., Claude Sonnet 4.5) don't allow certain parameter combinations. The service only includes explicitly set parameters in the inference config to avoid conflicts.
* **Retry behavior**: When `retry_on_timeout=True`, the first attempt uses the `retry_timeout_secs` timeout. If it times out, a second attempt is made with no timeout limit.
* **System instruction precedence**: If both `system_instruction` (from the constructor) and a system message in the context are set, the constructor's `system_instruction` takes precedence and a warning is logged.
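A minimal sketch that relies on the default credential chain instead of explicit keys; it assumes credentials are already available from environment variables, an IAM role, or an instance profile:
```python theme={null}
# No aws_access_key / aws_secret_key: the standard AWS credential chain is used.
llm = AWSBedrockLLMService(
    model="us.anthropic.claude-sonnet-4-5-20250929-v1:0",
    aws_region=os.getenv("AWS_REGION", "us-east-1"),
)
```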
## Event Handlers
`AWSBedrockLLMService` supports the following event handlers, inherited from [LLMService](/api-reference/server/events/service-events):
| Event | Description |
| --------------------------- | ----------------------------------------------------------------------- |
| `on_completion_timeout` | Called when an LLM completion request times out |
| `on_function_calls_started` | Called when function calls are received and execution is about to start |
```python theme={null}
@llm.event_handler("on_completion_timeout")
async def on_completion_timeout(service):
print("LLM completion timed out")
```
# Azure
Source: https://docs.pipecat.ai/api-reference/server/services/llm/azure
Large Language Model service implementation using Azure OpenAI API
## Overview
`AzureLLMService` provides access to Azure OpenAI's language models through an OpenAI-compatible interface. It inherits from `OpenAILLMService` and supports streaming responses, function calling, and context management with enterprise-grade security and compliance.
Pipecat's API methods for Azure OpenAI integration
Complete example with function calling
Official Azure OpenAI documentation and setup
Create OpenAI resources and get credentials
## Installation
To use Azure OpenAI services, install the required dependency:
```bash theme={null}
pip install "pipecat-ai[azure]"
```
## Prerequisites
### Azure OpenAI Setup
Before using Azure OpenAI LLM services, you need:
1. **Azure Account**: Sign up at [Azure Portal](https://portal.azure.com/)
2. **OpenAI Resource**: Create an Azure OpenAI resource in your subscription
3. **Model Deployment**: Deploy your chosen model (GPT-4, GPT-4o, etc.)
4. **Credentials**: Get your API key, endpoint, and deployment name
### Required Environment Variables
* `AZURE_CHATGPT_API_KEY`: Your Azure OpenAI API key
* `AZURE_CHATGPT_ENDPOINT`: Your Azure OpenAI endpoint URL
* `AZURE_CHATGPT_MODEL`: Your model deployment name
## Configuration
Azure OpenAI API key for authentication.
Azure OpenAI endpoint URL (e.g., `"https://your-resource.openai.azure.com/"`).
Model deployment name to use. *Deprecated in v0.0.105. Use
`settings=AzureLLMService.Settings(model=...)` instead.*
Azure OpenAI API version string.
Runtime-configurable settings. See [Settings](#settings) below.
Since `AzureLLMService` inherits from `OpenAILLMService`, it also accepts the following parameters:
*Deprecated in v0.0.105. Use `settings=AzureLLMService.Settings(...)`
instead.*
Request timeout in seconds. Used when `retry_on_timeout` is enabled to
determine when to retry.
Whether to retry the request once if it times out. The retry attempt has no
timeout limit.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `AzureLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
`AzureLLMService` uses the same settings as `OpenAILLMService`. See the [OpenAI LLM Settings](/api-reference/server/services/llm/openai#settings) section for the full parameter reference.
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.azure import AzureLLMService
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"),
)
```
### With Custom Settings
```python theme={null}
from pipecat.services.azure import AzureLLMService
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"),
api_version="2024-09-01-preview",
settings=AzureLLMService.Settings(
temperature=0.7,
max_completion_tokens=1000,
frequency_penalty=0.5,
),
)
```
### Updating Settings at Runtime
Model settings can be changed mid-conversation using `LLMUpdateSettingsFrame`:
```python theme={null}
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.openai.base_llm import OpenAILLMSettings
await task.queue_frame(
LLMUpdateSettingsFrame(
delta=OpenAILLMSettings(
temperature=0.3,
max_completion_tokens=500,
)
)
)
```
## Notes
* **Deployment name vs model name**: The `model` parameter should be your Azure deployment name, not the underlying model name (e.g., use `"my-gpt4-deployment"` instead of `"gpt-4"`); see the sketch after this list.
* **API version**: Different API versions support different features. Check the [Azure OpenAI documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/reference) for version-specific capabilities.
* **Full OpenAI compatibility**: Since `AzureLLMService` inherits from `OpenAILLMService`, it supports all the same features including function calling, vision input, and streaming responses.
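A short sketch of the deployment-name requirement; `"my-gpt4-deployment"` below is a placeholder for the name you chose when deploying the model in the Azure portal:
```python theme={null}
llm = AzureLLMService(
    api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
    endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
    model="my-gpt4-deployment",  # Azure deployment name, not the base model name
)
```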
## Event Handlers
`AzureLLMService` supports the same event handlers as `OpenAILLMService`, inherited from [LLMService](/api-reference/server/events/service-events):
| Event | Description |
| --------------------------- | ----------------------------------------------------------------------- |
| `on_completion_timeout` | Called when an LLM completion request times out |
| `on_function_calls_started` | Called when function calls are received and execution is about to start |
```python theme={null}
@llm.event_handler("on_completion_timeout")
async def on_completion_timeout(service):
print("LLM completion timed out")
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
# Cerebras
Source: https://docs.pipecat.ai/api-reference/server/services/llm/cerebras
LLM service implementation using Cerebras's API with OpenAI-compatible interface
## Overview
`CerebrasLLMService` provides access to Cerebras's language models through an OpenAI-compatible interface. It inherits from `OpenAILLMService` and supports streaming responses, function calling, and context management with ultra-fast inference speeds.
Pipecat's API methods for Cerebras integration
Complete example with function calling
Official Cerebras inference API documentation
Access models and manage API keys
## Installation
To use Cerebras services, install the required dependency:
```bash theme={null}
pip install "pipecat-ai[cerebras]"
```
## Prerequisites
### Cerebras Account Setup
Before using Cerebras LLM services, you need:
1. **Cerebras Account**: Sign up at [Cerebras Cloud](https://cloud.cerebras.ai/)
2. **API Key**: Generate an API key from your account dashboard
3. **Model Selection**: Choose from available Cerebras models with ultra-fast inference
### Required Environment Variables
* `CEREBRAS_API_KEY`: Your Cerebras API key for authentication
## Configuration
Cerebras API key for authentication.
Base URL for Cerebras API endpoint.
Model identifier to use.
*Deprecated in v0.0.105. Use `settings=CerebrasLLMService.Settings(model=...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `CerebrasLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
This service uses the same settings as `OpenAILLMService`. See [OpenAI LLM Settings](/api-reference/server/services/llm/openai#settings) for the full parameter reference.
## Usage
### Basic Setup
```python theme={null}
import os
from pipecat.services.cerebras import CerebrasLLMService
llm = CerebrasLLMService(
api_key=os.getenv("CEREBRAS_API_KEY"),
model="gpt-oss-120b",
)
```
### With Custom Settings
```python theme={null}
from pipecat.services.cerebras import CerebrasLLMService
llm = CerebrasLLMService(
api_key=os.getenv("CEREBRAS_API_KEY"),
settings=CerebrasLLMService.Settings(
model="gpt-oss-120b",
temperature=0.7,
top_p=0.9,
max_completion_tokens=1024,
),
)
```
## Notes
* Cerebras supports a subset of OpenAI parameters. Advanced parameters like `frequency_penalty` and `presence_penalty` are not passed to the API.
* Cerebras is known for ultra-fast inference speeds on supported models.
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
# DeepSeek
Source: https://docs.pipecat.ai/api-reference/server/services/llm/deepseek
LLM service implementation using DeepSeek's API with OpenAI-compatible interface
## Overview
`DeepSeekLLMService` provides access to DeepSeek's language models through an OpenAI-compatible interface. It inherits from `OpenAILLMService` and supports streaming responses, function calling, and context management with advanced reasoning capabilities.
Pipecat's API methods for DeepSeek integration
Complete example with function calling
Official DeepSeek API documentation and features
Access models and manage API keys
## Installation
To use DeepSeek services, install the required dependency:
```bash theme={null}
pip install "pipecat-ai[deepseek]"
```
## Prerequisites
### DeepSeek Account Setup
Before using DeepSeek LLM services, you need:
1. **DeepSeek Account**: Sign up at [DeepSeek Platform](https://platform.deepseek.com/)
2. **API Key**: Generate an API key from your account dashboard
3. **Model Selection**: Choose from available DeepSeek models with reasoning capabilities
### Required Environment Variables
* `DEEPSEEK_API_KEY`: Your DeepSeek API key for authentication
## Configuration
DeepSeek API key for authentication.
Base URL for DeepSeek API endpoint.
Model identifier to use.
*Deprecated in v0.0.105. Use `settings=DeepSeekLLMService.Settings(model=...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `DeepSeekLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
This service uses the same settings as `OpenAILLMService`. See [OpenAI LLM Settings](/api-reference/server/services/llm/openai#settings) for the full parameter reference.
## Usage
### Basic Setup
```python theme={null}
import os
from pipecat.services.deepseek import DeepSeekLLMService
llm = DeepSeekLLMService(
api_key=os.getenv("DEEPSEEK_API_KEY"),
model="deepseek-chat",
)
```
### With Custom Settings
```python theme={null}
from pipecat.services.deepseek import DeepSeekLLMService
llm = DeepSeekLLMService(
api_key=os.getenv("DEEPSEEK_API_KEY"),
settings=DeepSeekLLMService.Settings(
model="deepseek-chat",
temperature=0.7,
top_p=0.9,
max_tokens=2048,
),
)
```
## Notes
* DeepSeek does not support the `seed` and `max_completion_tokens` parameters. Use `max_tokens` instead.
* DeepSeek models offer strong reasoning capabilities, particularly the `deepseek-reasoner` model variant.
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
# Fireworks AI
Source: https://docs.pipecat.ai/api-reference/server/services/llm/fireworks
LLM service implementation using Fireworks AI's API with OpenAI-compatible interface
## Overview
`FireworksLLMService` provides access to Fireworks AI's language models through an OpenAI-compatible interface. It inherits from `OpenAILLMService` and supports streaming responses, function calling, and context management with optimized inference infrastructure.
Pipecat's API methods for Fireworks AI integration
Complete example with function calling
Official Fireworks AI API documentation and features
Access models and manage API keys
## Installation
To use Fireworks AI services, install the required dependency:
```bash theme={null}
pip install "pipecat-ai[fireworks]"
```
## Prerequisites
### Fireworks AI Account Setup
Before using Fireworks AI LLM services, you need:
1. **Fireworks Account**: Sign up at [Fireworks AI](https://fireworks.ai/)
2. **API Key**: Generate an API key from your account dashboard
3. **Model Selection**: Choose from available open-source and proprietary models
### Required Environment Variables
* `FIREWORKS_API_KEY`: Your Fireworks AI API key for authentication
## Configuration
Fireworks AI API key for authentication.
Model identifier to use.
*Deprecated in v0.0.105. Use `settings=FireworksLLMService.Settings(model=...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
Base URL for Fireworks API endpoint.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `FireworksLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
This service uses the same settings as `OpenAILLMService`. See [OpenAI LLM Settings](/api-reference/server/services/llm/openai#settings) for the full parameter reference.
## Usage
### Basic Setup
```python theme={null}
import os
from pipecat.services.fireworks import FireworksLLMService
llm = FireworksLLMService(
api_key=os.getenv("FIREWORKS_API_KEY"),
model="accounts/fireworks/models/firefunction-v2",
)
```
### With Custom Settings
```python theme={null}
from pipecat.services.fireworks import FireworksLLMService
llm = FireworksLLMService(
api_key=os.getenv("FIREWORKS_API_KEY"),
settings=FireworksLLMService.Settings(
model="accounts/fireworks/models/firefunction-v2",
temperature=0.7,
top_p=0.9,
max_tokens=1024,
),
)
```
## Notes
* Fireworks does not support the `seed`, `max_completion_tokens`, or `stream_options` parameters. Use `max_tokens` instead.
* Model identifiers use the `accounts/fireworks/models/` prefix format.
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
# Google Gemini
Source: https://docs.pipecat.ai/api-reference/server/services/llm/google
Large Language Model service implementation using Google's Gemini API
## Overview
`GoogleLLMService` provides integration with Google's Gemini models, supporting streaming responses, function calling, and multimodal inputs. It includes specialized context handling for Google's message format while maintaining compatibility with OpenAI-style contexts.
Pipecat's API methods for Google Gemini integration
Complete example with function calling
Official Google Gemini API documentation and features
Access Gemini models and manage API keys
## Installation
To use Google Gemini services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[google]"
```
## Prerequisites
### Google Gemini Setup
Before using Google Gemini LLM services, you need:
1. **Google Account**: Sign up at [Google AI Studio](https://aistudio.google.com/)
2. **API Key**: Generate a Gemini API key from AI Studio
3. **Model Selection**: Choose from available Gemini models (Gemini 2.5 Flash, Gemini 2.5 Pro, etc.)
### Required Environment Variables
* `GOOGLE_API_KEY`: Your Google Gemini API key for authentication
## Configuration
Google AI API key for authentication.
Gemini model name to use (e.g., `"gemini-2.5-flash"`, `"gemini-2.5-pro"`).
*Deprecated in v0.0.105. Use `settings=GoogleLLMService.Settings(...)`
instead.*
Runtime-configurable model settings. See [Settings](#settings) below.
Runtime-configurable model settings. See [Settings](#settings) below.
*Deprecated in v0.0.105. Use `settings=GoogleLLMService.Settings(...)`
instead.*
System instruction/prompt for the model. Sets the overall behavior and
context. *Deprecated in v0.0.105. Use
`settings=GoogleLLMService.Settings(system_instruction=...)` instead.*
List of available tools/functions for the model to call.
Configuration for tool usage behavior.
HTTP options for the Google API client.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `GoogleLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| -------------------- | ---------------------- | ----------- | -------------------------------------------------------------------------------------------------- |
| `model` | `str` | `None` | Gemini model identifier. *(Inherited from base settings.)* |
| `system_instruction` | `str` | `None` | System instruction/prompt for the model. *(Inherited from base settings.)* |
| `max_tokens` | `int` | `NOT_GIVEN` | Maximum number of tokens to generate. |
| `temperature` | `float` | `NOT_GIVEN` | Sampling temperature (0.0 to 2.0). Lower values are more focused, higher values are more creative. |
| `top_k` | `int` | `NOT_GIVEN` | Top-k sampling parameter. Limits tokens to the top k most likely. |
| `top_p` | `float` | `NOT_GIVEN` | Top-p (nucleus) sampling (0.0 to 1.0). Controls diversity of output. |
| `thinking` | `GoogleThinkingConfig` | `NOT_GIVEN` | Thinking configuration. See [GoogleThinkingConfig](#googlethinkingconfig) below. |
`NOT_GIVEN` values are omitted from the API request, letting the Gemini API
use its own defaults. If `thinking` is not provided, Pipecat disables thinking
for Gemini 2.5 Flash models (where possible) to reduce latency.
### GoogleThinkingConfig
Configuration for controlling the model's internal thinking process. Gemini 2.5 and 3 series models support this feature.
| Parameter | Type | Default | Description |
| ------------------ | ------ | ------- | ------------------------------------------------------------------------------------------------------------------------ |
| `thinking_budget` | `int` | `None` | Token budget for thinking (Gemini 2.5 series). -1 for dynamic, 0 to disable, or a specific count (e.g., 128-32768). |
| `thinking_level` | `str` | `None` | Thinking level for Gemini 3 models. `"low"`, `"high"` for 3 Pro; `"minimal"`, `"low"`, `"medium"`, `"high"` for 3 Flash. |
| `include_thoughts` | `bool` | `None` | Whether to include thought summaries in the response. |
Gemini 2.5 series models use `thinking_budget`, while Gemini 3 models use
`thinking_level`. Do not mix these parameters across model generations.
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.google import GoogleLLMService
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
model="gemini-2.5-flash",
)
```
### With Custom Settings
```python theme={null}
from pipecat.services.google import GoogleLLMService
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
settings=GoogleLLMService.Settings(
model="gemini-2.5-pro",
system_instruction="You are a helpful assistant.",
temperature=0.7,
max_tokens=2048,
top_p=0.9,
),
)
```
### With Thinking Configuration
```python theme={null}
# Gemini 2.5 series (using thinking_budget)
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
settings=GoogleLLMService.Settings(
model="gemini-2.5-pro",
max_tokens=8192,
thinking=GoogleLLMService.GoogleThinkingConfig(
thinking_budget=4096,
include_thoughts=True,
),
),
)
# Gemini 3 series (using thinking_level)
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
settings=GoogleLLMService.Settings(
model="gemini-3-flash",
max_tokens=8192,
thinking=GoogleLLMService.GoogleThinkingConfig(
thinking_level="high",
include_thoughts=True,
),
),
)
```
### Updating Settings at Runtime
Model settings can be changed mid-conversation using `LLMUpdateSettingsFrame`:
```python theme={null}
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.google.llm import GoogleLLMSettings
await task.queue_frame(
LLMUpdateSettingsFrame(
delta=GoogleLLMSettings(
temperature=0.3,
max_tokens=1024,
)
)
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **System instruction priority**: The `system_instruction` set via the constructor or `GoogleLLMSettings` takes priority over any system message in the context. If both are set, a warning is logged and the constructor/settings value is used.
* **Thinking defaults**: By default, Pipecat disables thinking for Gemini 2.5 Flash models to reduce latency. To enable it, explicitly pass a `GoogleThinkingConfig` via `settings`.
* **Multimodal support**: Gemini models natively support image and audio inputs through Google's Content/Part format. Images and audio are automatically converted from OpenAI-style contexts.
* **Grounding with Google Search**: When grounding metadata is present in the response (e.g., from the Google Search tool), the service emits `LLMSearchResponseFrame` with search results and source attributions; see the sketch after this list.
* **Context format**: The service automatically converts between OpenAI-style message formats and Google's native Content/Part format, so you can use either.
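A rough sketch of observing grounded responses downstream with a custom frame processor; the `LLMSearchResponseFrame` import path and payload shape are assumptions to verify against your Pipecat version:
```python theme={null}
from pipecat.frames.frames import Frame, LLMSearchResponseFrame  # import path assumed
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

class SearchResultLogger(FrameProcessor):
    """Illustrative processor that logs Google Search grounding results."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, LLMSearchResponseFrame):
            print("Grounded response frame:", frame)  # inspect for results and attributions
        await self.push_frame(frame, direction)
```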
## Event Handlers
`GoogleLLMService` supports the following event handlers, inherited from [LLMService](/api-reference/server/events/service-events):
| Event | Description |
| --------------------------- | --------------------------------------------------------------------------- |
| `on_completion_timeout` | Called when an LLM completion request times out (Google `DeadlineExceeded`) |
| `on_function_calls_started` | Called when function calls are received and execution is about to start |
```python theme={null}
@llm.event_handler("on_completion_timeout")
async def on_completion_timeout(service):
print("LLM completion timed out")
```
# Google Vertex AI
Source: https://docs.pipecat.ai/api-reference/server/services/llm/google-vertex
LLM service implementation using Google's Vertex AI with OpenAI-compatible interface
## Overview
`GoogleVertexLLMService` provides access to Google's language models through Vertex AI while maintaining compatibility with OpenAI-style contexts. It extends `GoogleLLMService` and supports the same streaming, function calling, and multimodal features while connecting to Google's enterprise AI services with enhanced security and compliance.
Pipecat's API methods for Google Vertex AI integration
Browse examples using Vertex AI models
Official Google Vertex AI documentation
Access Vertex AI and manage credentials
## Installation
To use Google Vertex AI services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[google]"
```
## Prerequisites
### Google Cloud Setup
Before using Google Vertex AI LLM services, you need:
1. **Google Cloud Account**: Sign up at [Google Cloud Console](https://console.cloud.google.com/)
2. **Project Setup**: Create a project and enable the Vertex AI API
3. **Service Account**: Create a service account with Vertex AI permissions
4. **Authentication**: Set up credentials via service account key or Application Default Credentials
### Required Environment Variables
* `GOOGLE_APPLICATION_CREDENTIALS`: Path to your service account key file (recommended)
* Or use Application Default Credentials for cloud deployments
## Configuration
JSON string of Google service account credentials for authentication.
Path to the service account JSON key file. Alternative to providing
credentials as a string.
Google Cloud project ID.
GCP region for the Vertex AI endpoint (e.g., `"us-east4"`, `"us-central1"`).
Model identifier to use. *Deprecated in v0.0.105. Use
`settings=GoogleVertexLLMService.Settings(model=...)` instead.*
*Deprecated in v0.0.105. Use `settings=GoogleVertexLLMService.Settings(...)`
instead.*
Runtime-configurable settings. See [Google Gemini
Settings](/api-reference/server/services/llm/google#settings) for the full parameter
reference.
*Deprecated in v0.0.105. Use
`settings=GoogleVertexLLMService.Settings(system_instruction=...)` instead.*
List of available tools/functions for the model.
Configuration for tool usage behavior.
HTTP options for the Google AI client.
## Usage
### Basic Setup
```python theme={null}
import os
from pipecat.services.google import GoogleVertexLLMService
llm = GoogleVertexLLMService(
credentials_path=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
project_id="my-gcp-project",
location="us-east4",
model="gemini-2.5-flash",
)
```
### With Credentials JSON String
```python theme={null}
from pipecat.services.google import GoogleVertexLLMService
llm = GoogleVertexLLMService(
credentials=os.getenv("GOOGLE_CREDENTIALS_JSON"),
project_id="my-gcp-project",
location="us-central1",
settings=GoogleVertexLLMService.Settings(
model="gemini-2.5-flash",
temperature=0.7,
top_p=0.9,
),
)
```
### With Application Default Credentials
```python theme={null}
from pipecat.services.google import GoogleVertexLLMService
# Uses ADC when neither credentials nor credentials_path is provided
llm = GoogleVertexLLMService(
project_id="my-gcp-project",
location="us-east4",
model="gemini-2.5-flash",
)
```
## Notes
* This service does **not** accept an `api_key` parameter. Use `credentials`, `credentials_path`, or Application Default Credentials instead.
* `GoogleVertexLLMService` extends `GoogleLLMService` (not `OpenAILLMService` directly) and uses the Google AI Python SDK with Vertex AI authentication.
* Authentication supports three methods: direct JSON credentials string, path to a service account key file, or Application Default Credentials (ADC).
* The `project_id` parameter is required. If `location` is not provided, it defaults to `"us-east4"`.
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
# Grok
Source: https://docs.pipecat.ai/api-reference/server/services/llm/grok
LLM service implementation using Grok's API with OpenAI-compatible interface
## Overview
`GrokLLMService` provides access to Grok's language models through an OpenAI-compatible interface. It inherits from `OpenAILLMService` and supports streaming responses, function calling, and context management with Grok's unique reasoning capabilities.
Pipecat's API methods for Grok integration
Complete example with function calling
Official Grok API documentation and features
Access Grok models and manage API keys
## Installation
To use Grok services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[grok]"
```
## Prerequisites
### Grok Account Setup
Before using Grok LLM services, you need:
1. **X.AI Account**: Sign up at [X.AI Console](https://console.x.ai/)
2. **API Key**: Generate an API key from your console dashboard
3. **Model Selection**: Choose from available Grok models
### Required Environment Variables
* `XAI_API_KEY`: Your X.AI API key for authentication
## Configuration
X.AI API key for authentication.
Base URL for Grok API endpoint.
Model identifier to use.
*Deprecated in v0.0.105. Use `settings=GrokLLMService.Settings(model=...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `GrokLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
This service uses the same settings as `OpenAILLMService`. See [OpenAI LLM Settings](/api-reference/server/services/llm/openai#settings) for the full parameter reference.
## Usage
### Basic Setup
```python theme={null}
import os
from pipecat.services.xai.llm import GrokLLMService
llm = GrokLLMService(
api_key=os.getenv("XAI_API_KEY"),
model="grok-3-beta",
)
```
### With Custom Settings
```python theme={null}
from pipecat.services.xai.llm import GrokLLMService
llm = GrokLLMService(
api_key=os.getenv("XAI_API_KEY"),
settings=GrokLLMService.Settings(
model="grok-3-beta",
temperature=0.7,
top_p=0.9,
max_completion_tokens=1024,
),
)
```
## Notes
* Grok uses incremental token reporting. The service accumulates token usage metrics during processing and reports the final totals at the end of each request.
* Grok supports prompt caching and reasoning tokens, which are tracked in usage metrics when available.
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
# Groq
Source: https://docs.pipecat.ai/api-reference/server/services/llm/groq
LLM service implementation using Groq's API with OpenAI-compatible interface
## Overview
`GroqLLMService` provides access to Groq's language models through an OpenAI-compatible interface. It inherits from `OpenAILLMService` and supports streaming responses, function calling, and context management with ultra-fast inference speeds.
Pipecat's API methods for Groq integration
Complete example with function calling
Official Groq API documentation and features
Access models and manage API keys
## Installation
To use Groq services, install the required dependency:
```bash theme={null}
pip install "pipecat-ai[groq]"
```
## Prerequisites
### Groq Account Setup
Before using Groq LLM services, you need:
1. **Groq Account**: Sign up at [Groq Console](https://console.groq.com/)
2. **API Key**: Generate an API key from your console dashboard
3. **Model Selection**: Choose from available models with ultra-fast inference
### Required Environment Variables
* `GROQ_API_KEY`: Your Groq API key for authentication
## Configuration
Groq API key for authentication.
Base URL for Groq API endpoint.
Model identifier to use.
*Deprecated in v0.0.105. Use `settings=GroqLLMService.Settings(model=...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `GroqLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
This service uses the same settings as `OpenAILLMService`. See [OpenAI LLM Settings](/api-reference/server/services/llm/openai#settings) for the full parameter reference.
## Usage
### Basic Setup
```python theme={null}
import os
from pipecat.services.groq import GroqLLMService
llm = GroqLLMService(
api_key=os.getenv("GROQ_API_KEY"),
model="llama-3.3-70b-versatile",
)
```
### With Custom Settings
```python theme={null}
from pipecat.services.groq import GroqLLMService
llm = GroqLLMService(
api_key=os.getenv("GROQ_API_KEY"),
settings=GroqLLMService.Settings(
model="llama-3.3-70b-versatile",
temperature=0.7,
top_p=0.9,
max_completion_tokens=1024,
),
)
```
## Notes
* Groq provides ultra-fast inference using custom LPU (Language Processing Unit) hardware.
* Groq fully supports the OpenAI-compatible parameter set inherited from `OpenAILLMService`.
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
# Mistral
Source: https://docs.pipecat.ai/api-reference/server/services/llm/mistral
LLM service implementation using Mistral's API with OpenAI-compatible interface
## Overview
`MistralLLMService` provides access to Mistral's language models through an OpenAI-compatible interface. It inherits from `OpenAILLMService` and supports streaming responses, function calling, and vision with Mistral-specific optimizations for tool use and message handling.
Pipecat's API methods for Mistral integration
Complete example with function calling
Official Mistral API documentation and features
Access models and manage API keys
## Installation
To use Mistral services, install the required dependency:
```bash theme={null}
pip install "pipecat-ai[mistral]"
```
## Prerequisites
### Mistral Account Setup
Before using Mistral LLM services, you need:
1. **Mistral Account**: Sign up at [Mistral Console](https://console.mistral.ai/)
2. **API Key**: Generate an API key from your console dashboard
3. **Model Selection**: Choose from available models (Mistral Small, Mistral Large, etc.)
### Required Environment Variables
* `MISTRAL_API_KEY`: Your Mistral API key for authentication
## Configuration
Mistral API key for authentication.
Base URL for Mistral API endpoint.
Model identifier to use.
*Deprecated in v0.0.105. Use `settings=MistralLLMService.Settings(model=...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `MistralLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
This service uses the same settings as `OpenAILLMService`. See [OpenAI LLM Settings](/api-reference/server/services/llm/openai#settings) for the full parameter reference.
## Usage
### Basic Setup
```python theme={null}
import os
from pipecat.services.mistral import MistralLLMService
llm = MistralLLMService(
api_key=os.getenv("MISTRAL_API_KEY"),
model="mistral-small-latest",
)
```
### With Custom Settings
```python theme={null}
from pipecat.services.mistral import MistralLLMService
llm = MistralLLMService(
api_key=os.getenv("MISTRAL_API_KEY"),
settings=MistralLLMService.Settings(
model="mistral-large-latest",
temperature=0.7,
top_p=0.9,
max_completion_tokens=1024,
),
)
```
### Updating Settings at Runtime
```python theme={null}
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.mistral.llm import MistralLLMSettings
await task.queue_frame(
LLMUpdateSettingsFrame(
delta=MistralLLMSettings(
temperature=0.3,
max_tokens=512,
)
)
)
```
## Notes
* **Function calling**: Mistral supports tool/function calling (see the sketch after this list). The service includes deduplication logic to prevent repeated execution of the same function calls across conversation turns.
* **Mistral API constraints**: The service automatically handles Mistral-specific requirements, such as ensuring tool result messages are followed by an assistant message and that system messages appear only at the start of the conversation.
* **Vision**: Supports image inputs via base64-encoded JPEG content.
* **Default model**: `mistral-small-latest` is used when no model is specified.
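A minimal function-calling sketch for Mistral, using the same registration pattern as the other OpenAI-compatible services on this site; the `get_weather` handler is hypothetical:
```python theme={null}
import os

from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.services.llm_service import FunctionCallParams
from pipecat.services.mistral import MistralLLMService

async def get_weather(params: FunctionCallParams):
    # Hypothetical handler; return real data via the result callback.
    await params.result_callback({"temperature": "75", "conditions": "sunny"})

llm = MistralLLMService(
    api_key=os.getenv("MISTRAL_API_KEY"),
    settings=MistralLLMService.Settings(model="mistral-large-latest"),
)
llm.register_function("get_weather", get_weather)

weather_function = FunctionSchema(
    name="get_weather",
    description="Get the current weather",
    properties={
        "location": {
            "type": "string",
            "description": "City and state, e.g. San Francisco, CA",
        },
    },
    required=["location"],
)
tools = ToolsSchema(standard_tools=[weather_function])
context = LLMContext(tools=tools)
```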
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
# Novita AI
Source: https://docs.pipecat.ai/api-reference/server/services/llm/novita
LLM service implementation using Novita AI's OpenAI-compatible API
## Overview
`NovitaLLMService` provides access to Novita AI's language models through an OpenAI-compatible interface. It inherits from `OpenAILLMService` and supports streaming responses, function calling, and context management with competitive pricing and a wide selection of open-source models.
Pipecat's API methods for Novita AI integration
Complete example with function calling
Official Novita AI API documentation and features
Access models and manage API keys
## Installation
To use Novita AI services, install the required dependency:
```bash theme={null}
pip install "pipecat-ai[novita]"
```
## Prerequisites
### Novita AI Account Setup
Before using Novita AI LLM services, you need:
1. **Novita AI Account**: Sign up at [Novita AI](https://novita.ai)
2. **API Key**: Generate an API key from your account dashboard
3. **Model Selection**: Choose from a wide selection of open-source models
### Required Environment Variables
* `NOVITA_API_KEY`: Your Novita AI API key for authentication
## Configuration
Novita AI API key for authentication.
Base URL for Novita AI API endpoint.
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `NovitaLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
This service uses the same settings as `OpenAILLMService`. See [OpenAI LLM Settings](/api-reference/server/services/llm/openai#settings) for the full parameter reference.
The default model is `"moonshotai/kimi-k2.5"`.
## Usage
### Basic Setup
```python theme={null}
import os
from pipecat.services.novita import NovitaLLMService
llm = NovitaLLMService(
api_key=os.getenv("NOVITA_API_KEY"),
settings=NovitaLLMService.Settings(
model="openai/gpt-oss-120b",
),
)
```
### With Custom Settings
```python theme={null}
llm = NovitaLLMService(
api_key=os.getenv("NOVITA_API_KEY"),
settings=NovitaLLMService.Settings(
model="moonshotai/kimi-k2.5",
temperature=0.7,
max_tokens=500,
),
)
```
### With Function Calling
```python theme={null}
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.services.llm_service import FunctionCallParams
async def get_weather(params: FunctionCallParams):
await params.result_callback({"temperature": "75", "conditions": "sunny"})
llm = NovitaLLMService(
api_key=os.getenv("NOVITA_API_KEY"),
settings=NovitaLLMService.Settings(
model="openai/gpt-oss-120b",
),
)
llm.register_function("get_weather", get_weather)
weather_function = FunctionSchema(
name="get_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "City and state, e.g. San Francisco, CA",
},
},
required=["location"],
)
tools = ToolsSchema(standard_tools=[weather_function])
context = LLMContext(tools=tools)
```
## Notes
* Novita AI provides an OpenAI-compatible API, so all OpenAI features and patterns work with this service
* The service supports streaming responses, function calling, and other OpenAI-compatible features
* Model selection depends on your Novita AI account access and pricing tier
# NVIDIA NIM
Source: https://docs.pipecat.ai/api-reference/server/services/llm/nvidia
LLM service implementation using NVIDIA's NIM (NVIDIA Inference Microservice) API with OpenAI-compatible interface
## Overview
`NvidiaLLMService` provides access to NVIDIA's NIM language models through an OpenAI-compatible interface. It inherits from `OpenAILLMService` and supports streaming responses, function calling, and context management, with special handling for NVIDIA's incremental token reporting and enterprise deployment.
Pipecat's API methods for NVIDIA NIM integration
Complete example with function calling
Official NVIDIA NIM documentation and setup
Access NIM services and manage API keys
## Installation
To use NVIDIA NIM services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[nvidia]"
```
## Prerequisites
### NVIDIA NIM Setup
Before using NVIDIA NIM LLM services, you need:
1. **NVIDIA Developer Account**: Sign up at [NVIDIA Developer Portal](https://developer.nvidia.com/)
2. **API Key**: Generate an NVIDIA API key for NIM services
3. **Model Selection**: Choose from available NIM-hosted models
4. **Enterprise Setup**: Configure NIM for on-premises deployment if needed
### Required Environment Variables
* `NVIDIA_API_KEY`: Your NVIDIA API key for authentication
## Configuration
NVIDIA API key for authentication.
Base URL for NIM API endpoint.
Model identifier to use.
*Deprecated in v0.0.105. Use `settings=NvidiaLLMService.Settings(model=...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `NvidiaLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
This service uses the same settings as `OpenAILLMService`. See [OpenAI LLM Settings](/api-reference/server/services/llm/openai#settings) for the full parameter reference.
## Usage
### Basic Setup
```python theme={null}
import os
from pipecat.services.nvidia import NvidiaLLMService
llm = NvidiaLLMService(
api_key=os.getenv("NVIDIA_API_KEY"),
model="nvidia/llama-3.1-nemotron-70b-instruct",
)
```
### With Custom Settings
```python theme={null}
from pipecat.services.nvidia import NvidiaLLMService
llm = NvidiaLLMService(
api_key=os.getenv("NVIDIA_API_KEY"),
settings=NvidiaLLMService.Settings(
model="nvidia/llama-3.1-nemotron-70b-instruct",
temperature=0.7,
top_p=0.9,
max_completion_tokens=1024,
),
)
```
## Notes
* NVIDIA NIM uses incremental token reporting. The service accumulates token usage metrics during processing and reports the final totals at the end of each request.
* The legacy `NimLLMService` import from `pipecat.services.nim` is deprecated. Use `NvidiaLLMService` from `pipecat.services.nvidia` instead.
* NIM supports both cloud-hosted and on-premises deployments. For on-premises, override the `base_url` to point to your local NIM endpoint.
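A minimal sketch of pointing the service at a self-hosted NIM deployment; the URL below is a placeholder for your own endpoint:
```python theme={null}
llm = NvidiaLLMService(
    api_key=os.getenv("NVIDIA_API_KEY"),  # may be unused by a local deployment
    base_url="http://localhost:8000/v1",  # placeholder: your on-premises NIM endpoint
    settings=NvidiaLLMService.Settings(
        model="nvidia/llama-3.1-nemotron-70b-instruct",
    ),
)
```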
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
# Ollama
Source: https://docs.pipecat.ai/api-reference/server/services/llm/ollama
LLM service implementation using Ollama with OpenAI-compatible interface
## Overview
`OLLamaLLMService` provides access to locally-run Ollama models through an OpenAI-compatible interface. It inherits from `BaseOpenAILLMService` and allows you to run various open-source models locally while maintaining compatibility with OpenAI's API format for privacy and cost control.
Pipecat's API methods for Ollama integration
Browse examples using Ollama models
Official Ollama documentation and model library
Download and setup instructions for Ollama
## Installation
To use Ollama services, you need to install both Ollama and the Pipecat dependency:
1. **Install Ollama** on your system from [ollama.com/download](https://ollama.com/download)
2. **Install Pipecat dependency**:
```bash theme={null}
pip install "pipecat-ai[ollama]"
```
3. **Pull a model** (first time only):
```bash theme={null}
ollama pull llama2
```
## Prerequisites
### Ollama Local Setup
Before using Ollama LLM services, you need:
1. **Ollama Installation**: Download and install Ollama from [ollama.com](https://ollama.com/download)
2. **Model Selection**: Pull your desired models (llama2, mistral, codellama, etc.)
3. **Local Service**: Ensure the Ollama service is running (default port 11434)
4. **Hardware**: Sufficient RAM and storage for your chosen models
### Configuration
* **No API Keys Required**: Ollama runs entirely locally
* **Model Management**: Use `ollama pull <model-name>` to download models
* **Service URL**: Default is `http://localhost:11434` (configurable)
Ollama runs as a local service on port 11434. No API key required for complete
privacy!
## Configuration
The Ollama model to use. Must be pulled locally first with `ollama pull`.
*Deprecated in v0.0.105. Use `settings=OLLamaLLMService.Settings(model=...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
Base URL for the Ollama API endpoint.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `OLLamaLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
This service uses the same settings as `OpenAILLMService`. See [OpenAI LLM Settings](/api-reference/server/services/llm/openai#settings) for the full parameter reference.
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.ollama import OLLamaLLMService
llm = OLLamaLLMService(
model="llama2",
)
```
### With Custom Model and URL
```python theme={null}
from pipecat.services.ollama import OLLamaLLMService
llm = OLLamaLLMService(
model="mistral",
base_url="http://localhost:11434/v1",
)
```
### With Custom Settings
```python theme={null}
from pipecat.services.ollama import OLLamaLLMService
llm = OLLamaLLMService(
base_url="http://localhost:11434/v1",
settings=OLLamaLLMService.Settings(
model="mistral",
temperature=0.7,
),
)
```
## Notes
* No API key is required. The service automatically uses a placeholder key (`"ollama"`) for OpenAI client compatibility.
* The Ollama service must be running locally before starting your pipeline. Start it with `ollama serve` if it is not already running.
* Model capabilities (function calling, vision, etc.) depend on the specific model you pull. Check the [Ollama model library](https://ollama.com/library) for details.
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
# OpenAI
Source: https://docs.pipecat.ai/api-reference/server/services/llm/openai
Large Language Model services using OpenAI's chat completion API
## Overview
`OpenAILLMService` provides chat completion capabilities using OpenAI's API, supporting streaming responses, function calling, vision input, and advanced context management for conversational AI applications with state-of-the-art language models.
Pipecat's API methods for OpenAI integration
Function calling example with weather API
Official OpenAI API documentation
Access models and manage API keys
## Installation
To use OpenAI services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[openai]"
```
## Prerequisites
### OpenAI Account Setup
Before using OpenAI LLM services, you need:
1. **OpenAI Account**: Sign up at [OpenAI Platform](https://platform.openai.com/)
2. **API Key**: Generate an API key from your account dashboard
3. **Model Selection**: Choose from available models (GPT-4.1, GPT-4o, GPT-4o-mini, etc.)
4. **Usage Limits**: Set up billing and usage limits as needed
### Required Environment Variables
* `OPENAI_API_KEY`: Your OpenAI API key for authentication
## Configuration
OpenAI model name to use (e.g., `"gpt-4.1"`, `"gpt-4o"`, `"gpt-4o-mini"`).
*Deprecated in v0.0.105. Use `settings=OpenAILLMService.Settings(...)`
instead.*
OpenAI API key. If `None`, uses the `OPENAI_API_KEY` environment variable.
Custom base URL for the OpenAI API. Override for proxied or self-hosted
deployments.
OpenAI organization ID.
OpenAI project ID.
Additional HTTP headers to include in every request.
Runtime-configurable model settings. See [Settings](#settings) below.
Runtime-configurable model settings. See [Settings](#settings) below.
*Deprecated in v0.0.105. Use `settings=OpenAILLMService.Settings(...)`
instead.*
Request timeout in seconds. Used when `retry_on_timeout` is enabled to
determine when to retry.
Whether to retry the request once if it times out. The retry attempt has no
timeout limit.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `OpenAILLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ----------------------- | ------- | ----------- | --------------------------------------------------------------------------------------------------- |
| `model` | `str` | `"gpt-4.1"` | OpenAI model identifier. *(Inherited from base settings.)* |
| `system_instruction` | `str` | `None` | System instruction/prompt for the model. *(Inherited from base settings.)* |
| `temperature` | `float` | `NOT_GIVEN` | Sampling temperature (0.0 to 2.0). Lower values are more focused, higher values are more creative. |
| `max_tokens` | `int` | `NOT_GIVEN` | Maximum tokens to generate. |
| `top_p` | `float` | `NOT_GIVEN` | Top-p (nucleus) sampling (0.0 to 1.0). Controls diversity of output. |
| `top_k` | `int` | `NOT_GIVEN` | Top-k sampling parameter. |
| `frequency_penalty` | `float` | `NOT_GIVEN` | Penalty for frequent tokens (-2.0 to 2.0). Positive values discourage repetition. |
| `presence_penalty` | `float` | `NOT_GIVEN` | Penalty for new topics (-2.0 to 2.0). Positive values encourage the model to talk about new topics. |
| `seed` | `int` | `NOT_GIVEN` | Random seed for deterministic outputs. |
| `max_completion_tokens` | `int` | `NOT_GIVEN` | Maximum completion tokens to generate. |
`NOT_GIVEN` values are omitted from the API request entirely, letting the
OpenAI API use its own defaults. This is different from `None`, which would be
sent explicitly.
## Usage
### Basic Setup
```python theme={null}
import os
from pipecat.services.openai import OpenAILLMService
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o",
)
```
### With Custom Settings
```python theme={null}
import os
from pipecat.services.openai import OpenAILLMService
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
model="gpt-4.1",
temperature=0.7,
max_completion_tokens=1000,
frequency_penalty=0.5,
),
)
```
### Updating Settings at Runtime
Model settings can be changed mid-conversation using `LLMUpdateSettingsFrame`:
```python theme={null}
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.openai.base_llm import OpenAILLMSettings
await task.queue_frame(
LLMUpdateSettingsFrame(
delta=OpenAILLMSettings(
temperature=0.3,
max_completion_tokens=500,
)
)
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **OpenAI-compatible providers**: Many third-party LLM providers offer OpenAI-compatible APIs. You can use `OpenAILLMService` with them by setting `base_url` to the provider's endpoint (see the sketch after this list).
* **Retry behavior**: When `retry_on_timeout=True`, the first attempt uses the `retry_timeout_secs` timeout. If it times out, a second attempt is made with no timeout limit.
* **Function calling**: Supports OpenAI's tool/function calling format. Register function handlers on the pipeline task to handle tool calls automatically.
* **System instruction precedence**: If both `system_instruction` (from the constructor) and a system message in the context are set, the constructor's `system_instruction` takes precedence and a warning is logged.
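For example, a minimal sketch of pointing the service at an OpenAI-compatible provider and enabling the timeout retry. The endpoint URL and `PROVIDER_API_KEY` environment variable are placeholders, and the retry arguments follow the parameter names described in the Configuration section above:
```python theme={null}
import os
from pipecat.services.openai import OpenAILLMService
llm = OpenAILLMService(
    api_key=os.getenv("PROVIDER_API_KEY"),  # placeholder; use your provider's key
    base_url="https://api.example-provider.com/v1",  # placeholder endpoint
    retry_on_timeout=True,
    retry_timeout_secs=10.0,
    settings=OpenAILLMService.Settings(
        model="gpt-4.1",
    ),
)
```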
## Event Handlers
`OpenAILLMService` supports the following event handlers, inherited from [LLMService](/api-reference/server/events/service-events):
| Event | Description |
| --------------------------- | ----------------------------------------------------------------------- |
| `on_completion_timeout` | Called when an LLM completion request times out |
| `on_function_calls_started` | Called when function calls are received and execution is about to start |
```python theme={null}
@llm.event_handler("on_completion_timeout")
async def on_completion_timeout(service):
print("LLM completion timed out")
```
# OpenAI Responses
Source: https://docs.pipecat.ai/api-reference/server/services/llm/openai-responses
Large Language Model services using OpenAI's Responses API
## Overview
`OpenAIResponsesLLMService` provides chat completion capabilities using OpenAI's Responses API, supporting streaming text responses, function calling, usage metrics, and out-of-band inference. This service works with the universal `LLMContext` and `LLMContextAggregatorPair`.
The Responses API is a newer OpenAI API designed for conversational AI
applications. It differs from the Chat Completions API in its request/response
structure and streaming format. See [OpenAI Responses API
documentation](https://platform.openai.com/docs/api-reference/responses) for
more details.
Pipecat's API methods for OpenAI Responses integration
Interruptible conversation example
Official OpenAI Responses API documentation
Access models and manage API keys
## Installation
To use OpenAI services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[openai]"
```
## Prerequisites
### OpenAI Account Setup
Before using OpenAI Responses LLM services, you need:
1. **OpenAI Account**: Sign up at [OpenAI Platform](https://platform.openai.com/)
2. **API Key**: Generate an API key from your account dashboard
3. **Model Selection**: Choose from available models (GPT-4.1, GPT-4o, GPT-4o-mini, etc.)
4. **Usage Limits**: Set up billing and usage limits as needed
### Required Environment Variables
* `OPENAI_API_KEY`: Your OpenAI API key for authentication
## Configuration
OpenAI API key. If `None`, uses the `OPENAI_API_KEY` environment variable.
Custom base URL for the OpenAI API. Override for proxied or self-hosted
deployments.
OpenAI organization ID.
OpenAI project ID.
Additional HTTP headers to include in every request.
Service tier to use (e.g., "auto", "flex", "priority").
Runtime-configurable model settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `OpenAIResponsesLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ----------------------- | ------- | ----------- | --------------------------------------------------------------------------------------------------- |
| `model` | `str` | `"gpt-4.1"` | OpenAI model identifier. *(Inherited from base settings.)* |
| `system_instruction` | `str` | `None` | System instruction/prompt for the model. *(Inherited from base settings.)* |
| `temperature` | `float` | `NOT_GIVEN` | Sampling temperature (0.0 to 2.0). Lower values are more focused, higher values are more creative. |
| `top_p` | `float` | `NOT_GIVEN` | Top-p (nucleus) sampling (0.0 to 1.0). Controls diversity of output. |
| `frequency_penalty` | `float` | `None` | Penalty for frequent tokens (-2.0 to 2.0). Positive values discourage repetition. |
| `presence_penalty` | `float` | `None` | Penalty for new topics (-2.0 to 2.0). Positive values encourage the model to talk about new topics. |
| `seed` | `int` | `None` | Random seed for deterministic outputs. |
| `max_completion_tokens` | `int` | `NOT_GIVEN` | Maximum completion tokens to generate. |
`NOT_GIVEN` values are omitted from the API request entirely, letting the
OpenAI API use its own defaults. This is different from `None`, which would be
sent explicitly.
## Usage
### Basic Setup
```python theme={null}
import os
from pipecat.services.openai.responses.llm import OpenAIResponsesLLMService
llm = OpenAIResponsesLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAIResponsesLLMService.Settings(
model="gpt-4.1",
system_instruction="You are a helpful assistant.",
),
)
```
### With Custom Settings
```python theme={null}
import os
from pipecat.services.openai.responses.llm import (
OpenAIResponsesLLMService,
OpenAIResponsesLLMSettings,
)
llm = OpenAIResponsesLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAIResponsesLLMSettings(
model="gpt-4.1",
temperature=0.7,
max_completion_tokens=1000,
frequency_penalty=0.5,
),
)
```
### Updating Settings at Runtime
Model settings can be changed mid-conversation using `LLMUpdateSettingsFrame`:
```python theme={null}
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.openai.responses.llm import OpenAIResponsesLLMSettings
await task.queue_frame(
LLMUpdateSettingsFrame(
delta=OpenAIResponsesLLMSettings(
temperature=0.3,
max_completion_tokens=500,
)
)
)
```
### Out-of-Band Inference
Run a one-shot inference without pushing frames through the pipeline:
```python theme={null}
from pipecat.processors.aggregators.llm_context import LLMContext
context = LLMContext()
context.add_user_message("What is the capital of France?")
response = await llm.run_inference(
context=context,
max_tokens=100,
system_instruction="You are a helpful geography assistant.",
)
print(response) # "The capital of France is Paris."
```
## Notes
* **Responses API vs Chat Completions API**: The Responses API has a different request/response structure compared to the Chat Completions API. Use `OpenAILLMService` for the Chat Completions API and `OpenAIResponsesLLMService` for the Responses API.
* **Universal LLM Context**: This service works with the universal `LLMContext` and `LLMContextAggregatorPair`, making it easy to switch between different LLM providers.
* **Function calling**: Supports OpenAI's tool/function calling format. Register function handlers on the pipeline task to handle tool calls automatically.
* **Usage metrics**: Automatically tracks token usage, including cached tokens and reasoning tokens.
* **Service tiers**: Supports OpenAI's service tier system for prioritizing requests (see the sketch below).
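A minimal sketch of requesting a specific tier at construction time. The `service_tier` keyword is an assumption based on the Configuration section above; verify the exact argument name against the service reference:
```python theme={null}
import os
from pipecat.services.openai.responses.llm import OpenAIResponsesLLMService
llm = OpenAIResponsesLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    service_tier="flex",  # assumed keyword; accepts values like "auto", "flex", "priority"
    settings=OpenAIResponsesLLMService.Settings(model="gpt-4.1"),
)
```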
## Event Handlers
`OpenAIResponsesLLMService` supports the following event handlers, inherited from [LLMService](/api-reference/server/events/service-events):
| Event | Description |
| --------------------------- | ----------------------------------------------------------------------- |
| `on_completion_timeout` | Called when an LLM completion request times out |
| `on_function_calls_started` | Called when function calls are received and execution is about to start |
```python theme={null}
@llm.event_handler("on_completion_timeout")
async def on_completion_timeout(service):
print("LLM completion timed out")
```
# OpenPipe
Source: https://docs.pipecat.ai/api-reference/server/services/llm/openpipe
LLM service implementation using OpenPipe for LLM request logging and fine-tuning
## Overview
`OpenPipeLLMService` extends the BaseOpenAILLMService to provide integration with OpenPipe, enabling request logging, model fine-tuning, and performance monitoring. It maintains compatibility with OpenAI's API while adding OpenPipe's logging and optimization capabilities.
Pipecat's API methods for OpenPipe integration
Browse examples using OpenPipe logging
Official OpenPipe API documentation and features
Access logging and fine-tuning features
## Installation
To use OpenPipe services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[openpipe]"
```
## Prerequisites
### OpenPipe Account Setup
Before using OpenPipe LLM services, you need:
1. **OpenPipe Account**: Sign up at [OpenPipe](https://openpipe.ai/)
2. **API Keys**: Generate both OpenPipe and OpenAI API keys
3. **Project Setup**: Configure logging and fine-tuning projects
### Required Environment Variables
* `OPENPIPE_API_KEY`: Your OpenPipe API key for logging and fine-tuning
* `OPENAI_API_KEY`: Your OpenAI API key for underlying model access
## Configuration
The model to use for chat completions.
*Deprecated in v0.0.105. Use `settings=OpenPipeLLMService.Settings(model=...)`
instead.*
OpenAI API key for authentication. If not provided, reads from environment.
Custom OpenAI API endpoint URL. Uses the default OpenAI URL if not provided.
OpenPipe API key for request logging and fine-tuning features. If not
provided, reads from environment.
OpenPipe API endpoint URL.
Runtime-configurable settings. See [Settings](#settings) below.
Dictionary of tags to apply to all requests for tracking and filtering in the
OpenPipe dashboard.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `OpenPipeLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
This service uses the same settings as `OpenAILLMService`. See [OpenAI LLM Settings](/api-reference/server/services/llm/openai#settings) for the full parameter reference.
## Usage
### Basic Setup
```python theme={null}
import os
from pipecat.services.openpipe import OpenPipeLLMService
llm = OpenPipeLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
openpipe_api_key=os.getenv("OPENPIPE_API_KEY"),
model="gpt-4.1",
)
```
### With Tags for Tracking
```python theme={null}
import os
from pipecat.services.openpipe import OpenPipeLLMService
llm = OpenPipeLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
openpipe_api_key=os.getenv("OPENPIPE_API_KEY"),
model="gpt-4.1",
tags={
"environment": "production",
"project": "voice-assistant",
},
)
```
## Notes
* All requests are automatically logged to OpenPipe for monitoring and fine-tuning purposes.
* Tags are included with every request and can be used to filter and organize requests in the OpenPipe dashboard.
* OpenPipe uses its own client (`openpipe.AsyncOpenAI`) instead of the standard OpenAI client to enable transparent request logging.
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
# OpenRouter
Source: https://docs.pipecat.ai/api-reference/server/services/llm/openrouter
LLM service implementation using OpenRouter's API with OpenAI-compatible interface
## Overview
`OpenRouterLLMService` provides access to OpenRouter's language models through an OpenAI-compatible interface. It inherits from `OpenAILLMService` and supports streaming responses, function calling, and context management with access to multiple model providers through a single API.
Pipecat's API methods for OpenRouter integration
Complete example with function calling
Official OpenRouter API documentation and features
Access multiple model providers and manage API keys
## Installation
To use OpenRouter services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[openrouter]"
```
## Prerequisites
### OpenRouter Account Setup
Before using OpenRouter LLM services, you need:
1. **OpenRouter Account**: Sign up at [OpenRouter](https://openrouter.ai/)
2. **API Key**: Generate an API key from your account dashboard
3. **Model Selection**: Choose from hundreds of available models from different providers
4. **Credits**: Add credits to your account for model usage
### Required Environment Variables
* `OPENROUTER_API_KEY`: Your OpenRouter API key for authentication
## Configuration
OpenRouter API key for authentication. If not provided, the client will
attempt to read from environment variables.
*Deprecated in v0.0.105. Use
`settings=OpenRouterLLMService.Settings(model=...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
Base URL for OpenRouter API endpoint.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `OpenRouterLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
This service uses the same settings as `OpenAILLMService`. See [OpenAI LLM Settings](/api-reference/server/services/llm/openai#settings) for the full parameter reference.
## Usage
### Basic Setup
```python theme={null}
import os
from pipecat.services.openrouter import OpenRouterLLMService
llm = OpenRouterLLMService(
api_key=os.getenv("OPENROUTER_API_KEY"),
model="openai/gpt-4o-2024-11-20",
)
```
### With a Different Provider Model
```python theme={null}
import os
from pipecat.services.openrouter import OpenRouterLLMService
llm = OpenRouterLLMService(
api_key=os.getenv("OPENROUTER_API_KEY"),
settings=OpenRouterLLMService.Settings(
model="anthropic/claude-sonnet-4-20250514",
temperature=0.7,
max_completion_tokens=1024,
),
)
```
## Notes
* OpenRouter model identifiers use the `provider/model` format (e.g., `openai/gpt-4o`, `anthropic/claude-sonnet-4-20250514`, `google/gemini-pro`).
* When using Gemini models through OpenRouter, the service automatically handles the constraint that only one system message is allowed by converting additional system messages to user messages.
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
# Perplexity
Source: https://docs.pipecat.ai/api-reference/server/services/llm/perplexity
LLM service implementation using Perplexity's API with OpenAI-compatible interface
## Overview
`PerplexityLLMService` provides access to Perplexity's language models through an OpenAI-compatible interface. It inherits from `OpenAILLMService` and supports streaming responses and context management, with special handling for Perplexity's incremental token reporting and built-in internet search capabilities.
Pipecat's API methods for Perplexity integration
Complete example with search capabilities
Official Perplexity API documentation and features
Access search-enhanced models and API keys
## Installation
To use Perplexity services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[perplexity]"
```
## Prerequisites
### Perplexity Account Setup
Before using Perplexity LLM services, you need:
1. **Perplexity Account**: Sign up at [Perplexity](https://www.perplexity.ai/)
2. **API Key**: Generate an API key from your account dashboard
3. **Model Selection**: Choose from available models with built-in search capabilities
### Required Environment Variables
* `PERPLEXITY_API_KEY`: Your Perplexity API key for authentication
Unlike other LLM services, Perplexity does not support function calling.
Instead, they offer native internet search built in without requiring special
function calls.
## Configuration
Perplexity API key for authentication.
Base URL for Perplexity API endpoint.
*Deprecated in v0.0.105. Use
`settings=PerplexityLLMService.Settings(model=...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `PerplexityLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
This service uses the same settings as `OpenAILLMService`. See [OpenAI LLM Settings](/api-reference/server/services/llm/openai#settings) for the full parameter reference.
## Usage
### Basic Setup
```python theme={null}
import os
from pipecat.services.perplexity import PerplexityLLMService
llm = PerplexityLLMService(
api_key=os.getenv("PERPLEXITY_API_KEY"),
model="sonar",
)
```
### With Custom Settings
```python theme={null}
import os
from pipecat.services.perplexity import PerplexityLLMService
llm = PerplexityLLMService(
api_key=os.getenv("PERPLEXITY_API_KEY"),
settings=PerplexityLLMService.Settings(
model="sonar",
temperature=0.7,
top_p=0.9,
max_tokens=1024,
),
)
```
## Notes
* Perplexity does not support function calling or tools. The service only sends messages to the API, without tool definitions.
* Perplexity uses incremental token reporting. The service accumulates token usage metrics during processing and reports the final totals at the end of each request.
* Perplexity models have built-in internet search capabilities, providing up-to-date information without requiring additional tool configuration.
* **Message transformation**: Perplexity's API enforces stricter constraints than OpenAI on conversation history structure (strict role alternation, no non-initial system messages, last message must be user/tool). The service automatically transforms messages to satisfy these constraints before sending them to the API, so you don't need to manually structure your conversation history.
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
# Qwen
Source: https://docs.pipecat.ai/api-reference/server/services/llm/qwen
LLM service implementation using Alibaba Cloud's Qwen models through an OpenAI-compatible interface
## Overview
`QwenLLMService` provides access to Alibaba Cloud's Qwen language models through an OpenAI-compatible interface. It inherits from `OpenAILLMService` and supports streaming responses, function calling, and context management, with particularly strong capabilities for Chinese language processing.
Pipecat's API methods for Qwen integration
Complete example with function calling
Official Qwen API documentation and features
Access Qwen models and manage API keys
## Installation
To use Qwen services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[qwen]"
```
## Prerequisites
### Qwen Account Setup
Before using Qwen LLM services, you need:
1. **Alibaba Cloud Account**: Sign up at [Alibaba Cloud](https://www.alibabacloud.com/)
2. **API Key**: Generate an API key from your Model Studio dashboard
3. **Model Selection**: Choose from available Qwen models with multilingual capabilities
### Required Environment Variables
* `QWEN_API_KEY`: Your Qwen API key for authentication
## Configuration
Qwen (DashScope) API key for authentication.
Base URL for Qwen API endpoint.
*Deprecated in v0.0.105. Use `settings=QwenLLMService.Settings(model=...)`
instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `QwenLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
This service uses the same settings as `OpenAILLMService`. See [OpenAI LLM Settings](/api-reference/server/services/llm/openai#settings) for the full parameter reference.
## Usage
### Basic Setup
```python theme={null}
import os
from pipecat.services.qwen import QwenLLMService
llm = QwenLLMService(
api_key=os.getenv("QWEN_API_KEY"),
model="qwen-plus",
)
```
### With Custom Settings
```python theme={null}
import os
from pipecat.services.qwen import QwenLLMService
llm = QwenLLMService(
api_key=os.getenv("QWEN_API_KEY"),
settings=QwenLLMService.Settings(
model="qwen-plus",
temperature=0.7,
top_p=0.9,
max_completion_tokens=1024,
),
)
```
## Notes
* Qwen models are particularly strong for Chinese language processing and multilingual tasks.
* Qwen fully supports the OpenAI-compatible parameter set inherited from `OpenAILLMService`.
* The API endpoint uses the DashScope international URL by default. For users in mainland China, you may want to override `base_url` with the domestic endpoint, as sketched below.
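A minimal sketch of that override; the domestic compatible-mode URL shown here is an assumption, so confirm it against the DashScope documentation:
```python theme={null}
import os
from pipecat.services.qwen import QwenLLMService
llm = QwenLLMService(
    api_key=os.getenv("QWEN_API_KEY"),
    # Assumed mainland-China DashScope endpoint; verify before use.
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    settings=QwenLLMService.Settings(model="qwen-plus"),
)
```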
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
# SambaNova
Source: https://docs.pipecat.ai/api-reference/server/services/llm/sambanova
LLM service implementation using SambaNova's API with OpenAI-compatible interface
## Overview
`SambaNovaLLMService` provides access to SambaNova's language models through an OpenAI-compatible interface. It inherits from `OpenAILLMService` and supports streaming responses, function calling, and context management with SambaNova's high-performance inference platform.
Pipecat's API methods for SambaNova integration
Complete example with function calling
Official SambaNova API documentation and features
Access models and manage API keys
## Installation
To use SambaNova services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[sambanova]"
```
## Prerequisites
### SambaNova Account Setup
Before using SambaNova LLM services, you need:
1. **SambaNova Account**: Sign up at [SambaNova Cloud](https://cloud.sambanova.ai/?utm_source=pipecat\&utm_medium=external\&utm_campaign=cloud_signup)
2. **API Key**: Generate an API key from your account dashboard
3. **Model Selection**: Choose from available high-performance models
### Required Environment Variables
* `SAMBANOVA_API_KEY`: Your SambaNova API key for authentication
## Configuration
SambaNova API key for authentication.
*Deprecated in v0.0.105. Use
`settings=SambaNovaLLMService.Settings(model=...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
Base URL for SambaNova API endpoint.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `SambaNovaLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
This service uses the same settings as `OpenAILLMService`. See [OpenAI LLM Settings](/api-reference/server/services/llm/openai#settings) for the full parameter reference.
## Usage
### Basic Setup
```python theme={null}
import os
from pipecat.services.sambanova import SambaNovaLLMService
llm = SambaNovaLLMService(
api_key=os.getenv("SAMBANOVA_API_KEY"),
model="Llama-4-Maverick-17B-128E-Instruct",
)
```
### With Custom Settings
```python theme={null}
import os
from pipecat.services.sambanova import SambaNovaLLMService
llm = SambaNovaLLMService(
api_key=os.getenv("SAMBANOVA_API_KEY"),
settings=SambaNovaLLMService.Settings(
model="Llama-4-Maverick-17B-128E-Instruct",
temperature=0.7,
top_p=0.9,
max_tokens=1024,
),
)
```
## Notes
* SambaNova does not support `frequency_penalty`, `presence_penalty`, or `seed` parameters.
* SambaNova has custom handling for tool call indexing. The service includes compatibility logic for processing function calls from the SambaNova API.
* SambaNova is known for high-throughput inference on large language models.
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
# Sarvam
Source: https://docs.pipecat.ai/api-reference/server/services/llm/sarvam
Large Language Model services using Sarvam's OpenAI-compatible API
## Overview
`SarvamLLMService` provides chat completion capabilities using Sarvam's API with OpenAI-compatible interface. It supports streaming responses, function calling, and Sarvam-specific features like wiki grounding and configurable reasoning effort levels.
Pipecat's API methods for Sarvam integration
Function calling example with Sarvam
Official Sarvam documentation
Access models and manage API keys
## Installation
To use Sarvam LLM services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[sarvam]"
```
## Prerequisites
### Sarvam Account Setup
Before using Sarvam LLM services, you need:
1. **Sarvam Account**: Sign up at [Sarvam](https://sarvam.ai/)
2. **API Key**: Generate an API key from your account dashboard
3. **Model Selection**: Choose from available models (sarvam-30b, sarvam-105b, etc.)
### Required Environment Variables
* `SARVAM_API_KEY`: Your Sarvam API key for authentication
## Configuration
Sarvam API key used for both OpenAI auth and Sarvam subscription header.
Sarvam OpenAI-compatible base URL. Override if using a different endpoint.
Runtime-configurable model settings. See [Settings](#settings) below.
Additional HTTP headers to include in every request.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `SarvamLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ------------------- | ---------------------------------- | -------------- | ------------------------------------------------------------------------------------------------------------ |
| `model` | `str` | `"sarvam-30b"` | Sarvam model identifier. Supported models: `sarvam-30b`, `sarvam-30b-16k`, `sarvam-105b`, `sarvam-105b-32k`. |
| `wiki_grounding` | `bool` | `None` | Enable or disable wiki grounding feature. Sarvam-specific parameter. |
| `reasoning_effort` | `Literal["low", "medium", "high"]` | `None` | Set reasoning effort level. Sarvam-specific parameter. |
| `temperature` | `float` | `NOT_GIVEN` | Sampling temperature (0.0 to 2.0). Lower values are more focused, higher values are more creative. |
| `max_tokens` | `int` | `NOT_GIVEN` | Maximum tokens to generate. |
| `top_p` | `float` | `NOT_GIVEN` | Top-p (nucleus) sampling (0.0 to 1.0). Controls diversity of output. |
| `frequency_penalty` | `float` | `NOT_GIVEN` | Penalty for frequent tokens (-2.0 to 2.0). Positive values discourage repetition. |
| `presence_penalty` | `float` | `NOT_GIVEN` | Penalty for new topics (-2.0 to 2.0). Positive values encourage the model to talk about new topics. |
`NOT_GIVEN` values are omitted from the API request entirely, letting the
Sarvam API use its own defaults. This is different from `None`, which would be
sent explicitly.
## Usage
### Basic Setup
```python theme={null}
import os
from pipecat.services.sarvam import SarvamLLMService
llm = SarvamLLMService(
api_key=os.getenv("SARVAM_API_KEY"),
)
```
### With Custom Settings
```python theme={null}
import os
from pipecat.services.sarvam import SarvamLLMService
llm = SarvamLLMService(
api_key=os.getenv("SARVAM_API_KEY"),
settings=SarvamLLMService.Settings(
model="sarvam-105b",
temperature=0.7,
max_tokens=1000,
wiki_grounding=True,
reasoning_effort="high",
),
)
```
### Updating Settings at Runtime
Model settings can be changed mid-conversation using `LLMUpdateSettingsFrame`:
```python theme={null}
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.sarvam.llm import SarvamLLMSettings
await task.queue_frame(
LLMUpdateSettingsFrame(
delta=SarvamLLMSettings(
temperature=0.3,
reasoning_effort="medium",
)
)
)
```
## Notes
* **OpenAI Compatibility**: Sarvam's API is OpenAI-compatible, allowing use of familiar patterns and parameters.
* **Sarvam-Specific Features**: The `wiki_grounding` and `reasoning_effort` parameters are unique to Sarvam and provide additional control over model behavior.
* **Function Calling**: Supports OpenAI-style tool/function calling format. When using `tool_choice`, you must provide a non-empty `tools` list. See the sketch after this list for registering a handler.
* **Unsupported Parameters**: Some OpenAI parameters are not supported by Sarvam's API and are automatically removed from requests: `stream_options`, `max_completion_tokens`, `service_tier`.
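A minimal sketch of registering a handler, reusing the `@llm.function` decorator pattern shown in the AWS Nova Sonic example later in this reference; the `get_weather` tool itself is hypothetical, and its schema must be declared separately in your tools configuration:
```python theme={null}
import os
from pipecat.services.sarvam import SarvamLLMService
llm = SarvamLLMService(api_key=os.getenv("SARVAM_API_KEY"))
# Hypothetical handler; "get_weather" must also be declared in your tools schema.
@llm.function("get_weather")
async def get_weather(function_name, tool_call_id, args, llm, context, result_callback):
    location = args.get("location", "unknown")
    await result_callback({"temperature": 22, "condition": "sunny", "location": location})
```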
## Event Handlers
`SarvamLLMService` supports the following event handlers, inherited from [LLMService](/api-reference/server/events/service-events):
| Event | Description |
| --------------------------- | ----------------------------------------------------------------------- |
| `on_completion_timeout` | Called when an LLM completion request times out |
| `on_function_calls_started` | Called when function calls are received and execution is about to start |
```python theme={null}
@llm.event_handler("on_completion_timeout")
async def on_completion_timeout(service):
print("LLM completion timed out")
```
# Together AI
Source: https://docs.pipecat.ai/api-reference/server/services/llm/together
LLM service implementation using Together AI's API with OpenAI-compatible interface
## Overview
`TogetherLLMService` provides access to Together AI's language models, including Meta's Llama 3.1 and 3.2 models, through an OpenAI-compatible interface. It inherits from `OpenAILLMService` and supports streaming responses, function calling, and context management with optimized open-source model hosting.
Pipecat's API methods for Together AI integration
Complete example with function calling
Official Together AI API documentation and features
Access open-source models and manage API keys
## Installation
To use Together AI services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[together]"
```
## Prerequisites
### Together AI Account Setup
Before using Together AI LLM services, you need:
1. **Together AI Account**: Sign up at [Together AI](https://together.ai/)
2. **API Key**: Generate an API key from your account dashboard
3. **Model Selection**: Choose from available open-source models (Llama, Mistral, etc.)
### Required Environment Variables
* `TOGETHER_API_KEY`: Your Together AI API key for authentication
## Configuration
Together AI API key for authentication.
Base URL for Together AI API endpoint.
*Deprecated in v0.0.105. Use `settings=TogetherLLMService.Settings(model=...)`
instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `TogetherLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
This service uses the same settings as `OpenAILLMService`. See [OpenAI LLM Settings](/api-reference/server/services/llm/openai#settings) for the full parameter reference.
## Usage
### Basic Setup
```python theme={null}
import os
from pipecat.services.together import TogetherLLMService
llm = TogetherLLMService(
api_key=os.getenv("TOGETHER_API_KEY"),
model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
)
```
### With Custom Settings
```python theme={null}
import os
from pipecat.services.together import TogetherLLMService
llm = TogetherLLMService(
api_key=os.getenv("TOGETHER_API_KEY"),
settings=TogetherLLMService.Settings(
model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
temperature=0.7,
top_p=0.9,
max_completion_tokens=1024,
),
)
```
## Notes
* Together AI hosts a wide variety of open-source models. Model identifiers use the `organization/model-name` format.
* Together AI fully supports the OpenAI-compatible parameter set inherited from `OpenAILLMService`.
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
# Mem0
Source: https://docs.pipecat.ai/api-reference/server/services/memory/mem0
Long-term conversation memory service powered by Mem0
## Overview
`Mem0MemoryService` provides long-term memory capabilities for conversational agents by integrating with Mem0's API. It automatically stores conversation history and retrieves relevant past context based on the current conversation, enhancing LLM responses with persistent memory across sessions.
Pipecat's API methods for Mem0 memory integration
Browse examples using Mem0 memory
Official Mem0 API documentation and guides
Access memory services and manage API keys
## Installation
To use Mem0 memory services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[mem0]"
```
## Prerequisites
### Mem0 Account Setup
Before using Mem0 memory services, you need:
1. **Mem0 Account**: Sign up at [Mem0 Platform](https://mem0.ai)
2. **API Key**: Generate an API key from your account dashboard
3. **Host Configuration**: Set up your Mem0 host endpoint
4. **User Management**: Configure user IDs for memory association
### Required Environment Variables
* `MEM0_API_KEY`: Your Mem0 API key for authentication
### Configuration Options
* **User ID**: Unique identifier for associating memories with specific users
* **Agent ID**: Identifier for the agent using the memory service
* **Run ID**: Identifier for specific conversation sessions
* **Memory Retrieval**: Configure how past context is retrieved and used
### Key Features
* **Persistent Memory**: Long-term conversation history across sessions
* **Context Retrieval**: Automatic retrieval of relevant past conversations
* **User Association**: Memory tied to specific users for personalization
* **Session Management**: Track conversations across different runs and agents
## Configuration
The API key for accessing Mem0's cloud API.
Local configuration for Mem0 client as an alternative to the cloud API.
The user ID to associate with memories in Mem0. At least one of `user_id`,
`agent_id`, or `run_id` must be provided.
The agent ID to associate with memories in Mem0.
The run ID to associate with memories in Mem0.
Configuration parameters for memory retrieval and storage. See
[InputParams](#inputparams) below.
The host of the Mem0 server.
### InputParams
| Parameter | Type | Default | Description |
| ----------------------- | ------- | --------------------------------------------------- | -------------------------------------------------------------------------------- |
| `search_limit` | `int` | `10` | Maximum number of memories to retrieve per query (min: 1). |
| `search_threshold` | `float` | `0.1` | Minimum similarity threshold for memory retrieval (0.0-1.0). |
| `api_version` | `str` | `"v2"` | API version to use for Mem0 client operations. |
| `system_prompt` | `str` | `"Based on previous conversations, I recall: \n\n"` | Prefix text for memory context messages. |
| `add_as_system_message` | `bool` | `True` | Whether to add memories as system messages. When `False`, adds as user messages. |
| `position` | `int` | `1` | Position to insert memory messages in context. |
## Usage
### Cloud API Setup
```python theme={null}
import os
from pipecat.services.mem0 import Mem0MemoryService
memory = Mem0MemoryService(
api_key=os.getenv("MEM0_API_KEY"),
user_id="user-123",
)
```
### With Custom Parameters
```python theme={null}
import os
from pipecat.services.mem0 import Mem0MemoryService
memory = Mem0MemoryService(
api_key=os.getenv("MEM0_API_KEY"),
user_id="user-123",
agent_id="assistant-1",
params=Mem0MemoryService.InputParams(
search_limit=5,
search_threshold=0.3,
add_as_system_message=True,
),
)
```
### Local Configuration
```python theme={null}
memory = Mem0MemoryService(
local_config={
"vector_store": {
"provider": "chroma",
"config": {"collection_name": "memories"},
},
},
user_id="user-123",
)
```
### Retrieving Memories Outside the Pipeline
The `get_memories()` method allows you to access stored memories outside the normal pipeline flow, such as when creating a personalized greeting at connection time:
```python theme={null}
# Get all stored memories for the configured user/agent/run IDs
memories = await memory.get_memories()
# Create a personalized greeting
if memories:
greeting = "Hello! Based on our previous conversations, I remember: "
for mem in memories[:3]:
greeting += f"{mem['memory']} "
else:
greeting = "Hello! It's nice to meet you."
```
## Notes
* **At least one ID required**: You must provide at least one of `user_id`, `agent_id`, or `run_id`. A `ValueError` is raised if none are provided.
* **Cloud vs local**: Use `api_key` for Mem0's cloud API, or `local_config` for a self-hosted Mem0 instance using `Memory.from_config()`.
* **Pipeline placement**: Place the `Mem0MemoryService` before your LLM service in the pipeline (see the sketch after this list). It intercepts `LLMContextFrame`, `OpenAILLMContextFrame`, and `LLMMessagesFrame` to enhance context with relevant memories before passing them downstream.
* **Non-blocking operation**: All Mem0 API calls (storage, retrieval, search) run in background threads to avoid blocking the event loop. Message storage is fire-and-forget, so it doesn't delay downstream processing.
* **Message role filtering**: Only messages with `user` or `assistant` roles are stored in Mem0. Messages with other roles (such as `system` or `developer`) are automatically filtered out, as the Mem0 API does not accept them.
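A minimal sketch of that ordering; the `transport`, `stt`, `tts`, `llm`, and `context_aggregator` objects are assumed to be created elsewhere:
```python theme={null}
from pipecat.pipeline.pipeline import Pipeline
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    memory,  # Mem0MemoryService enriches the context before the LLM sees it
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant(),
])
```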
# AWS Nova Sonic
Source: https://docs.pipecat.ai/api-reference/server/services/s2s/aws
Real-time speech-to-speech service implementation using AWS Nova Sonic
## Overview
`AWSNovaSonicLLMService` enables natural, real-time conversations with AWS Nova Sonic. It provides built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences with bidirectional audio streaming, text generation, and function calling capabilities.
Pipecat's API methods for AWS Nova Sonic integration
Complete AWS Nova Sonic conversation example
Official AWS Bedrock and Nova Sonic documentation
Access AWS Bedrock and manage Nova Sonic models
## Installation
To use AWS Nova Sonic services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[aws-nova-sonic]"
```
## Prerequisites
### AWS Account Setup
Before using AWS Nova Sonic services, you need:
1. **AWS Account**: Set up at [AWS Console](https://console.aws.amazon.com/)
2. **Bedrock Access**: Enable AWS Bedrock service in your region
3. **Model Access**: Request access to Nova Sonic models in Bedrock
4. **IAM Credentials**: Configure AWS access keys with Bedrock permissions
### Required Environment Variables
* `AWS_SECRET_ACCESS_KEY`: Your AWS secret access key
* `AWS_ACCESS_KEY_ID`: Your AWS access key ID
* `AWS_REGION`: AWS region where Bedrock is available
### Key Features
* **Real-time Speech-to-Speech**: Direct audio input to audio output processing
* **Built-in Transcription**: Automatic speech-to-text with real-time streaming
* **Voice Activity Detection**: Automatic detection of speech start/stop
* **Function Calling**: Support for external function and API integration
* **Multiple Voices**: Choose from matthew, tiffany, and amy voice options
## Configuration
### AWSNovaSonicLLMService
AWS secret access key for authentication.
AWS access key ID for authentication.
AWS session token for temporary credentials (e.g., when using AWS STS).
AWS region where the service is hosted. Supported regions for Nova 2 Sonic
(default): `"us-east-1"`, `"us-west-2"`, `"ap-northeast-1"`. Supported regions
for Nova Sonic (older model): `"us-east-1"`, `"ap-northeast-1"`.
Model identifier. Use `"amazon.nova-2-sonic-v1:0"` for the latest model or
`"amazon.nova-sonic-v1:0"` for the older model.
*Deprecated in v0.0.105. Use `settings=AWSNovaSonicLLMService.Settings(model=...)` instead.*
Voice ID for speech synthesis. Some voices are designed for specific
languages. See [AWS Nova 2 Sonic voice
support](https://docs.aws.amazon.com/nova/latest/nova2-userguide/sonic-language-support.html)
for available voices.
*Deprecated in v0.0.105. Use `settings=AWSNovaSonicLLMService.Settings(voice=...)` instead.*
Model parameters for audio configuration and inference. See [Params](#params)
below.
*Deprecated in v0.0.105. Use `settings=AWSNovaSonicLLMService.Settings(...)` for inference settings and `audio_config=AudioConfig(...)` for audio configuration.*
Audio configuration (sample rates, sample sizes, channel counts). If not
provided, defaults are used (16kHz input, 24kHz output, 16-bit, mono). See
[AudioConfig](#audioconfig) below.
Runtime-configurable settings. See [Settings](#settings) below.
System-level instruction for the model.
*Deprecated in v0.0.105. Use `settings=AWSNovaSonicLLMService.Settings(system_instruction=...)` instead.*
Available tools/functions for the model to use.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `AWSNovaSonicLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ------------------------- | ------------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `model` | `str` | `NOT_GIVEN` | Model identifier. *(Inherited from base settings.)* |
| `system_instruction` | `str` | `NOT_GIVEN` | System instruction/prompt. *(Inherited from base settings.)* |
| `temperature` | `float` | `NOT_GIVEN` | Sampling temperature for text generation. *(Inherited from base settings.)* |
| `max_tokens` | `int` | `NOT_GIVEN` | Maximum number of tokens to generate. *(Inherited from base settings.)* |
| `top_p` | `float` | `NOT_GIVEN` | Nucleus sampling parameter. *(Inherited from base settings.)* |
| `voice` | `str` | `NOT_GIVEN` | Voice ID for speech synthesis. |
| `endpointing_sensitivity` | `str \| None` | `NOT_GIVEN` | Controls how quickly Nova Sonic decides the user has stopped speaking. Values: `"LOW"`, `"MEDIUM"`, or `"HIGH"`. Only supported with Nova 2 Sonic (default model). |
`NOT_GIVEN` values are omitted, letting the service use its own defaults (e.g.
`"amazon.nova-2-sonic-v1:0"` for model, `"matthew"` for voice, `0.7` for
temperature, `1024` for max\_tokens). Only parameters that are explicitly set
are included.
### AudioConfig
Audio configuration passed via the `audio_config` constructor argument.
| Parameter | Type | Default | Description |
| ---------------------- | ----- | ------- | --------------------------------- |
| `input_sample_rate` | `int` | `16000` | Audio input sample rate in Hz. |
| `input_sample_size` | `int` | `16` | Audio input sample size in bits. |
| `input_channel_count` | `int` | `1` | Number of input audio channels. |
| `output_sample_rate` | `int` | `24000` | Audio output sample rate in Hz. |
| `output_sample_size` | `int` | `16` | Audio output sample size in bits. |
| `output_channel_count` | `int` | `1` | Number of output audio channels. |
## Usage
### Basic Setup
```python theme={null}
import os
from pipecat.services.aws.nova_sonic import AWSNovaSonicLLMService
llm = AWSNovaSonicLLMService(
secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
region=os.getenv("AWS_REGION"),
settings=AWSNovaSonicLLMService.Settings(
voice="matthew",
system_instruction="You are a helpful assistant.",
),
)
```
### With Settings
```python theme={null}
import os
from pipecat.services.aws.nova_sonic import AWSNovaSonicLLMService, AudioConfig
llm = AWSNovaSonicLLMService(
secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
region="us-east-1",
audio_config=AudioConfig(
input_sample_rate=16000,
output_sample_rate=24000,
),
settings=AWSNovaSonicLLMService.Settings(
model="amazon.nova-2-sonic-v1:0",
voice="tiffany",
system_instruction="You are a helpful assistant.",
temperature=0.5,
max_tokens=2048,
endpointing_sensitivity="MEDIUM",
),
)
```
### With Function Calling
```python theme={null}
import os
from pipecat.services.aws.nova_sonic import AWSNovaSonicLLMService
llm = AWSNovaSonicLLMService(
secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
region="us-east-1",
settings=AWSNovaSonicLLMService.Settings(
voice="matthew",
system_instruction="You are a helpful assistant that can check the weather.",
),
tools=tools, # ToolsSchema instance
)
@llm.function("get_weather")
async def get_weather(function_name, tool_call_id, args, llm, context, result_callback):
location = args.get("location", "unknown")
await result_callback({"temperature": 72, "condition": "sunny", "location": location})
```
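### Updating Settings at Runtime
Settings can be changed mid-conversation using `LLMUpdateSettingsFrame`, following the same pattern as the text LLM services (a minimal sketch; `task` is your pipeline task):
```python theme={null}
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.aws.nova_sonic import AWSNovaSonicLLMService
await task.queue_frame(
    LLMUpdateSettingsFrame(
        delta=AWSNovaSonicLLMService.Settings(
            temperature=0.3,
            max_tokens=1024,
        )
    )
)
```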
The `Params` / `params=` pattern is deprecated as of v0.0.105. Use `Settings`
/ `settings=` for inference settings and `AudioConfig` / `audio_config=` for
audio configuration instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **Model versions**: Nova 2 Sonic (`amazon.nova-2-sonic-v1:0`) is the default and recommended model. The older Nova Sonic (`amazon.nova-sonic-v1:0`) has fewer features and requires an assistant response trigger mechanism.
* **Endpointing sensitivity**: Only supported with Nova 2 Sonic. Controls how quickly the model decides the user has stopped speaking; `"HIGH"` causes the model to respond most quickly.
* **Transcription frames**: User speech transcription frames are always emitted upstream. Assistant text transcripts are delivered in real-time using speculative text events, providing text synchronized with audio output for responsive client UIs.
* **Connection resilience**: If a connection error occurs while the service wants to stay connected, it automatically resets the conversation and reconnects.
* **System instruction precedence**: The `system_instruction` from service settings takes precedence over an initial system message in the LLM context. A warning is logged when both are set. Tools provided in the LLM context take precedence over those provided at initialization time.
* **Audio format**: Uses LPCM (Linear PCM) audio format for both input and output. Input defaults to 16kHz and output defaults to 24kHz.
# Gemini Live
Source: https://docs.pipecat.ai/api-reference/server/services/s2s/gemini-live
A real-time, multimodal conversational AI service powered by Google's Gemini
## Overview
`GeminiLiveLLMService` enables natural, real-time conversations with Google's Gemini model. It provides built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences with multimodal capabilities including audio, video, and text processing.
Want to start building? Check out our [Gemini Live
Guide](/pipecat/features/gemini-live).
Pipecat's API methods for Gemini Live integration
Complete Gemini Live function calling example
Official Google Gemini Live API documentation
Gemini Live available models
## Installation
To use Gemini Live services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[google]"
```
## Prerequisites
### Google AI Setup
Before using Gemini Live services, you need:
1. **Google Account**: Set up at [Google AI Studio](https://aistudio.google.com/)
2. **API Key**: Generate a Gemini API key from AI Studio
3. **Model Access**: Ensure access to Gemini Live models
4. **Multimodal Configuration**: Set up audio, video, and text modalities
### Required Environment Variables
* `GOOGLE_API_KEY`: Your Google Gemini API key for authentication
### Key Features
* **Multimodal Processing**: Handle audio, video, and text inputs simultaneously
* **Real-time Streaming**: Low-latency audio and video processing
* **Voice Activity Detection**: Automatic speech detection and turn management
* **Function Calling**: Advanced tool integration and API calling capabilities
* **Context Management**: Intelligent conversation history and system instruction handling
## Configuration
### GeminiLiveLLMService
Google AI API key for authentication.
Gemini model identifier to use.
*Deprecated in v0.0.105. Use `settings=GeminiLiveLLMService.Settings(model=...)` instead.*
TTS voice identifier for audio responses.
*Deprecated in v0.0.105. Use `settings=GeminiLiveLLMService.Settings(voice=...)` instead.*
System prompt for the model. Can also be provided via the LLM context.
Tools/functions available to the model. Can also be provided via the LLM
context.
Runtime-configurable generation and session settings. See
[InputParams](#inputparams) below.
*Deprecated in v0.0.105. Use `settings=GeminiLiveLLMService.Settings(...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
Whether to start with audio input paused.
Whether to start with video input paused.
Whether to generate a response when context is first set. Set to `False` to
wait for user input before the model responds.
HTTP options for the Google API client. Use this to set API version (e.g.
`HttpOptions(api_version="v1alpha")`) or other request options.
Base URL for the Gemini File API.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `GeminiLiveLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------------------------- | ---------------------------------------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------ |
| `model` | `str` | `NOT_GIVEN` | Model identifier. *(Inherited from base settings.)* |
| `system_instruction` | `str` | `NOT_GIVEN` | System instruction/prompt. *(Inherited from base settings.)* |
| `temperature` | `float` | `NOT_GIVEN` | Sampling temperature (0.0-2.0). *(Inherited from base settings.)* |
| `max_tokens` | `int` | `NOT_GIVEN` | Maximum tokens to generate. *(Inherited from base settings.)* |
| `top_k` | `int` | `NOT_GIVEN` | Top-k sampling parameter. *(Inherited from base settings.)* |
| `top_p` | `float` | `NOT_GIVEN` | Top-p (nucleus) sampling parameter (0.0-1.0). *(Inherited from base settings.)* |
| `frequency_penalty` | `float` | `NOT_GIVEN` | Frequency penalty for generation (0.0-2.0). *(Inherited from base settings.)* |
| `presence_penalty` | `float` | `NOT_GIVEN` | Presence penalty for generation (0.0-2.0). *(Inherited from base settings.)* |
| `voice` | `str` | `NOT_GIVEN` | TTS voice identifier (e.g. `"Charon"`, `"Puck"`). |
| `modalities` | `GeminiModalities` | `NOT_GIVEN` | Response modality: `GeminiModalities.AUDIO` or `GeminiModalities.TEXT`. *Note: TEXT modality may not be supported by recent models.* |
| `language` | `Language \| str` | `NOT_GIVEN` | Language for generation and transcription. |
| `media_resolution` | `GeminiMediaResolution` | `NOT_GIVEN` | Media resolution for video input: `UNSPECIFIED`, `LOW`, `MEDIUM`, or `HIGH`. |
| `vad` | `GeminiVADParams` | `NOT_GIVEN` | Voice activity detection parameters. See [GeminiVADParams](#geminivadparams) below. |
| `context_window_compression` | `ContextWindowCompressionParams \| dict` | `NOT_GIVEN` | Context window compression settings. |
| `thinking` | `ThinkingConfig \| dict` | `NOT_GIVEN` | Thinking/reasoning configuration. Requires a model that supports it. |
| `enable_affective_dialog` | `bool` | `NOT_GIVEN` | Enable affective dialog for expression and tone adaptation. |
| `proactivity` | `ProactivityConfig \| dict` | `NOT_GIVEN` | Proactivity settings for model behavior. |
`NOT_GIVEN` values are omitted, letting the service use its own defaults (e.g.
`"models/gemini-2.5-flash-native-audio-preview-12-2025"` for model, `"Charon"`
for voice, `4096` for `max_tokens`). Only parameters that are explicitly set are
included.
### GeminiVADParams
Voice activity detection configuration passed via the `vad` Settings field:
| Parameter | Type | Default | Description |
| --------------------- | ------------------ | ------- | --------------------------------------------------------------------------------------------------------------- |
| `disabled`            | `bool`             | `None`  | Whether to disable server-side VAD. `None`/`False` keeps server-side VAD enabled (default); `True` disables it so you can use a local VAD analyzer instead. |
| `start_sensitivity` | `StartSensitivity` | `None` | Sensitivity for speech start detection. |
| `end_sensitivity` | `EndSensitivity` | `None` | Sensitivity for speech end detection. |
| `prefix_padding_ms` | `int` | `None` | Padding before speech starts in milliseconds. |
| `silence_duration_ms` | `int` | `None` | Silence duration threshold in milliseconds to detect speech end. |
### ContextWindowCompressionParams
| Parameter | Type | Default | Description |
| ---------------- | ------ | ------- | ------------------------------------------------------------------------------------ |
| `enabled` | `bool` | `False` | Whether context window compression is enabled. |
| `trigger_tokens` | `int` | `None` | Token count to trigger compression. `None` uses the default (80% of context window). |
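The *With Settings* example below passes context window compression as a plain dict; as a minimal sketch, the typed form looks like this, assuming the class accepts the two fields above as keyword arguments (the `trigger_tokens` value is illustrative):
```python theme={null}
from pipecat.services.google.gemini_live import (
    ContextWindowCompressionParams,
    GeminiLiveLLMService,
)
# Typed equivalent of context_window_compression={"enabled": True, "trigger_tokens": 8000}
compression = ContextWindowCompressionParams(enabled=True, trigger_tokens=8000)
settings = GeminiLiveLLMService.Settings(context_window_compression=compression)
```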
## Usage
### Basic Setup
```python theme={null}
import os
from pipecat.services.google.gemini_live import GeminiLiveLLMService
llm = GeminiLiveLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
settings=GeminiLiveLLMService.Settings(
voice="Charon",
system_instruction="You are a helpful assistant.",
),
)
```
### With Settings
```python theme={null}
from pipecat.services.google.gemini_live import (
GeminiLiveLLMService,
GeminiVADParams,
ContextWindowCompressionParams,
)
llm = GeminiLiveLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
settings=GeminiLiveLLMService.Settings(
model="models/gemini-2.5-flash-native-audio-preview-12-2025",
system_instruction="You are a helpful assistant.",
voice="Puck",
temperature=0.7,
max_tokens=2048,
language="en-US",
vad=GeminiVADParams(
silence_duration_ms=500,
),
context_window_compression={"enabled": True},
),
)
```
### With Local VAD
```python theme={null}
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.services.google.gemini_live import (
GeminiLiveLLMService,
GeminiVADParams,
)
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
llm = GeminiLiveLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
settings=GeminiLiveLLMService.Settings(
voice="Charon",
vad=GeminiVADParams(disabled=True), # Disable server-side VAD
),
)
# Configure local VAD in your aggregator
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
vad_analyzer=SileroVADAnalyzer(),
),
)
```
### Text-Only Mode
TEXT modality may not be supported by recent Gemini Live models. The service
will log a warning if you configure `modalities=GeminiModalities.TEXT`.
```python theme={null}
from pipecat.services.google.gemini_live import (
GeminiLiveLLMService,
GeminiModalities,
)
llm = GeminiLiveLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
settings=GeminiLiveLLMService.Settings(
system_instruction="You are a helpful assistant.",
modalities=GeminiModalities.TEXT,
),
)
```
### With Thinking Enabled
```python theme={null}
from google.genai.types import ThinkingConfig
llm = GeminiLiveLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
settings=GeminiLiveLLMService.Settings(
model="models/gemini-2.5-flash-native-audio-preview-12-2025",
system_instruction="You are a helpful assistant.",
thinking=ThinkingConfig(include_thoughts=True),
),
)
```
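### Updating Settings at Runtime
A minimal sketch, mirroring the runtime-update pattern shown for the other realtime services on this page and assuming a running `PipelineTask` named `task`:
```python theme={null}
from pipecat.frames.frames import LLMUpdateSettingsFrame
# Queue a partial Settings delta; only the fields set here are changed
await task.queue_frame(
    LLMUpdateSettingsFrame(
        delta=GeminiLiveLLMService.Settings(
            temperature=0.9,
        )
    )
)
```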
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **Model support**: The service supports both Gemini 2.5 and Gemini 3.x models. The service automatically detects and handles model-specific behavior.
* **System instruction precedence**: The `system_instruction` from service settings takes precedence over an initial system message in the LLM context. A warning is logged when both are set.
* **VAD modes**: By default, Gemini Live uses server-side VAD for detecting when the user starts and stops speaking. To use local VAD (e.g., Silero), set `vad=GeminiVADParams(disabled=True)` and configure an external VAD analyzer in your `LLMUserAggregatorParams`. The service will automatically send activity signals to the Gemini API when local VAD detects speech.
* **Tools precedence**: Tools provided in the LLM context override tools provided at init time.
* **Transcription aggregation**: Gemini Live sends user transcriptions in small chunks. The service aggregates them into complete sentences using end-of-sentence detection with a 0.5-second timeout fallback.
* **Session resumption**: The service automatically handles session resumption on reconnection using session resumption handles.
* **Connection resilience**: The service will attempt up to 3 consecutive reconnections before treating a connection failure as fatal.
* **Video frame rate**: Video frames are throttled to a maximum of one per second.
* **Affective dialog and proactivity**: These features require both a supporting model and API version (`v1alpha`).
# Gemini Live Vertex AI
Source: https://docs.pipecat.ai/api-reference/server/services/s2s/gemini-live-vertex
A real-time, multimodal conversational AI service powered by Google's Gemini via Vertex AI
## Overview
`GeminiLiveVertexLLMService` enables natural, real-time conversations with Google's Gemini model through Vertex AI. It provides built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences with multimodal capabilities including audio, video, and text processing.
Want to start building? Check out our [Gemini Live
Guide](/pipecat/features/gemini-live) for general concepts, then follow the
Vertex AI-specific setup below.
Pipecat's API methods for Gemini Live Vertex AI integration
Complete Gemini Live Vertex AI function calling example
Official Vertex AI Gemini Live API documentation
Gemini Live available models
## Installation
To use Gemini Live Vertex AI services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[google]"
```
## Prerequisites
### Google Cloud Setup
Before using Gemini Live Vertex AI services, you need:
1. **Google Cloud Project**: Set up a project in the [Google Cloud Console](https://console.cloud.google.com/)
2. **Vertex AI API**: Enable the Vertex AI API in your project
3. **Service Account**: Create a service account with `roles/aiplatform.user` and `roles/ml.developer` permissions
4. **Authentication**: Set up service account credentials or Application Default Credentials
### Required Environment Variables
* `GOOGLE_VERTEX_TEST_CREDENTIALS`: JSON string of service account credentials (optional if using ADC)
* `GOOGLE_CLOUD_PROJECT_ID`: Your Google Cloud project ID
* `GOOGLE_CLOUD_LOCATION`: Vertex AI region (e.g., "us-east4")
### Key Features
* **Enterprise Authentication**: Secure service account-based authentication
* **Multimodal Processing**: Handle audio, video, and text inputs simultaneously
* **Real-time Streaming**: Low-latency audio and video processing
* **Voice Activity Detection**: Automatic speech detection and turn management
* **Function Calling**: Advanced tool integration and API calling capabilities
* **Context Management**: Intelligent conversation history and system instruction handling
## Configuration
### GeminiLiveVertexLLMService
This service extends `GeminiLiveLLMService` with Vertex AI authentication. It accepts all the same parameters as the [Gemini Live](/api-reference/server/services/s2s/gemini-live) service, with these differences:
JSON string of Google service account credentials. If not provided, falls back
to `credentials_path` or Application Default Credentials (ADC).
Path to a service account JSON file. Used if `credentials` is not provided.
GCP region for the Vertex AI endpoint (e.g., `"us-east4"`).
Google Cloud project ID.
Vertex AI model identifier to use.
*Deprecated in v0.0.105. Use `settings=GeminiLiveVertexLLMService.Settings(model=...)` instead.*
TTS voice identifier for audio responses.
*Deprecated in v0.0.105. Use `settings=GeminiLiveVertexLLMService.Settings(voice=...)` instead.*
System prompt for the model. Can also be provided via the LLM context.
Tools/functions available to the model. Can also be provided via the LLM
context.
Runtime-configurable generation and session settings. See the [Gemini Live
InputParams](/api-reference/server/services/s2s/gemini-live#inputparams) for details.
*Deprecated in v0.0.105. Use `settings=GeminiLiveVertexLLMService.Settings(...)` instead.*
Runtime-configurable settings. See the [Gemini Live
Settings](/api-reference/server/services/s2s/gemini-live#settings) for the full reference.
Whether to start with audio input paused.
Whether to start with video input paused.
Whether to generate a response when context is first set. Set to `False` to
wait for user input before the model responds.
HTTP options for the Google API client.
### Settings
The Vertex AI variant uses the same Settings as the base Gemini Live service. See [Gemini Live Settings](/api-reference/server/services/s2s/gemini-live#settings) for the full reference.
## Usage
### Basic Setup with Service Account Credentials
```python theme={null}
import os
from pipecat.services.google.gemini_live import GeminiLiveVertexLLMService
llm = GeminiLiveVertexLLMService(
credentials=os.getenv("GOOGLE_VERTEX_TEST_CREDENTIALS"),
project_id=os.getenv("GOOGLE_CLOUD_PROJECT_ID"),
location=os.getenv("GOOGLE_CLOUD_LOCATION"),
settings=GeminiLiveVertexLLMService.Settings(
voice="Charon",
system_instruction="You are a helpful assistant.",
),
)
```
### With Credentials File
```python theme={null}
llm = GeminiLiveVertexLLMService(
credentials_path="/path/to/service-account.json",
project_id="my-gcp-project",
location="us-east4",
settings=GeminiLiveVertexLLMService.Settings(
voice="Puck",
system_instruction="You are a helpful assistant.",
),
)
```
### Using Application Default Credentials (ADC)
```python theme={null}
# When running on GCP or with gcloud auth application-default login
llm = GeminiLiveVertexLLMService(
project_id="my-gcp-project",
location="us-east4",
settings=GeminiLiveVertexLLMService.Settings(
system_instruction="You are a helpful assistant.",
),
)
```
### With Settings
```python theme={null}
import os
from pipecat.services.google.gemini_live import (
    GeminiLiveVertexLLMService,
    GeminiVADParams,
)
llm = GeminiLiveVertexLLMService(
credentials=os.getenv("GOOGLE_VERTEX_TEST_CREDENTIALS"),
project_id=os.getenv("GOOGLE_CLOUD_PROJECT_ID"),
location="us-east4",
settings=GeminiLiveVertexLLMService.Settings(
model="google/gemini-live-2.5-flash-native-audio",
voice="Charon",
system_instruction="You are a helpful assistant.",
temperature=0.7,
max_tokens=2048,
vad=GeminiVADParams(
silence_duration_ms=500,
),
),
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **No `api_key` parameter**: Unlike the base `GeminiLiveLLMService`, Vertex AI uses service account credentials or ADC for authentication. Passing `api_key` will raise a `ValueError`.
* **Authentication priority**: The service tries credentials in this order: (1) `credentials` JSON string, (2) `credentials_path` file, (3) Application Default Credentials (ADC).
* **File API not supported**: The Gemini File API is not available through Vertex AI. Use Google Cloud Storage for file handling instead.
* **Model naming**: Vertex AI uses different model identifiers (e.g., `"google/gemini-live-2.5-flash-native-audio"`) compared to the Google AI variant.
* **All other features** (VAD, context compression, thinking, function calling, etc.) work identically to the base [Gemini Live](/api-reference/server/services/s2s/gemini-live) service.
# Grok Realtime
Source: https://docs.pipecat.ai/api-reference/server/services/s2s/grok
Real-time speech-to-speech service implementation using xAI's Grok Voice Agent API
## Overview
`GrokRealtimeLLMService` provides real-time, multimodal conversation capabilities using xAI's Grok Voice Agent API. It supports speech-to-speech interactions with integrated LLM processing, function calling, and advanced conversation management with low-latency response times.
Pipecat's API methods for Grok Realtime integration
Complete Grok Realtime conversation example
Official xAI Grok Voice Agent API documentation
Access Grok models and manage API keys
## Installation
To use Grok Realtime services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[grok]"
```
## Prerequisites
### xAI Account Setup
Before using Grok Realtime services, you need:
1. **xAI Account**: Sign up at [xAI Console](https://console.x.ai/)
2. **API Key**: Generate a Grok API key from your account dashboard
3. **Model Access**: Ensure access to Grok Voice Agent models
4. **Usage Limits**: Configure appropriate usage limits and billing
### Required Environment Variables
* `XAI_API_KEY`: Your xAI API key for authentication
### Key Features
* **Real-time Speech-to-Speech**: Direct audio processing with low latency
* **Multilingual Support**: Support for multiple languages
* **Voice Activity Detection**: Server-side VAD for automatic speech detection
* **Function Calling**: Seamless support for external functions and tool integration
* **Multiple Voice Options**: Various voice personalities available
* **WebSocket Support**: Real-time bidirectional audio streaming
## Configuration
### GrokRealtimeLLMService
xAI API key for authentication.
WebSocket base URL for the Grok Realtime API. Override for custom deployments.
Configuration properties for the realtime session. If `None`, uses default
`SessionProperties` with voice `"Ara"` and server-side VAD enabled. See
[SessionProperties](#sessionproperties) below.
*Deprecated in v0.0.105. Use `settings=GrokRealtimeLLMService.Settings(session_properties=...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
Whether to start with audio input paused.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `GrokRealtimeLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| -------------------- | ------------------- | ----------- | --------------------------------------------------------------- |
| `model` | `str` | `NOT_GIVEN` | Model identifier. *(Inherited from base settings.)* |
| `system_instruction` | `str` | `NOT_GIVEN` | System instruction/prompt. *(Inherited from base settings.)* |
| `session_properties` | `SessionProperties` | `NOT_GIVEN` | Session-level configuration (voice, audio config, tools, etc.). |
`NOT_GIVEN` values are omitted, letting the service use its own defaults. Only
parameters that are explicitly set are included.
### SessionProperties
| Parameter | Type | Default | Description |
| ---------------- | -------------------------------------------- | ---------------------------------- | ------------------------------------------------------------------------------------- |
| `instructions` | `str` | `None` | System instructions for the assistant. |
| `voice` | `Literal["Ara", "Rex", "Sal", "Eve", "Leo"]` | `"Ara"` | Voice the model uses to respond. |
| `turn_detection` | `TurnDetection` | `TurnDetection(type="server_vad")` | Turn detection configuration. Set to `None` for manual turn detection. |
| `audio` | `AudioConfiguration` | `None` | Configuration for input and output audio formats. |
| `tools` | `List[GrokTool]` | `None` | Available tools: `web_search`, `x_search`, `file_search`, or custom `function` tools. |
### AudioConfiguration
The `audio` field in `SessionProperties` accepts an `AudioConfiguration` with `input` and `output` sub-configurations:
**AudioInput** (`audio.input`):
| Parameter | Type | Default | Description |
| --------- | ------------- | ------- | ------------------------------------------------------------------------------------------------------------------------- |
| `format` | `AudioFormat` | `None` | Input audio format. Supports `PCMAudioFormat` (configurable rate), `PCMUAudioFormat` (8kHz), or `PCMAAudioFormat` (8kHz). |
**AudioOutput** (`audio.output`):
| Parameter | Type | Default | Description |
| --------- | ------------- | ------- | -------------------------------------------------- |
| `format` | `AudioFormat` | `None` | Output audio format. Same format options as input. |
Grok PCM audio supports sample rates: 8000, 16000, 22050, 24000, 32000, 44100, and 48000 Hz.
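For telephony integrations, a minimal sketch of a G.711 mu-law session configuration, assuming `PCMUAudioFormat` is exported from the same `events` module as `PCMAudioFormat` and takes no rate argument (it is fixed at 8 kHz):
```python theme={null}
from pipecat.services.xai.realtime.events import (
    AudioConfiguration,
    AudioInput,
    AudioOutput,
    PCMUAudioFormat,
    SessionProperties,
)
# Mu-law (PCMU) input and output at the fixed 8 kHz telephony rate
telephony_session = SessionProperties(
    audio=AudioConfiguration(
        input=AudioInput(format=PCMUAudioFormat()),
        output=AudioOutput(format=PCMUAudioFormat()),
    ),
)
```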
### Built-in Tools
Grok provides several built-in tools in addition to custom function tools:
| Tool | Type | Description |
| ---------------- | ------------- | ------------------------------------------------------------------ |
| `WebSearchTool` | `web_search` | Search the web for current information |
| `XSearchTool` | `x_search` | Search X (Twitter) for posts. Supports `allowed_x_handles` filter. |
| `FileSearchTool` | `file_search` | Search uploaded document collections by `vector_store_ids` |
## Usage
### Basic Setup
```python theme={null}
import os
from pipecat.services.xai.realtime.llm import GrokRealtimeLLMService
llm = GrokRealtimeLLMService(
api_key=os.getenv("XAI_API_KEY"),
)
```
### With Session Configuration
```python theme={null}
from pipecat.services.xai.realtime.llm import GrokRealtimeLLMService
from pipecat.services.xai.realtime.events import (
SessionProperties,
TurnDetection,
AudioConfiguration,
AudioInput,
AudioOutput,
PCMAudioFormat,
)
session_properties = SessionProperties(
instructions="You are a helpful assistant.",
voice="Rex",
turn_detection=TurnDetection(type="server_vad"),
audio=AudioConfiguration(
input=AudioInput(format=PCMAudioFormat(rate=16000)),
output=AudioOutput(format=PCMAudioFormat(rate=16000)),
),
)
llm = GrokRealtimeLLMService(
api_key=os.getenv("XAI_API_KEY"),
settings=GrokRealtimeLLMService.Settings(
session_properties=session_properties,
),
)
```
### With Built-in Tools
```python theme={null}
from pipecat.services.xai.realtime.llm import GrokRealtimeLLMService
from pipecat.services.xai.realtime.events import (
SessionProperties,
WebSearchTool,
XSearchTool,
)
llm = GrokRealtimeLLMService(
api_key=os.getenv("XAI_API_KEY"),
settings=GrokRealtimeLLMService.Settings(
session_properties=SessionProperties(
instructions="You are a helpful assistant with access to web search.",
voice="Ara",
tools=[
WebSearchTool(),
XSearchTool(allowed_x_handles=["@elonmusk"]),
],
),
),
)
```
### Updating Settings at Runtime
```python theme={null}
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.xai.realtime.llm import GrokRealtimeLLMSettings
from pipecat.services.xai.realtime.events import SessionProperties
await task.queue_frame(
LLMUpdateSettingsFrame(
delta=GrokRealtimeLLMSettings(
session_properties=SessionProperties(
instructions="Now speak in Spanish.",
voice="Eve",
),
)
)
)
```
The deprecated `session_properties` constructor parameter is replaced by
`Settings` as of v0.0.105. Use `Settings` / `settings=` instead. See the
[Service Settings guide](/pipecat/fundamentals/service-settings) for migration
details.
## Notes
* **Audio format auto-configuration**: If audio format is not specified in `session_properties`, the service automatically configures PCM input/output using the pipeline's sample rates.
* **Server-side VAD**: Enabled by default. When VAD is enabled, the server handles speech detection and turn management automatically. Set `turn_detection` to `None` to manage turns manually.
* **Audio before setup**: Audio is not sent to Grok until the conversation setup is complete, preventing sample rate mismatches.
* **Available voices**: Ara (default), Rex, Sal, Eve, and Leo.
* **G.711 support**: PCMU and PCMA formats are supported at a fixed 8000 Hz rate, useful for telephony integrations.
* **System instruction precedence**: The `system_instruction` from service settings takes precedence over an initial system message in the LLM context. A warning is logged when both are set.
## Event Handlers
| Event | Description |
| ------------------------------ | ------------------------------------------------------------- |
| `on_conversation_item_created` | Called when a new conversation item is created in the session |
| `on_conversation_item_updated` | Called when a conversation item is updated or completed |
```python theme={null}
@llm.event_handler("on_conversation_item_created")
async def on_item_created(service, item_id, item):
print(f"New conversation item: {item_id}")
@llm.event_handler("on_conversation_item_updated")
async def on_item_updated(service, item_id, item):
print(f"Conversation item updated: {item_id}")
```
# OpenAI Realtime
Source: https://docs.pipecat.ai/api-reference/server/services/s2s/openai
Real-time speech-to-speech service implementation using OpenAI's Realtime API
## Overview
`OpenAIRealtimeLLMService` provides real-time, multimodal conversation capabilities using OpenAI's Realtime API. It supports speech-to-speech interactions with integrated LLM processing, function calling, and advanced conversation management with minimal latency response times.
Pipecat's API methods for OpenAI Realtime integration
Complete OpenAI Realtime conversation example
Official OpenAI Realtime API documentation
Access Realtime models and manage API keys
## Installation
To use OpenAI Realtime services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[openai]"
```
## Prerequisites
### OpenAI Account Setup
Before using OpenAI Realtime services, you need:
1. **OpenAI Account**: Sign up at [OpenAI Platform](https://platform.openai.com/)
2. **API Key**: Generate an OpenAI API key from your account dashboard
3. **Model Access**: Ensure access to GPT-4o Realtime models
4. **Usage Limits**: Configure appropriate usage limits and billing
### Required Environment Variables
* `OPENAI_API_KEY`: Your OpenAI API key for authentication
### Key Features
* **Real-time Speech-to-Speech**: Direct audio processing with minimal latency
* **Advanced Turn Detection**: Multiple voice activity detection options including semantic detection
* **Function Calling**: Seamless support for external functions and APIs
* **Voice Options**: Multiple voice personalities and speaking styles
* **Conversation Management**: Intelligent context handling and conversation flow control
## Configuration
### OpenAIRealtimeLLMService
OpenAI API key for authentication.
OpenAI Realtime model name. This is a connection-level parameter set via the
WebSocket URL and cannot be changed during the session.
*Deprecated in v0.0.105. Use `settings=OpenAIRealtimeLLMService.Settings(model=...)` instead.*
WebSocket base URL for the Realtime API. Override for custom or proxied
deployments.
Configuration properties for the realtime session. These are session-level
settings that can be updated during the session (except for voice and model).
See [SessionProperties](#sessionproperties) below.
*Deprecated in v0.0.105. Use `settings=OpenAIRealtimeLLMService.Settings(session_properties=...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
Whether to start with audio input paused. Useful when you want to control when
audio processing begins.
Whether to start with video input paused.
Detail level for video processing. Can be `"auto"`, `"low"`, or `"high"`.
`"auto"` lets the model decide, `"low"` is faster and uses fewer tokens,
`"high"` provides more detail.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `OpenAIRealtimeLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| -------------------- | ------------------- | ----------- | ------------------------------------------------------------- |
| `model` | `str` | `NOT_GIVEN` | Model identifier. *(Inherited from base settings.)* |
| `system_instruction` | `str` | `NOT_GIVEN` | System instruction/prompt. *(Inherited from base settings.)* |
| `session_properties` | `SessionProperties` | `NOT_GIVEN` | Session-level configuration (modalities, audio, tools, etc.). |
`NOT_GIVEN` values are omitted, letting the service use its own defaults
(`"gpt-realtime-1.5"` for model). Only parameters that are explicitly set are
included.
### SessionProperties
| Parameter | Type | Default | Description |
| ------------------- | ------------------------------------- | ------- | -------------------------------------------------------------------------------------------------------------------- |
| `output_modalities` | `List[Literal["text", "audio"]]` | `None` | Modalities the model can respond with. The API supports single modality responses: either `["text"]` or `["audio"]`. |
| `instructions` | `str` | `None` | System instructions for the assistant. |
| `audio` | `AudioConfiguration` | `None` | Configuration for input and output audio (format, transcription, turn detection, voice, speed). |
| `tools` | `List[Dict]` | `None` | Available function tools for the assistant. |
| `tool_choice` | `Literal["auto", "none", "required"]` | `None` | Tool usage strategy. |
| `max_output_tokens` | `int \| Literal["inf"]` | `None` | Maximum tokens in response, or `"inf"` for unlimited. |
| `tracing` | `Literal["auto"] \| Dict` | `None` | Configuration options for tracing. |
### AudioConfiguration
The `audio` field in `SessionProperties` accepts an `AudioConfiguration` with `input` and `output` sub-configurations:
**AudioInput** (`audio.input`):
| Parameter | Type | Default | Description |
| ----------------- | ------------------------------------------------ | ------- | --------------------------------------------------------------------------------------- |
| `format` | `AudioFormat` | `None` | Input audio format (`PCMAudioFormat`, `PCMUAudioFormat`, or `PCMAAudioFormat`). |
| `transcription` | `InputAudioTranscription` | `None` | Transcription settings: `model` (e.g. `"gpt-4o-transcribe"`), `language`, and `prompt`. |
| `noise_reduction` | `InputAudioNoiseReduction` | `None` | Noise reduction type: `"near_field"` or `"far_field"`. |
| `turn_detection` | `TurnDetection \| SemanticTurnDetection \| bool` | `None` | Turn detection config, or `False` to disable server-side turn detection. |
**AudioOutput** (`audio.output`):
| Parameter | Type | Default | Description |
| --------- | ------------- | ------- | ------------------------------------------------------------------------ |
| `format` | `AudioFormat` | `None` | Output audio format. |
| `voice` | `str` | `None` | Voice the model uses to respond (e.g. `"alloy"`, `"echo"`, `"shimmer"`). |
| `speed` | `float` | `None` | Speed of the model's spoken response. |
### TurnDetection
Server-side VAD configuration via `TurnDetection`:
| Parameter | Type | Default | Description |
| --------------------- | ----------------------- | -------------- | ------------------------------------------------------ |
| `type` | `Literal["server_vad"]` | `"server_vad"` | Detection type. |
| `threshold` | `float` | `0.5` | Voice activity detection threshold (0.0-1.0). |
| `prefix_padding_ms` | `int` | `300` | Padding before speech starts in milliseconds. |
| `silence_duration_ms` | `int` | `500` | Silence duration to detect speech end in milliseconds. |
Alternatively, use `SemanticTurnDetection` for semantic-based detection:
| Parameter | Type | Default | Description |
| -------------------- | ------------------------------------------ | ---------------- | ------------------------------------------------------------ |
| `type` | `Literal["semantic_vad"]` | `"semantic_vad"` | Detection type. |
| `eagerness` | `Literal["low", "medium", "high", "auto"]` | `None` | Turn detection eagerness level. |
| `create_response` | `bool` | `None` | Whether to automatically create responses on turn detection. |
| `interrupt_response` | `bool` | `None` | Whether to interrupt ongoing responses on turn detection. |
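As a brief sketch of the traditional server-side VAD variant (values shown are illustrative; the documented defaults apply when a field is omitted), assuming `TurnDetection` is exported from the same `events` module as `SemanticTurnDetection`:
```python theme={null}
from pipecat.services.openai.realtime.events import (
    AudioConfiguration,
    AudioInput,
    SessionProperties,
    TurnDetection,
)
# Traditional server-side VAD with a shorter end-of-speech silence window
session_properties = SessionProperties(
    audio=AudioConfiguration(
        input=AudioInput(
            turn_detection=TurnDetection(
                threshold=0.6,
                prefix_padding_ms=300,
                silence_duration_ms=400,
            ),
        ),
    ),
)
```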
## Usage
### Basic Setup
```python theme={null}
import os
from pipecat.services.openai.realtime import OpenAIRealtimeLLMService
llm = OpenAIRealtimeLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-realtime-1.5",
)
```
### With Session Configuration
```python theme={null}
from pipecat.services.openai.realtime import OpenAIRealtimeLLMService
from pipecat.services.openai.realtime.events import (
SessionProperties,
AudioConfiguration,
AudioInput,
AudioOutput,
InputAudioTranscription,
SemanticTurnDetection,
)
session_properties = SessionProperties(
audio=AudioConfiguration(
input=AudioInput(
transcription=InputAudioTranscription(model="gpt-4o-transcribe"),
turn_detection=SemanticTurnDetection(eagerness="medium"),
),
output=AudioOutput(
voice="alloy",
speed=1.0,
),
),
max_output_tokens=4096,
)
llm = OpenAIRealtimeLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAIRealtimeLLMService.Settings(
model="gpt-realtime-1.5",
session_properties=session_properties,
system_instruction="You are a helpful assistant.",
),
)
```
### With Disabled Turn Detection (Manual Control)
```python theme={null}
session_properties = SessionProperties(
audio=AudioConfiguration(
input=AudioInput(
turn_detection=False,
),
),
)
llm = OpenAIRealtimeLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAIRealtimeLLMService.Settings(
model="gpt-realtime-1.5",
session_properties=session_properties,
system_instruction="You are a helpful assistant.",
),
)
```
### Updating Settings at Runtime
```python theme={null}
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.openai.realtime.llm import OpenAIRealtimeLLMService
from pipecat.services.openai.realtime.events import SessionProperties
await task.queue_frame(
LLMUpdateSettingsFrame(
delta=OpenAIRealtimeLLMService.Settings(
system_instruction="Now speak in Spanish.",
session_properties=SessionProperties(
max_output_tokens=2048,
),
)
)
)
```
The deprecated `model` and `session_properties` constructor parameters are
replaced by `Settings` as of v0.0.105. Use `Settings` / `settings=` instead.
See the [Service Settings guide](/pipecat/fundamentals/service-settings) for
migration details.
## Notes
* **Model is connection-level**: The `model` parameter is set via the WebSocket URL at connection time and cannot be changed during a session.
* **Output modalities are single-mode**: The API supports either `["text"]` or `["audio"]` output, not both simultaneously.
* **Turn detection options**: Use `TurnDetection` for traditional VAD, `SemanticTurnDetection` for AI-based turn detection, or `False` to disable server-side detection and manage turns manually.
* **Audio output format**: The service outputs 24kHz PCM audio by default.
* **Video support**: Video frames can be sent to the model for multimodal input. Control the detail level with `video_frame_detail` and pause/resume with `set_video_input_paused()`.
* **Transcription frames**: User speech transcription frames are always emitted upstream when input audio transcription is configured.
* **System instruction precedence**: The `system_instruction` from service settings takes precedence over an initial system message in the LLM context. A warning is logged when both are set.
## Event Handlers
| Event | Description |
| ------------------------------ | ------------------------------------------------------------- |
| `on_conversation_item_created` | Called when a new conversation item is created in the session |
| `on_conversation_item_updated` | Called when a conversation item is updated or completed |
```python theme={null}
@llm.event_handler("on_conversation_item_created")
async def on_item_created(service, item_id, item):
print(f"New conversation item: {item_id}")
@llm.event_handler("on_conversation_item_updated")
async def on_item_updated(service, item_id, item):
print(f"Conversation item updated: {item_id}")
```
# Ultravox Realtime
Source: https://docs.pipecat.ai/api-reference/server/services/s2s/ultravox
Real-time speech-to-speech service implementation using Ultravox's Realtime API
## Overview
`UltravoxRealtimeLLMService` provides real-time conversational AI capabilities using Ultravox's Realtime API. It supports both text and audio modalities with voice transcription, streaming responses, and tool usage for creating interactive AI experiences.
Pipecat's API methods for Ultravox Realtime integration
Complete Ultravox Realtime conversation example
Official Ultravox API documentation
Access Ultravox models and manage API keys
## Installation
To use Ultravox Realtime services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[ultravox]"
```
## Prerequisites
### Ultravox Account Setup
Before using Ultravox Realtime services, you need:
1. **Ultravox Account**: Sign up at [Ultravox Console](https://app.ultravox.ai/)
2. **API Key**: Generate an Ultravox API key from your account dashboard
3. **Model Access**: Ensure access to Ultravox Realtime models
4. **Usage Limits**: Configure appropriate usage limits and billing
### Required Environment Variables
* `ULTRAVOX_API_KEY`: Your Ultravox API key for authentication
### Key Features
* **Audio-Native Model**: Ultravox is an audio-native model for natural voice interactions
* **Real-time Streaming**: Low-latency audio processing and streaming responses
* **Multiple Input Modes**: Support for Agent, One-Shot, and Join URL input parameters
* **Voice Transcription**: Built-in transcription with streaming output
* **Function Calling**: Support for tool integration and API calling
* **Configurable Duration**: Set maximum call duration limits
## Configuration
### UltravoxRealtimeLLMService
Configuration parameters for connecting to Ultravox. One of three input
parameter types must be provided. See [Input Parameter
Types](#input-parameter-types) below.
Tools to use with a one-shot call. May only be set when using
`OneShotInputParams`.
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `UltravoxRealtimeLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| --------------- | ----- | ----------- | --------------------------------------------------------------- |
| `model` | `str` | `NOT_GIVEN` | Model identifier. *(Inherited from base settings.)* |
| `output_medium` | `str` | `NOT_GIVEN` | Output medium: `"voice"` for audio or `"text"` for text output. |
`NOT_GIVEN` values are omitted, letting the service use its own defaults. Only
parameters that are explicitly set are included.
### Input Parameter Types
Ultravox supports three different ways to create or join a call:
#### AgentInputParams
Use a pre-configured Ultravox Agent to handle calls consistently.
| Parameter | Type | Default | Description |
| ------------------ | ---------------- | -------- | ------------------------------------------------------------------------------------------------------------------------ |
| `api_key` | `str` | required | Ultravox API key for authentication. |
| `agent_id` | `UUID` | required | The ID of the Ultravox agent. Create and edit agents in the [Ultravox Console](https://app.ultravox.ai/agents). |
| `template_context` | `Dict[str, Any]` | `{}` | Context variables for agent template instantiation. |
| `metadata` | `Dict[str, str]` | `{}` | Metadata to attach to the call. |
| `max_duration` | `timedelta` | `None` | Maximum call duration (10s to 1h). `None` uses the agent's default. |
| `extra` | `Dict[str, Any]` | `{}` | Extra parameters for the [agent call creation request](https://docs.ultravox.ai/api-reference/agents/agents-calls-post). |
#### OneShotInputParams
Create a one-off call with inline configuration.
| Parameter | Type | Default | Description |
| --------------- | ---------------- | -------- | ---------------------------------------------------------------------------------------------------------- |
| `api_key` | `str` | required | Ultravox API key for authentication. |
| `system_prompt` | `str` | `None` | System prompt to guide the model's behavior. |
| `temperature` | `float` | `0.0` | Sampling temperature for response generation (0.0-1.0). |
| `model` | `str` | `None` | Model identifier to use (e.g., `"fixie-ai/ultravox"`). |
| `voice` | `UUID` | `None` | Voice identifier for speech generation. |
| `metadata` | `Dict[str, str]` | `{}` | Metadata to attach to the call. |
| `max_duration` | `timedelta` | `1 hour` | Maximum call duration (10s to 1h). |
| `extra` | `Dict[str, Any]` | `{}` | Extra parameters for the [call creation request](https://docs.ultravox.ai/api-reference/calls/calls-post). |
#### JoinUrlInputParams
Join an existing Ultravox call using a join URL.
| Parameter | Type | Default | Description |
| ---------- | ----- | -------- | ----------------------------------------------------- |
| `join_url` | `str` | required | The join URL for the existing Ultravox Realtime call. |
## Usage
### Basic Setup with Agent
```python theme={null}
import os
import uuid
from pipecat.services.ultravox import UltravoxRealtimeLLMService, AgentInputParams
llm = UltravoxRealtimeLLMService(
params=AgentInputParams(
api_key=os.getenv("ULTRAVOX_API_KEY"),
agent_id=uuid.UUID("your-agent-id-here"),
),
)
```
### One-Shot Call
```python theme={null}
from pipecat.services.ultravox import UltravoxRealtimeLLMService, OneShotInputParams
llm = UltravoxRealtimeLLMService(
params=OneShotInputParams(
api_key=os.getenv("ULTRAVOX_API_KEY"),
system_prompt="You are a helpful assistant.",
temperature=0.3,
model="fixie-ai/ultravox",
),
)
```
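### One-Shot with a Duration Limit
A minimal sketch capping a one-shot call's length with `max_duration` (the documented range is 10 seconds to 1 hour); the ten-minute value is illustrative:
```python theme={null}
import os
from datetime import timedelta
from pipecat.services.ultravox import UltravoxRealtimeLLMService, OneShotInputParams
llm = UltravoxRealtimeLLMService(
    params=OneShotInputParams(
        api_key=os.getenv("ULTRAVOX_API_KEY"),
        system_prompt="You are a helpful assistant.",
        max_duration=timedelta(minutes=10),
    ),
)
```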
### One-Shot with Tools
```python theme={null}
from pipecat.services.ultravox import UltravoxRealtimeLLMService, OneShotInputParams
llm = UltravoxRealtimeLLMService(
params=OneShotInputParams(
api_key=os.getenv("ULTRAVOX_API_KEY"),
system_prompt="You are a helpful assistant that can check the weather.",
),
one_shot_selected_tools=tools, # ToolsSchema instance
)
@llm.function("get_weather")
async def get_weather(function_name, tool_call_id, args, llm, context, result_callback):
location = args.get("location", "unknown")
await result_callback({"temperature": 72, "condition": "sunny", "location": location})
```
### Join Existing Call
```python theme={null}
from pipecat.services.ultravox import UltravoxRealtimeLLMService, JoinUrlInputParams
llm = UltravoxRealtimeLLMService(
params=JoinUrlInputParams(
join_url="wss://your-ultravox-join-url",
),
)
```
### Switching Output Medium at Runtime
```python theme={null}
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.ultravox.llm import UltravoxRealtimeLLMService
# Switch to text-only output
await task.queue_frame(
LLMUpdateSettingsFrame(
delta=UltravoxRealtimeLLMService.Settings(
output_medium="text",
)
)
)
# Switch back to voice output
await task.queue_frame(
LLMUpdateSettingsFrame(
delta=UltravoxRealtimeLLMService.Settings(
output_medium="voice",
)
)
)
```
## Notes
* **Audio-native model**: Ultravox processes audio directly rather than relying on a separate STT step. Voice transcriptions are provided for reference but may not always align with the model's understanding of user input.
* **Server-side context management**: Ultravox handles conversation context server-side. The LLM context in Pipecat is only used for passing function call results back to the service.
* **Audio sample rate**: The service uses a 48kHz sample rate. Input audio at different sample rates is automatically resampled.
* **Output medium**: The service supports both `"voice"` and `"text"` output modes, switchable at runtime using `LLMUpdateSettingsFrame`.
* **Call duration limits**: When using `AgentInputParams` or `OneShotInputParams`, you can set a maximum call duration between 10 seconds and 1 hour.
* **Tools with agents**: When using `AgentInputParams`, tools are configured on the agent itself. Use `one_shot_selected_tools` only with `OneShotInputParams`.
# ExotelFrameSerializer
Source: https://docs.pipecat.ai/api-reference/server/services/serializers/exotel
Serializer for Exotel WebSocket media streaming protocol
## Overview
`ExotelFrameSerializer` enables integration with Exotel's WebSocket media streaming protocol, allowing your Pipecat application to handle phone calls via Exotel's voice services with bidirectional audio conversion and DTMF event handling for Indian telephony infrastructure.
Pipecat's API methods for Exotel WebSocket integration
Complete telephony examples with Exotel
Official Exotel developer documentation
Manage phone numbers and streaming configuration
## Installation
The `ExotelFrameSerializer` does not require any additional dependencies beyond the core Pipecat library:
```bash theme={null}
pip install "pipecat-ai"
```
## Prerequisites
### Exotel Account Setup
Before using ExotelFrameSerializer, you need:
1. **Exotel Account**: Sign up at [Exotel Console](https://my.exotel.com/)
2. **Phone Number**: Purchase an Exotel phone number with voice capabilities
3. **Media Streaming**: Configure your phone number for WebSocket streaming
4. **Webhook Configuration**: Set up webhook endpoints for call handling
### Required Configuration
* **Stream ID**: Provided by Exotel during WebSocket connection
* **Call SID**: Associated Exotel Call SID (optional)
### Key Features
* **Bidirectional Audio**: Convert between Pipecat and Exotel audio formats
* **DTMF Handling**: Process touch-tone events from callers
* **Indian Telephony**: Optimized for Indian voice infrastructure
* **WebSocket Streaming**: Real-time audio streaming via WebSocket protocol
## Configuration
The Exotel Media Stream SID.
The associated Exotel Call SID (optional).
Configuration parameters for audio settings. See [InputParams](#inputparams)
below.
### InputParams
| Parameter | Type | Default | Description |
| ---------------------- | ------ | ------- | --------------------------------------------------------------------------------------------------- |
| `exotel_sample_rate` | `int` | `8000` | Sample rate used by Exotel (Hz). |
| `sample_rate` | `int` | `None` | Optional override for pipeline input sample rate. When `None`, uses the pipeline's configured rate. |
| `ignore_rtvi_messages` | `bool` | `True` | Whether to ignore RTVI protocol messages during serialization. |
## Usage
### Basic Setup
```python theme={null}
from pipecat.serializers.exotel import ExotelFrameSerializer
from pipecat.transports.network.websocket_server import WebSocketServerParams, WebSocketServerTransport
serializer = ExotelFrameSerializer(
stream_sid=stream_sid,
call_sid=call_sid,
)
transport = WebSocketServerTransport(
params=WebSocketServerParams(
audio_out_enabled=True,
add_wav_header=False,
serializer=serializer,
)
)
```
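### With Custom Sample Rate
A minimal sketch overriding the pipeline input sample rate, assuming `InputParams` is a nested class on the serializer (matching the Plivo and Genesys serializers documented on the following pages):
```python theme={null}
serializer = ExotelFrameSerializer(
    stream_sid=stream_sid,
    params=ExotelFrameSerializer.InputParams(
        sample_rate=16000,  # Override the pipeline's input sample rate
    ),
)
```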
## Notes
* **Linear PCM audio**: Exotel uses raw 16-bit linear PCM audio, not mu-law encoding. The serializer handles resampling between Exotel's sample rate and the pipeline rate.
* **No auto hang-up**: Unlike Twilio and Plivo, the Exotel serializer does not include automatic call termination.
* **DTMF support**: Touch-tone digit events from callers are converted to `InputDTMFFrame` objects.
# GenesysAudioHookSerializer
Source: https://docs.pipecat.ai/api-reference/server/services/serializers/genesys
Serializer for Genesys Cloud AudioHook WebSocket protocol
## Overview
`GenesysAudioHookSerializer` enables integration with Genesys Cloud Contact Center via the AudioHook protocol (v2), allowing your Pipecat application to handle contact center interactions with bidirectional audio streaming, DTMF event handling, barge-in support, and Architect flow variable passing.
Pipecat's API methods for Genesys AudioHook integration
Official Genesys AudioHook protocol reference
## Installation
The `GenesysAudioHookSerializer` does not require any additional dependencies beyond the core Pipecat library:
```bash theme={null}
pip install "pipecat-ai"
```
## Prerequisites
### Genesys Cloud Setup
Before using GenesysAudioHookSerializer, you need:
1. **Genesys Cloud Organization**: Access to a Genesys Cloud org with AudioHook enabled
2. **AudioHook Integration**: Configure an AudioHook integration in Genesys Cloud admin
3. **Architect Flow**: Create an Architect flow that uses the AudioHook action to connect calls to your Pipecat application
4. **WebSocket Endpoint**: A publicly accessible WebSocket endpoint for Genesys to connect to
### Key Features
* **Bidirectional Audio**: Stream audio between Genesys and Pipecat in PCMU format at 8kHz
* **Protocol Handshake**: Automatic handling of open/opened, close/closed, and ping/pong messages
* **DTMF Handling**: Process touch-tone events from callers
* **Barge-in Support**: Notify Genesys when the user interrupts bot audio
* **Pause/Resume**: Handle hold scenarios when audio streaming is temporarily suspended
* **Architect Variables**: Pass input/output variables between Architect flows and your bot
* **Stereo Support**: Process external (customer) audio, internal (agent) audio, or both channels
## Configuration
Configuration parameters for audio and protocol behavior. See
[InputParams](#inputparams) below.
### InputParams
| Parameter | Type | Default | Description |
| ---------------------- | ---------------------- | ------------ | ----------------------------------------------------------------------------------------------------- |
| `genesys_sample_rate` | `int` | `8000` | Sample rate used by Genesys (Hz). |
| `sample_rate` | `int` | `None` | Optional override for pipeline input sample rate. When `None`, uses the pipeline's configured rate. |
| `channel` | `AudioHookChannel` | `"external"` | Which audio channels to process: `"external"` (customer), `"internal"` (agent), or `"both"` (stereo). |
| `media_format` | `AudioHookMediaFormat` | `"PCMU"` | Audio format: `"PCMU"` (mu-law) or `"L16"` (16-bit linear PCM). |
| `process_external` | `bool` | `True` | Whether to process external (customer) audio. |
| `process_internal` | `bool` | `False` | Whether to process internal (agent) audio. |
| `supported_languages` | `list[str]` | `None` | List of language codes the bot supports (e.g., `["en-US", "es-ES"]`). |
| `selected_language` | `str` | `None` | Default language code to use. |
| `start_paused` | `bool` | `False` | Whether to start the session in paused state. |
| `ignore_rtvi_messages` | `bool` | `True` | Whether to ignore RTVI protocol messages during serialization. |
## Usage
### Basic Setup
```python theme={null}
from pipecat.serializers.genesys import GenesysAudioHookSerializer
from pipecat.transports.network.fastapi_websocket import (
FastAPIWebsocketTransport,
FastAPIWebsocketParams,
)
serializer = GenesysAudioHookSerializer()
transport = FastAPIWebsocketTransport(
websocket=websocket,
params=FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
serializer=serializer,
audio_out_fixed_packet_size=1600,
),
)
```
### With Language Support
```python theme={null}
serializer = GenesysAudioHookSerializer(
params=GenesysAudioHookSerializer.InputParams(
supported_languages=["en-US", "es-ES", "fr-FR"],
selected_language="en-US",
)
)
```
### Accessing Call Metadata
After the AudioHook session opens, you can access call metadata from the serializer:
```python theme={null}
# Participant info (ani, dnis, etc.)
participant = serializer.participant
# Custom input variables from the Architect flow
input_vars = serializer.input_variables
# Conversation and session IDs
conversation_id = serializer.conversation_id
session_id = serializer.session_id
```
### Setting Output Variables
Output variables are passed back to the Genesys Architect flow when the session closes, allowing your bot to influence downstream call routing and logic:
```python theme={null}
# Set variables during the conversation
serializer.set_output_variables({
"intent": "billing_inquiry",
"customer_verified": True,
"summary": "Customer asked about their bill",
"transfer_to": "billing_queue",
})
```
### Server-Initiated Disconnect
To disconnect the session from the server side (e.g., when the bot has finished):
```python theme={null}
from pipecat.frames.frames import OutputTransportMessageUrgentFrame
# Send a disconnect message through the pipeline
disconnect_msg = serializer.create_disconnect_message(
reason="completed",
action="transfer",
output_variables={"intent": "resolved"},
)
await task.queue_frame(OutputTransportMessageUrgentFrame(message=disconnect_msg))
```
### Event Handlers
The serializer emits events that you can handle for custom logic:
```python theme={null}
@serializer.event_handler("on_open")
async def on_open(serializer, message):
logger.info(f"Session opened: {serializer.conversation_id}")
@serializer.event_handler("on_close")
async def on_close(serializer, message):
logger.info("Session closing")
@serializer.event_handler("on_dtmf")
async def on_dtmf(serializer, message):
digit = message.get("parameters", {}).get("digit")
logger.info(f"DTMF digit pressed: {digit}")
@serializer.event_handler("on_pause")
async def on_pause(serializer, message):
logger.info("Audio paused (caller on hold)")
```
## Protocol Details
### AudioHook v2 Protocol
The Genesys AudioHook protocol uses WebSocket connections with two frame types:
* **Text frames**: JSON control messages for session lifecycle (open, close, ping, pause, etc.)
* **Binary frames**: Raw audio data in PCMU or L16 format
### Message Flow
A typical session follows this sequence:
1. Genesys connects to your WebSocket endpoint
2. Genesys sends an `open` message with session metadata
3. The serializer automatically responds with `opened`
4. Bidirectional audio streaming begins via binary frames
5. Genesys sends periodic `ping` messages; the serializer responds with `pong`
6. When the call ends, Genesys sends `close`; the serializer responds with `closed` (including any output variables)
### Audio Format
* **Default encoding**: PCMU (mu-law) at 8kHz mono
* **Automatic resampling**: The serializer converts between the 8kHz Genesys format and your pipeline's sample rate using SOXR resampling
* **Stereo handling**: When channel is set to `"both"`, Genesys sends stereo audio with external (customer) on the left channel and internal (agent) on the right. The serializer extracts the external channel for processing.
## Notes
* **Fixed packet size**: Set `audio_out_fixed_packet_size=1600` on your transport parameters. This batches outbound audio into consistent chunks and prevents 429 rate limiting from Genesys.
* **No extra dependencies**: The serializer uses Pipecat's built-in audio conversion utilities (`pcm_to_ulaw`, `ulaw_to_pcm`) and SOXR resampler.
* **Barge-in**: When the pipeline emits an `InterruptionFrame`, the serializer automatically sends a barge-in event to Genesys, which stops any queued audio playback on the Genesys side.
* **Pause/resume**: When Genesys sends a `pause` message (e.g., caller placed on hold), audio processing is suspended. The serializer drops incoming and outgoing audio while paused. Use the `on_pause` event handler and `create_resumed_response()` to control when streaming resumes.
* **Output variables**: Variables set via `set_output_variables()` are included in the `closed` response when Genesys terminates the session. These variables become available in the Architect flow for routing decisions.
* **DTMF support**: Phone keypad events are converted to `InputDTMFFrame` objects and can be processed in your pipeline.
* **L16 format**: While the serializer accepts `AudioHookMediaFormat.L16` as a configuration option, L16 support is not yet fully implemented. Use PCMU (the default) for production deployments.
# Frame Serializer Overview
Source: https://docs.pipecat.ai/api-reference/server/services/serializers/introduction
Overview of frame serializers for converting between Pipecat frames and external protocols
## Overview
Frame serializers are components that convert between Pipecat's internal frame format and external protocols or formats. They're essential when integrating with third-party services or APIs that have their own message formats.
## Core Responsibilities
Serializers handle:
1. **Serialization**: Converting Pipecat frames to external formats or protocols
2. **Deserialization**: Converting external messages to Pipecat frames
3. **Protocol-specific behaviors**: Managing unique aspects of each integration
## Available Serializers
Pipecat includes serializers for popular voice and communications platforms:
For integrating with Exotel WebSocket media streaming
For integrating with Telnyx WebSocket media streaming
For integrating with Twilio Media Streams WebSocket protocol
For integrating with Vonage Video API Audio Connector WebSocket protocol
## Custom Serializers
You can create custom serializers by implementing the `FrameSerializer` base class:
```python theme={null}
from pipecat.serializers.base_serializer import FrameSerializer, FrameSerializerType
from pipecat.frames.frames import Frame, StartFrame
class MyCustomSerializer(FrameSerializer):
@property
def type(self) -> FrameSerializerType:
return FrameSerializerType.TEXT # or BINARY
async def setup(self, frame: StartFrame):
# Initialize with pipeline configuration
pass
async def serialize(self, frame: Frame) -> str | bytes | None:
# Convert Pipecat frame to external format
pass
async def deserialize(self, data: str | bytes) -> Frame | None:
# Convert external data to Pipecat frame
pass
```
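A custom serializer plugs into a transport the same way as the built-in ones, via the transport's `serializer` parameter. A minimal sketch, reusing the class and module names from the serializer examples on the following pages:
```python theme={null}
from pipecat.transports.network.websocket_server import (
    WebSocketServerParams,
    WebSocketServerTransport,
)
transport = WebSocketServerTransport(
    params=WebSocketServerParams(
        audio_out_enabled=True,
        serializer=MyCustomSerializer(),
    )
)
```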
# PlivoFrameSerializer
Source: https://docs.pipecat.ai/api-reference/server/services/serializers/plivo
Serializer for Plivo Audio Streaming WebSocket protocol
## Overview
`PlivoFrameSerializer` enables integration with Plivo's Audio Streaming WebSocket protocol, allowing your Pipecat application to handle phone calls via Plivo's voice services with bidirectional audio conversion, DTMF event handling, and automatic call termination.
Pipecat's API methods for Plivo Audio Streaming integration
Complete telephony examples with Plivo
Official Plivo Audio Streaming documentation
Manage phone numbers and streaming configuration
## Installation
The `PlivoFrameSerializer` does not require any additional dependencies beyond the core Pipecat library:
```bash theme={null}
pip install "pipecat-ai"
```
## Prerequisites
### Plivo Account Setup
Before using PlivoFrameSerializer, you need:
1. **Plivo Account**: Sign up at [Plivo Console](https://console.plivo.com/)
2. **Phone Number**: Purchase a Plivo phone number with voice capabilities
3. **Audio Streaming**: Configure your phone number for WebSocket streaming
4. **Webhook Configuration**: Set up webhook endpoints for call handling
### Required Environment Variables
* `PLIVO_AUTH_ID`: Your Plivo Auth ID for authentication
* `PLIVO_AUTH_TOKEN`: Your Plivo Auth Token for API operations
### Required Configuration
* **Stream ID**: Provided by Plivo during Audio Streaming connection
* **Call ID**: Required for automatic call termination (optional)
### Key Features
* **Bidirectional Audio**: Convert between Pipecat and Plivo audio formats
* **DTMF Handling**: Process touch-tone events from callers
* **Auto Hang-up**: Terminate calls via Plivo's REST API
* **μ-law Encoding**: Handle Plivo's standard audio encoding format
## Configuration
The Plivo Stream ID.
The associated Plivo Call ID. Required when `auto_hang_up` is enabled.
Plivo auth ID. Required when `auto_hang_up` is enabled.
Plivo auth token. Required when `auto_hang_up` is enabled.
Configuration parameters for audio and hang-up behavior. See
[InputParams](#inputparams) below.
### InputParams
| Parameter | Type | Default | Description |
| ---------------------- | ------ | ------- | --------------------------------------------------------------------------------------------------- |
| `plivo_sample_rate` | `int` | `8000` | Sample rate used by Plivo (Hz). |
| `sample_rate` | `int` | `None` | Optional override for pipeline input sample rate. When `None`, uses the pipeline's configured rate. |
| `auto_hang_up` | `bool` | `True` | Whether to automatically terminate the call on `EndFrame` or `CancelFrame`. |
| `ignore_rtvi_messages` | `bool` | `True` | Whether to ignore RTVI protocol messages during serialization. |
## Usage
### Basic Setup
```python theme={null}
from pipecat.serializers.plivo import PlivoFrameSerializer
from pipecat.transports.network.websocket_server import WebSocketServerParams, WebSocketServerTransport
serializer = PlivoFrameSerializer(
stream_id=stream_id,
call_id=call_id,
auth_id=os.getenv("PLIVO_AUTH_ID"),
auth_token=os.getenv("PLIVO_AUTH_TOKEN"),
)
transport = WebSocketServerTransport(
params=WebSocketServerParams(
audio_out_enabled=True,
add_wav_header=False,
serializer=serializer,
)
)
```
### Without Auto Hang-up
```python theme={null}
serializer = PlivoFrameSerializer(
stream_id=stream_id,
params=PlivoFrameSerializer.InputParams(
auto_hang_up=False,
),
)
```
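### Handling DTMF Events
Keypad presses arrive in the pipeline as `InputDTMFFrame` objects (see Notes below), so they can be observed with a small frame processor. This is a minimal sketch; it assumes the pressed key is exposed as a `button` attribute on the frame, so adjust the attribute name to your Pipecat version's frame definition.
```python theme={null}
from pipecat.frames.frames import Frame, InputDTMFFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class DTMFLogger(FrameProcessor):
    """Logs keypad presses and passes all frames through unchanged."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, InputDTMFFrame):
            # Assumption: the pressed key is available as `frame.button`.
            print(f"Caller pressed: {frame.button}")
        await self.push_frame(frame, direction)
```
Place the processor between the transport input and your downstream services to observe digits before they reach the rest of the pipeline.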
## Notes
* **Auto hang-up credentials**: When `auto_hang_up` is enabled (the default), the serializer uses `call_id`, `auth_id`, and `auth_token` to terminate the call via Plivo's REST API. If any are missing, a warning is logged and the hang-up is skipped.
* **Audio format**: Plivo uses 8kHz mu-law (PCMU) audio encoding. The serializer automatically converts between this format and Pipecat's PCM audio.
* **DTMF support**: Touch-tone digit events from callers are converted to `InputDTMFFrame` objects.
# TelnyxFrameSerializer
Source: https://docs.pipecat.ai/api-reference/server/services/serializers/telnyx
Serializer for Telnyx WebSocket media streaming protocol
## Overview
`TelnyxFrameSerializer` enables integration with Telnyx's WebSocket media streaming protocol, allowing your Pipecat application to handle phone calls via Telnyx's voice services with bidirectional audio conversion, DTMF event handling, and support for multiple audio encodings.
Pipecat's API methods for Telnyx WebSocket integration
Complete telephony examples with Telnyx
Official Telnyx media streaming documentation
Manage phone numbers and streaming configuration
## Installation
The `TelnyxFrameSerializer` does not require any additional dependencies beyond the core Pipecat library:
```bash theme={null}
pip install "pipecat-ai"
```
## Prerequisites
### Telnyx Account Setup
Before using TelnyxFrameSerializer, you need:
1. **Telnyx Account**: Sign up at [Telnyx Portal](https://portal.telnyx.com/)
2. **Phone Number**: Purchase a Telnyx phone number with voice capabilities
3. **Media Streaming**: Configure your phone number for WebSocket streaming
4. **Webhook Configuration**: Set up webhook endpoints for call handling
### Required Environment Variables
* `TELNYX_API_KEY`: Your Telnyx API key for authentication and call control
### Required Configuration
* **Stream ID**: Provided by Telnyx during WebSocket connection
* **Audio Encodings**: Configure inbound/outbound encodings (PCMU, PCMA)
* **Call Control ID**: Optional; required only when using automatic call termination
### Key Features
* **Bidirectional Audio**: Convert between Pipecat and Telnyx audio formats
* **DTMF Handling**: Process touch-tone events from callers
* **Auto Hang-up**: Terminate calls via Telnyx's REST API
* **Multiple Encodings**: Support for PCMU and PCMA audio formats
## Configuration
The Telnyx Stream ID.
The encoding type for outbound audio received from Telnyx (e.g., `"PCMU"`,
`"PCMA"`).
The encoding type for inbound audio sent to Telnyx (e.g., `"PCMU"`, `"PCMA"`).
The Telnyx Call Control ID. Required when `auto_hang_up` is enabled.
Telnyx API key. Required when `auto_hang_up` is enabled.
Configuration parameters for audio and hang-up behavior. See
[InputParams](#inputparams) below.
### InputParams
| Parameter | Type | Default | Description |
| -------------------- | ------ | -------- | --------------------------------------------------------------------------------------------------- |
| `telnyx_sample_rate` | `int` | `8000` | Sample rate used by Telnyx (Hz). |
| `sample_rate` | `int` | `None` | Optional override for pipeline input sample rate. When `None`, uses the pipeline's configured rate. |
| `inbound_encoding` | `str` | `"PCMU"` | Audio encoding for data sent to Telnyx. |
| `outbound_encoding` | `str` | `"PCMU"` | Audio encoding for data received from Telnyx. |
| `auto_hang_up` | `bool` | `True` | Whether to automatically terminate the call on `EndFrame` or `CancelFrame`. |
## Usage
### Basic Setup
```python theme={null}
from pipecat.serializers.telnyx import TelnyxFrameSerializer
from pipecat.transports.network.websocket_server import WebSocketServerParams, WebSocketServerTransport
serializer = TelnyxFrameSerializer(
stream_id=stream_id,
outbound_encoding="PCMU",
inbound_encoding="PCMU",
call_control_id=call_control_id,
api_key=os.getenv("TELNYX_API_KEY"),
)
transport = WebSocketServerTransport(
params=WebSocketServerParams(
audio_out_enabled=True,
add_wav_header=False,
serializer=serializer,
)
)
```
### Without Auto Hang-up
```python theme={null}
serializer = TelnyxFrameSerializer(
stream_id=stream_id,
outbound_encoding="PCMU",
inbound_encoding="PCMU",
params=TelnyxFrameSerializer.InputParams(
auto_hang_up=False,
),
)
```
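### With PCMA Encoding
If your Telnyx number streams A-law audio, both encodings can be set to `"PCMA"`. This is a sketch based on the constructor parameters documented above; `stream_id` and `call_control_id` are assumed to come from your call webhook.
```python theme={null}
import os

from pipecat.serializers.telnyx import TelnyxFrameSerializer

# A-law (PCMA) on both the inbound and outbound legs.
serializer = TelnyxFrameSerializer(
    stream_id=stream_id,
    outbound_encoding="PCMA",
    inbound_encoding="PCMA",
    call_control_id=call_control_id,
    api_key=os.getenv("TELNYX_API_KEY"),
)
```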
## Notes
* **Multiple audio encodings**: Telnyx supports both PCMU (mu-law) and PCMA (A-law) encodings. The `inbound_encoding` and `outbound_encoding` must be specified as constructor arguments and will override any values set in `InputParams`.
* **Auto hang-up credentials**: When `auto_hang_up` is enabled (the default), `call_control_id` and `api_key` are required to terminate the call via Telnyx's REST API. If missing, a warning is logged and the hang-up is skipped.
* **DTMF support**: Touch-tone digit events from callers are converted to `InputDTMFFrame` objects.
# TwilioFrameSerializer
Source: https://docs.pipecat.ai/api-reference/server/services/serializers/twilio
Serializer for Twilio Media Streams WebSocket protocol
## Overview
`TwilioFrameSerializer` enables integration with Twilio's Media Streams WebSocket protocol, allowing your Pipecat application to handle phone calls via Twilio's voice services with bidirectional audio conversion, DTMF event handling, and automatic call termination.
Pipecat's API methods for Twilio Media Streams integration
Complete telephony examples with Twilio
Official Twilio Media Streams documentation
Manage phone numbers and Media Stream configuration
## Installation
The `TwilioFrameSerializer` does not require any additional dependencies beyond the core Pipecat library:
```bash theme={null}
pip install "pipecat-ai"
```
## Prerequisites
### Twilio Account Setup
Before using TwilioFrameSerializer, you need:
1. **Twilio Account**: Sign up at [Twilio Console](https://console.twilio.com/)
2. **Phone Number**: Purchase a Twilio phone number with voice capabilities
3. **Media Streams**: Configure your phone number to use Media Streams
4. **Webhook Configuration**: Set up webhook endpoints for call handling
### Required Environment Variables
* `TWILIO_ACCOUNT_SID`: Your Twilio Account SID for authentication
* `TWILIO_AUTH_TOKEN`: Your Twilio Auth Token for API operations
### Required Configuration
* **Stream SID**: Provided by Twilio during Media Stream connection
* **Call SID**: Optional; required only when using automatic call termination
### Key Features
* **Bidirectional Audio**: Convert between Pipecat and Twilio audio formats
* **DTMF Handling**: Process touch-tone events from callers
* **Auto Hang-up**: Terminate calls via Twilio's REST API
* **μ-law Encoding**: Handle Twilio's standard audio encoding format
## Configuration
The Twilio Media Stream SID.
The associated Twilio Call SID. Required when `auto_hang_up` is enabled.
Twilio account SID. Required when `auto_hang_up` is enabled.
Twilio auth token. Required when `auto_hang_up` is enabled.
Twilio region (e.g., `"au1"`, `"ie1"`). Must be specified together with
`edge`.
Twilio edge location (e.g., `"sydney"`, `"dublin"`). Must be specified
together with `region`.
Configuration parameters for audio and hang-up behavior. See
[InputParams](#inputparams) below.
### InputParams
| Parameter | Type | Default | Description |
| ---------------------- | ------ | ------- | --------------------------------------------------------------------------------------------------- |
| `twilio_sample_rate` | `int` | `8000` | Sample rate used by Twilio (Hz). |
| `sample_rate` | `int` | `None` | Optional override for pipeline input sample rate. When `None`, uses the pipeline's configured rate. |
| `auto_hang_up` | `bool` | `True` | Whether to automatically terminate the call on `EndFrame` or `CancelFrame`. |
| `ignore_rtvi_messages` | `bool` | `True` | Whether to ignore RTVI protocol messages during serialization. |
## Usage
### Basic Setup
```python theme={null}
from pipecat.serializers.twilio import TwilioFrameSerializer
from pipecat.transports.network.websocket_server import WebSocketServerParams, WebSocketServerTransport
serializer = TwilioFrameSerializer(
stream_sid=stream_sid,
call_sid=call_sid,
account_sid=os.getenv("TWILIO_ACCOUNT_SID"),
auth_token=os.getenv("TWILIO_AUTH_TOKEN"),
)
transport = WebSocketServerTransport(
params=WebSocketServerParams(
audio_out_enabled=True,
add_wav_header=False,
serializer=serializer,
)
)
```
### Without Auto Hang-up
```python theme={null}
serializer = TwilioFrameSerializer(
stream_sid=stream_sid,
params=TwilioFrameSerializer.InputParams(
auto_hang_up=False,
),
)
```
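### With Region and Edge
For deployments pinned to a specific Twilio region, `region` and `edge` can be supplied together (see the Notes below on how they pair). A sketch using the example values from the configuration section above:
```python theme={null}
import os

from pipecat.serializers.twilio import TwilioFrameSerializer

# Region and edge must always be specified together.
serializer = TwilioFrameSerializer(
    stream_sid=stream_sid,
    call_sid=call_sid,
    account_sid=os.getenv("TWILIO_ACCOUNT_SID"),
    auth_token=os.getenv("TWILIO_AUTH_TOKEN"),
    region="au1",
    edge="sydney",
)
```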
## Notes
* **Auto hang-up requires credentials**: When `auto_hang_up` is enabled (the default), you must provide `call_sid`, `account_sid`, and `auth_token`. A `ValueError` is raised at initialization if any are missing.
* **Region and edge pairing**: If either `region` or `edge` is specified, both must be provided. Twilio's API uses the FQDN format `api.{edge}.{region}.twilio.com`.
* **Audio format**: Twilio uses 8kHz mu-law (PCMU) audio encoding. The serializer automatically converts between this format and Pipecat's PCM audio.
* **DTMF support**: Touch-tone digit events from callers are converted to `InputDTMFFrame` objects.
# VonageFrameSerializer
Source: https://docs.pipecat.ai/api-reference/server/services/serializers/vonage
Serializer for Vonage Video API Audio Connector WebSocket protocol
## Overview
`VonageFrameSerializer` enables integration with the Vonage Video API Audio Connector WebSocket protocol, allowing Pipecat applications to process real-time audio streams from active Vonage video sessions.
Pipecat's API methods for Vonage Audio Connector Streams integration
End-to-end Pipecat example using Vonage Audio Connector
Official Vonage Video API Audio Connector documentation
Manage Vonage Video API projects
## Installation
The `VonageFrameSerializer` does not require any additional dependencies beyond the core Pipecat library:
```bash theme={null}
pip install "pipecat-ai"
```
## Prerequisites
### Vonage Video API Account Setup
Before using VonageFrameSerializer, you need:
1. **Vonage (TokBox) Account**: Sign up at [Vonage Video API Console](https://tokbox.com/account/)
2. **Vonage Video API Project**: Create a project to obtain Project API Key and Project Secret
3. **Existing Vonage Video Session**: A Vonage session must already exist. Sessions can be created using TokBox Playground or Vonage Video API SDKs
### Required Environment Variables
* `VONAGE_API_KEY`: Your Vonage Video API project key
* `VONAGE_API_SECRET`: Your Vonage Video API project secret
* `VONAGE_SESSION_ID`: The existing routed session ID
* `WS_URI`: Public WebSocket endpoint URI of the server application running Pipecat (e.g. via ngrok)
### Required Configuration
* **WebSocket Endpoint (/ws)**: A WebSocket server application (e.g. FastAPI) running Pipecat that accepts raw PCM audio frames.
* **Audio Connector /connect Request**: Triggers Vonage to open a WebSocket connection to your server and begin streaming audio from the active session.
### Key Features
* **Bidirectional Audio**: Convert between Pipecat and Vonage Audio Connector formats
* **Real-Time AI Pipelines**: Stream live audio into Pipecat and process it through any real-time pipeline configuration supported by the framework
* **Session Control Events**: Handle Vonage Audio Connector JSON events
* **Linear PCM Audio**: Handle raw 16-bit linear PCM audio streams used by the Vonage Video API Audio Connector
## Configuration
Configuration parameters for audio settings. See [InputParams](#inputparams)
below.
### InputParams
| Parameter | Type | Default | Description |
| ---------------------- | ------ | ------- | --------------------------------------------------------------------------------------------------- |
| `vonage_sample_rate` | `int` | `16000` | Sample rate used by Vonage (Hz). Common values: 8000, 16000, 24000. |
| `sample_rate` | `int` | `None` | Optional override for pipeline input sample rate. When `None`, uses the pipeline's configured rate. |
| `ignore_rtvi_messages` | `bool` | `True` | Whether to ignore RTVI protocol messages during serialization. |
## Usage
### Basic Setup
```python theme={null}
from pipecat.serializers.vonage import VonageFrameSerializer
from pipecat.transports.network.websocket_server import WebSocketServerParams, WebSocketServerTransport
serializer = VonageFrameSerializer()
transport = WebSocketServerTransport(
params=WebSocketServerParams(
audio_out_enabled=True,
add_wav_header=False,
serializer=serializer,
)
)
```
### With Custom Sample Rate
```python theme={null}
serializer = VonageFrameSerializer(
params=VonageFrameSerializer.InputParams(
vonage_sample_rate=8000,
),
)
```
## Notes
* **Linear PCM audio**: Unlike Twilio and Plivo, Vonage uses raw 16-bit linear PCM audio (not mu-law encoded). Audio data is sent as binary WebSocket messages rather than base64-encoded JSON.
* **No auto hang-up**: The Vonage serializer does not include automatic call termination. Session lifecycle is managed through the Vonage Video API.
* **Event handling**: The serializer handles Vonage-specific WebSocket events including `websocket:connected`, `websocket:cleared`, `websocket:notify`, and `websocket:dtmf`.
* **DTMF support**: Touch-tone digit events are converted to `InputDTMFFrame` objects.
# AssemblyAI
Source: https://docs.pipecat.ai/api-reference/server/services/stt/assemblyai
Speech-to-text service implementation using AssemblyAI's real-time transcription API
## Overview
`AssemblyAISTTService` provides real-time speech recognition using AssemblyAI's WebSocket API with support for interim results, end-of-turn detection, and configurable audio processing parameters for accurate transcription in conversational AI applications.
Pipecat's API methods for AssemblyAI STT integration
Example with AssemblyAI built-in turn detection
U3 Pro streaming documentation and features
Complete U3 Pro streaming API reference
Access API keys and transcription features
## Installation
To use AssemblyAI services, install the required dependency:
```bash theme={null}
pip install "pipecat-ai[assemblyai]"
```
## Prerequisites
### AssemblyAI Account Setup
Before using AssemblyAI STT services, you need:
1. **AssemblyAI Account**: Sign up at [AssemblyAI Console](https://www.assemblyai.com/dashboard/signup)
2. **API Key**: Generate an API key from your dashboard
3. **Model Selection**: Choose from available transcription models and features
### Required Environment Variables
* `ASSEMBLYAI_API_KEY`: Your AssemblyAI API key for authentication
## Configuration
### AssemblyAISTTService
AssemblyAI API key for authentication.
Language code for transcription. AssemblyAI currently supports English.
*Deprecated in v0.0.105. Use `settings=AssemblyAISTTService.Settings(...)`
instead.*
WebSocket endpoint URL. Override for custom or proxied deployments.
Audio sample rate in Hz.
Audio encoding format.
Connection configuration parameters. *Deprecated in v0.0.105. Use
`settings=AssemblyAISTTService.Settings(...)` instead. See
[AssemblyAIConnectionParams](#assemblyaiconnectionparams) below for field
mapping.*
Controls turn detection mode. When `True` (Pipecat mode, default): Forces
AssemblyAI to return finals ASAP so Pipecat's turn detection (e.g., Smart
Turn) decides when the user is done. VAD stop sends ForceEndpoint as ceiling.
No UserStarted/StoppedSpeakingFrame emitted from STT. When `False` (AssemblyAI
turn detection mode, u3-rt-pro only): AssemblyAI's model controls turn endings
using built-in turn detection. Uses AssemblyAI API defaults for all parameters
unless explicitly set. Emits UserStarted/StoppedSpeakingFrame from STT.
Whether to interrupt the bot when the user starts speaking in AssemblyAI turn
detection mode (`vad_force_turn_endpoint=False`). Only applies when using
AssemblyAI's built-in turn detection.
Optional format string for speaker labels when diarization is enabled. Use `{speaker}` for the speaker label and `{text}` for transcript text. Example: `"<{speaker}>{text}</{speaker}>"` or `"{speaker}: {text}"`. If `None`, transcript text is not modified.
Runtime-configurable settings for the STT service. See [Settings](#settings)
below.
P99 latency from speech end to final transcript in seconds. Override for your
deployment.
### AssemblyAIConnectionParams
`connection_params` is deprecated as of v0.0.105. Use
`settings=AssemblyAISTTService.Settings(...)` instead. The `sample_rate` and
`encoding` fields remain as direct constructor arguments. All other fields
have moved into Settings — `speech_model` maps to `model`.
Connection-level parameters previously passed via the `connection_params` constructor argument.
| Parameter | Type | Default | Description |
| ---------------------------------------- | ----------- | ------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `sample_rate` | `int` | `16000` | Audio sample rate in Hz. |
| `encoding` | `Literal` | `"pcm_s16le"` | Audio encoding format. Options: `"pcm_s16le"`, `"pcm_mulaw"`. |
| `end_of_turn_confidence_threshold` | `float` | `None` | Confidence threshold for end-of-turn detection. |
| `min_turn_silence` | `int` | `None` | Minimum silence duration (ms) when confident about end-of-turn. |
| `min_end_of_turn_silence_when_confident` | `int` | `None` | **DEPRECATED**. Use `min_turn_silence` instead. Will be removed in a future version. |
| `max_turn_silence` | `int` | `None` | Maximum silence duration (ms) before forcing end-of-turn. |
| `keyterms_prompt` | `List[str]` | `None` | List of key terms to guide transcription. Will be JSON serialized before sending. |
| `prompt` | `str` | `None` | **BETA**: Optional text prompt to guide transcription. Only used when `speech_model` is `"u3-rt-pro"`. Cannot be used with `keyterms_prompt`. We suggest starting with no prompt. See [AssemblyAI prompting best practices](https://www.assemblyai.com/docs/speech-to-text/streaming/prompting) for guidance. |
| `speech_model` | `Literal` | `"u3-rt-pro"` | Speech model to use. Options: `"universal-streaming-english"`, `"universal-streaming-multilingual"`, `"u3-rt-pro"`. Defaults to `"u3-rt-pro"` if not specified. |
| `language_detection` | `bool` | `None` | Enable automatic language detection. Only applicable to `universal-streaming-multilingual`. Turn messages include language information. |
| `format_turns` | `bool` | `True` | Whether to format transcript turns. Only applicable to `universal-streaming-english` and `universal-streaming-multilingual` models. For `u3-rt-pro`, formatting is automatic and built-in. |
| `speaker_labels` | `bool` | `None` | Enable speaker diarization. Final transcripts include a speaker field (e.g., "Speaker A", "Speaker B"). |
| `vad_threshold` | `float` | `None` | Voice activity detection confidence threshold (0.0 to 1.0) for classifying audio frames as silence; frames with VAD confidence below this value are considered silent. Only applicable to `u3-rt-pro`. Increase for noisy environments to reduce false speech detection. For best performance with an external VAD (e.g., Silero), align this value with your VAD's activation threshold. Defaults to `None` (not sent); the API default is 0.3. |
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `AssemblyAISTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------------------------------- | ----------------- | ------------- | ------------------------------------------------------------------------------------------ |
| `model` | `str` | `None` | STT model identifier. *(Inherited from base STT settings.)* |
| `language` | `Language \| str` | `Language.EN` | Language for speech recognition. *(Inherited from base STT settings.)* |
| `formatted_finals` | `bool` | `True` | Whether to enable transcript formatting. |
| `word_finalization_max_wait_time` | `int` | `None` | Maximum time to wait for word finalization in milliseconds. |
| `end_of_turn_confidence_threshold` | `float` | `None` | Confidence threshold for end-of-turn detection. |
| `min_turn_silence` | `int` | `None` | Minimum silence duration (ms) when confident about end-of-turn. |
| `max_turn_silence` | `int` | `None` | Maximum silence duration (ms) before forcing end-of-turn. |
| `keyterms_prompt` | `List[str]` | `None` | List of key terms to guide transcription. |
| `prompt` | `str` | `None` | Optional text prompt to guide transcription (u3-rt-pro only). |
| `language_detection` | `bool` | `None` | Enable automatic language detection. |
| `format_turns` | `bool` | `True` | Whether to format transcript turns. |
| `speaker_labels` | `bool` | `None` | Enable speaker diarization. |
| `vad_threshold` | `float` | `None` | VAD confidence threshold (0.0–1.0) for classifying audio frames as silence. |
| `domain` | `str` | `None` | Optional domain for specialized recognition modes (e.g., `"medical-v1"` for Medical Mode). |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.assemblyai.stt import AssemblyAISTTService
stt = AssemblyAISTTService(
api_key=os.getenv("ASSEMBLYAI_API_KEY"),
)
```
### With Custom Settings
```python theme={null}
from pipecat.services.assemblyai.stt import AssemblyAISTTService
stt = AssemblyAISTTService(
api_key=os.getenv("ASSEMBLYAI_API_KEY"),
settings=AssemblyAISTTService.Settings(
keyterms_prompt=["Pipecat", "AssemblyAI"],
),
vad_force_turn_endpoint=True,
)
```
### With AssemblyAI Built-in Turn Detection
AssemblyAI's u3-rt-pro model supports built-in turn detection for more natural conversation flow:
```python theme={null}
from pipecat.services.assemblyai.stt import AssemblyAISTTService
stt = AssemblyAISTTService(
api_key=os.getenv("ASSEMBLYAI_API_KEY"),
vad_force_turn_endpoint=False, # Use AssemblyAI's built-in turn detection
settings=AssemblyAISTTService.Settings(
# Optional: Tune turn detection timing
min_turn_silence=100, # Minimum silence (ms) when confident about end-of-turn
max_turn_silence=1000, # Maximum silence (ms) before forcing end-of-turn
),
)
```
### With Speaker Diarization
Enable speaker identification for multi-party conversations:
```python theme={null}
from pipecat.services.assemblyai.stt import AssemblyAISTTService
stt = AssemblyAISTTService(
api_key=os.getenv("ASSEMBLYAI_API_KEY"),
settings=AssemblyAISTTService.Settings(
speaker_labels=True, # Enable speaker diarization
),
speaker_format="{speaker}: {text}", # Format transcripts with speaker labels
)
```
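### Updating Settings Mid-Conversation
Settings such as `keyterms_prompt`, `prompt`, `min_turn_silence`, and `max_turn_silence` can be updated at runtime with `STTUpdateSettingsFrame` without reconnecting (see Notes below). A sketch, assuming the same `delta` pattern shown for Deepgram Flux elsewhere in these docs and a pipeline `task` exposing `queue_frame`:
```python theme={null}
from pipecat.frames.frames import STTUpdateSettingsFrame
from pipecat.services.assemblyai.stt import AssemblyAISTTService

# Mid-conversation, adjust key terms and turn timing without reconnecting.
await task.queue_frame(
    STTUpdateSettingsFrame(
        delta=AssemblyAISTTService.Settings(
            keyterms_prompt=["Pipecat", "AssemblyAI"],
            max_turn_silence=800,
        )
    )
)
```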
## Notes
* **u3-rt-pro model**: The default model is now `u3-rt-pro`, which provides the best performance and supports built-in turn detection.
* **Turn detection modes**:
* **Pipecat mode** (`vad_force_turn_endpoint=True`, default): Forces AssemblyAI to return finals ASAP so Pipecat's turn detection (e.g., Smart Turn) decides when the user is done. The service sends a `ForceEndpoint` message when VAD detects the user has stopped speaking.
* **AssemblyAI mode** (`vad_force_turn_endpoint=False`, u3-rt-pro only): AssemblyAI's model controls turn endings using built-in turn detection. The service emits `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` based on AssemblyAI's detection.
* **Speaker diarization**: Enable `speaker_labels=True` in Settings to automatically identify different speakers. Final transcripts will include a speaker field (e.g., "Speaker A", "Speaker B"). Use the `speaker_format` parameter to format transcripts with speaker labels.
* **Language detection**: When using `universal-streaming-multilingual` with `language_detection=True`, Turn messages include `language_code` and `language_confidence` fields for automatic language detection.
* **Prompting**: The `prompt` parameter (u3-rt-pro only) allows you to guide transcription for specific names, terms, or domain vocabulary. This is a beta feature; AssemblyAI recommends testing without a prompt first. It cannot be used with `keyterms_prompt`.
* **Dynamic settings updates**: You can update `keyterms_prompt`, `prompt`, `min_turn_silence`, and `max_turn_silence` at runtime using `STTUpdateSettingsFrame` without reconnecting.
The `connection_params=` / `InputParams` / `params=` pattern is deprecated as
of v0.0.105. Use `Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Event Handlers
AssemblyAI STT supports the standard [service connection events](/api-reference/server/events/service-events), plus turn-level events for conversation tracking:
| Event | Description |
| ----------------- | ------------------------------------------------------------- |
| `on_connected` | Connected to AssemblyAI WebSocket |
| `on_disconnected` | Disconnected from AssemblyAI WebSocket |
| `on_end_of_turn` | End of turn detected (fires after final transcript is pushed) |
```python theme={null}
@stt.event_handler("on_connected")
async def on_connected(service):
print("Connected to AssemblyAI")
@stt.event_handler("on_end_of_turn")
async def on_end_of_turn(service, transcript):
print(f"Turn ended: {transcript}")
```
The `on_end_of_turn` event receives `(service, transcript)` where `transcript` is the final transcript text. This event fires after the final transcript is pushed, providing a reliable hook for end-of-turn logic that doesn't race with `TranscriptionFrame`. Works in both Pipecat and AssemblyAI turn detection modes.
# AWS Transcribe
Source: https://docs.pipecat.ai/api-reference/server/services/stt/aws
Speech-to-text service implementation using Amazon Transcribe's real-time transcription API
## Overview
`AWSTranscribeSTTService` provides real-time speech recognition using Amazon Transcribe's WebSocket streaming API with support for interim results, multiple languages, and configurable audio processing parameters for enterprise-grade transcription.
Pipecat's API methods for AWS Transcribe integration
Complete example with AWS services integration
Official AWS Transcribe documentation and features
Access AWS Transcribe services and IAM setup
## Installation
To use AWS Transcribe services, install the required dependency:
```bash theme={null}
pip install "pipecat-ai[aws]"
```
## Prerequisites
### AWS Account Setup
Before using AWS Transcribe STT services, you need:
1. **AWS Account**: Sign up at [AWS Console](https://console.aws.amazon.com/)
2. **IAM User**: Create an IAM user with Amazon Transcribe permissions
3. **Credentials**: Set up AWS access keys and region configuration
### Required Environment Variables
* `AWS_ACCESS_KEY_ID`: Your AWS access key ID
* `AWS_SECRET_ACCESS_KEY`: Your AWS secret access key
* `AWS_SESSION_TOKEN`: Session token (if using temporary credentials)
* `AWS_REGION`: AWS region (defaults to "us-east-1")
## Configuration
AWS secret access key. If `None`, uses `AWS_SECRET_ACCESS_KEY` environment
variable.
AWS access key ID. If `None`, uses `AWS_ACCESS_KEY_ID` environment variable.
AWS session token for temporary credentials. If `None`, uses
`AWS_SESSION_TOKEN` environment variable.
AWS region for the service. If `None`, uses `AWS_REGION` environment variable
(defaults to `"us-east-1"`).
Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
rate. AWS Transcribe only supports `8000` or `16000` Hz; other values are
clamped to `16000` Hz at connect time.
Language for transcription. Supports a wide range of languages including
English, Spanish, French, German, and many more. See [AWS Transcribe supported
languages](https://docs.aws.amazon.com/transcribe/latest/dg/supported-languages.html).
*Deprecated in v0.0.105. Use `settings=AWSTranscribeSTTService.Settings(...)`
instead.*
Runtime-configurable settings for the STT service. See [Settings](#settings)
below.
P99 latency from speech end to final transcript in seconds. Override for your
deployment.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `AWSTranscribeSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------- | ----------------- | ------------- | ----------------------------------------------------------------- |
| `model` | `str` | `None` | STT model identifier. *(Inherited from base STT settings.)* |
| `language` | `Language \| str` | `Language.EN` | Language for transcription. *(Inherited from base STT settings.)* |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.aws.stt import AWSTranscribeSTTService
stt = AWSTranscribeSTTService(
api_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
region=os.getenv("AWS_REGION", "us-east-1"),
)
```
### With Custom Language and Sample Rate
```python theme={null}
from pipecat.services.aws.stt import AWSTranscribeSTTService
from pipecat.transcriptions.language import Language
stt = AWSTranscribeSTTService(
api_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
region="eu-west-1",
sample_rate=8000,
settings=AWSTranscribeSTTService.Settings(
language=Language.ES,
),
)
```
## Notes
* **Supported sample rates**: AWS Transcribe only supports `8000` Hz and `16000` Hz. If a different rate is provided, the service automatically falls back to `16000` Hz with a warning.
* **Pre-signed URL authentication**: The service uses pre-signed URLs for WebSocket authentication rather than passing credentials directly, following AWS best practices.
* **Partial results stabilization**: Enabled by default with `"high"` stability, which reduces changes to interim transcripts at the cost of slightly higher latency.
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Event Handlers
AWS Transcribe STT supports the standard [service connection events](/api-reference/server/events/service-events):
| Event | Description |
| ----------------- | ------------------------------------------ |
| `on_connected` | Connected to AWS Transcribe WebSocket |
| `on_disconnected` | Disconnected from AWS Transcribe WebSocket |
```python theme={null}
@stt.event_handler("on_connected")
async def on_connected(service):
print("Connected to AWS Transcribe")
```
# Azure
Source: https://docs.pipecat.ai/api-reference/server/services/stt/azure
Speech-to-text service using Azure Cognitive Services Speech SDK
## Overview
`AzureSTTService` provides real-time speech recognition using Azure's Cognitive Services Speech SDK with support for continuous recognition, extensive language support, and configurable audio processing for enterprise applications.
Pipecat's API methods for Azure Speech integration
Complete example with Azure services integration
Official Azure Speech Service documentation and features
Create Speech Services resource and get API keys
## Installation
To use Azure Speech services, install the required dependency:
```bash theme={null}
pip install "pipecat-ai[azure]"
```
## Prerequisites
### Azure Account Setup
Before using Azure STT services, you need:
1. **Azure Account**: Sign up at [Azure Portal](https://portal.azure.com/)
2. **Speech Services Resource**: Create a Speech Services resource in Azure
3. **API Credentials**: Get your API key and region from the resource
### Required Environment Variables
* `AZURE_SPEECH_API_KEY`: Your Azure Speech API key
* `AZURE_SPEECH_REGION`: Your Azure Speech region (required unless using `private_endpoint`)
## Configuration
Azure Cognitive Services subscription key.
Azure region for the Speech service (e.g., `"eastus"`, `"westus2"`). Required
unless `private_endpoint` is provided.
Language for speech recognition. *Deprecated in v0.0.105. Use
`settings=AzureSTTService.Settings(...)` instead.*
Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
rate.
Private endpoint for STT behind firewall. Enables use in private networks.
When provided, `region` becomes optional (takes priority if both are
specified). See [Azure Speech private link
documentation](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-services-private-link?tabs=portal)
for setup details.
Custom model endpoint ID. Use this for custom speech models deployed in Azure.
Runtime-configurable settings for the STT service. See [Settings](#settings)
below.
P99 latency from speech end to final transcript in seconds. Override for your
deployment.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `AzureSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------- | ----------------- | ---------------- | ---------------------------------------------------------------------- |
| `model` | `str` | `None` | STT model identifier. *(Inherited from base STT settings.)* |
| `language` | `Language \| str` | `Language.EN_US` | Language for speech recognition. *(Inherited from base STT settings.)* |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.azure.stt import AzureSTTService
stt = AzureSTTService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region=os.getenv("AZURE_SPEECH_REGION"),
)
```
### With Custom Language
```python theme={null}
from pipecat.services.azure.stt import AzureSTTService
from pipecat.transcriptions.language import Language
stt = AzureSTTService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region="westus2",
settings=AzureSTTService.Settings(
language=Language.FR,
),
)
```
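### With a Private Endpoint
For Speech resources reachable only over a private link, pass `private_endpoint` instead of `region`. A sketch; the endpoint URL below is a placeholder for your resource's private link endpoint:
```python theme={null}
import os

from pipecat.services.azure.stt import AzureSTTService

# Placeholder endpoint: substitute your Speech resource's private link URL.
stt = AzureSTTService(
    api_key=os.getenv("AZURE_SPEECH_API_KEY"),
    private_endpoint="wss://my-speech-resource.cognitiveservices.azure.com/",
)
```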
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **SDK-based (not WebSocket)**: Unlike most other STT services in Pipecat, Azure STT uses the Azure Cognitive Services Speech SDK rather than a raw WebSocket connection. Recognition callbacks run on SDK-managed threads and are bridged to asyncio via `asyncio.run_coroutine_threadsafe`.
* **Continuous recognition**: The service uses Azure's `start_continuous_recognition_async` for always-on transcription. It provides both interim (`recognizing`) and final (`recognized`) results automatically.
* **Custom endpoints**: Use the `endpoint_id` parameter to point to a custom speech model deployed in your Azure subscription for domain-specific accuracy improvements.
* **Region vs private endpoint**: Either `region` or `private_endpoint` must be provided. If both are specified, `private_endpoint` takes priority and a warning is logged. If neither is provided, a `ValueError` is raised.
# Cartesia
Source: https://docs.pipecat.ai/api-reference/server/services/stt/cartesia
Speech-to-text service implementation using Cartesia's real-time transcription API
## Overview
`CartesiaSTTService` provides real-time speech recognition using Cartesia's WebSocket API with the `ink-whisper` model, supporting streaming transcription with both interim and final results for low-latency applications.
Pipecat's API methods for Cartesia STT integration
Complete example with transcription logging
Official Cartesia STT documentation and features
Access API keys and transcription models
## Installation
To use Cartesia services, install the required dependency:
```bash theme={null}
pip install "pipecat-ai[cartesia]"
```
## Prerequisites
### Cartesia Account Setup
Before using Cartesia STT services, you need:
1. **Cartesia Account**: Sign up at [Cartesia](https://cartesia.ai/)
2. **API Key**: Generate an API key from your account dashboard
3. **Model Access**: Ensure access to the ink-whisper transcription model
### Required Environment Variables
* `CARTESIA_API_KEY`: Your Cartesia API key for authentication
## Configuration
### CartesiaSTTService
Cartesia API key for authentication.
Custom API endpoint URL. If empty, defaults to `"api.cartesia.ai"`. Override
for proxied deployments.
Audio encoding format.
Audio sample rate in Hz.
Configuration options for the transcription service. *Deprecated in v0.0.105. Use `settings=CartesiaSTTService.Settings(...)` for model/language and direct init parameters for `encoding`/`sample_rate` instead.*
Runtime-configurable settings for the STT service. See [Settings](#settings)
below.
P99 latency from speech end to final transcript in seconds. Override for your
deployment.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `CartesiaSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------- | ----------------- | --------------- | ------------------------------------------------------------------------ |
| `model` | `str` | `"ink-whisper"` | The transcription model to use. *(Inherited from base STT settings.)* |
| `language` | `Language \| str` | `"en"` | Target language for transcription. *(Inherited from base STT settings.)* |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.cartesia.stt import CartesiaSTTService
stt = CartesiaSTTService(
api_key=os.getenv("CARTESIA_API_KEY"),
)
```
### With Custom Options
```python theme={null}
from pipecat.services.cartesia.stt import CartesiaSTTService
stt = CartesiaSTTService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaSTTService.Settings(
model="ink-whisper",
language="es",
),
sample_rate=16000,
)
```
## Notes
* **Inactivity timeout**: Cartesia disconnects WebSocket connections after 3 minutes of inactivity. The timeout resets with each message sent. Silence-based keepalive is enabled by default to prevent disconnections.
* **Auto-reconnect on send**: If the connection is closed (e.g., due to timeout), the service automatically reconnects when the next audio data is sent.
* **Finalize on VAD stop**: When the pipeline's VAD detects the user has stopped speaking, the service sends a `"finalize"` command to flush the transcription session and produce a final result.
The `InputParams` / `params=` / `live_options=` pattern is deprecated as of
v0.0.105. Use `Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Event Handlers
Cartesia STT supports the standard [service connection events](/api-reference/server/events/service-events):
| Event | Description |
| ----------------- | ------------------------------------ |
| `on_connected` | Connected to Cartesia WebSocket |
| `on_disconnected` | Disconnected from Cartesia WebSocket |
```python theme={null}
@stt.event_handler("on_connected")
async def on_connected(service):
print("Connected to Cartesia STT")
```
# Deepgram
Source: https://docs.pipecat.ai/api-reference/server/services/stt/deepgram
Speech-to-text service implementations using Deepgram's real-time transcription and Flux APIs
## Overview
Deepgram provides four STT service implementations:
* `DeepgramSTTService` for real-time speech recognition using Deepgram's standard WebSocket API with support for interim results, language detection, and voice activity detection (VAD)
* `DeepgramFluxSTTService` for advanced conversational AI with Flux capabilities including intelligent turn detection, eager end-of-turn events, and enhanced speech processing for improved response timing
* `DeepgramSageMakerSTTService` for real-time speech recognition using Deepgram Nova models deployed on AWS SageMaker endpoints via HTTP/2 bidirectional streaming
* `DeepgramFluxSageMakerSTTService` for advanced conversational AI using Deepgram Flux models deployed on AWS SageMaker endpoints with native turn detection and low-latency streaming
Pipecat's API methods for standard Deepgram STT
Pipecat's API methods for Deepgram Flux STT
Complete example with standard Deepgram STT
Complete example with Deepgram Flux STT
Complete example with Deepgram Nova on SageMaker
Complete example with Deepgram Flux on SageMaker
Official Deepgram documentation and features
Access API keys and transcription models
## Installation
To use Deepgram STT services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[deepgram]"
```
For the SageMaker variant, install both the Deepgram and SageMaker dependencies:
```bash theme={null}
pip install "pipecat-ai[deepgram,sagemaker]"
```
## Prerequisites
### Deepgram Account Setup
Before using `DeepgramSTTService` or `DeepgramFluxSTTService`, you need:
1. **Deepgram Account**: Sign up at [Deepgram Console](https://console.deepgram.com/signup)
2. **API Key**: Generate an API key from your console dashboard
3. **Model Selection**: Choose from available transcription models and features
### Required Environment Variables
* `DEEPGRAM_API_KEY`: Your Deepgram API key for authentication
### AWS SageMaker Setup
Before using `DeepgramSageMakerSTTService` or `DeepgramFluxSageMakerSTTService`, you need:
1. **AWS Account**: With credentials configured (via environment variables, AWS CLI, or instance metadata)
2. **SageMaker Endpoint**: A deployed SageMaker endpoint with a [Deepgram model](https://developers.deepgram.com/docs/deploy-amazon-sagemaker) (Nova for standard service, Flux for advanced turn detection)
3. **Deepgram SDK**: The Deepgram SDK may be needed for certain advanced configurations
## DeepgramSTTService
Deepgram API key for authentication.
Custom Deepgram API base URL. Leave empty for the default endpoint. Supports
`wss://`, `https://`, `ws://`, `http://`, or bare hostname (defaults to
secure). Preserves the specified scheme, useful for air-gapped or private
deployments that don't use TLS.
Audio encoding format.
Number of audio channels.
Transcribe each audio channel independently.
Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
rate.
Callback URL for async transcription delivery.
HTTP method for the callback (`"GET"` or `"POST"`).
Custom billing tag.
Opt out of the Deepgram Model Improvement Program.
Legacy configuration options. *Deprecated in v0.0.105. Use
`settings=DeepgramSTTService.Settings(...)` for runtime-updatable fields and
direct constructor parameters for connection-level config instead.*
Runtime-configurable settings for the STT service. See [Settings](#settings)
below.
Additional Deepgram features to enable.
Whether to interrupt the bot when Deepgram VAD detects user speech.
*Deprecated in v0.0.99. Will be removed along with `vad_events` support.*
P99 latency from speech end to final transcript in seconds. Override for your
deployment.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `DeepgramSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ------------------ | ----------------- | ------------------ | ------------------------------------------------------------ |
| `model` | `str` | `"nova-3-general"` | Deepgram model to use. *(Inherited from base STT settings.)* |
| `language` | `Language \| str` | `Language.EN` | Recognition language. *(Inherited from base STT settings.)* |
| `detect_entities` | `bool` | `False` | Enable named entity detection. |
| `diarize` | `bool` | `False` | Enable speaker diarization. |
| `dictation` | `bool` | `False` | Enable dictation mode (converts commands to punctuation). |
| `endpointing` | `int \| bool` | `None` | Endpointing sensitivity in ms, or `False` to disable. |
| `interim_results` | `bool` | `True` | Stream partial recognition results. |
| `keyterm` | `str \| list` | `None` | Keyterms to boost recognition accuracy. |
| `keywords` | `str \| list` | `None` | Keywords to boost (str or list of str). |
| `numerals` | `bool` | `False` | Convert spoken numbers to numerals. |
| `profanity_filter` | `bool` | `True` | Filter profanity from transcripts. |
| `punctuate` | `bool` | `True` | Add punctuation to transcripts. |
| `redact` | `str \| list` | `None` | Redact sensitive information. |
| `replace` | `str \| list` | `None` | Word replacement rules. |
| `search` | `str \| list` | `None` | Search terms to highlight. |
| `smart_format` | `bool` | `False` | Apply smart formatting to transcripts. |
| `utterance_end_ms` | `int` | `None` | Silence duration in ms before an utterance-end event. |
| `vad_events` | `bool` | `False` | Enable Deepgram's built-in VAD events (deprecated). |
### Usage
```python theme={null}
from pipecat.services.deepgram.stt import DeepgramSTTService
stt = DeepgramSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
)
```
#### With Custom Settings
```python theme={null}
from pipecat.services.deepgram.stt import DeepgramSTTService
stt = DeepgramSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
settings=DeepgramSTTService.Settings(
model="nova-3-general",
language="es",
punctuate=True,
smart_format=True,
),
)
```
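#### With Multilingual Transcription
As described in the Notes below, setting `language="multi"` enables multilingual transcription that detects and transcribes multiple languages within the same audio stream. A minimal sketch:
```python theme={null}
import os

from pipecat.services.deepgram.stt import DeepgramSTTService

# "multi" enables code-switching transcription across languages.
stt = DeepgramSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    settings=DeepgramSTTService.Settings(
        model="nova-3-general",
        language="multi",
    ),
)
```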
### Notes
* **Finalize on VAD stop**: When the pipeline's VAD detects the user has stopped speaking, the service sends a [finalize](https://developers.deepgram.com/docs/finalize) request to Deepgram for faster final transcript delivery.
* **Deprecated `vad_events`**: The `vad_events` setting is deprecated. Use Silero VAD instead.
* **Multilingual support**: Deepgram Nova models support many languages. The default is `Language.EN` (English). Set `language="multi"` in settings to enable multilingual transcription, which will detect and transcribe multiple languages within the same audio stream.
### Event Handlers
Supports the standard [service connection events](/api-reference/server/events/service-events) (`on_connected`, `on_disconnected`, `on_connection_error`), plus:
| Event | Description |
| ------------------- | ------------------------------------- |
| `on_speech_started` | Speech detected in the audio stream |
| `on_utterance_end` | End of utterance detected by Deepgram |
```python theme={null}
@stt.event_handler("on_speech_started")
async def on_speech_started(service):
print("User started speaking")
@stt.event_handler("on_utterance_end")
async def on_utterance_end(service):
print("Utterance ended")
```
## DeepgramFluxSTTService
Since Deepgram Flux provides its own user turn start and end detection, you
should use `ExternalUserTurnStrategies` to let Flux handle turn management.
See [User Turn
Strategies](/api-reference/server/utilities/turn-management/user-turn-strategies) for
configuration details.
Deepgram API key for authentication.
WebSocket URL for the Deepgram Flux API.
Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
rate.
Deepgram Flux model to use for transcription. *Deprecated in v0.0.105. Use
`settings=DeepgramFluxSTTService.Settings(...)` instead.*
Opt out of the Deepgram Model Improvement Program.
Audio encoding format required by the Flux API. Must be `"linear16"`.
Tags to label requests for identification during usage reporting.
Legacy configuration options. *Deprecated in v0.0.105. Use
`settings=DeepgramFluxSTTService.Settings(...)` instead.*
Configuration settings for the Flux API. See [Settings](#settings-2) below.
Whether the bot should be interrupted when Flux detects user speech.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `DeepgramFluxSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description | On-the-fly |
| --------------------- | ----------------- | ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- |
| `model` | `str` | `"flux-general-en"` | Deepgram Flux model to use. *(Inherited from base STT settings.)* | |
| `language` | `Language \| str` | `None` | Recognition language. *(Inherited from base STT settings.)* | |
| `eager_eot_threshold` | `float` | `None` | EagerEndOfTurn threshold. Lower values trigger faster responses with more LLM calls; higher values are more conservative. `None` disables EagerEndOfTurn. | ✓ |
| `eot_threshold` | `float` | `None` | End-of-turn confidence threshold (default 0.7). Lower = faster turn endings. | ✓ |
| `eot_timeout_ms` | `int` | `None` | Time in ms after speech to finish a turn regardless of confidence (default 5000). | ✓ |
| `keyterm` | `list` | `[]` | Key terms to boost recognition accuracy for specialized terminology. | ✓ |
| `min_confidence` | `float` | `None` | Minimum average confidence required to produce a `TranscriptionFrame`. | |
Parameters marked with ✓ in the "On-the-fly" column can be updated mid-stream
using `STTUpdateSettingsFrame` without requiring a WebSocket reconnect.
### Usage
```python theme={null}
from pipecat.services.deepgram.flux import DeepgramFluxSTTService
stt = DeepgramFluxSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
)
```
#### With EagerEndOfTurn
```python theme={null}
from pipecat.services.deepgram.flux import DeepgramFluxSTTService
stt = DeepgramFluxSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
settings=DeepgramFluxSTTService.Settings(
eager_eot_threshold=0.5,
eot_threshold=0.8,
keyterm=["Pipecat", "Deepgram"],
),
)
```
#### Updating Settings Mid-Stream
The `keyterm`, `eot_threshold`, `eager_eot_threshold`, and `eot_timeout_ms` settings can be updated on-the-fly using `STTUpdateSettingsFrame`:
```python theme={null}
from pipecat.frames.frames import STTUpdateSettingsFrame
from pipecat.services.deepgram.flux import DeepgramFluxSTTSettings
# During pipeline execution, update settings without reconnecting
await task.queue_frame(
STTUpdateSettingsFrame(
delta=DeepgramFluxSTTSettings(
eot_threshold=0.8,
keyterm=["Pipecat", "Deepgram"],
)
)
)
```
This sends a `Configure` message to Deepgram over the existing WebSocket connection, allowing you to adjust turn detection behavior and key terms without interrupting the conversation.
### Notes
* **Turn management**: Flux provides its own turn detection via `StartOfTurn`/`EndOfTurn` events and broadcasts `UserStartedSpeakingFrame`/`UserStoppedSpeakingFrame` directly. Use `ExternalUserTurnStrategies` to avoid conflicting VAD-based turn management.
* **On-the-fly configuration**: Supports updating `keyterm`, `eot_threshold`, `eager_eot_threshold`, and `eot_timeout_ms` mid-stream via `STTUpdateSettingsFrame`. These updates are sent as `Configure` messages over the existing WebSocket connection without requiring a reconnect.
* **EagerEndOfTurn**: Enabling `eager_eot_threshold` provides faster response times by predicting end-of-turn before it is confirmed. EagerEndOfTurn transcripts are pushed as `InterimTranscriptionFrame`s. If the user resumes speaking, a `TurnResumed` event is fired.
### Event Handlers
Supports the standard [service connection events](/api-reference/server/events/service-events) (`on_connected`, `on_disconnected`, `on_connection_error`), plus turn-level events for more granular conversation tracking:
| Event | Description |
| ---------------------- | ------------------------------------ |
| `on_start_of_turn` | Start of a new turn detected |
| `on_turn_resumed` | A previously paused turn has resumed |
| `on_end_of_turn` | End of turn detected |
| `on_eager_end_of_turn` | Early end-of-turn prediction |
| `on_update` | Transcript updated |
```python theme={null}
@stt.event_handler("on_start_of_turn")
async def on_start_of_turn(service, transcript):
print(f"Turn started: {transcript}")
@stt.event_handler("on_end_of_turn")
async def on_end_of_turn(service, transcript):
print(f"Turn ended: {transcript}")
@stt.event_handler("on_eager_end_of_turn")
async def on_eager_end_of_turn(service, transcript):
print(f"Early end-of-turn prediction: {transcript}")
```
Turn events receive `(service, transcript)` where `transcript` is the current transcript text. The `on_turn_resumed` event receives only `(service)`.
## DeepgramSageMakerSTTService
Name of the SageMaker endpoint with Deepgram model deployed.
AWS region where the SageMaker endpoint is deployed (e.g., `"us-east-2"`).
Audio encoding format.
Number of audio channels.
Transcribe each audio channel independently.
Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
rate.
Opt out of the Deepgram Model Improvement Program.
Legacy configuration options. *Deprecated in v0.0.105. Use
`settings=DeepgramSageMakerSTTService.Settings(...)` instead.*
Runtime-configurable settings for the STT service. See [Settings](#settings-3)
below.
P99 latency from speech end to final transcript in seconds. Override for your
deployment.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `DeepgramSageMakerSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
The SageMaker service inherits all settings from `DeepgramSTTService.Settings`. See [DeepgramSTTService Settings](#settings) above for the full list.
### Usage
```python theme={null}
from pipecat.services.deepgram.sagemaker.stt import DeepgramSageMakerSTTService
stt = DeepgramSageMakerSTTService(
endpoint_name=os.getenv("SAGEMAKER_STT_ENDPOINT_NAME"),
region=os.getenv("AWS_REGION"),
settings=DeepgramSageMakerSTTService.Settings(
model="nova-3",
language="en",
interim_results=True,
punctuate=True,
),
)
```
### Notes
* **Finalize on VAD stop**: Like `DeepgramSTTService`, the SageMaker service sends a [finalize](https://developers.deepgram.com/docs/finalize) request when the pipeline's VAD detects the user has stopped speaking.
* **SageMaker deployment**: Requires a Deepgram model deployed to an AWS SageMaker endpoint. See the [Deepgram SageMaker deployment guide](https://developers.deepgram.com/docs/deploy-amazon-sagemaker) for setup instructions.
* **Keepalive**: Automatically sends KeepAlive messages every 5 seconds to maintain the connection during periods of silence.
### Event Handlers
Supports the standard [service connection events](/api-reference/server/events/service-events) (`on_connected`, `on_disconnected`, `on_connection_error`).
## DeepgramFluxSageMakerSTTService
Since Deepgram Flux provides its own user turn start and end detection, you
should use `ExternalUserTurnStrategies` to let Flux handle turn management.
See [User Turn
Strategies](/api-reference/server/utilities/turn-management/user-turn-strategies) for
configuration details.
Name of the SageMaker endpoint with Deepgram Flux model deployed (e.g.,
`"my-deepgram-flux-endpoint"`).
AWS region where the endpoint is deployed (e.g., `"us-east-2"`).
Audio encoding format. Must be `"linear16"`.
Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
rate.
Opt out of the Deepgram Model Improvement Program.
Tags to label requests for identification during usage reporting.
Whether to interrupt the bot when Flux detects user speech.
Runtime-configurable settings. See [Settings](#settings-4) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `DeepgramFluxSageMakerSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
The Flux SageMaker service inherits all settings from `DeepgramFluxSTTService.Settings` with the same on-the-fly configuration support:
| Parameter | Type | Default | Description | On-the-fly |
| --------------------- | ----------------- | ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- |
| `model` | `str` | `"flux-general-en"` | Deepgram Flux model to use. *(Inherited from base STT settings.)* | |
| `language` | `Language \| str` | `Language.EN` | Recognition language. *(Inherited from base STT settings.)* | |
| `eager_eot_threshold` | `float` | `None` | EagerEndOfTurn threshold. Lower values trigger faster responses with more LLM calls; higher values are more conservative. `None` disables EagerEndOfTurn. | ✓ |
| `eot_threshold` | `float` | `None` | End-of-turn confidence threshold (default 0.7). Lower = faster turn endings. | ✓ |
| `eot_timeout_ms` | `int` | `None` | Time in ms after speech to finish a turn regardless of confidence (default 5000). | ✓ |
| `keyterm` | `list` | `[]` | Key terms to boost recognition accuracy for specialized terminology. | ✓ |
| `min_confidence` | `float` | `None` | Minimum average confidence required to produce a `TranscriptionFrame`. | |
Parameters marked with ✓ in the "On-the-fly" column can be updated mid-stream
using `STTUpdateSettingsFrame` without requiring a session restart. These
updates are sent as `Configure` messages to the Flux model over the existing
connection.
### Usage
```python theme={null}
from pipecat.services.deepgram.flux.sagemaker.stt import DeepgramFluxSageMakerSTTService
stt = DeepgramFluxSageMakerSTTService(
endpoint_name=os.getenv("SAGEMAKER_FLUX_ENDPOINT_NAME"),
region=os.getenv("AWS_REGION"),
)
```
#### With Custom Settings
```python theme={null}
from pipecat.services.deepgram.flux.sagemaker.stt import DeepgramFluxSageMakerSTTService
stt = DeepgramFluxSageMakerSTTService(
endpoint_name=os.getenv("SAGEMAKER_FLUX_ENDPOINT_NAME"),
region=os.getenv("AWS_REGION"),
settings=DeepgramFluxSageMakerSTTService.Settings(
model="flux-general-en",
eot_threshold=0.7,
eager_eot_threshold=0.5,
keyterm=["Pipecat", "AI"],
),
)
```
#### Updating Settings Mid-Stream
The `keyterm`, `eot_threshold`, `eager_eot_threshold`, and `eot_timeout_ms` settings can be updated on-the-fly:
```python theme={null}
from pipecat.frames.frames import STTUpdateSettingsFrame
from pipecat.services.deepgram.flux.sagemaker.stt import DeepgramFluxSageMakerSTTService
# Update settings without reconnecting
await task.queue_frame(
    STTUpdateSettingsFrame(
        delta=DeepgramFluxSageMakerSTTService.Settings(
            eot_threshold=0.8,
            keyterm=["Pipecat", "Deepgram", "SageMaker"],
        )
    )
)
```
### Notes
* **Turn management**: Flux provides native turn detection via `StartOfTurn`/`EndOfTurn` events and broadcasts `UserStartedSpeakingFrame`/`UserStoppedSpeakingFrame` directly. Use `ExternalUserTurnStrategies` to avoid conflicting VAD-based turn management.
* **On-the-fly configuration**: Supports updating `keyterm`, `eot_threshold`, `eager_eot_threshold`, and `eot_timeout_ms` mid-stream via `STTUpdateSettingsFrame`. These updates are sent as `Configure` messages over the existing HTTP/2 connection without requiring a reconnect.
* **EagerEndOfTurn**: Enabling `eager_eot_threshold` provides faster response times by predicting end-of-turn before it is confirmed. EagerEndOfTurn transcripts are pushed as `InterimTranscriptionFrame`s. If the user resumes speaking, a `TurnResumed` event is fired.
* **SageMaker deployment**: Requires a Deepgram Flux model deployed to an AWS SageMaker endpoint. Unlike Nova models, Flux provides native turn detection and does not require external VAD.
* **No KeepAlive needed**: The Flux protocol uses a watchdog mechanism that sends silence when needed to maintain the connection, so manual KeepAlive messages are not required.
### Event Handlers
Supports the standard [service connection events](/api-reference/server/events/service-events) (`on_connected`, `on_disconnected`, `on_connection_error`), plus turn-level events for granular conversation tracking:
| Event | Description |
| ---------------------- | ------------------------------------ |
| `on_start_of_turn` | Start of a new turn detected |
| `on_turn_resumed` | A previously paused turn has resumed |
| `on_end_of_turn` | End of turn detected |
| `on_eager_end_of_turn` | Early end-of-turn prediction |
| `on_update` | Transcript updated |
```python theme={null}
@stt.event_handler("on_start_of_turn")
async def on_start_of_turn(service, transcript):
print(f"Turn started: {transcript}")
@stt.event_handler("on_end_of_turn")
async def on_end_of_turn(service, transcript):
print(f"Turn ended: {transcript}")
@stt.event_handler("on_eager_end_of_turn")
async def on_eager_end_of_turn(service, transcript):
print(f"Early end-of-turn prediction: {transcript}")
```
Turn events receive `(service, transcript)` where `transcript` is the current transcript text. The `on_turn_resumed` event receives only `(service)`.
The `InputParams` / `params=` / `live_options=` pattern is deprecated as of
v0.0.105. Use `Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
# ElevenLabs
Source: https://docs.pipecat.ai/api-reference/server/services/stt/elevenlabs
Speech-to-text service implementation using ElevenLabs' file-based transcription API
## Overview
ElevenLabs provides two STT service implementations:
* **`ElevenLabsSTTService`** (HTTP) -- File-based transcription using ElevenLabs' Speech-to-Text API with segmented audio processing. Uploads audio files and receives transcription results directly.
* **`ElevenLabsRealtimeSTTService`** (WebSocket) -- Real-time streaming transcription with ultra-low latency, supporting both partial (interim) and committed (final) transcripts with manual or VAD-based commit strategies.
Pipecat's API methods for ElevenLabs STT integration
Complete example with ElevenLabs STT and TTS
Official ElevenLabs STT API documentation
Access API keys and speech-to-text models
## Installation
To use ElevenLabs STT services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[elevenlabs]"
```
## Prerequisites
### ElevenLabs Account Setup
Before using ElevenLabs STT services, you need:
1. **ElevenLabs Account**: Sign up at [ElevenLabs Platform](https://elevenlabs.io/)
2. **API Key**: Generate an API key from your account dashboard
3. **Model Access**: Ensure access to the Scribe v2 transcription model (default: `scribe_v2`)
### Required Environment Variables
* `ELEVENLABS_API_KEY`: Your ElevenLabs API key for authentication
## ElevenLabsSTTService
ElevenLabs API key for authentication.
An aiohttp session for HTTP requests. You must create and manage this
yourself.
Base URL for the ElevenLabs API.
Model ID for transcription. *Deprecated in v0.0.105. Use
`settings=ElevenLabsSTTService.Settings(...)` instead.*
Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
rate.
Runtime-configurable settings for the STT service. See [Settings](#settings)
below.
Configuration parameters for the STT service. *Deprecated in v0.0.105. Use
`settings=ElevenLabsSTTService.Settings(...)` instead.*
P99 latency from speech end to final transcript in seconds. Override for your
deployment.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `ElevenLabsSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ------------------ | ----------------- | ------------- | ------------------------------------------------------------------------ |
| `model` | `str` | `None` | Model ID for transcription. *(Inherited from base STT settings.)* |
| `language` | `Language \| str` | `Language.EN` | Target language for transcription. *(Inherited from base STT settings.)* |
| `tag_audio_events` | `bool` | `True` | Include audio events like (laughter), (coughing) in transcription. |
### Usage
```python theme={null}
import aiohttp
from pipecat.services.elevenlabs.stt import ElevenLabsSTTService
async with aiohttp.ClientSession() as session:
stt = ElevenLabsSTTService(
api_key=os.getenv("ELEVENLABS_API_KEY"),
aiohttp_session=session,
)
```
#### With Language and Audio Events
```python theme={null}
import aiohttp
from pipecat.services.elevenlabs.stt import ElevenLabsSTTService
from pipecat.transcriptions.language import Language
async with aiohttp.ClientSession() as session:
stt = ElevenLabsSTTService(
api_key=os.getenv("ELEVENLABS_API_KEY"),
aiohttp_session=session,
settings=ElevenLabsSTTService.Settings(
language=Language.ES,
tag_audio_events=False,
),
)
```
### Notes
* The HTTP service uploads complete audio segments and is best for VAD-segmented transcription.
* Does not have connection events since it uses per-request HTTP calls.
* **Multilingual support**: ElevenLabs Scribe supports 99+ languages. The default is `Language.EN` (English). Set `language=None` in settings to enable automatic language detection, which will transcribe whatever language the user speaks.
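For example, a minimal sketch enabling automatic language detection (same session setup as the examples above):
```python theme={null}
import aiohttp
from pipecat.services.elevenlabs.stt import ElevenLabsSTTService
async with aiohttp.ClientSession() as session:
    stt = ElevenLabsSTTService(
        api_key=os.getenv("ELEVENLABS_API_KEY"),
        aiohttp_session=session,
        settings=ElevenLabsSTTService.Settings(
            language=None,  # auto-detect the spoken language
        ),
    )
```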
## ElevenLabsRealtimeSTTService
ElevenLabs API key for authentication.
Base URL for the ElevenLabs WebSocket API.
Model ID for real-time transcription. *Deprecated in v0.0.105. Use
`settings=ElevenLabsRealtimeSTTService.Settings(...)` instead.*
Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
rate.
Runtime-configurable settings for the Realtime STT service. See
[Settings](#settings-2) below.
How to segment speech. `CommitStrategy.MANUAL` uses Pipecat's VAD to control
when transcript segments are committed. `CommitStrategy.VAD` uses ElevenLabs'
built-in VAD for segment boundaries.
Whether to include word-level timestamps in transcripts.
Whether to enable logging on ElevenLabs' side.
Whether to include language detection in transcripts.
Configuration parameters for the STT service. *Deprecated in v0.0.105. Use
`settings=ElevenLabsRealtimeSTTService.Settings(...)` instead.*
P99 latency from speech end to final transcript in seconds. Override for your
deployment.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `ElevenLabsRealtimeSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------------------------- | ----------------- | ------------- | --------------------------------------------------------------------------------------- |
| `model` | `str` | `None` | Model ID for transcription. *(Inherited from base STT settings.)* |
| `language` | `Language \| str` | `Language.EN` | Language for speech recognition. *(Inherited from base STT settings.)* |
| `vad_silence_threshold_secs` | `float` | `None` | Seconds of silence before VAD commits (0.3-3.0). Only used with VAD commit strategy. |
| `vad_threshold` | `float` | `None` | VAD sensitivity (0.1-0.9, lower is more sensitive). Only used with VAD commit strategy. |
| `min_speech_duration_ms` | `int` | `None` | Minimum speech duration for VAD (50-2000ms). Only used with VAD commit strategy. |
| `min_silence_duration_ms` | `int` | `None` | Minimum silence duration for VAD (50-2000ms). Only used with VAD commit strategy. |
### Usage
```python theme={null}
from pipecat.services.elevenlabs.stt import ElevenLabsRealtimeSTTService
stt = ElevenLabsRealtimeSTTService(
api_key=os.getenv("ELEVENLABS_API_KEY"),
)
```
#### With Timestamps and Custom Commit Strategy
```python theme={null}
from pipecat.services.elevenlabs.stt import ElevenLabsRealtimeSTTService, CommitStrategy
from pipecat.transcriptions.language import Language
stt = ElevenLabsRealtimeSTTService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    commit_strategy=CommitStrategy.VAD,
    include_timestamps=True,
    settings=ElevenLabsRealtimeSTTService.Settings(
        language=Language.EN,
        vad_silence_threshold_secs=1.0,
    ),
)
```
### Notes
* **Commit strategies**: Defaults to `manual` commit strategy, where Pipecat's VAD controls when transcription segments are committed. Set `commit_strategy=CommitStrategy.VAD` to let ElevenLabs handle segment boundaries. When using `MANUAL` commit strategy, transcription frames are marked as finalized (`TranscriptionFrame.finalized=True`).
* **Keepalive**: Sends silent audio chunks as keepalive to prevent idle disconnections (keepalive interval: 5s, timeout: 10s).
* **Auto-reconnect**: Automatically reconnects if the WebSocket connection is closed when new audio arrives.
* **Multilingual support**: ElevenLabs Scribe supports 99+ languages. The Realtime service defaults to automatic language detection (`language=None`). To restrict transcription to a specific language, set `language` in settings.
### Event Handlers
Supports the standard [service connection events](/api-reference/server/events/service-events):
| Event | Description |
| ----------------- | --------------------------------------------------- |
| `on_connected` | Connected to ElevenLabs Realtime STT WebSocket |
| `on_disconnected` | Disconnected from ElevenLabs Realtime STT WebSocket |
```python theme={null}
@stt.event_handler("on_connected")
async def on_connected(service):
print("Connected to ElevenLabs Realtime STT")
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
# Fal (Wizper)
Source: https://docs.pipecat.ai/api-reference/server/services/stt/fal
Speech-to-text service implementation using Fal's Wizper API
## Overview
`FalSTTService` provides speech-to-text capabilities using Fal's Wizper API with Voice Activity Detection (VAD) to process only speech segments, optimizing API usage and improving response time for efficient transcription.
Pipecat's API methods for Fal Wizper integration
Complete example with VAD integration
Official Fal Wizper documentation and features
Access API keys and Wizper models
## Installation
To use Fal services, install the required dependency:
```bash theme={null}
pip install "pipecat-ai[fal]"
```
## Prerequisites
### Fal Account Setup
Before using Fal STT services, you need:
1. **Fal Account**: Sign up at [Fal Platform](https://fal.ai/)
2. **API Key**: Generate an API key from your account dashboard
3. **Model Access**: Ensure access to the Wizper transcription model
### Required Environment Variables
* `FAL_KEY`: Your Fal API key for authentication
## Configuration
### FalSTTService
Fal API key. If not provided, uses `FAL_KEY` environment variable.
Optional aiohttp ClientSession for HTTP requests. If not provided, a session
will be created and managed internally.
Task to perform (`"transcribe"` or `"translate"`).
Level of chunking (`"segment"`).
Version of the Wizper model to use.
Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
rate.
Configuration parameters for the Wizper API. *Deprecated in v0.0.105. Use
`settings=FalSTTService.Settings(...)` instead.*
Runtime-configurable settings for the STT service. See [Settings](#settings)
below.
P99 latency from speech end to final transcript in seconds. Override for your
deployment.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `FalSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------- | ----------------- | ------------- | ------------------------------------------------------------------ |
| `model` | `str` | `None` | STT model identifier. *(Inherited from base STT settings.)* |
| `language` | `Language \| str` | `Language.EN` | Language of the audio input. *(Inherited from base STT settings.)* |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.fal.stt import FalSTTService
stt = FalSTTService(
api_key=os.getenv("FAL_KEY"),
)
```
### With Custom Parameters
```python theme={null}
from pipecat.services.fal.stt import FalSTTService
from pipecat.transcriptions.language import Language
stt = FalSTTService(
api_key=os.getenv("FAL_KEY"),
task="transcribe",
version="3",
settings=FalSTTService.Settings(
language=Language.ES,
),
)
```
### Translation Mode
```python theme={null}
stt = FalSTTService(
api_key=os.getenv("FAL_KEY"),
task="translate",
settings=FalSTTService.Settings(
language=Language.FR,
),
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **Segmented processing**: `FalSTTService` inherits from `SegmentedSTTService`, which buffers audio during speech (detected by VAD) and sends complete segments for transcription. This means it does not provide interim results -- only final transcriptions after each speech segment.
* **Translation support**: Set `task="translate"` to translate audio into English, regardless of the input language.
* **Wizper versions**: The `version` parameter selects the underlying Whisper model version. Version `"3"` is the default and recommended for best accuracy.
# Gladia
Source: https://docs.pipecat.ai/api-reference/server/services/stt/gladia
Speech-to-text service implementation using Gladia's API
## Overview
`GladiaSTTService` provides real-time speech recognition using Gladia's WebSocket API with support for 99+ languages, custom vocabulary, translation, sentiment analysis, and advanced audio processing features for comprehensive transcription.
Pipecat's API methods for Gladia STT integration
Complete example with interruption handling
Official Gladia documentation and features
Access multilingual transcription and API keys
## Installation
To use Gladia services, install the required dependency:
```bash theme={null}
pip install "pipecat-ai[gladia]"
```
## Prerequisites
### Gladia Account Setup
Before using Gladia STT services, you need:
1. **Gladia Account**: Sign up at [Gladia](https://www.gladia.io/)
2. **API Key**: Generate an API key from your account dashboard
3. **Region Selection**: Choose your preferred region (EU-West or US-West)
### Required Environment Variables
* `GLADIA_API_KEY`: Your Gladia API key for authentication
* `GLADIA_REGION`: Your preferred region (optional, defaults to "eu-west")
## Configuration
### GladiaSTTService
Gladia API key for authentication.
Region used to process audio. Defaults to `"eu-west"` when `None`.
Gladia API URL for session initialization.
Minimum confidence threshold for transcriptions (0.0-1.0). Deprecated -- no
confidence threshold is applied.
Audio encoding format. Init-only -- not part of runtime-updatable settings.
Audio bit depth. Init-only -- not part of runtime-updatable settings.
Number of audio channels. Init-only -- not part of runtime-updatable settings.
Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
rate.
Model to use for transcription. *Deprecated in v0.0.105. Use
`settings=GladiaSTTService.Settings(...)` instead.*
Additional configuration parameters. *Deprecated in v0.0.105. Use
`settings=GladiaSTTService.Settings(...)` instead.*
Runtime-configurable settings for the STT service. See [Settings](#settings)
below.
Maximum size of audio buffer in bytes (default 20MB).
Whether the bot should be interrupted when Gladia VAD detects user speech.
P99 latency from speech end to final transcript in seconds. Override for your
deployment. See [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark).
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `GladiaSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| -------------------------------------- | -------------------------- | ------- | ------------------------------------------------------------------------------------- |
| `model` | `str` | `None` | STT model identifier. *(Inherited from base STT settings.)* |
| `language` | `Language \| str` | `None` | Language for speech recognition. *(Inherited from base STT settings.)* |
| `language_config` | `LanguageConfig` | `None` | Detailed language configuration with code switching support. |
| `custom_metadata` | `Dict[str, Any]` | `None` | Additional metadata to include with requests. |
| `endpointing` | `float` | `None` | Silence duration in seconds to mark end of speech. |
| `maximum_duration_without_endpointing` | `int` | `5` | Maximum utterance duration (seconds) without silence. |
| `pre_processing` | `PreProcessingConfig` | `None` | Audio pre-processing options (audio enhancer, speech threshold). |
| `realtime_processing` | `RealtimeProcessingConfig` | `None` | Real-time processing features (custom vocabulary, translation, NER, sentiment). |
| `messages_config` | `MessagesConfig` | `None` | WebSocket message filtering options. |
| `enable_vad` | `bool` | `False` | Enable Gladia VAD for end-of-utterance detection. Use without other VAD in the agent. |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.gladia.stt import GladiaSTTService
stt = GladiaSTTService(
api_key=os.getenv("GLADIA_API_KEY"),
)
```
### With Language Configuration
```python theme={null}
from pipecat.services.gladia.stt import GladiaSTTService
from pipecat.services.gladia.config import LanguageConfig
stt = GladiaSTTService(
api_key=os.getenv("GLADIA_API_KEY"),
region="us-west",
settings=GladiaSTTService.Settings(
model="solaria-1",
language_config=LanguageConfig(
languages=["en", "es"],
code_switching=True,
),
),
)
```
### With Real-time Processing
```python theme={null}
from pipecat.services.gladia.stt import GladiaSTTService
from pipecat.services.gladia.config import (
RealtimeProcessingConfig,
CustomVocabularyConfig,
CustomVocabularyItem,
TranslationConfig,
)
stt = GladiaSTTService(
api_key=os.getenv("GLADIA_API_KEY"),
settings=GladiaSTTService.Settings(
realtime_processing=RealtimeProcessingConfig(
custom_vocabulary=True,
custom_vocabulary_config=CustomVocabularyConfig(
vocabulary=[
CustomVocabularyItem(value="Pipecat", intensity=0.8),
"Gladia",
],
),
translation=True,
translation_config=TranslationConfig(
target_languages=["fr", "de"],
model="enhanced",
),
),
),
)
```
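### With Gladia VAD
A sketch that relies on Gladia's server-side VAD for end-of-utterance detection (uses the `enable_vad` setting from the table above; do not run another VAD processor in the pipeline alongside it):
```python theme={null}
from pipecat.services.gladia.stt import GladiaSTTService
stt = GladiaSTTService(
    api_key=os.getenv("GLADIA_API_KEY"),
    settings=GladiaSTTService.Settings(
        enable_vad=True,
    ),
)
```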
## Notes
* **Session-based connection**: Gladia uses a two-step connection process: first an HTTP POST to initialize a session, then a WebSocket connection to the returned session URL. The session URL and ID are managed automatically.
* **Audio buffering**: The service buffers audio data locally and sends it when connected. If the connection drops and reconnects, buffered audio is automatically re-sent to minimize transcript gaps.
* **Keepalive**: Empty audio chunks are sent periodically to keep the Gladia connection alive (keepalive interval: 5s, timeout: 20s).
* **Built-in VAD**: Set `enable_vad=True` in Settings to use Gladia's server-side VAD, which emits `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame`. When using this, do not enable another VAD in your pipeline.
* **Translation**: Gladia supports real-time translation to multiple target languages. Translation results are pushed as `TranslationFrame`s.
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Event Handlers
Gladia STT supports the standard [service connection events](/api-reference/server/events/service-events):
| Event | Description |
| ----------------- | ---------------------------------- |
| `on_connected` | Connected to Gladia WebSocket |
| `on_disconnected` | Disconnected from Gladia WebSocket |
```python theme={null}
@stt.event_handler("on_connected")
async def on_connected(service):
print("Connected to Gladia")
```
# Google
Source: https://docs.pipecat.ai/api-reference/server/services/stt/google
Speech-to-text service implementation using Google Cloud's Speech-to-Text V2 API
## Overview
`GoogleSTTService` provides real-time speech recognition using Google Cloud's Speech-to-Text V2 API with support for 125+ languages, multiple models, voice activity detection, and advanced features like automatic punctuation and word-level confidence scores.
Pipecat's API methods for Google Cloud STT integration
Complete example with Google Cloud services
Official Google Cloud Speech-to-Text documentation
Create service accounts and manage API access
## Installation
To use Google Cloud Speech services, install the required dependency:
```bash theme={null}
pip install "pipecat-ai[google]"
```
## Prerequisites
### Google Cloud Setup
Before using Google Cloud STT services, you need:
1. **Google Cloud Account**: Sign up at [Google Cloud Console](https://console.cloud.google.com/)
2. **Project Setup**: Create a project and enable the Speech-to-Text API
3. **Service Account**: Create a service account with Speech-to-Text permissions
4. **Authentication**: Set up credentials via service account key or Application Default Credentials
### Required Environment Variables
* `GOOGLE_APPLICATION_CREDENTIALS`: Path to your service account key file (recommended)
* Or use Application Default Credentials for cloud deployments
## Configuration
### GoogleSTTService
JSON string containing Google Cloud service account credentials.
Path to service account credentials JSON file.
Google Cloud location (e.g., `"global"`, `"us-central1"`). Non-global
locations use regional endpoints.
Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
rate.
Configuration parameters for the STT service. *Deprecated in v0.0.105. Use
`settings=GoogleSTTService.Settings(...)` instead.*
Runtime-configurable settings for the STT service. See [Settings](#settings)
below.
P99 latency from speech end to final transcript in seconds. Override for your
deployment.
You must provide either `credentials` (JSON string), `credentials_path` (file
path), or have Application Default Credentials configured. At least one
authentication method is required.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `GoogleSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| -------------------------------------- | ---------------------------- | ------------------ | ---------------------------------------------------------------------------- |
| `model` | `str` | `"latest_long"` | Speech recognition model to use. *(Inherited from base STT settings.)* |
| `language` | `Language \| str` | `None` | Language for speech recognition. *(Inherited from base STT settings.)* |
| `languages` | `Language \| List[Language]` | `[Language.EN_US]` | Single language or list of recognition languages. First language is primary. |
| `use_separate_recognition_per_channel` | `bool` | `False` | Process each audio channel separately. |
| `enable_automatic_punctuation` | `bool` | `True` | Add punctuation to transcripts. |
| `enable_spoken_punctuation` | `bool` | `False` | Include spoken punctuation in transcript. |
| `enable_spoken_emojis` | `bool` | `False` | Include spoken emojis in transcript. |
| `profanity_filter` | `bool` | `False` | Filter profanity from transcript. |
| `enable_word_time_offsets` | `bool` | `False` | Include timing information for each word. |
| `enable_word_confidence` | `bool` | `False` | Include confidence scores for each word. |
| `enable_interim_results` | `bool` | `True` | Stream partial recognition results. |
| `enable_voice_activity_events` | `bool` | `False` | Detect voice activity in audio. |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.google.stt import GoogleSTTService
stt = GoogleSTTService(
credentials_path=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
)
```
### With Credentials JSON String
```python theme={null}
import json
from pipecat.services.google.stt import GoogleSTTService
stt = GoogleSTTService(
credentials=json.dumps(credentials_dict),
location="us-central1",
)
```
### With Custom Parameters
```python theme={null}
from pipecat.services.google.stt import GoogleSTTService
from pipecat.transcriptions.language import Language
stt = GoogleSTTService(
credentials_path=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
settings=GoogleSTTService.Settings(
languages=[Language.EN_US, Language.ES],
model="latest_long",
enable_automatic_punctuation=True,
enable_word_time_offsets=True,
enable_word_confidence=True,
),
)
```
### Updating Settings at Runtime
Google STT supports dynamic settings updates via `STTUpdateSettingsFrame`:
```python theme={null}
from pipecat.frames.frames import STTUpdateSettingsFrame
from pipecat.services.google.stt import GoogleSTTService
from pipecat.transcriptions.language import Language
await task.queue_frame(
STTUpdateSettingsFrame(
delta=GoogleSTTService.Settings(
languages=[Language.FR],
model="latest_short",
enable_automatic_punctuation=False,
)
)
)
```
## Notes
* **Streaming time limit**: Google Cloud STT has a 5-minute streaming limit per connection. The service automatically handles stream reconnection at 4 minutes to provide seamless transcription without interruption.
* **Multi-language support**: Pass a list of `Language` values to `languages` for multi-language recognition. The first language is the primary language.
* **Regional endpoints**: Use the `location` parameter to route requests through regional endpoints (e.g., `"us-central1"`, `"europe-west1"`) for data residency requirements. The default `"global"` endpoint works for most use cases.
* **Stream abort on inactivity**: If no audio is sent for \~10 seconds (e.g., when audio frames are blocked by an `STTMuteFilter`), Google automatically closes the stream. The service recovers by automatically reconnecting.
* **Authentication priority**: The service checks for credentials in this order: `credentials` (JSON string), `credentials_path` (file), then Application Default Credentials.
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Event Handlers
Google STT supports the standard [service connection events](/api-reference/server/events/service-events):
| Event | Description |
| ----------------- | --------------------------------------------- |
| `on_connected` | Connected to Google Cloud Speech-to-Text |
| `on_disconnected` | Disconnected from Google Cloud Speech-to-Text |
```python theme={null}
@stt.event_handler("on_connected")
async def on_connected(service):
print("Connected to Google STT")
```
# Gradium
Source: https://docs.pipecat.ai/api-reference/server/services/stt/gradium
Speech-to-text service using Gradium's real-time streaming API
## Overview
`GradiumSTTService` provides real-time speech recognition using Gradium's WebSocket API with support for multilingual transcription, semantic voice activity detection for smart turn-taking, and robust performance in noisy environments.
Pipecat's API methods for Gradium STT integration
Complete example with interruption handling
Official Gradium STT API documentation
Access API keys and speech models
## Installation
To use Gradium services, install the required dependency:
```bash theme={null}
pip install "pipecat-ai[gradium]"
```
## Prerequisites
### Gradium Account Setup
Before using Gradium STT services, you need:
1. **Gradium Account**: Sign up at [Gradium](https://gradium.ai/)
2. **API Key**: Generate an API key from your account dashboard
3. **Region Selection**: Choose your preferred region (EU or US)
### Required Environment Variables
* `GRADIUM_API_KEY`: Your Gradium API key for authentication
## Configuration
### GradiumSTTService
Gradium API key for authentication.
WebSocket endpoint URL. Override for different regions or custom deployments.
Base audio encoding type. One of `"pcm"`, `"wav"`, or `"opus"`. For PCM, the
sample rate is appended automatically to form the input format (e.g., `"pcm"`
becomes `"pcm_16000"`). PCM accepts 8000, 16000, and 24000 Hz sample rates.
Audio sample rate in Hz. If `None`, uses the pipeline's audio sample rate.
Configuration parameters for language and delay settings. *Deprecated in
v0.0.105. Use `settings=GradiumSTTService.Settings(...)` instead.*
Optional JSON configuration string for additional model settings. Deprecated
in favor of `params`.
Runtime-configurable settings for the STT service. See [Settings](#settings)
below.
P99 latency from speech end to final transcript in seconds. Override for your
deployment. See [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark).
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `GradiumSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ----------------- | ----------------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `model` | `str` | `"default"` | STT model identifier. *(Inherited from base STT settings.)* |
| `language` | `Language \| str` | `None` | Expected language of the audio. *(Inherited from base STT settings.)* Helps ground the model to a specific language and improve transcription quality. |
| `delay_in_frames` | `int` | `None` | Server-side delay in audio frames (80ms each) before text is generated. Higher delays allow more context but increase latency. Allowed values: 7, 8, 10, 12, 14, 16, 20, 24, 36, 48. Default is 10 (800ms). Sent to Gradium API via json\_config. |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.gradium.stt import GradiumSTTService
stt = GradiumSTTService(
api_key=os.getenv("GRADIUM_API_KEY"),
)
```
### With Language and Delay Configuration
```python theme={null}
from pipecat.services.gradium.stt import GradiumSTTService
from pipecat.transcriptions.language import Language
stt = GradiumSTTService(
api_key=os.getenv("GRADIUM_API_KEY"),
settings=GradiumSTTService.Settings(
language=Language.EN,
delay_in_frames=8,
),
)
```
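### With a Custom Audio Format
A sketch selecting PCM input at 24 kHz (uses the `encoding` and `sample_rate` constructor parameters described above; with PCM the sample rate is appended to form `"pcm_24000"`):
```python theme={null}
from pipecat.services.gradium.stt import GradiumSTTService
stt = GradiumSTTService(
    api_key=os.getenv("GRADIUM_API_KEY"),
    encoding="pcm",
    sample_rate=24000,
)
```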
## Notes
* **Supported languages**: German, English, Spanish, French, and Portuguese.
* **Audio format**: Configurable via `encoding` and `sample_rate` parameters. Defaults to PCM with the pipeline's sample rate. Supported PCM rates: 8000, 16000, and 24000 Hz. Audio is sent in 80ms chunks.
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Event Handlers
Gradium STT supports the standard [service connection events](/api-reference/server/events/service-events):
| Event | Description |
| ----------------- | ----------------------------------- |
| `on_connected` | Connected to Gradium WebSocket |
| `on_disconnected` | Disconnected from Gradium WebSocket |
```python theme={null}
@stt.event_handler("on_connected")
async def on_connected(service):
print("Connected to Gradium")
```
# Groq (Whisper)
Source: https://docs.pipecat.ai/api-reference/server/services/stt/groq
Speech-to-text service implementation using Groq's Whisper API
## Overview
`GroqSTTService` provides high-accuracy speech recognition using Groq's hosted Whisper API with ultra-fast inference speeds. It uses Voice Activity Detection (VAD) to process speech segments efficiently for optimal performance and accuracy.
Pipecat's API methods for Groq STT integration
Complete example with Groq ecosystem integration
Official Groq STT documentation and features
Access API keys and Whisper models
## Installation
To use Groq services, install the required dependency:
```bash theme={null}
pip install "pipecat-ai[groq]"
```
## Prerequisites
### Groq Account Setup
Before using Groq STT services, you need:
1. **Groq Account**: Sign up at [Groq Console](https://console.groq.com/)
2. **API Key**: Generate an API key from your console dashboard
3. **Model Access**: Ensure access to Whisper transcription models
### Required Environment Variables
* `GROQ_API_KEY`: Your Groq API key for authentication
## Configuration
Whisper model to use for transcription. *Deprecated in v0.0.105. Use
`settings=GroqSTTService.Settings(...)` instead.*
Groq API key. If not provided, uses `GROQ_API_KEY` environment variable.
API base URL. Override for custom or proxied deployments.
Language of the audio input. *Deprecated in v0.0.105. Use
`settings=GroqSTTService.Settings(...)` instead.*
Optional text to guide the model's style or continue a previous segment.
*Deprecated in v0.0.105. Use `settings=GroqSTTService.Settings(...)` instead.*
Sampling temperature between 0 and 1. Lower values are more deterministic.
Defaults to 0.0. *Deprecated in v0.0.105. Use
`settings=GroqSTTService.Settings(...)` instead.*
Runtime-configurable settings for the STT service. See [Settings](#settings)
below.
P99 latency from speech end to final transcript in seconds. Override for your
deployment.
If true, allow empty `TranscriptionFrame` frames to be pushed downstream
instead of discarding them. This is intended for situations where VAD fires
even though the user did not speak. In these cases, it is useful to know that
nothing was transcribed so that the agent can resume speaking, instead of
waiting longer for a transcription.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `GroqSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ------------- | ----------------- | -------------------------- | ------------------------------------------------------------------------ |
| `model` | `str` | `"whisper-large-v3-turbo"` | Whisper model to use. *(Inherited from base STT settings.)* |
| `language` | `Language \| str` | `Language.EN` | Language of the audio input. *(Inherited from base STT settings.)* |
| `prompt` | `str` | `None` | Optional text to guide the model's style or continue a previous segment. |
| `temperature` | `float` | `None` | Sampling temperature between 0 and 1. |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.groq.stt import GroqSTTService
stt = GroqSTTService(
api_key=os.getenv("GROQ_API_KEY"),
)
```
### With Custom Model and Language
```python theme={null}
from pipecat.services.groq.stt import GroqSTTService
from pipecat.transcriptions.language import Language
stt = GroqSTTService(
api_key=os.getenv("GROQ_API_KEY"),
settings=GroqSTTService.Settings(
model="whisper-large-v3-turbo",
language=Language.ES,
),
)
```
### With Prompt and Temperature
```python theme={null}
from pipecat.services.groq.stt import GroqSTTService
stt = GroqSTTService(
api_key=os.getenv("GROQ_API_KEY"),
settings=GroqSTTService.Settings(
prompt="This is a conversation about artificial intelligence and machine learning.",
temperature=0.0,
),
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **Segmented processing**: `GroqSTTService` inherits from `SegmentedSTTService` (via `BaseWhisperSTTService`), which buffers audio during speech (detected by VAD) and sends complete segments for transcription. This means it does not provide interim results -- only final transcriptions after each speech segment.
* **Whisper API compatible**: Groq uses the OpenAI-compatible Whisper API format. The service sends audio in WAV format and receives JSON transcription responses.
* **Ultra-fast inference**: Groq's LPU (Language Processing Unit) infrastructure provides significantly faster inference than CPU/GPU-based Whisper deployments, making it suitable for real-time applications despite the segmented processing approach.
* **Prompt guidance**: Use the `prompt` parameter to provide context that helps the model with domain-specific terminology or to maintain consistency across segments.
* **Multilingual support**: Whisper supports 99+ languages. The default is `Language.EN` (English). Set `language=None` in settings to enable automatic language detection, which will transcribe whatever language the user speaks.
# NVIDIA Riva
Source: https://docs.pipecat.ai/api-reference/server/services/stt/nvidia
Speech-to-text service implementation using NVIDIA Riva
## Overview
NVIDIA Riva provides two STT service implementations:
* **`NvidiaSTTService`** -- Real-time streaming transcription using Parakeet models with interim results and continuous audio processing.
* **`NvidiaSegmentedSTTService`** -- Segmented transcription using Canary models with advanced language support, word boosting, and enterprise-grade accuracy.
Pipecat's API methods for NVIDIA Riva STT integration
Complete example with NVIDIA services integration
Official NVIDIA Riva ASR documentation
Access API keys and Riva services
## Installation
To use NVIDIA Riva services, install the required dependency:
```bash theme={null}
pip install "pipecat-ai[nvidia]"
```
## Prerequisites
### NVIDIA Riva Setup
Before using NVIDIA Riva STT services, you need:
1. **NVIDIA Developer Account**: Sign up at [NVIDIA Developer Portal](https://developer.nvidia.com)
2. **API Key**: Generate an NVIDIA API key for Riva services
3. **Model Selection**: Choose between Parakeet (streaming) and Canary (segmented) models
### Required Environment Variables
* `NVIDIA_API_KEY`: Your NVIDIA API key for authentication
## NvidiaSTTService
Real-time streaming transcription using NVIDIA Riva's Parakeet models.
NVIDIA API key for authentication.
NVIDIA Riva server address.
Mapping containing `function_id` and `model_name` for the ASR model.
Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
rate.
Additional configuration parameters. *Deprecated in v0.0.105. Use
`settings=NvidiaSTTService.Settings(...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
Whether to use SSL for the gRPC connection.
P99 latency from speech end to final transcript in seconds. Override for your
deployment. See [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark).
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `NvidiaSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------- | ----------------- | ---------------- | ------------------------------------------------------------------------ |
| `model` | `str` | `None` | STT model identifier. *(Inherited from base STT settings.)* |
| `language` | `Language \| str` | `Language.EN_US` | Target language for transcription. *(Inherited from base STT settings.)* |
### Usage
```python theme={null}
from pipecat.services.nvidia.stt import NvidiaSTTService
stt = NvidiaSTTService(
api_key=os.getenv("NVIDIA_API_KEY"),
)
```
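To pin a specific Riva model, pass `model_function_map` with your deployment's function ID and model name (a sketch; both values below are placeholders):
```python theme={null}
from pipecat.services.nvidia.stt import NvidiaSTTService
stt = NvidiaSTTService(
    api_key=os.getenv("NVIDIA_API_KEY"),
    model_function_map={
        "function_id": "your-riva-function-id",  # placeholder
        "model_name": "your-asr-model-name",  # placeholder
    },
)
```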
### Notes
* **Model cannot be changed after initialization**: Use the `model_function_map` parameter in the constructor to specify the model and function ID.
* **Streaming**: Provides real-time interim and final results through continuous audio streaming.
## NvidiaSegmentedSTTService
Batch/segmented transcription using NVIDIA Riva's Canary models. Processes complete audio segments after VAD detects speech boundaries.
NVIDIA API key for authentication.
NVIDIA Riva server address.
Mapping containing `function_id` and `model_name` for the ASR model.
Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
rate.
Additional configuration parameters. *Deprecated in v0.0.105. Use
`settings=NvidiaSegmentedSTTService.Settings(...)` instead.*
Runtime-configurable settings. See [Settings](#settings-2) below.
Whether to use SSL for the gRPC connection.
P99 latency from speech end to final transcript in seconds. Override for your
deployment. See [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark).
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `NvidiaSegmentedSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ----------------------- | ----------------- | ---------------- | ------------------------------------------------------------------------ |
| `model` | `str` | `None` | STT model identifier. *(Inherited from base STT settings.)* |
| `language` | `Language \| str` | `Language.EN_US` | Target language for transcription. *(Inherited from base STT settings.)* |
| `profanity_filter` | `bool` | `False` | Whether to filter profanity from results. |
| `automatic_punctuation` | `bool` | `True` | Whether to add automatic punctuation. |
| `verbatim_transcripts` | `bool` | `False` | Whether to return verbatim transcripts. |
| `boosted_lm_words` | `list[str]` | `None` | List of words to boost in the language model. |
| `boosted_lm_score` | `float` | `4.0` | Score boost for specified words. |
### Usage
```python theme={null}
from pipecat.services.nvidia.stt import NvidiaSegmentedSTTService
from pipecat.transcriptions.language import Language
stt = NvidiaSegmentedSTTService(
api_key=os.getenv("NVIDIA_API_KEY"),
settings=NvidiaSegmentedSTTService.Settings(
language=Language.ES,
automatic_punctuation=True,
boosted_lm_words=["Pipecat", "NVIDIA"],
boosted_lm_score=6.0,
),
)
```
### Notes
* **Model cannot be changed after initialization**: Use the `model_function_map` parameter in the constructor to specify the model and function ID.
* **Segmented processing**: Processes complete audio segments for higher accuracy compared to streaming.
* **Language support**: Supports Arabic, English (US/GB), French, German, Hindi, Italian, Japanese, Korean, Portuguese (BR), Russian, and Spanish (ES/US).
* **Word boosting**: Use `boosted_lm_words` and `boosted_lm_score` to improve recognition of domain-specific terms.
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
# OpenAI
Source: https://docs.pipecat.ai/api-reference/server/services/stt/openai
Speech-to-text service implementations using OpenAI's Speech-to-Text APIs
## Overview
OpenAI provides two STT service implementations:
* **`OpenAISTTService`** (HTTP) -- VAD-segmented speech recognition using OpenAI's transcription API, supporting GPT-4o transcription and Whisper models.
* **`OpenAIRealtimeSTTService`** (WebSocket) -- Real-time streaming speech-to-text using OpenAI's Realtime API transcription sessions, with support for local VAD and server-side VAD modes.
Pipecat's API methods for OpenAI STT integration
Complete example with OpenAI ecosystem integration
Official OpenAI transcription documentation and features
Access API keys and transcription models
## Installation
To use OpenAI services, install the required dependency:
```bash theme={null}
pip install "pipecat-ai[openai]"
```
## Prerequisites
### OpenAI Account Setup
Before using OpenAI STT services, you need:
1. **OpenAI Account**: Sign up at [OpenAI Platform](https://platform.openai.com/)
2. **API Key**: Generate an API key from your account dashboard
3. **Model Access**: Ensure access to GPT-4o transcription and Whisper models
### Required Environment Variables
* `OPENAI_API_KEY`: Your OpenAI API key for authentication
## OpenAISTTService
Uses VAD-based audio segmentation with HTTP transcription requests. Records speech segments detected by local VAD and sends them to OpenAI's transcription API.
Transcription model to use. Options include `"gpt-4o-transcribe"`,
`"gpt-4o-mini-transcribe"`, and `"whisper-1"`. *Deprecated in v0.0.105. Use
`settings=OpenAISTTService.Settings(...)` instead.*
OpenAI API key. Falls back to the `OPENAI_API_KEY` environment variable.
API base URL. Override for custom or proxied deployments.
Language of the audio input. *Deprecated in v0.0.105. Use
`settings=OpenAISTTService.Settings(...)` instead.*
Optional text to guide the model's style or continue a previous segment.
*Deprecated in v0.0.105. Use `settings=OpenAISTTService.Settings(...)`
instead.*
Sampling temperature between 0 and 1. Lower values produce more deterministic
results. *Deprecated in v0.0.105. Use
`settings=OpenAISTTService.Settings(...)` instead.*
Runtime-configurable settings for the STT service. See [Settings](#settings)
below.
P99 latency from speech end to final transcript in seconds. Override for your
deployment.
If true, allow empty `TranscriptionFrame` frames to be pushed downstream
instead of discarding them. This is intended for situations where VAD fires
even though the user did not speak. In these cases, it is useful to know that
nothing was transcribed so that the agent can resume speaking, instead of
waiting longer for a transcription.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `OpenAISTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ------------- | ----------------- | --------------------- | ------------------------------------------------------------------------ |
| `model` | `str` | `"gpt-4o-transcribe"` | Transcription model to use. *(Inherited from base STT settings.)* |
| `language` | `Language \| str` | `Language.EN` | Language of the audio input. *(Inherited from base STT settings.)* |
| `prompt` | `str` | `None` | Optional text to guide the model's style or continue a previous segment. |
| `temperature` | `float` | `None` | Sampling temperature between 0 and 1. |
### Usage
```python theme={null}
from pipecat.services.openai.stt import OpenAISTTService
stt = OpenAISTTService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAISTTService.Settings(
model="gpt-4o-transcribe",
),
)
```
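The `prompt` and `temperature` fields from the Settings table can be set the same way (a sketch):
```python theme={null}
from pipecat.services.openai.stt import OpenAISTTService
stt = OpenAISTTService(
    api_key=os.getenv("OPENAI_API_KEY"),
    settings=OpenAISTTService.Settings(
        model="gpt-4o-transcribe",
        prompt="The conversation covers Pipecat pipelines and STT services.",
        temperature=0.0,
    ),
)
```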
### Notes
* **Segmented transcription**: Processes complete audio segments (after VAD detects silence) via HTTP. Only produces final transcriptions, not interim results.
* Does not have WebSocket connection events since it uses per-request HTTP calls.
* **Multilingual support**: Whisper and GPT-4o transcription models support many languages. The default is `Language.EN` (English). Set `language=None` in settings to enable automatic language detection, which will transcribe whatever language the user speaks.
## OpenAIRealtimeSTTService
Real-time streaming speech-to-text using OpenAI's Realtime API WebSocket transcription sessions. Audio is streamed continuously over a WebSocket connection for lower latency compared to HTTP-based transcription.
OpenAI API key for authentication.
Transcription model. Supported values are `"gpt-4o-transcribe"` and
`"gpt-4o-mini-transcribe"`. *Deprecated in v0.0.105. Use
`settings=OpenAIRealtimeSTTService.Settings(...)` instead.*
WebSocket base URL for the Realtime API.
Language of the audio input. *Deprecated in v0.0.105. Use
`settings=OpenAIRealtimeSTTService.Settings(...)` instead.*
Optional prompt text to guide transcription style or provide keyword hints.
*Deprecated in v0.0.105. Use `settings=OpenAIRealtimeSTTService.Settings(...)`
instead.*
Runtime-configurable settings for the Realtime STT service. See
[Settings](#settings-2) below.
Server-side VAD configuration. Defaults to `False` (disabled), which relies on a local VAD processor in the pipeline. Pass `None` to use server defaults (`server_vad`), or a dict with custom settings (e.g. `{"type": "server_vad", "threshold": 0.5}`).
Noise reduction mode. `"near_field"` for close microphones, `"far_field"` for
distant microphones, or `None` to disable.
**Deprecated in v0.0.106.** Use
`settings=OpenAIRealtimeSTTService.Settings(noise_reduction=...)` instead.
Whether to interrupt bot output when speech is detected by server-side VAD.
Only applies when turn detection is enabled.
P99 latency from speech end to final transcript in seconds. Override for your
deployment.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `OpenAIRealtimeSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------- | ----------------- | --------------------- | ------------------------------------------------------------------- |
| `model` | `str` | `"gpt-4o-transcribe"` | Transcription model to use. *(Inherited from base STT settings.)* |
| `language` | `Language \| str` | `Language.EN` | Language of the audio input. *(Inherited from base STT settings.)* |
| `prompt` | `str` | `None` | Optional prompt text to guide transcription style or keyword hints. |
### Usage
#### With Local VAD
```python theme={null}
from pipecat.services.openai.stt import OpenAIRealtimeSTTService
# Local VAD mode (default) - use with a VAD processor in the pipeline
stt = OpenAIRealtimeSTTService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAIRealtimeSTTService.Settings(
model="gpt-4o-transcribe",
noise_reduction="near_field",
),
)
```
#### With Server-Side VAD
```python theme={null}
from pipecat.services.openai.stt import OpenAIRealtimeSTTService
# Server-side VAD mode - do NOT use a separate VAD processor
stt = OpenAIRealtimeSTTService(
api_key=os.getenv("OPENAI_API_KEY"),
turn_detection=None, # Enable server-side VAD
settings=OpenAIRealtimeSTTService.Settings(
model="gpt-4o-transcribe",
),
)
```
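Server-side VAD can also be tuned by passing a dict instead of `None` (a sketch using the example values from the `turn_detection` description above):
```python theme={null}
from pipecat.services.openai.stt import OpenAIRealtimeSTTService
# Custom server-side VAD settings - do NOT use a separate VAD processor
stt = OpenAIRealtimeSTTService(
    api_key=os.getenv("OPENAI_API_KEY"),
    turn_detection={"type": "server_vad", "threshold": 0.5},
)
```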
### Notes
* **Local VAD vs Server-side VAD**: Defaults to local VAD mode (`turn_detection=False`), where a local VAD processor in the pipeline controls when audio is committed for transcription. Set `turn_detection=None` for server-side VAD, but do not use a separate VAD processor in the pipeline in that mode.
* **Automatic resampling**: Automatically resamples audio to 24 kHz as required by the Realtime API, regardless of the pipeline's sample rate.
* **Interim transcriptions**: Produces interim transcriptions via delta events for real-time feedback.
* **Multilingual support**: GPT-4o transcription models support many languages. The default is `Language.EN` (English). Set `language=None` in settings to enable automatic language detection, which will transcribe whatever language the user speaks.
### Event Handlers
Supports the standard [service connection events](/api-reference/server/events/service-events):
| Event | Description |
| ----------------- | ------------------------------------------- |
| `on_connected` | Connected to OpenAI Realtime WebSocket |
| `on_disconnected` | Disconnected from OpenAI Realtime WebSocket |
```python theme={null}
@stt.event_handler("on_connected")
async def on_connected(service):
print("Connected to OpenAI Realtime STT")
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
# Sarvam
Source: https://docs.pipecat.ai/api-reference/server/services/stt/sarvam
Speech-to-text service implementation using Sarvam AI's WebSocket-based streaming API
## Overview
`SarvamSTTService` provides real-time speech recognition using Sarvam AI's WebSocket API, supporting Indian language transcription with Voice Activity Detection (VAD) and multiple audio formats for high-accuracy speech recognition.
Pipecat's API methods for Sarvam STT integration
Complete example with interruption handling
Official Sarvam AI STT documentation and features
Access API keys and speech models
## Installation
To use Sarvam services, install the required dependency:
```bash theme={null}
pip install "pipecat-ai[sarvam]"
```
## Prerequisites
### Sarvam AI Account Setup
Before using Sarvam STT services, you need:
1. **Sarvam AI Account**: Sign up at [Sarvam AI](https://dashboard.sarvam.ai/)
2. **API Key**: Generate an API key from your account dashboard
3. **Model Access**: Access to Saarika (STT) or Saaras (STT-Translate) models, including the `saaras:v3` model with support for multiple modes (transcribe, translate, verbatim, translit, codemix)
### Required Environment Variables
* `SARVAM_API_KEY`: Your Sarvam AI API key for authentication
## Configuration
### SarvamSTTService
Sarvam API key for authentication.
Sarvam model to use. Allowed values: `"saarika:v2.5"` (standard STT),
`"saaras:v2.5"` (STT-Translate, auto-detects language), `"saaras:v3"`
(advanced, supports mode and prompts). *Deprecated in v0.0.105. Use
`settings=SarvamSTTService.Settings(...)` instead.*
Audio sample rate in Hz. Defaults to 16000 if not specified.
Mode of operation. Only applicable to models that support it (e.g.,
`saaras:v3`). Defaults to the model's default mode.
Audio codec/format of the input file.
Configuration parameters for Sarvam STT service. *Deprecated in v0.0.105. Use
`settings=SarvamSTTService.Settings(...)` instead.*
Runtime-configurable settings for the STT service. See [Settings](#settings)
below.
Seconds of no audio before sending silence to keep the connection alive.
`None` disables keepalive.
P99 latency from speech end to final transcript in seconds. Override for your
deployment. See [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark).
Seconds between idle checks when keepalive is enabled.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `SarvamSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------------------- | ----------------- | ------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `model` | `str` | `None` | STT model identifier. *(Inherited from base STT settings.)* |
| `language` | `Language \| str` | `None` | Target language for transcription. *(Inherited from base STT settings.)* Behavior varies by model: `saarika:v2.5` defaults to "unknown" (auto-detect), `saaras:v2.5` ignores this (auto-detects), `saaras:v3` defaults to "en-IN". |
| `prompt` | `str` | `None` | Optional prompt to guide transcription/translation style. Only applicable to saaras models (v2.5 and v3). |
| `vad_signals` | `bool` | `None` | Enable VAD signals in responses. When enabled, the service broadcasts `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` from the server. |
| `high_vad_sensitivity` | `bool` | `None` | Enable high VAD sensitivity for more responsive speech detection. |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.sarvam.stt import SarvamSTTService
stt = SarvamSTTService(
api_key=os.getenv("SARVAM_API_KEY"),
)
```
### With Language and Model Configuration
```python theme={null}
from pipecat.services.sarvam.stt import SarvamSTTService
from pipecat.transcriptions.language import Language
stt = SarvamSTTService(
api_key=os.getenv("SARVAM_API_KEY"),
mode="transcribe",
settings=SarvamSTTService.Settings(
model="saaras:v3",
language=Language.HI_IN,
prompt="Transcribe Hindi conversation about technology.",
),
)
```
### With Server-Side VAD
```python theme={null}
from pipecat.services.sarvam.stt import SarvamSTTService
stt = SarvamSTTService(
api_key=os.getenv("SARVAM_API_KEY"),
settings=SarvamSTTService.Settings(
vad_signals=True,
high_vad_sensitivity=True,
),
)
```
## Notes
* **Supported languages**: Bengali (bn-IN), Gujarati (gu-IN), Hindi (hi-IN), Kannada (kn-IN), Malayalam (ml-IN), Marathi (mr-IN), Tamil (ta-IN), Telugu (te-IN), Punjabi (pa-IN), Odia (od-IN), English (en-IN), and Assamese (as-IN).
* **Model-specific parameter validation**: The service validates that parameters are compatible with the selected model. For example, `prompt` is not supported with `saarika:v2.5`, and `language` is not supported with `saaras:v2.5` (which auto-detects language).
* **VAD modes**: When `vad_signals=False` (default), the service relies on Pipecat's local VAD and flushes the server buffer on `VADUserStoppedSpeakingFrame`. When `vad_signals=True`, the service uses Sarvam's server-side VAD and broadcasts speaking frames from the server.
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Event Handlers
In addition to the standard [service connection events](/api-reference/server/events/service-events) (`on_connected`, `on_disconnected`, `on_connection_error`), Sarvam STT provides:
| Event | Description |
| ------------------- | ----------------------------------- |
| `on_speech_started` | Speech detected in the audio stream |
| `on_speech_stopped` | Speech stopped |
| `on_utterance_end` | End of utterance detected |
```python theme={null}
@stt.event_handler("on_speech_started")
async def on_speech_started(service):
print("User started speaking")
@stt.event_handler("on_utterance_end")
async def on_utterance_end(service):
print("Utterance ended")
```
# Soniox
Source: https://docs.pipecat.ai/api-reference/server/services/stt/soniox
Speech-to-text service implementation using Soniox's WebSocket API
## Overview
`SonioxSTTService` provides real-time speech-to-text transcription using Soniox's WebSocket API with support for over 60 languages, custom context, multiple languages in the same conversation, and advanced features for accurate multilingual transcription.
By default, Soniox uses the `stt-rt-v4` model with `vad_force_turn_endpoint=True`, which disables Soniox's native turn detection and relies on Pipecat's local VAD to finalize transcripts. This configuration significantly reduces the time to final segment (\~250ms median). Pipecat enables smart-turn detection by default using `LocalSmartTurnAnalyzerV3`. To use Soniox's native turn detection instead, set `vad_force_turn_endpoint=False`.
Pipecat's API methods for Soniox STT integration
Complete example with interruption handling
Official Soniox documentation and features
Access multilingual models and API keys
## Installation
To use Soniox services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[soniox]"
```
## Prerequisites
### Soniox Account Setup
Before using Soniox STT services, you need:
1. **Soniox Account**: Sign up at [Soniox Console](https://console.soniox.com/)
2. **API Key**: Generate an API key from your console dashboard
3. **Language Selection**: Choose from 60+ supported languages and models
### Required Environment Variables
* `SONIOX_API_KEY`: Your Soniox API key for authentication
## Configuration
### SonioxSTTService
Soniox API key for authentication.
Soniox WebSocket API URL.
Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
rate.
Soniox model to use for transcription. *Deprecated in v0.0.105. Use
`settings=SonioxSTTService.Settings(model=...)` instead.*
Audio format for transcription. Init-only -- not part of runtime-updatable
settings.
Number of audio channels. Init-only -- not part of runtime-updatable settings.
Additional configuration parameters. *Deprecated in v0.0.105. Use
`settings=SonioxSTTService.Settings(...)` instead.*
Runtime-configurable settings for the STT service. See [Settings](#settings)
below.
P99 latency from speech end to final transcript in seconds. Override for your
deployment. See [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark).
Listen to `VADUserStoppedSpeakingFrame` to send a finalize message to Soniox.
When enabled, Pipecat's local VAD triggers transcript finalization. When
disabled, Soniox detects the end of speech natively.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `SonioxSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| -------------------------------- | ---------------------------- | ------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `model` | `str` | `"stt-rt-v4"` | Model to use for transcription. *(Inherited from base STT settings.)* |
| `language` | `Language \| str` | `None` | Language for speech recognition. *(Inherited from base STT settings.)* |
| `language_hints` | `list[Language]` | `None` | Language hints for transcription. Helps the model prioritize expected languages. |
| `language_hints_strict` | `bool` | `None` | If true, strictly enforce language hints (only transcribe in provided languages). |
| `context` | `SonioxContextObject \| str` | `None` | Customization for transcription. String for models with context\_version 1, `SonioxContextObject` for context\_version 2 (stt-rt-v3-preview and higher). |
| `enable_speaker_diarization` | `bool` | `False` | Enable speaker diarization. Tokens are annotated with speaker IDs. |
| `enable_language_identification` | `bool` | `False` | Enable language identification. Tokens are annotated with language IDs. |
| `client_reference_id` | `str` | `None` | Client reference ID for transcription tracking. |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.soniox.stt import SonioxSTTService
stt = SonioxSTTService(
api_key=os.getenv("SONIOX_API_KEY"),
)
```
### With Language Hints and Context
```python theme={null}
from pipecat.services.soniox.stt import SonioxSTTService
from pipecat.transcriptions.language import Language
stt = SonioxSTTService(
api_key=os.getenv("SONIOX_API_KEY"),
settings=SonioxSTTService.Settings(
model="stt-rt-v4",
language_hints=[Language.EN, Language.ES],
language_hints_strict=True,
enable_language_identification=True,
),
)
```
### With Context Object (v3+ models)
```python theme={null}
from pipecat.services.soniox.stt import (
SonioxSTTService,
SonioxContextObject,
SonioxContextGeneralItem,
)
stt = SonioxSTTService(
api_key=os.getenv("SONIOX_API_KEY"),
settings=SonioxSTTService.Settings(
model="stt-rt-v4",
context=SonioxContextObject(
general=[
SonioxContextGeneralItem(key="domain", value="medical"),
],
terms=["Pipecat", "transcription"],
),
),
)
```
### With Soniox Native Turn Detection
```python theme={null}
from pipecat.services.soniox.stt import SonioxSTTService
stt = SonioxSTTService(
api_key=os.getenv("SONIOX_API_KEY"),
vad_force_turn_endpoint=False,
)
```
## Notes
* **Turn finalization**: By default (`vad_force_turn_endpoint=True`), when Pipecat's VAD detects the user has stopped speaking, a finalize message is sent to Soniox to get the final transcript immediately. This significantly reduces latency.
* **Keepalive**: The service automatically sends protocol-level keepalive messages to maintain the WebSocket connection.
* **Context versions**: Use a string for `context` with older models (context\_version 1) and `SonioxContextObject` for newer models (stt-rt-v3-preview and higher, context\_version 2). See the [Soniox context documentation](https://soniox.com/docs/stt/concepts/context) for details.
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Event Handlers
Soniox STT supports the standard [service connection events](/api-reference/server/events/service-events):
| Event | Description |
| ----------------- | ---------------------------------- |
| `on_connected` | Connected to Soniox WebSocket |
| `on_disconnected` | Disconnected from Soniox WebSocket |
```python theme={null}
@stt.event_handler("on_connected")
async def on_connected(service):
print("Connected to Soniox")
```
# Speechmatics
Source: https://docs.pipecat.ai/api-reference/server/services/stt/speechmatics
Speech-to-text service implementation using Speechmatics' real-time transcription STT API
## Overview
`SpeechmaticsSTTService` enables real-time speech transcription using Speechmatics' WebSocket API with partial and final results, speaker diarization, and end of utterance detection (VAD) for comprehensive conversation analysis.
Since Speechmatics provides its own user turn start and end detection, you
should use `ExternalUserTurnStrategies` to let Speechmatics handle turn
management. See [User Turn
Strategies](/api-reference/server/utilities/turn-management/user-turn-strategies) for
configuration details.
Pipecat's API methods for Speechmatics STT integration
Complete example with interruption handling
Official Speechmatics documentation and features
Learn about separating different speakers in audio
## Installation
To use Speechmatics services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[speechmatics]"
```
## Prerequisites
### Speechmatics Account Setup
Before using Speechmatics STT services, you need:
1. **Speechmatics Account**: Sign up at [Speechmatics](https://www.speechmatics.com/)
2. **API Key**: Generate an API key from your account dashboard
3. **Feature Selection**: Configure transcription features like speaker diarization
### Select Endpoint
Speechmatics STT supports the following endpoints (defaults to `EU2`):
| Region | Environment | STT Endpoint | Access |
| ------ | ------------- | -------------------------------- | ------------------------- |
| EU | EU1 | `wss://neu.rt.speechmatics.com/` | Self-Service / Enterprise |
| EU | EU2 (Default) | `wss://eu2.rt.speechmatics.com/` | Self-Service / Enterprise |
| US | US1 | `wss://wus.rt.speechmatics.com/` | Enterprise |
### Required Environment Variables
* `SPEECHMATICS_API_KEY`: Your Speechmatics API key for authentication
* `SPEECHMATICS_RT_URL`: Speechmatics endpoint URL (optional, defaults to EU2)
## Configuration
### SpeechmaticsSTTService
Speechmatics API key. Falls back to the `SPEECHMATICS_API_KEY` environment
variable.
Base URL for the Speechmatics API. Falls back to `SPEECHMATICS_RT_URL`
environment variable, then defaults to `wss://eu2.rt.speechmatics.com/v2`.
Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
rate.
Audio encoding format. Init-only -- not part of runtime-updatable settings.
Additional configuration parameters. *Deprecated in v0.0.105. Use
`settings=SpeechmaticsSTTService.Settings(...)` instead.*
Runtime-configurable settings for the STT service. See [Settings](#settings)
below.
Whether to interrupt bot output when Speechmatics detects user speech. Only
applies when `turn_detection_mode` is set to detect speech (ADAPTIVE or
SMART\_TURN).
P99 latency from speech end to final transcript in seconds. Override for your
deployment. See [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark).
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `SpeechmaticsSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------------------------------- | ---------------------------- | ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `model` | `str` | `None` | STT model identifier. *(Inherited from base STT settings.)* |
| `language` | `Language \| str` | `Language.EN` | Language code for transcription. *(Inherited from base STT settings.)* |
| `domain` | `str` | `None` | Domain for Speechmatics API (e.g. for bilingual transcription). |
| `turn_detection_mode` | `TurnDetectionMode` | `EXTERNAL` | Endpoint handling mode. `EXTERNAL` (default) uses Pipecat's VAD, `ADAPTIVE` uses Speechmatics' VAD, `SMART_TURN` uses Speechmatics' ML-based turn detection. |
| `speaker_active_format` | `str` | `"{text}"` | Formatter for active speaker output. Available attributes: `{speaker_id}`, `{text}`. Example: `"@{speaker_id}: {text}"`. |
| `speaker_passive_format` | `str` | `"{text}"` | Formatter for passive/background speaker output. Same attributes as active format. |
| `focus_speakers` | `list[str]` | `[]` | Speaker IDs to focus on. Only these speakers drive end of turn and conversation flow. |
| `ignore_speakers` | `list[str]` | `[]` | Speaker IDs to exclude from transcription entirely. |
| `focus_mode` | `SpeakerFocusMode` | `RETAIN` | `RETAIN` keeps words from non-focused speakers; `IGNORE` drops them. |
| `known_speakers` | `list[SpeakerIdentifier]` | `[]` | Known speaker labels and identifiers for speaker attribution. |
| `additional_vocab` | `list[AdditionalVocabEntry]` | `[]` | Additional vocabulary to boost recognition of specific words. |
| `operating_point` | `OperatingPoint` | `None` | Transcription accuracy vs. latency tradeoff. `ENHANCED` recommended for most use cases. |
| `max_delay` | `float` | `None` | Maximum delay in seconds for transcription. Lower values reduce latency but may impact accuracy. |
| `end_of_utterance_silence_trigger` | `float` | `None` | Silence duration in seconds to trigger end of utterance. Must be lower than `max_delay`. |
| `end_of_utterance_max_delay` | `float` | `None` | Maximum delay for end of utterance. Must be greater than `end_of_utterance_silence_trigger`. |
| `punctuation_overrides` | `dict` | `None` | Custom punctuation overrides for the STT engine. |
| `include_partials` | `bool` | `None` | Include partial word fragments in partial segment output. |
| `split_sentences` | `bool` | `None` | Emit finalized sentences mid-turn as they are completed. |
| `enable_diarization` | `bool` | `None` | Enable speaker diarization to attribute words to unique speakers. |
| `speaker_sensitivity` | `float` | `None` | Diarization sensitivity. Higher values help distinguish similar voices. |
| `max_speakers` | `int` | `None` | Maximum number of speakers to detect. Only use when the speaker count is known. |
| `prefer_current_speaker` | `bool` | `None` | Give extra weight to grouping nearby words as the same speaker. |
| `extra_params` | `dict` | `None` | Additional parameters passed to the STT engine. |
## End of Turn detection
The Speechmatics STT service supports Pipecat's own end of turn detection (Silero VAD and Smart Turn) without any additional configuration. When using Pipecat's features, the `turn_detection_mode` must be set to `TurnDetectionMode.EXTERNAL` (which is the default).
### Default mode
By default, Speechmatics uses signals from Pipecat's VAD / smart turn detection as input to trigger the end of turn and finalization of the current transcript segment. This provides a seamless integration where Pipecat's voice activity detection and turn detection work in conjunction with Speechmatics' real-time processing capabilities.
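A minimal sketch of this default configuration, assuming Silero VAD on the transport and standard Pipecat import paths:
```python theme={null}
import os

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.transports.base_transport import TransportParams

# Pipecat's VAD stays enabled on the transport and drives end of turn
transport_params = TransportParams(
    audio_in_enabled=True,
    audio_out_enabled=True,
    vad_analyzer=SileroVADAnalyzer(),
)

# turn_detection_mode defaults to TurnDetectionMode.EXTERNAL
stt = SpeechmaticsSTTService(api_key=os.getenv("SPEECHMATICS_API_KEY"))
```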
If you wish to use features such as focusing on or ignoring specific speakers,
you may benefit from using the `TurnDetectionMode.ADAPTIVE` or
`TurnDetectionMode.SMART_TURN` modes.
### Adaptive End of Turn detection
This mode uses the content of the speech, the pace of speaking, and other acoustic information (via VAD) to determine when the user has finished speaking. It is especially useful when focusing on a specific speaker, so that other speakers do not interrupt the agent or the conversation.
To use this mode, set `turn_detection_mode` to `TurnDetectionMode.ADAPTIVE` in your STT configuration. You must also remove any other VAD / smart turn features within Pipecat to avoid conflicts.
```python theme={null}
transport_params = TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
# vad_analyzer=... <- REMOVE (use Speechmatics' built-in VAD)
# turn_analyzer=... <- REMOVE (use Speechmatics' built-in end-of-turn detection)
)
...
stt = SpeechmaticsSTTService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
settings=SpeechmaticsSTTService.Settings(
language=Language.EN,
turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
        speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
),
)
```
### Smart Turn detection
Further to `ADAPTIVE`, Speechmatics also provides its own smart turn detection which combines VAD and the use of Smart Turn v3 from Pipecat. This can be enabled by setting the `turn_detection_mode` parameter to `TurnDetectionMode.SMART_TURN`.
```python theme={null}
transport_params = TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
# vad_analyzer=... <- REMOVE (use Speechmatics' built-in VAD)
# turn_analyzer=... <- REMOVE (use Speechmatics' built-in end-of-turn detection)
)
...
stt = SpeechmaticsSTTService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
settings=SpeechmaticsSTTService.Settings(
language=Language.EN,
turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.SMART_TURN,
        speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
),
)
```
## Speaker Diarization
Speechmatics STT supports speaker diarization, which separates different speakers in the audio. The identity of each speaker is returned in the `user_id` attribute of each `TranscriptionFrame`.
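As a minimal sketch of reading the speaker identity downstream, a hypothetical `SpeakerLogger` processor (placed after the STT service in the pipeline) might look like this:
```python theme={null}
from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class SpeakerLogger(FrameProcessor):
    """Hypothetical helper: logs which diarized speaker produced each transcript."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, TranscriptionFrame):
            print(f"{frame.user_id}: {frame.text}")
        await self.push_frame(frame, direction)
```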
If `speaker_active_format` or `speaker_passive_format` is provided, the text of each `TranscriptionFrame` is formatted to that specification. You can then update your system context to describe this format so the LLM knows which speaker said which words. The passive format is optional: when the engine has been told to focus on specific speakers, words from the other speakers are formatted using `speaker_passive_format`.
* `speaker_active_format` -> the formatter for active speakers
* `speaker_passive_format` -> the formatter for passive / background speakers
Examples:
* `<{speaker_id}>{text}</{speaker_id}>` -> `<S1>Good morning.</S1>`
* `@{speaker_id}: {text}` -> `@S1: Good morning.`
### Available attributes
| Attribute | Description | Example |
| ------------------ | ------------------------------------------- | ------------------------------- |
| `speaker_id` | The label of the speaker | `S1` |
| `text` / `content` | The transcribed text | `Good morning.` |
| `ts` | The timestamp of the transcription | `2025-09-15T19:47:29.096+00:00` |
| `start_time` | The start time of the transcription segment | `0.0` |
| `end_time` | The end time of the transcription segment | `2.5` |
| `lang` | The language of the transcription segment | `en` |
## Speaker Lock
In conjunction with speaker diarization, it is possible to decide at the start or during a conversation to focus on a specific speaker, ignore or retain words from other speakers, or implicitly ignore one or more speakers altogether.
In the example below, the following will happen:
* `S1` will be transcribed as normal and drive the end of turn and the conversation flow
* `S2` will be ignored completely
* All other speakers' words will be transcribed and emitted as tagged segments, but ONLY when a speaker in focus also speaks
This means that if `S3` says "Hello", that transcription is not emitted until `S1` speaks again.
```python theme={null}
stt = SpeechmaticsSTTService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
settings=SpeechmaticsSTTService.Settings(
language=Language.EN,
focus_speakers=["S1"],
ignore_speakers=["S2"],
focus_mode=SpeechmaticsSTTService.SpeakerFocusMode.RETAIN,
        speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
),
)
```
## Language Support
Refer to the [Speechmatics
docs](https://docs.speechmatics.com/introduction/supported-languages) for more
information on supported languages.
Speechmatics STT supports the following languages and regional variants.
Set the language with the `language` parameter when creating the STT object. The exception is bilingual English / Mandarin, which is set with the string code `cmn_en`.
| Language Code | Description | Locales |
| -------------- | ----------- | ------------------------- |
| `Language.AR` | Arabic | - |
| `Language.BA` | Bashkir | - |
| `Language.EU` | Basque | - |
| `Language.BE` | Belarusian | - |
| `Language.BG` | Bulgarian | - |
| `Language.BN` | Bengali | - |
| `Language.YUE` | Cantonese | - |
| `Language.CA` | Catalan | - |
| `Language.HR` | Croatian | - |
| `Language.CS` | Czech | - |
| `Language.DA` | Danish | - |
| `Language.NL` | Dutch | - |
| `Language.EN` | English | `en-US`, `en-GB`, `en-AU` |
| `Language.EO` | Esperanto | - |
| `Language.ET` | Estonian | - |
| `Language.FA` | Persian | - |
| `Language.FI` | Finnish | - |
| `Language.FR` | French | - |
| `Language.GL` | Galician | - |
| `Language.DE` | German | - |
| `Language.EL` | Greek | - |
| `Language.HE` | Hebrew | - |
| `Language.HI` | Hindi | - |
| `Language.HU` | Hungarian | - |
| `Language.IA` | Interlingua | - |
| `Language.IT` | Italian | - |
| `Language.ID` | Indonesian | - |
| `Language.GA` | Irish | - |
| `Language.JA` | Japanese | - |
| `Language.KO` | Korean | - |
| `Language.LV` | Latvian | - |
| `Language.LT` | Lithuanian | - |
| `Language.MS` | Malay | - |
| `Language.MT` | Maltese | - |
| `Language.CMN` | Mandarin | `cmn-Hans`, `cmn-Hant` |
| `Language.MR` | Marathi | - |
| `Language.MN` | Mongolian | - |
| `Language.NO` | Norwegian | - |
| `Language.PL` | Polish | - |
| `Language.PT` | Portuguese | - |
| `Language.RO` | Romanian | - |
| `Language.RU` | Russian | - |
| `Language.SK` | Slovak | - |
| `Language.SL` | Slovenian | - |
| `Language.ES` | Spanish | - |
| `Language.SV` | Swedish | - |
| `Language.SW` | Swahili | - |
| `Language.TA` | Tamil | - |
| `Language.TH` | Thai | - |
| `Language.TR` | Turkish | - |
| `Language.UG` | Uyghur | - |
| `Language.UK` | Ukrainian | - |
| `Language.UR` | Urdu | - |
| `Language.VI` | Vietnamese | - |
| `Language.CY` | Welsh | - |
For bilingual transcription, use the `language` and `domain` parameters as follows:
| Language Code | Description | Domain Options |
| ------------- | ------------------ | -------------- |
| `cmn_en` | English / Mandarin | - |
| `en_ms` | English / Malay | - |
| `Language.ES` | English / Spanish | `bilingual-en` |
| `en_ta` | English / Tamil | - |
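A minimal sketch of both bilingual options, using values from the tables above:
```python theme={null}
import os

from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.transcriptions.language import Language

# English / Spanish: Language.ES with the bilingual-en domain
stt_es = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    settings=SpeechmaticsSTTService.Settings(
        language=Language.ES,
        domain="bilingual-en",
    ),
)

# English / Mandarin: pass the string code directly
stt_cmn = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    settings=SpeechmaticsSTTService.Settings(language="cmn_en"),
)
```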
## Usage Examples
Examples are included in the Pipecat project:
* Using Speechmatics STT service -> [voice-speechmatics.py](https://github.com/pipecat-ai/pipecat/blob/main/examples/voice/voice-speechmatics.py)
* Using Speechmatics STT service with VAD -> [voice-speechmatics-vad.py](https://github.com/pipecat-ai/pipecat/blob/main/examples/voice/voice-speechmatics-vad.py)
* Transcribing with Speechmatics STT -> [transcription-speechmatics.py](https://github.com/pipecat-ai/pipecat/blob/main/examples/transcription/transcription-speechmatics.py)
Sample projects:
* Guess Who -> [Guess Who](https://github.com/sam-s10s/pipecat-guess-who)
* Guess Who Board Game -> [Guess Who](https://github.com/sam-s10s/pipecat-guess-who-irl)
### Basic Configuration
Initialize the `SpeechmaticsSTTService` and use it in a pipeline:
```python theme={null}
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.transcriptions.language import Language
# Configure service
stt = SpeechmaticsSTTService(
api_key="your-api-key",
settings=SpeechmaticsSTTService.Settings(
language=Language.FR,
)
)
# Use in pipeline
pipeline = Pipeline([
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant()
])
```
### With Diarization
This enables diarization and only triggers the LLM when words are spoken by the first speaker (`S1`). Words from other speakers are transcribed but only sent when the first speaker speaks. When using the `TurnDetectionMode.ADAPTIVE` or `TurnDetectionMode.SMART_TURN` options, speaker diarization is used to determine when a speaker is speaking. You will need to disable the VAD options on the selected transport to ensure this works correctly (see [voice-speechmatics-vad.py](https://github.com/pipecat-ai/pipecat/blob/main/examples/voice/voice-speechmatics-vad.py) for an example).
Initialize the `SpeechmaticsSTTService` and use it in a pipeline:
```python theme={null}
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.transcriptions.language import Language
# Configure service
stt = SpeechmaticsSTTService(
api_key="your-api-key",
settings=SpeechmaticsSTTService.Settings(
language=Language.EN,
turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
focus_speakers=["S1"],
        speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
        speaker_passive_format="<{speaker_id}>{text}</{speaker_id}>",
)
)
# Use in pipeline
pipeline = Pipeline([
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant()
])
```
## Additional Notes
* **Connection Management**: Automatically handles WebSocket connections and reconnections
* **Sample Rate**: The default sample rate is `16000` Hz with `pcm_s16le` format
* **VAD Integration**: Optionally supports Speechmatics' built-in VAD and end of utterance detection
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Event Handlers
In addition to the standard [service connection events](/api-reference/server/events/service-events) (`on_connected`, `on_disconnected`, `on_connection_error`), Speechmatics provides:
| Event | Description |
| -------------------- | -------------------------------------- |
| `on_speakers_result` | Speaker identification result received |
```python theme={null}
@stt.event_handler("on_speakers_result")
async def on_speakers_result(service, message):
print(f"Speaker result: {message}")
```
# Whisper
Source: https://docs.pipecat.ai/api-reference/server/services/stt/whisper
Speech-to-text service implementation using locally-downloaded Whisper models
## Overview
`WhisperSTTService` provides offline speech recognition using OpenAI's Whisper models running locally. Supports multiple model sizes and hardware acceleration options including CPU, CUDA, and Apple Silicon (MLX) for privacy-focused transcription without external API calls.
Pipecat's API methods for Whisper STT integration
Complete example with standard Whisper
OpenAI's Whisper research paper and model details
Apple Silicon optimized example
## Installation
Choose your installation based on your hardware:
### Standard Whisper (CPU/CUDA)
```bash theme={null}
pip install "pipecat-ai[whisper]"
```
### MLX Whisper (Apple Silicon)
```bash theme={null}
pip install "pipecat-ai[mlx-whisper]"
```
## Prerequisites
### Local Model Setup
Before using Whisper STT services, you need:
1. **Model Selection**: Choose appropriate Whisper model size (tiny, base, small, medium, large)
2. **Hardware Configuration**: Set up CPU, CUDA, or Apple Silicon acceleration
3. **Storage Space**: Ensure sufficient disk space for model downloads
### Configuration Options
* **Model Size**: Balance between accuracy and performance based on your hardware
* **Hardware Acceleration**: Configure CUDA for NVIDIA GPUs or MLX for Apple Silicon
* **Language Support**: Whisper supports 99+ languages out of the box
No API keys required - Whisper runs entirely locally for complete privacy.
## Configuration
### WhisperSTTService
Uses Faster Whisper for efficient local transcription on CPU or CUDA devices.
Whisper model to use. Can be a `Model` enum value or a string. Available
models: `TINY`, `BASE`, `SMALL`, `MEDIUM`, `LARGE` (large-v3),
`LARGE_V3_TURBO`, `DISTIL_LARGE_V2`, `DISTIL_MEDIUM_EN` (English-only).
*Deprecated in v0.0.105. Use `settings=WhisperSTTService.Settings(...)`
instead.*
Device for inference. Options: `"cpu"`, `"cuda"`, or `"auto"` (auto-detect).
Compute type for inference. Options include `"default"`, `"int8"`,
`"int8_float16"`, `"float16"`, etc.
Probability threshold for filtering out non-speech segments. Segments with a
no-speech probability above this value are excluded. *Deprecated in v0.0.105.
Use `settings=WhisperSTTService.Settings(...)` instead.*
Default language for transcription. *Deprecated in v0.0.105. Use
`settings=WhisperSTTService.Settings(...)` instead.*
Runtime-configurable settings for the STT service. See [WhisperSTTService
Settings](#whispersttservice-settings) below.
### WhisperSTTServiceMLX
Optimized for Apple Silicon using MLX Whisper. Models are loaded on demand.
MLX Whisper model to use. Can be an `MLXModel` enum value or a string.
Available models: `TINY`, `MEDIUM`, `LARGE_V3`, `LARGE_V3_TURBO`,
`DISTIL_LARGE_V3`, `LARGE_V3_TURBO_Q4` (quantized). *Deprecated in v0.0.105.
Use `settings=WhisperSTTServiceMLX.Settings(...)` instead.*
Probability threshold for filtering out non-speech segments. *Deprecated in
v0.0.105. Use `settings=WhisperSTTServiceMLX.Settings(...)` instead.*
Default language for transcription. *Deprecated in v0.0.105. Use
`settings=WhisperSTTServiceMLX.Settings(...)` instead.*
Sampling temperature. Lower values produce more deterministic results.
*Deprecated in v0.0.105. Use `settings=WhisperSTTServiceMLX.Settings(...)`
instead.*
Runtime-configurable settings for the MLX STT service. See
[WhisperSTTServiceMLX Settings](#whispersttservicemlx-settings) below.
### WhisperSTTService Settings
Runtime-configurable settings passed via the `settings` constructor argument using `WhisperSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------------- | ----------------- | ------------------------ | ------------------------------------------------------------------------- |
| `model` | `str` | `Model.DISTIL_MEDIUM_EN` | Whisper model to use. *(Inherited from base STT settings.)* |
| `language` | `Language \| str` | `Language.EN` | Default language for transcription. *(Inherited from base STT settings.)* |
| `no_speech_prob` | `float` | `0.4` | Probability threshold for filtering out non-speech segments. |
### WhisperSTTServiceMLX Settings
Runtime-configurable settings passed via the `settings` constructor argument using `WhisperSTTServiceMLX.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------------- | ----------------- | --------------- | ------------------------------------------------------------------------- |
| `model` | `str` | `MLXModel.TINY` | MLX Whisper model to use. *(Inherited from base STT settings.)* |
| `language` | `Language \| str` | `Language.EN` | Default language for transcription. *(Inherited from base STT settings.)* |
| `no_speech_prob` | `float` | `0.6` | Probability threshold for filtering out non-speech segments. |
| `temperature` | `float` | `0.0` | Sampling temperature. Lower values are more deterministic. |
| `engine` | `str` | `"mlx"` | Whisper engine identifier. |
## Usage
### Basic Faster Whisper Setup
```python theme={null}
from pipecat.services.whisper.stt import WhisperSTTService
stt = WhisperSTTService(
settings=WhisperSTTService.Settings(
model="base",
),
)
```
### With CUDA Acceleration
```python theme={null}
from pipecat.services.whisper.stt import WhisperSTTService, Model
stt = WhisperSTTService(
device="cuda",
compute_type="float16",
settings=WhisperSTTService.Settings(
model=Model.LARGE,
),
)
```
### With Custom Language
```python theme={null}
from pipecat.services.whisper.stt import WhisperSTTService, Model
from pipecat.transcriptions.language import Language
stt = WhisperSTTService(
settings=WhisperSTTService.Settings(
model=Model.MEDIUM,
language=Language.FR,
no_speech_prob=0.5,
),
)
```
### MLX Whisper on Apple Silicon
```python theme={null}
from pipecat.services.whisper.stt import WhisperSTTServiceMLX, MLXModel
from pipecat.transcriptions.language import Language
stt = WhisperSTTServiceMLX(
settings=WhisperSTTServiceMLX.Settings(
model=MLXModel.LARGE_V3_TURBO,
language=Language.EN,
temperature=0.0,
),
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **First run downloads**: If the selected model hasn't been downloaded previously, the first run will download it from the Hugging Face model hub. This may take significant time depending on model size.
* **Segmented transcription**: Both `WhisperSTTService` and `WhisperSTTServiceMLX` extend `SegmentedSTTService`, meaning they process complete audio segments after VAD detects the user has stopped speaking (see the sketch after this list).
* **No-speech filtering**: The `no_speech_prob` threshold helps filter out hallucinations. Increase it to be more permissive, decrease it to filter more aggressively.
* **MLX quantization**: The `LARGE_V3_TURBO_Q4` model provides reduced memory usage with minimal quality loss on Apple Silicon.
* **Language support**: Whisper supports 99+ languages. Use the `Language` enum for type-safe language selection. The default is `Language.EN` (English). Set `language=None` in settings to enable automatic language detection, which will transcribe whatever language the user speaks.
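A minimal sketch of the segmented setup, assuming Silero VAD on the transport so complete segments are handed to Whisper when the user stops speaking:
```python theme={null}
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.services.whisper.stt import WhisperSTTService
from pipecat.transports.base_transport import TransportParams

# Transport-level VAD decides when a complete audio segment is finalized
transport_params = TransportParams(
    audio_in_enabled=True,
    audio_out_enabled=True,
    vad_analyzer=SileroVADAnalyzer(),
)

stt = WhisperSTTService(
    settings=WhisperSTTService.Settings(model="base"),
)
```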
# Supported Services
Source: https://docs.pipecat.ai/api-reference/server/services/supported-services
AI services integrated with Pipecat and their setup requirements
## Transports
Transports exchange audio and video streams between the user and bot.
| Service | Setup |
| --------------------------------------------------------------------------------------- | -------------------------------------- |
| [DailyTransport](/api-reference/server/services/transport/daily) | `pip install "pipecat-ai[daily]"` |
| [FastAPIWebSocketTransport](/api-reference/server/services/transport/fastapi-websocket) | `pip install "pipecat-ai[websocket]"` |
| [HeyGenTransport](/api-reference/server/services/transport/heygen) | `pip install "pipecat-ai[heygen]"` |
| [LemonSliceTransport](/api-reference/server/services/transport/lemonslice) | `pip install "pipecat-ai[lemonslice]"` |
| [LiveKitTransport](/api-reference/server/services/transport/livekit) | `pip install "pipecat-ai[livekit]"` |
| [SmallWebRTCTransport](/api-reference/server/services/transport/small-webrtc) | `pip install "pipecat-ai[webrtc]"` |
| [TavusTransport](/api-reference/server/services/transport/tavus) | `pip install "pipecat-ai[tavus]"` |
| [WebSocket Transports](/api-reference/server/services/transport/websocket-server) | `pip install "pipecat-ai[websocket]"` |
| [WhatsAppTransport](/api-reference/server/services/transport/whatsapp) | `pip install "pipecat-ai[webrtc]"` |
## Serializers
Serializers convert between frames and media streams, enabling real-time communication over a websocket.
| Service | Setup |
| ------------------------------------------------------------- | ------------------------ |
| [Exotel](/api-reference/server/services/serializers/exotel) | No dependencies required |
| [Genesys](/api-reference/server/services/serializers/genesys) | No dependencies required |
| [Plivo](/api-reference/server/services/serializers/plivo) | No dependencies required |
| [Telnyx](/api-reference/server/services/serializers/telnyx) | No dependencies required |
| [Twilio](/api-reference/server/services/serializers/twilio) | No dependencies required |
| [Vonage](/api-reference/server/services/serializers/vonage) | No dependencies required |
## Speech-to-Text
Speech-to-Text services receive audio input and output transcriptions.
| Service | Setup |
| --------------------------------------------------------------- | ---------------------------------------- |
| [AssemblyAI](/api-reference/server/services/stt/assemblyai) | `pip install "pipecat-ai[assemblyai]"` |
| [AWS Transcribe](/api-reference/server/services/stt/aws) | `pip install "pipecat-ai[aws]"` |
| [Azure](/api-reference/server/services/stt/azure) | `pip install "pipecat-ai[azure]"` |
| [Cartesia](/api-reference/server/services/stt/cartesia) | `pip install "pipecat-ai[cartesia]"` |
| [Deepgram](/api-reference/server/services/stt/deepgram) | `pip install "pipecat-ai[deepgram]"` |
| [ElevenLabs](/api-reference/server/services/stt/elevenlabs) | `pip install "pipecat-ai[elevenlabs]"` |
| [Fal Wizper](/api-reference/server/services/stt/fal) | `pip install "pipecat-ai[fal]"` |
| [Gladia](/api-reference/server/services/stt/gladia) | `pip install "pipecat-ai[gladia]"` |
| [Google](/api-reference/server/services/stt/google) | `pip install "pipecat-ai[google]"` |
| [Gradium](/api-reference/server/services/stt/gradium) | `pip install "pipecat-ai[gradium]"` |
| [Groq (Whisper)](/api-reference/server/services/stt/groq) | `pip install "pipecat-ai[groq]"` |
| [NVIDIA](/api-reference/server/services/stt/nvidia) | `pip install "pipecat-ai[nvidia]"` |
| [OpenAI](/api-reference/server/services/stt/openai) | `pip install "pipecat-ai[openai]"` |
| [Sarvam](/api-reference/server/services/stt/sarvam) | `pip install "pipecat-ai[sarvam]"` |
| [Soniox](/api-reference/server/services/stt/soniox) | `pip install "pipecat-ai[soniox]"` |
| [Speechmatics](/api-reference/server/services/stt/speechmatics) | `pip install "pipecat-ai[speechmatics]"` |
| [Whisper](/api-reference/server/services/stt/whisper) | `pip install "pipecat-ai[whisper]"` |
## Large Language Models
LLMs receive text or audio input and output a streaming text response.
| Service | Setup |
| ----------------------------------------------------------------------- | -------------------------------------- |
| [Anthropic](/api-reference/server/services/llm/anthropic) | `pip install "pipecat-ai[anthropic]"` |
| [AWS Bedrock](/api-reference/server/services/llm/aws) | `pip install "pipecat-ai[aws]"` |
| [Azure](/api-reference/server/services/llm/azure) | `pip install "pipecat-ai[azure]"` |
| [Cerebras](/api-reference/server/services/llm/cerebras) | `pip install "pipecat-ai[cerebras]"` |
| [DeepSeek](/api-reference/server/services/llm/deepseek) | `pip install "pipecat-ai[deepseek]"` |
| [Fireworks AI](/api-reference/server/services/llm/fireworks) | `pip install "pipecat-ai[fireworks]"` |
| [Google Gemini](/api-reference/server/services/llm/google) | `pip install "pipecat-ai[google]"` |
| [Google Vertex AI](/api-reference/server/services/llm/google-vertex) | `pip install "pipecat-ai[google]"` |
| [Grok](/api-reference/server/services/llm/grok) | `pip install "pipecat-ai[grok]"` |
| [Groq](/api-reference/server/services/llm/groq) | `pip install "pipecat-ai[groq]"` |
| [Mistral](/api-reference/server/services/llm/mistral) | `pip install "pipecat-ai[mistral]"` |
| [NVIDIA](/api-reference/server/services/llm/nvidia) | `pip install "pipecat-ai[nvidia]"` |
| [Novita AI](/api-reference/server/services/llm/novita) | `pip install "pipecat-ai[novita]"` |
| [Ollama](/api-reference/server/services/llm/ollama) | `pip install "pipecat-ai[ollama]"` |
| [OpenAI](/api-reference/server/services/llm/openai) | `pip install "pipecat-ai[openai]"` |
| [OpenAI Responses](/api-reference/server/services/llm/openai-responses) | `pip install "pipecat-ai[openai]"` |
| [OpenPipe](/api-reference/server/services/llm/openpipe) | `pip install "pipecat-ai[openpipe]"` |
| [OpenRouter](/api-reference/server/services/llm/openrouter) | `pip install "pipecat-ai[openrouter]"` |
| [Perplexity](/api-reference/server/services/llm/perplexity) | `pip install "pipecat-ai[perplexity]"` |
| [Qwen](/api-reference/server/services/llm/qwen) | `pip install "pipecat-ai[qwen]"` |
| [SambaNova](/api-reference/server/services/llm/sambanova) | `pip install "pipecat-ai[sambanova]"` |
| [Sarvam](/api-reference/server/services/llm/sarvam) | `pip install "pipecat-ai[sarvam]"` |
| [Together AI](/api-reference/server/services/llm/together) | `pip install "pipecat-ai[together]"` |
## Text-to-Speech
Text-to-Speech services receive text input and output audio streams or chunks.
| Service | Setup |
| --------------------------------------------------------------- | ---------------------------------------- |
| [Async](/api-reference/server/services/tts/asyncai) | `pip install "pipecat-ai[asyncai]"` |
| [AWS Polly](/api-reference/server/services/tts/aws) | `pip install "pipecat-ai[aws]"` |
| [Azure](/api-reference/server/services/tts/azure) | `pip install "pipecat-ai[azure]"` |
| [Camb AI](/api-reference/server/services/tts/camb) | `pip install "pipecat-ai[camb]"` |
| [Cartesia](/api-reference/server/services/tts/cartesia) | `pip install "pipecat-ai[cartesia]"` |
| [Deepgram](/api-reference/server/services/tts/deepgram) | `pip install "pipecat-ai[deepgram]"` |
| [ElevenLabs](/api-reference/server/services/tts/elevenlabs) | `pip install "pipecat-ai[elevenlabs]"` |
| [Fish](/api-reference/server/services/tts/fish) | `pip install "pipecat-ai[fish]"` |
| [Google](/api-reference/server/services/tts/google) | `pip install "pipecat-ai[google]"` |
| [Gradium](/api-reference/server/services/tts/gradium) | `pip install "pipecat-ai[gradium]"` |
| [Groq](/api-reference/server/services/tts/groq) | `pip install "pipecat-ai[groq]"` |
| [Hume](/api-reference/server/services/tts/hume) | `pip install "pipecat-ai[hume]"` |
| [Inworld](/api-reference/server/services/tts/inworld) | No dependencies required |
| [Kokoro](/api-reference/server/services/tts/kokoro) | `pip install "pipecat-ai[kokoro]"` |
| [LMNT](/api-reference/server/services/tts/lmnt) | `pip install "pipecat-ai[lmnt]"` |
| [MiniMax](/api-reference/server/services/tts/minimax) | No dependencies required |
| [Neuphonic](/api-reference/server/services/tts/neuphonic) | `pip install "pipecat-ai[neuphonic]"` |
| [NVIDIA](/api-reference/server/services/tts/nvidia) | `pip install "pipecat-ai[nvidia]"` |
| [OpenAI](/api-reference/server/services/tts/openai) | `pip install "pipecat-ai[openai]"` |
| [Piper](/api-reference/server/services/tts/piper) | No dependencies required |
| [ResembleAI](/api-reference/server/services/tts/resembleai) | `pip install "pipecat-ai[resembleai]"` |
| [Rime](/api-reference/server/services/tts/rime) | `pip install "pipecat-ai[rime]"` |
| [Sarvam](/api-reference/server/services/tts/sarvam) | No dependencies required |
| [Smallest AI](/api-reference/server/services/tts/smallest) | `pip install "pipecat-ai[smallest]"` |
| [Speechmatics](/api-reference/server/services/tts/speechmatics) | `pip install "pipecat-ai[speechmatics]"` |
| [xAI](/api-reference/server/services/tts/xai) | `pip install "pipecat-ai[xai]"` |
| [XTTS](/api-reference/server/services/tts/xtts) | `pip install "pipecat-ai[xtts]"` |
## Speech-to-Speech
Speech-to-Speech services are multi-modal LLM services that take in audio, video, or text and output audio or text.
| Service | Setup |
| ------------------------------------------------------------------------------ | ------------------------------------------ |
| [AWS Nova Sonic](/api-reference/server/services/s2s/aws) | `pip install "pipecat-ai[aws-nova-sonic]"` |
| [Gemini Live](/api-reference/server/services/s2s/gemini-live) | `pip install "pipecat-ai[google]"` |
| [Gemini Live Vertex AI](/api-reference/server/services/s2s/gemini-live-vertex) | `pip install "pipecat-ai[google]"` |
| [Grok Voice Agent](/api-reference/server/services/s2s/grok) | `pip install "pipecat-ai[grok]"` |
| [OpenAI Realtime](/api-reference/server/services/s2s/openai) | `pip install "pipecat-ai[openai]"` |
| [Ultravox](/api-reference/server/services/s2s/ultravox) | `pip install "pipecat-ai[ultravox]"` |
## Image Generation
Image generation services receive text inputs and output images.
| Service | Setup |
| ---------------------------------------------------------------- | ---------------------------------- |
| [Azure](/api-reference/server/services/image-generation/azure) | `pip install "pipecat-ai[azure]"` |
| [fal](/api-reference/server/services/image-generation/fal) | `pip install "pipecat-ai[fal]"` |
| [Google](/api-reference/server/services/image-generation/google) | `pip install "pipecat-ai[google]"` |
| [OpenAI](/api-reference/server/services/image-generation/openai) | `pip install "pipecat-ai[openai]"` |
## Video
Video services enable you to build an avatar where audio and video are synchronized.
| Service | Setup |
| ----------------------------------------------------- | ---------------------------------- |
| [HeyGen](/api-reference/server/services/video/heygen) | `pip install "pipecat-ai[heygen]"` |
| [Simli](/api-reference/server/services/video/simli) | `pip install "pipecat-ai[simli]"` |
| [Tavus](/api-reference/server/services/video/tavus) | `pip install "pipecat-ai[tavus]"` |
## Memory
Memory services can be used to store and retrieve conversations.
| Service | Setup |
| -------------------------------------------------- | -------------------------------- |
| [mem0](/api-reference/server/services/memory/mem0) | `pip install "pipecat-ai[mem0]"` |
## Vision
Vision services receive a streaming video input and output text describing the video input.
| Service | Setup |
| ------------------------------------------------------------ | ------------------------------------- |
| [Moondream](/api-reference/server/services/vision/moondream) | `pip install "pipecat-ai[moondream]"` |
## Analytics & Monitoring
Analytics services help you better understand how your service operates.
| Service | Setup |
| --------------------------------------------------------- | ---------------------------------- |
| [Sentry](/api-reference/server/services/analytics/sentry) | `pip install "pipecat-ai[sentry]"` |
# DailyTransport
Source: https://docs.pipecat.ai/api-reference/server/services/transport/daily
WebRTC transport implementation using Daily for real-time audio/video communication
## Overview
`DailyTransport` provides real-time audio and video communication using Daily's hosted WebRTC platform. It handles bidirectional media streams, participant management, transcription, recording, and telephony features without requiring your own WebRTC infrastructure. Daily manages all the complexity of WebRTC connections, NAT traversal, and global media routing.
Transport methods and configuration options
Complete Daily transport voice agent example
Official Daily REST API and platform documentation
Sign up for Daily API access and room management
## Installation
To use `DailyTransport`, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[daily]"
```
## Prerequisites
### Daily Account Setup
Before using DailyTransport, you need:
1. **Daily Account**: Sign up at [Daily Dashboard](https://dashboard.daily.co/u/signup?pipecat=true)
2. **API Key**: Generate a Daily API key from your dashboard
3. **Room Creation**: Daily rooms must be created before connecting (see Usage section)
### Required Environment Variables
* `DAILY_API_KEY`: Your Daily API key for room creation and management
## Key Features
* **Hosted WebRTC**: No infrastructure setup required - Daily handles all WebRTC complexity
* **Multi-participant Support**: Handle multiple participants with individual audio/video tracks
* **Multi-track Audio/Video**: Publish multiple custom audio and video tracks simultaneously with per-track configuration
* **Built-in Transcription**: Real-time speech-to-text with Deepgram integration
* **Telephony Integration**: Dial-in/dial-out support for phone numbers via [SIP](/pipecat/telephony/twilio-daily-sip)/[PSTN](/pipecat/telephony/daily-pstn)
* **Recording & Streaming**: Built-in call recording and live streaming capabilities
* **Global Infrastructure**: Daily's edge network ensures low latency worldwide
* **Advanced Controls**: Participant management, permissions, and media routing
## Configuration
### DailyTransport
URL of the Daily room to connect to.
Authentication token for the room. Required for private rooms or when specific
permissions are needed.
Display name for the bot in the call.
Transport configuration parameters. See [DailyParams](#dailyparams) below and
[TransportParams](/api-reference/server/services/transport/transport-params) for inherited
base parameters.
Optional name for the input transport processor.
Optional name for the output transport processor.
### DailyParams
Inherits all parameters from [TransportParams](/api-reference/server/services/transport/transport-params) (audio, video, VAD settings) with these additional fields:
Daily API base URL.
Daily API authentication key.
Receive users' audio in separate tracks rather than a mixed stream.
Settings for dial-in functionality. See
[DailyDialinSettings](#dailydialinsettings) below.
Whether to enable the main camera output track.
Per-destination configuration for custom audio tracks. See
[DailyCustomAudioTrackParams](#dailycustomaudiotrackparams) below.
Per-destination configuration for custom video tracks. See
[DailyCustomVideoTrackParams](#dailycustomvideotrackparams) below.
Whether to enable the main microphone track.
Whether to enable Daily's built-in speech transcription (powered by Deepgram).
Configuration for the transcription service. See
[DailyTranscriptionSettings](#dailytranscriptionsettings) below.
### DailyDialinSettings
Settings for Daily's dial-in (SIP) functionality.
Call ID (UUID) representing the session ID in the SIP network.
Call domain (UUID) representing your Daily domain on the SIP network.
### DailyTranscriptionSettings
Configuration for Daily's built-in transcription service (Deepgram).
ISO language code for transcription.
Deepgram transcription model to use.
Whether to filter profanity from transcripts.
Whether to redact sensitive information.
Whether to use endpointing to determine speech segments.
Whether to add punctuation to transcripts.
Whether to include raw response data from Deepgram.
Additional parameters passed to the Deepgram transcription service.
### DailyCustomAudioTrackParams
Configuration for a custom audio track. If `send_settings` is not provided, the track will use the default audio publishing settings (bitrate, channel config, etc.).
Audio sample rate in Hz. Defaults to transport's output sample rate.
Number of audio channels.
Optional Daily sendSettings dict for this track. See [Daily
AudioPublishingSettings](https://reference-python.daily.co/types.html#audiopublishingsettings).
### DailyCustomVideoTrackParams
Configuration for a custom video track. If `send_settings` is not provided, the track will use the default video publishing settings (framerate, bitrate, codec, etc.).
Video width in pixels.
Video height in pixels.
Video color format (e.g., "RGB", "RGBA", "BGRA").
Optional Daily sendSettings dict for this track. See [Daily
VideoPublishingSettings](https://reference-python.daily.co/types.html#videopublishingsettings).
## Usage
DailyTransport connects your Pipecat bot to Daily rooms where it can communicate with participants through audio, video, and data channels. Rooms must be created using the Daily API before your bot can join.
The transport integrates with Pipecat's pipeline to process participant audio through your STT, LLM, and TTS services, then send responses back to participants.
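A minimal construction sketch (the import path may vary by Pipecat version; `DAILY_ROOM_URL` and `DAILY_TOKEN` are assumed environment variables pointing at a room you created via the Daily API):
```python theme={null}
import os

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.transports.services.daily import DailyParams, DailyTransport

transport = DailyTransport(
    os.getenv("DAILY_ROOM_URL"),
    os.getenv("DAILY_TOKEN"),
    "Pipecat Bot",
    DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        vad_analyzer=SileroVADAnalyzer(),
    ),
)
```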
See the [complete example](https://github.com/pipecat-ai/pipecat/blob/main/examples/transports/transports-daily.py) for a full implementation including:
* Daily room creation and token management
* Transport configuration with transcription and VAD
* Pipeline integration with participant event handling
* Advanced features like recording and dial-out
## Event Handlers
DailyTransport provides event handlers for room lifecycle, participant management, messaging, telephony, and recording. Register handlers using the `@event_handler` decorator on the transport instance.
### Events Summary
| Event | Description |
| ----------------------------- | --------------------------------- |
| `on_joined` | Bot joined the room |
| `on_connected` | Bot connected to the room |
| `on_left` | Bot left the room |
| `on_before_leave` | About to leave the room (sync) |
| `on_error` | Transport error occurred |
| `on_call_state_updated` | Call state changed |
| `on_first_participant_joined` | First participant joined the room |
| `on_participant_joined` | A participant joined |
| `on_participant_left` | A participant left |
| `on_participant_updated` | A participant's state updated |
| `on_client_connected` | A participant connected |
| `on_client_disconnected` | A participant disconnected |
| `on_active_speaker_changed` | Active speaker changed |
| `on_app_message` | App message received |
| `on_transcription_message` | Transcription message received |
| `on_recording_started` | Recording started |
| `on_recording_stopped` | Recording stopped |
| `on_recording_error` | Recording error occurred |
| `on_dialin_connected` | Dial-in call connected |
| `on_dialin_ready` | Dial-in SIP endpoint ready |
| `on_dialin_stopped` | Dial-in call stopped |
| `on_dialin_error` | Dial-in error |
| `on_dialin_warning` | Dial-in warning |
| `on_dialout_answered` | Dial-out call answered |
| `on_dialout_connected` | Dial-out call connected |
| `on_dialout_stopped` | Dial-out call stopped |
| `on_dialout_error` | Dial-out error |
| `on_dialout_warning` | Dial-out warning |
| `on_dtmf_event` | DTMF keypad tone received |
### Room Lifecycle
#### on\_joined
Fired when the bot successfully joins the Daily room.
```python theme={null}
@transport.event_handler("on_joined")
async def on_joined(transport, data):
print(f"Bot joined the room: {data}")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | ---------------- | -------------------------- |
| `transport` | `DailyTransport` | The transport instance |
| `data` | `dict` | Join event data from Daily |
#### on\_connected
Fired when the bot connects to the Daily room. This is an alias for `on_joined` that provides a consistent event name across all transport types.
```python theme={null}
@transport.event_handler("on_connected")
async def on_connected(transport, data):
print(f"Bot connected to the room: {data}")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | ---------------- | -------------------------- |
| `transport` | `DailyTransport` | The transport instance |
| `data` | `dict` | Join event data from Daily |
#### on\_left
Fired when the bot leaves the Daily room.
```python theme={null}
@transport.event_handler("on_left")
async def on_left(transport):
print("Bot left the room")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | ---------------- | ---------------------- |
| `transport` | `DailyTransport` | The transport instance |
#### on\_before\_leave
Fired synchronously just before the bot leaves the room. Use this for cleanup that must happen before disconnection, such as stopping transcription.
```python theme={null}
@transport.event_handler("on_before_leave")
async def on_before_leave(transport):
print("About to leave the room...")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | ---------------- | ---------------------- |
| `transport` | `DailyTransport` | The transport instance |
This is a **synchronous** event — the bot will not leave the room until all
handlers complete. Keep handlers fast.
#### on\_error
Fired when a transport-level error occurs.
```python theme={null}
@transport.event_handler("on_error")
async def on_error(transport, error):
print(f"Transport error: {error}")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | ---------------- | ---------------------- |
| `transport` | `DailyTransport` | The transport instance |
| `error` | `str` | Error message |
#### on\_call\_state\_updated
Fired when the call state changes (e.g., joining, joined, leaving, left).
```python theme={null}
@transport.event_handler("on_call_state_updated")
async def on_call_state_updated(transport, state):
print(f"Call state: {state}")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | ---------------- | ---------------------- |
| `transport` | `DailyTransport` | The transport instance |
| `state` | `str` | The new call state |
### Participants
#### on\_first\_participant\_joined
Fired when the first participant (other than the bot) joins the room. This is commonly used to start the conversation.
```python theme={null}
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await task.queue_frame(TTSSpeakFrame("Hello! How can I help you today?"))
```
**Parameters:**
| Parameter | Type | Description |
| ------------- | ---------------- | --------------------------- |
| `transport` | `DailyTransport` | The transport instance |
| `participant` | `dict` | Participant data from Daily |
#### on\_participant\_joined
Fired when any participant joins the room.
```python theme={null}
@transport.event_handler("on_participant_joined")
async def on_participant_joined(transport, participant):
print(f"Participant joined: {participant['id']}")
```
**Parameters:**
| Parameter | Type | Description |
| ------------- | ---------------- | --------------------------- |
| `transport` | `DailyTransport` | The transport instance |
| `participant` | `dict` | Participant data from Daily |
When a participant joins, both `on_participant_joined` and
`on_client_connected` fire. Use `on_first_participant_joined` if you only need
to react to the first participant.
#### on\_participant\_left
Fired when a participant leaves the room.
```python theme={null}
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
print(f"Participant {participant['id']} left: {reason}")
```
**Parameters:**
| Parameter | Type | Description |
| ------------- | ---------------- | --------------------------- |
| `transport` | `DailyTransport` | The transport instance |
| `participant` | `dict` | Participant data from Daily |
| `reason` | `str` | Reason the participant left |
#### on\_participant\_updated
Fired when a participant's state changes (e.g., audio/video tracks enabled/disabled).
```python theme={null}
@transport.event_handler("on_participant_updated")
async def on_participant_updated(transport, participant):
print(f"Participant updated: {participant['id']}")
```
**Parameters:**
| Parameter | Type | Description |
| ------------- | ---------------- | ----------------------------------- |
| `transport` | `DailyTransport` | The transport instance |
| `participant` | `dict` | Updated participant data from Daily |
#### on\_client\_connected / on\_client\_disconnected
Transport-agnostic aliases that fire alongside `on_participant_joined` and `on_participant_left` respectively, for compatibility with other transports. Same parameters as their counterparts.
```python theme={null}
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, participant):
print(f"Client connected: {participant['id']}")
```
#### on\_active\_speaker\_changed
Fired when the active speaker in the room changes.
```python theme={null}
@transport.event_handler("on_active_speaker_changed")
async def on_active_speaker_changed(transport, participant):
print(f"Active speaker: {participant['id']}")
```
**Parameters:**
| Parameter | Type | Description |
| ------------- | ---------------- | ------------------------------- |
| `transport` | `DailyTransport` | The transport instance |
| `participant` | `dict` | Active speaker participant data |
### Messaging
#### on\_app\_message
Fired when an app message is received from a participant.
```python theme={null}
@transport.event_handler("on_app_message")
async def on_app_message(transport, message, sender):
print(f"Message from {sender}: {message}")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | ---------------- | --------------------------- |
| `transport` | `DailyTransport` | The transport instance |
| `message` | `Any` | The message content |
| `sender` | `str` | The sender's participant ID |
#### on\_transcription\_message
Fired when a transcription message is received from Daily's built-in transcription service.
```python theme={null}
@transport.event_handler("on_transcription_message")
async def on_transcription_message(transport, message):
print(f"Transcription: {message['text']}")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | ---------------- | ----------------------------------------------------------------------------------------- |
| `transport` | `DailyTransport` | The transport instance |
| `message` | `dict` | Transcription message with `text`, `participantId`, `timestamp`, and `rawResponse` fields |
### Recording
#### on\_recording\_started / on\_recording\_stopped / on\_recording\_error
Events for monitoring Daily's built-in recording feature.
```python theme={null}
@transport.event_handler("on_recording_started")
async def on_recording_started(transport, status):
print(f"Recording started: {status}")
@transport.event_handler("on_recording_error")
async def on_recording_error(transport, stream_id, message):
print(f"Recording error for {stream_id}: {message}")
```
**Parameters:**
| Event | Parameters |
| ---------------------- | ----------------------------------------------- |
| `on_recording_started` | `transport`, `status` (str) |
| `on_recording_stopped` | `transport`, `stream_id` (str) |
| `on_recording_error` | `transport`, `stream_id` (str), `message` (str) |
### Telephony: Dial-in
Events for monitoring incoming phone calls. See the [telephony guides](/pipecat/telephony/overview) for setup details.
#### on\_dialin\_ready
Fired when the dial-in SIP endpoint is ready to receive calls. If `dialin_settings` are configured, Pipecat automatically calls the Daily `pinlessCallUpdate` API.
```python theme={null}
@transport.event_handler("on_dialin_ready")
async def on_dialin_ready(transport, sip_endpoint):
print(f"Dial-in ready at: {sip_endpoint}")
```
**Parameters:**
| Parameter | Type | Description |
| -------------- | ---------------- | ---------------------- |
| `transport` | `DailyTransport` | The transport instance |
| `sip_endpoint` | `str` | The SIP endpoint URI |
#### on\_dialin\_connected / on\_dialin\_stopped / on\_dialin\_error / on\_dialin\_warning
Lifecycle events for dial-in calls. All receive `(transport, data)` where `data` is a `dict` with event details.
```python theme={null}
@transport.event_handler("on_dialin_connected")
async def on_dialin_connected(transport, data):
print(f"Dial-in connected: {data}")
@transport.event_handler("on_dialin_error")
async def on_dialin_error(transport, data):
print(f"Dial-in error: {data}")
```
### Telephony: Dial-out
Events for monitoring outgoing phone calls. All receive `(transport, data)` where `data` is a `dict` with event details.
| Event | Description |
| ---------------------- | -------------------------- |
| `on_dialout_answered` | Dial-out call was answered |
| `on_dialout_connected` | Dial-out call connected |
| `on_dialout_stopped` | Dial-out call stopped |
| `on_dialout_error` | Dial-out error occurred |
| `on_dialout_warning` | Dial-out warning |
```python theme={null}
@transport.event_handler("on_dialout_answered")
async def on_dialout_answered(transport, data):
print(f"Dial-out answered: {data}")
@transport.event_handler("on_dialout_error")
async def on_dialout_error(transport, data):
print(f"Dial-out error: {data}")
```
### Telephony: DTMF
#### on\_dtmf\_event
Fired when a DTMF (dual-tone multi-frequency) keypad tone is received from a phone caller. The transport automatically pushes an `InputDTMFFrame` into the pipeline, enabling your bot to react to keypad presses.
```python theme={null}
@transport.event_handler("on_dtmf_event")
async def on_dtmf_event(transport, data):
tone = data["tone"]
print(f"Received DTMF tone: {tone}")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | ---------------- | ----------------------------------------------------- |
| `transport` | `DailyTransport` | The transport instance |
| `data` | `dict` | DTMF event data from Daily containing `tone` (string) |
DTMF tones are automatically converted to `InputDTMFFrame` and pushed into the
pipeline. You can handle these frames in your processors to implement IVR
menus or other telephony interactions.
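If you want to react to keypad input inside the pipeline itself, a small custom frame processor can watch for `InputDTMFFrame`. This is a hedged sketch; the `button` attribute name is an assumption about the frame's shape and may differ by version.

```python theme={null}
from pipecat.frames.frames import Frame, InputDTMFFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class DTMFLogger(FrameProcessor):
    """Example processor that logs keypad presses from a caller."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if isinstance(frame, InputDTMFFrame):
            # `button` is assumed to hold the pressed key.
            print(f"Caller pressed: {frame.button}")

        # Pass every frame along so downstream processors keep working.
        await self.push_frame(frame, direction)
```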
## Additional Resources
* [Events Overview](/api-reference/server/events/overview) - Overview of all events in Pipecat
* [Daily REST Helper Utility](/api-reference/server/utilities/daily/rest-helper)
* [Pipecat Development Runner's Transport Utilities](/api-reference/server/utilities/runner/transport-utils)
* [Client SDK Integration](/client/js/transports/daily)
# FastAPIWebsocketTransport
Source: https://docs.pipecat.ai/api-reference/server/services/transport/fastapi-websocket
WebSocket transport implementation for FastAPI web applications with telephony integration
## Overview
`FastAPIWebsocketTransport` provides WebSocket support for FastAPI web applications, enabling real-time audio communication over WebSocket connections. It's primarily designed for telephony integrations with providers like Twilio, Telnyx, and Plivo, supporting bidirectional audio streams with configurable serializers and voice activity detection.
FastAPIWebsocketTransport is best suited for telephony applications and server-side WebSocket integrations.
For general client/server applications, we recommend using WebRTC-based transports for more robust network and media handling.
Pipecat's API methods for FastAPI WebSocket integration
Complete Twilio telephony integration example
Official FastAPI WebSocket documentation
Learn about supported FrameSerializers for telephony providers
## Installation
To use FastAPIWebsocketTransport, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[websocket]"
```
## Prerequisites
### FastAPI Application Setup
Before using FastAPIWebsocketTransport, you need:
1. **FastAPI Application**: Set up a FastAPI web application
2. **WebSocket Endpoint**: Configure WebSocket routes for real-time communication
3. **Telephony Provider**: Set up integration with Twilio, Telnyx, or Plivo
4. **Frame Serializers**: Configure appropriate serializers for your telephony provider
### Configuration Options
* **Serializer Selection**: Choose frame serializer based on telephony provider
* **Audio Parameters**: Configure sample rates and audio formats
* **VAD Integration**: Set up voice activity detection for optimal performance
* **Connection Management**: Handle WebSocket lifecycle and reconnections
### Key Features
* **Telephony Integration**: Optimized for Twilio, Telnyx, and Plivo WebSocket streams
* **Frame Serialization**: Built-in support for telephony provider audio formats
* **FastAPI Integration**: Seamless WebSocket handling within FastAPI applications
* **Bidirectional Audio**: Real-time audio streaming in both directions
## Configuration
### FastAPIWebsocketTransport
The FastAPI WebSocket connection instance.
Transport configuration parameters.
Optional name for the input transport processor.
Optional name for the output transport processor.
### FastAPIWebsocketParams
Inherits from `TransportParams` with additional WebSocket-specific parameters.
Whether to add WAV headers to outgoing audio frames.
Frame serializer for encoding/decoding WebSocket messages. Use a telephony
serializer (e.g., `TwilioFrameSerializer`, `TelnyxFrameSerializer`) for
provider-specific audio formats.
Session timeout in seconds. When set, triggers `on_session_timeout` if the
session exceeds this duration. `None` disables the timeout.
Optional fixed-size packetization for raw PCM audio payloads. Useful when the
remote WebSocket media endpoint requires strict audio framing (e.g., 640 bytes
for 20ms at 16kHz PCM16 mono).
## Usage
FastAPIWebsocketTransport integrates with your FastAPI application to handle telephony WebSocket connections. It works with telephony frame serializers to process audio streams from phone calls.
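As a rough sketch, a FastAPI endpoint accepts the WebSocket and hands it to the transport. The transport import path, the `websocket` keyword, and the serializer arguments are assumptions that may differ by Pipecat version and telephony provider.

```python theme={null}
from fastapi import FastAPI, WebSocket

from pipecat.serializers.twilio import TwilioFrameSerializer

# Import path is an assumption; check your installed Pipecat version.
from pipecat.transports.websocket.fastapi import (
    FastAPIWebsocketParams,
    FastAPIWebsocketTransport,
)

app = FastAPI()


@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()

    # Twilio sends setup messages before audio starts; a real handler reads
    # them to obtain the stream SID (elided here).
    transport = FastAPIWebsocketTransport(
        websocket=websocket,
        params=FastAPIWebsocketParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            add_wav_header=False,
            serializer=TwilioFrameSerializer(stream_sid="..."),
        ),
    )

    # Build and run your pipeline here using transport.input() / transport.output().
```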
See the [complete example](https://github.com/pipecat-ai/pipecat-examples/tree/main/twilio-chatbot) for a full implementation including:
* FastAPI WebSocket endpoint configuration
* Telephony provider integration setup
* Frame serializer configuration
* Audio processing pipeline integration
## Event Handlers
FastAPIWebsocketTransport provides event handlers for client connection lifecycle and session management. Register handlers using the `@event_handler` decorator on the transport instance.
### Events Summary
| Event | Description |
| ------------------------ | ----------------------------- |
| `on_client_connected` | Client WebSocket connected |
| `on_client_disconnected` | Client WebSocket disconnected |
| `on_session_timeout` | Session timed out |
### Connection Lifecycle
#### on\_client\_connected
Fired when a client successfully connects to the WebSocket.
```python theme={null}
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, websocket):
print("Client connected")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | --------------------------- | --------------------------------------- |
| `transport` | `FastAPIWebsocketTransport` | The transport instance |
| `websocket` | `WebSocket` | The FastAPI WebSocket connection object |
#### on\_client\_disconnected
Fired when a client disconnects from the WebSocket.
```python theme={null}
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, websocket):
print("Client disconnected")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | --------------------------- | --------------------------------------- |
| `transport` | `FastAPIWebsocketTransport` | The transport instance |
| `websocket` | `WebSocket` | The FastAPI WebSocket connection object |
#### on\_session\_timeout
Fired when a session exceeds the configured `session_timeout` duration. Only fires if `session_timeout` is set in the params.
```python theme={null}
@transport.event_handler("on_session_timeout")
async def on_session_timeout(transport, websocket):
print("Session timed out")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | --------------------------- | --------------------------------------- |
| `transport` | `FastAPIWebsocketTransport` | The transport instance |
| `websocket` | `WebSocket` | The FastAPI WebSocket connection object |
## Additional Resources
* [Events Overview](/api-reference/server/events/overview) - Overview of all events in Pipecat
* [Serializers](/api-reference/server/services/serializers/introduction) - Frame serializers for telephony providers
# HeyGenTransport
Source: https://docs.pipecat.ai/api-reference/server/services/transport/heygen
AI avatar video generation service for creating interactive conversational avatars
## Overview
`HeyGenTransport` enables your Pipecat bot to join the same virtual room as a HeyGen avatar and human participants. The transport integrates with the HeyGen [LiveAvatar](https://www.liveavatar.com/) platform to create interactive AI-powered video avatars that respond naturally in real-time conversations. The service handles bidirectional audio/video streaming, avatar animations, voice activity detection, and conversation interruptions to deliver engaging conversational AI experiences with lifelike visual presence.
When used, the Pipecat bot connects to a LiveKit room alongside the HeyGen avatar and user. The bot receives audio input from participants, processes it through your pipeline, and sends TTS audio to the HeyGen avatar for synchronized video rendering.
Pipecat's API methods for HeyGen video integration
Complete example with interactive avatar
Official HeyGen API documentation and guides
Access interactive avatars and API keys
## Installation
To use HeyGen services, install the required dependency:
```bash theme={null}
pip install "pipecat-ai[heygen]"
```
## Prerequisites
### HeyGen Account Setup
Before using HeyGen video services, you need:
1. **HeyGen Account**: Sign up at [HeyGen Platform](https://app.liveavatar.com/signin)
2. **API Key**: Generate an API key from your account dashboard
3. **Avatar Selection**: Choose from available interactive avatars
4. **Streaming Setup**: Configure real-time avatar streaming capabilities
### Required Environment Variables
* `HEYGEN_LIVE_AVATAR_API_KEY`: Your HeyGen LiveAvatar API key for authentication
## Configuration
### HeyGenTransport
An aiohttp session for making async HTTP requests to the HeyGen API.
HeyGen API key for authentication.
Transport configuration parameters.
Optional name for the input transport processor.
Optional name for the output transport processor.
Configuration for the HeyGen session, including avatar selection and settings.
Service type for the avatar session.
### HeyGenParams
Inherits from `TransportParams` with the following defaults changed:
Whether to enable audio input from participants.
Whether to enable audio output to participants.
## Usage
HeyGenTransport creates a three-way conversation between your Pipecat bot, a HeyGen avatar, and human participants. The transport manages the HeyGen API session and room connectivity automatically.
See the [complete example](https://github.com/pipecat-ai/pipecat/blob/main/examples/video-avatar/video-avatar-heygen-transport.py) for a full implementation including:
* HeyGen transport configuration with API key and session setup
* Avatar selection and streaming configuration
* Pipeline integration with TTS
* Event handling for participant management
## Event Handlers
HeyGenTransport provides event handlers for participant connection lifecycle. Register handlers using the `@event_handler` decorator on the transport instance.
### Events Summary
| Event | Description |
| ------------------------ | ------------------------------------------------------ |
| `on_connected` | Bot connected to the LiveKit room |
| `on_client_connected` | Participant (non-avatar) connected to the session |
| `on_client_disconnected` | Participant (non-avatar) disconnected from the session |
The HeyGen avatar participant is automatically filtered out from these events.
Only human participant connections trigger the event handlers.
### Room Lifecycle
#### on\_connected
Fired when the bot connects to the LiveKit room.
```python theme={null}
@transport.event_handler("on_connected")
async def on_connected(transport):
print("Bot connected to LiveKit room")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | ----------------- | ---------------------- |
| `transport` | `HeyGenTransport` | The transport instance |
### Connection Lifecycle
#### on\_client\_connected
Fired when a human participant connects to the session. The HeyGen avatar is filtered out from this event.
```python theme={null}
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, participant):
print(f"Client connected: {participant}")
```
**Parameters:**
| Parameter | Type | Description |
| ------------- | ----------------- | -------------------------- |
| `transport` | `HeyGenTransport` | The transport instance |
| `participant` | `str` | The participant identifier |
#### on\_client\_disconnected
Fired when a human participant disconnects from the session.
```python theme={null}
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, participant):
print(f"Client disconnected: {participant}")
```
**Parameters:**
| Parameter | Type | Description |
| ------------- | ----------------- | -------------------------- |
| `transport` | `HeyGenTransport` | The transport instance |
| `participant` | `str` | The participant identifier |
## Additional Resources
* [Events Overview](/api-reference/server/events/overview) - Overview of all events in Pipecat
# LemonSliceTransport
Source: https://docs.pipecat.ai/api-reference/server/services/transport/lemonslice
AI avatar video service for adding LemonSlice avatars to Daily rooms
## Overview
`LemonSliceTransport` enables your Pipecat bot to join the same virtual room as a LemonSlice avatar and human participants. The transport integrates with the [LemonSlice](https://lemonslice.com/) platform to add interactive AI-powered video avatars to Daily rooms. The service handles bidirectional audio streaming, avatar animations, voice activity detection, and conversation interruptions to deliver engaging conversational AI experiences with lifelike visual presence.
When used, the Pipecat bot connects to a Daily room alongside the LemonSlice avatar and user. The bot receives audio input from participants, processes it through your pipeline, and sends TTS audio to the LemonSlice avatar for synchronized video rendering.
Pipecat's API methods for LemonSlice video integration
Complete example with interactive avatar
Access interactive avatars and API keys
## Installation
To use LemonSlice services, install the required dependency:
```bash theme={null}
pip install "pipecat-ai[lemonslice]"
```
## Prerequisites
### LemonSlice Account Setup
Before using LemonSlice video services, you need:
1. **LemonSlice Account**: Sign up at [LemonSlice Platform](https://lemonslice.com/)
2. **API Key**: Generate an API key from your account dashboard
3. **Avatar Selection**: Choose from available interactive avatars or provide your own agent image
### Required Environment Variables
* `LEMONSLICE_API_KEY`: Your LemonSlice API key for authentication
## Configuration
### LemonSliceTransport
The name of the Pipecat bot instance.
An aiohttp session for making async HTTP requests to the LemonSlice API.
LemonSlice API key for authentication.
Optional session creation parameters. If not provided, a default agent will be
used. See [LemonSliceNewSessionRequest](#lemonslicenewsessionrequest) below.
Transport configuration parameters. See [LemonSliceParams](#lemonsliceparams)
below.
Optional name for the input transport processor.
Optional name for the output transport processor.
### LemonSliceNewSessionRequest
Configuration for creating a new LemonSlice session.
URL to an agent image. Provide either `agent_id` or `agent_image_url`, not
both.
ID of a LemonSlice agent. Provide either `agent_id` or `agent_image_url`, not
both. If neither is provided, a default agent will be used.
A high-level system prompt that subtly influences the avatar's movements,
expressions, and emotional demeanor.
Idle timeout in seconds for the session.
Daily room URL to use for the session. If not provided, LemonSlice will create
a new room.
Daily token for authenticating with the room.
Additional properties to pass to the LemonSlice session.
### LemonSliceParams
Inherits from `DailyParams` (which inherits from `TransportParams`) with the
following defaults changed:
Whether to enable audio input from participants.
Whether to enable audio output to participants.
Whether to enable microphone output track.
## Usage
LemonSliceTransport creates a three-way conversation between your Pipecat bot, a
LemonSlice avatar, and human participants. The transport manages the LemonSlice
API session and Daily room connectivity automatically.
See the
[complete example](https://github.com/pipecat-ai/pipecat/blob/main/examples/video-avatar/video-avatar-lemonslice-transport.py)
for a full implementation including:
* LemonSlice transport configuration with API key and session setup
* Avatar selection and configuration
* Pipeline integration with TTS
* Event handling for participant management
## Event Handlers
LemonSliceTransport provides event handlers for participant connection
lifecycle. Register handlers using the `@event_handler` decorator on the
transport instance.
### Events Summary
| Event | Description |
| ------------------------ | ------------------------------------------------------ |
| `on_client_connected` | Participant (non-avatar) connected to the session |
| `on_client_disconnected` | Participant (non-avatar) disconnected from the session |
The LemonSlice avatar participant is automatically filtered out from these
events. Only human participant connections trigger the event handlers.
### Connection Lifecycle
#### on\_client\_connected
Fired when a human participant connects to the session. The LemonSlice avatar is
filtered out from this event.
```python theme={null}
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, participant):
print(f"Client connected: {participant}")
```
**Parameters:**
| Parameter | Type | Description |
| ------------- | --------------------- | ---------------------- |
| `transport` | `LemonSliceTransport` | The transport instance |
| `participant` | `Mapping[str, Any]` | The participant data |
#### on\_client\_disconnected
Fired when a human participant disconnects from the session.
```python theme={null}
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, participant):
print(f"Client disconnected: {participant}")
```
**Parameters:**
| Parameter | Type | Description |
| ------------- | --------------------- | ---------------------- |
| `transport` | `LemonSliceTransport` | The transport instance |
| `participant` | `Mapping[str, Any]` | The participant data |
## Notes
* LemonSlice uses Daily as the underlying transport layer, so all Daily features
and configuration options are available through the inherited `DailyParams`.
* The transport automatically manages interruptions and sends appropriate control
messages (`interrupt`, `response_started`, `response_finished`) to the
LemonSlice session.
* The LemonSlice avatar's microphone is automatically muted to prevent audio
feedback loops.
## Additional Resources
* [Events Overview](/api-reference/server/events/overview) - Overview of all events in
Pipecat
* [DailyTransport](/api-reference/server/services/transport/daily) - Underlying Daily transport
documentation
* [TransportParams](/api-reference/server/services/transport/transport-params) - Base transport
parameters
# LiveKitTransport
Source: https://docs.pipecat.ai/api-reference/server/services/transport/livekit
WebRTC transport implementation using LiveKit for real-time audio communication
## Overview
`LiveKitTransport` provides real-time audio communication capabilities using LiveKit's open-source WebRTC platform. It supports bidirectional audio streaming, data messaging, participant management, and room event handling for conversational AI applications with the flexibility of self-hosted or cloud infrastructure.
Pipecat's API methods for LiveKit integration
Complete LiveKit voice agent example
Official LiveKit API documentation and guides
Sign up for hosted LiveKit service
## Installation
To use LiveKitTransport, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[livekit]"
```
## Prerequisites
### LiveKit Setup
Before using LiveKitTransport, you need:
1. **LiveKit Server**: Set up self-hosted or use [LiveKit Cloud](https://cloud.livekit.io/)
2. **API Credentials**: Generate API key and secret from your LiveKit project
3. **Room Management**: Create rooms using LiveKit API or SDK
4. **Access Tokens**: Generate JWT tokens for room authentication
### Required Environment Variables
* `LIVEKIT_API_KEY`: Your LiveKit API key for authentication
* `LIVEKIT_API_SECRET`: Your LiveKit API secret for token generation
* `LIVEKIT_URL`: Your LiveKit server URL (e.g., `wss://your-domain.livekit.cloud`)
### Key Features
* **Open Source**: Self-hosted or cloud-hosted WebRTC infrastructure
* **Multi-participant Support**: Handle multiple participants in rooms
* **Data Channels**: Real-time messaging alongside audio streams
* **Room Management**: Dynamic participant and room lifecycle management
* **Flexible Deployment**: Self-hosted or managed cloud options
## Configuration
### LiveKitTransport
LiveKit server URL to connect to (e.g., `wss://your-domain.livekit.cloud`).
Authentication token (JWT) for the LiveKit room.
Name of the LiveKit room to join.
Transport configuration parameters. Inherits all parameters from
[TransportParams](/api-reference/server/services/transport/transport-params).
Optional name for the input transport processor.
Optional name for the output transport processor.
## Usage
LiveKitTransport connects your Pipecat bot to LiveKit rooms where it can communicate with participants through audio and data channels. The transport handles room joining, participant events, and media streaming automatically.
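As a hedged sketch, the bot needs a room-scoped JWT (generated here with the `livekit-api` package) and the transport constructed with the server URL, token, and room name. The Pipecat import path below is an assumption; check your installed version.

```python theme={null}
import os

from livekit import api  # from the livekit-api package

# Import path is an assumption; check your installed Pipecat version.
from pipecat.transports.livekit import LiveKitParams, LiveKitTransport

room_name = "my-room"

# Generate a JWT that allows the bot to join the room.
token = (
    api.AccessToken(os.getenv("LIVEKIT_API_KEY"), os.getenv("LIVEKIT_API_SECRET"))
    .with_identity("pipecat-bot")
    .with_grants(api.VideoGrants(room_join=True, room=room_name))
    .to_jwt()
)

transport = LiveKitTransport(
    os.getenv("LIVEKIT_URL"),
    token,
    room_name,
    params=LiveKitParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
    ),
)
```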
See the [complete example](https://github.com/pipecat-ai/pipecat/blob/main/examples/transports/transports-livekit.py) for a full implementation including:
* LiveKit room creation and token generation
* Transport configuration with participant management
* Pipeline integration with audio processing
* Event handling for room lifecycle management
## Event Handlers
LiveKitTransport provides event handlers for room lifecycle, participant management, and media track events. Register handlers using the `@event_handler` decorator on the transport instance.
### Events Summary
| Event | Description |
| ----------------------------- | -------------------------- |
| `on_connected` | Connected to the room |
| `on_disconnected` | Disconnected from the room |
| `on_before_disconnect` | About to disconnect (sync) |
| `on_call_state_updated` | Call state changed |
| `on_first_participant_joined` | First participant joined |
| `on_participant_connected` | A participant connected |
| `on_participant_disconnected` | A participant disconnected |
| `on_participant_left` | A participant left |
| `on_audio_track_subscribed` | Audio track subscribed |
| `on_audio_track_unsubscribed` | Audio track unsubscribed |
| `on_video_track_subscribed` | Video track subscribed |
| `on_video_track_unsubscribed` | Video track unsubscribed |
| `on_data_received` | Data message received |
### Room Lifecycle
#### on\_connected
Fired when the bot successfully connects to the LiveKit room.
```python theme={null}
@transport.event_handler("on_connected")
async def on_connected(transport):
print("Connected to LiveKit room")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | ------------------ | ---------------------- |
| `transport` | `LiveKitTransport` | The transport instance |
#### on\_disconnected
Fired when the bot disconnects from the room.
```python theme={null}
@transport.event_handler("on_disconnected")
async def on_disconnected(transport):
print("Disconnected from LiveKit room")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | ------------------ | ---------------------- |
| `transport` | `LiveKitTransport` | The transport instance |
#### on\_before\_disconnect
Fired synchronously just before the bot disconnects from the room. Use this for cleanup that must happen before disconnection.
```python theme={null}
@transport.event_handler("on_before_disconnect")
async def on_before_disconnect(transport):
print("About to disconnect...")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | ------------------ | ---------------------- |
| `transport` | `LiveKitTransport` | The transport instance |
This is a **synchronous** event — the bot will not disconnect until all
handlers complete. Keep handlers fast.
#### on\_call\_state\_updated
Fired when the call state changes.
```python theme={null}
@transport.event_handler("on_call_state_updated")
async def on_call_state_updated(transport, state):
print(f"Call state: {state}")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | ------------------ | ---------------------- |
| `transport` | `LiveKitTransport` | The transport instance |
| `state` | `str` | The new call state |
### Participants
#### on\_first\_participant\_joined
Fired when the first participant (other than the bot) joins the room. This is commonly used to start the conversation.
```python theme={null}
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant_id):
await task.queue_frame(TTSSpeakFrame("Hello! How can I help you today?"))
```
**Parameters:**
| Parameter | Type | Description |
| ---------------- | ------------------ | ---------------------- |
| `transport` | `LiveKitTransport` | The transport instance |
| `participant_id` | `str` | The participant's ID |
#### on\_participant\_connected
Fired when a participant connects to the room.
```python theme={null}
@transport.event_handler("on_participant_connected")
async def on_participant_connected(transport, participant_id):
print(f"Participant connected: {participant_id}")
```
**Parameters:**
| Parameter | Type | Description |
| ---------------- | ------------------ | ---------------------- |
| `transport` | `LiveKitTransport` | The transport instance |
| `participant_id` | `str` | The participant's ID |
#### on\_participant\_disconnected
Fired when a participant disconnects from the room.
```python theme={null}
@transport.event_handler("on_participant_disconnected")
async def on_participant_disconnected(transport, participant_id):
print(f"Participant disconnected: {participant_id}")
```
**Parameters:**
| Parameter | Type | Description |
| ---------------- | ------------------ | ---------------------- |
| `transport` | `LiveKitTransport` | The transport instance |
| `participant_id` | `str` | The participant's ID |
When a participant disconnects, both `on_participant_disconnected` and
`on_participant_left` fire. The `on_participant_left` event includes a
`"disconnected"` reason for compatibility with other transports.
#### on\_participant\_left
Fired when a participant leaves the room. This is a transport-agnostic event that fires alongside `on_participant_disconnected`.
```python theme={null}
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant_id, reason):
print(f"Participant left: {participant_id} ({reason})")
```
**Parameters:**
| Parameter | Type | Description |
| ---------------- | ------------------ | ------------------------------------------- |
| `transport` | `LiveKitTransport` | The transport instance |
| `participant_id` | `str` | The participant's ID |
| `reason` | `str` | Reason for leaving (e.g., `"disconnected"`) |
### Media Tracks
Track subscription events fire when audio or video tracks from participants are subscribed or unsubscribed. All receive `(transport, participant_id)`.
| Event | Description |
| ----------------------------- | ----------------------------------------------- |
| `on_audio_track_subscribed` | Audio track from a participant was subscribed |
| `on_audio_track_unsubscribed` | Audio track from a participant was unsubscribed |
| `on_video_track_subscribed` | Video track from a participant was subscribed |
| `on_video_track_unsubscribed` | Video track from a participant was unsubscribed |
```python theme={null}
@transport.event_handler("on_audio_track_subscribed")
async def on_audio_track_subscribed(transport, participant_id):
print(f"Audio track subscribed for: {participant_id}")
```
### Messaging
#### on\_data\_received
Fired when data is received from a participant through LiveKit's data channel.
```python theme={null}
@transport.event_handler("on_data_received")
async def on_data_received(transport, data, participant_id):
print(f"Data from {participant_id}: {data.decode()}")
```
**Parameters:**
| Parameter | Type | Description |
| ---------------- | ------------------ | --------------------------- |
| `transport` | `LiveKitTransport` | The transport instance |
| `data` | `bytes` | The raw data received |
| `participant_id` | `str` | The sender's participant ID |
## Additional Resources
* [Events Overview](/api-reference/server/events/overview) - Overview of all events in Pipecat
# SmallWebRTCTransport
Source: https://docs.pipecat.ai/api-reference/server/services/transport/small-webrtc
A lightweight WebRTC transport for peer-to-peer audio and video communication in Pipecat
## Overview
`SmallWebRTCTransport` enables peer-to-peer ("serverless") WebRTC connections between clients and your Pipecat application. It implements bidirectional audio, video and data channels using WebRTC for real-time communication. This transport is open source and self-contained, with no dependencies on any other infrastructure.
For detailed notes on how to decide between using the SmallWebRTCTransport or
other WebRTC transports like the DailyTransport, see [this
post](https://www.daily.co/blog/you-dont-need-a-webrtc-server-for-your-voice-agents/).
Transport methods and configuration options
Connection management and signaling methods
Complete peer-to-peer voice agent example
Official WebRTC protocol documentation
## Installation
To use SmallWebRTCTransport, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[webrtc]"
```
## Prerequisites
### WebRTC Application Setup
Before using SmallWebRTCTransport, you need:
1. **Signaling Server**: Implement WebRTC offer/answer exchange (required)
2. **Client Implementation**: Set up WebRTC client for browser or application use
3. **ICE Configuration**: Configure STUN/TURN servers for NAT traversal (optional for local networks)
4. **Development Runner**: Use Pipecat's development runner for quick setup (recommended)
No API keys are required since this is a peer-to-peer transport
implementation. For production deployments across different networks, you may
need to configure STUN/TURN servers for NAT traversal.
### Configuration Options
* **Development Runner**: Automatic server infrastructure and web interface (recommended)
* **Manual Implementation**: Custom signaling server for advanced use cases
* **ICE Servers**: STUN/TURN configuration for network traversal
* **Media Configuration**: Audio/video parameters and format handling
### Key Features
* **Serverless Architecture**: Direct peer-to-peer connections with no intermediate servers
* **Production Ready**: Heavily tested and used in Pipecat examples
* **Bidirectional Media**: Full-duplex audio and video streaming
* **Data Channels**: Application messaging and signaling support
* **Development Tools**: Built-in development runner with web interface
## Configuration
### SmallWebRTCTransport
The underlying WebRTC connection handler that manages signaling and peer
connections.
Transport configuration parameters for audio, video, and VAD settings.
Optional name for the input transport processor.
Optional name for the output transport processor.
## Usage
SmallWebRTCTransport requires both a signaling server for WebRTC handshake and your Pipecat bot implementation. The easiest approach is using Pipecat's development runner which handles all server infrastructure automatically.
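As a hedged sketch of the development-runner approach, the runner negotiates the WebRTC offer/answer and hands your `bot()` function a ready connection to wrap in the transport. Import paths and the `webrtc_connection` attribute name are assumptions that may vary by Pipecat version.

```python theme={null}
# Import paths are assumptions; check your installed Pipecat version.
from pipecat.runner.types import RunnerArguments
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.smallwebrtc.transport import SmallWebRTCTransport


async def bot(runner_args: RunnerArguments):
    transport = SmallWebRTCTransport(
        webrtc_connection=runner_args.webrtc_connection,
        params=TransportParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
        ),
    )

    # Build and run your pipeline here using transport.input() / transport.output().


if __name__ == "__main__":
    from pipecat.runner.run import main

    main()
```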
See the [complete examples](https://github.com/pipecat-ai/pipecat-examples/tree/main/p2p-webrtc) for full implementations including:
* Development runner setup with automatic web interface
* Manual signaling server implementation
* WebRTC client integration
* ICE server configuration for production deployment
## Event Handlers
SmallWebRTCTransport provides event handlers for client connection lifecycle and application messaging. Register handlers using the `@event_handler` decorator on the transport instance.
### Events Summary
| Event | Description |
| ------------------------ | ------------------------------------ |
| `on_client_connected` | Client WebRTC connection established |
| `on_client_disconnected` | Client WebRTC connection closed |
| `on_app_message` | App message received from client |
### Connection Lifecycle
#### on\_client\_connected
Fired when a client establishes a WebRTC peer connection.
```python theme={null}
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, webrtc_connection):
print("Client connected")
```
**Parameters:**
| Parameter | Type | Description |
| ------------------- | ----------------------- | ---------------------------- |
| `transport` | `SmallWebRTCTransport` | The transport instance |
| `webrtc_connection` | `SmallWebRTCConnection` | The WebRTC connection object |
#### on\_client\_disconnected
Fired when a client's WebRTC peer connection is closed.
```python theme={null}
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, webrtc_connection):
print("Client disconnected")
```
**Parameters:**
| Parameter | Type | Description |
| ------------------- | ----------------------- | ---------------------------- |
| `transport` | `SmallWebRTCTransport` | The transport instance |
| `webrtc_connection` | `SmallWebRTCConnection` | The WebRTC connection object |
### Messaging
#### on\_app\_message
Fired when an application message is received from a client through the WebRTC data channel.
```python theme={null}
@transport.event_handler("on_app_message")
async def on_app_message(transport, message, sender):
print(f"Message from {sender}: {message}")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | ---------------------- | ------------------------------- |
| `transport` | `SmallWebRTCTransport` | The transport instance |
| `message` | `Any` | The message content |
| `sender` | `str` | The sender's peer connection ID |
## Additional Resources
* [Events Overview](/api-reference/server/events/overview) - Overview of all events in Pipecat
* [Pipecat Development Runner's Transport Utilities](/api-reference/server/utilities/runner/transport-utils)
* [Client SDK Integration](/client/js/transports/small-webrtc)
# TavusTransport
Source: https://docs.pipecat.ai/api-reference/server/services/transport/tavus
Conversational AI applications with Tavus avatars using real-time audio/video streaming
## Overview
`TavusTransport` enables your Pipecat bot to join the same virtual room as a Tavus avatar and human participants. The transport integrates with the Tavus platform to create conversational AI applications where a Tavus avatar provides synchronized video and audio output while your bot handles the conversation logic.
When used, the Pipecat bot connects to a Daily room alongside the Tavus avatar and user. The bot receives audio input from participants, processes it through your pipeline, and sends TTS audio to the Tavus avatar for synchronized video rendering.
Pipecat's API methods for Tavus platform integration
Complete Tavus avatar conversation example
Tavus official API reference and replica creation
Connect using Pipecat Client SDK with Daily transport
## Installation
To use TavusTransport, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[tavus]"
```
## Prerequisites
### Tavus Platform Setup
Before using TavusTransport, you need:
1. **Tavus Account**: Sign up at [Tavus.io](https://tavus.io)
2. **API Key**: Generate a Tavus API key for authentication
3. **Replica ID**: Create or obtain a Tavus replica model ID
4. **Persona Configuration**: Set up a persona (optional, defaults to Pipecat TTS voice)
Use `persona_id="pipecat-stream"` to have Tavus use your Pipecat bot's TTS
voice instead of a Tavus persona voice.
### Required Environment Variables
* `TAVUS_API_KEY`: Your Tavus API key for authentication
* `TAVUS_REPLICA_ID`: ID of the Tavus replica model to use
## Key Features
* **Avatar Integration**: Seamlessly integrates Tavus avatars with Pipecat conversations
* **Synchronized Audio/Video**: Tavus avatar renders video synchronized with bot's TTS output
* **Multi-participant Rooms**: Supports bot, avatar, and human participants in the same session
* **Conversation Management**: Handles Tavus conversation lifecycle through API
* **Real-time Streaming**: Bidirectional audio streaming with video avatar output
## Configuration
### TavusTransport
The name of the Pipecat bot instance.
An aiohttp session for making async HTTP requests to the Tavus API.
Tavus API key for authentication.
ID of the Tavus replica model to use for voice generation.
ID of the Tavus persona. Defaults to `"pipecat-stream"`, which signals Tavus
to use the Pipecat bot's TTS voice instead of a Tavus persona voice.
Transport configuration parameters.
Optional name for the input transport processor.
Optional name for the output transport processor.
### TavusParams
Inherits from `DailyParams` with the following defaults changed:
Whether to enable audio input from participants.
Whether to enable audio output to participants.
Whether to enable microphone output track.
## Usage
TavusTransport creates a three-way conversation between your Pipecat bot, a Tavus avatar, and human participants. The transport manages the Tavus API integration and Daily room connectivity automatically.
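As a hedged sketch, the transport takes an aiohttp session along with your API key and replica ID; the import path and keyword names below mirror the constructor fields described above and are assumptions that may differ by Pipecat version.

```python theme={null}
import os

import aiohttp

# Import path and keyword names are assumptions; check your installed Pipecat version.
from pipecat.transports.tavus.transport import TavusParams, TavusTransport


async def make_transport(session: aiohttp.ClientSession) -> TavusTransport:
    return TavusTransport(
        bot_name="Pipecat Bot",
        session=session,
        api_key=os.getenv("TAVUS_API_KEY"),
        replica_id=os.getenv("TAVUS_REPLICA_ID"),
        params=TavusParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
        ),
    )
```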
See the [complete example](https://github.com/pipecat-ai/pipecat/blob/main/examples/video-avatar/video-avatar-tavus-video-service.py) for a full implementation including:
* Tavus transport configuration
* Avatar and replica setup
* Pipeline integration with TTS
* Event handling for participant management
## Event Handlers
TavusTransport provides event handlers for participant connection lifecycle. Register handlers using the `@event_handler` decorator on the transport instance.
### Events Summary
| Event | Description |
| ------------------------ | ------------------------------------------------------ |
| `on_connected` | Bot connected to the room |
| `on_client_connected` | Participant (non-avatar) connected to the session |
| `on_client_disconnected` | Participant (non-avatar) disconnected from the session |
The Tavus replica participant is automatically filtered out from these events.
Only human participant connections trigger the event handlers. The transport
also automatically unsubscribes from the Tavus replica's microphone to prevent
audio feedback.
### Room Lifecycle
#### on\_connected
Fired when the bot connects to the Daily room.
```python theme={null}
@transport.event_handler("on_connected")
async def on_connected(transport, data):
print("Bot connected to the room")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | ---------------- | -------------------------- |
| `transport` | `TavusTransport` | The transport instance |
| `data` | `dict` | Join event data from Daily |
### Connection Lifecycle
#### on\_client\_connected
Fired when a human participant connects to the session. The Tavus replica is filtered out from this event.
```python theme={null}
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, participant):
print(f"Client connected: {participant}")
```
**Parameters:**
| Parameter | Type | Description |
| ------------- | ---------------- | ------------------------------------------------------------------------- |
| `transport` | `TavusTransport` | The transport instance |
| `participant` | `dict` | Participant data from Daily (includes `id`, `info` with `userName`, etc.) |
#### on\_client\_disconnected
Fired when a human participant disconnects from the session.
```python theme={null}
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, participant):
print(f"Client disconnected: {participant}")
```
**Parameters:**
| Parameter | Type | Description |
| ------------- | ---------------- | --------------------------- |
| `transport` | `TavusTransport` | The transport instance |
| `participant` | `dict` | Participant data from Daily |
## Additional Resources
* [Events Overview](/api-reference/server/events/overview) - Overview of all events in Pipecat
* [Client SDK Integration](/client/js/transports/daily)
# TransportParams
Source: https://docs.pipecat.ai/api-reference/server/services/transport/transport-params
Base configuration parameters shared by all transport implementations
## Overview
`TransportParams` is the base configuration class for all Pipecat transports. It controls audio input/output settings, video settings, and voice activity detection. Every transport's params class (`DailyParams`, `LiveKitParams`, `WebsocketServerParams`, etc.) inherits from `TransportParams`.
You typically pass these via the transport's `params` argument:
```python theme={null}
from pipecat.transports.daily import DailyTransport, DailyParams
transport = DailyTransport(
room_url,
token,
"Bot",
params=DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
audio_out_sample_rate=24000,
video_out_enabled=True,
video_out_width=1280,
video_out_height=720,
),
)
```
## Audio Output
Enable audio output streaming.
Output audio sample rate in Hz. When `None`, uses the default rate from the
TTS service.
Number of output audio channels.
Output audio bitrate in bits per second.
Number of 10ms chunks to buffer before sending output audio. Higher values
increase latency but reduce overhead.
Audio mixer instance for combining audio streams, or a mapping of destination
names to mixer instances.
List of audio output destination identifiers for routing audio to specific
participants or endpoints.
Seconds of silence to send after an `EndFrame`. Set to `0` to disable.
Insert silence frames when the audio output queue is empty. When `False`, the
transport waits for audio data instead of inserting silence, which is useful
for scenarios that require uninterrupted audio playback without artificial
gaps.
## Audio Input
Enable audio input streaming.
Input audio sample rate in Hz. When `None`, uses the transport's native rate.
Number of input audio channels.
Audio filter to apply to incoming audio (e.g., noise suppression).
Start audio input streaming immediately when the transport starts. Set to
`False` to manually control when audio input begins.
Pass input audio frames downstream through the pipeline. When `False`, audio
is consumed by VAD but not forwarded.
## Video Output
Enable video output streaming.
Enable real-time video output. When `True`, frames are sent as they arrive
rather than buffered.
Video output width in pixels.
Video output height in pixels.
Video output bitrate in bits per second.
Video output frame rate in frames per second.
Video output color format string.
Preferred video codec for output (e.g., `VP8`, `H264`, `H265`).
List of video output destination identifiers.
## Video Input
Enable video input streaming.
## Deprecated Parameters
The following parameters are deprecated and should not be used in new code. See the migration guidance for each parameter.
Voice Activity Detection analyzer instance.
**Deprecated in 0.0.101**. Use `LLMUserAggregator`'s `vad_analyzer` parameter, or `VADProcessor` if no `LLMUserAggregator` is needed.
Migration example:
```python theme={null}
# Old (deprecated)
transport = DailyTransport(
    room_url, token, "Bot",
    params=DailyParams(
        audio_in_enabled=True,
        vad_analyzer=SileroVADAnalyzer(),
    )
)

# New (recommended with LLMUserAggregator)
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)

transport = DailyTransport(
    room_url, token, "Bot",
    params=DailyParams(audio_in_enabled=True)
)

user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        vad_analyzer=SileroVADAnalyzer()
    ),
)

# Or use VADProcessor if no LLMUserAggregator is needed
from pipecat.processors.audio.vad_processor import VADProcessor

transport = DailyTransport(
    room_url, token, "Bot",
    params=DailyParams(audio_in_enabled=True)
)

vad_processor = VADProcessor(vad_analyzer=SileroVADAnalyzer())
pipeline = Pipeline([transport.input(), vad_processor, ...])
```
Turn-taking analyzer instance for conversation management.
**Deprecated in 0.0.99**. Use `LLMUserAggregator`'s `user_turn_strategies` parameter instead.
## Transport Subclasses
Each transport extends `TransportParams` with provider-specific fields:
| Transport | Params Class | Additional Fields |
| --------------------------------------------------------------------------------------- | ------------------------ | ------------------------------------------------------------------------------------------ |
| [DailyTransport](/api-reference/server/services/transport/daily) | `DailyParams` | `api_key`, `api_url`, `dialin_settings`, `transcription_enabled`, `transcription_settings` |
| [LiveKitTransport](/api-reference/server/services/transport/livekit) | `LiveKitParams` | (no additional fields) |
| [SmallWebRTCTransport](/api-reference/server/services/transport/small-webrtc) | `TransportParams` | Uses base class directly |
| [WebsocketServerTransport](/api-reference/server/services/transport/websocket-server) | `WebsocketServerParams` | `add_wav_header`, `serializer`, `session_timeout` |
| [FastAPIWebsocketTransport](/api-reference/server/services/transport/fastapi-websocket) | `FastAPIWebsocketParams` | `serializer`, `session_timeout` |
# WebSocket Transports
Source: https://docs.pipecat.ai/api-reference/server/services/transport/websocket-server
WebSocket transport implementations for real-time client-server communication
## Overview
WebSocket transports provide both client and server WebSocket implementations for real-time bidirectional communication. These transports support audio streaming, frame serialization, and connection management, making them ideal for prototyping and lightweight client-server applications where WebRTC might be overkill.
WebSocket transports are best suited for prototyping and controlled network environments.
For production client-server applications, we recommend WebRTC-based transports for more robust network handling, NAT traversal, and media optimization.
Client transport methods and configuration
Server transport methods and configuration
Complete client and server WebSocket examples
WebSocket protocol documentation and guides
## Installation
To use WebSocket transports, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[websocket]"
```
## Prerequisites
### WebSocket Application Setup
Before using WebSocket transports, you need:
1. **Server Implementation**: Set up WebSocket server using your preferred framework
2. **Client Implementation**: Configure WebSocket client for browser or application use
3. **Audio Configuration**: Set up audio streaming parameters and formats
4. **Connection Management**: Handle WebSocket lifecycle and error recovery
### Configuration Options
* **Transport Type**: Choose between client or server WebSocket transport
* **Audio Parameters**: Configure sample rates, channels, and audio formats
* **Frame Serialization**: Set up custom frame serializers if needed
* **Connection Handling**: Configure reconnection and error handling strategies
### Key Features
* **Bidirectional Communication**: Real-time audio and data streaming
* **Simple Protocol**: Lightweight WebSocket-based communication
* **Flexible Serialization**: Support for custom frame formats and audio codecs
* **Cross-Platform**: Works with any WebSocket-compatible client or server
## Configuration
### WebsocketServerTransport
Transport configuration parameters.
Host address to bind the WebSocket server to.
Port number to bind the WebSocket server to.
Optional name for the input transport processor.
Optional name for the output transport processor.
### WebsocketServerParams
Inherits from `TransportParams` with additional WebSocket-specific parameters.
Whether to add WAV headers to outgoing audio frames.
Frame serializer for encoding/decoding WebSocket messages.
Timeout in seconds for client sessions. When set, triggers
`on_session_timeout` if the session exceeds this duration. `None` disables the
timeout.
### WebsocketClientTransport
The WebSocket URI to connect to.
Client transport configuration parameters.
Optional name for the input transport processor.
Optional name for the output transport processor.
### WebsocketClientParams
Inherits from `TransportParams` with additional WebSocket-specific parameters.
Whether to add WAV headers to outgoing audio frames.
Optional additional HTTP headers to include in the WebSocket handshake.
Frame serializer for encoding/decoding WebSocket messages.
## Usage
WebSocket transports provide simple client-server communication for audio streaming and real-time interaction. They are ideal for prototyping and controlled network environments.
See the [complete examples](https://github.com/pipecat-ai/pipecat-examples/tree/main/websocket) for full implementations including:
* WebSocket server setup and configuration
* Client-side WebSocket integration
* Audio streaming and frame handling
* Connection management and error recovery
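As a starting point, here is a minimal server-side sketch. The import path and serializer choice are assumptions (module locations have moved between Pipecat releases), so adjust them to match your installed version:

```python theme={null}
# A minimal WebsocketServerTransport sketch.
# Import paths may differ between Pipecat versions; verify against yours.
from pipecat.serializers.protobuf import ProtobufFrameSerializer
from pipecat.transports.network.websocket_server import (
    WebsocketServerParams,
    WebsocketServerTransport,
)

transport = WebsocketServerTransport(
    host="0.0.0.0",
    port=8765,
    params=WebsocketServerParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        add_wav_header=True,                   # prepend WAV headers to outgoing audio
        serializer=ProtobufFrameSerializer(),  # frame encoding over the socket
        session_timeout=180,                   # seconds; triggers on_session_timeout
    ),
)
```

The client transport mirrors this setup, taking the WebSocket `uri` plus `WebsocketClientParams`.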
## Event Handlers
WebSocket transports provide event handlers for client connection lifecycle and session management. Register handlers using the `@event_handler` decorator on the transport instance.
### WebsocketServerTransport Events
#### Events Summary
| Event | Description |
| ------------------------ | ----------------------------------------------- |
| `on_client_connected` | Client WebSocket connected |
| `on_client_disconnected` | Client WebSocket disconnected |
| `on_session_timeout` | Session timed out |
| `on_websocket_ready` | WebSocket server is ready to accept connections |
#### on\_client\_connected
Fired when a client connects to the WebSocket server.
```python theme={null}
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, websocket):
print("Client connected")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | -------------------------- | ------------------------------- |
| `transport` | `WebsocketServerTransport` | The transport instance |
| `websocket` | `WebSocketServerProtocol` | The WebSocket connection object |
#### on\_client\_disconnected
Fired when a client disconnects from the WebSocket server.
```python theme={null}
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, websocket):
print("Client disconnected")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | -------------------------- | ------------------------------- |
| `transport` | `WebsocketServerTransport` | The transport instance |
| `websocket` | `WebSocketServerProtocol` | The WebSocket connection object |
#### on\_session\_timeout
Fired when a client session exceeds the configured `session_timeout` duration. Only fires if `session_timeout` is set in the params.
```python theme={null}
@transport.event_handler("on_session_timeout")
async def on_session_timeout(transport, websocket):
print("Session timed out")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | -------------------------- | ------------------------------- |
| `transport` | `WebsocketServerTransport` | The transport instance |
| `websocket` | `WebSocketServerProtocol` | The WebSocket connection object |
#### on\_websocket\_ready
Fired when the WebSocket server has started and is ready to accept client connections.
```python theme={null}
@transport.event_handler("on_websocket_ready")
async def on_websocket_ready(transport):
print("WebSocket server ready")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | -------------------------- | ---------------------- |
| `transport` | `WebsocketServerTransport` | The transport instance |
### WebsocketClientTransport Events
#### Events Summary
| Event | Description |
| ----------------- | ---------------------------------- |
| `on_connected` | Connected to WebSocket server |
| `on_disconnected` | Disconnected from WebSocket server |
```python theme={null}
@transport.event_handler("on_connected")
async def on_connected(transport, websocket):
print("Connected to server")
@transport.event_handler("on_disconnected")
async def on_disconnected(transport, websocket):
print("Disconnected from server")
```
**Parameters:**
| Parameter | Type | Description |
| ----------- | -------------------------- | ------------------------------- |
| `transport` | `WebsocketClientTransport` | The transport instance |
| `websocket` | `WebSocketClientProtocol` | The WebSocket connection object |
The WebSocket server only supports one client connection at a time. If a new
client connects while one is already connected, the existing connection will
be closed.
## Additional Resources
* [Events Overview](/api-reference/server/events/overview) - Overview of all events in Pipecat
# WhatsAppTransport
Source: https://docs.pipecat.ai/api-reference/server/services/transport/whatsapp
Real-time voice communication through WhatsApp Business API using WebRTC
## Overview
WhatsAppTransport enables real-time voice conversations between WhatsApp users and your Pipecat bot through the WhatsApp Business API. When users call your WhatsApp Business number, the transport handles webhook events, establishes WebRTC connections, and manages the voice call lifecycle.
The transport integrates with Meta's WhatsApp Cloud API to receive incoming calls, process WebRTC signaling, and maintain call state throughout the conversation.
API methods for Pipecat's WhatsApp Cloud API integration
Pipecat's Client for handling webhooks and WebRTC connections
Complete WhatsApp voice bot example
Meta's official WhatsApp calling documentation
## Installation
To use WhatsAppTransport, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[webrtc]"
```
## Prerequisites
### WhatsApp Business API Setup
Before using WhatsAppTransport, you need:
1. **WhatsApp Business API Account**: Set up through [Meta Developer Console](https://developers.facebook.com/apps)
2. **Phone Number Configuration**: Enable voice calling capability for your business number
3. **Webhook Configuration**: Configure webhook endpoint to receive call events
4. **Access Tokens**: Generate API access token and phone number ID
For development, Meta provides free test phone numbers valid for 90 days.
Production use requires business verification.
### Required Environment Variables
* `WHATSAPP_TOKEN`: WhatsApp Business API access token
* `WHATSAPP_PHONE_NUMBER_ID`: Your business phone number ID
* `WHATSAPP_WEBHOOK_VERIFICATION_TOKEN`: Token for webhook verification
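For reference, a minimal sketch of Meta's webhook verification handshake: Meta sends a GET request with `hub.mode`, `hub.verify_token`, and `hub.challenge`, and expects the challenge echoed back when the token matches. The endpoint path and FastAPI wiring below are illustrative, not part of Pipecat's API:

```python theme={null}
# Illustrative webhook verification endpoint for the WhatsApp Cloud API.
import os

from fastapi import FastAPI, Query, Response

app = FastAPI()

@app.get("/webhook")
async def verify_webhook(
    hub_mode: str = Query(..., alias="hub.mode"),
    hub_verify_token: str = Query(..., alias="hub.verify_token"),
    hub_challenge: str = Query(..., alias="hub.challenge"),
):
    # Echo the challenge back only when the verification token matches.
    if hub_mode == "subscribe" and hub_verify_token == os.getenv(
        "WHATSAPP_WEBHOOK_VERIFICATION_TOKEN"
    ):
        return Response(content=hub_challenge, media_type="text/plain")
    return Response(status_code=403)
```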
## Key Features
* **Incoming Call Handling**: Automatically accepts WhatsApp voice calls
* **WebRTC Integration**: Establishes peer-to-peer audio connections
* **Webhook Processing**: Handles Meta's webhook events for call lifecycle
* **Call Management**: Supports call acceptance, rejection, and termination
* **Real-time Audio**: Bidirectional audio streaming for natural conversations
## Configuration
### WhatsAppClient
`WhatsAppClient` is the main class for handling WhatsApp call connections. It manages webhook processing, WebRTC connection establishment, and call lifecycle.
WhatsApp Business API access token for authentication.
WhatsApp Business phone number ID.
An aiohttp session for making async HTTP requests to the WhatsApp Cloud API.
List of ICE servers for WebRTC connections. If `None`, defaults to Google's
public STUN server (`stun:stun.l.google.com:19302`).
WhatsApp App Secret for validating that webhook requests came from WhatsApp.
When set, all incoming webhooks are verified using HMAC SHA-256 signature
validation.
## Usage
WhatsAppTransport requires a webhook server to handle incoming calls and a bot implementation to process the conversations. The transport works with `SmallWebRTCTransport` under the hood for WebRTC connectivity.
The `WhatsAppClient` handles the WhatsApp-specific webhook processing and call management, while the actual audio transport uses a `SmallWebRTCConnection` for each call.
See the [complete example](https://github.com/pipecat-ai/pipecat-examples/tree/main/whatsapp) for a full implementation including:
* FastAPI webhook server with verification endpoint
* WhatsApp client configuration
* Bot integration with Pipecat pipeline using `SmallWebRTCTransport`
* Environment setup and deployment
## Notes
* WhatsApp only supports SHA-256 fingerprints in SDP, so the client automatically filters out other fingerprint types.
* Each incoming call creates a new `SmallWebRTCConnection` that can be used with `SmallWebRTCTransport`.
* The client supports both the `pre_accept` and `accept` call flows required by the WhatsApp Cloud API.
* Use `terminate_all_calls()` during shutdown to cleanly end all ongoing calls.
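A minimal shutdown sketch, assuming `terminate_all_calls()` is awaitable and that a `WhatsAppClient` instance is created during startup; the lifespan wiring shown is illustrative:

```python theme={null}
# Illustrative FastAPI lifespan hook that ends ongoing calls on shutdown.
from contextlib import asynccontextmanager

from fastapi import FastAPI

whatsapp_client = None  # created during startup in a real application

@asynccontextmanager
async def lifespan(app: FastAPI):
    # ... create the WhatsAppClient and register webhook routes here ...
    yield
    # End any ongoing calls cleanly before the process exits.
    if whatsapp_client is not None:
        await whatsapp_client.terminate_all_calls()

app = FastAPI(lifespan=lifespan)
```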
## Additional Resources
* [Events Overview](/api-reference/server/events/overview) - Overview of all events in Pipecat
* [SmallWebRTCTransport](/api-reference/server/services/transport/small-webrtc) - The underlying transport used for WebRTC connectivity
* [WhatsApp Cloud API Calling Docs](https://developers.facebook.com/docs/whatsapp/cloud-api/calling/) - Official WhatsApp calling documentation
# Async
Source: https://docs.pipecat.ai/api-reference/server/services/tts/asyncai
Text-to-speech services using Async's WebSocket and HTTP APIs
## Overview
Async provides high-quality text-to-speech synthesis with two service implementations: `AsyncAITTSService` (WebSocket-based) for real-time streaming with interruption support, and `AsyncAIHttpTTSService` (HTTP-based) for simpler synthesis. `AsyncAITTSService` is recommended for interactive applications requiring low latency.
Pipecat's API methods for AsyncAI TTS integration
Complete example with WebSocket streaming
Official Async API documentation
Explore available voice models and features
## Installation
To use Async services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[asyncai]"
```
## Prerequisites
### Async Account Setup
Before using Async TTS services, you need:
1. **Async Account**: Sign up at [Async](https://async.ai)
2. **API Key**: Generate an API key from your account dashboard
3. **Voice Selection**: Choose from available voice models
### Required Environment Variables
* `ASYNCAI_API_KEY`: Your Async API key for authentication
## Configuration
### AsyncAITTSService
Async API key.
UUID of the voice to use for synthesis. *Deprecated in v0.0.105. Use
`settings=AsyncAITTSService.Settings(voice=...)` instead.*
TTS model to use. *Deprecated in v0.0.105. Use
`settings=AsyncAITTSService.Settings(model=...)` instead.*
Async API version.
WebSocket endpoint URL.
Output audio sample rate in Hz. When `None`, uses the pipeline's configured
sample rate.
Audio encoding format.
Audio container format.
Controls how incoming text is aggregated before synthesis. `SENTENCE`
(default) buffers text until sentence boundaries, producing more natural
speech. `TOKEN` streams tokens directly for lower latency. Import from
`pipecat.services.tts_service`.
*Deprecated in v0.0.104.* Use `text_aggregation_mode` instead.
*Deprecated in v0.0.105. Use `settings=AsyncAITTSService.Settings(...)`
instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### AsyncAIHttpTTSService
The HTTP service accepts similar parameters to the WebSocket service, with these differences:
An aiohttp session for HTTP requests. You must create and manage this
yourself.
HTTP API base URL (instead of WebSocket URL).
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `AsyncAITTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------- | ----------------- | ------- | -------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.asyncai import AsyncAITTSService
tts = AsyncAITTSService(
api_key=os.getenv("ASYNCAI_API_KEY"),
settings=AsyncAITTSService.Settings(
voice="your-voice-uuid",
),
)
```
### With Language Customization
```python theme={null}
from pipecat.transcriptions.language import Language
tts = AsyncAITTSService(
api_key=os.getenv("ASYNCAI_API_KEY"),
settings=AsyncAITTSService.Settings(
voice="your-voice-uuid",
model="async_flash_v1.0",
language=Language.ES,
),
)
```
### HTTP Service
```python theme={null}
import aiohttp
from pipecat.services.asyncai import AsyncAIHttpTTSService
async with aiohttp.ClientSession() as session:
tts = AsyncAIHttpTTSService(
api_key=os.getenv("ASYNCAI_API_KEY"),
settings=AsyncAIHttpTTSService.Settings(
voice="your-voice-uuid",
),
aiohttp_session=session,
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Event Handlers
Async TTS supports the standard [service connection events](/api-reference/server/events/service-events):
| Event | Description |
| --------------------- | ----------------------------------- |
| `on_connected` | Connected to Async WebSocket |
| `on_disconnected` | Disconnected from Async WebSocket |
| `on_connection_error` | WebSocket connection error occurred |
```python theme={null}
@tts.event_handler("on_connected")
async def on_connected(service):
print("Connected to Async")
```
# AWS Polly
Source: https://docs.pipecat.ai/api-reference/server/services/tts/aws
Text-to-speech service using Amazon Polly
## Overview
`AWSPollyTTSService` provides high-quality text-to-speech synthesis through Amazon Polly with support for standard, neural, and generative engines. The service offers extensive language support, SSML features, and voice customization options including prosody controls for pitch, rate, and volume.
Pipecat's API methods for AWS Polly integration
Complete example with generative engine
Official AWS Polly documentation and features
Browse available voices and languages
## Installation
To use AWS Polly services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[aws]"
```
## Prerequisites
### AWS Account Setup
Before using AWS Polly TTS services, you need:
1. **AWS Account**: Sign up at [AWS Console](https://aws.amazon.com/)
2. **IAM User**: Create an IAM user with Polly permissions
3. **Access Keys**: Generate access key ID and secret access key
4. **Voice Selection**: Choose from available voices in the [voice list](https://docs.aws.amazon.com/polly/latest/dg/voicelist.html)
### Required Environment Variables
* `AWS_ACCESS_KEY_ID`: Your AWS access key ID
* `AWS_SECRET_ACCESS_KEY`: Your AWS secret access key
* `AWS_SESSION_TOKEN`: Session token (if using temporary credentials)
* `AWS_REGION`: AWS region (defaults to "us-east-1")
## Configuration
### AWSPollyTTSService
AWS secret access key. If `None`, uses the `AWS_SECRET_ACCESS_KEY` environment
variable.
AWS access key ID. If `None`, uses the `AWS_ACCESS_KEY_ID` environment
variable.
AWS session token for temporary credentials.
AWS region for Polly service. Defaults to `us-east-1` if not set via
environment variable.
Voice ID to use for synthesis. *Deprecated in v0.0.105. Use
`settings=AWSPollyTTSService.Settings(voice=...)` instead.*
Output audio sample rate in Hz. When `None`, uses the pipeline's configured
sample rate. AWS Polly internally synthesizes at 16kHz and resamples to the
target rate.
*Deprecated in v0.0.105. Use `settings=AWSPollyTTSService.Settings(...)`
instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `AWSPollyTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| --------------- | ----------------- | ----------- | ----------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
| `engine` | `str` | `NOT_GIVEN` | Engine type (e.g., "neural", "standard"). |
| `pitch` | `str` | `NOT_GIVEN` | Pitch adjustment for SSML. |
| `rate` | `str` | `NOT_GIVEN` | Speaking rate for SSML. |
| `volume` | `str` | `NOT_GIVEN` | Volume for SSML. |
| `lexicon_names` | `List[str]` | `NOT_GIVEN` | List of lexicon names for pronunciation. |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.aws import AWSPollyTTSService
tts = AWSPollyTTSService(
settings=AWSPollyTTSService.Settings(
voice="Joanna",
),
)
```
### With Voice Customization
```python theme={null}
from pipecat.transcriptions.language import Language
tts = AWSPollyTTSService(
api_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
region="us-east-1",
settings=AWSPollyTTSService.Settings(
voice="Matthew",
engine="neural",
language=Language.EN_US,
rate="110%",
volume="loud",
),
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **Engine selection**: AWS Polly supports `"standard"`, `"neural"`, and `"generative"` engines. Not all voices support all engines. Check the [AWS voice list](https://docs.aws.amazon.com/polly/latest/dg/voicelist.html) for compatibility.
* **Pitch control**: The `pitch` parameter only works with the `"standard"` engine. Neural and generative engines ignore it.
* **Audio resampling**: Polly synthesizes PCM at 16kHz internally. The service automatically resamples to match your pipeline's sample rate.
# Azure
Source: https://docs.pipecat.ai/api-reference/server/services/tts/azure
Text-to-speech service using Azure Cognitive Services Speech SDK
## Overview
Azure Cognitive Services provides high-quality text-to-speech synthesis with two service implementations: `AzureTTSService` (WebSocket-based) for real-time streaming with low latency, and `AzureHttpTTSService` (HTTP-based) for batch synthesis. `AzureTTSService` is recommended for interactive applications requiring streaming capabilities.
Pipecat's API methods for Azure TTS integration
Complete example with streaming synthesis
Official Azure Speech Services documentation
Browse available voices and languages
## Installation
To use Azure services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[azure]"
```
## Prerequisites
### Azure Account Setup
Before using Azure TTS services, you need:
1. **Azure Account**: Sign up at [Azure Portal](https://portal.azure.com/)
2. **Speech Service**: Create a Speech resource in your Azure subscription
3. **API Key and Region**: Get your subscription key and service region
4. **Voice Selection**: Choose from available voices in the [Voice Gallery](https://speech.microsoft.com/portal/voicegallery)
### Required Environment Variables
* `AZURE_SPEECH_API_KEY`: Your Azure Speech service API key
* `AZURE_SPEECH_REGION`: Your Azure Speech service region (e.g., "eastus")
## Configuration
### AzureTTSService
Azure Cognitive Services subscription key.
Azure region identifier (e.g., `"eastus"`, `"westus2"`).
Voice name to use for synthesis. *Deprecated in v0.0.105. Use
`settings=AzureTTSService.Settings(voice=...)` instead.*
Output audio sample rate in Hz. When `None`, uses the pipeline's configured
sample rate.
Controls how incoming text is aggregated before synthesis. `SENTENCE`
(default) buffers text until sentence boundaries, producing more natural
speech. `TOKEN` streams tokens directly for lower latency. Import from
`pipecat.services.tts_service`.
*Deprecated in v0.0.104.* Use `text_aggregation_mode` instead.
*Deprecated in v0.0.105. Use `settings=AzureTTSService.Settings(...)`
instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### AzureHttpTTSService
The HTTP service accepts the same parameters as the streaming service except `text_aggregation_mode` and `aggregate_sentences`:
Azure Cognitive Services subscription key.
Azure region identifier.
Voice name to use for synthesis. *Deprecated in v0.0.105. Use
`settings=AzureHttpTTSService.Settings(voice=...)` instead.*
Output audio sample rate in Hz.
*Deprecated in v0.0.105. Use `settings=AzureHttpTTSService.Settings(...)`
instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `AzureTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| -------------- | ----------------- | ----------- | -------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
| `emphasis` | `str` | `NOT_GIVEN` | Emphasis level for SSML. |
| `pitch` | `str` | `NOT_GIVEN` | Pitch adjustment. |
| `rate` | `str` | `NOT_GIVEN` | Speaking rate. |
| `role` | `str` | `NOT_GIVEN` | Role for SSML. |
| `style` | `str` | `NOT_GIVEN` | Speaking style. |
| `style_degree` | `str` | `NOT_GIVEN` | Degree of the speaking style. |
| `volume` | `str` | `NOT_GIVEN` | Volume level. |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.azure import AzureTTSService
tts = AzureTTSService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region=os.getenv("AZURE_SPEECH_REGION"),
settings=AzureTTSService.Settings(
voice="en-US-SaraNeural",
),
)
```
### With Voice Customization
```python theme={null}
from pipecat.transcriptions.language import Language
tts = AzureTTSService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region="eastus",
settings=AzureTTSService.Settings(
voice="en-US-JennyMultilingualNeural",
language=Language.EN_US,
style="cheerful",
style_degree="1.5",
rate="1.1",
),
)
```
### HTTP Service
```python theme={null}
from pipecat.services.azure import AzureHttpTTSService
tts = AzureHttpTTSService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region=os.getenv("AZURE_SPEECH_REGION"),
voice="en-US-SaraNeural",
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **Streaming vs HTTP**: The streaming service (`AzureTTSService`) provides word-level timestamps and lower latency, making it better for interactive conversations. The HTTP service is simpler but returns the complete audio at once.
* **SSML support**: Both services automatically construct SSML from the `Settings`. Special characters in text are automatically escaped.
* **Word timestamps**: `AzureTTSService` supports word-level timestamps for synchronized text display. CJK languages receive special handling to merge individual characters into meaningful word units.
* **8kHz workaround**: At 8kHz sample rates, Azure's reported audio duration may not match word boundary offsets. The service uses word boundary offsets for timing in this case.
# Camb AI
Source: https://docs.pipecat.ai/api-reference/server/services/tts/camb
Text-to-speech service using Camb AI's MARS models for high-quality speech synthesis
## Overview
`CambTTSService` provides high-quality text-to-speech synthesis using Camb AI's MARS model family with streaming capabilities. The service offers multiple model options optimized for different use cases: `mars-flash` for fast inference and `mars-pro` for high-quality output.
Pipecat's API methods for Camb AI TTS integration
Complete example with interruption handling
Official Camb AI API documentation
Access the Camb AI studio platform
## Installation
To use Camb AI services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[camb]"
```
## Prerequisites
### Camb AI Account Setup
Before using Camb AI TTS services, you need:
1. **Camb AI Account**: Sign up at [Camb AI](https://studio.camb.ai/)
2. **API Key**: Generate an API key from your account dashboard
3. **Voice Selection**: Choose voice IDs from the platform
### Required Environment Variables
* `CAMB_API_KEY`: Your Camb AI API key for authentication
## Configuration
### CambTTSService
Camb.ai API key for authentication.
Voice ID to use for synthesis.
*Deprecated in v0.0.105. Use `settings=CambTTSService.Settings(...)` instead.*
TTS model to use. Options: `"mars-flash"` (fast, 22.05kHz), `"mars-pro"` (high
quality, 48kHz), `"mars-instruct"` (instruction-following, 22.05kHz).
*Deprecated in v0.0.105. Use `settings=CambTTSService.Settings(...)` instead.*
Request timeout in seconds. 60 seconds is the minimum recommended by Camb.ai.
Output audio sample rate in Hz. If `None`, uses the model-specific default
(22050 for mars-flash, 48000 for mars-pro).
Runtime-configurable voice settings (legacy `InputParams`).
*Deprecated in v0.0.105. Use `settings=CambTTSService.Settings(...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `CambTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ------------------- | ----------------- | ----------- | -------------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `int` | `None` | Voice identifier. |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
| `user_instructions` | `str` | `NOT_GIVEN` | Instructions to guide voice synthesis style. |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.camb import CambTTSService
tts = CambTTSService(
api_key=os.getenv("CAMB_API_KEY"),
settings=CambTTSService.Settings(
model="mars-flash",
voice=12345,
),
)
```
### With Language Customization
```python theme={null}
from pipecat.transcriptions.language import Language
tts = CambTTSService(
api_key=os.getenv("CAMB_API_KEY"),
settings=CambTTSService.Settings(
model="mars-flash",
language=Language.FR,
),
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
Set the `audio_out_sample_rate` in `PipelineParams` to match the model's
sample rate (22050 for `mars-flash`, 48000 for `mars-pro`) for optimal
quality. See the [example
implementation](https://github.com/pipecat-ai/pipecat/blob/main/examples/voice/voice-camb.py)
for usage.
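For example, a minimal sketch for `mars-flash` at 22.05 kHz; the surrounding `pipeline` object is assumed:

```python theme={null}
from pipecat.pipeline.task import PipelineParams, PipelineTask

task = PipelineTask(
    pipeline,
    params=PipelineParams(
        # Match the output rate to the Camb model in use:
        # 22050 for mars-flash, 48000 for mars-pro.
        audio_out_sample_rate=22050,
    ),
)
```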
* **Model-specific sample rates**: Each model has a fixed output sample rate. Setting a mismatched `sample_rate` will produce a warning and may cause audio issues.
* **Text length limit**: Input text is limited to 3000 characters per request. Longer text is automatically truncated with a warning.
* **140+ languages**: Camb.ai supports a wide range of languages through BCP-47 codes.
# Cartesia
Source: https://docs.pipecat.ai/api-reference/server/services/tts/cartesia
Text-to-speech services using Cartesia's WebSocket and HTTP APIs
## Overview
Cartesia provides high-quality text-to-speech synthesis with two service implementations: `CartesiaTTSService` (WebSocket-based) for real-time streaming with word timestamps, and `CartesiaHttpTTSService` (HTTP-based) for simpler batch synthesis. `CartesiaTTSService` is recommended for interactive applications requiring low latency and interruption handling.
Pipecat's API methods for Cartesia TTS integration
Complete example with interruption handling
Official Cartesia API documentation and features
Browse and test available voices
## Installation
To use Cartesia services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[cartesia]"
```
## Prerequisites
### Cartesia Account Setup
Before using Cartesia TTS services, you need:
1. **Cartesia Account**: Sign up at [Cartesia](https://play.cartesia.ai/sign-up)
2. **API Key**: Generate an API key from your account dashboard
3. **Voice Selection**: Choose voice IDs from the [voice library](https://play.cartesia.ai/)
### Required Environment Variables
* `CARTESIA_API_KEY`: Your Cartesia API key for authentication
## Configuration
### CartesiaTTSService
Cartesia API key for authentication.
ID of the voice to use for synthesis. *Deprecated in v0.0.105. Use
`settings=CartesiaTTSService.Settings(voice=...)` instead.*
TTS model to use. *Deprecated in v0.0.105. Use
`settings=CartesiaTTSService.Settings(model=...)` instead.*
API version string for Cartesia service.
WebSocket endpoint URL.
Output audio sample rate in Hz. When `None`, uses the pipeline's configured
sample rate.
Audio encoding format.
Audio container format.
Controls how incoming text is aggregated before synthesis. `SENTENCE`
(default) buffers text until sentence boundaries, producing more natural
speech. `TOKEN` streams tokens directly for lower latency. Import from
`pipecat.services.tts_service`.
*Deprecated in v0.0.104.* Use `text_aggregation_mode` instead.
*Deprecated in v0.0.105. Use `settings=CartesiaTTSService.Settings(...)`
instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### CartesiaHttpTTSService
The HTTP service accepts similar parameters to the WebSocket service, with these differences:
HTTP API base URL (instead of `url` for WebSocket).
API version for HTTP service.
Optional aiohttp ClientSession for HTTP requests. If not provided, a session
will be created and managed internally.
The HTTP service does not accept `text_aggregation_mode` or `aggregate_sentences`.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `CartesiaTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ----------------------- | ------------------ | ----------- | ------------------------------------------------------------- |
| `model` | `str` | `None` | TTS model identifier. *(Inherited from base settings.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited from base settings.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited from base settings.)* |
| `generation_config` | `GenerationConfig` | `NOT_GIVEN` | Generation configuration for Sonic-3 models. See below. |
| `pronunciation_dict_id` | `str` | `NOT_GIVEN` | ID of the pronunciation dictionary for custom pronunciations. |
#### GenerationConfig (Sonic-3)
Configuration for Sonic-3 generation parameters:
| Parameter | Type | Default | Description |
| --------- | ------- | ------- | ----------------------------------------------------------------------------------------------------- |
| `volume` | `float` | `None` | Volume multiplier. Valid range: \[0.5, 2.0]. |
| `speed` | `float` | `None` | Speed multiplier. Valid range: \[0.6, 1.5]. |
| `emotion` | `str` | `None` | Emotion string to guide tone (e.g., `"neutral"`, `"angry"`, `"excited"`). Over 60 emotions supported. |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.cartesia import CartesiaTTSService
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="your-voice-id",
),
)
```
### With Sonic-3 Generation Config
```python theme={null}
from pipecat.services.cartesia.tts import GenerationConfig
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="your-voice-id",
model="sonic-3",
generation_config=GenerationConfig(
speed=1.1,
emotion="excited",
),
),
)
```
### HTTP Service
```python theme={null}
from pipecat.services.cartesia import CartesiaHttpTTSService
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaHttpTTSService.Settings(
voice="your-voice-id",
),
)
```
## Customizing Speech
`CartesiaTTSService` provides a set of helper methods for implementing Cartesia-specific customizations, meant to be used as part of text transformers. These include methods for spelling out text, adjusting speech rate, and modifying pitch. See the [Text Transformers for TTS](/pipecat/learn/text-to-speech#text-transformers-for-tts) section in the Text-to-Speech guide for usage examples.
### SPELL(text: str) -> str:
A convenience method to wrap text in [Cartesia's spell tag](https://docs.cartesia.ai/build-with-cartesia/sonic-3/ssml-tags#spelling-out-numbers-and-letters) for spelling out text character by character.
```python theme={null}
# Text transformers for TTS
# This will insert Cartesia's spell tags around the provided text.
async def spell_out_text(text: str, type: str) -> str:
return CartesiaTTSService.SPELL(text)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
text_transforms=[
("phone_number", spell_out_text),
],
)
```
### EMOTION\_TAG(emotion: CartesiaEmotion) -> str:
A convenience method to create an [emotion tag](https://docs.cartesia.ai/build-with-cartesia/sonic-3/volume-speed-emotion#emotion-controls-beta) for expressing emotions in speech.
```python theme={null}
# Text transformers for TTS
# This will insert Cartesia's sarcasm tag in front of any sentence that is just "whatever".
async def maybe_insert_sarcasm(text: str, type: str) -> str:
if text.strip(".!").lower() == "whatever":
return CartesiaTTSService.EMOTION_TAG(CartesiaEmotion.SARCASM) + text + CartesiaTTSService.EMOTION_TAG(CartesiaEmotion.NEUTRAL)
return text
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
text_transforms=[
("sentence", maybe_insert_sarcasm),
],
)
```
### PAUSE\_TAG(seconds: float) -> str:
A convenience method to create Cartesia's [SSML tag for inserting pauses](https://docs.cartesia.ai/build-with-cartesia/sonic-3/ssml-tags#pauses-and-breaks) in speech.
```python theme={null}
# Text transformers for TTS
# This will insert a one second pause after questions.
async def pause_after_questions(text: str, type: str) -> str:
if text.endswith("?"):
return f"{text}{CartesiaTTSService.PAUSE_TAG(1.0)}"
return text
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
text_transforms=[
("sentence", pause_after_questions), # Only apply to sentence aggregations
],
)
```
### VOLUME\_TAG(volume: float) -> str:
A convenience method to create Cartesia's [SSML volume tag](https://docs.cartesia.ai/build-with-cartesia/sonic-3/ssml-tags#volume) for dynamically adjusting speech volume in situ.
```python theme={null}
# Text transformers for TTS
# This will increase the volume for any full text aggregation that is in all caps.
async def maybe_say_it_loud(text: str, type: str) -> str:
if text.upper() == text:
return f"{CartesiaTTSService.VOLUME_TAG(2.0)}{text}{CartesiaTTSService.VOLUME_TAG(1.0)}"
return text
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
text_transforms=[
("*", maybe_say_it_loud), # Apply to all text
],
)
```
### SPEED\_TAG(speed: float) -> str:
A convenience method to create Cartesia's [SSML speed tag](https://docs.cartesia.ai/build-with-cartesia/sonic-3/ssml-tags#speed) for dynamically adjusting the speech rate in situ.
```python theme={null}
# Text transformers for TTS
# This will make the word "slow" always be spoken more slowly.
async def slow_down_slow_words(text: str, type: str) -> str:
return text.replace(
"slow",
f"{CartesiaTTSService.SPEED_TAG(0.6)}slow{CartesiaTTSService.SPEED_TAG(1.0)}"
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
text_transforms=[
("*", slow_down_slow_words), # Apply to all text
],
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **WebSocket vs HTTP**: The WebSocket service supports word-level timestamps, audio context management, and interruption handling, making it better for interactive conversations. The HTTP service is simpler but lacks these features.
* **Text aggregation**: Sentence aggregation is enabled by default (`text_aggregation_mode=TextAggregationMode.SENTENCE`). Buffering until sentence boundaries produces more natural speech. Set `text_aggregation_mode=TextAggregationMode.TOKEN` to stream tokens directly for lower latency. Cartesia handles token streaming well.
* **Connection timeout**: Cartesia WebSocket connections time out after 5 minutes of inactivity (no keepalive mechanism is available). The service automatically reconnects when needed.
* **CJK language support**: For Chinese, Japanese, and Korean, the service combines individual characters from timestamp messages into meaningful word units.
## Event Handlers
Cartesia TTS supports the standard [service connection events](/api-reference/server/events/service-events):
| Event | Description |
| --------------------- | ------------------------------------ |
| `on_connected` | Connected to Cartesia WebSocket |
| `on_disconnected` | Disconnected from Cartesia WebSocket |
| `on_connection_error` | WebSocket connection error occurred |
```python theme={null}
@tts.event_handler("on_connected")
async def on_connected(service):
print("Connected to Cartesia")
```
# Deepgram
Source: https://docs.pipecat.ai/api-reference/server/services/tts/deepgram
Text-to-speech service implementations using Deepgram's Aura API
## Overview
Deepgram provides three TTS service implementations:
* `DeepgramTTSService` for real-time streaming synthesis using Deepgram's WebSocket API with support for interruptions and ultra-low latency
* `DeepgramHttpTTSService` for batch synthesis using Deepgram's HTTP API
* `DeepgramSageMakerTTSService` for real-time synthesis using Deepgram TTS models deployed on AWS SageMaker endpoints via HTTP/2 bidirectional streaming
Pipecat's API methods for Deepgram TTS integration
Complete example with Silero VAD
Complete example with Deepgram on SageMaker
Official Deepgram Aura TTS API documentation
Browse available Aura voice models
## Installation
To use Deepgram TTS services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[deepgram]"
```
For the SageMaker variant, install the SageMaker dependencies instead:
```bash theme={null}
pip install "pipecat-ai[sagemaker]"
```
## Prerequisites
### Deepgram Account Setup
Before using `DeepgramTTSService` or `DeepgramHttpTTSService`, you need:
1. **Deepgram Account**: Sign up at [Deepgram Console](https://console.deepgram.com/)
2. **API Key**: Generate an API key from your project dashboard
3. **Voice Selection**: Choose from available Aura voice models
### Required Environment Variables
* `DEEPGRAM_API_KEY`: Your Deepgram API key for authentication
### AWS SageMaker Setup
Before using `DeepgramSageMakerTTSService`, you need:
1. **AWS Account**: With credentials configured (via environment variables, AWS CLI, or instance metadata)
2. **SageMaker Endpoint**: A deployed SageMaker endpoint with a [Deepgram TTS model](https://developers.deepgram.com/docs/deploy-amazon-sagemaker)
3. **Voice Selection**: Choose from available Aura voice models
## Configuration
### DeepgramTTSService
Deepgram API key for authentication.
Voice model to use for synthesis. *Deprecated in v0.0.105. Use
`settings=DeepgramTTSService.Settings(voice=...)` instead.*
Runtime-configurable settings. See [DeepgramTTSService
Settings](#deepgramttsservice-settings) below.
WebSocket base URL for Deepgram API.
Output audio sample rate in Hz. When `None`, uses the pipeline's configured
sample rate.
Audio encoding format. Must be one of: `"linear16"`, `"mulaw"`, `"alaw"`.
#### DeepgramTTSService Settings
Runtime-configurable settings passed via the `settings` constructor argument using `DeepgramTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------- | ----------------- | ------- | -------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
### DeepgramHttpTTSService
Deepgram API key for authentication.
Voice model to use for synthesis. *Deprecated in v0.0.105. Use
`settings=DeepgramHttpTTSService.Settings(voice=...)` instead.*
Runtime-configurable settings. See [DeepgramHttpTTSService
Settings](#deepgramhttpttsservice-settings) below.
An aiohttp session for HTTP requests. You must create and manage this
yourself.
HTTP API base URL.
Output audio sample rate in Hz.
Audio encoding format.
#### DeepgramHttpTTSService Settings
Runtime-configurable settings passed via the `settings` constructor argument using `DeepgramHttpTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------- | ----------------- | ------- | -------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
### DeepgramSageMakerTTSService
Name of the SageMaker endpoint with Deepgram TTS model deployed.
AWS region where the SageMaker endpoint is deployed (e.g., `"us-east-2"`).
Voice model to use for synthesis. *Deprecated in v0.0.105. Use
`settings=DeepgramSageMakerTTSService.Settings(voice=...)` instead.*
Runtime-configurable settings. See [DeepgramSageMakerTTSService
Settings](#deepgramsagemakerttsservice-settings) below.
Output audio sample rate in Hz. When `None`, uses the pipeline's configured
sample rate.
Audio encoding format.
#### DeepgramSageMakerTTSService Settings
Runtime-configurable settings passed via the `settings` constructor argument using `DeepgramSageMakerTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------- | ----------------- | ------- | -------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.deepgram import DeepgramTTSService
tts = DeepgramTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
settings=DeepgramTTSService.Settings(
voice="aura-2-helena-en",
),
)
```
### HTTP Service
```python theme={null}
import aiohttp
from pipecat.services.deepgram import DeepgramHttpTTSService
async with aiohttp.ClientSession() as session:
tts = DeepgramHttpTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
settings=DeepgramHttpTTSService.Settings(
voice="aura-2-helena-en",
),
aiohttp_session=session,
)
```
### SageMaker Service
```python theme={null}
from pipecat.services.deepgram.sagemaker.tts import DeepgramSageMakerTTSService
tts = DeepgramSageMakerTTSService(
endpoint_name=os.getenv("SAGEMAKER_TTS_ENDPOINT_NAME"),
region=os.getenv("AWS_REGION"),
settings=DeepgramSageMakerTTSService.Settings(
voice="aura-2-helena-en",
),
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **WebSocket vs HTTP vs SageMaker**: The WebSocket service (`DeepgramTTSService`) and SageMaker service (`DeepgramSageMakerTTSService`) both support real-time streaming with interruption handling, making them suitable for interactive conversations. The HTTP service (`DeepgramHttpTTSService`) is simpler but processes each request as a batch.
* **Encoding validation**: The WebSocket service validates the `encoding` parameter at initialization and raises a `ValueError` for unsupported formats.
* **SageMaker deployment**: The SageMaker service requires a Deepgram TTS model deployed to an AWS SageMaker endpoint. See the [Deepgram SageMaker deployment guide](https://developers.deepgram.com/docs/deploy-amazon-sagemaker) for setup instructions.
## Event Handlers
The WebSocket and SageMaker services support the standard [service connection events](/api-reference/server/events/service-events):
| Event | Description |
| --------------------- | ---------------------------------------------- |
| `on_connected` | Connected to Deepgram (WebSocket or SageMaker) |
| `on_disconnected` | Disconnected from Deepgram |
| `on_connection_error` | Connection error occurred |
```python theme={null}
@tts.event_handler("on_connected")
async def on_connected(service):
print("Connected to Deepgram")
```
# ElevenLabs
Source: https://docs.pipecat.ai/api-reference/server/services/tts/elevenlabs
Text-to-speech service using ElevenLabs' streaming API with word-level timing
## Overview
ElevenLabs provides high-quality text-to-speech synthesis with two service implementations:
* **`ElevenLabsTTSService`** (WebSocket) — Real-time streaming with word-level timestamps, audio context management, and interruption handling. Recommended for interactive applications.
* **`ElevenLabsHttpTTSService`** (HTTP) — Simpler batch-style synthesis. Suitable for non-interactive use cases or when WebSocket connections are not possible.
Complete API reference for all parameters and methods
Complete example with WebSocket streaming
Official ElevenLabs TTS API documentation
Browse and clone voices from the community
## Installation
```bash theme={null}
pip install "pipecat-ai[elevenlabs]"
```
## Prerequisites
1. **ElevenLabs Account**: Sign up at [ElevenLabs](https://elevenlabs.io/app/sign-up)
2. **API Key**: Generate an API key from your account dashboard
3. **Voice Selection**: Choose voice IDs from the [voice library](https://elevenlabs.io/voice-library)
Set the following environment variable:
```bash theme={null}
export ELEVENLABS_API_KEY=your_api_key
```
## Configuration
### ElevenLabsTTSService
ElevenLabs API key.
Voice ID from the [voice library](https://elevenlabs.io/voice-library).
*Deprecated in v0.0.105. Use
`settings=ElevenLabsTTSService.Settings(voice=...)` instead.*
ElevenLabs model ID. Use a `multilingual` model variant (e.g.
`eleven_multilingual_v2`) if you need non-English language support.
*Deprecated in v0.0.105. Use
`settings=ElevenLabsTTSService.Settings(model=...)` instead.*
WebSocket endpoint URL. Override for custom or proxied deployments.
Output audio sample rate in Hz. When `None`, uses the pipeline's configured
sample rate.
Controls how incoming text is aggregated before synthesis. `SENTENCE`
(default) buffers text until sentence boundaries, producing more natural
speech. `TOKEN` streams tokens directly for lower latency. Import from
`pipecat.services.tts_service`.
*Deprecated in v0.0.104.* Use `text_aggregation_mode` instead.
*Deprecated in v0.0.105. Use `settings=ElevenLabsTTSService.Settings(...)`
instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### ElevenLabsHttpTTSService
The HTTP service accepts the same parameters as the WebSocket service, with these differences:
An aiohttp session for HTTP requests. You must create and manage this
yourself.
HTTP API base URL (instead of `url` for WebSocket).
The HTTP service uses `ElevenLabsHttpTTSSettings` which also includes:
Latency optimization level (0–4). Higher values reduce latency at the cost of
quality.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `ElevenLabsTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| -------------------------- | ----------------- | ----------- | ------------------------------------------------------------------------------------------------- |
| `model` | `str` | `None` | ElevenLabs model identifier. *(Inherited from base settings.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited from base settings.)* |
| `language` | `Language \| str` | `None` | Language code. Only effective with multilingual models. *(Inherited from base settings.)* |
| `stability` | `float` | `NOT_GIVEN` | Voice consistency (0.0–1.0). Lower values are more expressive, higher values are more consistent. |
| `similarity_boost` | `float` | `NOT_GIVEN` | Voice clarity and similarity to the original (0.0–1.0). |
| `style` | `float` | `NOT_GIVEN` | Style exaggeration (0.0–1.0). Higher values amplify the voice's style. |
| `use_speaker_boost` | `bool` | `NOT_GIVEN` | Enhance clarity and target speaker similarity. |
| `speed` | `float` | `NOT_GIVEN` | Speech rate. WebSocket: 0.7–1.2. HTTP: 0.25–4.0. |
| `apply_text_normalization` | `Literal` | `NOT_GIVEN` | Text normalization: `"auto"`, `"on"`, or `"off"`. |
`NOT_GIVEN` values use the ElevenLabs API defaults. See [ElevenLabs voice
settings](https://elevenlabs.io/docs/api-reference/text-to-speech/v-1-text-to-speech-voice-id-multi-stream-input)
for details on how these parameters interact.
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.elevenlabs import ElevenLabsTTSService
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY"),
settings=ElevenLabsTTSService.Settings(
voice="21m00Tcm4TlvDq8ikWAM", # Rachel
),
)
```
### With Voice Customization
```python theme={null}
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY"),
settings=ElevenLabsTTSService.Settings(
voice="21m00Tcm4TlvDq8ikWAM",
model="eleven_multilingual_v2",
language=Language.ES,
stability=0.7,
similarity_boost=0.8,
speed=1.1,
),
)
```
### Updating Settings at Runtime
Voice settings can be changed mid-conversation using `TTSUpdateSettingsFrame`:
```python theme={null}
from pipecat.frames.frames import TTSUpdateSettingsFrame
from pipecat.services.elevenlabs.tts import ElevenLabsTTSSettings
await task.queue_frame(
TTSUpdateSettingsFrame(
delta=ElevenLabsTTSSettings(
stability=0.3,
speed=1.1,
)
)
)
```
### HTTP Service
```python theme={null}
import aiohttp
from pipecat.services.elevenlabs import ElevenLabsHttpTTSService
async with aiohttp.ClientSession() as session:
tts = ElevenLabsHttpTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY"),
settings=ElevenLabsHttpTTSService.Settings(
voice="21m00Tcm4TlvDq8ikWAM",
),
aiohttp_session=session,
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **Multilingual models required for `language`**: Setting `language` with a non-multilingual model (e.g. `eleven_turbo_v2_5`) has no effect. Use `eleven_multilingual_v2` or similar.
* **WebSocket vs HTTP**: The WebSocket service supports word-level timestamps and interruption handling, making it significantly better for interactive conversations. The HTTP service is simpler but lacks these features.
* **Text aggregation**: Sentence aggregation is enabled by default (`text_aggregation_mode=TextAggregationMode.SENTENCE`). Buffering until sentence boundaries produces more natural speech. Set `text_aggregation_mode=TextAggregationMode.TOKEN` to stream tokens directly for lower latency, but you must also set `auto_mode=False` in `settings` when using TOKEN mode.
## Event Handlers
ElevenLabs TTS supports the standard [service connection events](/api-reference/server/events/service-events):
| Event | Description |
| --------------------- | -------------------------------------- |
| `on_connected` | Connected to ElevenLabs WebSocket |
| `on_disconnected` | Disconnected from ElevenLabs WebSocket |
| `on_connection_error` | WebSocket connection error occurred |
```python theme={null}
@tts.event_handler("on_connected")
async def on_connected(service):
print("Connected to ElevenLabs")
```
# Fish Audio
Source: https://docs.pipecat.ai/api-reference/server/services/tts/fish
Real-time text-to-speech service using Fish Audio's WebSocket API
## Overview
`FishAudioTTSService` provides real-time text-to-speech synthesis through Fish Audio's WebSocket-based streaming API. The service offers custom voice models, prosody controls, and multiple audio formats optimized for conversational AI applications with low latency.
Pipecat's API methods for Fish Audio TTS integration
Complete example with custom voice model
Official Fish Audio documentation
Create and manage custom voice models
## Installation
To use Fish Audio services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[fish]"
```
## Prerequisites
### Fish Audio Account Setup
Before using Fish Audio TTS services, you need:
1. **Fish Audio Account**: Sign up at [Fish Audio Console](https://console.fish.audio/)
2. **API Key**: Generate an API key from your account dashboard
3. **Voice Models**: Create or select custom voice models for synthesis
### Required Environment Variables
* `FISH_API_KEY`: Your Fish Audio API key for authentication
## Configuration
### FishAudioTTSService
Fish Audio API key for authentication.
Reference ID of the voice model to use for synthesis. *Deprecated in v0.0.105.
Use `settings=FishAudioTTSService.Settings(voice=...)` instead.*
Fish Audio TTS model to use.
*Deprecated in v0.0.105. Use `settings=FishAudioTTSService.Settings(...)` instead.*
Audio output format. Options: `"pcm"`, `"opus"`, `"mp3"`, `"wav"`.
Output audio sample rate in Hz. When `None`, uses the pipeline's configured
sample rate.
Runtime-configurable voice settings. See [InputParams](#inputparams) below.
*Deprecated in v0.0.105. Use `settings=FishAudioTTSService.Settings(...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `FishAudioTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------------- | ----------------- | ----------- | -------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
| `latency` | `str` | `NOT_GIVEN` | Latency mode setting. |
| `normalize` | `bool` | `NOT_GIVEN` | Whether to normalize audio. |
| `temperature` | `float` | `NOT_GIVEN` | Temperature for sampling. |
| `top_p` | `float` | `NOT_GIVEN` | Top-p sampling parameter. |
| `prosody_speed` | `float` | `NOT_GIVEN` | Prosody speed control. |
| `prosody_volume` | `int` | `NOT_GIVEN` | Prosody volume control. |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.fish import FishAudioTTSService
tts = FishAudioTTSService(
api_key=os.getenv("FISH_API_KEY"),
settings=FishAudioTTSService.Settings(
voice="your-voice-reference-id",
),
)
```
### With Prosody Controls
```python theme={null}
tts = FishAudioTTSService(
api_key=os.getenv("FISH_API_KEY"),
settings=FishAudioTTSService.Settings(
voice="your-voice-reference-id",
model="s2-pro",
prosody_speed=1.2,
prosody_volume=3,
latency="balanced",
),
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **`voice` required**: You must specify a voice, either via `voice` (preferred) or via the deprecated `model` or `reference_id` parameters; passing both the new and deprecated forms raises a `ValueError`.
* **Model switching**: Changing the model via `set_model()` automatically disconnects and reconnects the WebSocket with the new model configuration.
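As noted above, switching models triggers a reconnect. A minimal sketch of a mid-conversation update via `TTSUpdateSettingsFrame`, assuming the same `FishAudioTTSService.Settings` class is accepted as the `delta`:
```python theme={null}
from pipecat.frames.frames import TTSUpdateSettingsFrame
from pipecat.services.fish import FishAudioTTSService

# Sketch: switch the model (reconnects the WebSocket) and slow prosody slightly.
await task.queue_frame(
    TTSUpdateSettingsFrame(
        delta=FishAudioTTSService.Settings(
            model="s2-pro",
            prosody_speed=0.9,
        )
    )
)
```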
## Event Handlers
Fish Audio TTS supports the standard [service connection events](/api-reference/server/events/service-events):
| Event | Description |
| --------------------- | -------------------------------------- |
| `on_connected` | Connected to Fish Audio WebSocket |
| `on_disconnected` | Disconnected from Fish Audio WebSocket |
| `on_connection_error` | WebSocket connection error occurred |
```python theme={null}
@tts.event_handler("on_connected")
async def on_connected(service):
print("Connected to Fish Audio")
```
# Google
Source: https://docs.pipecat.ai/api-reference/server/services/tts/google
Text-to-speech service using Google's Cloud Text-to-Speech API
## Overview
Google Cloud Text-to-Speech provides high-quality speech synthesis with two service implementations: `GoogleTTSService` (WebSocket-based) for streaming with the lowest latency, and `GoogleHttpTTSService` (HTTP-based) for simpler integration. `GoogleTTSService` is recommended for real-time applications.
Pipecat's API methods for Google Cloud TTS integration
Complete example with Chirp 3 HD voice
Official Google Cloud Text-to-Speech documentation
Browse available voices and languages
## Installation
To use Google services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[google]"
```
## Prerequisites
### Google Cloud Setup
Before using Google Cloud TTS services, you need:
1. **Google Cloud Account**: Sign up at [Google Cloud Console](https://console.cloud.google.com/)
2. **Project Setup**: Create a project and enable the Text-to-Speech API
3. **Service Account**: Create a service account with TTS permissions
4. **Authentication**: Set up credentials via service account key or Application Default Credentials
### Required Environment Variables
* `GOOGLE_APPLICATION_CREDENTIALS`: Path to your service account key file (recommended)
* Or use Application Default Credentials for cloud deployments
## Configuration
### GoogleTTSService
Streaming service optimized for Chirp 3 HD and Journey voices.
JSON string containing Google Cloud service account credentials.
Path to Google Cloud service account JSON file.
Google Cloud location for regional endpoint (e.g., `"us-central1"`).
Google TTS voice identifier. *Deprecated in v0.0.105. Use
`settings=GoogleTTSService.Settings(voice=...)` instead.*
Voice cloning key for Chirp 3 custom voices.
Output audio sample rate in Hz. When `None`, uses the pipeline's configured
sample rate.
*Deprecated in v0.0.105. Use `settings=GoogleTTSService.Settings(...)`
instead.*
Runtime-configurable settings. See [GoogleTTSService
Settings](#googlettsservice-settings) below.
#### GoogleTTSService Settings
Runtime-configurable settings passed via the `settings` constructor argument using `GoogleTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| --------------- | ----------------- | ----------- | ---------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
| `speaking_rate` | `float` | `NOT_GIVEN` | Speaking rate in the range \[0.25, 2.0]. |
### GoogleHttpTTSService
HTTP service with full SSML support for all voice types.
JSON string containing Google Cloud service account credentials.
Path to Google Cloud service account JSON file.
Google Cloud location for regional endpoint.
Google TTS voice identifier. *Deprecated in v0.0.105. Use
`settings=GoogleHttpTTSService.Settings(voice=...)` instead.*
Output audio sample rate in Hz.
*Deprecated in v0.0.105. Use `settings=GoogleHttpTTSService.Settings(...)`
instead.*
Runtime-configurable settings. See [GoogleHttpTTSService
Settings](#googlehttpttsservice-settings) below.
#### GoogleHttpTTSService Settings
Runtime-configurable settings passed via the `settings` constructor argument using `GoogleHttpTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| --------------- | ----------------- | ----------- | -------------------------------------------------------------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
| `pitch` | `str` | `NOT_GIVEN` | Voice pitch adjustment (e.g., `"+2st"`, `"-50%"`). |
| `rate` | `str` | `NOT_GIVEN` | Speaking rate for SSML prosody (non-Chirp voices, e.g., `"slow"`, `"fast"`, `"125%"`). |
| `speaking_rate` | `float` | `NOT_GIVEN` | Speaking rate for AudioConfig (Chirp/Journey voices). Range \[0.25, 2.0]. |
| `volume` | `str` | `NOT_GIVEN` | Volume adjustment (e.g., `"loud"`, `"soft"`, `"+6dB"`). |
| `emphasis` | `Literal` | `NOT_GIVEN` | Emphasis level: `"strong"`, `"moderate"`, `"reduced"`, `"none"`. |
| `gender` | `Literal` | `NOT_GIVEN` | Voice gender preference: `"male"`, `"female"`, `"neutral"`. |
| `google_style` | `Literal` | `NOT_GIVEN` | Google-specific voice style: `"apologetic"`, `"calm"`, `"empathetic"`, `"firm"`, `"lively"`. |
### GeminiTTSService
Streaming service using Gemini's TTS-specific models with natural voice control, prompts for style instructions, and multi-speaker support.
Gemini TTS model to use. Options: `"gemini-2.5-flash-tts"`,
`"gemini-2.5-pro-tts"`. *Deprecated in v0.0.105. Use
`settings=GeminiTTSService.Settings(model=...)` instead.*
JSON string containing Google Cloud service account credentials.
Path to Google Cloud service account JSON file.
Google Cloud location for regional endpoint.
Voice name from available Gemini voices (e.g., `"Kore"`, `"Charon"`, `"Puck"`,
`"Zephyr"`). *Deprecated in v0.0.105. Use
`settings=GeminiTTSService.Settings(voice=...)` instead.*
Output audio sample rate in Hz. Google TTS outputs at 24kHz; mismatched rates
will produce a warning.
*Deprecated in v0.0.105. Use `settings=GeminiTTSService.Settings(...)`
instead.*
Runtime-configurable settings. See [GeminiTTSService
Settings](#geminittsservice-settings) below.
#### GeminiTTSService Settings
Runtime-configurable settings passed via the `settings` constructor argument using `GeminiTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ----------------- | ----------------- | ----------- | ----------------------------------------------------------------------------------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
| `prompt` | `str` | `NOT_GIVEN` | Style instructions for how to synthesize the content. |
| `multi_speaker` | `bool` | `NOT_GIVEN` | Enable multi-speaker support. |
| `speaker_configs` | `list[dict]` | `NOT_GIVEN` | Speaker configurations for multi-speaker mode. Each dict should have `speaker_alias` and optionally `speaker_id`. |
## Usage
### Basic Setup (Streaming)
```python theme={null}
from pipecat.services.google import GoogleTTSService
tts = GoogleTTSService(
credentials_path="/path/to/service-account.json",
settings=GoogleTTSService.Settings(
voice="en-US-Chirp3-HD-Charon",
language=Language.EN_US,
)
)
```
### HTTP Service with SSML
```python theme={null}
from pipecat.services.google import GoogleHttpTTSService
from pipecat.transcriptions.language import Language
tts = GoogleHttpTTSService(
credentials_path="/path/to/service-account.json",
settings=GoogleHttpTTSService.Settings(
voice="en-US-Standard-A",
language=Language.EN_US,
rate="1.1",
pitch="+2st",
),
)
```
### Gemini TTS with Style Prompt
```python theme={null}
from pipecat.services.google import GeminiTTSService
from pipecat.transcriptions.language import Language
tts = GeminiTTSService(
credentials_path="/path/to/service-account.json",
settings=GeminiTTSService.Settings(
model="gemini-2.5-flash-tts",
voice="Kore",
language=Language.EN_US,
prompt="Say this in a friendly and helpful tone"
)
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **Streaming vs HTTP**: `GoogleTTSService` uses the streaming API for low latency and only supports Chirp 3 HD and Journey voices. `GoogleHttpTTSService` supports all Google voices including Standard and WaveNet, with full SSML support.
* **Chirp/Journey voices and SSML**: Chirp and Journey voices do not support SSML. The HTTP service automatically uses plain text input for these voices.
* **Speaking rate**: For Chirp and Journey voices, use `speaking_rate` (float, 0.25-2.0) in `settings`. For other voices, use `rate` (string) for SSML prosody control.
* **Gemini TTS sample rate**: Google TTS always outputs at 24kHz. Setting a different sample rate will produce a warning and may cause audio issues.
* **Gemini multi-speaker**: Use `multi_speaker=True` with `speaker_configs` to generate conversations between multiple voices. Mark up the text with speaker aliases to control which voice speaks, as sketched below.
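A hedged sketch of a multi-speaker setup; the speaker aliases and the use of Gemini voice names as `speaker_id` values are illustrative assumptions:
```python theme={null}
from pipecat.services.google import GeminiTTSService

tts = GeminiTTSService(
    credentials_path="/path/to/service-account.json",
    settings=GeminiTTSService.Settings(
        model="gemini-2.5-flash-tts",
        multi_speaker=True,
        # Aliases referenced in the text you send; voice assignments are assumptions.
        speaker_configs=[
            {"speaker_alias": "Host", "speaker_id": "Kore"},
            {"speaker_alias": "Guest", "speaker_id": "Puck"},
        ],
    ),
)
```
How speaker aliases appear in the synthesized text (for example, as `Host:` / `Guest:` prefixes) should be confirmed against the Gemini TTS documentation.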
# Gradium
Source: https://docs.pipecat.ai/api-reference/server/services/tts/gradium
Text-to-speech service using Gradium's low-latency streaming API
## Overview
`GradiumTTSService` provides high-quality text-to-speech synthesis using Gradium's WebSocket API with expressive voices, instant voice cloning, streaming inference for real-time applications, and multilingual support.
Pipecat's API methods for Gradium TTS integration
Complete example with streaming synthesis
Official Gradium TTS API documentation
Access API keys and voice library
## Installation
To use Gradium services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[gradium]"
```
## Prerequisites
### Gradium Account Setup
Before using Gradium TTS services, you need:
1. **Gradium Account**: Sign up at [Gradium](https://gradium.ai/)
2. **API Key**: Generate an API key from your account dashboard
3. **Voice Selection**: Choose voice IDs from the Gradium platform or create custom voices
### Required Environment Variables
* `GRADIUM_API_KEY`: Your Gradium API key for authentication
## Configuration
### GradiumTTSService
Gradium API key for authentication.
Voice identifier. *Deprecated in v0.0.105. Use
`settings=GradiumTTSService.Settings(voice=...)` instead.*
Gradium WebSocket API endpoint.
Model ID to use for synthesis. *Deprecated in v0.0.105. Use
`settings=GradiumTTSService.Settings(model=...)` instead.*
Optional JSON configuration string for additional model settings.
*Deprecated in v0.0.105. Use `settings=GradiumTTSService.Settings(...)`
instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `GradiumTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------- | ----------------- | ------- | -------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
The Gradium service outputs audio at a fixed 48kHz sample rate. This is set
automatically and cannot be changed.
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.gradium import GradiumTTSService
tts = GradiumTTSService(
api_key=os.getenv("GRADIUM_API_KEY"),
settings=GradiumTTSService.Settings(
voice="YTpq7expH9539ERJ",
),
)
```
### With Custom Configuration
```python theme={null}
tts = GradiumTTSService(
api_key=os.getenv("GRADIUM_API_KEY"),
settings=GradiumTTSService.Settings(
model="default",
voice="your-voice-id",
),
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **Word timestamps**: Gradium provides word-level timestamps for synchronized text display.
* **Voice switching**: Changing the voice at runtime via `TTSUpdateSettingsFrame` automatically disconnects and reconnects the WebSocket with the new voice configuration (see the sketch after this list).
* **Fixed sample rate**: Gradium always outputs at 48kHz. The sample rate is not configurable.
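A minimal sketch of a runtime voice switch, assuming `GradiumTTSService.Settings` is accepted as the `delta`; per the note above, this triggers a WebSocket reconnect:
```python theme={null}
from pipecat.frames.frames import TTSUpdateSettingsFrame
from pipecat.services.gradium import GradiumTTSService

# Sketch: switch to a different Gradium voice mid-conversation.
await task.queue_frame(
    TTSUpdateSettingsFrame(
        delta=GradiumTTSService.Settings(
            voice="your-other-voice-id",
        )
    )
)
```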
## Event Handlers
Gradium TTS supports the standard [service connection events](/api-reference/server/events/service-events):
| Event | Description |
| --------------------- | ----------------------------------- |
| `on_connected` | Connected to Gradium WebSocket |
| `on_disconnected` | Disconnected from Gradium WebSocket |
| `on_connection_error` | WebSocket connection error occurred |
```python theme={null}
@tts.event_handler("on_connected")
async def on_connected(service):
print("Connected to Gradium")
```
# Groq
Source: https://docs.pipecat.ai/api-reference/server/services/tts/groq
Text-to-speech service implementation using Groq's TTS API
## Overview
`GroqTTSService` provides fast text-to-speech synthesis using Groq's TTS API with multiple voice options. The service operates at a fixed 48kHz sample rate and offers efficient audio streaming for real-time applications with ultra-low latency.
Pipecat's API methods for Groq TTS integration
Complete example with Groq STT and LLM
Official Groq API documentation and models
Explore available voice models and features
## Installation
To use Groq services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[groq]"
```
## Prerequisites
### Groq Account Setup
Before using Groq TTS services, you need:
1. **Groq Account**: Sign up at [Groq Console](https://console.groq.com/login)
2. **API Key**: Generate an API key from your account dashboard
3. **Voice Selection**: Choose from available voice models
### Required Environment Variables
* `GROQ_API_KEY`: Your Groq API key for authentication
## Configuration
### GroqTTSService
Groq API key for authentication.
TTS model to use. *Deprecated in v0.0.105. Use
`settings=GroqTTSService.Settings(model=...)` instead.*
Voice identifier to use. *Deprecated in v0.0.105. Use
`settings=GroqTTSService.Settings(voice=...)` instead.*
Audio output format.
Audio sample rate. Must be 48000 Hz for Groq TTS.
*Deprecated in v0.0.105. Use `settings=GroqTTSService.Settings(...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `GroqTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------- | ----------------- | ----------- | -------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
| `speed` | `float` | `NOT_GIVEN` | Speech rate control. |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.groq import GroqTTSService
tts = GroqTTSService(
api_key=os.getenv("GROQ_API_KEY"),
settings=GroqTTSService.Settings(
voice="autumn",
),
)
```
### With Custom Settings
```python theme={null}
from pipecat.transcriptions.language import Language
tts = GroqTTSService(
api_key=os.getenv("GROQ_API_KEY"),
settings=GroqTTSService.Settings(
model="canopylabs/orpheus-v1-english",
voice="autumn",
language=Language.EN,
speed=1.2,
),
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **Fixed sample rate**: Groq TTS only supports 48kHz sample rate. Setting a different value will produce a warning.
* **WAV output**: The service outputs WAV-formatted audio, which is decoded internally to extract raw PCM frames.
# Hume
Source: https://docs.pipecat.ai/api-reference/server/services/tts/hume
Text-to-speech service using Hume AI's expressive Octave models with word timestamps
## Overview
Hume provides expressive text-to-speech synthesis using their Octave models, which adapt pronunciation, pitch, speed, and emotional style based on context. `HumeTTSService` offers real-time streaming with word-level timestamps, custom voice support, and advanced synthesis controls including acting instructions, speed adjustment, and trailing silence configuration.
Pipecat's API methods for Hume TTS integration
Complete example with word timestamps and interruption handling
Official Hume TTS API documentation and features
Browse and manage available voices
## Installation
To use Hume services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[hume]"
```
## Prerequisites
### Hume Account Setup
Before using Hume TTS services, you need:
1. **Hume Account**: Sign up at [Hume AI](https://www.hume.ai/)
2. **API Key**: Generate an API key from your account dashboard
3. **Voice Selection**: Choose voice IDs from the voice library or create custom voices
### Required Environment Variables
* `HUME_API_KEY`: Your Hume API key for authentication
## Configuration
### HumeTTSService
Hume API key. If omitted, reads the `HUME_API_KEY` environment variable.
ID of the voice to use. Only voice IDs are supported; voice names are not.
*Deprecated in v0.0.105. Use `settings=HumeTTSService.Settings(...)` instead.*
Output sample rate for PCM frames. Hume TTS streams at 48kHz.
Runtime-configurable synthesis controls. See [InputParams](#inputparams)
below.
*Deprecated in v0.0.105. Use `settings=HumeTTSService.Settings(...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `HumeTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ------------------ | ----------------- | ----------- | -------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
| `description` | `str` | `NOT_GIVEN` | Description to guide voice synthesis. |
| `speed` | `float` | `NOT_GIVEN` | Speech rate control. |
| `trailing_silence` | `float` | `NOT_GIVEN` | Trailing silence duration in seconds. |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.hume import HumeTTSService
tts = HumeTTSService(
api_key=os.getenv("HUME_API_KEY"),
settings=HumeTTSService.Settings(
voice="your-voice-id",
),
)
```
### With Acting Directions
```python theme={null}
tts = HumeTTSService(
api_key=os.getenv("HUME_API_KEY"),
settings=HumeTTSService.Settings(
voice="your-voice-id",
description="Speak warmly and reassuringly",
speed=1.1,
trailing_silence=0.5,
),
)
```
### Updating Settings at Runtime
Voice and synthesis parameters can be changed mid-conversation using `TTSUpdateSettingsFrame`:
```python theme={null}
from pipecat.frames.frames import TTSUpdateSettingsFrame
from pipecat.services.hume.tts import HumeTTSSettings
await task.queue_frame(
TTSUpdateSettingsFrame(
delta=HumeTTSSettings(
speed=1.3,
description="Speak with excitement",
)
)
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **Fixed sample rate**: Hume TTS streams at 48kHz. Setting a different `sample_rate` will produce a warning.
* **Word timestamps**: The service provides word-level timestamps for synchronized text display. Timestamps are tracked cumulatively across utterances within a turn.
* **Description versions**: When `description` is provided, the service uses Hume API version `"1"`. Without a description, it uses the newer version `"2"`.
* **Audio buffering**: Audio is buffered internally until a minimum chunk size is reached before being pushed as frames, reducing audio glitches.
# Inworld
Source: https://docs.pipecat.ai/api-reference/server/services/tts/inworld
Text-to-speech service using Inworld AI's TTS APIs
## Overview
Inworld provides high-quality, low-latency speech synthesis via two implementation types: `InworldTTSService` for real-time, minimal-latency use cases over WebSockets, and `InworldHttpTTSService` for streaming and non-streaming use cases over HTTP. Both support 12+ languages, timestamps, custom pronunciation, and instant voice cloning.
Pipecat's API methods for Inworld TTS integration
Complete example with Inworld TTS
Official Inworld TTS API documentation
Create and manage voice models
## Installation
To use Inworld services, no additional dependencies are required beyond the base installation:
```bash theme={null}
pip install "pipecat-ai"
```
## Prerequisites
### Inworld Account Setup
Before using Inworld TTS services, you need:
1. **Inworld Account**: Sign up at [Inworld Studio](https://studio.inworld.ai/)
2. **API Key**: Generate an API key from your account dashboard
3. **Voice Selection**: Choose from available voice models
### Required Environment Variables
* `INWORLD_API_KEY`: Your Inworld API key for authentication
## Configuration
### InworldTTSService
WebSocket-based service for lowest latency streaming.
Inworld API key.
ID of the voice to use for synthesis. *Deprecated in v0.0.105. Use
`settings=InworldTTSService.Settings(voice=...)` instead.*
ID of the model to use for synthesis. *Deprecated in v0.0.105. Use
`settings=InworldTTSService.Settings(model=...)` instead.*
URL of the Inworld WebSocket API.
Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
rate.
Audio encoding format.
Controls how incoming text is aggregated before synthesis. `SENTENCE`
(default) buffers text until sentence boundaries, producing more natural
speech. `TOKEN` streams tokens directly for lower latency. Import from
`pipecat.services.tts_service`.
*Deprecated in v0.0.104.* Use `text_aggregation_mode` instead.
Whether to append a trailing space to text before sending to TTS.
*Deprecated in v0.0.105. Use `settings=InworldTTSService.Settings(...)`
instead.*
Runtime-configurable settings. See [InworldTTSService
Settings](#inworldttsservice-settings) below.
#### InworldTTSService Settings
Runtime-configurable settings passed via the `settings` constructor argument using `InworldTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| --------------- | ----------------- | ----------- | -------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
| `speaking_rate` | `float` | `NOT_GIVEN` | Speaking rate for speech synthesis. |
| `temperature` | `float` | `NOT_GIVEN` | Temperature for speech synthesis. |
### InworldHttpTTSService
HTTP-based service supporting both streaming and non-streaming modes.
Inworld API key.
aiohttp ClientSession for HTTP requests.
ID of the voice to use for synthesis. *Deprecated in v0.0.105. Use
`settings=InworldHttpTTSService.Settings(voice=...)` instead.*
ID of the model to use for synthesis. *Deprecated in v0.0.105. Use
`settings=InworldHttpTTSService.Settings(model=...)` instead.*
Whether to use streaming mode.
Audio sample rate in Hz.
Audio encoding format.
*Deprecated in v0.0.105. Use `settings=InworldHttpTTSService.Settings(...)`
instead.*
Runtime-configurable settings. See [InworldTTSService
Settings](#inworldttsservice-settings) below.
## Usage
### Basic Setup (WebSocket)
```python theme={null}
from pipecat.services.inworld import InworldTTSService
tts = InworldTTSService(
api_key=os.getenv("INWORLD_API_KEY"),
settings=InworldTTSService.Settings(
voice="Ashley",
),
)
```
### With Custom Settings
```python theme={null}
tts = InworldTTSService(
api_key=os.getenv("INWORLD_API_KEY"),
settings=InworldTTSService.Settings(
voice="Ashley",
model="inworld-tts-1.5-max",
temperature=0.8,
speaking_rate=1.1,
),
)
```
### HTTP Service
```python theme={null}
import aiohttp
from pipecat.services.inworld import InworldHttpTTSService
async with aiohttp.ClientSession() as session:
tts = InworldHttpTTSService(
api_key=os.getenv("INWORLD_API_KEY"),
aiohttp_session=session,
        settings=InworldHttpTTSService.Settings(
            voice="Ashley",
        ),
streaming=True,
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **WebSocket vs HTTP**: The WebSocket service (`InworldTTSService`) provides the lowest latency with bidirectional streaming and supports multiple independent audio contexts per connection (max 5). The HTTP service supports both streaming and non-streaming modes via the `streaming` parameter.
* **Word timestamps**: Both services provide word-level timestamps for synchronized text display. Timestamps are tracked cumulatively across utterances within a turn. When timestamps are not received from the service, a fallback mechanism ensures the full text is still committed to the LLM conversation context, even on interruption.
* **Auto mode**: When `auto_mode=True` (default), the server controls flushing of buffered text for optimal latency and quality. This is recommended when text is sent in full sentences or phrases (i.e., when using `text_aggregation_mode=TextAggregationMode.SENTENCE`).
* **Keepalive**: The WebSocket service sends periodic keepalive messages every 60 seconds to maintain the connection.
## Event Handlers
Inworld TTS supports the standard [service connection events](/api-reference/server/events/service-events):
| Event | Description |
| --------------------- | ----------------------------------- |
| `on_connected` | Connected to Inworld WebSocket |
| `on_disconnected` | Disconnected from Inworld WebSocket |
| `on_connection_error` | WebSocket connection error occurred |
```python theme={null}
@tts.event_handler("on_connected")
async def on_connected(service):
print("Connected to Inworld")
```
# Kokoro
Source: https://docs.pipecat.ai/api-reference/server/services/tts/kokoro
Local text-to-speech synthesis using Kokoro ONNX
## Overview
`KokoroTTSService` provides local, offline text-to-speech synthesis using the [kokoro-onnx](https://github.com/thewh1teagle/kokoro-onnx) engine. It runs entirely on the host machine with no external API calls or authentication required. Model files are automatically downloaded to `~/.cache/kokoro-onnx/` on first use.
Pipecat's API methods for Kokoro TTS integration
Complete example with interruption handling
Official kokoro-onnx project and documentation
Example showing runtime settings updates
## Installation
To use Kokoro TTS, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[kokoro]"
```
This installs `kokoro-onnx>=0.5.0` and its dependencies.
## Prerequisites
### Local Setup
Kokoro runs locally and does not require an API key or external service. On first use, the service automatically downloads two model files to `~/.cache/kokoro-onnx/`:
* `kokoro-v1.0.onnx` -- the ONNX speech synthesis model
* `voices-v1.0.bin` -- the voice data file
You can also provide custom paths to pre-downloaded model files via the `model_path` and `voices_path` constructor parameters.
The initial model download may take a few minutes depending on your connection
speed. Subsequent runs use the cached files.
## Configuration
### KokoroTTSService
Path to a custom ONNX model file. When `None`, the model is automatically
downloaded to `~/.cache/kokoro-onnx/kokoro-v1.0.onnx`.
Path to a custom voices binary file. When `None`, the file is automatically
downloaded to `~/.cache/kokoro-onnx/voices-v1.0.bin`.
Voice identifier for synthesis. *Deprecated in v0.0.105. Use
`settings=KokoroTTSService.Settings(voice=...)` instead.*
*Deprecated in v0.0.105. Use `settings=KokoroTTSService.Settings(...)`
instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `KokoroTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------- | ----------------- | ------------- | --------------------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited from base settings.)* |
| `voice` | `str` | `None` | Voice identifier (e.g. `"af_heart"`). |
| `language` | `Language \| str` | `Language.EN` | Language for synthesis. See supported languages. |
### Supported Languages
Kokoro supports the following languages:
| Language | Code |
| ----------------- | ---------------- |
| English (US) | `Language.EN_US` |
| English (UK) | `Language.EN_GB` |
| English (generic) | `Language.EN` |
| Spanish | `Language.ES` |
| French | `Language.FR` |
| Hindi | `Language.HI` |
| Italian | `Language.IT` |
| Japanese | `Language.JA` |
| Portuguese | `Language.PT` |
| Chinese | `Language.ZH` |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.kokoro import KokoroTTSService
tts = KokoroTTSService(
settings=KokoroTTSService.Settings(
voice="af_heart",
),
)
```
### With Language Configuration
```python theme={null}
from pipecat.services.kokoro import KokoroTTSService
from pipecat.transcriptions.language import Language
tts = KokoroTTSService(
settings=KokoroTTSService.Settings(
voice="af_heart",
language=Language.ES,
),
)
```
### With Custom Model Paths
```python theme={null}
from pipecat.services.kokoro import KokoroTTSService
tts = KokoroTTSService(
model_path="/path/to/kokoro-v1.0.onnx",
voices_path="/path/to/voices-v1.0.bin",
settings=KokoroTTSService.Settings(
voice="af_heart",
),
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **Fully local**: Kokoro runs entirely on the host machine using ONNX Runtime. No API keys, network access, or external services are required after the initial model download.
* **Automatic model caching**: Model files are downloaded once to `~/.cache/kokoro-onnx/` and reused on subsequent runs. You can also pre-download models and specify custom paths.
* **Audio resampling**: Kokoro's native output is automatically resampled to match the pipeline's configured sample rate.
* **Streaming output**: The service uses kokoro-onnx's async streaming API, delivering audio frames incrementally as they are generated.
* **Metrics support**: The service supports TTFB (time to first byte) and usage metrics for performance monitoring.
# LMNT
Source: https://docs.pipecat.ai/api-reference/server/services/tts/lmnt
Text-to-speech service implementation using LMNT's streaming API
## Overview
`LMNTTTSService` provides real-time text-to-speech synthesis through LMNT's WebSocket-based streaming API optimized for conversational AI. The service offers ultra-low latency with high-quality voice models and supports multiple languages with automatic interruption handling.
Pipecat's API methods for LMNT TTS integration
Complete example with voice synthesis
Official LMNT streaming speech API documentation
Browse and create custom voices
## Installation
To use LMNT services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[lmnt]"
```
## Prerequisites
### LMNT Account Setup
Before using LMNT TTS services, you need:
1. **LMNT Account**: Sign up at [LMNT Console](https://app.lmnt.com/)
2. **API Key**: Generate an API key from your account dashboard
3. **Voice Selection**: Choose from available voice models or create custom voices
### Required Environment Variables
* `LMNT_API_KEY`: Your LMNT API key for authentication
## Configuration
### LmntTTSService
LMNT API key for authentication.
ID of the voice to use for synthesis. *Deprecated in v0.0.105. Use
`settings=LmntTTSService.Settings(voice=...)` instead.*
LMNT TTS model to use. *Deprecated in v0.0.105. Use
`settings=LmntTTSService.Settings(model=...)` instead.*
Language for synthesis. Supports multiple languages including German, English,
Spanish, French, Hindi, and more. *Deprecated in v0.0.106. Use
`settings=LmntTTSService.Settings(language=...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
Output audio sample rate in Hz. When `None`, uses the pipeline's configured
sample rate.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `LmntTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------- | ----------------- | ------- | -------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.lmnt import LmntTTSService
tts = LmntTTSService(
api_key=os.getenv("LMNT_API_KEY"),
settings=LmntTTSService.Settings(
voice="lily",
),
)
```
### With Language Configuration
```python theme={null}
from pipecat.services.lmnt import LmntTTSService
from pipecat.transcriptions.language import Language
tts = LmntTTSService(
api_key=os.getenv("LMNT_API_KEY"),
settings=LmntTTSService.Settings(
voice="lily",
model="aurora",
language=Language.ES,
),
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **WebSocket-based streaming**: LMNT uses a persistent WebSocket connection for low-latency audio synthesis with automatic reconnection.
* **Class name**: The Python class is `LmntTTSService` (note the lowercase 'mnt').
## Event Handlers
LMNT TTS supports the standard [service connection events](/api-reference/server/events/service-events):
| Event | Description |
| --------------------- | ----------------------------------- |
| `on_connected` | Connected to LMNT WebSocket |
| `on_disconnected` | Disconnected from LMNT WebSocket |
| `on_connection_error` | WebSocket connection error occurred |
```python theme={null}
@tts.event_handler("on_connected")
async def on_connected(service):
print("Connected to LMNT")
```
# MiniMax
Source: https://docs.pipecat.ai/api-reference/server/services/tts/minimax
Text-to-speech service implementation using MiniMax T2A API
## Overview
`MiniMaxTTSService` provides high-quality text-to-speech synthesis using MiniMax's T2A (Text-to-Audio) API with streaming capabilities, emotional voice control, and support for multiple languages. The service offers various models optimized for different use cases, from low-latency to high-definition audio quality.
Pipecat's API methods for MiniMax TTS integration
Complete example with emotional voice settings
Official MiniMax T2A API documentation
Access voice models and API credentials
## Installation
To use MiniMax services, no additional dependencies are required beyond the base installation:
```bash theme={null}
pip install "pipecat-ai"
```
## Prerequisites
### MiniMax Account Setup
Before using MiniMax TTS services, you need:
1. **MiniMax Account**: Sign up at [MiniMax Platform](https://www.minimax.io/platform/)
2. **API Credentials**: Get your API key and Group ID from the platform
3. **Voice Selection**: Choose from available voice models and emotional settings
### Required Environment Variables
* `MINIMAX_API_KEY`: Your MiniMax API key for authentication
* `MINIMAX_GROUP_ID`: Your MiniMax group ID
## Configuration
### MiniMaxHttpTTSService
MiniMax API key for authentication.
MiniMax Group ID to identify project.
Voice identifier for synthesis.
*Deprecated in v0.0.105. Use `settings=MiniMaxHttpTTSService.Settings(...)` instead.*
TTS model name. Options include `speech-2.6-hd`, `speech-2.6-turbo`,
`speech-02-hd`, `speech-02-turbo`, `speech-01-hd`, `speech-01-turbo`.
*Deprecated in v0.0.105. Use `settings=MiniMaxHttpTTSService.Settings(...)` instead.*
API base URL. Use `https://api.minimaxi.chat/v1/t2a_v2` for mainland China or
`https://api-uw.minimax.io/v1/t2a_v2` for western United States.
An aiohttp session for HTTP requests.
Output audio sample rate in Hz. When `None`, uses the pipeline's configured
sample rate.
Runtime-configurable voice and generation settings. See
[InputParams](#inputparams) below.
*Deprecated in v0.0.105. Use `settings=MiniMaxHttpTTSService.Settings(...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `MiniMaxHttpTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| -------------------- | ----------------- | ----------- | -------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
| `speed` | `float` | `NOT_GIVEN` | Speech speed. |
| `volume` | `float` | `NOT_GIVEN` | Volume level. |
| `pitch` | `int` | `NOT_GIVEN` | Pitch adjustment. |
| `emotion` | `str` | `NOT_GIVEN` | Emotion for synthesis. |
| `text_normalization` | `bool` | `NOT_GIVEN` | Whether to apply text normalization. |
| `latex_read` | `bool` | `NOT_GIVEN` | Whether to read LaTeX formulas. |
| `language_boost` | `str` | `NOT_GIVEN` | Language boost setting. |
## Usage
### Basic Setup
```python theme={null}
import aiohttp
from pipecat.services.minimax import MiniMaxHttpTTSService
async with aiohttp.ClientSession() as session:
tts = MiniMaxHttpTTSService(
api_key=os.getenv("MINIMAX_API_KEY"),
group_id=os.getenv("MINIMAX_GROUP_ID"),
aiohttp_session=session,
)
```
### With Voice Customization
```python theme={null}
import aiohttp
from pipecat.services.minimax import MiniMaxHttpTTSService
from pipecat.transcriptions.language import Language
async with aiohttp.ClientSession() as session:
tts = MiniMaxHttpTTSService(
api_key=os.getenv("MINIMAX_API_KEY"),
group_id=os.getenv("MINIMAX_GROUP_ID"),
aiohttp_session=session,
settings=MiniMaxHttpTTSService.Settings(
voice="Calm_Woman",
model="speech-02-hd",
language=Language.ZH,
speed=1.2,
emotion="happy",
),
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **HTTP-based streaming**: MiniMax uses an HTTP streaming API, not WebSocket. Audio data is returned in hex-encoded PCM chunks.
* **Emotional voice control**: The `emotion` parameter lets you adjust the emotional tone of the voice without changing the voice model itself (see the sketch after this list).
* **Model selection**: The `speech-2.6-*` models are the latest and support additional languages (Filipino, Tamil, Persian). Use `turbo` variants for lower latency or `hd` variants for higher quality.
* **Class name**: The Python class is named `MiniMaxHttpTTSService`, not `MiniMaxTTSService`.
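A minimal sketch of adjusting the emotional tone mid-conversation, assuming `MiniMaxHttpTTSService.Settings` is accepted as the `delta` for `TTSUpdateSettingsFrame`:
```python theme={null}
from pipecat.frames.frames import TTSUpdateSettingsFrame
from pipecat.services.minimax import MiniMaxHttpTTSService

# Sketch: change only the emotion; the voice model itself stays the same.
await task.queue_frame(
    TTSUpdateSettingsFrame(
        delta=MiniMaxHttpTTSService.Settings(
            emotion="sad",  # emotion values per the MiniMax T2A docs
        )
    )
)
```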
# Neuphonic
Source: https://docs.pipecat.ai/api-reference/server/services/tts/neuphonic
Text-to-speech service implementation using Neuphonic's API
## Overview
Neuphonic provides high-quality text-to-speech synthesis with two service implementations: `NeuphonicTTSService` (WebSocket-based) with real-time streaming and interruption support, and `NeuphonicHttpTTSService` (HTTP-based) with server-sent events. `NeuphonicTTSService` is recommended for interactive applications requiring low latency.
Pipecat's API methods for Neuphonic TTS integration
Complete example with WebSocket streaming
Official Neuphonic TTS API documentation
Browse available voices and features
## Installation
To use Neuphonic services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[neuphonic]"
```
## Prerequisites
### Neuphonic Account Setup
Before using Neuphonic TTS services, you need:
1. **Neuphonic Account**: Sign up at [Neuphonic](https://docs.neuphonic.com/)
2. **API Key**: Generate an API key from your account dashboard
3. **Voice Selection**: Choose from available voice models
### Required Environment Variables
* `NEUPHONIC_API_KEY`: Your Neuphonic API key for authentication
## Configuration
### NeuphonicTTSService
Neuphonic API key for authentication.
ID of the voice to use for synthesis.
*Deprecated in v0.0.105. Use `settings=NeuphonicTTSService.Settings(...)` instead.*
WebSocket URL for the Neuphonic API.
Output audio sample rate in Hz.
Audio encoding format.
Controls how incoming text is aggregated before synthesis. `SENTENCE`
(default) buffers text until sentence boundaries, producing more natural
speech. `TOKEN` streams tokens directly for lower latency. Import from
`pipecat.services.tts_service`.
*Deprecated in v0.0.104.* Use `text_aggregation_mode` instead.
Runtime-configurable voice and generation settings. See
[InputParams](#inputparams) below.
*Deprecated in v0.0.105. Use `settings=NeuphonicTTSService.Settings(...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### NeuphonicHttpTTSService
The HTTP service uses SSE (server-sent events) for streaming audio.
Neuphonic API key for authentication.
ID of the voice to use for synthesis.
*Deprecated in v0.0.105. Use `settings=NeuphonicHttpTTSService.Settings(...)` instead.*
An aiohttp session for HTTP requests.
Base URL for the Neuphonic HTTP API.
Output audio sample rate in Hz.
Audio encoding format.
Runtime-configurable voice and generation settings. See
[InputParams](#inputparams) below.
*Deprecated in v0.0.105. Use `settings=NeuphonicHttpTTSService.Settings(...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `NeuphonicTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------- | ----------------- | ----------- | -------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
| `speed` | `float` | `NOT_GIVEN` | Speech rate control. |
## Usage
### Basic Setup (WebSocket)
```python theme={null}
from pipecat.services.neuphonic import NeuphonicTTSService
tts = NeuphonicTTSService(
api_key=os.getenv("NEUPHONIC_API_KEY"),
settings=NeuphonicTTSService.Settings(
voice="your-voice-id",
),
)
```
### With Customization (WebSocket)
```python theme={null}
from pipecat.services.neuphonic import NeuphonicTTSService
from pipecat.transcriptions.language import Language
tts = NeuphonicTTSService(
api_key=os.getenv("NEUPHONIC_API_KEY"),
sample_rate=22050,
settings=NeuphonicTTSService.Settings(
voice="your-voice-id",
language=Language.FR,
speed=1.2,
),
)
```
### HTTP Service
```python theme={null}
import aiohttp
from pipecat.services.neuphonic import NeuphonicHttpTTSService
async with aiohttp.ClientSession() as session:
tts = NeuphonicHttpTTSService(
api_key=os.getenv("NEUPHONIC_API_KEY"),
settings=NeuphonicHttpTTSService.Settings(
voice="your-voice-id",
),
aiohttp_session=session,
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **WebSocket vs HTTP**: The WebSocket service (`NeuphonicTTSService`) supports interruption handling and keepalive connections, making it better for interactive conversations. The HTTP service (`NeuphonicHttpTTSService`) uses server-sent events and is simpler to integrate.
* **Keepalive**: The WebSocket service automatically sends keepalive messages every 10 seconds to maintain the connection.
* **Default sample rate**: Both services default to 22050 Hz, which differs from most other TTS services.
## Event Handlers
Neuphonic WebSocket TTS supports the standard [service connection events](/api-reference/server/events/service-events):
| Event | Description |
| --------------------- | ------------------------------------- |
| `on_connected` | Connected to Neuphonic WebSocket |
| `on_disconnected` | Disconnected from Neuphonic WebSocket |
| `on_connection_error` | WebSocket connection error occurred |
```python theme={null}
@tts.event_handler("on_connected")
async def on_connected(service):
print("Connected to Neuphonic")
```
# NVIDIA Riva
Source: https://docs.pipecat.ai/api-reference/server/services/tts/nvidia
Text-to-speech service implementation using NVIDIA Riva
## Overview
`NvidiaTTSService` provides high-quality text-to-speech synthesis through NVIDIA Riva's cloud-based AI models accessible via gRPC API. The service offers multilingual support, configurable quality settings, and streaming audio generation optimized for real-time applications.
Pipecat's API methods for NVIDIA Riva TTS integration
Complete example with Riva NIM
Official NVIDIA Riva TTS documentation
Access API keys and Riva services
## Installation
To use NVIDIA Riva services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[nvidia]"
```
## Prerequisites
### NVIDIA Riva Setup
Before using Riva TTS services, you need:
1. **NVIDIA Developer Account**: Sign up at [NVIDIA Developer Portal](https://developer.nvidia.com/)
2. **API Key**: Generate an NVIDIA API key for Riva services
3. **Riva Access**: Ensure access to NVIDIA Riva TTS services
### Required Environment Variables
* `NVIDIA_API_KEY`: Your NVIDIA API key for authentication
## Configuration
### NvidiaTTSService
NVIDIA API key for authentication.
gRPC server endpoint.
Voice model identifier.
*Deprecated in v0.0.105. Use `settings=NvidiaTTSService.Settings(...)` instead.*
Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
rate.
Dictionary containing `function_id` and `model_name` for the TTS model.
Whether to use SSL for the NVIDIA Riva server connection.
Runtime-configurable synthesis settings. See [InputParams](#inputparams)
below.
*Deprecated in v0.0.105. Use `settings=NvidiaTTSService.Settings(...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `NvidiaTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------- | ----------------- | ----------- | -------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
| `quality` | `int` | `NOT_GIVEN` | Audio quality setting. |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.nvidia import NvidiaTTSService
tts = NvidiaTTSService(
api_key=os.getenv("NVIDIA_API_KEY"),
)
```
### With Custom Voice and Quality
```python theme={null}
from pipecat.services.nvidia import NvidiaTTSService
from pipecat.transcriptions.language import Language
tts = NvidiaTTSService(
api_key=os.getenv("NVIDIA_API_KEY"),
model_function_map={
"function_id": "877104f7-e885-42b9-8de8-f6e4c6303969",
"model_name": "magpie-tts-multilingual",
},
settings=NvidiaTTSService.Settings(
voice="Magpie-Multilingual.EN-US.Aria",
language=Language.EN_US,
quality=40,
),
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **gRPC-based**: NVIDIA Riva uses gRPC (not HTTP or WebSocket) for communication with the TTS service.
* **Model cannot be changed after initialization**: The model and function ID must be set during construction via `model_function_map`. Calling `set_model()` after initialization will log a warning and have no effect.
* **SSL enabled by default**: The service connects to NVIDIA's cloud endpoint with SSL. Set `use_ssl=False` only for local or custom Riva deployments.
* **Blocking gRPC calls**: Audio generation uses `asyncio.to_thread` to avoid blocking the event loop, since the underlying Riva client uses synchronous gRPC calls.
# OpenAI
Source: https://docs.pipecat.ai/api-reference/server/services/tts/openai
Text-to-speech service using OpenAI's TTS API
## Overview
`OpenAITTSService` provides high-quality text-to-speech synthesis using OpenAI's TTS API with multiple voice models including traditional TTS models and advanced GPT-based models. The service outputs 24kHz PCM audio with streaming capabilities for real-time applications.
Pipecat's API methods for OpenAI TTS integration
Complete example with voice customization
Official OpenAI TTS API documentation
Listen to available voice options
## Installation
To use OpenAI services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[openai]"
```
## Prerequisites
### OpenAI Account Setup
Before using OpenAI TTS services, you need:
1. **OpenAI Account**: Sign up at [OpenAI Platform](https://platform.openai.com/)
2. **API Key**: Generate an API key from your [API keys page](https://platform.openai.com/api-keys)
3. **Voice Selection**: Choose from available voice options (alloy, ash, ballad, cedar, coral, echo, fable, marin, nova, onyx, sage, shimmer, verse)
### Required Environment Variables
* `OPENAI_API_KEY`: Your OpenAI API key for authentication
## Configuration
### OpenAITTSService
OpenAI API key for authentication. If `None`, uses the `OPENAI_API_KEY`
environment variable.
Custom base URL for OpenAI API. If `None`, uses the default OpenAI endpoint.
Voice ID to use for synthesis. Options: `alloy`, `ash`, `ballad`, `cedar`,
`coral`, `echo`, `fable`, `marin`, `nova`, `onyx`, `sage`, `shimmer`, `verse`.
*Deprecated in v0.0.105. Use `settings=OpenAITTSService.Settings(...)` instead.*
TTS model to use.
*Deprecated in v0.0.105. Use `settings=OpenAITTSService.Settings(...)` instead.*
Output audio sample rate in Hz. If `None`, uses OpenAI's default 24kHz. OpenAI
TTS only supports 24kHz output.
Runtime-configurable voice and generation settings. See
[InputParams](#inputparams) below.
*Deprecated in v0.0.105. Use `settings=OpenAITTSService.Settings(...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `OpenAITTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| -------------- | ----------------- | ----------- | --------------------------------------------------------------------------- |
| `model` | `str` | `None` | TTS model identifier. *(Inherited from base settings.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited from base settings.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited from base settings.)* |
| `instructions` | `str` | `NOT_GIVEN` | Instructions to guide voice synthesis behavior (e.g. affect, tone, pacing). |
| `speed` | `float` | `NOT_GIVEN` | Voice speed control (0.25 to 4.0). |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.openai import OpenAITTSService
tts = OpenAITTSService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAITTSService.Settings(
voice="nova",
),
)
```
### With Voice Customization
```python theme={null}
from pipecat.services.openai import OpenAITTSService
tts = OpenAITTSService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAITTSService.Settings(
voice="coral",
model="gpt-4o-mini-tts",
instructions="Speak in a warm, friendly tone with moderate pacing.",
speed=1.1,
),
)
```
### Updating Settings at Runtime
Voice settings can be changed mid-conversation using `TTSUpdateSettingsFrame`:
```python theme={null}
from pipecat.frames.frames import TTSUpdateSettingsFrame
from pipecat.services.openai.tts import OpenAITTSSettings
await task.queue_frame(
TTSUpdateSettingsFrame(
delta=OpenAITTSSettings(
instructions="Now speak more formally.",
speed=0.9,
)
)
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **Fixed sample rate**: OpenAI TTS always outputs audio at 24kHz. Using a different sample rate may cause issues.
* **Model selection**: The `gpt-4o-mini-tts` model supports the `instructions` parameter for controlling voice affect and tone, which traditional TTS models do not support.
* **HTTP-based service**: OpenAI TTS uses HTTP streaming, so it does not have WebSocket connection events.
# Piper
Source: https://docs.pipecat.ai/api-reference/server/services/tts/piper
Text-to-speech service implementation using the Piper TTS server
## Overview
Piper provides high-quality neural text-to-speech synthesis that runs entirely on your own infrastructure: `PiperTTSService` performs local, in-process inference, while `PiperHttpTTSService` connects to a self-hosted HTTP server. With no external API dependencies, both services offer complete privacy and control, making them ideal for on-premise deployments and applications requiring data sovereignty.
Pipecat's API methods for Piper TTS integration
Browse examples using Piper TTS
Official Piper TTS documentation and setup
Configure Piper HTTP server for Pipecat
## Installation
No additional Pipecat dependencies are required to use Piper services:
```bash theme={null}
pip install "pipecat-ai" # Base installation is sufficient
```
## Prerequisites
### Piper Server Setup
Before using PiperTTSService, you need:
1. **Piper TTS Server**: Set up a running Piper TTS server following the [HTTP server documentation](https://github.com/OHF-Voice/piper1-gpl/blob/main/docs/API_HTTP.md)
2. **Voice Models**: Download and configure voice models for your target languages
3. **Server Configuration**: Configure server endpoint and voice selection
### Required Configuration
* **Server URL**: Configure the Piper server endpoint in your service initialization
* **Voice Models**: Ensure required voice models are available on the server
Piper runs entirely locally, providing complete privacy and eliminating API
key requirements.
## Configuration
Piper offers two service implementations: `PiperTTSService` for local inference and `PiperHttpTTSService` for HTTP server-based synthesis.
### PiperTTSService
Runs Piper locally, automatically downloading voice models as needed.
Piper voice model identifier (e.g. `en_US-ryan-high`). *Deprecated in
v0.0.105. Use `settings=PiperTTSService.Settings(voice=...)` instead.*
Runtime-configurable settings. See [PiperTTSService
Settings](#piperttsservice-settings) below.
Directory for storing voice model files. Defaults to the current working
directory.
Re-download the voice model even if it already exists locally.
Use CUDA for GPU-accelerated inference.
#### PiperTTSService Settings
Runtime-configurable settings passed via the `settings` constructor argument using `PiperTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------- | ----------------- | ------- | -------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
### PiperHttpTTSService
Connects to a running Piper HTTP TTS server.
Base URL for the Piper TTS HTTP server.
An aiohttp session for HTTP requests.
Piper voice model identifier (e.g. `en_US-ryan-high`). *Deprecated in
v0.0.105. Use `settings=PiperHttpTTSService.Settings(voice=...)` instead.*
Runtime-configurable settings. See [PiperHttpTTSService
Settings](#piperhttpttsservice-settings) below.
#### PiperHttpTTSService Settings
Runtime-configurable settings passed via the `settings` constructor argument using `PiperHttpTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------- | ----------------- | ------- | -------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
## Usage
### Local Inference
```python theme={null}
from pipecat.services.piper import PiperTTSService
tts = PiperTTSService(
settings=PiperTTSService.Settings(
voice="en_US-ryan-high",
),
)
```
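If your machine has a CUDA-capable GPU, local inference can be accelerated with the `use_cuda` flag described above. A minimal sketch, assuming CUDA-compatible hardware is available:
```python theme={null}
from pipecat.services.piper import PiperTTSService

# GPU-accelerated local inference (assumes CUDA-compatible hardware).
tts = PiperTTSService(
    use_cuda=True,
    settings=PiperTTSService.Settings(
        voice="en_US-ryan-high",
    ),
)
```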
### HTTP Server
Start the Piper HTTP server first:
```bash theme={null}
uv pip install "piper-tts[http]"
uv run python -m piper.http_server -m en_US-ryan-high
```
Then connect to it:
```python theme={null}
import aiohttp
from pipecat.services.piper import PiperHttpTTSService
async with aiohttp.ClientSession() as session:
tts = PiperHttpTTSService(
base_url="http://localhost:5000",
aiohttp_session=session,
settings=PiperHttpTTSService.Settings(
voice="en_US-ryan-high",
),
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **Local execution**: `PiperTTSService` runs entirely locally with no network requests. Voice models are automatically downloaded on first use.
* **GPU acceleration**: Set `use_cuda=True` for GPU-accelerated inference with `PiperTTSService` (requires CUDA-compatible hardware).
* **Audio resampling**: Audio output is automatically resampled to match the pipeline's configured sample rate.
* **No API key required**: Piper is open-source and runs locally, so no API credentials are needed.
# Resemble AI
Source: https://docs.pipecat.ai/api-reference/server/services/tts/resembleai
Text-to-speech service using Resemble AI's WebSocket streaming API with word-level timing
## Overview
`ResembleAITTSService` provides high-quality text-to-speech synthesis using Resemble AI's streaming WebSocket API with word-level timestamps and audio context management for handling multiple simultaneous synthesis requests with proper interruption support.
Pipecat's API methods for Resemble AI TTS integration
Complete example with interruption handling
Official Resemble AI API documentation
Sign up for a Resemble AI account
## Installation
To use Resemble AI services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[resemble]"
```
## Prerequisites
### Resemble AI Account Setup
Before using Resemble AI TTS services, you need:
1. **Resemble AI Account**: Sign up at [Resemble AI](https://app.resemble.ai)
2. **API Key**: Generate an API key from your [account settings](https://app.resemble.ai/account/api)
3. **Voice Selection**: Choose or create voice UUIDs from your [voice library](https://app.resemble.ai/hub/voices)
### Required Environment Variables
* `RESEMBLE_API_KEY`: Your Resemble AI API key for authentication
## Configuration
### ResembleAITTSService
Resemble AI API key for authentication.
Voice UUID to use for synthesis. *Deprecated in v0.0.105. Use
`settings=ResembleAITTSService.Settings(voice=...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
WebSocket URL for Resemble AI TTS API.
PCM bit depth. Options: `PCM_32`, `PCM_24`, `PCM_16`, or `MULAW`.
Audio output format (`wav` or `mp3`).
Audio sample rate in Hz. Options: 8000, 16000, 22050, 32000, or 44100.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `ResembleAITTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------- | ----------------- | ------- | -------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.resembleai import ResembleAITTSService
tts = ResembleAITTSService(
api_key=os.getenv("RESEMBLE_API_KEY"),
settings=ResembleAITTSService.Settings(
voice="your-voice-uuid",
),
)
```
### With Custom Settings
```python theme={null}
from pipecat.services.resembleai import ResembleAITTSService
tts = ResembleAITTSService(
api_key=os.getenv("RESEMBLE_API_KEY"),
settings=ResembleAITTSService.Settings(
voice="your-voice-uuid",
),
sample_rate=16000,
precision="PCM_16",
output_format="wav",
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **Word-level timestamps**: Resemble AI provides word-level timing information, enabling synchronized text highlighting and precise interruption handling.
* **Jitter buffering**: The service buffers approximately 1 second of audio before starting playback to absorb network latency gaps (Resemble AI sends audio in bursts with 300-450ms gaps).
* **Audio context management**: Supports multiple simultaneous synthesis requests with proper context tracking and interruption handling.
* **Default sample rate**: Defaults to 22050 Hz. Supported rates are 8000, 16000, 22050, 32000, and 44100 Hz.
## Event Handlers
Resemble AI TTS supports the standard [service connection events](/api-reference/server/events/service-events):
| Event | Description |
| --------------------- | --------------------------------------- |
| `on_connected` | Connected to Resemble AI WebSocket |
| `on_disconnected` | Disconnected from Resemble AI WebSocket |
| `on_connection_error` | WebSocket connection error occurred |
```python theme={null}
@tts.event_handler("on_connected")
async def on_connected(service):
print("Connected to Resemble AI")
```
# Rime
Source: https://docs.pipecat.ai/api-reference/server/services/tts/rime
Text-to-speech service implementations using Rime AI
## Overview
Rime AI provides two TTS service implementations: `RimeTTSService` (WebSocket-based) with word-level timing and interruption support, and `RimeHttpTTSService` (HTTP-based) for simpler use cases. `RimeTTSService` is recommended for real-time interactive applications.
Pipecat's API methods for Rime TTS integration
Complete example with word timestamps
Official Rime WebSocket and HTTP API documentation
Explore available voice models and features
## Installation
To use Rime services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[rime]"
```
## Prerequisites
### Rime Account Setup
Before using Rime TTS services, you need:
1. **Rime Account**: Sign up at [Rime AI](https://docs.rime.ai/)
2. **API Key**: Generate an API key from your account dashboard
3. **Voice Selection**: Choose from available voice models
### Required Environment Variables
* `RIME_API_KEY`: Your Rime API key for authentication
## Configuration
### RimeTTSService
Rime API key for authentication.
ID of the voice to use for synthesis. *Deprecated in v0.0.105. Use
`settings=RimeTTSService.Settings(voice=...)` instead.*
Rime WebSocket API endpoint.
Model ID to use for synthesis. *Deprecated in v0.0.105. Use
`settings=RimeTTSService.Settings(model=...)` instead.*
Output audio sample rate in Hz. When `None`, uses the pipeline's configured
sample rate.
Controls how incoming text is aggregated before synthesis. `SENTENCE`
(default) buffers text until sentence boundaries, producing more natural
speech. `TOKEN` streams tokens directly for lower latency. Import from
`pipecat.services.tts_service`.
*Deprecated in v0.0.104.* Use `text_aggregation_mode` instead.
*Deprecated in v0.0.105. Use `settings=RimeTTSService.Settings(...)` instead.*
Runtime-configurable settings. See [RimeTTSService
Settings](#rimettsservice-settings) below.
### RimeHttpTTSService
Rime API key for authentication.
ID of the voice to use for synthesis. *Deprecated in v0.0.105. Use
`settings=RimeHttpTTSService.Settings(voice=...)` instead.*
An aiohttp session for HTTP requests.
Model ID to use for synthesis. *Deprecated in v0.0.105. Use
`settings=RimeHttpTTSService.Settings(model=...)` instead.*
Output audio sample rate in Hz. When `None`, uses the pipeline's configured
sample rate.
*Deprecated in v0.0.105. Use `settings=RimeHttpTTSService.Settings(...)`
instead.*
Runtime-configurable settings. See [RimeTTSService
Settings](#rimettsservice-settings) below.
### RimeNonJsonTTSService
A non-JSON WebSocket service for models like Arcana that use plain text messages.
Rime API key for authentication.
ID of the voice to use for synthesis. *Deprecated in v0.0.105. Use
`settings=RimeNonJsonTTSService.Settings(voice=...)` instead.*
Rime WebSocket API endpoint.
Model ID to use for synthesis. *Deprecated in v0.0.105. Use
`settings=RimeNonJsonTTSService.Settings(model=...)` instead.*
Audio output format.
Output audio sample rate in Hz. When `None`, uses the pipeline's configured
sample rate.
Controls how incoming text is aggregated before synthesis. `SENTENCE`
(default) buffers text until sentence boundaries. `TOKEN` streams tokens
directly for lower latency. Import from `pipecat.services.tts_service`.
*Deprecated in v0.0.104.* Use `text_aggregation_mode` instead.
*Deprecated in v0.0.105. Use `settings=RimeNonJsonTTSService.Settings(...)`
instead.*
Runtime-configurable settings. See [RimeNonJsonTTSService
Settings](#rimenonjsonttsservice-settings) below.
#### RimeTTSService Settings
Runtime-configurable settings passed via the `settings` constructor argument using `RimeTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| -------------------------- | ----------------- | ----------- | -------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
| `segment` | `str` | `NOT_GIVEN` | Segment type for synthesis. |
| `speedAlpha` | `float` | `NOT_GIVEN` | Speed alpha parameter. |
| `reduceLatency` | `bool` | `NOT_GIVEN` | Whether to reduce latency. |
| `pauseBetweenBrackets` | `bool` | `NOT_GIVEN` | Pause between brackets. |
| `phonemizeBetweenBrackets` | `bool` | `NOT_GIVEN` | Phonemize between brackets. |
| `noTextNormalization` | `bool` | `NOT_GIVEN` | Disable text normalization. |
| `saveOovs` | `bool` | `NOT_GIVEN` | Save out-of-vocabulary words. |
| `inlineSpeedAlpha` | `str` | `NOT_GIVEN` | Inline speed alpha. |
| `repetition_penalty` | `float` | `NOT_GIVEN` | Repetition penalty. |
| `temperature` | `float` | `NOT_GIVEN` | Temperature for sampling. |
| `top_p` | `float` | `NOT_GIVEN` | Top-p sampling parameter. |
#### RimeNonJsonTTSService Settings
Runtime-configurable settings passed via the `settings` constructor argument using `RimeNonJsonTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| -------------------- | ----------------- | ----------- | -------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
| `segment` | `str` | `NOT_GIVEN` | Segment type for synthesis. |
| `repetition_penalty` | `float` | `NOT_GIVEN` | Repetition penalty. |
| `temperature` | `float` | `NOT_GIVEN` | Temperature for sampling. |
| `top_p` | `float` | `NOT_GIVEN` | Top-p sampling parameter. |
## Usage
### Basic Setup (WebSocket)
```python theme={null}
from pipecat.services.rime import RimeTTSService
tts = RimeTTSService(
api_key=os.getenv("RIME_API_KEY"),
settings=RimeTTSService.Settings(
voice="cove",
),
)
```
### With Customization (WebSocket)
```python theme={null}
from pipecat.services.rime import RimeTTSService
from pipecat.transcriptions.language import Language
tts = RimeTTSService(
api_key=os.getenv("RIME_API_KEY"),
settings=RimeTTSService.Settings(
voice="cove",
model="mistv2",
language=Language.ES,
speedAlpha=1.2,
reduceLatency=True,
),
)
```
### HTTP Service
```python theme={null}
import aiohttp
from pipecat.services.rime import RimeHttpTTSService
async with aiohttp.ClientSession() as session:
tts = RimeHttpTTSService(
api_key=os.getenv("RIME_API_KEY"),
settings=RimeHttpTTSService.Settings(
voice="cove",
),
aiohttp_session=session,
)
```
### Non-JSON WebSocket (Arcana)
```python theme={null}
from pipecat.services.rime import RimeNonJsonTTSService
tts = RimeNonJsonTTSService(
api_key=os.getenv("RIME_API_KEY"),
settings=RimeNonJsonTTSService.Settings(
voice="cove",
model="arcana",
),
)
```
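### Updating Settings at Runtime
Settings can be changed mid-conversation with `TTSUpdateSettingsFrame`. A minimal sketch, assuming a running `PipelineTask` named `task` and that a `RimeTTSService.Settings` instance can be passed as the delta:
```python theme={null}
from pipecat.frames.frames import TTSUpdateSettingsFrame
from pipecat.services.rime import RimeTTSService

# Slow the voice down slightly for the rest of the conversation.
await task.queue_frame(
    TTSUpdateSettingsFrame(
        delta=RimeTTSService.Settings(
            speedAlpha=0.9,
        )
    )
)
```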
## Customizing Speech
`RimeTTSService` provides a set of helper methods for implementing Rime-specific customizations, meant to be used as part of text transformers. These include methods for spelling out text, adjusting speech rate, and modifying pitch. See the [Text Transformers for TTS](/pipecat/learn/text-to-speech#text-transformers-for-tts) section in the Text-to-Speech guide for usage examples.
### SPELL(text: str) -> str:
Implements [Rime's spell function](https://docs.rime.ai/api-reference/spell) to spell out text character by character.
```python theme={null}
# Text transformers for TTS
# This will insert Rime's spell tags around the provided text.
async def spell_out_text(text: str, type: str) -> str:
return RimeTTSService.SPELL(text)
tts = RimeTTSService(
api_key=os.getenv("RIME_API_KEY"),
text_transforms=[
("phone_number", spell_out_text),
],
)
```
### PAUSE\_TAG(seconds: float) -> str:
Implements [Rime's custom pause functionality](https://docs.rime.ai/api-reference/custom-pauses) to generate a properly formatted pause tag you can insert into the text.
```python theme={null}
# Text transformers for TTS
# This will insert a one second pause after questions.
async def pause_after_questions(text: str, type: str) -> str:
if text.endswith("?"):
return f"{text}{RimeTTSService.PAUSE_TAG(1.0)}"
return text
tts = RimeTTSService(
api_key=os.getenv("RIME_API_KEY"),
text_transforms=[
("sentence", pause_after_questions), # Only apply to sentence aggregations
],
)
```
### PRONOUNCE(self, text: str, word: str, phoneme: str) -> str:
Convenience method to support Rime's [custom pronunciations feature](https://docs.rime.ai/api-reference/custom-pronunciation). It takes a word and its desired phoneme representation, returning the text with the provided word replaced by the appropriate phoneme tag.
```python theme={null}
# Text transformers for TTS
# This will insert a phoneme in place of the word "potato" to define how
# it should be pronounced.
async def maybe_say_potato_all_fancylike(text: str, type: str) -> str:
if using_fancy_voice:
return RimeTTSService.PRONOUNCE(text, "potato", "potato")
else:
return RimeTTSService.PRONOUNCE(text, "potato", "poteto")
tts = RimeTTSService(
api_key=os.getenv("RIME_API_KEY"),
text_transforms=[
("*", maybe_say_potato_all_fancylike), # Apply to all text
],
)
```
### INLINE\_SPEED(self, text: str, speed: float) -> str:
A convenience method to support Rime's [inline speed adjustment feature](https://docs.rime.ai/api-reference/speed). It wraps the provided text in `[]` tags and adds the provided speed to the `inlineSpeedAlpha` field in the request metadata.
```python theme={null}
# Text transformers for TTS
# This will make the word "slow" always be spoken more slowly.
async def slow_down_slow_words(text: str, type: str) -> str:
return text.replace(
"slow",
RimeTTSService.INLINE_SPEED("slow", speed=0.5)
)
tts = RimeTTSService(
api_key=os.getenv("RIME_API_KEY"),
text_transforms=[
("*", slow_down_slow_words), # Apply to all text
],
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **Word-level timestamps**: `RimeTTSService` provides word-level timing information, enabling synchronized text highlighting.
* **WebSocket vs HTTP**: The WebSocket service supports word-level timestamps, interruption handling, and maintains context across messages within a turn. The HTTP service is simpler but lacks these features.
* **Non-JSON WebSocket**: `RimeNonJsonTTSService` is for models like Arcana that use plain text messages instead of JSON. It does not support word-level timestamps.
## Event Handlers
Rime WebSocket TTS services support the standard [service connection events](/api-reference/server/events/service-events):
| Event | Description |
| --------------------- | ----------------------------------- |
| `on_connected` | Connected to Rime WebSocket |
| `on_disconnected` | Disconnected from Rime WebSocket |
| `on_connection_error` | WebSocket connection error occurred |
```python theme={null}
@tts.event_handler("on_connected")
async def on_connected(service):
print("Connected to Rime")
```
# Sarvam AI
Source: https://docs.pipecat.ai/api-reference/server/services/tts/sarvam
Text-to-speech service implementation using Sarvam AI's TTS API
## Overview
`SarvamTTSService` provides text-to-speech synthesis specialized for Indian languages and voices. The service offers extensive voice customization options including pitch, pace, and loudness control, with support for multiple Indian languages and preprocessing for mixed-language content. The `bulbul:v3-beta` model adds temperature control and 25 new speaker voices.
Pipecat's API methods for Sarvam AI TTS integration
Complete example with Indian language support
Official Sarvam AI text-to-speech API documentation
Access Indian language voices and API keys
## Installation
To use Sarvam AI services, no additional dependencies are required beyond the base installation:
```bash theme={null}
pip install "pipecat-ai"
```
## Prerequisites
### Sarvam AI Account Setup
Before using Sarvam AI TTS services, you need:
1. **Sarvam AI Account**: Sign up at [Sarvam AI Console](https://www.sarvam.ai/)
2. **API Key**: Generate an API key from your account dashboard
3. **Language Selection**: Choose from available Indian language voices
### Required Environment Variables
* `SARVAM_API_KEY`: Your Sarvam AI API key for authentication
## Configuration
Sarvam offers two service implementations: `SarvamTTSService` (WebSocket) for real-time streaming and `SarvamHttpTTSService` (HTTP) for simpler batch synthesis.
### SarvamTTSService
Sarvam AI API subscription key.
TTS model to use. Options: `bulbul:v2`, `bulbul:v3-beta`, `bulbul:v3`.
*Deprecated in v0.0.105. Use `settings=SarvamTTSService.Settings(model=...)`
instead.*
Speaker voice ID. If `None`, uses the model-appropriate default (`anushka` for
v2, `shubh` for v3). *Deprecated in v0.0.105. Use
`settings=SarvamTTSService.Settings(voice=...)` instead.*
WebSocket URL for the TTS backend.
Controls how incoming text is aggregated before synthesis. `SENTENCE`
(default) buffers text until sentence boundaries, producing more natural
speech. `TOKEN` streams tokens directly for lower latency. Import from
`pipecat.services.tts_service`.
*Deprecated in v0.0.104.* Use `text_aggregation_mode` instead.
Audio sample rate in Hz (8000, 16000, 22050, 24000). If `None`, uses
model-specific default (22050 for v2, 24000 for v3).
*Deprecated in v0.0.105. Use `settings=SarvamTTSService.Settings(...)`
instead.*
Runtime-configurable settings. See [SarvamTTSService
Settings](#sarvamttsservice-settings) below.
### SarvamHttpTTSService
Sarvam AI API subscription key.
An aiohttp session for HTTP requests.
TTS model to use. Options: `bulbul:v2`, `bulbul:v3-beta`, `bulbul:v3`.
*Deprecated in v0.0.105. Use
`settings=SarvamHttpTTSService.Settings(model=...)` instead.*
Speaker voice ID. If `None`, uses the model-appropriate default. *Deprecated
in v0.0.105. Use `settings=SarvamHttpTTSService.Settings(voice=...)` instead.*
Sarvam AI API base URL.
Audio sample rate in Hz (8000, 16000, 22050, 24000). If `None`, uses
model-specific default.
*Deprecated in v0.0.105. Use `settings=SarvamHttpTTSService.Settings(...)`
instead.*
Runtime-configurable settings. See [SarvamHttpTTSService
Settings](#sarvamhttpttsservice-settings) below.
#### SarvamTTSService Settings
Runtime-configurable settings passed via the `settings` constructor argument using `SarvamTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------------------- | ----------------- | ----------- | -------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
| `enable_preprocessing` | `bool` | `NOT_GIVEN` | Enable text preprocessing. |
| `pace` | `float` | `NOT_GIVEN` | Pace of speech. |
| `pitch` | `float` | `NOT_GIVEN` | Pitch of speech. |
| `loudness` | `float` | `NOT_GIVEN` | Loudness of speech. |
| `temperature` | `float` | `NOT_GIVEN` | Temperature for speech synthesis. |
| `min_buffer_size` | `int` | `NOT_GIVEN` | Minimum buffer size for WebSocket. |
| `max_chunk_length` | `int` | `NOT_GIVEN` | Maximum chunk length for WebSocket. |
#### SarvamHttpTTSService Settings
Runtime-configurable settings passed via the `settings` constructor argument using `SarvamHttpTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------------------- | ----------------- | ----------- | -------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
| `enable_preprocessing` | `bool` | `NOT_GIVEN` | Enable text preprocessing. |
| `pace` | `float` | `NOT_GIVEN` | Pace of speech. |
| `pitch` | `float` | `NOT_GIVEN` | Pitch of speech. |
| `loudness` | `float` | `NOT_GIVEN` | Loudness of speech. |
| `temperature` | `float` | `NOT_GIVEN` | Temperature for speech synthesis. |
## Usage
### Basic Setup (WebSocket)
```python theme={null}
from pipecat.services.sarvam import SarvamTTSService
from pipecat.transcriptions.language import Language
tts = SarvamTTSService(
api_key=os.getenv("SARVAM_API_KEY"),
settings=SarvamTTSService.Settings(
voice="anushka",
model="bulbul:v2",
language=Language.HI,
),
)
```
### With v3 Model and Temperature Control
```python theme={null}
from pipecat.services.sarvam import SarvamTTSService
from pipecat.transcriptions.language import Language
tts = SarvamTTSService(
api_key=os.getenv("SARVAM_API_KEY"),
settings=SarvamTTSService.Settings(
voice="aditya",
model="bulbul:v3-beta",
language=Language.HI,
pace=1.2,
temperature=0.8,
),
)
```
### HTTP Service
```python theme={null}
import aiohttp
from pipecat.services.sarvam import SarvamHttpTTSService
from pipecat.transcriptions.language import Language
async with aiohttp.ClientSession() as session:
tts = SarvamHttpTTSService(
api_key=os.getenv("SARVAM_API_KEY"),
aiohttp_session=session,
settings=SarvamHttpTTSService.Settings(
voice="anushka",
model="bulbul:v2",
language=Language.HI,
pitch=0.1,
pace=1.2,
loudness=1.5,
),
)
```
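### Updating Settings at Runtime
Settings such as pace can be adjusted mid-conversation with `TTSUpdateSettingsFrame`. A minimal sketch, assuming a running `PipelineTask` named `task` and that a `SarvamTTSService.Settings` instance can be passed as the delta:
```python theme={null}
from pipecat.frames.frames import TTSUpdateSettingsFrame
from pipecat.services.sarvam import SarvamTTSService

# Slow the speech down a bit for the rest of the conversation.
await task.queue_frame(
    TTSUpdateSettingsFrame(
        delta=SarvamTTSService.Settings(
            pace=0.8,
        )
    )
)
```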
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **Model differences**: `bulbul:v2` supports pitch and loudness control; `bulbul:v3-beta` and `bulbul:v3` add temperature control but do not support pitch or loudness. Setting unsupported parameters for a model will log a warning.
* **Default speakers vary by model**: v2 defaults to `anushka`; v3 models default to `shubh`.
* **Default sample rates vary by model**: v2 defaults to 22050 Hz; v3 models default to 24000 Hz.
* **Indian language focus**: Sarvam AI specializes in Indian languages, supporting Bengali, English (India), Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, and Telugu.
* **Pace ranges differ**: `bulbul:v2` supports pace from 0.3 to 3.0, while v3 models support 0.5 to 2.0. Values outside the range are clamped automatically.
## Event Handlers
Sarvam WebSocket TTS supports the standard [service connection events](/api-reference/server/events/service-events):
| Event | Description |
| --------------------- | ----------------------------------- |
| `on_connected` | Connected to Sarvam WebSocket |
| `on_disconnected` | Disconnected from Sarvam WebSocket |
| `on_connection_error` | WebSocket connection error occurred |
```python theme={null}
@tts.event_handler("on_connected")
async def on_connected(service):
print("Connected to Sarvam")
```
# Smallest AI
Source: https://docs.pipecat.ai/api-reference/server/services/tts/smallest
Text-to-speech service using Smallest AI's WebSocket streaming API
## Overview
Smallest AI provides real-time text-to-speech synthesis through a WebSocket-based integration with their Waves API. The service supports configurable voice parameters, multiple languages, and handles interruptions by reconnecting the WebSocket.
Complete API reference for all parameters and methods
Complete example with WebSocket streaming
## Installation
```bash theme={null}
pip install "pipecat-ai[smallest]"
```
## Prerequisites
1. **Smallest AI Account**: Sign up at [Smallest AI](https://www.smallest.ai/)
2. **API Key**: Generate an API key from your account dashboard
Set the following environment variable:
```bash theme={null}
export SMALLEST_API_KEY=your_api_key
```
## Configuration
Smallest AI API key for authentication.
Base WebSocket URL for the Smallest API. Override for custom or proxied
deployments.
Output audio sample rate in Hz. When `None`, uses the pipeline's configured
sample rate.
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `SmallestTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ------------- | ----------------- | ---------------- | ------------------------------------------------------------------------------------------------------------------------- |
| `model` | `str` | `lightning-v3.1` | Model identifier: `lightning-v2` or `lightning-v3.1`. Model changes require WebSocket reconnection. *(Init-only setting)* |
| `voice` | `str` | `sophia` | Voice identifier. |
| `language` | `Language \| str` | `Language.EN` | Language code for synthesis. |
| `speed` | `float` | `None` | Speech speed multiplier. When `None`, uses API defaults. |
| `consistency` | `float` | `None` | Consistency level for voice generation (0-1). Only supported by `lightning-v2`. When `None`, uses API defaults. |
| `similarity` | `float` | `None` | Similarity level for voice generation (0-1). Only supported by `lightning-v2`. When `None`, uses API defaults. |
| `enhancement` | `int` | `None` | Enhancement level for voice generation (0-2). Only supported by `lightning-v2`. When `None`, uses API defaults. |
`None` values use the Smallest AI API defaults. The `consistency`,
`similarity`, and `enhancement` parameters are only supported by the
`lightning-v2` model.
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.smallest import SmallestTTSService
tts = SmallestTTSService(
api_key=os.getenv("SMALLEST_API_KEY"),
settings=SmallestTTSService.Settings(
voice="sophia",
),
)
```
### With Voice Customization
```python theme={null}
from pipecat.services.smallest import SmallestTTSService
from pipecat.transcriptions.language import Language
tts = SmallestTTSService(
api_key=os.getenv("SMALLEST_API_KEY"),
settings=SmallestTTSService.Settings(
voice="sophia",
language=Language.ES,
speed=1.2,
),
)
```
### Using Lightning V2 Model
```python theme={null}
from pipecat.services.smallest.tts import SmallestTTSModel
tts = SmallestTTSService(
api_key=os.getenv("SMALLEST_API_KEY"),
settings=SmallestTTSService.Settings(
model=SmallestTTSModel.LIGHTNING_V2,
voice="sophia",
consistency=0.7,
similarity=0.8,
enhancement=1,
),
)
```
### Updating Settings at Runtime
Voice settings can be changed mid-conversation using `TTSUpdateSettingsFrame`:
```python theme={null}
from pipecat.frames.frames import TTSUpdateSettingsFrame
from pipecat.services.smallest.tts import SmallestTTSSettings
await task.queue_frame(
TTSUpdateSettingsFrame(
delta=SmallestTTSSettings(
voice="new_voice",
speed=1.1,
)
)
)
```
Changing the `model` setting will trigger a WebSocket reconnection, which may
cause a brief interruption in service.
## Notes
* **WebSocket streaming**: The service uses WebSocket connections for real-time streaming. The connection is automatically managed and will reconnect if interrupted.
* **Keepalive**: The service sends periodic keepalive messages (every 30 seconds) to prevent idle timeouts on the WebSocket connection.
* **Model-specific parameters**: The `consistency`, `similarity`, and `enhancement` parameters are only effective when using the `lightning-v2` model. They are ignored by `lightning-v3.1`.
* **Language support**: Supports multiple languages including Arabic, Bengali, German, English, Spanish, French, Gujarati, Hebrew, Hindi, Italian, Kannada, Marathi, Dutch, Polish, Russian, and Tamil.
## Event Handlers
Smallest AI TTS supports the standard [service connection events](/api-reference/server/events/service-events):
| Event | Description |
| --------------------- | ------------------------------------ |
| `on_connected` | Connected to Smallest AI WebSocket |
| `on_disconnected` | Disconnected from Smallest WebSocket |
| `on_connection_error` | WebSocket connection error occurred |
```python theme={null}
@tts.event_handler("on_connected")
async def on_connected(service):
print("Connected to Smallest AI")
```
# Speechmatics
Source: https://docs.pipecat.ai/api-reference/server/services/tts/speechmatics
Text-to-speech service using Speechmatics TTS API
## Overview
`SpeechmaticsTTSService` provides production-grade, low-latency synthesis optimized for telephony and voice agents. It streams 16kHz mono audio, keeping bandwidth low while prioritizing pronunciation accuracy for natural, uninterrupted conversations at scale.
Pipecat's API methods for Speechmatics TTS integration
Complete example with Speechmatics TTS
Official Speechmatics TTS API documentation
Browse and test available voices
## Installation
To use Speechmatics services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[speechmatics]"
```
## Prerequisites
### Speechmatics Account Setup
Before using Speechmatics TTS services, you need:
1. **Speechmatics Account**: Sign up at [Speechmatics Portal](https://portal.speechmatics.com)
2. **API Key**: Generate an API key from your dashboard
3. **Voice Selection**: Choose from available [voices](https://docs.speechmatics.com/text-to-speech/quickstart#voices)
### Required Environment Variables
* `SPEECHMATICS_API_KEY`: Your Speechmatics API key for authentication
## Configuration
### SpeechmaticsTTSService
Speechmatics API key for authentication.
Base URL for Speechmatics TTS API.
Voice model to use for synthesis.
*Deprecated in v0.0.105. Use `settings=SpeechmaticsTTSService.Settings(...)` instead.*
An aiohttp session for HTTP requests.
Audio sample rate in Hz. Speechmatics TTS only supports 16kHz.
Runtime-configurable service settings. See [Settings](#settings) below.
*Deprecated in v0.0.105. Use `settings=SpeechmaticsTTSService.Settings(...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `SpeechmaticsTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ------------- | ----------------- | ----------- | ---------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
| `max_retries` | `int` | `NOT_GIVEN` | Maximum number of retries for synthesis. |
## Usage
### Basic Setup
```python theme={null}
import aiohttp
from pipecat.services.speechmatics import SpeechmaticsTTSService
async with aiohttp.ClientSession() as session:
tts = SpeechmaticsTTSService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
settings=SpeechmaticsTTSService.Settings(
voice="sarah",
),
aiohttp_session=session,
)
```
### With Custom Settings
```python theme={null}
import aiohttp
from pipecat.services.speechmatics import SpeechmaticsTTSService
async with aiohttp.ClientSession() as session:
tts = SpeechmaticsTTSService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
aiohttp_session=session,
settings=SpeechmaticsTTSService.Settings(
max_retries=3,
voice="sarah",
),
)
```
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **Fixed sample rate**: Speechmatics TTS only supports 16kHz output. Using a different sample rate may cause issues.
* **Automatic retry with backoff**: The service automatically retries on 503 (service unavailable) responses using exponential backoff, up to `max_retries` attempts.
* **HTTP-based service**: Speechmatics TTS uses HTTP streaming, so it does not have WebSocket connection events.
* **Requires aiohttp session**: You must create and manage an `aiohttp.ClientSession` yourself and pass it to the constructor.
# xAI
Source: https://docs.pipecat.ai/api-reference/server/services/tts/xai
Text-to-speech service using xAI's HTTP API with support for 20 languages
## Overview
xAI provides text-to-speech synthesis via an HTTP API with support for multiple languages and audio encoding formats.
Complete API reference for all parameters and methods
Complete example with interruption handling
Official xAI TTS API documentation
## Installation
```bash theme={null}
pip install "pipecat-ai[xai]"
```
## Prerequisites
1. **xAI Account**: Sign up at [xAI](https://x.ai/)
2. **API Key**: Generate an API key from your account dashboard (also works with Grok API keys)
Set the following environment variable:
```bash theme={null}
export GROK_API_KEY=your_api_key
```
## Configuration
### XAIHttpTTSService
xAI API key for authentication.
xAI TTS endpoint URL. Override for custom or proxied deployments.
Output audio sample rate in Hz. When `None`, uses the pipeline's configured
sample rate.
Output audio encoding format. Supported formats: `"pcm"`, `"mp3"`, `"wav"`,
`"mulaw"`, `"alaw"`.
Optional shared aiohttp session for HTTP requests. If `None`, the service
creates and manages its own session.
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `XAIHttpTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------- | ----------------- | ------------- | --------------------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited from base settings.)* |
| `voice` | `str` | `"eve"` | Voice identifier. *(Inherited from base settings.)* |
| `language` | `Language \| str` | `Language.EN` | Language code. *(Inherited from base settings.)* |
## Supported Languages
xAI TTS supports 20 languages. Use the `Language` enum from `pipecat.transcriptions.language`:
* Arabic (Egyptian, Saudi, UAE): `Language.AR`, `Language.AR_EG`, `Language.AR_SA`, `Language.AR_AE`
* Bengali: `Language.BN`
* Chinese: `Language.ZH`
* English: `Language.EN`
* French: `Language.FR`
* German: `Language.DE`
* Hindi: `Language.HI`
* Indonesian: `Language.ID`
* Italian: `Language.IT`
* Japanese: `Language.JA`
* Korean: `Language.KO`
* Portuguese (Brazil, Portugal): `Language.PT`, `Language.PT_BR`, `Language.PT_PT`
* Russian: `Language.RU`
* Spanish (Spain, Mexico): `Language.ES`, `Language.ES_ES`, `Language.ES_MX`
* Turkish: `Language.TR`
* Vietnamese: `Language.VI`
## Usage
### Basic Setup
```python theme={null}
import os
from pipecat.services.xai import XAIHttpTTSService
tts = XAIHttpTTSService(
api_key=os.getenv("GROK_API_KEY"),
settings=XAIHttpTTSService.Settings(
voice="eve",
),
)
```
### With Custom Language
```python theme={null}
from pipecat.services.xai import XAIHttpTTSService
from pipecat.transcriptions.language import Language
tts = XAIHttpTTSService(
api_key=os.getenv("GROK_API_KEY"),
settings=XAIHttpTTSService.Settings(
voice="eve",
language=Language.ES,
),
)
```
### With Custom Encoding
```python theme={null}
tts = XAIHttpTTSService(
api_key=os.getenv("GROK_API_KEY"),
encoding="mp3",
settings=XAIHttpTTSService.Settings(
voice="eve",
),
)
```
### With Shared HTTP Session
```python theme={null}
import aiohttp
async with aiohttp.ClientSession() as session:
tts = XAIHttpTTSService(
api_key=os.getenv("GROK_API_KEY"),
aiohttp_session=session,
settings=XAIHttpTTSService.Settings(
voice="eve",
),
)
```
### Updating Settings at Runtime
Voice settings can be changed mid-conversation using `TTSUpdateSettingsFrame`:
```python theme={null}
from pipecat.frames.frames import TTSUpdateSettingsFrame
from pipecat.services.xai.tts import XAITTSSettings
from pipecat.transcriptions.language import Language
await task.queue_frame(
TTSUpdateSettingsFrame(
delta=XAITTSSettings(
language=Language.FR,
)
)
)
```
## Notes
* **HTTP-only**: This service uses xAI's HTTP API. The service requests raw PCM audio by default, which matches Pipecat's downstream expectations without extra decoding.
* **Encoding options**: When using non-PCM encodings (`mp3`, `wav`, `mulaw`, `alaw`), ensure your audio pipeline can handle the selected format.
* **Automatic session management**: If you don't provide an `aiohttp_session`, the service creates and manages its own session lifecycle automatically.
# XTTS
Source: https://docs.pipecat.ai/api-reference/server/services/tts/xtts
Text-to-speech service implementation using Coqui's XTTS streaming server
Coqui, the XTTS maintainer, has shut down. XTTS may not receive future updates
or support.
## Overview
`XTTSService` provides multilingual voice synthesis with voice cloning capabilities through a locally hosted streaming server. The service supports real-time streaming and custom voice training using Coqui's XTTS-v2 model for cross-lingual text-to-speech.
Pipecat's API methods for XTTS integration
Complete example with voice cloning
Official XTTS streaming server repository
Learn about custom voice training
## Installation
XTTS requires a running streaming server. Start the server using Docker:
```bash theme={null}
docker run --gpus=all -e COQUI_TOS_AGREED=1 --rm -p 8000:80 \
ghcr.io/coqui-ai/xtts-streaming-server:latest-cuda121
```
## Prerequisites
### XTTS Server Setup
Before using XTTSService, you need:
1. **Docker Environment**: Set up Docker with GPU support for optimal performance
2. **XTTS Server**: Run the XTTS streaming server container
3. **Voice Models**: Configure voice models and cloning samples as needed
### Required Configuration
* **Server URL**: Configure the XTTS server endpoint (default: `http://localhost:8000`)
* **Voice Selection**: Set up voice models or voice cloning samples
GPU acceleration is recommended for optimal performance. The server requires
CUDA support for best results.
## Configuration
### XTTSService
ID of the studio speaker to use for synthesis. *Deprecated in v0.0.105. Use
`settings=XTTSService.Settings(voice=...)` instead.*
Base URL of the XTTS streaming server (e.g. `http://localhost:8000`).
An aiohttp session for HTTP requests to the XTTS server.
Language for synthesis. Supports Czech, German, English, Spanish, French,
Hindi, Hungarian, Italian, Japanese, Korean, Dutch, Polish, Portuguese,
Russian, Turkish, and Chinese. *Deprecated in v0.0.106. Use
`settings=XTTSService.Settings(language=...)` instead.*
Runtime-configurable settings. See [Settings](#settings) below.
Output audio sample rate in Hz. When `None`, uses the pipeline's configured
sample rate. Audio is automatically resampled from XTTS's native 24kHz output.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `XTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| ---------- | ----------------- | ------- | -------------------------------------- |
| `model` | `str` | `None` | Model identifier. *(Inherited.)* |
| `voice` | `str` | `None` | Voice identifier. *(Inherited.)* |
| `language` | `Language \| str` | `None` | Language for synthesis. *(Inherited.)* |
## Usage
### Basic Setup
```python theme={null}
import aiohttp
from pipecat.services.xtts import XTTSService
async with aiohttp.ClientSession() as session:
tts = XTTSService(
settings=XTTSService.Settings(
voice="Ana Florence",
),
base_url="http://localhost:8000",
aiohttp_session=session,
)
```
### With Language Configuration
```python theme={null}
import aiohttp
from pipecat.services.xtts import XTTSService
from pipecat.transcriptions.language import Language
async with aiohttp.ClientSession() as session:
tts = XTTSService(
settings=XTTSService.Settings(
voice="Ana Florence",
language=Language.ES,
),
base_url="http://localhost:8000",
aiohttp_session=session,
)
```
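### Listing Studio Speakers
The server's `/studio_speakers` endpoint lists the voices available for synthesis (see the Notes below); the configured `voice` must match one of these names. A minimal sketch for listing them, assuming the default server URL and that the endpoint returns a JSON object keyed by speaker name:
```python theme={null}
import aiohttp

async def list_studio_speakers(base_url: str = "http://localhost:8000") -> list[str]:
    # Ask the XTTS streaming server which studio speakers it has loaded.
    async with aiohttp.ClientSession() as session:
        async with session.get(f"{base_url}/studio_speakers") as resp:
            resp.raise_for_status()
            speakers = await resp.json()
    # Assuming the response is keyed by speaker name, these keys are the
    # values you can pass as the `voice` setting.
    return list(speakers.keys())
```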
The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
`Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **Local server required**: XTTS requires a locally running streaming server (via Docker). The service connects to this server over HTTP.
* **Studio speakers**: On startup, the service fetches available "studio speakers" from the server's `/studio_speakers` endpoint. The `voice_id` must match one of these speakers.
* **Audio resampling**: XTTS natively outputs audio at 24kHz. The service automatically resamples to match the pipeline's configured sample rate.
* **GPU recommended**: The XTTS server performs best with CUDA-enabled GPU acceleration. CPU inference is significantly slower.
* **No API key required**: XTTS runs locally, so no external API credentials are needed.
# HeyGen
Source: https://docs.pipecat.ai/api-reference/server/services/video/heygen
AI avatar video generation service for creating interactive conversational avatars
## Overview
`HeyGenVideoService` integrates with HeyGen [LiveAvatar](https://www.liveavatar.com/) to create interactive AI-powered video avatars that respond naturally in real-time conversations. The service handles bidirectional audio/video streaming, avatar animations, voice activity detection, and conversation interruptions to deliver engaging conversational AI experiences with lifelike visual presence.
Pipecat's API methods for HeyGen video integration
Complete example with interactive avatar
Official HeyGen API documentation and guides
Access interactive avatars and API keys
## Installation
To use HeyGen services, install the required dependency:
```bash theme={null}
pip install "pipecat-ai[heygen]"
```
## Prerequisites
### HeyGen Account Setup
Before using HeyGen video services, you need:
1. **HeyGen Account**: Sign up at [HeyGen Platform](https://app.liveavatar.com/signin)
2. **API Key**: Generate an API key from your account dashboard
3. **Avatar Selection**: Choose from available interactive avatars
4. **Streaming Setup**: Configure real-time avatar streaming capabilities
### Required Environment Variables
* `HEYGEN_LIVE_AVATAR_API_KEY`: Your HeyGen LiveAvatar API key for authentication
## Configuration
HeyGen API key for authentication.
HTTP client session for API requests.
Configuration for the HeyGen session. When `None`, defaults to using the
`"Shawn_Therapist_public"` avatar.
Service type for the avatar session.
Runtime-configurable settings. HeyGen has no model-level settings, so this is
primarily used for the `extra` dict. See [Service
Settings](/pipecat/fundamentals/service-settings) for details.
## Usage
### Basic Setup
```python theme={null}
import aiohttp
from pipecat.services.heygen import HeyGenVideoService
async with aiohttp.ClientSession() as session:
heygen = HeyGenVideoService(
api_key=os.getenv("HEYGEN_LIVE_AVATAR_API_KEY"),
session=session,
)
```
### With Custom Session Request
```python theme={null}
from pipecat.services.heygen.api_liveavatar import LiveAvatarNewSessionRequest
heygen = HeyGenVideoService(
api_key=os.getenv("HEYGEN_LIVE_AVATAR_API_KEY"),
session=session,
session_request=LiveAvatarNewSessionRequest(
avatar_id="your_avatar_id",
version="v2",
),
)
```
## Notes
* **Bidirectional streaming**: The service manages both sending audio to HeyGen and receiving avatar video/audio back through WebRTC.
* **Interruption handling**: When a user starts speaking, the service interrupts the avatar's current speech, cancels ongoing audio tasks, and activates the avatar's listening animation.
* **Metrics support**: The service supports TTFB metrics tracking using `TTSStartedFrame` and `BotStartedSpeakingFrame` signals.
# Simli
Source: https://docs.pipecat.ai/api-reference/server/services/video/simli
Real-time AI avatar video generation service using WebRTC streaming
## Overview
`SimliVideoService` integrates with Simli to create real-time AI avatar video experiences using WebRTC streaming. The service processes audio input to generate synchronized avatar video and audio output, handling real-time streaming, audio resampling, and conversation interruptions for engaging conversational AI applications.
Pipecat's API methods for Simli video integration
Complete example with avatar streaming
Official Simli API documentation and guides
Access avatar faces and manage API keys
## Installation
To use Simli services, install the required dependency:
```bash theme={null}
pip install "pipecat-ai[simli]"
```
## Prerequisites
### Simli Account Setup
Before using Simli video services, you need:
1. **Simli Account**: Sign up at [Simli Platform](https://www.simli.com/)
2. **API Key**: Generate an API key from your account dashboard
3. **Face Selection**: Choose or create avatar faces for video generation
4. **WebRTC Setup**: Configure real-time streaming capabilities
### Required Environment Variables
* `SIMLI_API_KEY`: Your Simli API key for authentication
* `SIMLI_FACE_ID`: ID of your avatar face
## Configuration
Simli API key for authentication.
Simli Face ID. For Trinity avatars, specify `"faceId/emotionId"` to use a
different emotion than the default.
URL of the Simli servers. Can be changed for custom deployments by enterprise
users.
Whether this is a Trinity avatar. Enabling this reduces latency for Trinity avatars.
Absolute maximum session duration in seconds. Avatar disconnects after this
time even if speaking.
Maximum duration in seconds the avatar can be idle (not speaking) before
disconnecting.
Whether to enable Simli logging.
Service settings. Use `SimliVideoService.Settings(...)` for configuration.
**Deprecated since 0.0.106**: Use the direct constructor parameters
(`max_session_length`, `max_idle_time`, `enable_logging`) or
`settings=SimliVideoService.Settings(...)` instead.
Additional input parameters for session configuration. See
[InputParams](#inputparams) below.
### InputParams
**Deprecated since 0.0.106**: Use the direct constructor parameters
(`max_session_length`, `max_idle_time`, `enable_logging`) instead of
`SimliVideoService.InputParams`.
| Parameter | Type | Default | Description |
| -------------------- | ------ | ------- | -------------------------------------------------------------------------------------------------- |
| `enable_logging` | `bool` | `None` | Whether to enable Simli logging. |
| `max_session_length` | `int` | `None` | Absolute maximum session duration in seconds. Avatar disconnects after this time even if speaking. |
| `max_idle_time` | `int` | `None` | Maximum duration in seconds the avatar can be idle (not speaking) before disconnecting. |
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.simli import SimliVideoService
simli = SimliVideoService(
api_key=os.getenv("SIMLI_API_KEY"),
face_id=os.getenv("SIMLI_FACE_ID"),
)
```
### With Session Configuration
```python theme={null}
simli = SimliVideoService(
api_key=os.getenv("SIMLI_API_KEY"),
face_id=os.getenv("SIMLI_FACE_ID"),
is_trinity_avatar=True,
max_session_length=600,
max_idle_time=120,
enable_logging=True,
)
```
### Using Settings (Alternative)
```python theme={null}
simli = SimliVideoService(
api_key=os.getenv("SIMLI_API_KEY"),
face_id=os.getenv("SIMLI_FACE_ID"),
settings=SimliVideoService.Settings(),
)
```
## Notes
* **Service architecture**: As of version 0.0.106, `SimliVideoService` extends `AIService` and supports `SimliVideoService.Settings(...)` for configuration, aligning it with other video services like HeyGen and Tavus.
* **Audio resampling**: The service resamples audio to 16kHz internally for the Simli API and resamples received audio back to the pipeline's sample rate.
* **Trinity avatars**: When `is_trinity_avatar=True`, the service uses `playImmediate` for the first audio chunk after an interruption to reduce latency.
* **Deprecated parameters**:
* `SimliVideoService.InputParams` is deprecated since 0.0.106. Use direct constructor parameters (`max_session_length`, `max_idle_time`, `enable_logging`) instead.
* The `simli_config` and `use_turn_server` parameters are deprecated. Use `api_key` and `face_id` instead of `simli_config`.
# Tavus
Source: https://docs.pipecat.ai/api-reference/server/services/video/tavus
AI avatar video generation service for creating realistic talking avatars
## Overview
`TavusVideoService` integrates with Tavus to generate AI-powered video avatars that speak your text-to-speech output in real-time. The service takes audio input and produces synchronized video of a realistic avatar speaking, enabling engaging conversational AI experiences with visual presence.
Pipecat's API methods for Tavus video integration
Complete example with avatar video generation
Official Tavus replica and avatar documentation
Create avatars and manage API keys
## Installation
To use Tavus services, install the required dependency:
```bash theme={null}
pip install "pipecat-ai[tavus]"
```
## Prerequisites
### Tavus Account Setup
Before using Tavus video services, you need:
1. **Tavus Account**: Sign up at [Tavus Platform](https://platform.tavus.io/auth/sign-up?plan=free)
2. **API Key**: Generate an API key from your account dashboard
3. **Replica Creation**: Create and train voice replicas for your avatars
4. **Avatar Selection**: Choose or create avatar models for video generation
### Required Environment Variables
* `TAVUS_API_KEY`: Your Tavus API key for authentication
* `TAVUS_REPLICA_ID`: ID of your trained voice replica
## Configuration
Tavus API key for authentication.
ID of the Tavus voice replica to use for speech synthesis.
ID of the Tavus persona. Defaults to `"pipecat-stream"` for Pipecat TTS voice.
Async HTTP session used for communication with Tavus.
Runtime-configurable settings. Tavus has no model-level settings, so this is
primarily used for the `extra` dict. See [Service
Settings](/pipecat/fundamentals/service-settings) for details.
## Usage
### Basic Setup
```python theme={null}
import aiohttp
from pipecat.services.tavus import TavusVideoService
async with aiohttp.ClientSession() as session:
tavus = TavusVideoService(
api_key=os.getenv("TAVUS_API_KEY"),
replica_id=os.getenv("TAVUS_REPLICA_ID"),
session=session,
)
```
### With Custom Persona
```python theme={null}
tavus = TavusVideoService(
api_key=os.getenv("TAVUS_API_KEY"),
replica_id=os.getenv("TAVUS_REPLICA_ID"),
persona_id="my-custom-persona",
session=session,
)
```
## Notes
* **Dual room architecture**: When used with DailyTransport, Tavus creates two virtual rooms: a Tavus room (containing the avatar and the Pipecat bot) and a user room (containing the Pipecat bot and the user).
* **Interruption handling**: The service handles interruptions by resetting the audio send task and sending an interrupt message to the Tavus client.
* **Metrics support**: The service supports TTFB metrics tracking using `TTSStartedFrame` and `BotStartedSpeakingFrame` signals.
# Moondream
Source: https://docs.pipecat.ai/api-reference/server/services/vision/moondream
Vision service implementation using Moondream for local image analysis and question answering
## Overview
`MoondreamService` provides local image analysis and question-answering capabilities using the Moondream model. It runs entirely on your local machine, supporting various hardware acceleration options including CUDA, Intel XPU, and Apple MPS for privacy-focused computer vision applications.
Pipecat's API methods for Moondream vision integration
Browse examples using Moondream vision
Official Moondream model documentation
Access Moondream model on Hugging Face
## Installation
To use Moondream services, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[moondream]"
```
## Prerequisites
### Local Model Setup
Before using Moondream vision services, you need:
1. **Model Download**: First run will automatically download the Moondream model from Hugging Face
2. **Hardware Configuration**: Set up CUDA, Intel XPU, or Apple MPS for optimal performance
3. **Storage Space**: Ensure sufficient disk space for model files
4. **Memory Requirements**: Adequate RAM/VRAM for model inference
### Hardware Acceleration
The service automatically detects and uses the best available hardware:
* **Intel XPU**: Requires intel\_extension\_for\_pytorch
* **NVIDIA CUDA**: For GPU acceleration
* **Apple Metal (MPS)**: For Apple Silicon optimization
* **CPU**: Fallback option for any system
### Configuration Options
* **Model Selection**: Choose Moondream model version and revision
* **Hardware Override**: Force CPU usage if needed
* **Local Processing**: Complete privacy with no external API calls
No API keys required - Moondream runs entirely locally for complete privacy
and control.
## Configuration
Hugging Face model identifier for the Moondream model. *Deprecated in
v0.0.105. Use `settings=MoondreamService.Settings(model=...)` instead.*
Specific model revision to use.
Whether to force CPU usage instead of hardware acceleration. When `False`, the
service automatically detects and uses the best available device (Intel XPU,
CUDA, MPS, or CPU).
Runtime-configurable settings. See [Settings](#settings) below.
### Settings
Runtime-configurable settings passed via the `settings` constructor argument using `MoondreamService.Settings(...)`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter | Type | Default | Description |
| --------- | ----- | ----------- | ------------------------------------------------------------- |
| `model` | `str` | `NOT_GIVEN` | Moondream model identifier. *(Inherited from base settings.)* |
`NOT_GIVEN` values are omitted, letting the service use its own defaults
(`"vikhyatk/moondream2"` for model). Only parameters that are explicitly set
are included.
## Usage
### Basic Setup
```python theme={null}
from pipecat.services.moondream import MoondreamService
vision = MoondreamService()
```
### With Settings and CPU Override
```python theme={null}
vision = MoondreamService(
    revision="2025-01-09",
    use_cpu=True,
    settings=MoondreamService.Settings(
        model="vikhyatk/moondream2",
    ),
)
```
The deprecated `model` constructor parameter is replaced by `Settings` as of
v0.0.105. Use `Settings` / `settings=` instead. See the [Service Settings
guide](/pipecat/fundamentals/service-settings) for migration details.
## Notes
* **First-run download**: The model is automatically downloaded from Hugging Face on first use. Ensure sufficient disk space and network access.
* **Hardware auto-detection**: When `use_cpu=False` (the default), the service detects available hardware in this priority order: Intel XPU, NVIDIA CUDA, Apple Metal (MPS), then CPU.
* **Data types**: CUDA and MPS use `float16` for faster inference, while XPU and CPU use `float32`.
* **Blocking inference**: Image analysis runs in a separate thread via `asyncio.to_thread` to avoid blocking the event loop.
# AICFilter
Source: https://docs.pipecat.ai/api-reference/server/utilities/audio/aic-filter
Speech enhancement using ai-coustics' SDK
## Overview
`AICFilter` is an audio processor that enhances user speech by reducing background noise and improving speech clarity. It inherits from `BaseAudioFilter` and processes audio frames in real-time using ai-coustics' speech enhancement technology.
To use AIC, you need a license key. Get started at [ai-coustics.com](https://ai-coustics.com/pipecat).
This documentation covers **aic-sdk v2.x**. If you're using aic-sdk v1.x,
please see the [Migration Guide](#migration-guide-v1-to-v2) section below for
upgrading instructions.
## Installation
The AIC filter requires additional dependencies:
```bash theme={null}
pip install "pipecat-ai[aic]"
```
## Constructor Parameters
ai-coustics license key for authentication. Get your key at
[developers.ai-coustics.io](https://developers.ai-coustics.io).
Model identifier to download from CDN. Required if `model_path` is not provided.
See [artifacts.ai-coustics.io](https://artifacts.ai-coustics.io/) for available models.
See the [documentation](https://docs.ai-coustics.com/guides/models) for more detailed information about the models.
Examples: `"quail-vf-2.0-l-16khz"`, `"quail-vf-l-16khz"`, `"quail-s-16khz"`, `"quail-l-8khz"`
Path to a local `.aicmodel` file. If provided, `model_id` is ignored and no
download occurs. Useful for offline deployments or custom models.
Directory for downloading and caching models. Defaults to a cache directory in
the user's home folder.
Overall enhancement strength from `0.0` (no enhancement) to `1.0` (maximum
enhancement). If `None`, the model's default behavior is used. This parameter
allows you to control the intensity of the speech enhancement applied by the
model.
## Methods
### create\_vad\_analyzer
Creates an `AICVADAnalyzer` that uses the AIC model's built-in voice activity detection.
```python theme={null}
def create_vad_analyzer(
*,
speech_hold_duration: Optional[float] = None,
minimum_speech_duration: Optional[float] = None,
sensitivity: Optional[float] = None,
) -> AICVADAnalyzer
```
#### VAD Parameters
Controls how long the VAD continues to detect speech after the audio
signal no longer contains speech (in seconds). Range: `0.0` to `100x model
window length`, Default (in SDK): `0.05s`
Controls how long speech needs to be present in the audio signal before
the VAD considers it speech (in seconds). Range: `0.0` to `1.0`, Default (in
SDK): `0.0s`
Controls the sensitivity (energy threshold) of the VAD. This value is used by
the VAD as the threshold a speech audio signal's energy has to exceed in order
to be considered speech. Formula: `Energy threshold = 10 ** (-sensitivity)`
Range: `1.0` to `15.0`, Default (in SDK): `6.0`
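For example, the SDK default sensitivity of `6.0` corresponds to an energy threshold of `1e-06`; raising the sensitivity lowers the threshold, so quieter audio is treated as speech:
```python theme={null}
# Energy threshold implied by the sensitivity value (formula above)
sensitivity = 6.0                        # SDK default
energy_threshold = 10 ** (-sensitivity)  # 1e-06
```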
### get\_vad\_context
Returns the VAD context once the processor is initialized. Can be used to dynamically adjust VAD parameters at runtime.
```python theme={null}
vad_ctx = aic_filter.get_vad_context()
vad_ctx.set_parameter(VadParameter.Sensitivity, 8.0)
```
## Input Frames
Specific control frame to toggle filtering on/off
```python theme={null}
from pipecat.frames.frames import FilterEnableFrame
# Disable speech enhancement
await task.queue_frame(FilterEnableFrame(False))
# Re-enable speech enhancement
await task.queue_frame(FilterEnableFrame(True))
```
## Usage Examples
### Basic Usage with AIC VAD
The recommended approach is to use `AICFilter` with its built-in VAD analyzer:
```python theme={null}
import os

from pipecat.audio.filters.aic_filter import AICFilter
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.transports.services.daily import DailyTransport, DailyParams

# Create the AIC filter
aic_filter = AICFilter(
    license_key=os.environ["AIC_SDK_LICENSE"],
    model_id="quail-vf-2.0-l-16khz",
)

# Use AIC's integrated VAD
transport = DailyTransport(
    room_url,
    token,
    "Bot",
    DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        audio_in_filter=aic_filter,
    ),
)

user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        vad_analyzer=aic_filter.create_vad_analyzer(
            speech_hold_duration=0.05,
            minimum_speech_duration=0.0,
            sensitivity=6.0,
        ),
    ),
)
```
### Using a Local Model
For offline deployments or when you want to manage model files yourself:
```python theme={null}
import os

from pipecat.audio.filters.aic_filter import AICFilter

aic_filter = AICFilter(
    license_key=os.environ["AIC_SDK_LICENSE"],
    model_path="/path/to/your/model.aicmodel",
)
```
### Custom Cache Directory
Specify a custom directory for model downloads:
```python theme={null}
import os

from pipecat.audio.filters.aic_filter import AICFilter

aic_filter = AICFilter(
    license_key=os.environ["AIC_SDK_LICENSE"],
    model_id="quail-s-16khz",
    model_download_dir="/opt/aic-models",
)
```
### With Enhancement Level Control
Control the enhancement strength applied by the model:
```python theme={null}
import os

from pipecat.audio.filters.aic_filter import AICFilter

# Set enhancement level to 70% strength
aic_filter = AICFilter(
    license_key=os.environ["AIC_SDK_LICENSE"],
    model_id="quail-vf-l-16khz",
    enhancement_level=0.7,
)

# Use default model behavior (no enhancement_level specified)
aic_filter_default = AICFilter(
    license_key=os.environ["AIC_SDK_LICENSE"],
    model_id="quail-vf-l-16khz",
)
```
### With Other Transports
The AIC filter works with any Pipecat transport:
```python theme={null}
import os

from pipecat.audio.filters.aic_filter import AICFilter
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.transports.websocket import FastAPIWebsocketTransport, FastAPIWebsocketParams

aic_filter = AICFilter(
    license_key=os.environ["AIC_SDK_LICENSE"],
    model_id="quail-vf-2.0-l-16khz",
)

transport = FastAPIWebsocketTransport(
    params=FastAPIWebsocketParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        audio_in_filter=aic_filter,
    ),
)

user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        vad_analyzer=aic_filter.create_vad_analyzer(
            speech_hold_duration=0.05,
            sensitivity=6.0,
        ),
    ),
)
```
See the [AIC filter
example](https://github.com/pipecat-ai/pipecat/blob/main/examples/voice/voice-aicoustics.py)
for a complete working example.
## Models
For detailed information about the available models, take a look at the [Models documentation](https://docs.ai-coustics.com/guides/models).
## Audio Flow
```mermaid theme={null}
graph TD
A[AudioRawFrame] --> B[AICFilter]
B --> C[AICVADAnalyzer]
C --> D[STT]
```
The AIC filter enhances audio before it reaches the VAD and STT stages, improving transcription accuracy in noisy environments.
## Migration Guide (v1 to v2)
For the complete aic-sdk migration guide including all API changes, see the
official [Python 1.3 to 2.0 Migration
Guide](https://docs.ai-coustics.com/guides/migrations/python-1-3-to-2-0#quick-migration-checklist).
### Migration Steps
1. Update Pipecat to the latest version (aic-sdk v2.x is included automatically).
2. Remove deprecated constructor parameters (`model_type`, `voice_gain`, `noise_gate_enable`).
3. Add `model_id` parameter with an appropriate model (e.g., `"quail-vf-l-16khz"`).
4. Update any runtime VAD adjustments to use the new VAD context API.
5. We recommend using `aic_filter.create_vad_analyzer()` for improved accuracy.
### Breaking Changes
| v1 Parameter | v2 Replacement |
| ------------------- | --------------------------------------------------------------------------------------- |
| `model_type` | `model_id` (string-based model selection) |
| `enhancement_level` | Now optional (0.0-1.0 range, applies at initialization and when toggling filter on/off) |
| `voice_gain` | Removed |
| `noise_gate_enable` | Removed |
## Notes
* Requires ai-coustics license key (get one at [developers.ai-coustics.io](https://developers.ai-coustics.io))
* Voice Focus 2.0 models are supported with aic-sdk 2.1.0+ (included in pipecat-ai\[aic])
* Models are automatically downloaded and cached on first use
* Supports real-time audio processing with low latency
* Handles PCM\_16 audio format (int16 samples)
* Thread-safe for pipeline processing
* Can be dynamically enabled/disabled via `FilterEnableFrame`
* Integrated VAD provides better accuracy than standalone VAD when using enhancement
* For available models, visit [artifacts.ai-coustics.io](https://artifacts.ai-coustics.io/)
# AudioBufferProcessor
Source: https://docs.pipecat.ai/api-reference/server/utilities/audio/audio-buffer-processor
Process and buffer audio frames from conversations with flexible event handling
## Overview
The `AudioBufferProcessor` captures and buffers audio frames from both input (user) and output (bot) sources during conversations. It provides synchronized audio streams with configurable sample rates, supports both mono and stereo output, and offers flexible event handlers for various audio processing workflows.
## Constructor
```python theme={null}
AudioBufferProcessor(
    sample_rate=None,
    num_channels=1,
    buffer_size=0,
    enable_turn_audio=False,
    **kwargs
)
```
### Parameters
The desired output sample rate in Hz. If `None`, uses the transport's sample
rate from the `StartFrame`.
Number of output audio channels:
* `1`: Mono output (user and bot audio are mixed together)
* `2`: Stereo output (user audio on left channel, bot audio on right channel)
Buffer size in bytes that triggers audio data events:
* `0`: Events only trigger when recording stops
* `>0`: Events trigger whenever buffer reaches this size (useful for chunked processing)
Whether to enable per-turn audio event handlers (`on_user_turn_audio_data` and
`on_bot_turn_audio_data`).
**Deprecated since version 0.0.72.** This parameter no longer has any effect
and will be removed in a future version.
## Properties
### sample\_rate
```python theme={null}
@property
def sample_rate(self) -> int
```
The current sample rate of the audio processor in Hz.
### num\_channels
```python theme={null}
@property
def num_channels(self) -> int
```
The number of channels in the audio output (1 for mono, 2 for stereo).
## Methods
### start\_recording()
```python theme={null}
async def start_recording()
```
Start recording audio from both user and bot sources. Initializes recording state and resets audio buffers.
### stop\_recording()
```python theme={null}
async def stop_recording()
```
Stop recording and trigger final audio data handlers with any remaining buffered audio.
### has\_audio()
```python theme={null}
def has_audio() -> bool
```
Check if both user and bot audio buffers contain data.
**Returns:** `True` if both buffers contain audio data.
## Event Handlers
The processor supports multiple event handlers for different audio processing workflows. Register handlers using the `@processor.event_handler()` decorator.
### on\_audio\_data
Triggered when `buffer_size` is reached or recording stops, providing merged audio.
```python theme={null}
@audiobuffer.event_handler("on_audio_data")
async def on_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int):
    # Handle merged audio data
    pass
```
**Parameters:**
* `buffer`: The AudioBufferProcessor instance
* `audio`: Merged audio data (format depends on `num_channels` setting)
* `sample_rate`: Sample rate in Hz
* `num_channels`: Number of channels (1 or 2)
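For example, a handler can write the merged audio to a WAV file. A minimal sketch using Python's standard `wave` module, assuming the default `buffer_size=0` so the event fires once when recording stops (the output filename is illustrative):
```python theme={null}
import wave

@audiobuffer.event_handler("on_audio_data")
async def save_audio(buffer, audio: bytes, sample_rate: int, num_channels: int):
    with wave.open("conversation.wav", "wb") as wf:
        wf.setnchannels(num_channels)
        wf.setsampwidth(2)  # 16-bit PCM samples are 2 bytes wide
        wf.setframerate(sample_rate)
        wf.writeframes(audio)
```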
### on\_track\_audio\_data
Triggered alongside `on_audio_data`, providing separate user and bot audio tracks.
```python theme={null}
@audiobuffer.event_handler("on_track_audio_data")
async def on_track_audio_data(
    buffer, user_audio: bytes, bot_audio: bytes, sample_rate: int, num_channels: int
):
    # Handle separate audio tracks
    pass
```
**Parameters:**
* `buffer`: The AudioBufferProcessor instance
* `user_audio`: Raw user audio bytes (always mono)
* `bot_audio`: Raw bot audio bytes (always mono)
* `sample_rate`: Sample rate in Hz
* `num_channels`: Always 1 for individual tracks
### on\_user\_turn\_audio\_data
Triggered when a user speaking turn ends. Requires `enable_turn_audio=True`.
```python theme={null}
@audiobuffer.event_handler("on_user_turn_audio_data")
async def on_user_turn_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int):
    # Handle user turn audio
    pass
```
**Parameters:**
* `buffer`: The AudioBufferProcessor instance
* `audio`: Audio data from the user's speaking turn
* `sample_rate`: Sample rate in Hz
* `num_channels`: Always 1 (mono)
### on\_bot\_turn\_audio\_data
Triggered when a bot speaking turn ends. Requires `enable_turn_audio=True`.
```python theme={null}
@audiobuffer.event_handler("on_bot_turn_audio_data")
async def on_bot_turn_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int):
    # Handle bot turn audio
    pass
```
**Parameters:**
* `buffer`: The AudioBufferProcessor instance
* `audio`: Audio data from the bot's speaking turn
* `sample_rate`: Sample rate in Hz
* `num_channels`: Always 1 (mono)
## Audio Processing Features
* **Automatic resampling**: Converts incoming audio to the specified sample rate
* **Buffer synchronization**: Aligns user and bot audio streams temporally
* **Silence insertion**: Fills gaps in non-continuous audio streams to maintain timing
* **Turn tracking**: Monitors speaking turns when `enable_turn_audio=True`
## Integration Notes
### STT Audio Passthrough
If using an STT service in your pipeline, enable audio passthrough to make audio available to the AudioBufferProcessor:
```python theme={null}
stt = DeepgramSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    audio_passthrough=True,
)
```
`audio_passthrough` is enabled by default.
### Pipeline Placement
Add the AudioBufferProcessor after `transport.output()` to capture both user and bot audio:
```python theme={null}
pipeline = Pipeline([
    transport.input(),
    # ... other processors ...
    transport.output(),
    audiobuffer,  # Place after audio output
    # ... remaining processors ...
])
```
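Recording is started and stopped explicitly. A common pattern is to tie `start_recording()` and `stop_recording()` to transport lifecycle events; a sketch assuming a transport that emits `on_client_connected` and `on_client_disconnected`:
```python theme={null}
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
    await audiobuffer.start_recording()

@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
    await audiobuffer.stop_recording()
```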
# KoalaFilter
Source: https://docs.pipecat.ai/api-reference/server/utilities/audio/koala-filter
Audio noise reduction filter using Koala AI technology from Picovoice
## Overview
`KoalaFilter` is an audio processor that reduces background noise in real-time audio streams using Koala Noise Suppression technology from Picovoice. It inherits from `BaseAudioFilter` and processes audio frames to improve audio quality by removing unwanted noise.
To use Koala, you need a Picovoice access key. Get started at [Picovoice Console](https://console.picovoice.ai/signup).
## Installation
The Koala filter requires additional dependencies:
```bash theme={null}
pip install "pipecat-ai[koala]"
```
You'll also need to set up your Koala access key as an environment variable: `KOALA_ACCESS_KEY`
## Constructor Parameters
Picovoice access key for using the Koala noise suppression service
## Input Frames
Specific control frame to toggle filtering on/off
```python theme={null}
from pipecat.frames.frames import FilterEnableFrame
# Disable noise reduction
await task.queue_frame(FilterEnableFrame(False))
# Re-enable noise reduction
await task.queue_frame(FilterEnableFrame(True))
```
## Usage Example
```python theme={null}
import os

from pipecat.audio.filters.koala_filter import KoalaFilter
from pipecat.transports.services.daily import DailyTransport, DailyParams

transport = DailyTransport(
    room_url,
    token,
    "Respond bot",
    DailyParams(
        audio_in_filter=KoalaFilter(access_key=os.getenv("KOALA_ACCESS_KEY")),  # Enable Koala noise reduction
        audio_in_enabled=True,
        audio_out_enabled=True,
    ),
)
```
## Notes
* Requires Picovoice access key
* Supports real-time audio processing
* Handles 16-bit PCM audio format
* Can be dynamically enabled/disabled
* Maintains audio quality while reducing noise
* Efficient processing for low latency
* Automatically handles audio frame buffering
* Sample rate must match Koala's required sample rate
# KrispVivaFilter
Source: https://docs.pipecat.ai/api-reference/server/utilities/audio/krisp-viva-filter
Audio voice isolation filter using Krisp VIVA model
## Overview
`KrispVivaFilter` is an audio processor that isolates the user's voice in real-time audio streams using Krisp VIVA SDK. It inherits from `BaseAudioFilter` and processes audio frames to improve audio quality by filtering out background noise and other voices using Krisp's voice isolation algorithms.
To use Krisp, you need a Krisp SDK license. Get started at [Krisp.ai](https://krisp.ai/developers/).
## Installation
See the [Krisp guide](/pipecat/features/krisp-viva) to learn how to install the Krisp VIVA SDK.
## Environment Variables
You need to provide the path to the Krisp model file (.kef extension). This can either be done by setting the `KRISP_VIVA_MODEL_PATH` environment variable or by setting the `model_path` in the constructor.
For SDK v1.6.1+, you also need to provide a Krisp API key via the `api_key` constructor parameter or the `KRISP_VIVA_API_KEY` environment variable.
## Constructor Parameters
Path to the Krisp model file (.kef extension).
You can set the `model_path` directly. Alternatively, you can set the `KRISP_VIVA_MODEL_PATH` environment variable to the model file path.
Voice isolation level for the filter
Krisp SDK API key for licensing (required for SDK v1.6.1+). If empty, falls back to the `KRISP_VIVA_API_KEY` environment variable.
## Supported Sample Rates
The filter supports the following sample rates:
* 8000 Hz
* 16000 Hz
* 24000 Hz
* 32000 Hz
* 44100 Hz
* 48000 Hz
## Input Frames
Specific control frame to toggle filtering on/off
```python theme={null}
from pipecat.frames.frames import FilterEnableFrame
# Disable voice isolation
await task.queue_frame(FilterEnableFrame(False))
# Re-enable voice isolation
await task.queue_frame(FilterEnableFrame(True))
```
## Usage Example
```python theme={null}
from pipecat.audio.filters.krisp_viva_filter import KrispVivaFilter
from pipecat.transports.daily.transport import DailyParams, DailyTransport
transport = DailyTransport(
    room_url,
    token,
    "Respond bot",
    DailyParams(
        audio_in_enabled=True,
        audio_in_filter=KrispVivaFilter(),  # Enable Krisp voice isolation
        audio_out_enabled=True,
    ),
)
```
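You can also configure the filter explicitly instead of relying on environment variables; a minimal sketch:
```python theme={null}
import os

from pipecat.audio.filters.krisp_viva_filter import KrispVivaFilter

krisp_filter = KrispVivaFilter(
    model_path="/path/to/model.kef",
    api_key=os.getenv("KRISP_VIVA_API_KEY"),  # required for SDK v1.6.1+
)
```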
# KrispVivaVadAnalyzer
Source: https://docs.pipecat.ai/api-reference/server/utilities/audio/krisp-viva-vad-analyzer
Voice Activity Detection analyzer using the Krisp VIVA SDK
## Overview
`KrispVivaVadAnalyzer` is a Voice Activity Detection (VAD) analyzer that uses the Krisp VIVA SDK to detect speech in audio streams. It provides high-accuracy speech detection with support for multiple sample rates.
## Installation
```bash theme={null}
pip install "pipecat-ai[krisp]"
```
## Prerequisites
You need a Krisp VIVA VAD model file (`.kef` extension). Set the model path via:
* The `model_path` constructor parameter, or
* The `KRISP_VIVA_VAD_MODEL_PATH` environment variable
## Constructor Parameters
Path to the Krisp model file (`.kef` extension). If not provided, uses the
`KRISP_VIVA_VAD_MODEL_PATH` environment variable.
Frame duration in milliseconds. Must be 10, 15, 20, 30, or 32ms.
Audio sample rate in Hz. Must be 8000, 16000, 32000, 44100, or 48000.
Voice Activity Detection parameters object
Confidence threshold for speech detection. Higher values make detection more strict. Must
be between 0 and 1.
Time in seconds that speech must be detected before transitioning to SPEAKING state.
Time in seconds of silence required before transitioning back to QUIET state.
Minimum audio volume threshold for speech detection. Must be between 0 and 1.
## Usage Example
```python theme={null}
from pipecat.audio.vad.krisp_viva_vad import KrispVivaVadAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        vad_analyzer=KrispVivaVadAnalyzer(
            model_path="/path/to/model.kef",
            params=VADParams(stop_secs=0.2),
        ),
    ),
)
```
## Technical Details
### Sample Rate Requirements
The analyzer supports five sample rates:
* 8000 Hz
* 16000 Hz
* 32000 Hz
* 44100 Hz
* 48000 Hz
### Model Requirements
* Model files must have a `.kef` extension
* Model path can be specified via constructor or environment variable
* Model is loaded once during initialization
## Notes
* High-accuracy speech detection using Krisp VIVA SDK
* Supports multiple sample rates (8kHz to 48kHz)
* Requires external `.kef` model file
* Thread-safe for pipeline processing
* Automatic session management
* Configurable frame duration
# RNNoiseFilter
Source: https://docs.pipecat.ai/api-reference/server/utilities/audio/rnnoise-filter
Audio noise suppression filter using RNNoise recurrent neural network
## Overview
`RNNoiseFilter` is an audio processor that reduces background noise in real-time audio streams using RNNoise, a recurrent neural network for audio noise reduction. It inherits from `BaseAudioFilter` and processes audio frames via the `pyrnnoise` library.
RNNoise is a free, open-source noise suppression solution that requires no API keys or external services.
## Installation
The RNNoise filter requires additional dependencies:
```bash theme={null}
pip install "pipecat-ai[rnnoise]"
```
## Constructor Parameters
Quality of the internal resampler used when the transport sample rate differs from 48kHz. One of `"VHQ"` (Very High Quality), `"HQ"` (High Quality), `"MQ"` (Medium Quality), `"LQ"` (Low Quality), or `"QQ"` (Quick). Defaults to `"QQ"` for lowest latency.
## Input Frames
Specific control frame to toggle filtering on/off
```python theme={null}
from pipecat.frames.frames import FilterEnableFrame
# Disable noise suppression
await task.queue_frame(FilterEnableFrame(False))
# Re-enable noise suppression
await task.queue_frame(FilterEnableFrame(True))
```
## Usage Example
```python theme={null}
from pipecat.audio.filters.rnnoise_filter import RNNoiseFilter
from pipecat.transports.services.daily import DailyTransport, DailyParams
transport = DailyTransport(
    room_url,
    token,
    "Respond bot",
    DailyParams(
        audio_in_enabled=True,
        audio_in_filter=RNNoiseFilter(),  # Enable RNNoise noise suppression
        audio_out_enabled=True,
    ),
)
```
## Notes
* No API key or external service required (fully local processing)
* RNNoise operates at 48kHz internally; automatic resampling is applied for other sample rates
* Handles 16-bit PCM audio format
* Can be dynamically enabled/disabled via `FilterEnableFrame`
* Buffers audio to match RNNoise's required frame length (480 samples)
* When resampling is needed, uses SOXR (install with `pip install "pipecat-ai[soxr]"`)
# SileroVADAnalyzer
Source: https://docs.pipecat.ai/api-reference/server/utilities/audio/silero-vad-analyzer
Voice Activity Detection analyzer using the Silero VAD ONNX model
## Overview
`SileroVADAnalyzer` is a Voice Activity Detection (VAD) analyzer that uses the Silero VAD ONNX model to detect speech in audio streams. It provides high-accuracy speech detection with efficient processing using ONNX runtime.
## Installation
The Silero VAD analyzer is now included as a core dependency:
```bash theme={null}
pip install "pipecat-ai"
```
## Constructor Parameters
Audio sample rate in Hz. Must be either 8000 or 16000.
Voice Activity Detection parameters object
Confidence threshold for speech detection. Higher values make detection more strict. Must be between 0 and 1.
Time in seconds that speech must be detected before transitioning to SPEAKING state.
Time in seconds of silence required before transitioning back to QUIET state.
Minimum audio volume threshold for speech detection. Must be between 0 and 1.
## Usage Example
```python theme={null}
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams

context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
    ),
)
```
## Technical Details
### Sample Rate Requirements
The analyzer supports two sample rates:
* 8000 Hz (256 samples per frame)
* 16000 Hz (512 samples per frame)
### Model Management
* Uses ONNX runtime for efficient inference
* Automatically resets model state every 5 seconds to manage memory
* Runs on CPU by default for consistent performance
* Includes built-in model file
## Notes
* High-accuracy speech detection
* Efficient ONNX-based processing
* Automatic memory management
* Thread-safe for pipeline processing
* Built-in model file included
* CPU-optimized inference
* Supports 8kHz and 16kHz audio
# SoundfileMixer
Source: https://docs.pipecat.ai/api-reference/server/utilities/audio/soundfile-mixer
Audio mixer for combining real-time audio with sound files
## Overview
`SoundfileMixer` is an audio mixer that combines incoming audio with audio from files. It supports multiple audio file formats through the soundfile library and can handle runtime volume adjustments and sound switching.
## Installation
The soundfile mixer requires additional dependencies:
```bash theme={null}
pip install "pipecat-ai[soundfile]"
```
## Constructor Parameters
Dictionary mapping sound names to file paths. Files must be mono (single channel).
Name of the default sound to play (must be a key in sound\_files).
Initial volume for the mixed sound. Values typically range from 0.0 to 1.0, but can go higher.
Whether to loop the sound file when it reaches the end.
## Control Frames
Updates mixer settings at runtime
Changes the current playing sound (must be a key in sound\_files)
Updates the mixing volume
Updates whether the sound should loop
Enables or disables the mixer
Whether mixing should be enabled
## Usage Example
```python theme={null}
from pipecat.audio.mixers.soundfile_mixer import SoundfileMixer
from pipecat.frames.frames import MixerEnableFrame, MixerUpdateSettingsFrame
from pipecat.transports.services.daily import DailyTransport, DailyParams

# Initialize mixer with sound files
mixer = SoundfileMixer(
    sound_files={"office": "office_ambience.wav"},
    default_sound="office",
    volume=2.0,
)

# Add to transport
transport = DailyTransport(
    room_url,
    token,
    "Audio Bot",
    DailyParams(
        audio_out_enabled=True,
        audio_out_mixer=mixer,
    ),
)

# Control mixer at runtime
await task.queue_frame(MixerUpdateSettingsFrame({"volume": 0.5}))
await task.queue_frame(MixerEnableFrame(False))  # Disable mixing
await task.queue_frame(MixerEnableFrame(True))  # Enable mixing
```
## Notes
* Supports any audio format that soundfile can read
* Automatically resamples audio files to match output sample rate
* Files must be mono (single channel)
* Thread-safe for pipeline processing
* Can dynamically switch between multiple sound files
* Volume can be adjusted in real-time
* Mixing can be enabled/disabled on demand
# Context Summarization
Source: https://docs.pipecat.ai/api-reference/server/utilities/context-summarization
Reference for LLMAutoContextSummarizationConfig, LLMContextSummaryConfig, LLMContextSummarizer, and SummaryAppliedEvent
## Overview
Context summarization automatically compresses older conversation history when token or message limits are reached. It is configured via `LLMAutoContextSummarizationConfig` (auto-trigger thresholds) and `LLMContextSummaryConfig` (summary generation params), and managed by `LLMContextSummarizer`.
For a walkthrough of how to enable and customize context summarization, see the [Context Summarization guide](/pipecat/fundamentals/context-summarization).
## LLMAutoContextSummarizationConfig
```python theme={null}
from pipecat.utils.context.llm_context_summarization import LLMAutoContextSummarizationConfig
```
Controls when automatic context summarization triggers.
Maximum context size in estimated tokens before triggering summarization.
Tokens are estimated using the heuristic of 1 token per 4 characters. Set to
`None` to disable token-based triggering. At least one of `max_context_tokens`
or `max_unsummarized_messages` must be set.
Maximum number of new messages before triggering summarization, even if the
token limit has not been reached. Set to `None` to disable message-count
triggering. At least one of `max_context_tokens` or
`max_unsummarized_messages` must be set.
Configuration for how summaries are generated. See below.
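A minimal configuration sketch (the threshold values are illustrative):
```python theme={null}
from pipecat.utils.context.llm_context_summarization import (
    LLMAutoContextSummarizationConfig,
    LLMContextSummaryConfig,
)

auto_config = LLMAutoContextSummarizationConfig(
    max_context_tokens=4000,           # trigger when the estimated context exceeds ~4000 tokens
    max_unsummarized_messages=10,      # ...or after 10 new messages, whichever comes first
    summary_config=LLMContextSummaryConfig(
        target_context_tokens=1000,    # token budget for the generated summary
        min_messages_after_summary=4,  # keep the 4 most recent messages uncompressed
    ),
)
```
To enable automatic summarization, pass this object as `auto_context_summarization_config` along with `enable_auto_context_summarization=True` on the assistant aggregator params (see the guide linked above).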
## LLMContextSummaryConfig
```python theme={null}
from pipecat.utils.context.llm_context_summarization import LLMContextSummaryConfig
```
Controls how summaries are generated. Used as `summary_config` inside `LLMAutoContextSummarizationConfig`, or passed directly to `LLMSummarizeContextFrame` for on-demand summarization.
Target token count for the generated summary. Passed to the LLM as
`max_tokens`. Auto-adjusted to 80% of `max_context_tokens` if it exceeds that
value.
Number of recent messages to preserve uncompressed after each summarization.
Custom system prompt for the LLM when generating summaries. When `None`, uses
a built-in default prompt.
Template for formatting the summary when injected into context. Must contain
`{summary}` as a placeholder. Allows wrapping summaries in custom delimiters
(e.g., XML tags) so system prompts can distinguish summaries from live
conversation.
Dedicated LLM service for generating summaries. When set, summarization
requests are sent to this service instead of the pipeline's primary LLM.
Useful for routing summarization to a cheaper or faster model. When `None`,
the pipeline LLM handles summarization.
Maximum time in seconds to wait for the LLM to generate a summary. If
exceeded, summarization is aborted and future summarization attempts are
unblocked. Set to `None` to disable the timeout.
## LLMSummarizeContextFrame
```python theme={null}
from pipecat.frames.frames import LLMSummarizeContextFrame
```
Push this frame into the pipeline to trigger on-demand context summarization without waiting for automatic thresholds.
Per-request override for summary generation settings (prompt, token budget,
messages to keep). When `None`, the summarizer's default
`LLMContextSummaryConfig` is used.
On-demand summarization works even when `enable_auto_context_summarization` is `False` — the summarizer is always created internally to handle manually pushed frames.
```python theme={null}
from pipecat.frames.frames import LLMSummarizeContextFrame
from pipecat.utils.context.llm_context_summarization import LLMContextSummaryConfig

# Trigger with default settings
await llm.queue_frame(LLMSummarizeContextFrame())

# Trigger with per-request overrides
await llm.queue_frame(
    LLMSummarizeContextFrame(
        config=LLMContextSummaryConfig(
            target_context_tokens=2000,
            min_messages_after_summary=2,
        )
    )
)
```
If a summarization is already in progress, the manual request is ignored.
## LLMContextSummarizer
```python theme={null}
from pipecat.processors.aggregators.llm_context_summarizer import LLMContextSummarizer
```
Monitors context size and orchestrates summarization. Created automatically by `LLMAssistantAggregator` when `enable_auto_context_summarization=True`.
### Event Handlers
| Event | Parameters | Description |
| -------------------- | ---------------------------- | --------------------------------------------------------------------- |
| `on_summary_applied` | `event: SummaryAppliedEvent` | Emitted after a summary has been successfully applied to the context. |
#### on\_summary\_applied
The `on_summary_applied` event is exposed on both `LLMContextSummarizer` and `LLMAssistantAggregator`. Register handlers on the aggregator for cleaner access:
```python theme={null}
@assistant_aggregator.event_handler("on_summary_applied")
async def on_summary_applied(aggregator, summarizer, event: SummaryAppliedEvent):
    logger.info(
        f"Context summarized: {event.original_message_count} -> "
        f"{event.new_message_count} messages "
        f"({event.summarized_message_count} summarized, "
        f"{event.preserved_message_count} preserved)"
    )
```
You can also register handlers directly on the summarizer if you have access to it:
```python theme={null}
summarizer = assistant_aggregator._summarizer
@summarizer.event_handler("on_summary_applied")
async def on_summary_applied(summarizer, event: SummaryAppliedEvent):
    logger.info(
        f"Context summarized: {event.original_message_count} -> "
        f"{event.new_message_count} messages"
    )
```
## SummaryAppliedEvent
```python theme={null}
from pipecat.processors.aggregators.llm_context_summarizer import SummaryAppliedEvent
```
Event data emitted when context summarization completes successfully.
Number of messages in context before summarization.
Number of messages in context after summarization.
Number of messages that were compressed into the summary.
Number of messages preserved uncompressed (system message plus recent
messages).
## Deprecated: LLMContextSummarizationConfig
```python theme={null}
from pipecat.utils.context.llm_context_summarization import LLMContextSummarizationConfig
```
`LLMContextSummarizationConfig` is deprecated since v0.0.104. Use
`LLMAutoContextSummarizationConfig` with a nested `LLMContextSummaryConfig`
instead. The old class still works but emits a `DeprecationWarning`.
Both `max_context_tokens` and `max_unsummarized_messages` can now be set to
`None` independently to disable that threshold. At least one must remain set.
The old class flattened all parameters into a single object. Migrate by splitting trigger thresholds (`max_context_tokens`, `max_unsummarized_messages`) into `LLMAutoContextSummarizationConfig` and summary generation params into `LLMContextSummaryConfig`:
```python theme={null}
# Before (deprecated)
config = LLMContextSummarizationConfig(
    max_context_tokens=4000,
    target_context_tokens=3000,
    max_unsummarized_messages=10,
)

# After
config = LLMAutoContextSummarizationConfig(
    max_context_tokens=4000,
    max_unsummarized_messages=10,
    summary_config=LLMContextSummaryConfig(
        target_context_tokens=3000,
    ),
)
```
Similarly, the `LLMAssistantAggregatorParams` fields were renamed:
* `enable_context_summarization` → `enable_auto_context_summarization`
* `context_summarization_config` → `auto_context_summarization_config`
The old field names still work with a `DeprecationWarning`.
# DailyRESTHelper
Source: https://docs.pipecat.ai/api-reference/server/utilities/daily/rest-helper
Classes and methods for interacting with the Daily API to manage rooms and tokens
For complete Daily REST API reference and additional details
## Classes
### DailyRoomSipParams
Configuration for SIP (Session Initiation Protocol) parameters.
Display name for the SIP endpoint
Whether video is enabled for SIP
SIP connection mode
Number of SIP endpoints
```python theme={null}
from pipecat.transports.services.helpers.daily_rest import DailyRoomSipParams
sip_params = DailyRoomSipParams(
    display_name="conference-line",
    video=True,
    num_endpoints=2
)
```
### RecordingsBucketConfig
Configuration for storing Daily recordings in a custom S3 bucket.
Name of the S3 bucket for storing recordings
AWS region where the S3 bucket is located
ARN of the IAM role to assume for S3 access
Whether to allow API access to the recordings
```python theme={null}
from pipecat.transports.services.helpers.daily_rest import RecordingsBucketConfig
bucket_config = RecordingsBucketConfig(
    bucket_name="my-recordings-bucket",
    bucket_region="us-west-2",
    assume_role_arn="arn:aws:iam::123456789012:role/DailyRecordingsRole",
    allow_api_access=True
)
```
### DailyRoomProperties
Properties that configure a Daily room's behavior and features.
Room expiration time as Unix timestamp (e.g., time.time() + 300 for 5 minutes)
Whether chat is enabled in the room
Whether the prejoin lobby UI is enabled
Whether emoji reactions are enabled
Whether to eject participants when room expires
Whether dial-out is enabled
Recording settings ("cloud", "cloud-audio-only", "local", or "raw-tracks")
Geographic region for room
Maximum number of participants allowed in the room
Configuration for custom S3 bucket recordings
SIP configuration parameters
SIP URI configuration (returned by Daily)
Whether the camera video is turned off by default
The class also includes a `sip_endpoint` property that returns the SIP endpoint URI if available.
`enable_recording` also supports `cloud-audio-only`, which records the call server-side and produces an audio-only MPEG-4 file with `.m4a` file extension and content type as `audio/mp4`. This recording setting behaves like `cloud`, except the `layout` options do not apply because there are no video tracks. Note: you can retrieve the resulting `.m4a` recordings via the [Daily REST API](https://docs.daily.co/reference/rest-api/recordings), in the same way you fetch `cloud` recording assets.
If you're already using `enable_recording="cloud"` and want to switch to
audio-only without changing your code, you can set
`force_audio_only_recording: 1` on your Daily domain. This forces all cloud
recordings to be audio-only (`.m4a` instead of `.mp4`) and skips recording
video tracks even if they are present. This is useful when you want to
transition to audio-only recordings immediately while avoiding an application
redeploy. Switching back to `force_audio_only_recording: 0` re-enables recording of
video tracks.
```python theme={null}
import time
from pipecat.transports.services.helpers.daily_rest import (
    DailyRoomProperties,
    DailyRoomSipParams,
    RecordingsBucketConfig,
)

properties = DailyRoomProperties(
    exp=time.time() + 3600,  # 1 hour from now
    enable_chat=True,
    enable_emoji_reactions=True,
    enable_recording="cloud-audio-only",
    geo="us-west",
    max_participants=50,
    sip=DailyRoomSipParams(display_name="conference"),
    recordings_bucket=RecordingsBucketConfig(
        bucket_name="my-bucket",
        bucket_region="us-west-2",
        assume_role_arn="arn:aws:iam::123456789012:role/DailyRole",
    ),
)

# Access SIP endpoint if available
if properties.sip_endpoint:
    print(f"SIP endpoint: {properties.sip_endpoint}")
```
### DailyRoomParams
Parameters for creating a new Daily room.
Room name (if not provided, one will be generated)
Room privacy setting ("private" or "public")
Room configuration properties
```python theme={null}
import time
from pipecat.transports.services.helpers.daily_rest import (
    DailyRoomParams,
    DailyRoomProperties,
)

params = DailyRoomParams(
    name="team-meeting",
    privacy="private",
    properties=DailyRoomProperties(
        enable_chat=True,
        exp=time.time() + 7200,  # 2 hours from now
    ),
)
```
### DailyRoomObject
Response object representing a Daily room.
Unique room identifier
Room name
Whether the room was created via API
Room privacy setting
Complete room URL
Room creation timestamp in ISO 8601 format
Room configuration
```python theme={null}
from pipecat.transports.services.helpers.daily_rest import (
    DailyRoomObject,
    DailyRoomProperties,
)

# Example of what a DailyRoomObject looks like when received
room = DailyRoomObject(
    id="abc123",
    name="team-meeting",
    api_created=True,
    privacy="private",
    url="https://your-domain.daily.co/team-meeting",
    created_at="2024-01-20T10:00:00.000Z",
    config=DailyRoomProperties(
        enable_chat=True,
        exp=1705743600,
    ),
)
```
### DailyMeetingTokenProperties
Properties for configuring a Daily meeting token.
The room this token is valid for. If not set, token is valid for all rooms.
Whether to eject user when token expires
Eject user after this many seconds
"Not before" timestamp - users cannot join before this time
Expiration timestamp - users cannot join after this time
Whether token grants owner privileges
User's display name in the meeting
Unique identifier for the user (36 char limit)
Whether user can share their screen
Whether to join with video off
Whether to join with audio off
Recording settings ("cloud", "cloud-audio-only", "local", or "raw-tracks")
Whether to show prejoin UI
Whether to start cloud recording when user joins
Initial default permissions for a non-meeting-owner participant
### DailyMeetingTokenParams
Parameters for creating a Daily meeting token.
Token configuration properties
```python theme={null}
from pipecat.transports.services.helpers.daily_rest import (
    DailyMeetingTokenParams,
    DailyMeetingTokenProperties,
)

token_params = DailyMeetingTokenParams(
    properties=DailyMeetingTokenProperties(
        user_name="John Doe",
        enable_screenshare=True,
        start_video_off=True,
        enable_recording="cloud-audio-only",
        start_cloud_recording=True,
        permissions={"canSend": ["video", "audio"]},
    )
)
```
Recording type: `cloud` will produce files with a `.mp4` extension, while
`cloud-audio-only` will produce files with a `.m4a` extension.
## Initialize DailyRESTHelper
Create a new instance of the Daily REST helper.
Your Daily API key
The Daily API base URL
An aiohttp client session for making HTTP requests
```python theme={null}
helper = DailyRESTHelper(
    daily_api_key="your-api-key",
    aiohttp_session=session,
)
```
## Create Room
Creates a new Daily room with specified parameters.
Room configuration parameters including name, privacy, and properties
```python theme={null}
# Create a room that expires in 1 hour
params = DailyRoomParams(
name="my-room",
privacy="private",
properties=DailyRoomProperties(
exp=time.time() + 3600,
enable_chat=True
)
)
room = await helper.create_room(params)
print(f"Room URL: {room.url}")
```
## Get Room From URL
Retrieves room information using a Daily room URL.
The complete Daily room URL
```python theme={null}
room = await helper.get_room_from_url("https://your-domain.daily.co/my-room")
print(f"Room name: {room.name}")
```
## Get Token
Generates a meeting token for a specific room.
The complete Daily room URL
Token expiration time in seconds
Whether to eject user when token expires
Whether the token should have owner privileges (overrides any setting in
params)
Additional token configuration. Note that `room_name`, `exp`,
`eject_at_token_exp`, and `is_owner` will be set based on the other function
parameters.
```python theme={null}
# Basic token generation
token = await helper.get_token(
    room_url="https://your-domain.daily.co/my-room",
    expiry_time=1800,  # 30 minutes
    owner=True,
    eject_at_token_exp=True,
)

# Advanced token generation with additional properties
token_params = DailyMeetingTokenParams(
    properties=DailyMeetingTokenProperties(
        user_name="John Doe",
        start_video_off=True,
    )
)
token = await helper.get_token(
    room_url="https://your-domain.daily.co/my-room",
    expiry_time=1800,
    owner=False,
    eject_at_token_exp=True,
    params=token_params,
)
```
## Delete Room By URL
Deletes a room using its URL.
The complete Daily room URL
```python theme={null}
success = await helper.delete_room_by_url("https://your-domain.daily.co/my-room")
if success:
print("Room deleted successfully")
```
## Delete Room By Name
Deletes a room using its name.
The name of the Daily room
```python theme={null}
success = await helper.delete_room_by_name("my-room")
if success:
print("Room deleted successfully")
```
## Get Name From URL
Extracts the room name from a Daily room URL.
The complete Daily room URL
```python theme={null}
room_name = helper.get_name_from_url("https://your-domain.daily.co/my-room")
print(f"Room name: {room_name}") # Outputs: "my-room"
```
# DTMFAggregator
Source: https://docs.pipecat.ai/api-reference/server/utilities/dtmf-aggregator
Aggregates DTMF (phone keypad) input into meaningful sequences for LLM processing
## Overview
`DTMFAggregator` processes incoming DTMF (Dual-Tone Multi-Frequency) frames from phone keypad input and aggregates them into complete sequences that can be understood by LLM services. It buffers individual digit presses and flushes them as transcription frames when a termination digit is pressed, a timeout occurs, or an interruption happens.
This aggregator is essential for telephony applications where users interact via phone keypad buttons, converting raw DTMF input into structured text that LLMs can process alongside voice transcriptions.
## Constructor
```python theme={null}
aggregator = DTMFAggregator(
    timeout=2.0,
    termination_digit=KeypadEntry.POUND,
    prefix="DTMF: ",
)
```
Idle timeout in seconds before flushing the aggregated digits
Digit that triggers immediate flush of the aggregation
Prefix added to DTMF sequence in the output transcription
## Input Frames
Contains a single keypad button press with a KeypadEntry value
Flushes pending aggregation and stops the aggregation task
## Output Frames
Contains the aggregated DTMF sequence as text with the configured prefix
All input frames are passed through downstream, including the original `InputDTMFFrame` instances.
## Keypad Entries
The aggregator processes these standard phone keypad entries:
| KeypadEntry | Value | Description |
| --------------------- | ------------- | ----------------- |
| `ZERO` through `NINE` | `"0"` - `"9"` | Numeric digits |
| `STAR` | `"*"` | Star/asterisk key |
| `POUND` | `"#"` | Pound/hash key |
## Aggregation Behavior
The aggregator flushes (emits a TranscriptionFrame) when:
1. **Termination digit**: The configured termination digit is pressed (default: `#`)
2. **Timeout**: No new digits received within the timeout period (default: 2 seconds)
3. **Interruption**: An `InterruptionFrame` is received
4. **Pipeline end**: An `EndFrame` is received
## Usage Examples
### Basic Telephony Integration
```python theme={null}
import os

from pipecat.processors.aggregators.dtmf_aggregator import DTMFAggregator
from pipecat.serializers.twilio import TwilioFrameSerializer

# Create DTMF aggregator with default settings
dtmf_aggregator = DTMFAggregator()

# Set up Twilio serializer for phone integration
serializer = TwilioFrameSerializer(
    stream_sid=stream_sid,
    call_sid=call_sid,
    account_sid=os.getenv("TWILIO_ACCOUNT_SID"),
    auth_token=os.getenv("TWILIO_AUTH_TOKEN"),
)

# Create pipeline with DTMF processing
pipeline = Pipeline([
    transport.input(),            # Websocket input from Twilio
    dtmf_aggregator,              # Process DTMF before STT
    stt,                          # Speech-to-text service
    context_aggregator.user(),
    llm,                          # LLM processes both voice and DTMF
    tts,                          # Text-to-speech
    transport.output(),
    context_aggregator.assistant(),
])
```
### Custom Configuration for Menu Systems
```python theme={null}
# Configure for menu system with star termination
menu_dtmf = DTMFAggregator(
    timeout=5.0,                         # Longer timeout for menu selection
    termination_digit=KeypadEntry.STAR,  # Use * to confirm selection
    prefix="Menu selection: ",           # Clear prefix for LLM
)

# Update system prompt to handle DTMF input
messages = [
    {
        "role": "system",
        "content": """You are a phone menu assistant.
When you receive input starting with "Menu selection:", this represents
button presses on the phone keypad:
- Single digits (1-9): Menu options
- 0: Often "speak to operator"
- *: Confirmation or "go back"
- #: Usually "repeat menu"
Respond appropriately to both voice and keypad input.""",
    }
]
```
## Sequence Examples
| User Input | Aggregation Trigger | Output TranscriptionFrame |
| ------------------ | ------------------- | ------------------------- |
| `1`, `2`, `3`, `#` | Termination digit | `"DTMF: 123#"` |
| `*`, `0` | 2-second timeout | `"DTMF: *0"` |
| `5`, interruption | InterruptionFrame | `"DTMF: 5"` |
| `9`, `9`, EndFrame | Pipeline shutdown | `"DTMF: 99"` |
## Frame Flow
```mermaid theme={null}
graph TD
A[Twilio WebSocket] --> B[TwilioFrameSerializer]
B --> C[DTMFAggregator]
C --> D[STT Service]
C --> E[TranscriptionFrame]
E --> F[LLM Context Aggregator]
D --> F
F --> G[LLM Service]
```
## Error Handling
The aggregator gracefully handles:
* Invalid DTMF digits (logged and ignored)
* Pipeline interruptions (flushes pending sequences)
* Rapid key presses (buffers efficiently)
* Mixed voice and DTMF input (processes independently)
## Best Practices
1. **System Prompt Design**: Train your LLM to recognize and respond to DTMF prefixed input
2. **Timeout Configuration**: Use shorter timeouts (1-2s) for rapid entry, longer (3-5s) for menu selection
3. **Termination Strategy**: Use `#` for confirmation, `*` for cancel/back operations
4. **Pipeline Placement**: Always place before the user context aggregator to ensure proper frame ordering
# IVRNavigator
Source: https://docs.pipecat.ai/api-reference/server/utilities/extensions/ivr
AI-powered Interactive Voice Response system navigation with automatic classification and goal-oriented decision making
## Overview
The `IVRNavigator` is a pipeline component that provides intelligent navigation of IVR phone systems. It combines LLM-based decision making with DTMF tone generation to automatically traverse phone menus toward specified goals. The navigator includes automatic classification between IVR systems and human conversations, enabling flexible call handling scenarios.
Complete API documentation and method details
## Constructor
```python theme={null}
from pipecat.extensions.ivr.ivr_navigator import IVRNavigator, IVRStatus
IVRNavigator(
    llm,
    ivr_prompt,
    ivr_vad_params=None
)
```
### Parameters
The LLM service instance used for text generation and navigation decision
making.
The navigation goal that will be integrated with the base IVR navigation
instructions. This should clearly describe what you want to accomplish (e.g.,
"Navigate to billing support to discuss account charges").
Voice Activity Detection parameters optimized for IVR navigation. The default
2.0 second stop time allows the system to hear complete menu options before
responding, improving navigation success rates.
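A minimal construction sketch, assuming `llm` is an already-created LLM service and that `ivr_vad_params` takes a `VADParams` object as used elsewhere on this page:
```python theme={null}
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.extensions.ivr.ivr_navigator import IVRNavigator

ivr_navigator = IVRNavigator(
    llm=llm,
    ivr_prompt="Navigate to billing support to discuss account charges",
    ivr_vad_params=VADParams(stop_secs=2.0),  # optional; the default favors hearing full menu prompts
)
```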
## Navigation Commands
The IVRNavigator processes several XML-tagged commands from LLM responses:
### DTMF Commands
```xml theme={null}
1
123
```
Valid DTMF values: `0-9`, `*`, `#`
### IVR Status Commands
```xml theme={null}
detected
completed
stuck
wait
```
### Mode Classification
```xml theme={null}
ivr
conversation
```
## Event Handlers
The navigator supports two primary event handlers for different interaction scenarios.
### on\_conversation\_detected
Triggered when the system classifies incoming audio as a human conversation rather than an IVR system.
```python theme={null}
@ivr_navigator.event_handler("on_conversation_detected")
async def on_conversation_detected(processor, conversation_history: List[dict]):
    # Handle human conversation scenario
    # conversation_history contains previous interaction context
    pass
```
**Parameters:**
* `processor`: The IVRProcessor instance
* `conversation_history`: List of message dictionaries from previous context (excluding system messages)
### on\_ivr\_status\_changed
Triggered when IVR navigation status changes during the navigation process.
```python theme={null}
@ivr_navigator.event_handler("on_ivr_status_changed")
async def on_ivr_status_changed(processor, status: IVRStatus):
    # Handle navigation status changes
    if status == IVRStatus.COMPLETED:
        # Navigation successful
        pass
    elif status == IVRStatus.STUCK:
        # Navigation failed
        pass
```
**Parameters:**
* `processor`: The IVRProcessor instance
* `status`: IVRStatus enum value (`DETECTED`, `COMPLETED`, `STUCK`)
## IVRStatus Enumeration
```python theme={null}
from pipecat.extensions.ivr.ivr_navigator import IVRStatus
class IVRStatus(Enum):
    DETECTED = "detected"    # IVR system detected and navigation started
    COMPLETED = "completed"  # Navigation goal successfully achieved
    STUCK = "stuck"          # Navigation unable to proceed
    WAIT = "wait"            # Waiting for more complete information
```
## Built-in Prompts
The IVRNavigator includes sophisticated built-in prompts that handle classification and navigation logic automatically.
### Classification Prompt
The navigator starts with an automatic classification system that distinguishes between:
**IVR System Indicators:**
* Menu options ("Press 1 for billing", "Press 2 for support")
* Automated instructions ("Please enter your account number")
* System prompts ("Thank you for calling \[company]")
* Hold messages ("Please continue to hold")
**Human Conversation Indicators:**
* Personal greetings ("Hello, this is Sarah")
* Interactive responses ("Who am I speaking with?")
* Natural speech patterns and conversational flow
* Direct engagement ("I can help with that")
### Navigation Prompt
Once IVR is detected, the system uses your provided goal with comprehensive navigation instructions that include:
* **Decision making logic** for menu option selection
* **DTMF sequence handling** for data entry (dates, account numbers, phone numbers)
* **Verbal response generation** for conversational prompts
* **Completion detection** for successful navigation
* **Stuck state recognition** for error scenarios
* **Wait state management** for incomplete transcriptions
## Advanced Usage
### Context Preservation
The navigator automatically preserves conversation context when switching between modes:
```python theme={null}
# Context is automatically saved and restored during mode transitions
# No manual intervention required for basic scenarios
# For advanced context manipulation:
@ivr_navigator.event_handler("on_conversation_detected")
async def handle_conversation(processor, conversation_history):
    # conversation_history contains preserved context
    # Add additional context as needed
    enhanced_context = [
        {"role": "system", "content": "You are a customer service representative."},
        *conversation_history,
        {"role": "assistant", "content": "How can I help you today?"},
    ]
    await task.queue_frame(LLMMessagesUpdateFrame(messages=enhanced_context, run_llm=True))
```
### VAD Parameter Management
```python theme={null}
# Switch to conversation-optimized timing
@ivr_navigator.event_handler("on_conversation_detected")
async def optimize_for_conversation(processor, conversation_history):
    # Reduce response delay for natural conversation flow
    conversation_vad = VADParams(stop_secs=0.8)
    await task.queue_frame(VADParamsUpdateFrame(params=conversation_vad))
```
## Integration Examples
### Basic Pipeline Integration
```python theme={null}
from pipecat.pipeline.pipeline import Pipeline
from pipecat.extensions.ivr.ivr_navigator import IVRNavigator
pipeline = Pipeline([
transport.input(),
stt,
context_aggregator.user(),
ivr_navigator, # Add in place of LLM
tts,
transport.output(),
context_aggregator.assistant()
])
```
# VoicemailDetector
Source: https://docs.pipecat.ai/api-reference/server/utilities/extensions/voicemail
Technical implementation details and architecture for voicemail detection
## Overview
This document provides technical implementation details for the VoicemailDetector, including its parallel pipeline architecture and performance considerations for production use.
Complete API documentation for all VoicemailDetector classes and methods
Working example showing VoicemailDetector integration in a complete pipeline
## Architecture
The VoicemailDetector uses a parallel pipeline architecture with two processing branches:
* **Conversation Branch**: Handles normal conversation flow and can be blocked when voicemail is detected to prevent the main LLM from processing additional input.
* **Classification Branch**: Contains the LLM classifier and decision logic. This branch closes after making a classification decision to prevent unnecessary LLM calls.
The system coordinates between branches using a notification system and gates that control frame flow based on classification decisions. TTS frames are buffered during classification and either released (for conversations) or cleared (for voicemail) based on the decision.
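A minimal pipeline sketch of this layout is shown below. It assumes the classifier LLM is passed to the constructor as `llm` and that the detector exposes `detector()` and `gate()` processors, as referenced in the troubleshooting notes further down; the exact constructor arguments may differ, so treat this as an illustration rather than the definitive API.

```python theme={null}
# Sketch only: constructor arguments are assumptions based on this page's guidance.
voicemail_detector = VoicemailDetector(
    llm=classifier_llm,            # classifier LLM (text output)
    voicemail_response_delay=2.0,  # see "Response Timing" below
)

pipeline = Pipeline([
    transport.input(),
    stt,
    voicemail_detector.detector(),  # classification branch taps transcriptions here
    context_aggregator.user(),
    llm,                            # conversation LLM (text-based)
    tts,
    voicemail_detector.gate(),      # buffers TTS output until a decision is made
    transport.output(),
    context_aggregator.assistant(),
])
```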
## Performance Considerations
### LLM Selection
The VoicemailDetector works with two separate LLMs in your pipeline:
**Conversation LLM (main pipeline):**
* Must be text-based for compatibility with the VoicemailDetector's gating system
* Realtime LLMs are not compatible as they don't allow the required level of output control
* This is the LLM that handles normal conversation after classification
**Classifier LLM (VoicemailDetector parameter):**
* Can be either text-based or realtime LLMs
* For realtime LLMs: Set output modality to `text` to ensure text-based responses
* Must be able to output "CONVERSATION" or "VOICEMAIL" keywords for classification
* Recommended models: OpenAILLMService with `gpt-4o`, GoogleLLMService with `gemini-2.0-flash`
The two LLMs operate independently. You can use a realtime LLM for classification (with text output) while using a text-based LLM for conversation, or use text-based LLMs for both.
### Response Timing
The `voicemail_response_delay` parameter should be tuned to your target voicemail systems. The default of 2 seconds is a reasonable starting point, but adjust it based on the greeting patterns you observe.
## Common Issues
**Classification not working:**
* Confirm both `detector()` and `gate()` are correctly placed in pipeline
* Check that STT service is producing text input for classification
* Ensure the LLM is a text-based LLM
**Timing problems:**
* Adjust `voicemail_response_delay` based on observed voicemail greeting patterns
* Monitor classification speed; slow LLM responses affect conversation latency
**Audio playback issues:**
* Ensure `gate()` is placed immediately after TTS service
* Verify TTS frames are being generated before classification completes
**Custom prompt validation:**
The system validates custom prompts and warns if they're missing required response keywords ("CONVERSATION" and "VOICEMAIL"). Include the `CLASSIFIER_RESPONSE_INSTRUCTION` constant to ensure proper functionality.
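A sketch of a custom prompt that keeps the required keywords is shown below; the import path and the constructor parameter name for the custom prompt are assumptions, so verify them against the VoicemailDetector API reference.

```python theme={null}
# Sketch only: the import path and `custom_system_prompt` parameter name are assumptions.
from pipecat.extensions.voicemail.voicemail_detector import (
    CLASSIFIER_RESPONSE_INSTRUCTION,
    VoicemailDetector,
)

custom_prompt = (
    "Decide whether the caller reached a live person or an answering machine. "
    + CLASSIFIER_RESPONSE_INSTRUCTION  # appends the required CONVERSATION/VOICEMAIL keywords
)

voicemail_detector = VoicemailDetector(
    llm=classifier_llm,
    custom_system_prompt=custom_prompt,
)
```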
## Event Handlers
| Event | Description |
| -------------------------- | ------------------------------------------ |
| `on_conversation_detected` | Live conversation detected (not voicemail) |
| `on_voicemail_detected` | Voicemail detected |
```python theme={null}
@voicemail_detector.event_handler("on_conversation_detected")
async def on_conversation_detected(detector):
print("Live person detected — starting conversation")
@voicemail_detector.event_handler("on_voicemail_detected")
async def on_voicemail_detected(detector):
print("Voicemail detected — leaving message")
```
# FrameFilter
Source: https://docs.pipecat.ai/api-reference/server/utilities/filters/frame-filter
Processor that selectively passes through only specified frame types
## Overview
`FrameFilter` is a processor that filters frames based on their types, only passing through frames that match specified types (plus some system frames like `EndFrame` and `SystemFrame`).
## Constructor Parameters
Tuple of frame types that should be passed through the filter
## Functionality
When a frame passes through the filter, it is checked against the provided types. Only frames that match one of the specified types (or are system frames) will be passed downstream. All other frames are dropped.
## Output Frames
The processor always passes through:
* Frames matching any of the specified types
* `EndFrame` and `SystemFrame` instances (always allowed, so as not to block the pipeline)
## Usage Example
```python theme={null}
from pipecat.frames.frames import TextFrame, AudioRawFrame, Frame
from pipecat.processors.filters import FrameFilter
from typing import Tuple, Type
# Create a filter that only passes TextFrames and AudioRawFrames
text_and_audio_filter = FrameFilter(
types=(TextFrame, AudioRawFrame)
)
# Add to pipeline
pipeline = Pipeline([
source,
text_and_audio_filter, # Filters out all other frame types
destination
])
```
## Frame Flow
```mermaid theme={null}
graph TD
A[Input Frames] --> B[FrameFilter]
B --> C{Frame Type Check}
C -->|Matches Allowed Types| D[Output Frame]
C -->|System Frame| D
C -->|Other Frame Types| E[Dropped]
```
## Notes
* Simple but powerful way to restrict which frame types flow through parts of your pipeline
* Always allows system frames to pass through for proper pipeline operation
* Can be used to isolate specific parts of your pipeline from certain frame types
* Efficient implementation with minimal overhead
# FunctionFilter
Source: https://docs.pipecat.ai/api-reference/server/utilities/filters/function-filter
Processor that filters frames using a custom filter function
## Overview
`FunctionFilter` is a flexible processor that uses a custom async function to determine which frames to pass through. This allows for complex, dynamic filtering logic beyond simple type checking.
## Constructor Parameters
Async function that examines each frame and returns True to allow it or False
to filter it out
Which direction of frames to filter (DOWNSTREAM or UPSTREAM)
## Functionality
When a frame passes through the processor:
1. System frames and end frames are always passed through
2. Frames moving in a different direction than specified are always passed through
3. Other frames are passed to the filter function
4. If the filter function returns True, the frame is passed through
## Output Frames
The processor conditionally passes through frames based on:
* Frame type (system frames and end frames always pass)
* Frame direction (only filters in the specified direction)
* Result of the custom filter function
## Usage Example
```python theme={null}
from pipecat.frames.frames import TextFrame, Frame
from pipecat.processors.filters import FunctionFilter
from pipecat.processors.frame_processor import FrameDirection
# Create filter that only allows TextFrames with more than 10 characters
async def long_text_filter(frame: Frame) -> bool:
if isinstance(frame, TextFrame):
return len(frame.text) > 10
return False
# Apply filter to downstream frames only
text_length_filter = FunctionFilter(
filter=long_text_filter,
direction=FrameDirection.DOWNSTREAM
)
# Add to pipeline
pipeline = Pipeline([
source,
text_length_filter, # Filters out short text frames
destination
])
```
## Frame Flow
```mermaid theme={null}
graph TD
A[Input Frames] --> B[FunctionFilter]
B --> C{System/End Frame?}
C -->|Yes| F[Output Frame]
C -->|No| D{Correct Direction?}
D -->|No| F
D -->|Yes| E{Filter Function}
E -->|Returns True| F
E -->|Returns False| G[Dropped]
```
## Notes
* Provides maximum flexibility for complex filtering logic
* Can incorporate dynamic conditions that change at runtime
* Only filters frames moving in the specified direction
* Always passes through system frames for proper pipeline operation
* Can be used to create sophisticated content-based filters
* Supports async filter functions for complex processing
# IdentityFilter
Source: https://docs.pipecat.ai/api-reference/server/utilities/filters/identify-filter
Processor that passes all frames through without modification
## Overview
`IdentityFilter` is a simple pass-through processor that forwards all frames without any modification or filtering. It acts as a transparent layer in your pipeline, allowing all frames to flow through unchanged.
Check out Observers for an option that delivers similar functionality but
doesn't require a processor to reside in the Pipeline.
## Constructor Parameters
The `IdentityFilter` constructor accepts no specific parameters beyond those inherited from `FrameProcessor`.
## Functionality
When a frame passes through the processor, it is immediately forwarded in the same direction with no changes. This applies to all frame types and both directions (upstream and downstream).
## Use Cases
While functionally equivalent to having no filter at all, `IdentityFilter` can be useful in several scenarios:
* Testing `ParallelPipeline` configurations to ensure frames aren't duplicated
* Acting as a placeholder where a more complex filter might be added later
* Monitoring frame flow in pipelines by adding logging in subclasses
* Creating a base class for more complex conditional filters
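For instance, a minimal logging subclass might look like the sketch below; `LoggingPassThrough` is illustrative and not part of the Pipecat API.

```python theme={null}
from loguru import logger

from pipecat.frames.frames import Frame
from pipecat.processors.filters import IdentityFilter
from pipecat.processors.frame_processor import FrameDirection


class LoggingPassThrough(IdentityFilter):
    """Illustrative subclass: forwards every frame unchanged while logging it."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        logger.debug(f"{direction.name}: {frame}")
        await super().process_frame(frame, direction)
```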
## Usage Example
```python theme={null}
from pipecat.processors.filters import IdentityFilter
# Create an identity filter
pass_through = IdentityFilter()
# Add to pipeline
pipeline = Pipeline([
source,
pass_through, # All frames pass through unchanged
destination
])
```
## Frame Flow
```mermaid theme={null}
graph LR
A[Input Frame] --> B[IdentityFilter] --> C[Output Frame]
```
## Notes
* Simplest possible filter implementation
* Passes all frames through without modification
* Useful in testing parallel pipelines
* Can serve as a placeholder or base class
* Zero overhead in normal operation
# NullFilter
Source: https://docs.pipecat.ai/api-reference/server/utilities/filters/null-filter
Processor that blocks all frames except system frames
## Overview
`NullFilter` is a filtering processor that blocks all frames from passing through, with the exception of system frames and end frames which are required for proper pipeline operation.
## Constructor Parameters
The `NullFilter` constructor accepts no specific parameters beyond those inherited from `FrameProcessor`.
## Functionality
When a frame passes through the processor:
* If the frame is a `SystemFrame` or `EndFrame`, it is passed through
* All other frame types are blocked and do not continue through the pipeline
This filter effectively acts as a barrier that allows only the essential system frames required for pipeline initialization, shutdown, and management.
## Use Cases
`NullFilter` is useful in several scenarios:
* Temporarily disabling parts of a pipeline without removing components
* Creating dead-end branches in parallel pipelines
* Testing pipeline behavior with blocked communication
* Implementing conditional pipelines where certain paths should be blocked
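For example, a dead-end branch in a `ParallelPipeline` can be terminated with a `NullFilter` so that nothing it produces rejoins the shared output path. In the sketch below, `audio_recorder` is a hypothetical processor and the other components are the usual bot services.

```python theme={null}
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.filters import NullFilter

pipeline = Pipeline([
    transport.input(),
    ParallelPipeline(
        # Main branch: normal conversation flow
        [stt, context_aggregator.user(), llm, tts],
        # Side branch: audio_recorder (hypothetical) consumes frames; NullFilter
        # blocks everything it emits from reaching transport.output()
        [audio_recorder, NullFilter()],
    ),
    transport.output(),
])
```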
## Usage Example
```python theme={null}
from pipecat.processors.filters import NullFilter
# Create a null filter that blocks all non-system frames
blocker = NullFilter()
# Add to pipeline
pipeline = Pipeline([
source,
blocker, # Blocks all regular frames
destination # Will only receive system frames
])
```
## Frame Flow
```mermaid theme={null}
graph TD
A[Input Frames] --> B[NullFilter]
B --> C{System/End Frame?}
C -->|Yes| D[Output Frame]
C -->|No| E[Blocked]
```
## Notes
* Blocks all regular frames in both directions
* Only allows system frames and end frames to pass through
* Useful for testing, debugging, and creating conditional pipelines
* Minimal overhead as it performs simple type checking
* Can be used to temporarily disable parts of a pipeline
# STTMuteFilter
Source: https://docs.pipecat.ai/api-reference/server/utilities/filters/stt-mute
Processor for controlling STT muting and interruption handling during bot speech and function calls
DEPRECATED: STTMuteFilter has been deprecated in favor of [User Mute
Strategies](/api-reference/server/utilities/turn-management/user-mute-strategies). Configure
`user_mute_strategies` on the `LLMUserAggregator` instead.
## Overview
`STTMuteFilter` is a general-purpose processor that combines STT muting and interruption control. When active, it prevents both transcription and interruptions during specified conditions (e.g., bot speech, function calls), providing a cleaner conversation flow.
The processor supports multiple simultaneous strategies for when to mute the STT service, making it flexible for different use cases.
Want to try it out? Check out the [STTMuteFilter foundational
demo](https://github.com/pipecat-ai/pipecat/blob/main/examples/turn-management/turn-management-user-mute-strategy.py)
## Constructor Parameters
Configuration object that defines the muting strategies and optional custom
logic
The STT service to control (deprecated, will be removed in a future version)
## Configuration
The processor is configured using `STTMuteConfig`, which determines when and how the STT service should be muted:
Set of muting strategies to apply
Optional callback for custom muting logic (required when strategy is `CUSTOM`)
### Muting Strategies
`STTMuteConfig` accepts a set of these `STTMuteStrategy` values:
Mute only during the bot's first speech (typically during introduction)
Start muted and remain muted until first bot speech completes. Useful when bot
speaks first and you want to ensure its first response cannot be interrupted.
Mute during LLM function calls (e.g., API requests, external service calls)
Mute during all bot speech
Use custom logic provided via callback to determine when to mute. The callback
is invoked when the bot is speaking and can use application state to decide
whether to mute. When the bot stops speaking, unmuting occurs automatically if
no other strategy requires muting.
`MUTE_UNTIL_FIRST_BOT_COMPLETE` and `FIRST_SPEECH` strategies should not be
used together as they handle the first bot speech differently.
## Input Frames
Indicates bot has started speaking
Indicates bot has stopped speaking
Indicates a function call has started
Indicates a function call has completed
Indicates an interim transcription result (suppressed when muted)
User interruption start event (suppressed when muted)
User interruption stop event (suppressed when muted)
Indicates a transcription result (suppressed when muted)
Indicates user has started speaking (suppressed when muted)
Indicates user has stopped speaking (suppressed when muted)
## Output Frames
Control frame to mute/unmute the STT service
All input frames are passed through except VAD-related frames (interruptions and user speaking events) when muted.
## Usage Examples
### Basic Usage (Mute During Bot's First Speech)
```python theme={null}
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt_mute_filter = STTMuteFilter(
config=STTMuteConfig(strategies={
STTMuteStrategy.FIRST_SPEECH
})
)
pipeline = Pipeline([
transport.input(),
stt,
stt_mute_filter, # Between the STT service and context aggregator
context_aggregator.user(),
# ... rest of pipeline
])
```
### Mute Until First Bot Response Completes
```python theme={null}
stt_mute_filter = STTMuteFilter(
config=STTMuteConfig(strategies={STTMuteStrategy.MUTE_UNTIL_FIRST_BOT_COMPLETE})
)
```
This ensures no user speech is processed until after the bot's first complete response.
### Always Mute During Bot Speech
```python theme={null}
stt_mute_filter = STTMuteFilter(
config=STTMuteConfig(strategies={STTMuteStrategy.ALWAYS})
)
```
### Custom Muting Logic
The `CUSTOM` strategy allows you to control muting based on application state when the bot is speaking. The callback will be invoked whenever the bot is speaking, and your logic decides whether to mute:
```python theme={null}
# Create a state manager
class SessionState:
def __init__(self):
self.session_ending = False
session_state = SessionState()
# Callback function that determines whether to mute
async def session_state_mute_logic(stt_filter: STTMuteFilter) -> bool:
# Return True to mute, False otherwise
# This is called when the bot is speaking
return session_state.session_ending
# Configure filter with CUSTOM strategy
stt_mute_filter = STTMuteFilter(
config=STTMuteConfig(
strategies={STTMuteStrategy.CUSTOM},
should_mute_callback=session_state_mute_logic
)
)
# Later, when you want to trigger muting (e.g., during session timeout):
async def handle_session_timeout():
# Update state that will be checked by the callback
session_state.session_ending = True
# Send goodbye message
goodbye_message = "Thank you for using our service. This session is now ending."
await pipeline.push_frame(TTSSpeakFrame(text=goodbye_message))
# The system will automatically mute during this message because:
# 1. Bot starts speaking, triggering the callback
# 2. Callback returns True (session_ending is True)
# 3. When bot stops speaking, unmuting happens automatically
```
### Combining Multiple Strategies
```python theme={null}
async def custom_mute_logic(processor: STTMuteFilter) -> bool:
# Example: Mute during business hours only
current_hour = datetime.now().hour
return 9 <= current_hour < 17
stt_mute_filter = STTMuteFilter(
config=STTMuteConfig(
strategies={
STTMuteStrategy.FUNCTION_CALL, # Mute during function calls
STTMuteStrategy.CUSTOM, # And during business hours
STTMuteStrategy.MUTE_UNTIL_FIRST_BOT_COMPLETE # And until first bot speech completes
},
should_mute_callback=custom_mute_logic
)
)
```
### Frame Flow
```mermaid theme={null}
graph TD
A[Transport Input] --> B[STTMuteFilter]
B --> C[STT Service]
B -- "Suppressed when muted" --> D[VAD-related Frames]
B -- "STTMuteFrame" --> C
```
## Notes
* Combines STT muting and interruption control into a single concept
* Muting prevents both transcription and interruptions
* Multiple strategies can be active simultaneously
* CUSTOM strategy callback is only invoked when the bot is speaking
* Unmuting happens automatically when bot speech ends (if no other strategy requires muting)
* Placed between the STT service and context aggregator in pipeline
* Maintains conversation flow during bot speech and function calls
* Efficient state tracking for minimal overhead
# WakeCheckFilter
Source: https://docs.pipecat.ai/api-reference/server/utilities/filters/wake-check-filter
Processor that passes frames only after detecting wake phrases in transcriptions
**Deprecated:** `WakeCheckFilter` is deprecated in favor of
[`WakePhraseUserTurnStartStrategy`](/api-reference/server/utilities/turn-management/user-turn-strategies#wakephraseuserturnstartstrategy).
The new strategy provides better integration with turn management and supports
both timeout and single-activation modes.
## Overview
`WakeCheckFilter` monitors `TranscriptionFrame`s for specified wake phrases and only allows frames to pass through after a wake phrase has been detected. It includes a keepalive timeout to maintain the awake state for a period after detection, allowing continuous conversation without requiring repeated wake phrases.
## Constructor Parameters
List of wake phrases to detect in transcriptions
Number of seconds to remain in the awake state after each transcription
## Functionality
The filter maintains state for each participant and processes frames as follows:
1. `TranscriptionFrame` objects are checked for wake phrases
2. If a wake phrase is detected, the filter enters the "AWAKE" state
3. While in the "AWAKE" state, all transcription frames pass through
4. After no activity for the keepalive timeout period, the filter returns to "IDLE"
5. All non-transcription frames pass through normally
Wake phrases are detected using regular expressions that match whole words with flexible spacing, making detection resilient to minor transcription variations.
## States
Default state - only non-transcription frames pass through
Active state after wake phrase detection - all frames pass through
## Output Frames
* All non-transcription frames pass through unchanged
* After wake phrase detection, transcription frames pass through
* When awake, transcription frames reset the keepalive timer
## Usage Example
```python theme={null}
from pipecat.processors.filters import WakeCheckFilter
# Create filter with wake phrases
wake_filter = WakeCheckFilter(
wake_phrases=["hey assistant", "ok computer", "listen up"],
keepalive_timeout=5.0 # Stay awake for 5 seconds after each transcription
)
# Add to pipeline
pipeline = Pipeline([
transport.input(),
stt_service,
wake_filter, # Only passes transcriptions after wake phrases
llm_service,
tts_service,
transport.output()
])
```
## Frame Flow
```mermaid theme={null}
graph TD
A[Input Frames] --> B[WakeCheckFilter]
B --> C{Transcription Frame?}
C -->|No| F[Output Frame]
C -->|Yes| D{Wake State}
D -->|AWAKE| E{Keepalive Expired?}
E -->|No| F
E -->|Yes| G[Return to IDLE]
D -->|IDLE| H{Contains Wake Phrase?}
H -->|Yes| I[Set AWAKE] --> F
H -->|No| J[Filtered Out]
```
## Notes
* Maintains separate state for each participant ID
* Uses regex pattern matching for resilient wake phrase detection
* Accumulates transcription text to detect phrases across multiple frames
* Trims accumulated text when wake phrase is detected
* Supports multiple wake phrases
* Passes all non-transcription frames through unchanged
* Error handling produces ErrorFrames for robust operation
* Case-insensitive matching for natural language use
# WakeNotifierFilter
Source: https://docs.pipecat.ai/api-reference/server/utilities/filters/wake-notifier-filter
Processor that triggers a notifier when specified frame types pass a custom filter
## Overview
`WakeNotifierFilter` monitors the pipeline for specific frame types and triggers a notification when those frames pass a custom filter condition. It passes all frames through unchanged while performing this notification side-effect.
## Constructor Parameters
The notifier object to trigger when conditions are met
Tuple of frame types to monitor
Async function that examines each matching frame and returns True to trigger
notification
## Functionality
The processor operates as follows:
1. Checks if the incoming frame matches any of the specified types
2. If it's a matching type, calls the filter function with the frame
3. If the filter returns True, triggers the notifier
4. Passes all frames through unchanged, regardless of the filtering result
This allows for notification side-effects without modifying the pipeline's data flow.
## Output Frames
* All frames pass through unchanged in their original direction
* No frames are modified or filtered out
## Usage Example
```python theme={null}
from pipecat.frames.frames import TranscriptionFrame, UserStartedSpeakingFrame
from pipecat.processors.filters import WakeNotifierFilter
from pipecat.sync.event_notifier import EventNotifier
# Create an event notifier
wake_event = EventNotifier()
# Create filter that notifies when certain wake phrases are detected
async def wake_phrase_filter(frame):
if isinstance(frame, TranscriptionFrame):
return "hey assistant" in frame.text.lower()
return False
# Add to pipeline
wake_notifier = WakeNotifierFilter(
notifier=wake_event,
types=(TranscriptionFrame, UserStartedSpeakingFrame),
filter=wake_phrase_filter
)
# In another component, wait for the notification
async def handle_wake_event():
await wake_event.wait()
print("Wake phrase detected!")
```
## Frame Flow
```mermaid theme={null}
graph TD
A[Input Frame] --> B[WakeNotifierFilter]
B --> C{Matching Type?}
C -->|Yes| D{Filter Function}
D -->|Returns True| E[Notify]
D -->|Returns False| F[Pass Through]
C -->|No| F
E --> F
```
## Notes
* Acts as a transparent pass-through for all frames
* Can trigger external events without modifying pipeline flow
* Useful for signaling between pipeline components
* Can monitor for multiple frame types simultaneously
* Uses async filter function for complex conditions
* Functions as a "listener" that doesn't affect the data stream
* Can be used for logging, analytics, or coordinating external systems
# LLMTextProcessor
Source: https://docs.pipecat.ai/api-reference/server/utilities/frame/llm-text-processor
A processor for aggregating LLMTextFrames into logical units before passing them to downstream services
## Overview
`LLMTextProcessor` is a processor designed to aggregate `LLMTextFrame`s into coherent text units before passing them to downstream services, such as TTS. By utilizing text aggregators, it ensures that text is properly segmented and structured, enhancing the quality of subsequent processing. This processor expects `LLMTextFrame`s as input and outputs `AggregatedTextFrame`s containing the aggregated text.
If an `LLMTextProcessor` is in use, the `text_aggregator` parameter of TTS
services will be ignored, as text aggregation is handled upstream.
The benefit of pre-aggregating LLM text frames is that it allows for more controlled and meaningful text synthesis. Downstream services can operate on complete sentences or logical text blocks. For TTS services, this means being able to customize how certain types of text are spoken (e.g., spelling out phone numbers, stripping out URL protocols, or inserting other TTS-specific annotations) or even skipping certain text segments entirely (e.g., code snippets or markup). For other services, such as RTVI, it allows these logical text units to be sent as separate `bot-output` messages, supporting custom client-side handling and rendering (e.g., collapsible code blocks, clickable links, etc.).
Pipecat's API methods for LLMTextProcessor
Complete example with LLMTextProcessor and custom TTS and RTVI handling
## Constructor Parameters
An instance of a text aggregator (e.g., `PatternPairAggregator` or a custom
aggregator type) used to aggregate incoming text from `LLMTextFrame`s. If
`None`, a `SimpleTextAggregator` will be used by default, aggregating text
based on sentence boundaries.
## Usage
The `LLMTextProcessor` should be integrated into your pipeline after the LLM service and before any services that consume text, such as TTS. It processes incoming `LLMTextFrame`s, aggregates their text content, and outputs `AggregatedTextFrame`s.
For more usage examples, check out the docs for the
[PatternPairAggregator](/api-reference/server/utilities/text/pattern-pair-aggregator#usage-examples).
```python theme={null}
from pipecat.processors.aggregators.llm_text_processor import LLMTextProcessor
...
llm_text_aggregator = PatternPairAggregator()
llm_text_aggregator.add_pattern(
    type="code",
    start_pattern="<code>",   # example start/end delimiters; use the markers your LLM emits
    end_pattern="</code>",
    action=MatchAction.AGGREGATE,
)
llm_text_processor = LLMTextProcessor(text_aggregator=llm_text_aggregator)
...
# Pipeline - The following pipeline is typical for a STT->LLM->TTS bot + RTVI
# with the addition of the LLMTextProcessor to handle special text segments.
pipeline = Pipeline(
[
transport.input(),
rtvi,
stt,
transcript_processor.user(),
context_aggregator.user(),
llm,
llm_text_processor,
tts,
transport.output(),
transcript_processor.assistant(),
context_aggregator.assistant(),
]
)
```
# Producer & Consumer Processors
Source: https://docs.pipecat.ai/api-reference/server/utilities/frame/producer-consumer
Route frames between different parts of a pipeline, allowing selective frame sharing across parallel branches or within complex pipelines
## Overview
The Producer and Consumer processors work as a pair to route frames between different parts of a pipeline, particularly useful when working with [`ParallelPipeline`](/api-reference/server/pipeline/parallel-pipeline). They allow you to selectively capture frames from one pipeline branch and inject them into another.
## ProducerProcessor
`ProducerProcessor` examines frames flowing through the pipeline, applies a filter to decide which frames to share, and optionally transforms these frames before sending them to connected consumers.
### Constructor Parameters
An async function that determines which frames should be sent to consumers.
Should return `True` for frames to be shared.
Optional async function that transforms frames before sending to consumers. By
default, passes frames unchanged.
When `True`, passes all frames through the normal pipeline flow. When `False`,
only passes through frames that don't match the filter.
## ConsumerProcessor
`ConsumerProcessor` receives frames from a `ProducerProcessor` and injects them into its pipeline branch.
### Constructor Parameters
The producer processor that will send frames to this consumer.
Optional async function that transforms frames before injecting them into the
pipeline.
The direction in which to push received frames. Usually `DOWNSTREAM` to send
frames forward in the pipeline.
## Usage Examples
### Basic Usage: Moving TTS Audio Between Branches
```python theme={null}
# Create a producer that captures TTS audio frames
async def is_tts_audio(frame: Frame) -> bool:
return isinstance(frame, TTSAudioRawFrame)
# Define an async transformer function
async def tts_to_input_audio_transformer(frame: Frame) -> Frame:
if isinstance(frame, TTSAudioRawFrame):
# Convert TTS audio to input audio format
return InputAudioRawFrame(
audio=frame.audio,
sample_rate=frame.sample_rate,
num_channels=frame.num_channels
)
return frame
producer = ProducerProcessor(
filter=is_tts_audio,
    transformer=tts_to_input_audio_transformer,
    passthrough=True  # Keep these frames in original pipeline
)
# Create a consumer to receive the frames
consumer = ConsumerProcessor(
producer=producer,
direction=FrameDirection.DOWNSTREAM
)
# Use in a ParallelPipeline
pipeline = Pipeline([
transport.input(),
ParallelPipeline(
# Branch 1: LLM for bot responses
[
llm,
tts,
producer, # Capture TTS audio here
],
# Branch 2: Audio processing branch
[
consumer, # Receive TTS audio here
llm, # Speech-to-Speech LLM (audio in)
]
),
transport.output(),
])
```
# MCPClient
Source: https://docs.pipecat.ai/api-reference/server/utilities/mcp/mcp
Service to connect to MCP (Model Context Protocol) servers
## Overview
MCP is an open standard for enabling AI agents to interact with external data and tools. `MCPClient` provides a way to access and call tools via MCP. For example, instead of writing bespoke function call implementations for an external API, you may use an MCP server that provides a bridge to the API. *Be aware there may be security implications.* See the [MCP documentation](https://github.com/modelcontextprotocol) for more details.
## Installation
To use `MCPClient`, install the required dependencies:
```bash theme={null}
pip install "pipecat-ai[mcp]"
```
You may also need to set environment variables as required by the specific MCP server to which you are connecting.
## Configuration
### Constructor Parameters
You can connect to your MCP server via Stdio, SSE, or Streamable HTTP transport. See [here](https://modelcontextprotocol.io/docs/concepts/transports#built-in-transport-types) for more documentation on MCP transports.
Connection parameters for the MCP server. Must be one of:
* `StdioServerParameters` (from `mcp`): Connects to a local MCP server process via stdio.
```python theme={null}
from mcp import StdioServerParameters
StdioServerParameters(
command="python", # Executable
args=["server.py"], # Optional command line arguments
env=None, # Optional environment variables
)
```
* `SseServerParameters` (from `mcp.client.session_group`): Connects to a remote MCP server via Server-Sent Events.
```python theme={null}
from mcp.client.session_group import SseServerParameters
SseServerParameters(
url="https://your.mcp.server/sse", # Server URL
headers=None, # Optional HTTP headers
timeout=5, # Connection timeout in seconds
sse_read_timeout=300, # SSE read timeout in seconds
)
```
* `StreamableHttpParameters` (from `mcp.client.session_group`): Connects to a remote MCP server via Streamable HTTP.
```python theme={null}
from mcp.client.session_group import StreamableHttpParameters
StreamableHttpParameters(
url="https://your.mcp.server/mcp/", # Server URL
headers=None, # Optional HTTP headers
timeout=5, # Connection timeout in seconds
sse_read_timeout=300, # SSE read timeout in seconds
terminate_on_close=True, # Terminate session on close
)
```
Optional list of tool names to register. If `None`, all tools from the MCP
server are registered. Use this to limit which tools are exposed to the LLM.
Optional dictionary mapping tool names to filter functions that post-process tool outputs. Each filter function receives the raw tool output and returns the processed output.
```python theme={null}
mcp = MCPClient(
server_params=server_params,
tools_output_filters={
"search": lambda result: result[:500], # Truncate long results
},
)
```
### Input Parameters
See more information regarding server params [here](https://github.com/modelcontextprotocol/python-sdk?tab=readme-ov-file#writing-mcp-clients).
## Usage Examples
### MCP Stdio Transport
```python theme={null}
import shutil
from mcp import StdioServerParameters
from pipecat.services.mcp_service import MCPClient
# Initialize an LLM
llm = ...
# Initialize and configure MCPClient with server parameters
mcp = MCPClient(
server_params=StdioServerParameters(
command=shutil.which("npx"),
args=["-y", "@name/mcp-server-name@latest"],
env={"ENV_API_KEY": ""},
)
)
# Create tools schema from the MCP server and register them with llm
tools = await mcp.register_tools(llm)
# Create context with system message and tools
context = LLMContext(
messages=[
{
"role": "system",
"content": "You are a helpful assistant in a voice conversation. You have access to MCP tools. Keep responses concise."
}
],
tools=tools
)
```
### MCP SSE Transport
```python theme={null}
from mcp.client.session_group import SseServerParameters
from pipecat.services.mcp_service import MCPClient
# Initialize an LLM
llm = ...
# Initialize and configure MCPClient with SSE server parameters
mcp = MCPClient(
server_params=SseServerParameters(
url="https://your.mcp.server/sse",
)
)
# Create tools schema from the MCP server and register them with llm
tools = await mcp.register_tools(llm)
# Create context with system message and tools
context = LLMContext(
messages=[
{
"role": "system",
"content": "You are a helpful assistant in a voice conversation. You have access to MCP tools. Keep responses concise."
}
],
tools=tools
)
```
### MCP Streamable HTTP Transport
```python theme={null}
import os
from mcp.client.session_group import StreamableHttpParameters
from pipecat.services.mcp_service import MCPClient
# Initialize an LLM
llm = ...
# Initialize and configure MCPClient with Streamable HTTP parameters
mcp = MCPClient(
server_params=StreamableHttpParameters(
url="https://api.githubcopilot.com/mcp/",
headers={"Authorization": f"Bearer {os.getenv('GITHUB_PERSONAL_ACCESS_TOKEN')}"},
)
)
# Create tools schema from the MCP server and register them with llm
tools = await mcp.register_tools(llm)
# Create context with tools
context = LLMContext(tools=tools)
```
### Two-Step Registration
Some LLM services (e.g. Gemini Live) require tools to be passed at construction time. Use `get_tools_schema()` to obtain the schema first, then `register_tools_schema()` to register handlers after the LLM is created.
```python theme={null}
from mcp.client.session_group import StreamableHttpParameters
from pipecat.services.mcp_service import MCPClient
# Initialize MCPClient
mcp = MCPClient(
server_params=StreamableHttpParameters(
url="https://api.githubcopilot.com/mcp/",
headers={"Authorization": f"Bearer {os.getenv('GITHUB_PERSONAL_ACCESS_TOKEN')}"},
)
)
# Step 1: Get tools schema without registering
tools = await mcp.get_tools_schema()
# Step 2: Create LLM with tools
llm = GeminiLiveLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
system_instruction="You are a helpful assistant.",
tools=tools,
)
# Step 3: Register tool handlers with the LLM
await mcp.register_tools_schema(tools, llm)
```
### Multiple MCP Servers
You can combine tools from multiple MCP servers by merging their `ToolsSchema` objects.
```python theme={null}
from mcp import StdioServerParameters
from mcp.client.session_group import StreamableHttpParameters
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.services.mcp_service import MCPClient
# Initialize an LLM
llm = ...
# Set up multiple MCP clients
server_a = MCPClient(
server_params=StdioServerParameters(
command=shutil.which("npx"),
args=["-y", "mcp-server-a"],
env={"API_KEY": os.getenv("SERVER_A_API_KEY")},
)
)
server_b = MCPClient(
server_params=StreamableHttpParameters(
url="https://api.githubcopilot.com/mcp/",
headers={"Authorization": f"Bearer {os.getenv('GITHUB_PERSONAL_ACCESS_TOKEN')}"},
)
)
# Register tools from each server
tools_a = await server_a.register_tools(llm)
tools_b = await server_b.register_tools(llm)
# Merge tools into a single schema
all_tools = ToolsSchema(
standard_tools=tools_a.standard_tools + tools_b.standard_tools
)
# Create context with combined tools
context = LLMContext(tools=all_tools)
```
## Methods
Connects to the MCP server, discovers available tools, converts their schemas
to Pipecat format, and registers them with the LLM service. This is equivalent
to calling `get_tools_schema()` followed by `register_tools_schema()`.
```python theme={null}
async def register_tools(self, llm: LLMService | LLMSwitcher) -> ToolsSchema
```
Connects to the MCP server, discovers available tools, and converts their
schemas to Pipecat format — without registering them with an LLM. Use this
when you need the tools schema before the LLM is created.
```python theme={null}
async def get_tools_schema(self) -> ToolsSchema
```
Registers a previously obtained `ToolsSchema` with an LLM service. Use this
after `get_tools_schema()` once the LLM is available.
```python theme={null}
async def register_tools_schema(self, tools_schema: ToolsSchema, llm: LLMService | LLMSwitcher) -> None
```
## Additional documentation
See [MCP's docs](https://github.com/modelcontextprotocol/python-sdk) for MCP
related updates.
# Debug Log Observer
Source: https://docs.pipecat.ai/api-reference/server/utilities/observers/debug-observer
Comprehensive frame logging with configurable filtering in Pipecat
The `DebugLogObserver` provides detailed logging of frame activity in your Pipecat pipeline, with full visibility into frame content and flexible filtering options.
## Features
* Log all frame types and their content
* Filter by specific frame types
* Filter by source or destination components
* Automatic formatting of frame fields
* Special handling for complex data structures
## Usage
### Log All Frames
Log all frames passing through the pipeline:
```python theme={null}
from pipecat.observers.loggers.debug_log_observer import DebugLogObserver
task = PipelineTask(
pipeline,
params=PipelineParams(
observers=[DebugLogObserver()],
),
)
```
### Filter by Frame Types
Log only specific frame types:
```python theme={null}
from pipecat.frames.frames import TranscriptionFrame, InterimTranscriptionFrame
from pipecat.observers.loggers.debug_log_observer import DebugLogObserver
task = PipelineTask(
pipeline,
params=PipelineParams(
observers=[
DebugLogObserver(frame_types=(
TranscriptionFrame,
InterimTranscriptionFrame
))
],
),
)
```
### Advanced Source/Destination Filtering
Filter frames based on their type and source/destination:
```python theme={null}
from pipecat.frames.frames import InterruptionFrame, UserStartedSpeakingFrame, LLMTextFrame
from pipecat.observers.loggers.debug_log_observer import DebugLogObserver, FrameEndpoint
from pipecat.transports.base_output_transport import BaseOutputTransport
from pipecat.services.stt_service import STTService
task = PipelineTask(
pipeline,
params=PipelineParams(
observers=[
DebugLogObserver(frame_types={
# Only log InterruptionFrame when source is BaseOutputTransport
InterruptionFrame: (BaseOutputTransport, FrameEndpoint.SOURCE),
# Only log UserStartedSpeakingFrame when destination is STTService
UserStartedSpeakingFrame: (STTService, FrameEndpoint.DESTINATION),
# Log LLMTextFrame regardless of source or destination
LLMTextFrame: None
})
],
),
)
```
## Log Output Format
The observer logs each frame with its complete details:
```
[Source] → [Destination]: [FrameType] [field1: value1, field2: value2, ...] at [timestamp]s
```
For example:
```
OpenAILLMService#0 → DailyTransport#0: LLMTextFrame text: 'Hello, how can I help you today?' at 1.24s
```
## Configuration Options
| Parameter | Type | Description |
| ---------------- | -------------------------------------------------------------------------------------- | --------------------------------------------------------------- |
| `frame_types` | `Tuple[Type[Frame], ...]` or `Dict[Type[Frame], Optional[Tuple[Type, FrameEndpoint]]]` | Frame types to log, with optional source/destination filtering |
| `exclude_fields` | `Set[str]` | Field names to exclude from logging (defaults to binary fields) |
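For example, `exclude_fields` can keep large binary payloads out of the logs; the sketch below explicitly excludes the `audio` field when logging audio frames (binary fields are already excluded by default).

```python theme={null}
from pipecat.frames.frames import InputAudioRawFrame
from pipecat.observers.loggers.debug_log_observer import DebugLogObserver

observer = DebugLogObserver(
    frame_types=(InputAudioRawFrame,),
    exclude_fields={"audio"},  # skip logging the raw audio bytes
)
```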
## FrameEndpoint Enum
The `FrameEndpoint` enum is used for source/destination filtering:
* `FrameEndpoint.SOURCE`: Filter by source component
* `FrameEndpoint.DESTINATION`: Filter by destination component
# LLM Log Observer
Source: https://docs.pipecat.ai/api-reference/server/utilities/observers/llm-observer
Logging LLM activity in Pipecat
The `LLMLogObserver` provides detailed logging of Large Language Model (LLM) activity within your Pipecat pipeline. It tracks the entire lifecycle of LLM interactions, from initial prompts to final responses.
## Frame Types Monitored
The observer tracks the following frame types (only from/to LLM service):
* **LLMFullResponseStartFrame**: When the LLM begins generating a response
* **LLMFullResponseEndFrame**: When the LLM completes its response
* **LLMTextFrame**: Individual text chunks generated by the LLM
* **FunctionCallInProgressFrame**: Function/tool calls made by the LLM
* **LLMMessagesFrame**: Input messages sent to the LLM (deprecated in favor of context frames)
* **LLMContextFrame**: Updated context sent to the LLM
* **FunctionCallResultFrame**: Results returned from function calls
## Usage
```python theme={null}
from pipecat.observers.loggers.llm_log_observer import LLMLogObserver
task = PipelineTask(
pipeline,
params=PipelineParams(
observers=[LLMLogObserver()],
),
)
```
## Log Output Format
The observer uses emojis and consistent formatting for easy log reading:
* 🧠 \[Source] → LLM START/END RESPONSE
* 🧠 \[Source] → LLM GENERATING: \[text]
* 🧠 \[Source] → LLM FUNCTION CALL: \[details]
* 🧠 → \[Destination] LLM MESSAGES FRAME: \[messages]
* 🧠 → \[Destination] LLM CONTEXT FRAME: \[context]
All log entries include timestamps for precise timing analysis.
# Observer Pattern
Source: https://docs.pipecat.ai/api-reference/server/utilities/observers/observer-pattern
Understanding and implementing observers in Pipecat
The Observer pattern in Pipecat allows non-intrusive monitoring of frames as they flow through the pipeline. Observers can watch frame traffic without affecting the pipeline's core functionality.
DEPRECATED: The old observer pattern with individual parameters
(`on_push_frame(src, dst, frame, direction, timestamp)`) is deprecated. Use
the new pattern with data objects (`on_push_frame(data: FramePushed)`)
instead.
## Base Observer
All observers must inherit from `BaseObserver` and can implement these methods:
* `on_push_frame(data: FramePushed)`: Called when a frame is pushed from one processor to another
* `on_process_frame(data: FrameProcessed)`: Called when a frame is being processed by a processor
* `on_pipeline_started()`: Called after the `StartFrame` has been processed by all processors in the pipeline
```python theme={null}
from pipecat.observers.base_observer import BaseObserver, FramePushed, FrameProcessed
class CustomObserver(BaseObserver):
async def on_push_frame(self, data: FramePushed):
# Your frame observation logic here
pass
async def on_process_frame(self, data: FrameProcessed):
# Your frame processing observation logic here
pass
async def on_pipeline_started(self):
# Called when the pipeline has fully started
pass
```
## Available Observers
Pipecat provides several built-in observers:
* **LLMLogObserver**: Logs LLM activity and responses
* **TranscriptionLogObserver**: Logs speech-to-text transcription events
* **RTVIObserver**: Converts internal frames to RTVI protocol messages for server to client messaging
* **[StartupTimingObserver](/api-reference/server/utilities/observers/startup-timing-observer)**: Measures processor startup times and transport readiness
* **[UserBotLatencyObserver](/api-reference/server/utilities/observers/user-bot-latency-observer)**: Measures user-to-bot response latency
* **[TurnTrackingObserver](/api-reference/server/utilities/observers/turn-tracking-observer)**: Tracks conversation turns and events
## Using Multiple Observers
You can attach multiple observers to a pipeline task. Each observer will be notified of all frames:
```python theme={null}
task = PipelineTask(
pipeline,
params=PipelineParams(
observers=[LLMLogObserver(), TranscriptionLogObserver(), CustomObserver()],
),
)
```
## Example: Debug Observer
Here's an example observer that logs interruptions and bot speaking events:
```python theme={null}
from pipecat.observers.base_observer import BaseObserver, FramePushed, FrameProcessed
from pipecat.frames.frames import (
InterruptionFrame,
BotStartedSpeakingFrame,
BotStoppedSpeakingFrame,
)
from pipecat.processors.frame_processor import FrameDirection
from loguru import logger
class DebugObserver(BaseObserver):
"""Observer to log interruptions and bot speaking events to the console.
Logs all frame instances of:
- InterruptionFrame
- BotStartedSpeakingFrame
- BotStoppedSpeakingFrame
This allows you to see the frame flow from processor to processor through the pipeline for these frames.
Log format: [EVENT TYPE]: [source processor] → [destination processor] at [timestamp]s
"""
async def on_push_frame(self, data: FramePushed):
time_sec = data.timestamp / 1_000_000_000
arrow = "→" if data.direction == FrameDirection.DOWNSTREAM else "←"
if isinstance(data.frame, InterruptionFrame):
logger.info(f"⚡ INTERRUPTION START: {data.source} {arrow} {data.destination} at {time_sec:.2f}s")
elif isinstance(data.frame, BotStartedSpeakingFrame):
logger.info(f"🤖 BOT START SPEAKING: {data.source} {arrow} {data.destination} at {time_sec:.2f}s")
elif isinstance(data.frame, BotStoppedSpeakingFrame):
logger.info(f"🤖 BOT STOP SPEAKING: {data.source} {arrow} {data.destination} at {time_sec:.2f}s")
```
## Common Use Cases
Observers are particularly useful for:
* Debugging frame flow
* Logging specific events
* Monitoring pipeline behavior
* Collecting metrics
* Converting internal frames to external messages
# Startup Timing Observer
Source: https://docs.pipecat.ai/api-reference/server/utilities/observers/startup-timing-observer
Measure processor startup times and transport readiness during pipeline initialization
The `StartupTimingObserver` measures how long each processor's `start()` method takes during pipeline startup, and tracks transport connection timing. This is useful for diagnosing startup slowness and identifying initialization bottlenecks such as WebSocket connections, API authentication, or model loading.
## Features
* Measures per-processor `start()` duration by tracking `StartFrame` propagation
* Reports total pipeline startup time and per-processor breakdown
* Tracks transport connection milestones (bot connected, client connected)
* Emits `on_startup_timing_report` with processor timing data
* Emits `on_transport_timing_report` with transport connection timing
* Supports filtering to measure only specific processor types
* Excludes internal pipeline processors by default
## Usage
### Basic Startup Monitoring
Add startup monitoring to your pipeline and handle the events:
```python theme={null}
from pipecat.observers.startup_timing_observer import StartupTimingObserver
observer = StartupTimingObserver()
@observer.event_handler("on_startup_timing_report")
async def on_startup_timing_report(observer, report):
print(f"Total startup: {report.total_duration_secs:.3f}s")
for timing in report.processor_timings:
print(f" {timing.processor_name}: {timing.duration_secs:.3f}s")
@observer.event_handler("on_transport_timing_report")
async def on_transport_timing_report(observer, report):
if report.bot_connected_secs is not None:
print(f"Bot connected: {report.bot_connected_secs:.3f}s")
print(f"Client connected: {report.client_connected_secs:.3f}s")
task = PipelineTask(
pipeline,
observers=[observer],
)
```
### Filtering Processor Types
To measure only specific processor types, pass a `processor_types` tuple:
```python theme={null}
from pipecat.services.stt_service import STTService
from pipecat.services.tts_service import TTSService
observer = StartupTimingObserver(
processor_types=(STTService, TTSService)
)
```
## Configuration
Optional tuple of processor types to measure. If `None`, all non-internal
processors are measured. Internal pipeline processors (`PipelineSource`,
`Pipeline`) are always excluded.
## Event Handlers
### on\_startup\_timing\_report
Called once after the pipeline has fully started, with timing data for all measured processors.
```python theme={null}
@observer.event_handler("on_startup_timing_report")
async def on_startup_timing_report(observer, report):
# report is a StartupTimingReport
print(f"Total: {report.total_duration_secs:.3f}s")
for timing in report.processor_timings:
print(f" {timing.processor_name}: {timing.duration_secs:.3f}s")
```
**Report fields (`StartupTimingReport`):**
| Field | Type | Description |
| --------------------- | ------------------------------ | ------------------------------------------------------ |
| `start_time` | `float` | Unix timestamp when the first processor began starting |
| `total_duration_secs` | `float` | Sum of all measured processor `start()` durations |
| `processor_timings` | `List[ProcessorStartupTiming]` | Per-processor timing data, in pipeline order |
**Processor timing fields (`ProcessorStartupTiming`):**
| Field | Type | Description |
| ------------------- | ------- | --------------------------------------------------------------- |
| `processor_name` | `str` | The name of the processor |
| `start_offset_secs` | `float` | Offset from the StartFrame to when this processor's start began |
| `duration_secs` | `float` | How long the processor's `start()` took |
### on\_transport\_timing\_report
Called once when the first client connects, with transport connection timing relative to the `StartFrame`.
```python theme={null}
@observer.event_handler("on_transport_timing_report")
async def on_transport_timing_report(observer, report):
# report is a TransportTimingReport
if report.bot_connected_secs is not None:
print(f"Bot connected: {report.bot_connected_secs:.3f}s")
print(f"Client connected: {report.client_connected_secs:.3f}s")
```
**Report fields (`TransportTimingReport`):**
| Field | Type | Description |
| ----------------------- | ----------------- | -------------------------------------------------------------------- |
| `start_time` | `float` | Unix timestamp of the StartFrame (pipeline start) |
| `bot_connected_secs` | `Optional[float]` | Seconds from StartFrame to `BotConnectedFrame` (SFU transports only) |
| `client_connected_secs` | `Optional[float]` | Seconds from StartFrame to first `ClientConnectedFrame` |
`bot_connected_secs` is only set for SFU transports (Daily, LiveKit, HeyGen,
Tavus) that emit a `BotConnectedFrame` when the bot joins the room. Non-SFU
transports (WebSocket, SmallWebRTC) will have this field set to `None`.
# Transcription Log Observer
Source: https://docs.pipecat.ai/api-reference/server/utilities/observers/transcription-observer
Logging speech-to-text transcription activity in Pipecat
The `TranscriptionLogObserver` logs all speech-to-text transcription activity in your Pipecat pipeline, providing visibility into both final and interim transcription results.
## Frame Types Monitored
The observer tracks the following frame types (only from STT service):
* **TranscriptionFrame**: Final transcription results
* **InterimTranscriptionFrame**: In-progress transcription results
## Usage
```python theme={null}
from pipecat.observers.loggers.transcription_log_observer import TranscriptionLogObserver
task = PipelineTask(
pipeline,
params=PipelineParams(
observers=[TranscriptionLogObserver()],
),
)
```
## Log Output Format
The observer uses consistent formatting with emoji indicators:
* 💬 \[Source] → TRANSCRIPTION: \[text] from \[user\_id]
* 💬 \[Source] → INTERIM TRANSCRIPTION: \[text] from \[user\_id]
All log entries include timestamps for precise timing analysis.
# Turn Tracking Observer
Source: https://docs.pipecat.ai/api-reference/server/utilities/observers/turn-tracking-observer
Track conversation turns and events in your Pipecat pipeline
The `TurnTrackingObserver` monitors and tracks conversational turns in your Pipecat pipeline, providing events when turns start and end. It intelligently identifies when a user-bot interaction cycle begins and completes.
## Turn Lifecycle
A turn represents a complete user-bot interaction cycle:
1. **Start**: When the user starts speaking (or pipeline starts for first turn)
2. **Processing**: User speaks, bot processes and responds
3. **End**: After the bot finishes speaking and either:
* The user starts speaking again
* A timeout period elapses with no further activity
## Events
The observer emits two main events:
* **`on_turn_started`**: When a new turn begins
* Parameters: `turn_number` (int)
* **`on_turn_ended`**: When a turn completes
* Parameters: `turn_number` (int), `duration` (float, in seconds), `was_interrupted` (bool)
## Usage
The observer is automatically created when you initialize a `PipelineTask` with `enable_turn_tracking=True` (which is the default):
```python theme={null}
task = PipelineTask(
pipeline,
# Turn tracking is enabled by default
)
# Access the observer
turn_observer = task.turn_tracking_observer
# Register event handlers
@turn_observer.event_handler("on_turn_started")
async def on_turn_started(observer, turn_number):
logger.info(f"Turn {turn_number} started")
@turn_observer.event_handler("on_turn_ended")
async def on_turn_ended(observer, turn_number, duration, was_interrupted):
status = "interrupted" if was_interrupted else "completed"
logger.info(f"Turn {turn_number} {status} in {duration:.2f}s")
```
## Configuration
You can configure the observer's behavior when creating a `PipelineTask`:
```python theme={null}
from pipecat.observers.turn_tracking_observer import TurnTrackingObserver
# Create a custom observer instance
custom_turn_tracker = TurnTrackingObserver(
turn_end_timeout_secs=3.5, # Turn end timeout (default: 2.5)
)
# Add it as a regular observer
task = PipelineTask(
pipeline,
observers=[custom_turn_tracker],
# Disable the default one if adding your own
enable_turn_tracking=False,
)
```
## Interruptions
The observer automatically detects interruptions when the user starts speaking while the bot is still speaking. In this case:
* The current turn is marked as interrupted (`was_interrupted=True`)
* A new turn begins immediately
## How It Works
The observer monitors specific frame types to track conversation flow:
* **StartFrame**: Initiates the first turn
* **UserStartedSpeakingFrame**: Starts user speech or triggers a new turn
* **BotStartedSpeakingFrame**: Marks bot speech beginning
* **BotStoppedSpeakingFrame**: Starts the turn end timeout
After a bot stops speaking, the observer waits for the configured timeout period. If no further bot speech occurs, the turn ends; otherwise, it continues as part of the same turn.
## Use Cases
* **Analytics**: Measure turn durations, interruption rates, and conversation flow
* **Logging**: Record turn-based logs for diagnostics and analysis
* **Visualization**: Show turn-based conversation timelines in UIs
* **Tracing**: Group spans and metrics by conversation turns
# User-Bot Latency Observer
Source: https://docs.pipecat.ai/api-reference/server/utilities/observers/user-bot-latency-observer
Measure response time between user speech and bot responses in Pipecat
The `UserBotLatencyObserver` measures the time between when a user stops speaking and when the bot starts responding, emitting events for custom handling and optional OpenTelemetry tracing integration. It also tracks first-bot-speech latency and provides detailed per-service latency breakdowns when metrics are enabled.
## Features
* Tracks user speech start/stop timing using VAD frames
* Measures bot response latency from the actual moment the user started speaking
* Measures first bot speech latency (client connection to first speech)
* Provides detailed latency breakdown with per-service TTFB, text aggregation, user turn duration, and function call metrics
* Emits `on_latency_measured` events for custom processing
* Emits `on_latency_breakdown` events with detailed per-service metrics
* Emits `on_first_bot_speech_latency` event for greeting latency measurement
* Automatically records latency as OpenTelemetry span attributes when tracing is enabled
* Automatically resets between conversation turns
## Usage
### Basic Latency Monitoring
Add latency monitoring to your pipeline and handle the event:
```python theme={null}
from pipecat.observers.user_bot_latency_observer import UserBotLatencyObserver
latency_observer = UserBotLatencyObserver()
@latency_observer.event_handler("on_latency_measured")
async def on_latency_measured(observer, latency):
print(f"User-to-bot latency: {latency:.3f}s")
task = PipelineTask(
pipeline,
params=PipelineParams(observers=[latency_observer]),
)
```
### Detailed Latency Breakdown
Enable metrics to collect per-service latency breakdown:
```python theme={null}
from pipecat.observers.user_bot_latency_observer import UserBotLatencyObserver
latency_observer = UserBotLatencyObserver()
@latency_observer.event_handler("on_latency_breakdown")
async def on_latency_breakdown(observer, breakdown):
print(f"Latency breakdown ({len(breakdown.chronological_events())} events):")
for event in breakdown.chronological_events():
print(f" {event}")
task = PipelineTask(
pipeline,
params=PipelineParams(
observers=[latency_observer],
enable_metrics=True, # Required for breakdown metrics
),
)
```
### OpenTelemetry Integration
When tracing is enabled, latency measurements are automatically recorded as `turn.user_bot_latency_seconds` attributes on OpenTelemetry turn spans. No additional configuration is needed.
## How It Works
The observer tracks conversation flow through these key events:
1. **Client connects** (`ClientConnectedFrame`) → Records timestamp for first-bot-speech measurement
2. **User starts speaking** (`VADUserStartedSpeakingFrame`) → Resets latency tracking
3. **User stops speaking** (`VADUserStoppedSpeakingFrame`) → Records timestamp, accounting for VAD `stop_secs` delay
4. **Bot starts speaking** (`BotStartedSpeakingFrame`) → Calculates latency and emits `on_latency_measured` and `on_latency_breakdown` events
When `enable_metrics=True` in `PipelineParams`, the observer also collects per-service metrics (TTFB, text aggregation, function call latency) from `MetricsFrame` instances and includes them in the latency breakdown.
## Event Handlers
### on\_latency\_measured
Called each time a user-to-bot latency measurement is captured.
```python theme={null}
@latency_observer.event_handler("on_latency_measured")
async def on_latency_measured(observer, latency):
# latency is a float representing seconds
logger.info(f"Response latency: {latency:.3f}s")
```
### on\_latency\_breakdown
Called alongside `on_latency_measured` with detailed per-service metrics collected during the user→bot cycle. The breakdown includes TTFB from each service, text aggregation latency, user turn duration, and function call timings.
```python theme={null}
@latency_observer.event_handler("on_latency_breakdown")
async def on_latency_breakdown(observer, breakdown):
# breakdown is a LatencyBreakdown object
logger.info("Latency breakdown:")
for event in breakdown.chronological_events():
logger.info(f" {event}")
```
**LatencyBreakdown fields:**
| Field | Type | Description |
| ---------------------- | ------------------------------------------- | -------------------------------------------------------------------------------------------- |
| `ttfb` | `List[TTFBBreakdownMetrics]` | Time-to-first-byte metrics from each service |
| `text_aggregation` | `Optional[TextAggregationBreakdownMetrics]` | First text aggregation measurement (sentence aggregation latency) |
| `user_turn_start_time` | `Optional[float]` | Unix timestamp when user turn started (adjusted for VAD stop\_secs) |
| `user_turn_secs` | `Optional[float]` | User turn duration including VAD silence detection, STT finalization, and turn analyzer wait |
| `function_calls` | `List[FunctionCallMetrics]` | Latency for each function call executed during the cycle |
The `breakdown.chronological_events()` method returns a human-readable list of all metrics sorted by start time, useful for logging and debugging.
### on\_first\_bot\_speech\_latency
Called once when the bot first speaks after client connection. Measures the time from `ClientConnectedFrame` to the first `BotStartedSpeakingFrame`. This is particularly useful for measuring greeting latency.
```python theme={null}
@latency_observer.event_handler("on_first_bot_speech_latency")
async def on_first_bot_speech_latency(observer, latency):
logger.info(f"First bot speech latency: {latency:.3f}s")
```
The `on_latency_breakdown` event is also emitted for the first bot speech, allowing you to see the detailed breakdown of what contributed to the greeting latency.
## Deprecated: UserBotLatencyLogObserver
`UserBotLatencyLogObserver` is deprecated. Use `UserBotLatencyObserver`
directly with its `on_latency_measured` event handler instead.
## Configuration
### Constructor Parameters
Maximum number of frame IDs to keep in history for duplicate detection. Prevents unbounded memory growth in long conversations.
## Limitations
* Requires proper frame sequencing to work accurately
* Per-service metrics are only collected when `enable_metrics=True` in `PipelineParams`
# OpenTelemetry Tracing
Source: https://docs.pipecat.ai/api-reference/server/utilities/opentelemetry
Monitor and analyze your Pipecat conversational pipelines using OpenTelemetry
## Overview
Pipecat includes built-in support for OpenTelemetry tracing, allowing you to gain deep visibility into your voice applications. Tracing helps you:
* Track latency and performance across your conversation pipeline
* Monitor service health and identify bottlenecks
* Visualize conversation turns and service dependencies
* Collect usage metrics and operational analytics
## Installation
To use OpenTelemetry tracing with Pipecat, install the tracing dependencies:
```bash theme={null}
pip install "pipecat-ai[tracing]"
```
For local development and testing, we recommend using Jaeger as a trace collector.
You can run it with Docker:
```bash theme={null}
docker run -d --name jaeger \
-p 16686:16686 \
-p 4317:4317 \
-p 4318:4318 \
jaegertracing/all-in-one:latest
```
Then access the UI at [http://localhost:16686](http://localhost:16686)
## Basic Setup
Enabling tracing in your Pipecat application requires two steps:
1. **Initialize the OpenTelemetry SDK** with your preferred exporter
2. **Enable tracing in your PipelineTask**
```python theme={null}
import os
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from pipecat.utils.tracing.setup import setup_tracing
from pipecat.pipeline.task import PipelineTask, PipelineParams
# Step 1: Initialize OpenTelemetry with your chosen exporter
exporter = OTLPSpanExporter(
endpoint="http://localhost:4317", # Jaeger or other collector endpoint
insecure=True,
)
setup_tracing(
service_name="my-voice-app",
exporter=exporter,
console_export=False, # Set to True for debug output
)
# Step 2: Enable tracing in your PipelineTask
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True, # Required for some service metrics
),
enable_tracing=True, # Enable tracing for this task
enable_turn_tracking=True, # Enable turn tracking for this task
conversation_id="customer-123", # Optional - will auto-generate if not provided
additional_span_attributes={"session.id": "abc-123"} # Optional - additional attributes to attach to the otel span
)
```
For complete working examples, see our sample implementations:
* [Jaeger Tracing Example](https://github.com/pipecat-ai/pipecat-examples/tree/main/open-telemetry/jaeger) - Uses gRPC exporter with Jaeger
* [Langfuse Tracing Example](https://github.com/pipecat-ai/pipecat-examples/tree/main/open-telemetry/langfuse) - Uses HTTP exporter with Langfuse for LLM-focused observability
## Trace Structure
Pipecat organizes traces hierarchically, following the natural structure of conversations:
```
Conversation (conversation)
├── turn
│ ├── stt
│ ├── llm
│ └── tts
└── turn
├── stt
├── llm
└── tts
turn...
```
For real-time multimodal services like Gemini Live and OpenAI Realtime, the structure adapts to their specific patterns:
```
Conversation (conversation)
├── turn
│ ├── llm_setup (session configuration)
│ ├── stt (user input)
│ ├── llm_response (complete response with usage)
│ └── llm_tool_call/llm_tool_result (for function calls)
└── turn
├── stt (user input)
└── llm_response (complete response)
turn...
```
This hierarchical structure makes it easy to:
* Track the full lifecycle of a conversation
* Measure latency for individual turns
* Identify which services are contributing to delays
* Compare performance across different conversations
## Exporter Options
Pipecat supports any OpenTelemetry-compatible exporter. Common options include:
### OTLP Exporter (for Jaeger, Grafana, etc.)
```python theme={null}
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
exporter = OTLPSpanExporter(
endpoint="http://localhost:4317", # Your collector endpoint
insecure=True, # Use False for TLS connections
)
```
### HTTP OTLP Exporter (for Langfuse, etc.)
```python theme={null}
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
exporter = OTLPSpanExporter(
# Configure with environment variables:
# OTEL_EXPORTER_OTLP_ENDPOINT
# OTEL_EXPORTER_OTLP_HEADERS
)
```
See our [Langfuse example](https://github.com/pipecat-ai/pipecat-examples/tree/main/open-telemetry/langfuse) for details on configuring this exporter.
### Console Exporter (for debugging)
The console exporter can be enabled alongside any other exporter by setting `console_export=True`:
```python theme={null}
setup_tracing(
service_name="my-voice-app",
exporter=otlp_exporter,
console_export=True, # Prints traces to stdout
)
```
### Cloud Provider Exporters
Many cloud providers offer OpenTelemetry-compatible observability services:
* **AWS X-Ray**
* **Google Cloud Trace**
* **Azure Monitor**
* **Datadog APM**
Check the OpenTelemetry documentation for specific exporter configurations: [OpenTelemetry Vendors](https://opentelemetry.io/ecosystem/vendors/)
### OpenInference
Arize-ai provides OpenInference instrumentation, compatible with OpenTelemetry.
* [openinference-instrumentation-pipecat](https://github.com/Arize-ai/openinference/tree/main/python/instrumentation/openinference-instrumentation-pipecat)
See [Observability community integrations](https://docs.pipecat.ai/server/services/community-integrations#observability) for more details.
## Span Attributes
Pipecat enriches spans with detailed attributes about service operations:
### TTS Service Spans
* `gen_ai.system`: Service provider (e.g., "cartesia")
* `gen_ai.request.model`: Model ID/name
* `voice_id`: Voice identifier
* `text`: The text being synthesized
* `metrics.character_count`: Number of characters in the text
* `metrics.ttfb`: Time to first byte in seconds
* `settings.*`: Service-specific configuration parameters
### STT Service Spans
* `gen_ai.system`: Service provider (e.g., "deepgram")
* `gen_ai.request.model`: Model ID/name
* `transcript`: The transcribed text
* `is_final`: Whether the transcription is final
* `language`: Detected or configured language
* `vad_enabled`: Whether voice activity detection is enabled
* `metrics.ttfb`: Time to first byte in seconds
* `settings.*`: Service-specific configuration parameters
### LLM Service Spans
* `gen_ai.system`: Service provider (e.g., "openai", "gcp.gemini")
* `gen_ai.request.model`: Model ID/name
* `gen_ai.operation.name`: Operation type (e.g., "chat")
* `stream`: Whether streaming is enabled
* `input`: JSON-serialized input messages
* `output`: Complete response text
* `tools`: JSON-serialized tools configuration
* `tools.count`: Number of tools available
* `tools.names`: Comma-separated tool names
* `system`: System message content
* `gen_ai.usage.input_tokens`: Number of prompt tokens
* `gen_ai.usage.output_tokens`: Number of completion tokens
* `metrics.ttfb`: Time to first byte in seconds
* `gen_ai.request.*`: Standard parameters (temperature, max\_tokens, etc.)
### Multimodal Service Spans (Gemini Live & OpenAI Realtime)
#### Setup Spans
* `gen_ai.system`: "gcp.gemini" or "openai"
* `gen_ai.request.model`: Model identifier
* `tools.count`: Number of available tools
* `tools.definitions`: JSON-serialized tool schemas
* `system_instruction`: System prompt (truncated)
* `session.*`: Session configuration parameters
#### Request Spans (OpenAI Realtime)
* `input`: JSON-serialized context messages being sent
* `gen_ai.operation.name`: "llm\_request"
#### Response Spans
* `output`: Complete assistant response text
* `output_modality`: "TEXT" or "AUDIO" (Gemini Live)
* `gen_ai.usage.input_tokens`: Prompt tokens used
* `gen_ai.usage.output_tokens`: Completion tokens generated
* `function_calls.count`: Number of function calls made
* `function_calls.names`: Comma-separated function names
* `metrics.ttfb`: Time to first response in seconds
#### Tool Call/Result Spans (Gemini Live)
* `tool.function_name`: Name of the function being called
* `tool.call_id`: Unique identifier for the call
* `tool.arguments`: Function arguments (truncated)
* `tool.result`: Function execution result (truncated)
* `tool.result_status`: "completed", "error", or "parse\_error"
### Turn Spans
* `turn.number`: Sequential turn number
* `turn.type`: Type of turn (e.g., "conversation")
* `turn.duration_seconds`: Duration of the turn
* `turn.was_interrupted`: Whether the turn was interrupted
* `conversation.id`: ID of the parent conversation
### Conversation Spans
* `conversation.id`: Unique identifier for the conversation
* `conversation.type`: Type of conversation (e.g., "voice")
## Usage Metrics
Pipecat's tracing implementation automatically captures usage metrics for LLM and TTS services:
### LLM Token Usage
Token usage is captured in LLM spans as:
* `gen_ai.usage.input_tokens`
* `gen_ai.usage.output_tokens`
### TTS Character Count
Character count is captured in TTS spans as:
* `metrics.character_count`
## Performance Metrics
Pipecat traces capture key performance metrics for each service:
### Time To First Byte (TTFB)
The time it takes for a service to produce its first response:
* `metrics.ttfb` (in seconds)
### Processing Duration
The total time spent processing in each service is captured in the span duration.
## Configuration Options
### PipelineTask Parameters
Enable or disable tracing for the pipeline
Whether to enable turn tracking.
Custom ID for the conversation. If not provided, a UUID will be generated.
Any additional attributes to add to the top-level OpenTelemetry conversation span.
### setup\_tracing() Parameters
Name of the service for traces
A pre-configured OpenTelemetry span exporter instance
Whether to also export traces to console (useful for debugging)
## Example
Here's a complete example showing OpenTelemetry tracing setup with Jaeger:
```python theme={null}
import os
from dotenv import load_dotenv
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.utils.tracing.setup import setup_tracing
load_dotenv()
# Initialize tracing if enabled
if os.getenv("ENABLE_TRACING"):
# Create the exporter
otlp_exporter = OTLPSpanExporter(
endpoint=os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317"),
insecure=True,
)
# Set up tracing with the exporter
setup_tracing(
service_name="pipecat-demo",
exporter=otlp_exporter,
console_export=bool(os.getenv("OTEL_CONSOLE_EXPORT")),
)
# Create your services
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="your-voice-id"
)
# Build pipeline (transport and context_aggregator are assumed to be created elsewhere)
pipeline = Pipeline([
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
])
# Create pipeline task with tracing enabled
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
enable_tracing=True,
enable_turn_tracking=True,
conversation_id="customer-123", # Optional - will auto-generate if not provided
additional_span_attributes={"session.id": "abc-123"} # Optional - additional attributes to attach to the otel span
)
# Run the pipeline
runner = PipelineRunner()
await runner.run(task)
```
## Troubleshooting
If you're having issues with tracing:
* **No Traces Visible**: Ensure the OpenTelemetry packages are installed and that your collector endpoint is correct
* **Missing Service Data**: Verify that `enable_metrics=True` is set in PipelineParams
* **Debugging Tracing**: Enable console export with `console_export=True` to view traces in your logs
* **Connection Errors**: Check network connectivity to your trace collector
* **Collector Configuration**: Verify your collector is properly set up to receive traces
## References
* [OpenTelemetry Python Documentation](https://opentelemetry-python.readthedocs.io/)
* [OpenTelemetry Tracing Specification](https://opentelemetry.io/docs/reference/specification/trace/api/)
* [Jaeger Documentation](https://www.jaegertracing.io/docs/latest/)
* [Langfuse OpenTelemetry Documentation](https://langfuse.com/docs/opentelemetry/get-started)
# Development Runner
Source: https://docs.pipecat.ai/api-reference/server/utilities/runner/guide
Unified runner for building voice AI bots with Daily, WebRTC, and telephony transports
## Overview
The Pipecat development runner provides a unified way to run voice AI bots across multiple transport types. It handles infrastructure setup - creating Daily rooms, managing WebRTC connections, and routing telephony calls.
## Installation
```bash theme={null}
pip install "pipecat-ai[runner]"
```
## What is a Runner?
A runner in Pipecat refers to a "bot runner", an HTTP service that provides a gateway for spawning bots on-demand. It's the component that enables your bot to run by providing it with server infrastructure and connection details like rooms and tokens.
A bot runner typically creates transport sessions (like Daily WebRTC rooms), generates authentication tokens for both bots and users, spawns new bot processes when users request sessions, and manages bot lifecycle and cleanup. Think of it as the bridge between incoming user connections and your bot logic.
## How the Development Runner Works
The development runner operates as a FastAPI web server that automatically discovers and executes your bot code. When you start the runner, it creates the necessary web endpoints and infrastructure for your chosen transport type.
* **WebRTC connections**: It serves a built-in web interface where users can connect directly, as well as an endpoint for creating new WebRTC sessions
* **Daily integration**: It provides endpoints that create new rooms and tokens and redirect users to join them
* **Telephony providers**: For Twilio, it sets up webhook endpoints that handle incoming calls and establish WebSocket connections for audio streaming
The runner automatically detects which transport type you're using and configures the appropriate infrastructure. It then discovers your bot function and spawns new instances whenever users connect. This means you can focus on writing your bot logic while the runner handles all the server infrastructure, connection management, and transport-specific details.
Your bot code receives runner arguments that contain everything it needs, including Daily room URLs and tokens, WebRTC connections, or WebSocket streams for telephony. The runner abstracts away the complexity of managing these different connection types, providing a unified interface for building bots that work across multiple platforms.
## Pipecat Cloud Ready
The bot runner is designed to be cloud-ready, meaning you can run the same bot code locally and on Pipecat Cloud without any modifications. It automatically handles the differences in transport setup, giving you the flexibility to test locally with an open source transport like SmallWebRTCTransport, then run in production with Daily or telephony transports.
## Building with the Runner
Now let's build a practical example to see how this works. The key insight is that your bot code is structured into two parts: the core bot logic that works with any transport, and the entry point that creates the appropriate transport based on the runner arguments.
Here's the basic structure:
```python theme={null}
# Your imports
from pipecat.runner.types import RunnerArguments
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
async def run_bot(transport: BaseTransport):
"""Your core bot logic here:
- Define services (STT, TTS, LLM)
- Initialize messages and context
- Create the pipeline
- Add event handlers
- Run the pipeline
"""
# Your bot logic goes here
# Define STT, LLM, TTS...
# ...
# Run your pipeline
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible for local dev and Pipecat Cloud."""
transport = SmallWebRTCTransport(
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
webrtc_connection=runner_args.webrtc_connection,
)
await run_bot(transport)
if __name__ == "__main__":
from pipecat.runner.run import main
main()
```
The `run_bot()` function contains your actual bot logic and is transport-agnostic. The `bot()` function is the entry point that the runner calls - it creates the appropriate transport and passes it to your bot logic. This separation allows the same bot code to work across different transports.
When you run this with `python bot.py`, the development runner starts a web server and opens a browser interface at `http://localhost:7860/client`. Each time someone connects, the runner calls your `bot()` function with WebRTC runner arguments.
## Supporting Multiple Transports
To make your bot work across different platforms, you can detect the transport type from the runner arguments and create the appropriate transport. Here's how to support both Daily and WebRTC:
```python theme={null}
from pipecat.runner.types import DailyRunnerArguments, RunnerArguments, SmallWebRTCRunnerArguments
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = None
if isinstance(runner_args, DailyRunnerArguments):
from pipecat.transports.daily.transport import DailyParams, DailyTransport
transport = DailyTransport(
runner_args.room_url,
runner_args.token,
"Pipecat Bot",
params=DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
)
elif isinstance(runner_args, SmallWebRTCRunnerArguments):
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
transport = SmallWebRTCTransport(
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
webrtc_connection=runner_args.webrtc_connection,
)
else:
logger.error(f"Unsupported runner arguments type: {type(runner_args)}")
return
if transport is None:
logger.error("Failed to create transport")
return
await run_bot(transport)
if __name__ == "__main__":
from pipecat.runner.run import main
main()
```
Now you can run your bot with different transports:
```bash theme={null}
python bot.py -t webrtc # Uses SmallWebRTCRunnerArguments
python bot.py -t daily # Uses DailyRunnerArguments
```
### Understanding Runner Arguments
Runner arguments are how the runner communicates transport-specific information to your bot. The runner determines which transport to use based on the command-line arguments, then creates the appropriate runner arguments:
* **`DailyRunnerArguments`**: Contains `room_url`, `token` (Optional), `body` (Optional) for joining Daily rooms
* **`SmallWebRTCRunnerArguments`**: Contains `webrtc_connection` for local WebRTC sessions
* **`WebSocketRunnerArguments`**: Contains `websocket` for telephony connections
All runner arguments also include:
* **`handle_sigint`**: Whether the bot should handle SIGINT (Ctrl+C) signals (managed automatically)
* **`handle_sigterm`**: Whether the bot should handle SIGTERM signals (managed automatically)
`handle_sigint` and `handle_sigterm` are development-only features and cannot
be used when deploying to Pipecat Cloud.
The runner handles all the complex setup - creating Daily rooms, generating tokens, establishing WebSocket connections - and provides your bot with everything it needs through these runner arguments.
Notice how we use lazy imports (`from pipecat.transports.daily.transport import ...`) inside the conditional blocks. This ensures that transport-specific dependencies are only required when that transport is actually used, making your bot more portable.
`RunnerArguments` is the base class for all runner arguments. It provides a
common interface for the runner to pass transport-specific information to your
bot.
## Environment Detection
When building bots that work both locally and in production, you often need to detect the execution environment to enable different features. The development runner sets the `ENV` environment variable to help with this:
```python theme={null}
import os
async def bot(runner_args: RunnerArguments):
# Check if running in local development environment
is_local = os.environ.get("ENV") == "local"
# Enable production features only when deployed
if not is_local:
from pipecat.audio.filters.krisp_viva_filter import KrispVivaFilter
krisp_filter = KrispVivaFilter()
else:
krisp_filter = None
transport = DailyTransport(
runner_args.room_url,
runner_args.token,
"Pipecat Bot",
params=DailyParams(
audio_in_filter=krisp_filter, # Krisp VIVA filter only in production
audio_in_enabled=True,
audio_out_enabled=True,
),
)
```
### Environment Values
The development runner automatically sets environment variables based on how your bot is running:
* **Local development**: `ENV=local` (set by the development runner)
* **Production/Cloud deployment**: `ENV` is not set or has a different value
This allows you to easily customize behavior between development and production environments.
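For example, a minimal sketch that branches on `ENV` (the specific settings toggled here are illustrative, not Pipecat APIs):

```python theme={null}
import os

# Illustrative sketch: pick different behavior based on the runner-provided ENV
is_local = os.environ.get("ENV") == "local"

log_level = "DEBUG" if is_local else "INFO"        # more verbose logging locally
enable_call_recording = not is_local               # hypothetical production-only feature
greeting = "Local test bot ready." if is_local else "Hello! How can I help you today?"
```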
## All Supported Transports
The development runner supports six transport types, each designed for different use cases:
### WebRTC (`-t webrtc`)
Local WebRTC connections with a built-in browser interface. Perfect for development and testing.
```bash theme={null}
python bot.py -t webrtc
# Opens http://localhost:7860/client
# ESP32 compatibility mode
python bot.py -t webrtc --esp32 --host 192.168.1.100
```
**Runner Arguments**: `SmallWebRTCRunnerArguments`
* `webrtc_connection`: Pre-configured WebRTC peer connection
### Daily (`-t daily`)
Integration with Daily for production video conferencing with rooms, participant management, and Pipecat client compatibility.
```bash theme={null}
python bot.py -t daily
# Opens http://localhost:7860 with room creation interface
# Also provides POST /start endpoint for Pipecat clients
# Direct connection for testing (bypasses web server)
python bot.py -d
# Enable PSTN dial-in webhook handling
python bot.py -t daily --dialin
```
**Runner Arguments**: `DailyRunnerArguments`
* `room_url`: Daily room URL to join
* `token`: Authentication token for the room
* `body`: Request data from /start endpoint (dict) or dial-in webhook data
**Available Endpoints**:
* `GET /`: Web interface for creating rooms
* `POST /start`: RTVI-compatible endpoint for programmatic access
* `POST /daily-dialin-webhook`: PSTN dial-in webhook handler (requires `--dialin` flag)
### Telephony (`-t twilio|telnyx|plivo|exotel`)
Phone call integration through telephony providers. Requires a public webhook endpoint and provider credentials.
```bash theme={null}
# Set up environment variables first
export TWILIO_ACCOUNT_SID=your_account_sid
export TWILIO_AUTH_TOKEN=your_auth_token
# Then run with public proxy
python bot.py -t twilio -x yourproxy.ngrok.io
```
**Environment Variables**:
* Twilio: `TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN`
* Telnyx: `TELNYX_API_KEY`
* Plivo: `PLIVO_AUTH_ID`, `PLIVO_AUTH_TOKEN`
* Exotel: None required (no hang-up functionality available)
Environment variables are optional, but when provided, the runner will attempt to hang up the call when the session ends.
**Runner Arguments**: `WebSocketRunnerArguments`
* `websocket`: WebSocket connection for audio streaming
The runner automatically detects the telephony provider from incoming WebSocket messages and configures the appropriate serializers and audio settings. Your bot receives a pre-configured FastAPI WebSocket connection ready for telephony audio streaming.
## Command Line Options
The development runner accepts several command-line arguments to customize its behavior:
```bash theme={null}
python bot.py [OPTIONS]
Options:
--host TEXT Server host address (default: localhost)
--port INTEGER Server port (default: 7860)
-t, --transport Transport type: daily, webrtc, twilio, telnyx, plivo, exotel (default: webrtc)
-x, --proxy TEXT Public proxy hostname for telephony webhooks (required for telephony)
--esp32 Enable SDP munging for ESP32 WebRTC compatibility
-d, --direct Connect directly to Daily room for testing (automatically sets transport to daily)
--dialin Enable Daily PSTN dial-in webhook handling (requires Daily transport)
-v, --verbose Increase logging verbosity
```
### Key Arguments
**`--transport` / `-t`**: Determines which transport infrastructure to set up
* `webrtc`: Local WebRTC with browser interface
* `daily`: Daily.co integration with room management
* `twilio`, `telnyx`, `plivo`, `exotel`: Telephony provider integration
**`--proxy` / `-x`**: Required for most telephony transports (Twilio, Telnyx, Plivo). This should be a publicly accessible hostname (like `yourbot.ngrok.io`) that can receive webhooks from telephony providers. The runner automatically strips protocol prefixes (`http://`, `https://`) if provided. Not required for Exotel, which uses direct WebSocket connections.
**`--direct` / `-d`**: Special mode for Daily that bypasses the web server and connects your bot directly to a Daily room. Useful for quick testing but not recommended for production use.
**`--dialin`**: Enables the `/daily-dialin-webhook` endpoint for handling Daily PSTN dial-in calls. Only works with Daily transport (`-t daily`). This endpoint receives webhook data from Daily when a phone call dials into your configured phone number, creates a SIP-enabled room, and spawns your bot.
**`--esp32`**: Enables SDP (Session Description Protocol) modifications needed for ESP32 WebRTC compatibility. Must be used with a specific IP address via `--host`.
### Environment Variables
Different transports require different environment variables:
**Daily**:
* `DAILY_API_KEY`: Daily API key for creating rooms and tokens (required for dial-in)
* `DAILY_SAMPLE_ROOM_URL` (Optional): Existing room URL to use
* `DAILY_API_URL` (Optional): Daily API URL (defaults to [https://api.daily.co/v1](https://api.daily.co/v1))
**Telephony**:
* `TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN`: Twilio credentials
* `PLIVO_AUTH_ID`, `PLIVO_AUTH_TOKEN`: Plivo credentials
* `TELNYX_API_KEY`: Telnyx API key
The runner automatically uses these environment variables when creating transport sessions and authentication tokens.
## Simplifying with the Transport Utility
While the manual approach gives you full control, the `create_transport` utility provides a much cleaner way to handle multiple transports. Instead of writing conditional logic for each transport type, you define transport configurations upfront and let the utility handle the selection:
```python theme={null}
from pipecat.runner.utils import create_transport
# Define transport configurations using factory functions
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
# add_wav_header and serializer handled automatically
),
}
async def bot(runner_args):
"""Simplified bot entry point using the transport utility."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport)
```
The utility automatically:
* Detects the telephony provider from WebSocket messages
* Configures the appropriate serializer (Twilio, Telnyx, Plivo, or Exotel)
* Sets up authentication using environment variables
* Handles WebRTC and Daily transport creation
Now your bot supports all six transport types with just two lines of code in the bot() function.
## RTVI Integration
The development runner provides RTVI (Real-Time Voice Interface) compatible endpoints for the Pipecat client SDKs. This allows you to use the Pipecat client libraries locally during development as well as when deployed to Pipecat Cloud.
### The `/start` Endpoint
For Daily transports, the runner automatically creates a `/start` POST endpoint that:
1. **Creates Daily rooms and tokens** using your `DAILY_API_KEY`
2. **Spawns bot instances** with request `body` data
3. **Returns connection details** in RTVI-compatible format
```bash theme={null}
# Start the Daily runner
python bot.py -t daily
# The /start endpoint is now available at:
# POST http://localhost:7860/start
```
**Request Format:**
```json theme={null}
{
"createDailyRoom": true,
"dailyRoomProperties": {
"start_video_off": true,
"start_audio_off": false
},
"body": {
"custom_data": "your_value",
"user_id": "user123"
}
}
```
`dailyRoomProperties` are not yet handled by the runner.
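To exercise the endpoint manually, you can send the same request with `curl` (assuming the runner is listening on the default host and port):

```bash theme={null}
curl -X POST http://localhost:7860/start \
  -H "Content-Type: application/json" \
  -d '{"createDailyRoom": true, "body": {"user_id": "user123"}}'
```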
**Response Format:**
```json theme={null}
{
"dailyRoom": "https://domain.daily.co/room-name",
"dailyToken": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9..."
}
```
### Accessing Request Data in Your Bot
The `body` field from the `/start` request is passed to your bot through `DailyRunnerArguments.body`:
```python theme={null}
async def bot(runner_args: DailyRunnerArguments):
# Access custom data from the /start request
custom_data = runner_args.body.get("custom_data")
user_id = runner_args.body.get("user_id")
print(f"Bot started for user: {user_id}")
print(f"Custom data: {custom_data}")
# Your bot logic here
transport = DailyTransport(runner_args.room_url, runner_args.token, "Bot")
await run_bot(transport)
```
### RTVI Client Example
You can use RTVI client libraries to connect to your local development runner:
```javascript theme={null}
import { PipecatClient } from "@pipecat-ai/client-js";
import { DailyTransport } from "@pipecat-ai/daily-transport";
const client = new PipecatClient({
transport: new DailyTransport(),
enableMic: true,
enableCam: false,
});
// Start a session with custom data
await client.connect({ endpoint: "http://localhost:7860/start" });
```
The `/start` endpoint is only available for Daily transports (`-t daily`).
WebRTC and telephony transports use different connection methods.
## Daily Dial-In Webhook
The development runner can handle Daily PSTN dial-in webhooks when started with the `--dialin` flag. This allows you to test phone call integrations locally before deploying to production.
### Enabling Dial-In Support
```bash theme={null}
python bot.py -t daily --dialin
# Webhook endpoint available at:
# POST http://localhost:7860/daily-dialin-webhook
```
### How It Works
When a phone call dials into your Daily phone number:
1. **Daily sends a webhook** to your configured endpoint with call details
2. **The runner creates a SIP-enabled room** with appropriate configuration
3. **Your bot is spawned** with dial-in context in `runner_args.body`
4. **The call is connected** to the room where your bot is waiting
### Webhook Payload
Daily sends the following data to the `/daily-dialin-webhook` endpoint:
```json theme={null}
{
"From": "+15551234567",
"To": "+15559876543",
"callId": "uuid-call-id",
"callDomain": "uuid-call-domain",
"sipHeaders": {}
}
```
### Accessing Dial-In Data in Your Bot
The dial-in webhook data is passed to your bot through `DailyRunnerArguments.body`. You need to parse it and configure the Daily transport with dial-in settings:
```python theme={null}
from pipecat.runner.types import DailyDialinRequest, RunnerArguments
from pipecat.transports.daily.transport import DailyDialinSettings, DailyParams, DailyTransport
async def bot(runner_args: RunnerArguments):
# Parse the dial-in request from the runner
request = DailyDialinRequest.model_validate(runner_args.body)
# Configure Daily transport with dial-in settings
daily_dialin_settings = DailyDialinSettings(
call_id=request.dialin_settings.call_id,
call_domain=request.dialin_settings.call_domain,
)
transport = DailyTransport(
runner_args.room_url,
runner_args.token,
"Daily PSTN Dial-in Bot",
params=DailyParams(
api_key=request.daily_api_key,
api_url=request.daily_api_url,
dialin_settings=daily_dialin_settings,
audio_in_enabled=True,
audio_out_enabled=True,
),
)
# Log caller information if available
if request.dialin_settings.From:
print(f"Handling call from: {request.dialin_settings.From}")
# Your bot logic here
await run_bot(transport)
```
### Configuring Daily Phone Numbers
To test dial-in locally, you need to:
1. **Configure a Daily phone number** in your Daily dashboard
2. **Set the webhook URL** to your public endpoint (e.g., via ngrok)
3. **Ensure `DAILY_API_KEY` is set** in your environment
```bash theme={null}
# Example with ngrok
ngrok http 7860
# Configure webhook URL in Daily dashboard:
# https://your-subdomain.ngrok.io/daily-dialin-webhook
```
The `--dialin` flag requires the Daily transport (`-t daily`) and a valid
`DAILY_API_KEY` environment variable.
## Quick Reference
| Transport | Command | Access | Environment Variables |
| ------------- | --------------------------------------------------- | ------------------------------------------------------------ | --------------------------------------------------- |
| WebRTC | `python bot.py` | [http://localhost:7860/client](http://localhost:7860/client) | None |
| Daily | `python bot.py -t daily` | [http://localhost:7860](http://localhost:7860) | `DAILY_API_KEY`, `DAILY_SAMPLE_ROOM_URL` (Optional) |
| Daily Direct | `python bot.py -d` | Direct connection | `DAILY_API_KEY`, `DAILY_SAMPLE_ROOM_URL` (Optional) |
| Daily Dial-In | `python bot.py -t daily --dialin` | Phone calls via Daily PSTN | `DAILY_API_KEY` (Required) |
| Twilio | `python bot.py -t twilio -x proxy.ngrok.io` | Phone calls | `TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN` |
| Telnyx | `python bot.py -t telnyx -x proxy.ngrok.io` | Phone calls | `TELNYX_API_KEY` |
| Plivo | `python bot.py -t plivo -x proxy.ngrok.io` | Phone calls | `PLIVO_AUTH_ID`, `PLIVO_AUTH_TOKEN` |
| Exotel | `python bot.py -t exotel` | Phone calls | None |
| ESP32 WebRTC  | `python bot.py -t webrtc --esp32 --host <ip-address>` | ESP32 WebRTC connection | None |
## Examples
For practical examples of using the development runner with different transports, check out the following:
Explore the examples for different ways to use the development runner with
various transports.
# Transport Utilities
Source: https://docs.pipecat.ai/api-reference/server/utilities/runner/transport-utils
Configuration and helper utilities for Daily, LiveKit, telephony, and WebRTC transports
## Overview
Pipecat provides several utility modules for configuring transports and handling transport-specific operations. While the [development runner](/api-reference/server/utilities/runner/guide) handles most of these automatically, these utilities are useful for custom setups, advanced configurations, or when building your own deployment infrastructure.
## Daily Configuration
The Daily utilities handle room creation, token generation, and authentication setup for Daily integration. They support both standard video/audio rooms and SIP-enabled rooms for telephony integration.
### Basic Configuration
Use `configure()` for simple room and token setup:
```python theme={null}
import aiohttp
from pipecat.runner.daily import configure
async with aiohttp.ClientSession() as session:
# Returns a DailyRoomConfig object
room_config = await configure(session)
room_url = room_config.room_url
token = room_config.token
# Use with DailyTransport
transport = DailyTransport(room_url, token, "Bot Name", params=DailyParams())
```
The `configure()` function returns a `DailyRoomConfig` object with:
* `room_url`: The Daily room URL for joining
* `token`: Authentication token for the bot
* `sip_endpoint`: SIP endpoint URI (None for standard rooms)
This function:
* Uses `DAILY_SAMPLE_ROOM_URL` environment variable if set, otherwise creates a new room
* Creates rooms with 2-hour expiration and automatic ejection by default
* Generates an authentication token with 2-hour expiration using `DAILY_API_KEY`
### Environment Variables
**Required**:
* `DAILY_API_KEY`: Daily API key for creating rooms and tokens
**Optional**:
* `DAILY_SAMPLE_ROOM_URL`: Use an existing room instead of creating one
* `DAILY_API_URL`: Override Daily API endpoint (defaults to [https://api.daily.co/v1](https://api.daily.co/v1))
### Room and Token Management
When no `DAILY_SAMPLE_ROOM_URL` is provided, rooms are created automatically with:
* 2-hour expiration from creation time
* Automatic participant ejection when the room expires
* Unique names using UUID prefixes (e.g. `pipecat-uuid`)
Expired rooms are automatically cleaned up by Daily, so you don't need to
manage them manually.
Tokens are generated with 2-hour expiration and include necessary permissions for bot participation. The utilities handle all the Daily REST API interactions automatically.
### SIP Configuration
Daily utilities support SIP-enabled rooms for telephony integration. When SIP parameters are provided, the function creates rooms with telephony capabilities:
```python theme={null}
import aiohttp
from pipecat.runner.daily import configure
async with aiohttp.ClientSession() as session:
# Create SIP-enabled room
sip_config = await configure(
session,
sip_caller_phone="+15551234567",
sip_enable_video=False, # Voice-only by default
sip_num_endpoints=1,
sip_codecs={"audio": ["OPUS"], "video": ["H264"]}
)
print(f"Room URL: {sip_config.room_url}")
print(f"SIP Endpoint: {sip_config.sip_endpoint}")
print(f"Token: {sip_config.token}")
```
**SIP Parameters**:
* `sip_caller_phone`: Phone number or identifier for SIP display name (enables SIP mode)
* `sip_enable_video`: Whether video is enabled for SIP calls (default: False)
* `sip_num_endpoints`: Number of allowed SIP endpoints (default: 1)
* `sip_codecs`: Audio/video codecs to support (optional, uses Daily defaults if not specified)
SIP-enabled rooms cannot use existing room URLs from `DAILY_SAMPLE_ROOM_URL`.
They always create new temporary rooms with SIP configuration.
### Backward Compatibility
The `configure()` function supports tuple unpacking:
```python theme={null}
# Tuple unpacking (legacy style)
room_url, token = await configure(session)
# Object access (recommended)
config = await configure(session)
room_url = config.room_url
token = config.token
```
## LiveKit Configuration
LiveKit utilities manage authentication tokens, room setup, and agent permissions for LiveKit server integration.
### Basic Configuration
Use `configure()` for standard setup:
```python theme={null}
from pipecat.runner.livekit import configure
url, token, room_name = await configure()
# Use with LiveKitTransport
transport = LiveKitTransport(url=url, token=token, room_name=room_name, params=LiveKitParams())
```
### Configuration with Arguments
For command-line integration:
```python theme={null}
import argparse
from pipecat.runner.livekit import configure_with_args
parser = argparse.ArgumentParser()
url, token, room_name, args = await configure_with_args(parser)
```
Supports these command-line options:
* `-r, --room`: Specify LiveKit room name
* `-u, --url`: Specify LiveKit server URL
### Token Generation
LiveKit provides two token generation functions:
**`generate_token(room_name, participant_name, api_key, api_secret)`**
Creates a standard participant token for users or testing.
**`generate_token_with_agent(room_name, participant_name, api_key, api_secret)`**
Creates an agent token with special permissions. Use this for your bots.
```python theme={null}
from pipecat.runner.livekit import generate_token, generate_token_with_agent
# Generate agent token for your bot
agent_token = generate_token_with_agent("my-room", "Pipecat Bot", api_key, api_secret)
# Generate user token for testing
user_token = generate_token("my-room", "Test User", api_key, api_secret)
```
### Environment Variables
**Required**:
* `LIVEKIT_API_KEY`: LiveKit API key
* `LIVEKIT_API_SECRET`: LiveKit API secret
* `LIVEKIT_URL`: LiveKit server URL
* `LIVEKIT_ROOM_NAME`: Default room name
All environment variables are required for LiveKit to function properly.
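For reference, a `.env` file covering these variables might look like the following (placeholder values shown):

```bash theme={null}
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_ROOM_NAME=my-room
```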
## WebSocket and Transport Utilities
The transport utilities provide helper functions for WebSocket parsing, SDP manipulation, and transport management.
### Telephony WebSocket Parsing
Use `parse_telephony_websocket()` to auto-detect telephony providers and extract call data:
```python theme={null}
from pipecat.runner.utils import parse_telephony_websocket
transport_type, call_data = await parse_telephony_websocket(websocket)
if transport_type == "twilio":
stream_id = call_data["stream_id"]
call_id = call_data["call_id"]
elif transport_type == "telnyx":
stream_id = call_data["stream_id"]
call_control_id = call_data["call_control_id"]
outbound_encoding = call_data["outbound_encoding"]
elif transport_type == "plivo":
stream_id = call_data["stream_id"]
call_id = call_data["call_id"]
elif transport_type == "exotel":
    stream_id = call_data["stream_id"]
    call_id = call_data["call_id"]
```
The function automatically:
* Reads and parses initial WebSocket messages
* Detects provider based on message structure
* Extracts provider-specific call information
* Returns structured data for transport configuration
### Transport Helper Functions
**Client ID Detection**:
```python theme={null}
from pipecat.runner.utils import get_transport_client_id
client_id = get_transport_client_id(transport, client)
# Returns pc_id for WebRTC or participant ID for Daily
```
**Video Capture** (Daily only):
```python theme={null}
from pipecat.runner.utils import maybe_capture_participant_camera, maybe_capture_participant_screen
# Capture participant's camera
await maybe_capture_participant_camera(transport, client, framerate=30)
# Capture participant's screen
await maybe_capture_participant_screen(transport, client, framerate=15)
```
These functions safely handle transport detection and only execute if the transport supports the operation.
## When to Use These Utilities
### Automatic Usage (Most Common)
The development runner and `create_transport` utility handle these automatically. Most users won't need to call these functions directly.
### Manual Usage (Advanced)
Use these utilities directly when:
* **Custom deployment infrastructure**: Building your own bot runner or deployment system
* **Advanced transport configuration**: Need specific room settings, token permissions, or custom authentication
* **Non-runner scenarios**: Integrating Pipecat transports into existing applications
* **Testing and debugging**: Need to create rooms/tokens independently for testing
### Integration Example
Here's how you might use these utilities in a custom deployment:
```python theme={null}
import aiohttp
from pipecat.runner.daily import configure
from pipecat.runner.utils import parse_telephony_websocket
from pipecat.transports.daily.transport import DailyTransport, DailyParams
async def create_custom_bot_session(transport_type: str, websocket=None):
if transport_type == "daily":
async with aiohttp.ClientSession() as session:
room_config = await configure(session)
room_url = room_config.room_url
token = room_config.token
return DailyTransport(room_url, token, "Custom Bot", DailyParams())
    elif transport_type == "telephony":
        # Handle custom telephony setup (the websocket comes from your server framework)
        provider, call_data = await parse_telephony_websocket(websocket)
        # Configure based on detected provider...
```
These utilities provide the building blocks for any transport configuration scenario while maintaining the same reliability and functionality as the development runner.
# LLMSwitcher
Source: https://docs.pipecat.ai/api-reference/server/utilities/service-switchers/llm-switcher
Dynamically switch between different LLM services with support for ad-hoc inference and unified function registration
## Overview
`LLMSwitcher` is a specialized version of `ServiceSwitcher` designed specifically for managing multiple LLM services. Beyond basic service switching, it provides convenient methods for running ad-hoc inferences and registering function handlers across all LLMs simultaneously.
This is particularly useful when you need to switch between different LLM providers based on task complexity, cost optimization, or specific model capabilities while maintaining a consistent function calling interface.
## Constructor
```python theme={null}
from pipecat.pipeline.llm_switcher import LLMSwitcher
# Uses ServiceSwitcherStrategyManual by default
switcher = LLMSwitcher(llms=[openai_service, gemini_service])
```
List of LLM service instances to switch between.
The strategy class to use for switching logic. Pass the class itself, not an
instance. Defaults to `ServiceSwitcherStrategyManual`.
## Properties
The list of LLM services managed by this switcher.
The currently active LLM service, or None if no LLMs are configured.
## Methods
### run\_inference()
Run a one-shot inference with the currently active LLM, outside of the normal pipeline flow.
```python theme={null}
result = await llm_switcher.run_inference(
context=llm_context,
max_tokens=1000,
system_instruction="You are a helpful assistant."
)
```
The LLM context containing conversation history and messages.
Additional arguments forwarded to the active LLM's `run_inference` method.
Common options include `max_tokens` (maximum tokens to generate) and
`system_instruction` (override the system prompt for this inference).
**Returns**: `Optional[str]` - The LLM's response as a string, or None if no LLM is active.
### register\_function()
Register a function handler with all LLMs in the switcher, regardless of which is currently active.
```python theme={null}
llm_switcher.register_function(
function_name="get_weather",
handler=handle_weather,
cancel_on_interruption=True
)
```
The name of the function to handle. Use None for a catch-all handler that
processes all function calls.
The async function handler. Should accept a single `FunctionCallParams`
parameter.
Legacy callback function (deprecated). Put initialization code at the top of
your handler instead.
Whether to cancel this function call when a user interruption occurs.
Optional per-tool timeout in seconds. Overrides the global `function_call_timeout_secs` for this specific function.
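For context, here is a rough sketch of what the `handle_weather` handler registered above might look like. It assumes `FunctionCallParams` is importable from `pipecat.services.llm_service`, and the weather lookup itself is hypothetical:

```python theme={null}
from pipecat.services.llm_service import FunctionCallParams

async def handle_weather(params: FunctionCallParams):
    # Arguments arrive as a dict parsed from the LLM's function call
    location = params.arguments.get("location", "unknown")
    # Hypothetical lookup; call a real weather API here
    result = {"location": location, "conditions": "sunny", "temperature_f": 75}
    await params.result_callback(result)
```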
### register\_direct\_function()
Register a direct function handler with all LLMs in the switcher, regardless of which is currently active. Direct functions provide more control over function call execution.
```python theme={null}
llm_switcher.register_direct_function(
handler=my_direct_function,
cancel_on_interruption=True,
timeout_secs=30.0
)
```
The direct function to register. Must follow the DirectFunction protocol.
Whether to cancel this function call when a user interruption occurs.
Optional per-tool timeout in seconds. Overrides the global `function_call_timeout_secs` for this specific function.
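As a rough sketch of a direct function, under the assumption that the DirectFunction protocol takes `FunctionCallParams` as its first parameter and derives the tool schema from the signature and docstring (verify against the current Pipecat source):

```python theme={null}
from pipecat.services.llm_service import FunctionCallParams

async def get_weather(params: FunctionCallParams, location: str):
    """Get the current weather for a location.

    Args:
        location: City and state, e.g. "Austin, TX".
    """
    # Hypothetical lookup; call a real weather API here
    await params.result_callback({"location": location, "conditions": "sunny"})

llm_switcher.register_direct_function(get_weather)
```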
## Usage Examples
### Basic LLM Switching
```python theme={null}
import os

from pipecat.pipeline.llm_switcher import LLMSwitcher
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.google.llm import GoogleLLMService
from pipecat.frames.frames import ManuallySwitchServiceFrame
# Create LLM services
openai = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
gemini = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"))
# Create switcher (uses ServiceSwitcherStrategyManual by default)
llm_switcher = LLMSwitcher(llms=[openai, gemini])
# Use in pipeline
pipeline = Pipeline([
transport.input(),
stt,
context_aggregator.user(),
llm_switcher,
tts,
transport.output(),
context_aggregator.assistant()
])
# Later, switch to a different model (e.g., for cost or task-specific reasons)
await task.queue_frame(ManuallySwitchServiceFrame(service=gemini))
```
# ServiceSwitcher
Source: https://docs.pipecat.ai/api-reference/server/utilities/service-switchers/service-switcher
Dynamically switch between different service instances at runtime using configurable strategies
## Overview
`ServiceSwitcher` is a specialized parallel pipeline that enables dynamic switching between multiple service instances at runtime. This is useful when you need to change between different STT providers, TTS providers, or other frame processors based on user preferences, costs, performance requirements, or other runtime conditions.
The switcher uses a strategy pattern to determine which service is active. Two built-in strategies are provided: manual switching for explicit control, and automatic failover for handling service errors.
## How It Works
`ServiceSwitcher` wraps multiple services in a parallel pipeline where only the active service processes frames. Each service is "sandwiched" between two filters that check if it's the currently active service before allowing frames to pass through. When you switch services, the filters update to redirect frame flow to the newly active service.
## Constructor
```python theme={null}
from pipecat.pipeline.service_switcher import ServiceSwitcher
# Uses ServiceSwitcherStrategyManual by default
switcher = ServiceSwitcher(services=[stt_service1, stt_service2])
# Or explicitly specify a strategy
from pipecat.pipeline.service_switcher import ServiceSwitcherStrategyFailover
switcher = ServiceSwitcher(
services=[stt_service1, stt_service2],
strategy_type=ServiceSwitcherStrategyFailover
)
```
List of service instances to switch between. Can be any frame processors (STT,
TTS, or custom processors).
The strategy class to use for switching logic. Pass the class itself, not an
instance. Defaults to `ServiceSwitcherStrategyManual`.
## Switching Strategies
### ServiceSwitcherStrategyManual
The manual strategy allows explicit control over which service is active by pushing `ManuallySwitchServiceFrame` frames into the pipeline.
**Initial State**: The first service in the list is active by default.
**Switching**: Push a `ManuallySwitchServiceFrame` with the desired service instance.
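For example, to make the second service active (mirroring the TTS example later on this page):

```python theme={null}
from pipecat.frames.frames import ManuallySwitchServiceFrame

# Queue the switch frame on the running PipelineTask
await task.queue_frame(ManuallySwitchServiceFrame(service=stt_service2))
```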
### ServiceSwitcherStrategyFailover
The failover strategy automatically switches to the next available service when the active service reports a non-fatal error. This enables automatic recovery from service failures without manual intervention.
**Initial State**: The first service in the list is active by default.
**Automatic Failover**: When the active service pushes a non-fatal `ErrorFrame`, the strategy automatically switches to the next service in the list (wrapping around to the first service if needed).
**Recovery**: The failed service remains in the list and can be switched back to manually or via application logic in the `on_service_switched` event handler. This allows implementing custom recovery policies.
```python theme={null}
from pipecat.pipeline.service_switcher import ServiceSwitcher, ServiceSwitcherStrategyFailover
switcher = ServiceSwitcher(
services=[primary_stt, backup_stt],
strategy_type=ServiceSwitcherStrategyFailover
)
@switcher.strategy.event_handler("on_service_switched")
async def on_switched(strategy, service):
# Application decides when/how to recover the failed service
print(f"Switched to: {service.name}")
```
### Custom Strategies
You can create your own switching strategy by subclassing `ServiceSwitcherStrategy` and implementing the `handle_frame` and/or `handle_error` methods.
* **`handle_frame(frame, direction)`**: Called for control frames (like `ManuallySwitchServiceFrame`). Should return the newly active service if a switch occurred, or `None` otherwise.
* **`handle_error(error)`**: Called when the active service reports a non-fatal error. Override this to implement custom error-handling logic. Should return the newly active service if a switch occurred, or `None` otherwise.
Additionally, if you want to maintain either manual switching or automatic failover as an option when writing a custom strategy, your new strategy should inherit from `ServiceSwitcherStrategyManual` or `ServiceSwitcherStrategyFailover`, respectively.
## Usage Examples
### Switching Between TTS Services
```python theme={null}
import os
from pipecat.frames.frames import ManuallySwitchServiceFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.service_switcher import ServiceSwitcher
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
from pipecat.services.cartesia.tts import CartesiaTTSService
# Create TTS services
elevenlabs = ElevenLabsTTSService(api_key=os.getenv("ELEVENLABS_API_KEY"), voice_id=os.getenv("ELEVENLABS_VOICE_ID"))
cartesia = CartesiaTTSService(api_key=os.getenv("CARTESIA_API_KEY"), voice_id=os.getenv("CARTESIA_VOICE_ID"))
# Create switcher with both services (uses ServiceSwitcherStrategyManual by default)
tts_switcher = ServiceSwitcher(services=[elevenlabs, cartesia])
# Use in pipeline
pipeline = Pipeline([
transport.input(),
stt,
context_aggregator.user(),
llm,
tts_switcher,
transport.output(),
context_aggregator.assistant()
])
# Later, switch to Cartesia
await task.queue_frame(ManuallySwitchServiceFrame(service=cartesia))
```
## Event Handlers
| Event | Description |
| --------------------- | --------------------------- |
| `on_service_switched` | Active service was switched |
```python theme={null}
@switcher.event_handler("on_service_switched")
async def on_service_switched(switcher, service):
print(f"Switched to: {service.name}")
```
**Parameters:**
| Parameter | Type | Description |
| ---------- | ------------------------- | ------------------------ |
| `switcher` | `ServiceSwitcherStrategy` | The switcher instance |
| `service` | `FrameProcessor` | The newly active service |
# MarkdownTextFilter
Source: https://docs.pipecat.ai/api-reference/server/utilities/text/markdown-text-filter
Converts Markdown-formatted text to TTS-friendly plain text while preserving structure
## Overview
`MarkdownTextFilter` transforms Markdown-formatted text into plain text that's suitable for text-to-speech (TTS) systems. It intelligently removes formatting elements while preserving the content structure, including proper spacing and list formatting.
This filter is especially valuable for LLM-generated content, which often includes Markdown formatting that would sound unnatural if read aloud by a TTS system.
## Constructor
```python theme={null}
from pipecat.utils.text.markdown_text_filter import MarkdownTextFilter
filter = MarkdownTextFilter(params=MarkdownTextFilter.InputParams())
```
Configuration parameters for the filter
### Input Parameters
Configure the filter behavior with these options:
Whether the filter is active (when False, text passes through unchanged)
Whether to remove code blocks from the output
Whether to remove Markdown tables from the output
## Features
The filter handles these Markdown elements:
* **Basic Formatting**: Removes `*italic*`, `**bold**`, and other formatting markers
* **Code**: Removes inline code ticks and optionally removes code blocks
* **Lists**: Preserves numbered lists while removing Markdown formatting
* **Tables**: Optionally removes Markdown tables
* **Whitespace**: Carefully preserves meaningful whitespace for natural speech
* **HTML**: Removes HTML tags and converts entities to their plain text equivalents
## Usage Examples
### Basic Usage with TTS Service
```python theme={null}
import os
from pipecat.utils.text.markdown_text_filter import MarkdownTextFilter
from pipecat.services.cartesia.tts import CartesiaTTSService
# Create the filter
md_filter = MarkdownTextFilter()
# Use with TTS service
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="voice_id_here",
text_filter=md_filter
)
```
### Custom Configuration
```python theme={null}
# Create filter that removes code blocks and tables
md_filter = MarkdownTextFilter(
params=MarkdownTextFilter.InputParams(
filter_code=True,
filter_tables=True
)
)
```
## What Gets Removed
| Markdown Feature | Example | Result |
| -------------------------- | ------------------------ | ------------ |
| Bold | `**important**` | `important` |
| Italic | `*emphasized*` | `emphasized` |
| Headers | `## Section` | `Section` |
| Code (inline) | `` `code` `` | `code` |
| Code blocks (when enabled) | ` ```python\ncode\n``` ` | ` ` |
| Tables (when enabled) | `\|A\|B\|\n\|--\|--\|` | ` ` |
| HTML tags                  | `<b>text</b>`            | `text`       |
| Repeated characters | `!!!!!!!` | `!` |
## Notes
* Preserves sentence structure and readability
* Maintains whitespace that affects speech prosody
* Handles streaming text with partial Markdown elements
* Efficiently converts HTML entities to plain text characters
* Smart handling of code blocks and tables with state tracking
* Integrates directly with TTS services in the Pipecat pipeline
# Text Aggregators and Filters
Source: https://docs.pipecat.ai/api-reference/server/utilities/text/overview
An overview of text aggregators and filters available in the server utilities
## Overview
The terms "aggregators" and "filters" are overloaded in Pipecat and can refer to different components depending on the context. This document provides a high-level overview of the low-level text utilities used by various pipeline services, such as TTS services. These utilities operate on streaming text, allowing for dynamic processing, transformation, and aggregation of text data as it flows through the system. They are essentially text in -> text out, sometimes with additional metadata about the text.
## Aggregators
Aggregators are components that collect and combine text data over time, often buffering input until certain conditions are met (like sentence boundaries or specific patterns). They can modify, enhance, or restructure the text before passing it along. Examples include:
* `SimpleTextAggregator`: Buffers text until sentence boundaries are detected. This is the default aggregator used by most TTS services.
* `SkipTagsAggregator`: Buffers text until sentence boundaries, while skipping over specified tags so that sentences are not prematurely detected due to characters in between those tags. This is the default aggregator used by Cartesia and Rime in order to skip over custom spell tags.
* [PatternPairAggregator](./pattern-pair-aggregator): Buffers text until either a sentence boundary is detected or a complete pattern pair (like XML tags or custom delimiters) is found. Patterns found can either be removed, left in but trigger a registered callback, or aggregated separately. Useful for voice switching, structured content processing, and extracting metadata from LLM outputs.
### Usage Pattern
After initial setup, aggregators are used by repeatedly calling their `aggregate()` method with incoming text chunks. The aggregator processes the text according to its logic and returns aggregated text when appropriate. If no complete aggregation is ready, it returns `None`; otherwise it returns an `Aggregation` object containing the aggregated text along with a `type` string that conveys the nature of the aggregation (e.g., "sentence", "xml", etc.).
```python theme={null}
from pipecat.utils.text.simple_text_aggregator import SimpleTextAggregator
aggregator = SimpleTextAggregator()
while streaming_text:
chunk = get_next_text_chunk()
aggregation = aggregator.aggregate(chunk)
if aggregation:
process_aggregated_text(aggregation.text, aggregation.type)
```
### Custom Aggregators
You can create custom aggregators by subclassing the `BaseTextAggregator` class and implementing your own aggregation logic in the `aggregate()` method. This allows for tailored text processing to meet specific application needs.
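For instance, here is a minimal sketch of a custom aggregator that releases text on newline boundaries. The import path for `Aggregation`, its constructor arguments, and whether `aggregate()` is sync or async are assumptions; check `BaseTextAggregator` in your Pipecat version for the exact abstract interface (it may also require additional members such as `reset()` or a `text` property).
```python theme={null}
from typing import Optional
from pipecat.utils.text.base_text_aggregator import Aggregation, BaseTextAggregator


class NewlineTextAggregator(BaseTextAggregator):
    """Sketch: buffer streaming text and release it one line at a time."""

    def __init__(self):
        self._buffer = ""

    def aggregate(self, text: str) -> Optional[Aggregation]:
        self._buffer += text
        if "\n" in self._buffer:
            line, self._buffer = self._buffer.split("\n", 1)
            # "sentence" is the standard type used for regular aggregations.
            return Aggregation(text=line.strip(), type="sentence")
        return None

    def reset(self):
        self._buffer = ""
```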
## Filters
Filters are components that process text data in a streaming fashion, transforming or modifying it as it passes through. Unlike aggregators, filters typically do not buffer text but instead operate on each chunk of text individually. Currently, the only built-in filter is the [`MarkdownTextFilter`](./markdown-text-filter), which processes markdown syntax in streaming text, converting it into TTS-friendly plain text.
### Usage Pattern
Filters are used by calling their `filter()` method with incoming text chunks. The filter processes the text and returns the transformed text immediately.
```python theme={null}
from pipecat.utils.text.markdown_text_filter import MarkdownTextFilter
filter = MarkdownTextFilter()
while streaming_text:
chunk = get_next_text_chunk()
filtered_text = filter.filter(chunk)
process_filtered_text(filtered_text)
```
### Custom Filters
You can create custom filters by subclassing the `BaseTextFilter` class and implementing your own filtering logic in the `filter()` method. This allows for specialized text transformations to suit your application's requirements.
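As an illustration, here is a minimal sketch of a custom filter that strips emoji so the TTS does not try to voice them. The import path for `BaseTextFilter` is assumed, and the real base class may define additional methods to override; treat this as a starting point rather than a complete implementation.
```python theme={null}
import re
from pipecat.utils.text.base_text_filter import BaseTextFilter

# Rough emoji range; extend as needed for your use case.
EMOJI_PATTERN = re.compile(r"[\U0001F300-\U0001FAFF]")


class EmojiStrippingFilter(BaseTextFilter):
    """Sketch: remove emoji from streaming text chunks."""

    def filter(self, text: str) -> str:
        return EMOJI_PATTERN.sub("", text)
```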
# PatternPairAggregator
Source: https://docs.pipecat.ai/api-reference/server/utilities/text/pattern-pair-aggregator
Text aggregator that identifies and processes content between pattern pairs in streaming text
## Overview
`PatternPairAggregator` is a specialized text aggregator that buffers streaming text until it can identify complete pattern pairs (like XML tags, markdown formatting, or custom delimiters). It processes the content between these patterns using a set of pre-defined actions (remove, keep, or aggregate) and returns text outside those patterns at sentence boundaries. The aggregator supports registering callback functions that are invoked when specific pattern pairs are matched, allowing for custom processing when matches occur. Note: These callbacks do not support modifying the text being aggregated; they are intended for side effects like logging or updating state.
This aggregator is particularly useful for applications like voice switching, structured content processing, and extracting metadata from LLM outputs. It ensures that patterns spanning multiple text chunks are correctly identified, and it can categorize text based on embedded markers so that downstream services and processors treat different segments appropriately. For example: identifying URL patterns, code blocks, or special formatting in LLM responses that may need special speech handling in the TTS or client-side handling via RTVI.
Want to see it in action? Check out the [voice switching
demo](https://github.com/pipecat-ai/pipecat/blob/main/examples/features/features-pattern-pair-voice-switching.py)
or the [bot output
demo](https://github.com/pipecat-ai/pipecat-examples/blob/main/code-helper).
## Constructor
```python theme={null}
pattern_aggregator = PatternPairAggregator()
```
No parameters are required for initialization. The aggregator starts with an empty buffer and no registered patterns.
## Methods
### add\_pattern
```python theme={null}
pattern_aggregator.add_pattern(
type,
start_pattern,
end_pattern,
action
)
```
Registers a new pattern pair to detect in the text.
Unique identifier for this pattern pair that should also represent what the
text between the tags represents (e.g., "voice", "xml", "credit\_card", etc.).
This value will be returned as part of both PatternMatch provided to callbacks
and the Aggregation object returned from `aggregate()`.
This type may not be set to "sentence" or "word" as those are reserved for
standard aggregations.
Choose descriptive and unique type names to avoid confusion when handling
multiple patterns. This type will also be referenced in the TTS for
optionally skipping these types or providing custom text transformations
before speaking these types. It's also optionally referenced in RTVI for
similar purposes of not sending certain types or transforming them before
sending to the client.
Pattern that marks the beginning of content
Pattern that marks the end of content
What to do with the matched pattern and its content:
* `MatchAction.REMOVE`: The text along with its delimiters will be removed from
the streaming text. Sentence aggregation will continue on as if this text did not
exist.
* `MatchAction.KEEP`: The delimiters will be removed, but the content between them
will be kept. Sentence aggregation will continue on with the internal text included.
This is helpful if you want to keep the content but be notified when it occurs via
a callback.
* `MatchAction.AGGREGATE`: Aggregate the matched pattern and its content as a separate
aggregation. The matched content will be returned in an `Aggregation` object with
the specified type when the pattern is completed. When the start of this pattern is
detected, any buffered text up to that point will be returned as a standard "sentence"
aggregation.
Self for method chaining
### add\_pattern\_pair
This method is deprecated. Use `add_pattern` instead.
```python theme={null}
pattern_aggregator.add_pattern_pair(
pattern_id,
start_pattern,
end_pattern,
remove_match=True
)
```
### on\_pattern\_match
```python theme={null}
pattern_aggregator.on_pattern_match(type, handler)
```
Registers a handler function to be called when a specific pattern pair is matched.
The pattern pair type to listen for (as defined in `add_pattern`)
Function to call when the pattern is matched. The function should accept a
PatternMatch object.
Self for method chaining
## Pattern Match Object
When a pattern is matched, the handler function receives a `PatternMatch` object which is a subclass of the `Aggregation` object. It contains the following fields:
The identifier and descriptor of the matched pattern pair. This field is part
of the `Aggregation` base class.
The text content between the start and end patterns. This field is part of the
`Aggregation` base class.
The complete text including start and end patterns.
## Usage Examples
### Voice Switching in TTS
This example demonstrates finding custom `<voice>` tags in streaming text to switch voices dynamically in a TTS service like Cartesia. It removes the tags and the content between them, so the content is treated as if it does not exist: it will not be spoken by the TTS, it will not be added to the context, and it will not be sent to clients via RTVI. Instead, it simply triggers a voice switch side effect.
```python theme={null}
# Define voice IDs
VOICE_IDS = {
"narrator": "c45bc5ec-dc68-4feb-8829-6e6b2748095d",
"female": "71a7ad14-091c-4e8e-a314-022ece01c121",
"male": "7cf0e2b1-8daf-4fe4-89ad-f6039398f359",
}
# Initialize TTS service, starting with default narrator voice
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id=VOICE_IDS["narrator"],
)
# Create pattern aggregator
pattern_aggregator = PatternPairAggregator()
# Add pattern for voice tags
pattern_aggregator.add_pattern(
type="voice",
start_pattern="",
end_pattern="",
action=MatchAction.REMOVE
)
# Register handler for voice switching
def on_voice_tag(match: PatternMatch):
voice_name = match.content.strip().lower()
if voice_name in VOICE_IDS:
voice_id = VOICE_IDS[voice_name]
tts.set_voice(voice_id)
logger.info(f"Switched to {voice_name} voice")
pattern_aggregator.on_pattern_match("voice", on_voice_tag)
# Set the aggregator on an LLMTextProcessor
llm_text_processor = LLMTextProcessor(text_aggregator=pattern_aggregator)
# add the llm_text_processor to your pipeline after the llm and before the tts
# llm -> llm_text_processor -> tts
```
### Extracting Structured Data from LLM Outputs
This example shows how to extract JSON data blocks from LLM outputs, aggregating them separately to be removed from the spoken text, but not from the context or client display.
````python theme={null}
# Create pattern aggregator
pattern_aggregator = PatternPairAggregator()
# Add pattern for JSON data
pattern_aggregator.add_pattern(
type="json",
start_pattern="```json",
end_pattern="```",
action=MatchAction.AGGREGATE
)
# Set the aggregator on an LLMTextProcessor
llm_text_processor = LLMTextProcessor(text_aggregator=pattern_aggregator)
# Initialize TTS service, and don't speak JSON data
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
skip_aggregator_types=["json"],
)
# add the llm_text_processor to your pipeline after the llm and before the tts
# llm -> llm_text_processor -> tts
````
### Handling Special Values in LLM Output
This example demonstrates how to identify and process custom tags in LLM output that denote special content, such as credit cards that should be handled differently by downstream services.
In this case, the TTS should spell it out, while RTVI should obfuscate the number.
```python theme={null}
from pipecat.utils.text.pattern_pair_aggregator import MatchAction, PatternPairAggregator
# Create pattern aggregator
pattern_aggregator = PatternPairAggregator()
# Add patterns for different parts of an explanation
pattern_aggregator.add_pattern(
type="credit_card",
start_pattern="",
end_pattern="",
action=MatchAction.AGGREGATE
)
# Set the aggregator on an LLMTextProcessor
llm_text_processor = LLMTextProcessor(text_aggregator=pattern_aggregator)
# Text-to-Speech service
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
)
# Text transformers for TTS
# This will insert Cartesia's spell tags around the provided text.
async def spell_out_text(text: str, type: str) -> str:
return CartesiaTTSService.SPELL(text)
# Setup the text transformers in TTS to spell out credit card numbers.
# The string below matches the type defined in the PatternPairAggregator
# above so that whenever those segments are encountered, this transform
# is applied
tts.add_text_transformer(spell_out_text, "credit_card")
# RTVI is automatically enabled. Use rtvi_observer_params to add
# a text transformer that obfuscates credit card numbers in client output.
from pipecat.processors.frameworks.rtvi import RTVIObserverParams
def obfuscate_credit_card(text: str, type: str) -> str:
return "XXXX-XXXX-XXXX-" + text[-4:]
task = PipelineTask(
pipeline, # llm -> llm_text_processor -> tts
rtvi_observer_params=RTVIObserverParams(
bot_output_transforms=[("credit_card", obfuscate_credit_card)],
),
)
```
## How It Works
```mermaid theme={null}
flowchart TD
A[New Text] --> B[Add to Buffer]
B --> C{Start & End Patterns Found?}
C --> |Yes| C1[Call Handlers]
C1 --> D{Pattern Action?}
D --> E{{AGGREGATE}}
E --> E1[Reset buffer with trailing text]
E1 --> E2[RETURN contents as Aggregation with specified type]
D --> F{{REMOVE}}
F --> F1[Remove pattern from buffer]
F1 --> S{End of Sentence?}
D --> G{{KEEP}}
G --> S
C --> |No| H{Start Pattern Found?}
H --> |No| S
H --> |Yes| I{Pattern Action == AGGREGATE?}
I --> |Yes| I1[Reset buffer to pattern start]
I1 --> I2[RETURN prior buffer as Aggregation of type 'sentence']
S --> |Yes| S1[return Aggregation of type 'sentence']
S --> | No| S2[return None]
```
## Notes
* Patterns are processed in the order they appear in the text
* Handlers are called when complete patterns are found
* Patterns can span multiple sentences of text, but be aware that encoding many "reasoning" tokens may slow down the LLM response
# TranscriptProcessor
Source: https://docs.pipecat.ai/api-reference/server/utilities/transcript-processor
Factory for creating and managing conversation transcript processors with shared event handling
DEPRECATED: TranscriptProcessor has been deprecated. Use
`on_user_turn_stopped` and `on_assistant_turn_stopped` events on the context
aggregators to collect transcriptions, see
[Transcriptions](/api-reference/server/utilities/turn-management/transcriptions) for
details.
## Overview
The `TranscriptProcessor` is a factory class that creates and manages processors for handling conversation transcripts from both users and assistants. It provides unified access to transcript processors with shared event handling, making it easy to track and respond to conversation updates in real-time.
The processor normalizes messages from various sources into a consistent `TranscriptionMessage` format and emits events when new messages are added to the conversation.
## Constructor
```python theme={null}
TranscriptProcessor()
```
Creates a new transcript processor factory with no parameters.
## Methods
### user()
```python theme={null}
def user(**kwargs) -> UserTranscriptProcessor
```
Get or create the user transcript processor instance. This processor handles `TranscriptionFrame`s from STT services.
**Parameters:**
* `**kwargs`: Arguments passed to the `UserTranscriptProcessor` constructor
**Returns:** `UserTranscriptProcessor` instance for processing user messages.
### assistant()
```python theme={null}
def assistant(**kwargs) -> AssistantTranscriptProcessor
```
Get or create the assistant transcript processor instance. This processor handles `TTSTextFrame`s from TTS services and aggregates them into complete utterances.
**Parameters:**
* `**kwargs`: Arguments passed to the `AssistantTranscriptProcessor` constructor
**Returns:** `AssistantTranscriptProcessor` instance for processing assistant messages.
### event\_handler()
```python theme={null}
def event_handler(event_name: str)
```
Decorator that registers event handlers for both user and assistant processors.
**Parameters:**
* `event_name`: Name of the event to handle
**Returns:** Decorator function that registers the handler with both processors.
## Event Handlers
### on\_transcript\_update
Triggered when new messages are added to the conversation transcript.
```python theme={null}
@transcript.event_handler("on_transcript_update")
async def handle_transcript_update(processor, frame):
# Handle transcript updates
pass
```
**Parameters:**
* `processor`: The specific processor instance that emitted the event (UserTranscriptProcessor or AssistantTranscriptProcessor)
* `frame`: `TranscriptionUpdateFrame` containing the new messages
## Data Structures
### TranscriptionMessage
```python theme={null}
@dataclass
class TranscriptionMessage:
role: Literal["user", "assistant"]
content: str
timestamp: str | None = None
user_id: str | None = None
```
**Fields:**
* `role`: The message sender type ("user" or "assistant")
* `content`: The transcribed text content
* `timestamp`: ISO 8601 timestamp when the message was created
* `user_id`: Optional user identifier (for user messages only)
### TranscriptionUpdateFrame
Frame containing new transcript messages, emitted by the `on_transcript_update` event.
**Properties:**
* `messages`: List of `TranscriptionMessage` objects containing the new transcript content
## Frames
### UserTranscriptProcessor
* **Input:** `TranscriptionFrame` from STT services
* **Output:** `TranscriptionMessage` with role "user"
### AssistantTranscriptProcessor
* **Input:** `TTSTextFrame` from TTS services
* **Output:** `TranscriptionMessage` with role "assistant"
## Integration Notes
### Pipeline Placement
Place the processors at specific positions in your pipeline for accurate transcript collection:
```python theme={null}
pipeline = Pipeline([
transport.input(),
stt, # Speech-to-text service
transcript.user(), # Place after STT
context_aggregator.user(),
llm,
tts, # Text-to-speech service
transport.output(),
transcript.assistant(), # Place after transport.output()
context_aggregator.assistant(),
])
```
### Event Handler Registration
Event handlers are automatically applied to both user and assistant processors:
```python theme={null}
transcript = TranscriptProcessor()
# This handler will receive events from both processors
@transcript.event_handler("on_transcript_update")
async def handle_update(processor, frame):
for message in frame.messages:
print(f"{message.role}: {message.content}")
```
# Fal Smart Turn
Source: https://docs.pipecat.ai/api-reference/server/utilities/turn-detection/fal-smart-turn
Cloud-hosted Smart Turn detection using Fal.ai
DEPRECATED: `FalSmartTurnAnalyzer` is deprecated. Please use
[LocalSmartTurnAnalyzerV3](/api-reference/server/utilities/turn-detection/smart-turn-overview#local-smart-turn)
instead, which provides fast CPU inference without requiring external API
calls.
## Overview
`FalSmartTurnAnalyzer` provides an easy way to use Smart Turn detection via Fal.ai's cloud infrastructure. This implementation requires minimal setup - just an API key - and offers scalable inference without having to manage your own servers.
## Installation
```bash theme={null}
pip install "pipecat-ai[remote-smart-turn]"
```
## Requirements
* A Fal.ai account and API key (get one at [Fal.ai](https://fal.ai))
* Internet connectivity for making API calls
## Configuration
### Constructor Parameters
Your Fal.ai API key for authentication (required unless using a custom
deployment)
URL endpoint for the Smart Turn API (defaults to the official Fal deployment)
An aiohttp client session for making HTTP requests
Audio sample rate (will be set by the transport if not provided)
Configuration parameters for turn detection. See
[SmartTurnParams](/api-reference/server/utilities/turn-detection/smart-turn-overview#configuration)
for details.
## Example
```python theme={null}
import os
import aiohttp
from pipecat.audio.turn.smart_turn.fal_smart_turn import FalSmartTurnAnalyzer
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.transports.base_transport import TransportParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
async def setup_transport():
async with aiohttp.ClientSession() as session:
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
)
# Configure Smart Turn Detection via user turn strategies
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(
turn_analyzer=FalSmartTurnAnalyzer(
api_key=os.getenv("FAL_SMART_TURN_API_KEY"),
aiohttp_session=session
)
)]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
# Continue with pipeline setup...
```
## Custom Deployment
You can also deploy the Smart Turn model yourself on Fal.ai and point to your custom deployment:
```python theme={null}
TurnAnalyzerUserTurnStopStrategy(
turn_analyzer=FalSmartTurnAnalyzer(
url="https://fal.run/your-username/your-deployment/raw",
api_key=os.getenv("FAL_API_KEY"),
aiohttp_session=session
)
)
```
## Performance Considerations
* **Latency**: While Fal provides global infrastructure, there will be network latency compared to local inference
* **Reliability**: Depends on network connectivity and Fal.ai service availability
* **Scalability**: Handles scaling automatically based on your usage
## Notes
* Fal handles the model hosting, scaling, and infrastructure management
* The session timeout is controlled by the `stop_secs` parameter
* For high-throughput applications, consider deploying your own inference service
# Krisp VIVA Turn
Source: https://docs.pipecat.ai/api-reference/server/utilities/turn-detection/krisp-viva-turn
Turn detection using Krisp VIVA SDK
## Overview
`KrispVivaTurn` is a turn analyzer that uses Krisp's VIVA SDK turn detection (Tt) API to determine when a user has finished speaking. Unlike the [Smart Turn model](/api-reference/server/utilities/turn-detection/smart-turn-overview) which analyzes audio in batches when VAD detects a pause, `KrispVivaTurn` processes audio frame-by-frame in real time using Krisp's streaming model.
Complete example with Krisp VIVA voice isolation and turn detection
Get the Krisp SDK and API key
## Installation
`KrispVivaTurn` requires the Krisp Python SDK. See the [Krisp VIVA guide](/pipecat/features/krisp-viva) for installation instructions.
## Environment Variables
You need to provide the path to the Krisp turn detection model file (.kef extension). This can either be done by setting the `KRISP_VIVA_TURN_MODEL_PATH` environment variable or by passing `model_path` to the constructor.
For SDK v1.6.1+, you also need to provide a Krisp API key via the `api_key` constructor parameter or the `KRISP_VIVA_API_KEY` environment variable.
```bash theme={null}
KRISP_VIVA_TURN_MODEL_PATH=/path/to/krisp-viva-tt-v2.kef
KRISP_VIVA_API_KEY=your_api_key_here
```
## Configuration
The `KrispTurnParams` class configures turn detection behavior:
Probability threshold for turn completion (0.0 to 1.0). Higher values require
more confidence before marking a turn as complete.
Frame duration in milliseconds for turn detection. Supported values: 10, 15,
20, 30, 32.
## Constructor Parameters
Path to the Krisp turn detection model file (.kef extension). If not provided,
falls back to the `KRISP_VIVA_TURN_MODEL_PATH` environment variable.
Audio sample rate (will be set by the transport if not provided).
Configuration parameters for turn detection.
Krisp SDK API key for licensing (required for SDK v1.6.1+). If empty, falls
back to the `KRISP_VIVA_API_KEY` environment variable.
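For explicit configuration, a constructor call might look like the following sketch. The keyword names (`model_path`, `api_key`, `params`), the `KrispTurnParams` field name `threshold`, and the import location of `KrispTurnParams` are assumptions based on the descriptions above; verify them against your installed Pipecat and SDK versions.
```python theme={null}
import os
from pipecat.audio.turn.krisp_viva_turn import KrispTurnParams, KrispVivaTurn

turn_analyzer = KrispVivaTurn(
    model_path="/path/to/krisp-viva-tt-v2.kef",  # or rely on KRISP_VIVA_TURN_MODEL_PATH
    api_key=os.getenv("KRISP_VIVA_API_KEY"),     # required for SDK v1.6.1+
    params=KrispTurnParams(threshold=0.8),       # field name assumed; see parameters above
)
```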
## Example
```python theme={null}
from pipecat.audio.turn.krisp_viva_turn import KrispVivaTurn
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
# Configure Krisp turn detection via user turn strategies
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(
turn_analyzer=KrispVivaTurn()
)]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
```
## How It Works
`KrispVivaTurn` processes audio as a streaming model, analyzing each audio frame in real time:
1. **Frame-by-frame processing**: Each incoming audio frame is processed by the Krisp turn detection model, which outputs a probability that the user's turn is complete.
2. **Speech tracking**: VAD signals are used to track when speech starts and stops.
3. **Threshold crossing**: When the model's probability exceeds the configured `threshold` after speech has been detected, the turn is marked as complete.
This differs from the [Smart Turn model](/api-reference/server/utilities/turn-detection/smart-turn-overview) which buffers audio and runs batch inference when VAD detects a pause. `KrispVivaTurn` makes its decision continuously as audio flows through, which can result in faster turn detection.
## Notes
* Requires a valid Krisp SDK license and turn detection model file
* Works with any VAD analyzer (Silero is recommended)
* Emits `TurnMetricsData` with end-to-end processing time, measuring the interval from VAD speech-to-silence transition to the model crossing the probability threshold
# Smart Turn Overview
Source: https://docs.pipecat.ai/api-reference/server/utilities/turn-detection/smart-turn-overview
Advanced conversational turn detection powered by the smart-turn model
## Overview
Smart Turn Detection is an advanced feature in Pipecat that determines when a user has finished speaking and the bot should respond. Unlike basic Voice Activity Detection (VAD) which only detects speech vs. non-speech, Smart Turn Detection uses a machine learning model to recognize natural conversational cues like intonation patterns and linguistic signals.
Open source model for advanced conversational turn detection. Contribute to
model training and development.
Contribute conversational data to improve the smart-turn model
Help classify turn completion patterns in conversations
Pipecat provides `LocalSmartTurnAnalyzerV3` which runs inference locally using ONNX. This is the recommended approach due to the fast CPU inference times in Smart Turn v3.
As of v0.0.102, `TurnAnalyzerUserTurnStopStrategy` with
`LocalSmartTurnAnalyzerV3` is the **default** user turn stop strategy in
Pipecat. You no longer need to explicitly configure it unless you want to
customize its parameters.
## Installation
Smart Turn dependencies (`transformers`, `onnxruntime`) are included with the core `pipecat-ai` package — no extra installation is needed.
```bash theme={null}
pip install pipecat-ai
```
The Smart Turn model weights are bundled with Pipecat, so there's no need to download them separately.
## Integration with User Turn Strategies
Smart Turn Detection is integrated into your application by configuring a `TurnAnalyzerUserTurnStopStrategy` with `LocalSmartTurnAnalyzerV3` in your context aggregator:
```python theme={null}
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.transports.base_transport import TransportParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_in_enabled=True,
),
)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(
turn_analyzer=LocalSmartTurnAnalyzerV3()
)]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
```
Smart Turn Detection requires VAD to be enabled and works best when the VAD
analyzer is set to a short `stop_secs` value. We recommend 0.2 seconds, which
is the default value.
## Configuration
The `SmartTurnParams` class configures turn detection behavior:
Duration of silence in seconds required before triggering a silence-based end
of turn
Amount of audio (in milliseconds) to include before speech is detected
Maximum allowed segment duration in seconds. For segments longer than this
value, a rolling window is used.
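As a sketch of customizing these parameters, the analyzer can be constructed with an explicit `SmartTurnParams` instance. The field names (`stop_secs`, `pre_speech_ms`, `max_duration_secs`), the `params` keyword, and the import path shown here are assumptions that should be checked against your Pipecat version.
```python theme={null}
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3

turn_analyzer = LocalSmartTurnAnalyzerV3(
    params=SmartTurnParams(
        stop_secs=3.0,          # silence needed for a silence-based end of turn
        pre_speech_ms=0.0,      # audio to include before detected speech
        max_duration_secs=8.0,  # longer segments use a rolling window
    )
)
```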
## Local Smart Turn
The `LocalSmartTurnAnalyzerV3` runs inference locally. Version 3 of the model supports fast CPU inference on ordinary cloud instances.
### Constructor Parameters
Path to the Smart Turn v3 ONNX file containing the model weights. Download this from
[https://huggingface.co/pipecat-ai/smart-turn-v3/tree/main](https://huggingface.co/pipecat-ai/smart-turn-v3/tree/main)
This parameter is optional, as Pipecat includes a copy of the model internally, and this
is used if the path is unset.
Audio sample rate (will be set by the transport if not provided)
Configuration parameters for turn detection
### Example
```python theme={null}
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.transports.base_transport import TransportParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
# Create the transport
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
)
# Configure Smart Turn Detection via user turn strategies
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(
turn_analyzer=LocalSmartTurnAnalyzerV3()
)]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
```
## How It Works
Smart Turn Detection continuously analyzes audio streams to identify natural turn completion points:
1. **Audio Buffering**: The system continuously buffers audio with timestamps, maintaining a small buffer of pre-speech audio.
2. **VAD Processing**: Voice Activity Detection (using the Silero model) detects when there is a pause in the user's speech.
3. **Smart Turn Analysis**: When VAD detects a pause in speech, the Smart Turn model analyzes the audio from the most recent 8 seconds of the user's turn, and makes a decision about whether the turn is complete or incomplete.
The system includes a fallback mechanism: if a turn is classified as incomplete but silence continues for longer than `stop_secs`, the turn is automatically marked as complete.
## Notes
* The model supports 23 languages, see the [source repository](https://github.com/pipecat-ai/smart-turn) for more details
* Smart Turn generally provides a more natural conversational experience but is computationally more intensive than simple VAD
* `LocalSmartTurnAnalyzerV3` is designed to run on CPU, and inference can be performed on low-cost cloud instances in under 100ms. However, installing the `onnxruntime-gpu` dependency lets you achieve higher performance through GPU inference.
# External Turn Management
Source: https://docs.pipecat.ai/api-reference/server/utilities/turn-management/external-turn-management
Handle turn detection externally using UserTurnProcessor or external services
## Overview
In some scenarios, turn detection happens externally, either through a dedicated processor or an external service. Pipecat provides `ExternalUserTurnStrategies`, a [user turn strategy](/api-reference/server/utilities/turn-management/user-turn-strategies) that defers turn handling to these external sources.
External turn management might be needed when:
* **Multiple context aggregators**: Parallel pipelines with multiple LLMs need a single, shared source of turn events
* **External services with turn detection**: Services like [Deepgram Flux](/api-reference/server/services/stt/deepgram) or [Speechmatics](/api-reference/server/services/stt/speechmatics) provide their own turn detection
In both cases, you need to configure your context aggregators with `ExternalUserTurnStrategies` to defer turn handling to the external source.
## External Services
Some speech-to-text services provide built-in turn detection. When using these services, configure your context aggregator with `ExternalUserTurnStrategies` to let the service handle turn management:
```python theme={null}
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies
# Configure aggregator to use external turn strategies
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=ExternalUserTurnStrategies()
),
)
```
## UserTurnProcessor
`UserTurnProcessor` is a frame processor for managing user turn lifecycle when you need a single source of turn events shared across multiple context aggregators. It emits `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` frames and handles interruptions.
`UserTurnProcessor` only manages user turn start and end events. It does not
handle transcription aggregation; that remains the responsibility of the
context aggregators.
### Constructor Parameters
Configured strategies for starting and stopping user turns. See [User Turn
Strategies](/api-reference/server/utilities/turn-management/user-turn-strategies) for
available options.
Timeout in seconds to automatically stop a user turn if no stop strategy
triggers.
Timeout in seconds for detecting user idle state. The processor will emit an
`on_user_turn_idle` event when the user has been idle (not speaking) for this
duration after the bot finishes speaking. Set to `0` to disable idle
detection. See [Detecting Idle
Users](/pipecat/fundamentals/detecting-user-idle) for details.
### Event Handlers
`UserTurnProcessor` provides event handlers for turn lifecycle events:
```python theme={null}
@user_turn_processor.event_handler("on_user_turn_started")
async def on_user_turn_started(processor, strategy):
# Called when a user turn starts
pass
@user_turn_processor.event_handler("on_user_turn_stopped")
async def on_user_turn_stopped(processor, strategy):
# Called when a user turn stops
pass
@user_turn_processor.event_handler("on_user_turn_stop_timeout")
async def on_user_turn_stop_timeout(processor):
# Called if no stop strategy triggers before timeout
pass
```
### Usage with Parallel Pipelines
When using parallel pipelines with multiple context aggregators, place `UserTurnProcessor` before the parallel pipeline and configure each context aggregator with `ExternalUserTurnStrategies`:
```python theme={null}
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_processor import UserTurnProcessor
from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies, UserTurnStrategies
# Create the external user turn processor with your preferred strategies
user_turn_processor = UserTurnProcessor(
user_turn_strategies=UserTurnStrategies(
stop=[
TurnAnalyzerUserTurnStopStrategy(
turn_analyzer=LocalSmartTurnAnalyzerV3()
)
]
),
)
# Create contexts for each LLM
openai_context = LLMContext(openai_messages)
groq_context = LLMContext(groq_messages)
# Configure aggregators to use external turn strategies
openai_context_aggregator = LLMContextAggregatorPair(
openai_context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=ExternalUserTurnStrategies()
),
)
groq_context_aggregator = LLMContextAggregatorPair(
groq_context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=ExternalUserTurnStrategies()
),
)
# Build the pipeline with UserTurnProcessor before the parallel branches
pipeline = Pipeline(
[
transport.input(),
stt,
user_turn_processor, # Handles turn management for all branches
ParallelPipeline(
[
openai_context_aggregator.user(),
openai_llm,
transport.output(),
openai_context_aggregator.assistant(),
],
[
groq_context_aggregator.user(),
groq_llm,
groq_context_aggregator.assistant(),
],
),
]
)
```
## Related
* [User Turn Strategies](/api-reference/server/utilities/turn-management/user-turn-strategies) - Configure turn detection strategies
* [Parallel Pipeline](/api-reference/server/pipeline/parallel-pipeline) - Run multiple pipeline branches concurrently
* [Turn Events](/api-reference/server/utilities/turn-management/turn-events) - Handle turn lifecycle events
# Filter Incomplete User Turns
Source: https://docs.pipecat.ai/api-reference/server/utilities/turn-management/filter-incomplete-turns
Use LLM-based detection to suppress responses when users are cut off mid-thought
## Overview
Filter Incomplete Turns is an LLM-powered feature that detects when a user's conversational turn was incomplete (they were cut off or need time to think) and suppresses the bot's response accordingly. Instead of responding to partial input, the bot waits for the user to continue, then automatically re-engages if they remain silent.
This creates more natural conversations by:
* Preventing the bot from responding to incomplete thoughts
* Giving users time to finish speaking without interruption
* Automatically prompting users to continue after pauses
## How It Works
When enabled, the LLM outputs a turn completion marker as the first character of every response:
| Marker | Meaning | Bot Behavior |
| ------ | ---------------------------------------------------- | ---------------------------------------- |
| `✓` | **Complete** - User finished their thought | Respond normally |
| `○` | **Incomplete Short** - User was cut off mid-sentence | Suppress response, wait 5s, then prompt |
| `◐` | **Incomplete Long** - User needs time to think | Suppress response, wait 10s, then prompt |
The system automatically:
1. Injects turn completion instructions into the LLM's system prompt
2. Detects markers in the LLM's streaming response
3. Suppresses bot speech for incomplete turns
4. Starts a timeout based on the incomplete type
5. Re-prompts the LLM when the timeout expires
## Configuration
Enable the feature via `LLMUserAggregatorParams` when creating an `LLMContextAggregatorPair`:
```python theme={null}
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
filter_incomplete_user_turns=True,
),
)
```
### LLMUserAggregatorParams
Enable LLM-based turn completion detection. When True, the system
automatically appends turn completion instructions to the LLM's
system\_instruction and configures the LLM service to process turn markers.
Optional configuration object for customizing turn completion behavior. If not
provided, default values are used.
## UserTurnCompletionConfig
Use `UserTurnCompletionConfig` to customize timeouts, prompts, and instructions:
```python theme={null}
from pipecat.turns.user_turn_completion_mixin import UserTurnCompletionConfig
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
filter_incomplete_user_turns=True,
user_turn_completion_config=UserTurnCompletionConfig(
incomplete_short_timeout=5.0,
incomplete_long_timeout=10.0,
incomplete_short_prompt="Custom prompt for short pauses...",
incomplete_long_prompt="Custom prompt for long pauses...",
instructions="Custom turn completion instructions...",
),
),
)
```
### Parameters
Seconds to wait after detecting `○` (incomplete short) before re-prompting the
LLM. Use shorter values for more responsive re-engagement.
Seconds to wait after detecting `◐` (incomplete long) before re-prompting the
LLM. Use longer values to give users more time to think.
System prompt sent to the LLM when the short timeout expires. Should instruct
the LLM to generate a brief, natural prompt encouraging the user to continue.
System prompt sent to the LLM when the long timeout expires. Should instruct
the LLM to generate a friendly check-in message.
Complete turn completion instructions appended to the system prompt. Override
this to customize how the LLM determines turn completeness.
## Markers Explained
### Complete (✓)
The user has provided enough information for a meaningful response:
```
User: "I'd go to Japan because I love the culture and food."
LLM: "✓ Japan is a wonderful choice! The blend of ancient traditions..."
```
The `✓` marker tells the system to push the response normally. The marker itself is not spoken (marked with `skip_tts`).
### Incomplete Short (○)
The user was cut off mid-sentence and will likely continue soon:
```
User: "I'd go to Japan because I love"
LLM: "○"
```
The `○` marker suppresses the bot's response entirely. After 5 seconds (configurable), the LLM is prompted to re-engage with something like "Go ahead, I'm listening."
### Incomplete Long (◐)
The user needs more time to think or explicitly asked for time:
```
User: "That's a good question. Let me think..."
LLM: "◐"
```
The `◐` marker also suppresses the response, but waits 10 seconds (configurable) before prompting. This handles cases like:
* "Hold on a second"
* "Let me think about that"
* "Hmm, that's interesting..."
## Usage Examples
### Basic Usage
Enable turn completion with default settings:
```python theme={null}
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
messages = [
{
"role": "system",
"content": "You are a helpful assistant...",
}
]
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
filter_incomplete_user_turns=True,
),
)
```
You don't need to modify your system prompt. Turn completion instructions are
automatically appended when `filter_incomplete_user_turns` is enabled.
### Custom Timeouts
Adjust timeouts for your use case:
```python theme={null}
from pipecat.turns.user_turn_completion_mixin import UserTurnCompletionConfig
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
filter_incomplete_user_turns=True,
user_turn_completion_config=UserTurnCompletionConfig(
incomplete_short_timeout=3.0, # More responsive
incomplete_long_timeout=20.0, # More patient
),
),
)
```
### Custom Prompts
Customize what the LLM says when re-engaging:
```python theme={null}
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
filter_incomplete_user_turns=True,
user_turn_completion_config=UserTurnCompletionConfig(
incomplete_short_prompt="""The user paused briefly.
Generate a contextual prompt to encourage them to continue.
Respond with ✓ followed by your message.""",
incomplete_long_prompt="""The user has been quiet for a while.
Generate a friendly check-in.
Respond with ✓ followed by your message.""",
),
),
)
```
Custom prompts must instruct the LLM to respond with `✓` followed by the
message. This ensures the re-engagement message is spoken normally.
### With Smart Turn Detection
Combine with smart turn detection for better end-of-turn detection:
```python theme={null}
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[
TurnAnalyzerUserTurnStopStrategy(
turn_analyzer=LocalSmartTurnAnalyzerV3()
)
]
),
filter_incomplete_user_turns=True,
),
)
```
Smart turn detection helps determine when the user stops speaking, while turn
completion filtering determines whether to respond. They work well together
for natural conversations.
## Transcripts
Turn completion markers are automatically stripped from assistant transcripts emitted via the `on_assistant_turn_stopped` event. Your transcript handlers will receive clean text without markers:
```python theme={null}
@assistant_aggregator.event_handler("on_assistant_turn_stopped")
async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
# message.content will be "Japan is a wonderful choice!"
# NOT "✓ Japan is a wonderful choice!"
print(f"Assistant: {message.content}")
```
## Supported LLM Services
Turn completion detection works with any LLM service that inherits from `LLMService`:
* OpenAI (`OpenAILLMService`)
* Anthropic (`AnthropicLLMService`)
* Google Gemini (`GoogleLLMService`)
* AWS Bedrock (`AWSLLMService`)
* And other compatible services
## Graceful Degradation
If the LLM fails to output a turn marker:
1. The system logs a warning indicating markers were expected but not found
2. The buffered text is pushed normally to avoid losing the response
3. The conversation continues without interruption
This ensures the feature doesn't break conversations if the LLM occasionally disobeys instructions.
## Related
* [User Turn Strategies](/api-reference/server/utilities/turn-management/user-turn-strategies) - Configure turn detection
* [Smart Turn Detection](/api-reference/server/utilities/turn-detection/smart-turn-overview) - AI-powered end-of-turn detection
* [Transcriptions](/api-reference/server/utilities/turn-management/transcriptions) - Working with conversation transcripts
# Interruption Strategies
Source: https://docs.pipecat.ai/api-reference/server/utilities/turn-management/interruption-strategies
Configure when users can interrupt the bot to prevent unwanted interruptions from brief affirmations
DEPRECATED Interruption strategies have been deprecated in favor of [User Turn
Strategies](/api-reference/server/utilities/turn-management/user-turn-strategies).
## Overview
Interruption strategies allow you to control when users can interrupt the bot during speech. By default, any user speech immediately interrupts the bot, but this can be problematic when users engage in backchanneling—brief vocal responses like "yeah", "okay", or "mm-hmm" that indicate they're listening without intending to interrupt.
With interruption strategies, you can require users to meet specific criteria (such as speaking a minimum number of words or reaching a certain audio volume) before their speech will interrupt the bot, creating a more natural conversation flow.
Want to try it out? Check out the [interruption strategies foundational
demo](https://github.com/pipecat-ai/pipecat/blob/main/examples/turn-management/turn-management-interruption-config.py)
## Configuration
Interruption strategies are configured via the `interruption_strategies` parameter in `PipelineParams`. When specified, the normal immediate interruption behavior is replaced with conditional interruption based on your criteria.
List of interruption strategies to apply. When multiple strategies are
provided, the first one that evaluates to true will trigger the interruption.
If empty, normal interruption behavior applies.
## Base Strategy Interface
All interruption strategies inherit from `BaseInterruptionStrategy`, which provides a common interface for evaluating interruption conditions.
Appends audio data to the strategy for analysis. Not all strategies handle
audio.
Appends text to the strategy for analysis. Not all strategies handle text.
Called when the user stops speaking to determine if interruption should occur
based on accumulated audio and/or text.
Resets accumulated text and/or audio data.
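As an illustration, a custom strategy might accumulate transcribed text and interrupt only when the user asks a question. The method names and the base class import path below are assumptions drawn from the interface description above; confirm them against `BaseInterruptionStrategy` in your Pipecat version.
```python theme={null}
from pipecat.audio.interruptions.base_interruption_strategy import BaseInterruptionStrategy


class QuestionInterruptionStrategy(BaseInterruptionStrategy):
    """Sketch: only interrupt the bot when the user asks a question."""

    def __init__(self):
        super().__init__()
        self._text = ""

    async def append_text(self, text: str):
        # Accumulate transcription text while the bot is speaking.
        self._text += text

    async def should_interrupt(self) -> bool:
        # Interrupt only if the accumulated speech contains a question mark.
        return "?" in self._text

    async def reset(self):
        self._text = ""
```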
## Available Strategies
### MinWordsInterruptionStrategy
Requires users to speak a minimum number of words before interrupting the bot.
Minimum number of words the user must speak to interrupt the bot. Must be
greater than 0.
```python theme={null}
from pipecat.audio.interruptions.min_words_interruption_strategy import MinWordsInterruptionStrategy
strategy = MinWordsInterruptionStrategy(min_words=3)
```
## How It Works
When interruption strategies are configured:
1. **Bot not speaking**: User speech interrupts immediately (normal behavior)
2. **Bot speaking**: User speech and audio are collected and fed to strategies
3. **User stops speaking**: Strategies are evaluated in order
4. **First match wins**: The first strategy that returns `True` triggers interruption
5. **No matches**: User speech is discarded
The system automatically handles both audio and text input:
* Audio frames (`InputAudioRawFrame`) are fed to `append_audio()`
* Transcription text is fed to `append_text()`
* Strategies can use either or both data types
## Usage Examples
### Basic Word Count Interruption
Require users to speak at least 3 words to interrupt the bot:
```python theme={null}
from pipecat.audio.interruptions.min_words_interruption_strategy import MinWordsInterruptionStrategy
from pipecat.pipeline.task import PipelineParams, PipelineTask
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
interruption_strategies=[MinWordsInterruptionStrategy(min_words=3)]
)
)
```
### Multiple Strategies with Priority
Strategies are evaluated in order, with the first match triggering interruption:
```python theme={null}
# Prioritize word count, then volume (hypothetical future strategy)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
interruption_strategies=[
MinWordsInterruptionStrategy(min_words=2), # Check first
# VolumeInterruptionStrategy(min_volume=0.8), # Your custom strategy
]
)
)
```
## Behavior Comparison
| Scenario | Without Strategy | With MinWordsInterruptionStrategy(min\_words=3) |
| --------------------------------------------- | ------------------------ | ----------------------------------------------- |
| User says "okay" while bot speaks | ✅ Interrupts immediately | ❌ Ignored (only 1 word) |
| User says "yes that's right" while bot speaks | ✅ Interrupts immediately | ✅ Interrupts (3 words) |
| User speaks while bot is silent | ✅ Processed immediately | ✅ Processed immediately |
## Notes
* Interruption strategies only affect behavior when the bot is actively speaking
* When the bot is not speaking, user input is processed immediately regardless of strategy configuration
* The `allow_interruptions` parameter must be `True` for interruption strategies to work
* User speech that doesn't meet interruption criteria is discarded, not queued
* Strategies are evaluated in order - first match wins
* Both audio and text data are automatically fed to strategies based on their implementation
* Word counting uses simple whitespace splitting for word boundaries
# Transcriptions
Source: https://docs.pipecat.ai/api-reference/server/utilities/turn-management/transcriptions
Collect user and assistant conversation transcripts using turn events
## Overview
Pipecat provides a straightforward way to collect conversation transcriptions using [turn events](/api-reference/server/utilities/turn-management/turn-events). When a user or assistant turn ends, the corresponding event includes the complete transcript for that turn.
The key events for transcription collection are:
* **`on_user_turn_stopped`** - Provides the user's complete transcript via `UserTurnStoppedMessage`
* **`on_assistant_turn_stopped`** - Provides the assistant's complete transcript via `AssistantTurnStoppedMessage`
## Basic Example
```python theme={null}
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    UserTurnStoppedMessage,
    AssistantTurnStoppedMessage,
)

# Create context aggregator
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)

# Handle user transcriptions
@user_aggregator.event_handler("on_user_turn_stopped")
async def on_user_turn_stopped(aggregator, strategy, message: UserTurnStoppedMessage):
    print(f"[USER] {message.content}")

# Handle assistant transcriptions
@assistant_aggregator.event_handler("on_assistant_turn_stopped")
async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
    print(f"[ASSISTANT] {message.content}")
```
## Saving Transcripts to a File
```python theme={null}
import json
from datetime import datetime

transcript_log = []

@user_aggregator.event_handler("on_user_turn_stopped")
async def on_user_turn_stopped(aggregator, strategy, message: UserTurnStoppedMessage):
    transcript_log.append({
        "role": "user",
        "content": message.content,
        "timestamp": message.timestamp,
        "user_id": message.user_id,
    })

@assistant_aggregator.event_handler("on_assistant_turn_stopped")
async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
    transcript_log.append({
        "role": "assistant",
        "content": message.content,
        "timestamp": message.timestamp,
    })

# Save transcript when session ends
async def save_transcript():
    with open(f"transcript_{datetime.now().isoformat()}.json", "w") as f:
        json.dump(transcript_log, f, indent=2)
```
## Sending Transcripts to an External Service
```python theme={null}
import aiohttp

async def send_to_service(role: str, content: str, timestamp: str):
    async with aiohttp.ClientSession() as session:
        await session.post(
            "https://api.example.com/transcripts",
            json={"role": role, "content": content, "timestamp": timestamp}
        )

@user_aggregator.event_handler("on_user_turn_stopped")
async def on_user_turn_stopped(aggregator, strategy, message: UserTurnStoppedMessage):
    await send_to_service("user", message.content, message.timestamp)

@assistant_aggregator.event_handler("on_assistant_turn_stopped")
async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
    await send_to_service("assistant", message.content, message.timestamp)
```
## Message Types
For details on `UserTurnStoppedMessage` and `AssistantTurnStoppedMessage` fields, see [Turn Events - Message Types](/api-reference/server/utilities/turn-management/turn-events#message-types).
## Related
* [Turn Events](/api-reference/server/utilities/turn-management/turn-events) - Complete reference for turn lifecycle events
* [Saving Transcripts Guide](/pipecat/fundamentals/saving-transcripts) - Guide for saving conversation transcripts
# Turn Events
Source: https://docs.pipecat.ai/api-reference/server/utilities/turn-management/turn-events
Handle user and assistant turn lifecycle events for transcriptions and turn tracking
## Overview
Turn events provide hooks into the conversation turn lifecycle, allowing you to know when users and assistants start or stop speaking. These events are emitted by the context aggregators (`LLMUserAggregator` and `LLMAssistantAggregator`) and are particularly useful for:
* **Collecting transcriptions** - Get complete user and assistant transcripts when turns end
* **Turn tracking** - Monitor conversation flow and timing
* **Analytics** - Measure turn durations, detect timeouts, and track conversation patterns
## Events Summary
| Event | Emitter | Description |
| --------------------------- | ---------------------- | --------------------------------------------------- |
| `on_user_turn_started` | `user_aggregator` | User begins speaking |
| `on_user_turn_stopped` | `user_aggregator` | User finishes speaking (includes transcript) |
| `on_user_turn_stop_timeout` | `user_aggregator` | User turn ended due to timeout |
| `on_user_turn_idle` | `user_aggregator` | User has been idle (not speaking) for timeout |
| `on_user_mute_started` | `user_aggregator` | User input was muted |
| `on_user_mute_stopped` | `user_aggregator` | User input was unmuted |
| `on_assistant_turn_started` | `assistant_aggregator` | Assistant begins responding |
| `on_assistant_turn_stopped` | `assistant_aggregator` | Assistant finishes responding (includes transcript) |
| `on_assistant_thought` | `assistant_aggregator` | Assistant produced a thought (reasoning models) |
| `on_summary_applied` | `assistant_aggregator` | Context summarization was applied |
## User Turn Events
User turn events are registered on the `user_aggregator` from an `LLMContextAggregatorPair`.
### on\_user\_turn\_started
Fired when a user turn is detected to have started, based on the configured [start strategies](/api-reference/server/utilities/turn-management/user-turn-strategies#start-strategies).
```python theme={null}
@user_aggregator.event_handler("on_user_turn_started")
async def on_user_turn_started(aggregator, strategy):
    print(f"User started speaking (detected by {strategy})")
```
**Parameters:**
| Parameter | Type | Description |
| ------------ | --------------------------- | ------------------------------------------ |
| `aggregator` | `LLMUserAggregator` | The user aggregator instance |
| `strategy` | `BaseUserTurnStartStrategy` | The strategy that triggered the turn start |
### on\_user\_turn\_stopped
Fired when a user turn is detected to have ended, based on the configured [stop strategies](/api-reference/server/utilities/turn-management/user-turn-strategies#stop-strategies). This event includes the complete user transcript for the turn.
```python theme={null}
@user_aggregator.event_handler("on_user_turn_stopped")
async def on_user_turn_stopped(aggregator, strategy, message: UserTurnStoppedMessage):
    print(f"User said: {message.content}")
    print(f"Turn started at: {message.timestamp}")
    if message.user_id:
        print(f"User ID: {message.user_id}")
```
**Parameters:**
| Parameter | Type | Description |
| ------------ | -------------------------- | ------------------------------------------- |
| `aggregator` | `LLMUserAggregator` | The user aggregator instance |
| `strategy` | `BaseUserTurnStopStrategy` | The strategy that triggered the turn stop |
| `message` | `UserTurnStoppedMessage` | Contains the user's transcript and metadata |
### on\_user\_turn\_stop\_timeout
Fired when a user turn times out without any stop strategy triggering. This is a fallback mechanism that ends the turn after a configurable timeout period (default: 5.0 seconds) when the user has stopped speaking according to VAD but no transcription-based stop has occurred. Commonly, this event is used to retrigger the LLM response after the user has stopped speaking.
```python theme={null}
@user_aggregator.event_handler("on_user_turn_stop_timeout")
async def on_user_turn_stop_timeout(aggregator):
    message = {
        "role": "system",
        "content": "Continue.",
    }
    await user_aggregator.queue_frame(LLMMessagesAppendFrame([message], run_llm=True))
```
**Parameters:**
| Parameter | Type | Description |
| ------------ | ------------------- | ---------------------------- |
| `aggregator` | `LLMUserAggregator` | The user aggregator instance |
After `on_user_turn_stop_timeout` fires, `on_user_turn_stopped` will also be
called with the accumulated transcript.
### on\_user\_turn\_idle
Fired when the user has been idle (not speaking) for a configured timeout period. This event is useful for re-engaging users who may have stepped away or need a prompt to continue the conversation. The idle timer starts when the bot finishes speaking and is cancelled when the user or bot starts speaking again.
```python theme={null}
@user_aggregator.event_handler("on_user_turn_idle")
async def on_user_turn_idle(aggregator):
    # Re-engage the user with a contextual prompt
    message = {
        "role": "system",
        "content": "The user has been quiet. Politely and briefly ask if they're still there.",
    }
    await aggregator.push_frame(LLMMessagesAppendFrame([message], run_llm=True))
```
**Parameters:**
| Parameter | Type | Description |
| ------------ | ------------------- | ---------------------------- |
| `aggregator` | `LLMUserAggregator` | The user aggregator instance |
The idle timer starts when the bot stops speaking and is cancelled when the
user or bot starts speaking. It is suppressed during function calls and active
user turns. This event can fire multiple times if the user remains idle.
### on\_user\_mute\_started
Fired when user input is muted. See [User Input Muting](/pipecat/fundamentals/user-input-muting) for details on muting.
```python theme={null}
@user_aggregator.event_handler("on_user_mute_started")
async def on_user_mute_started(aggregator):
    print("User input muted")
```
**Parameters:**
| Parameter | Type | Description |
| ------------ | ------------------- | ---------------------------- |
| `aggregator` | `LLMUserAggregator` | The user aggregator instance |
### on\_user\_mute\_stopped
Fired when user input is unmuted.
```python theme={null}
@user_aggregator.event_handler("on_user_mute_stopped")
async def on_user_mute_stopped(aggregator):
    print("User input unmuted")
```
**Parameters:**
| Parameter | Type | Description |
| ------------ | ------------------- | ---------------------------- |
| `aggregator` | `LLMUserAggregator` | The user aggregator instance |
## Assistant Turn Events
Assistant turn events are registered on the `assistant_aggregator` from an `LLMContextAggregatorPair`.
### on\_assistant\_turn\_started
Fired when the assistant begins generating a response.
```python theme={null}
@assistant_aggregator.event_handler("on_assistant_turn_started")
async def on_assistant_turn_started(aggregator):
    print("Assistant started responding")
```
**Parameters:**
| Parameter | Type | Description |
| ------------ | ------------------------ | --------------------------------- |
| `aggregator` | `LLMAssistantAggregator` | The assistant aggregator instance |
### on\_assistant\_turn\_stopped
Fired when the assistant finishes responding or is interrupted. This event includes the complete assistant transcript for the turn.
```python theme={null}
@assistant_aggregator.event_handler("on_assistant_turn_stopped")
async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
    print(f"Assistant said: {message.content}")
    print(f"Turn started at: {message.timestamp}")
```
**Parameters:**
| Parameter | Type | Description |
| ------------ | ----------------------------- | ------------------------------------------------ |
| `aggregator` | `LLMAssistantAggregator` | The assistant aggregator instance |
| `message` | `AssistantTurnStoppedMessage` | Contains the assistant's transcript and metadata |
This event fires when the LLM response completes, when the user interrupts, or
when a user image is appended to context.
### on\_assistant\_thought
Fired when the assistant produces a thought from a reasoning model (e.g., extended thinking in Claude). This event provides visibility into the model's reasoning process.
```python theme={null}
@assistant_aggregator.event_handler("on_assistant_thought")
async def on_assistant_thought(aggregator, message: AssistantThoughtMessage):
    print(f"Assistant thought: {message.content}")
```
**Parameters:**
| Parameter | Type | Description |
| ------------ | ------------------------- | ------------------------------------------ |
| `aggregator` | `LLMAssistantAggregator` | The assistant aggregator instance |
| `message` | `AssistantThoughtMessage` | Contains the thought content and timestamp |
### on\_summary\_applied
Fired when context summarization is successfully applied. This event is useful for monitoring context management and logging compression metrics.
```python theme={null}
from pipecat.processors.aggregators.llm_context_summarizer import SummaryAppliedEvent

@assistant_aggregator.event_handler("on_summary_applied")
async def on_summary_applied(aggregator, summarizer, event: SummaryAppliedEvent):
    print(
        f"Context summarized: {event.original_message_count} -> "
        f"{event.new_message_count} messages "
        f"({event.summarized_message_count} summarized, "
        f"{event.preserved_message_count} preserved)"
    )
```
**Parameters:**
| Parameter | Type | Description |
| ------------ | ------------------------ | ------------------------------------------- |
| `aggregator` | `LLMAssistantAggregator` | The assistant aggregator instance |
| `summarizer` | `LLMContextSummarizer` | The summarizer that applied the summary |
| `event` | `SummaryAppliedEvent` | Contains summarization metrics and metadata |
This event requires `enable_auto_context_summarization=True` in the
`LLMAssistantAggregatorParams` or manual triggering via
`LLMSummarizeContextFrame`. See [Context
Summarization](/api-reference/server/utilities/context-summarization) for details.
## Message Types
### UserTurnStoppedMessage
Contains the user's complete transcript when their turn ends.
```python theme={null}
from pipecat.processors.aggregators.llm_response_universal import UserTurnStoppedMessage
```
The complete transcribed text from the user's turn.
ISO 8601 timestamp indicating when the user turn started.
Optional identifier for the user, if available from the transport.
### AssistantTurnStoppedMessage
Contains the assistant's complete transcript when their turn ends.
```python theme={null}
from pipecat.processors.aggregators.llm_response_universal import AssistantTurnStoppedMessage
```
The complete text content from the assistant's turn.
ISO 8601 timestamp indicating when the assistant turn started.
### AssistantThoughtMessage
Contains an assistant's thought content from reasoning models.
```python theme={null}
from pipecat.processors.aggregators.llm_response_universal import AssistantThoughtMessage
```
The thought/reasoning text from the assistant.
ISO 8601 timestamp indicating when the thought started.
## Related
* [Transcriptions](/api-reference/server/utilities/turn-management/transcriptions) - Collect conversation transcripts using turn events
* [User Turn Strategies](/api-reference/server/utilities/turn-management/user-turn-strategies) - Configure turn detection behavior
* [Turn Tracking Observer](/api-reference/server/utilities/observers/turn-tracking-observer) - Track complete turn cycles with timing
# User Mute Strategies
Source: https://docs.pipecat.ai/api-reference/server/utilities/turn-management/user-mute-strategies
Control when user input is suppressed during bot operations
## Overview
User mute strategies control whether incoming user input should be suppressed based on the current system state. They determine when user audio and transcriptions should be muted to prevent interruptions during critical bot operations like initial responses or function calls.
By default, user input is never muted. You can configure mute strategies to automatically suppress user input in specific scenarios, such as while the bot is speaking or during function execution. Custom strategies can also be implemented for specific use cases.
## Configuration
User mute strategies are configured via `LLMUserAggregatorParams` when creating an `LLMContextAggregatorPair`:
```python theme={null}
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.turns.user_mute import (
    MuteUntilFirstBotCompleteUserMuteStrategy,
    FunctionCallUserMuteStrategy,
)

user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_mute_strategies=[
            MuteUntilFirstBotCompleteUserMuteStrategy(),
            FunctionCallUserMuteStrategy(),
        ],
    ),
)
```
## Available Strategies
### AlwaysUserMuteStrategy
Mutes user input whenever the bot is speaking. This prevents any interruptions during bot speech.
```python theme={null}
from pipecat.turns.user_mute import AlwaysUserMuteStrategy
strategy = AlwaysUserMuteStrategy()
```
**Behavior:**
* Mutes when `BotStartedSpeakingFrame` is received
* Unmutes when `BotStoppedSpeakingFrame` is received
### FirstSpeechUserMuteStrategy
Mutes user input only during the bot's first speech. After the initial response completes, user input is allowed even while the bot is speaking.
```python theme={null}
from pipecat.turns.user_mute import FirstSpeechUserMuteStrategy
strategy = FirstSpeechUserMuteStrategy()
```
**Behavior:**
* Allows user input before bot speaks
* Mutes during the first bot speech only
* Unmutes permanently after first speech completes
Use this strategy when you want to ensure the bot's greeting or initial
response isn't interrupted, but allow normal interruptions afterward.
### MuteUntilFirstBotCompleteUserMuteStrategy
Mutes user input from the start of the interaction until the bot completes its first speech. This ensures the bot maintains full control at the beginning of a conversation.
```python theme={null}
from pipecat.turns.user_mute import MuteUntilFirstBotCompleteUserMuteStrategy
strategy = MuteUntilFirstBotCompleteUserMuteStrategy()
```
**Behavior:**
* Mutes immediately when the pipeline starts (before bot speaks)
* Remains muted until first `BotStoppedSpeakingFrame` is received
* Unmutes permanently after first speech completes
Unlike `FirstSpeechUserMuteStrategy`, this strategy mutes user input even
before the bot starts speaking. Use this when you don't want to process any
user input until the bot has delivered its initial message.
### FunctionCallUserMuteStrategy
Mutes user input while function calls are executing. This prevents user interruptions during potentially long-running tool operations.
```python theme={null}
from pipecat.turns.user_mute import FunctionCallUserMuteStrategy
strategy = FunctionCallUserMuteStrategy()
```
**Behavior:**
* Mutes when `FunctionCallsStartedFrame` is received
* Tracks multiple concurrent function calls
* Unmutes when all function calls complete (via `FunctionCallResultFrame` or `FunctionCallCancelFrame`)
This strategy is particularly useful when function calls trigger external API
requests or database operations that may take several seconds to complete and
you don't want the user to interrupt the output.
## Combining Multiple Strategies
Multiple strategies can be combined in a list. The strategies are combined with OR logic—if **any** strategy indicates the user should be muted, user input is suppressed.
```python theme={null}
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_mute_strategies=[
            MuteUntilFirstBotCompleteUserMuteStrategy(),  # Mute until first response
            FunctionCallUserMuteStrategy(),  # Mute during function calls
        ],
    ),
)
```
In this example, user input is muted:
* From pipeline start until the bot completes its first speech
* Whenever function calls are executing (even after first speech)
## Usage Examples
### Prevent Interruptions During Greeting
Ensure the bot's greeting plays completely before accepting user input:
```python theme={null}
from pipecat.turns.user_mute import MuteUntilFirstBotCompleteUserMuteStrategy
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_mute_strategies=[
            MuteUntilFirstBotCompleteUserMuteStrategy(),
        ],
    ),
)
```
### Mute During Function Calls Only
Allow normal interruptions but prevent them during tool execution:
```python theme={null}
from pipecat.turns.user_mute import FunctionCallUserMuteStrategy
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_mute_strategies=[
            FunctionCallUserMuteStrategy(),
        ],
    ),
)
```
### Never Allow Interruptions
Always mute user input while the bot is speaking:
```python theme={null}
from pipecat.turns.user_mute import AlwaysUserMuteStrategy
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_mute_strategies=[
            AlwaysUserMuteStrategy(),
        ],
    ),
)
```
## Event Handlers
You can register event handlers to be notified when user muting starts or stops. This is useful for observability or providing feedback to users.
### Available Events
#### on\_user\_mute\_started
Called when user input becomes muted due to any active mute strategy.
```python theme={null}
@user_aggregator.event_handler("on_user_mute_started")
async def on_user_mute_started(aggregator):
    logger.info("User mute started")
```
#### on\_user\_mute\_stopped
Called when user input is unmuted (no active mute strategies).
```python theme={null}
@user_aggregator.event_handler("on_user_mute_stopped")
async def on_user_mute_stopped(aggregator):
    logger.info("User mute stopped")
```
These events fire whenever the mute state changes, regardless of which
strategy triggered the change. Use them to provide consistent feedback across
all mute scenarios.
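As a small, hedged example of the kind of observability these events enable, the sketch below records how long each mute window lasts. It relies only on the event signatures shown above; the bookkeeping itself is illustrative.
```python theme={null}
import time

# Illustrative bookkeeping: duration (in seconds) of each mute window.
mute_windows: list[float] = []
_mute_started_at: float | None = None


@user_aggregator.event_handler("on_user_mute_started")
async def on_user_mute_started(aggregator):
    global _mute_started_at
    _mute_started_at = time.monotonic()


@user_aggregator.event_handler("on_user_mute_stopped")
async def on_user_mute_stopped(aggregator):
    global _mute_started_at
    if _mute_started_at is not None:
        mute_windows.append(time.monotonic() - _mute_started_at)
        _mute_started_at = None
```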
## Related
* [User Turn Strategies](/api-reference/server/utilities/turn-management/user-turn-strategies) - Configure turn detection behavior
* [User Input Muting Guide](/pipecat/fundamentals/user-input-muting) - Guide for controlling user input
# User Turn Strategies
Source: https://docs.pipecat.ai/api-reference/server/utilities/turn-management/user-turn-strategies
Configure how user turns are detected and managed in conversations
## Overview
User turn strategies provide fine-grained control over how user speaking turns are detected in conversations. They determine when a user's turn starts (user begins speaking) and when it stops (user finishes speaking and expects a response).
By default, Pipecat uses a combination of VAD (Voice Activity Detection) and AI-powered turn detection:
* **Start**: VAD detection or transcription received
* **Stop**: AI-powered turn detection using `LocalSmartTurnAnalyzerV3`
You can customize this behavior by providing your own strategies for more sophisticated turn detection, such as requiring a minimum number of words before triggering a turn, or using AI-powered turn detection models.
## How It Works
1. **Turn Start Detection**: When any start strategy triggers, the user aggregator:
* Marks the start of a user turn
* Optionally emits `UserStartedSpeakingFrame`
* Optionally emits an interruption frame (if the bot is speaking)
2. **During User Turn**: The aggregator collects transcriptions and audio frames.
3. **Turn Stop Detection**: When a stop strategy triggers, the user aggregator:
* Marks the end of the user turn
* Emits `UserStoppedSpeakingFrame`
* Pushes the aggregated user message to the LLM context
4. **Timeout Handling**: If no stop strategy triggers within `user_turn_stop_timeout` seconds (default: 5.0), the turn is automatically ended (see the sketch below for adjusting this timeout).
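For example, the fallback timeout can be shortened if your bot should respond more aggressively when no stop strategy fires. This is a minimal sketch assuming `user_turn_stop_timeout` is exposed as a parameter on `LLMUserAggregatorParams`; verify the exact parameter location for your Pipecat version.
```python theme={null}
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)

user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        # Assumed parameter: fall back to ending the user turn after 3 seconds
        # instead of the default 5.0 seconds.
        user_turn_stop_timeout=3.0,
    ),
)
```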
## Configuration
User turn strategies are configured via `LLMUserAggregatorParams` when creating an `LLMContextAggregatorPair`:
```python theme={null}
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.turns.user_turn_strategies import UserTurnStrategies

context = LLMContext(messages)

user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=UserTurnStrategies(
            start=[...],  # List of start strategies
            stop=[...],  # List of stop strategies
        ),
    ),
)
```
## Start Strategies
Start strategies determine when a user's turn begins. Multiple strategies can be provided, and the first one to trigger will signal the start of a user turn.
### Base Parameters
All start strategies inherit these parameters:
If True, the user aggregator will emit an interruption frame when the user
turn starts, allowing the user to interrupt the bot.
If True, the user aggregator will emit frames indicating when the user starts
speaking. Disable this if another component (e.g., an STT service) already
generates these frames.
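As a hedged sketch, these base parameters are typically passed as constructor keyword arguments on any start strategy. The parameter names below (`enable_interruptions`, `enable_user_speaking_frames`) are the ones referenced later on this page; treat the exact signature as an assumption.
```python theme={null}
from pipecat.turns.user_start import VADUserTurnStartStrategy

# Detect turn starts with VAD, but let another component emit interruption
# and user-speaking frames (assumed keyword arguments).
strategy = VADUserTurnStartStrategy(
    enable_interruptions=False,
    enable_user_speaking_frames=False,
)
```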
### VADUserTurnStartStrategy
Triggers a user turn start based on Voice Activity Detection. This is the most responsive strategy, detecting speech as soon as the VAD indicates the user has started speaking.
```python theme={null}
from pipecat.turns.user_start import VADUserTurnStartStrategy
strategy = VADUserTurnStartStrategy()
```
### TranscriptionUserTurnStartStrategy
Triggers a user turn start when a transcription is received. This serves as a fallback for scenarios where VAD-based detection fails (e.g., when the user speaks very softly) but the STT service still produces transcriptions.
Whether to trigger on interim (partial) transcription frames for earlier
detection.
```python theme={null}
from pipecat.turns.user_start import TranscriptionUserTurnStartStrategy
strategy = TranscriptionUserTurnStartStrategy(use_interim=True)
```
### MinWordsUserTurnStartStrategy
Requires the user to speak a minimum number of words before triggering a turn start. This is useful for preventing brief utterances like "okay" or "yeah" from triggering responses.
Minimum number of spoken words required to trigger the start of a user turn.
Whether to consider interim transcription frames for earlier detection.
```python theme={null}
from pipecat.turns.user_start import MinWordsUserTurnStartStrategy
# Require at least 3 words to start a turn
strategy = MinWordsUserTurnStartStrategy(min_words=3)
```
When the bot is not speaking, this strategy will trigger after just 1 word.
The `min_words` threshold only applies when the bot is actively speaking,
preventing short affirmations from interrupting the bot.
### WakePhraseUserTurnStartStrategy
Requires a wake phrase to be detected before allowing interaction. This strategy blocks subsequent strategies until a wake phrase is detected in a transcription, then allows interaction for a configurable timeout period.
List of wake phrases to detect (e.g., `["hey pipecat", "ok pipecat"]`).
Inactivity timeout in seconds before returning to IDLE state. In timeout mode,
the timer resets on activity. In single activation mode, acts as a keepalive
window after wake phrase detection.
If True, the wake phrase is required before every turn. The strategy returns
to IDLE after each turn completes.
```python theme={null}
from pipecat.turns.user_start import WakePhraseUserTurnStartStrategy

# Timeout mode: wake phrase unlocks interaction for 10 seconds
strategy = WakePhraseUserTurnStartStrategy(
    phrases=["hey pipecat", "ok pipecat"],
    timeout=10.0,
)

# Single activation: wake phrase required before every turn
strategy = WakePhraseUserTurnStartStrategy(
    phrases=["hey pipecat"],
    single_activation=True,
)
```
**Event Handlers**
The strategy provides event handlers for wake phrase detection:
| Event | Signature | Description |
| ------------------------- | ------------------------------------------ | -------------------------------------------------------------- |
| `on_wake_phrase_detected` | `async def handler(strategy, phrase: str)` | Called when a wake phrase is matched |
| `on_wake_phrase_timeout` | `async def handler(strategy)` | Called when the inactivity timeout expires (timeout mode only) |
```python theme={null}
@strategy.event_handler("on_wake_phrase_detected")
async def on_wake_phrase_detected(strategy, phrase):
    print(f"Wake phrase detected: {phrase}")

@strategy.event_handler("on_wake_phrase_timeout")
async def on_wake_phrase_timeout(strategy):
    print("Wake phrase timeout, returning to IDLE")
```
This strategy should be placed **first** in the start strategies list to
properly gate all subsequent strategies. Use
`default_user_turn_start_strategies()` to extend the defaults with wake
phrase detection.
### ExternalUserTurnStartStrategy
Delegates turn start detection to an external processor. This strategy listens for `UserStartedSpeakingFrame` frames emitted by other components in the pipeline (such as speech-to-speech services).
```python theme={null}
from pipecat.turns.user_start import ExternalUserTurnStartStrategy
strategy = ExternalUserTurnStartStrategy()
```
This strategy automatically sets `enable_interruptions=False` and
`enable_user_speaking_frames=False` since these are expected to be handled by
the external processor.
## Stop Strategies
Stop strategies determine when a user's turn ends and the bot should respond.
### Base Parameters
All stop strategies inherit these parameters:
If True, the aggregator will emit frames indicating when the user stops
speaking. Disable this if another component already generates these frames.
### SpeechTimeoutUserTurnStopStrategy
Signals the end of a user turn when transcription is received and VAD indicates silence. Waits for a configurable timeout after VAD detects silence before finalizing the turn, and supports finalized transcripts for earlier triggering.
How long to wait (in seconds) after VAD detects silence before finalizing the
user turn.
```python theme={null}
from pipecat.turns.user_stop import SpeechTimeoutUserTurnStopStrategy
strategy = SpeechTimeoutUserTurnStopStrategy(user_speech_timeout=0.6)
```
Built-in STT P99 latency values assume `VADParams.stop_secs=0.2` (the
recommended default). If you change `stop_secs`, the strategy will log a
warning suggesting you re-run the [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark)
with your VAD settings and pass the measured TTFS P99 latency to your STT
service constructor via `ttfs_p99_latency`. The strategy will also warn if
`stop_secs >= STT p99 latency`, which collapses the STT wait timeout to 0s
and may cause delayed turn detection.
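For illustration, a measured P99 value might be passed to an STT service like this. The `ttfs_p99_latency` parameter name comes from the note above; the service class and module path are assumptions, so adjust them for the STT service and Pipecat version you use.
```python theme={null}
from pipecat.services.deepgram.stt import DeepgramSTTService  # assumed module path

stt = DeepgramSTTService(
    api_key="...",
    # Measured TTFS P99 latency (assumed to be in seconds) from running the
    # stt-benchmark with your VAD settings.
    ttfs_p99_latency=0.45,
)
```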
### TurnAnalyzerUserTurnStopStrategy
Uses an AI-powered turn detection model to determine when the user has finished speaking. This provides more intelligent end-of-turn detection that can understand conversational context.
The turn detection analyzer instance to use for end-of-turn detection.
```python theme={null}
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
strategy = TurnAnalyzerUserTurnStopStrategy(
    turn_analyzer=LocalSmartTurnAnalyzerV3()
)
```
See the [Smart Turn
Detection](/api-reference/server/utilities/turn-detection/smart-turn-overview) documentation for
more information on available turn analyzers.
Built-in STT P99 latency values assume `VADParams.stop_secs=0.2` (the
recommended default). If you change `stop_secs`, the strategy will log a
warning suggesting you re-run the [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark)
with your VAD settings and pass the measured TTFS P99 latency to your STT
service constructor via `ttfs_p99_latency`. The strategy will also warn if
`stop_secs >= STT p99 latency`, which collapses the STT wait timeout to 0s
and may cause delayed turn detection.
### ExternalUserTurnStopStrategy
Delegates turn stop detection to an external processor. This strategy listens for `UserStoppedSpeakingFrame` frames emitted by other components in the pipeline.
A short delay in seconds used to handle consecutive or slightly delayed
transcriptions.
```python theme={null}
from pipecat.turns.user_stop import ExternalUserTurnStopStrategy
strategy = ExternalUserTurnStopStrategy()
```
## Helper Functions
Pipecat provides helper functions to compose custom strategy lists that extend the defaults.
### default\_user\_turn\_start\_strategies()
Returns the default user turn start strategies: `[VADUserTurnStartStrategy, TranscriptionUserTurnStartStrategy]`.
Useful when building a custom strategy list that extends the defaults, such as adding wake phrase detection before the standard strategies.
```python theme={null}
from pipecat.turns.user_start import WakePhraseUserTurnStartStrategy
from pipecat.turns.user_turn_strategies import default_user_turn_start_strategies
# Add wake phrase detection before the defaults
start_strategies = [
    WakePhraseUserTurnStartStrategy(phrases=["hey pipecat"]),
    *default_user_turn_start_strategies(),
]
```
### default\_user\_turn\_stop\_strategies()
Returns the default user turn stop strategies: `[TurnAnalyzerUserTurnStopStrategy(LocalSmartTurnAnalyzerV3)]`.
Useful when building a custom strategy list that extends or replaces the defaults.
```python theme={null}
from pipecat.turns.user_turn_strategies import default_user_turn_stop_strategies
# Use the defaults
stop_strategies = default_user_turn_stop_strategies()
```
## UserTurnStrategies
Container for configuring user turn start and stop strategies.
List of strategies used to detect when the user starts speaking. The first
strategy to trigger will signal the start of the user's turn.
List of strategies used to detect when the user stops speaking and expects a
response. Defaults to AI-powered turn detection using
`LocalSmartTurnAnalyzerV3`.
## ExternalUserTurnStrategies
A convenience class that preconfigures `UserTurnStrategies` with external strategies for both start and stop detection. Use this when an external processor (such as a speech-to-speech service) controls turn management.
```python theme={null}
from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=ExternalUserTurnStrategies(),
    ),
)
```
## Usage Examples
### Default Behavior
The default configuration uses VAD (with a transcription fallback) for turn start detection and AI-powered Smart Turn for turn end detection:
```python theme={null}
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.turns.user_start import TranscriptionUserTurnStartStrategy, VADUserTurnStartStrategy
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies

# This is equivalent to the default behavior
strategies = UserTurnStrategies(
    start=[VADUserTurnStartStrategy(), TranscriptionUserTurnStartStrategy()],
    stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())],
)
```
### Minimum Words for Interruption
Require users to speak at least 3 words before they can interrupt the bot:
```python theme={null}
from pipecat.turns.user_start import MinWordsUserTurnStartStrategy
from pipecat.turns.user_stop import SpeechTimeoutUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=UserTurnStrategies(
            start=[MinWordsUserTurnStartStrategy(min_words=3)],
            stop=[SpeechTimeoutUserTurnStopStrategy()],
        ),
    ),
)
```
### Wake Phrase Detection
Require a wake phrase before allowing interaction, then use the default turn strategies:
```python theme={null}
from pipecat.turns.user_start import WakePhraseUserTurnStartStrategy
from pipecat.turns.user_turn_strategies import (
UserTurnStrategies,
default_user_turn_start_strategies,
)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=UserTurnStrategies(
            start=[
                WakePhraseUserTurnStartStrategy(phrases=["hey pipecat"]),
                *default_user_turn_start_strategies(),
            ],
        ),
    ),
)
```
### Local Smart Turn Detection
Use a local turn detection model instead of a cloud service:
```python theme={null}
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=UserTurnStrategies(
            stop=[
                TurnAnalyzerUserTurnStopStrategy(
                    turn_analyzer=LocalSmartTurnAnalyzerV3()
                )
            ]
        ),
    ),
)
```
## Related
* [User Input Muting](/pipecat/fundamentals/user-input-muting) - Control when user input is ignored
* [Smart Turn Detection](/api-reference/server/utilities/turn-detection/smart-turn-overview) - AI-powered turn detection
# UserIdleProcessor
Source: https://docs.pipecat.ai/api-reference/server/utilities/user-idle-processor
A processor that monitors user inactivity and triggers callbacks after specified timeout periods
DEPRECATED: UserIdleProcessor has been deprecated. Use `user_idle_timeout`
parameter when creating your aggregator, see [Detecting Idle
Users](/pipecat/fundamentals/detecting-user-idle) for details.
The `UserIdleProcessor` is a specialized frame processor that monitors user activity in a conversation and executes callbacks when the user becomes idle. It's particularly useful for maintaining engagement by detecting periods of user inactivity and providing escalating responses to inactivity.
## Constructor Parameters
An async function that will be called when user inactivity is detected. Can be
either:
* Basic callback: `async def(processor: UserIdleProcessor) -> None`
* Retry callback: `async def(processor: UserIdleProcessor, retry_count: int) ->
bool` where returning `False` stops idle monitoring
The number of seconds to wait before considering the user idle.
## Behavior
The processor starts monitoring for inactivity only after the first conversation activity (either `UserStartedSpeakingFrame` or `BotSpeakingFrame`). It manages idle state based on the following rules:
* Resets idle timer when user starts or stops speaking
* Pauses idle monitoring while user is speaking
* Resets idle timer when bot is speaking
* Stops monitoring on conversation end or cancellation
* Manages a retry count for the retry callback
* Stops monitoring when retry callback returns `False`
## Properties
The current number of retry attempts made to engage the user.
## Example Implementations
Here are two examples showing how to use the `UserIdleProcessor`: one with the basic callback and one with the retry callback:
```python theme={null}
from pipecat.frames.frames import LLMMessagesAppendFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.user_idle_processor import UserIdleProcessor

async def handle_idle(user_idle: UserIdleProcessor) -> None:
    await user_idle.push_frame(
        LLMMessagesAppendFrame(
            [
                {
                    "role": "system",
                    "content": "Ask the user if they are still there and try to prompt for some input.",
                }
            ],
            run_llm=True,
        )
    )

# Create the processor
user_idle = UserIdleProcessor(callback=handle_idle, timeout=5.0)

# Add to pipeline
pipeline = Pipeline(
    [
        transport.input(),
        user_idle,  # Add the processor to monitor user activity
        context_aggregator.user(),
        # ... rest of pipeline
    ]
)
```
```python theme={null}
from pipecat.frames.frames import EndFrame, LLMMessagesAppendFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.user_idle_processor import UserIdleProcessor

async def handle_user_idle(user_idle: UserIdleProcessor, retry_count: int) -> bool:
    if retry_count == 1:
        # First attempt: Gentle reminder
        await user_idle.push_frame(
            LLMMessagesAppendFrame(
                [
                    {
                        "role": "system",
                        "content": "The user has been quiet. Politely and briefly ask if they're still there.",
                    }
                ],
                run_llm=True,
            )
        )
        return True
    elif retry_count == 2:
        # Second attempt: Direct prompt
        await user_idle.push_frame(
            LLMMessagesAppendFrame(
                [
                    {
                        "role": "system",
                        "content": "The user is still inactive. Ask if they'd like to continue our conversation.",
                    }
                ],
                run_llm=True,
            )
        )
        return True
    else:
        # Third attempt: End conversation
        await user_idle.push_frame(
            TTSSpeakFrame("It seems like you're busy right now. Have a nice day!")
        )
        await task.queue_frame(EndFrame())
        return False  # Stop monitoring

# Create the processor
user_idle = UserIdleProcessor(callback=handle_user_idle, timeout=5.0)

# Add to pipeline
pipeline = Pipeline(
    [
        transport.input(),
        user_idle,  # Add the processor to monitor user activity
        context_aggregator.user(),
        # ... rest of pipeline
    ]
)
```
## Frame Handling
The processor handles the following frame types:
* `UserStartedSpeakingFrame`: Marks user as active, resets idle timer and retry count
* `UserStoppedSpeakingFrame`: Starts idle monitoring
* `BotSpeakingFrame`: Resets idle timer
* `EndFrame` / `CancelFrame`: Stops idle monitoring
## Notes
* The idle callback won't be triggered while the user or bot is actively speaking
* The processor automatically cleans up its resources when the pipeline ends
* Basic callbacks are supported for backward compatibility
# API Reference
Source: https://docs.pipecat.ai/client/android/api-reference
# SDK Introduction
Source: https://docs.pipecat.ai/client/android/introduction
Build Android applications with Pipecat's Kotlin client library
The Pipecat Android SDK provides a Kotlin implementation for building voice and multimodal AI applications on Android. It handles:
* Real-time audio and video streaming
* Bot communication and state management
* Media device handling
* Event handling
## Installation
Add the dependency for your chosen transport to your `build.gradle` file. For example, to use the Daily transport:
```gradle theme={null}
implementation "ai.pipecat:daily-transport:1.0.3"
```
## Example
Here's a simple example using Daily as the transport layer.
```kotlin theme={null}
val callbacks = object : PipecatEventCallbacks() {
override fun onBackendError(message: String) {
Log.e(TAG, "Error from backend: $message")
}
// ...
}
val options = PipecatClientOptions(callbacks = callbacks)
val client: PipecatClientDaily = PipecatClient(DailyTransport(context), options)
// Kotlin coroutines:
client.startBotAndConnect(startBotParams).await()
// Or, callbacks:
client.startBotAndConnect(startBotParams).withCallback {
// ...
}
```
## Documentation
SDK API documentation
Daily, Gemini, OpenAI WebRTC, and SmallWebRTC transports
# Daily WebRTC Transport
Source: https://docs.pipecat.ai/client/android/transports/daily
WebRTC implementation for Android using Daily
The Daily transport implementation enables real-time audio and video communication in your Pipecat Android applications using [Daily's](https://www.daily.co/) WebRTC infrastructure.
## Installation
Add the Daily transport dependency to your `build.gradle`:
```gradle theme={null}
implementation "ai.pipecat:daily-transport:1.0.3"
```
## Usage
Create a client using the Daily transport:
```kotlin theme={null}
val callbacks = object : PipecatEventCallbacks() {
override fun onBackendError(message: String) {
Log.e(TAG, "Error from backend: $message")
}
// ...
}
val options = PipecatClientOptions(callbacks = callbacks)
val client: PipecatClientDaily = PipecatClient(DailyTransport(context), options)
// Kotlin coroutines
client.startBotAndConnect(startBotParams).await()
// Callbacks
client.startBotAndConnect(startBotParams).withCallback {
// ...
}
```
## Configuration
Your server endpoint should return Daily-specific configuration:
```json theme={null}
{
"dailyRoom": "https://your-domain.daily.co/room-name",
"dailyToken": "your-daily-token"
}
```
## Resources
Simple Chatbot Demo
Client Transports
Complete API documentation for the Daily transport implementation
# Gemini Live Websocket Transport
Source: https://docs.pipecat.ai/client/android/transports/gemini-websocket
Websocket implementation for Android using Gemini
The Gemini Live Websocket transport implementation enables real-time audio communication with the Gemini Live service, using a direct websocket connection.
Transports of this type are designed primarily for development and testing
purposes. For production applications, you will need to build a server
component with a server-friendly transport, like the
[DailyTransport](./daily), to securely handle API keys.
## Installation
Add the transport dependency to your `build.gradle`:
```gradle theme={null}
implementation "ai.pipecat:gemini-live-websocket-transport:0.3.7"
```
## Usage
Create a client:
```kotlin theme={null}
val transport = GeminiLiveWebsocketTransport.Factory(context)
val options = RTVIClientOptions(
params = RTVIClientParams(
baseUrl = null,
config = GeminiLiveWebsocketTransport.buildConfig(
apiKey = "",
generationConfig = Value.Object(
"speech_config" to Value.Object(
"voice_config" to Value.Object(
"prebuilt_voice_config" to Value.Object(
"voice_name" to Value.Str("Puck")
)
)
)
),
initialUserMessage = "How tall is the Eiffel Tower?"
)
)
)
val client = RTVIClient(transport, callbacks, options)
client.start().withCallback {
// ...
}
```
## Resources
Simple Chatbot Demo
Client Transports
Complete API documentation for the Pipecat Android client.
# OpenAI Realtime WebRTC Transport
Source: https://docs.pipecat.ai/client/android/transports/openai-webrtc
WebRTC implementation for Android using OpenAI
The OpenAI Realtime WebRTC transport implementation enables real-time audio communication with the OpenAI Realtime service, using a direct WebRTC connection.
## Installation
Add the transport dependency to your `build.gradle`:
```gradle theme={null}
implementation "ai.pipecat:openai-realtime-webrtc-transport:0.3.7"
```
## Usage
Create a client:
```kotlin theme={null}
val transport = OpenAIRealtimeWebRTCTransport.Factory(context)
val options = RTVIClientOptions(
params = RTVIClientParams(
baseUrl = null,
config = OpenAIRealtimeWebRTCTransport.buildConfig(
apiKey = apiKey,
initialMessages = listOf(
LLMContextMessage(role = "user", content = "How tall is the Eiffel Tower?")
),
initialConfig = OpenAIRealtimeSessionConfig(
voice = "ballad",
turnDetection = Value.Object("type" to Value.Str("semantic_vad")),
inputAudioNoiseReduction = Value.Object("type" to Value.Str("near_field")),
inputAudioTranscription = Value.Object("model" to Value.Str("gpt-4o-transcribe"))
)
)
)
)
val client = RTVIClient(transport, callbacks, options)
client.start().withCallback {
// ...
}
```
## Resources
Simple Chatbot Demo
Client Transports
Complete API documentation for the Pipecat Android client.
# Small WebRTC Transport
Source: https://docs.pipecat.ai/client/android/transports/small-webrtc
WebRTC implementation for Android
The Small WebRTC transport implementation enables real-time audio communication with the Small WebRTC Pipecat transport, using a direct WebRTC connection.
## Installation
Add the transport dependency to your `build.gradle`:
```gradle theme={null}
implementation "ai.pipecat:small-webrtc-transport:0.3.7"
```
## Usage
Create a client:
```kotlin theme={null}
val transport = SmallWebRTCTransport.Factory(context, baseUrl)
val options = RTVIClientOptions(
params = RTVIClientParams(baseUrl = null),
enableMic = true,
enableCam = true
)
val client = RTVIClient(transport, callbacks, options)
client.start().withCallback {
// ...
}
```
## Resources
Demo App
Client Transports
Complete API documentation for the Pipecat Android client.
# SDK Introduction
Source: https://docs.pipecat.ai/client/c++/introduction
Build native applications with Pipecat’s C++ client library
The Pipecat C++ SDK provides a native implementation for building voice and multimodal AI applications. It supports:
* Linux (`x86_64` and `aarch64`)
* macOS (`aarch64`)
* Windows (`x86_64`)
## Dependencies
### libcurl
The SDK uses [libcurl](https://curl.se/libcurl/) for HTTP requests.
```bash theme={null}
sudo apt-get install libcurl4-openssl-dev
```
On macOS `libcurl` is already included so there is nothing to install.
On Windows we use [vcpkg](https://vcpkg.io/en/) to install dependencies. You
need to set it up following one of the
[tutorials](https://learn.microsoft.com/en-us/vcpkg/get_started/get-started).
The `libcurl` dependency will be automatically downloaded when building.
## Installation
Clone the SDK:
```bash theme={null}
git clone https://github.com/pipecat-ai/pipecat-client-cxx
cd pipecat-client-cxx
```
Build the SDK using CMake:
```bash theme={null}
cmake . -G Ninja -Bbuild -DCMAKE_BUILD_TYPE=Release
ninja -C build
```
```bash theme={null}
# Initialize Visual Studio environment
"C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Auxiliary\Build\vcvarsall.bat" amd64
# Configure and build
cmake . -Bbuild --preset vcpkg
cmake --build build --config Release
```
### Cross-compilation
For Linux aarch64:
```bash theme={null}
cmake . -G Ninja -Bbuild -DCMAKE_TOOLCHAIN_FILE=aarch64-linux-toolchain.cmake -DCMAKE_BUILD_TYPE=Release
ninja -C build
```
## Documentation
Complete SDK API documentation
WebRTC implementation using Daily
# Daily WebRTC Transport
Source: https://docs.pipecat.ai/client/c++/transport
WebRTC implementation for C++ using Daily
The Daily transport implementation enables real-time audio and video communication in your Pipecat C++ applications using [Daily's](https://www.daily.co/) WebRTC infrastructure.
## Dependencies
### Daily Core C++ SDK
Download the [Daily Core C++ SDK](https://github.com/daily-co/daily-core-sdk) from the [available releases](https://github.com/daily-co/daily-core-sdk/releases) for your platform and set:
```bash theme={null}
export DAILY_CORE_PATH=/path/to/daily-core-sdk
```
### Pipecat C++ SDK
Build the base [Pipecat C++ SDK](https://github.com/pipecat-ai/pipecat-client-cxx-daily) first and set:
```bash theme={null}
export PIPECAT_SDK_PATH=/path/to/pipecat-client-cxx
```
## Building
First, set a few environment variables:
```bash theme={null}
PIPECAT_SDK_PATH=/path/to/pipecat-client-cxx
DAILY_CORE_PATH=/path/to/daily-core-sdk
```
Then, build the project:
```bash theme={null}
cmake . -G Ninja -Bbuild -DCMAKE_BUILD_TYPE=Release
ninja -C build
```
```bash theme={null}
# Initialize Visual Studio environment
"C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Auxiliary\Build\vcvarsall.bat" amd64
# Configure and build
cmake . -Bbuild --preset vcpkg
cmake --build build --config Release
```
## Examples
Simple C++ implementation example
C++ client with PortAudio support
Example Node.js proxy implementation
# Client SDKs
Source: https://docs.pipecat.ai/client/introduction
Client libraries for building real-time AI applications with Pipecat
All Client SDKs have transitioned to v1.0, which uses a new, simpler API
design. For guidance in transitioning to the new API, please refer to the
migration guide for each platform. If you have any questions or need
assistance, please reach out to us on [Discord](https://discord.gg/pipecat).
Pipecat provides client SDKs for multiple platforms, all implementing the RTVI (Real-Time Voice and Video Inference) standard. These SDKs make it easy to build real-time AI applications that can handle voice, video, and text interactions.
Pipecat JS SDK
Pipecat React SDK
Pipecat React Native SDK
Pipecat iOS SDK
Pipecat Android SDK
Pipecat C++ SDK
## Core Functionality
All Pipecat client SDKs provide:
Handle device inputs and media streams for audio and video
Configure and communicate with your Pipecat bot
Manage connection state and error handling
## Core Types
### PipecatClient
The main class for interacting with Pipecat bots and the primary type you will work with.
### Transport
The `PipecatClient` wraps a Transport, which defines and provides the underlying connection mechanism (e.g., WebSocket, WebRTC). Your Pipecat pipeline will contain a corresponding transport.
### RTVIMessage
Represents a message sent to or received from a Pipecat bot.
## Simple Usage Examples
Establish ongoing connections via WebSocket or WebRTC for:
* Live voice conversations
* Real-time video processing
* Continuous interactions
```javascript javascript theme={null}
// Example: Establishing a real-time connection
import { RTVIEvent, RTVIMessage, PipecatClient } from "@pipecat-ai/client-js";
import { DailyTransport } from "@pipecat-ai/daily-transport";
const pcClient = new PipecatClient({
transport: new DailyTransport(),
enableMic: true,
enableCam: false,
enableScreenShare: false,
callbacks: {
onBotConnected: () => {
console.log("[CALLBACK] Bot connected");
},
onBotDisconnected: () => {
console.log("[CALLBACK] Bot disconnected");
},
onBotReady: () => {
console.log("[CALLBACK] Bot ready to chat!");
},
},
});
try {
// Below, we use a REST endpoint to fetch connection credentials for our
// Daily Transport. Alternatively, you could provide those credentials
// directly to `connect()`.
await pcClient.startBotAndConnect({
endpoint: "https://your-connect-end-point-here/connect",
});
} catch (e) {
console.error(e.message);
}
// Events (alternative approach to constructor-provided callbacks)
pcClient.on(RTVIEvent.Connected, () => {
console.log("[EVENT] User connected");
});
pcClient.on(RTVIEvent.Disconnected, () => {
console.log("[EVENT] User disconnected");
});
```
```jsx react theme={null}
// Example: Using PipecatClient in a React component
import { useCallback } from "react";
import { RTVIEvent, PipecatClient } from "@pipecat-ai/client-js";
import {
  PipecatClientProvider,
  PipecatClientAudio,
  usePipecatClient,
  useRTVIClientEvent,
} from "@pipecat-ai/client-react";
import { DailyTransport } from "@pipecat-ai/daily-transport";

// Create the client instance
const client = new PipecatClient({
  transport: new DailyTransport(),
  enableMic: true,
});

// Root component wraps the app with the provider
function App() {
  return (
    <PipecatClientProvider client={client}>
      <VoiceBot />
      <EventListener />
      <PipecatClientAudio />
    </PipecatClientProvider>
  );
}

// Component using the client
function VoiceBot() {
  const client = usePipecatClient();

  const handleClick = async () => {
    await client.startBotAndConnect({
      endpoint: `${process.env.PIPECAT_API_URL || "/api"}/connect`
    });
  };

  return (
    <button onClick={handleClick}>Connect</button>
  );
}

function EventListener() {
  useRTVIClientEvent(
    RTVIEvent.Connected,
    useCallback(() => {
      console.log("[EVENT] User connected");
    }, [])
  );
}
```
Send custom messages and handle responses from your bot. This is useful for:
* Running server-side functionality
* Triggering specific bot actions
* Querying the server
* Responding to server requests
```javascript javascript theme={null}
import { PipecatClient } from "@pipecat-ai/client-js";
const pcClient = new PipecatClient({
transport: new DailyTransport(),
callbacks: {
onBotConnected: () => {
pcClient
.sendClientRequest("get-language")
.then((response) => {
console.log("[CALLBACK] Bot using language:", response);
if (response !== preferredLanguage) {
pcClient.sendClientMessage("set-language", {
language: preferredLanguage,
});
}
})
.catch((error) => {
console.error("[CALLBACK] Error getting language:", error);
});
},
onServerMessage: (message) => {
console.log("[CALLBACK] Received message from server:", message);
},
},
});
// Here we have obtained the connection details separately and pass them
// directly to connect().
// Alternatively, you can use a connection endpoint to fetch these details
// using `startBotAndConnect()`.
await pcClient.connect({
url: "https://your-daily-room-url",
token: "your-daily-token",
});
```
```jsx react theme={null}
// Example: Messaging in a React application
import { useCallback } from "react";
import { RTVIEvent, TransportState } from "@pipecat-ai/client-js";
import { usePipecatClient, useRTVIClientEvent } from "@pipecat-ai/client-react";
function EventListener() {
const pcClient = usePipecatClient();
useRTVIClientEvent(
RTVIEvent.BotConnected,
useCallback(() => {
pcClient
.sendClientRequest("get-language")
.then((response) => {
console.log("[CALLBACK] Bot using language:", response);
if (response !== preferredLanguage) {
pcClient.sendClientMessage("set-language", {
language: preferredLanguage,
});
}
})
.catch((error) => {
console.error("[CALLBACK] Error getting language:", error);
});
}, []),
);
useRTVIClientEvent(
RTVIEvent.ServerMessage,
useCallback((data) => {
console.log("[CALLBACK] Received message from server:", data);
}, []),
);
}
```
## About RTVI
Pipecat's client SDKs implement the RTVI (Real-Time Voice and Video Inference) standard, an open specification for real-time AI inference. This means:
* Your code can work with any RTVI-compatible inference service
* You get battle-tested tooling for real-time multimedia handling
* You can easily set up development and testing environments
## Next Steps
Get started by trying out examples:
Complete client-server example with both bot backend (Python) and frontend
implementation (JS, React, React Native, iOS, and Android).
Explore our full collection of example applications and implementations
across different platforms and use cases.
# API Reference
Source: https://docs.pipecat.ai/client/ios/api-reference
# SDK Introduction
Source: https://docs.pipecat.ai/client/ios/introduction
Build iOS applications with Pipecat’s Swift client library
The Pipecat iOS SDK provides a Swift implementation for building voice and multimodal AI applications on iOS. It handles:
* Real-time audio streaming
* Bot communication and state management
* Media device handling
* Configuration management
* Event handling
## Installation
Add the SDK to your project using Swift Package Manager:
```swift theme={null}
// Core SDK
.package(url: "https://github.com/pipecat-ai/pipecat-client-ios.git", from: "1.0.0"),
// Daily transport implementation
.package(url: "https://github.com/pipecat-ai/pipecat-client-ios-daily.git", from: "1.0.0"),
```
Then add the dependencies to your target:
```swift theme={null}
.target(name: "YourApp", dependencies: [
.product(name: "PipecatClientIOS", package: "pipecat-client-ios")
.product(name: "PipecatClientIOSDaily", package: "pipecat-client-ios-daily")
]),
```
## Example
Here's a simple example using Daily as the transport layer:
```swift theme={null}
import PipecatClientIOS
import PipecatClientIOSDaily
let pipecatClientOptions = PipecatClientOptions.init(
    transport: DailyTransport.init(),
    enableMic: currentSettings.enableMic,
    enableCam: false
)
self.pipecatClientIOS = PipecatClient.init(
    options: pipecatClientOptions
)
let startBotParams = APIRequest.init(endpoint: URL(string: $PIPECAT_API_URL + "/connect")!)
self.pipecatClientIOS?.startBotAndConnect(startBotParams: startBotParams) { result in
    switch result {
    case .failure(let error):
        // handle error
        print("Error connecting: \(error)")
    case .success(_):
        // handle success
        break
    }
}
```
## Documentation
SDK API documentation
Daily, Gemini, OpenAI WebRTC, and SmallWebRTC transports
Pipecat Client iOS on GitHub
Simple Chatbot Demo
# Daily WebRTC Transport
Source: https://docs.pipecat.ai/client/ios/transports/daily
WebRTC implementation for iOS using Daily
The Daily transport implementation enables real-time audio and video communication in your Pipecat iOS applications using [Daily's](https://www.daily.co/) WebRTC infrastructure.
## Installation
Add the Daily transport package to your project:
```swift theme={null}
.package(url: "https://github.com/pipecat-ai/pipecat-client-ios-daily.git", from: "1.0.0")
// Add to your target dependencies
.target(name: "YourApp", dependencies: [
.product(name: "PipecatClientIOSDaily", package: "pipecat-client-ios-daily")
])
```
## Usage
Create a client using the Daily transport:
```swift theme={null}
import PipecatClientIOS
import PipecatClientIOSDaily
let pipecatClientOptions = PipecatClientOptions.init(
    transport: DailyTransport.init(),
    enableMic: currentSettings.enableMic,
    enableCam: false
)
self.pipecatClientIOS = PipecatClient.init(
    options: pipecatClientOptions
)
let startBotParams = APIRequest.init(endpoint: URL(string: $PIPECAT_API_URL + "/connect")!)
self.pipecatClientIOS?.startBotAndConnect(startBotParams: startBotParams) { result in
    switch result {
    case .failure(let error):
        // handle error
        print("Error connecting: \(error)")
    case .success(_):
        // handle success
        break
    }
}
```
## Configuration
Your server endpoint should return Daily-specific configuration:
```json theme={null}
{
  "url": "https://your-domain.daily.co/room-name",
  "token": "your-daily-token"
}
```
## API Reference
Simple Chatbot Demo
Daily Transport
Complete API documentation for the Daily transport implementation
# Gemini Live Websocket Transport
Source: https://docs.pipecat.ai/client/ios/transports/gemini-websocket
Websocket implementation for iOS using Gemini
The Gemini Live Websocket transport implementation enables real-time audio communication with the Gemini Live service, using a direct websocket connection.
Transports of this type are designed primarily for development and testing
purposes. For production applications, you will need to build a server
component with a server-friendly transport, like the
[DailyTransport](./daily), to securely handle API keys.
## Installation
Add the Gemini transport package to your project:
```swift theme={null}
.package(url: "https://github.com/pipecat-ai/pipecat-client-ios-gemini-live-websocket.git", from: "0.3.1"),
// Add to your target dependencies
.target(name: "YourApp", dependencies: [
.product(name: "PipecatClientIOSGeminiLiveWebSocket", package: "pipecat-client-ios-gemini-live-websocket")
],
```
## Usage
Create a client:
```swift theme={null}
let options: RTVIClientOptions = .init(
    params: .init(config: [
        .init(
            service: "llm",
            options: [
                .init(name: "api_key", value: .string("")),
                .init(name: "initial_messages", value: .array([
                    .object([
                        "role": .string("user"), // "user" | "system"
                        "content": .string("I need your help planning my next vacation.")
                    ])
                ])),
                .init(name: "generation_config", value: .object([
                    "speech_config": .object([
                        "voice_config": .object([
                            "prebuilt_voice_config": .object([
                                "voice_name": .string("Puck") // "Puck" | "Charon" | "Kore" | "Fenrir" | "Aoede"
                            ])
                        ])
                    ])
                ]))
            ]
        )
    ])
)
let client = GeminiLiveWebSocketVoiceClient(options: options)
try await client.start()
```
## API Reference
Simple Chatbot Gemini Demo
iOS Gemini Live WebSocket
Complete API documentation for the Gemini transport implementation
# OpenAIRealTimeWebRTCTransport
Source: https://docs.pipecat.ai/client/ios/transports/openai-webrtc
## Overview
The OpenAI Realtime WebRTC transport implementation enables real-time audio communication directly with the [OpenAI Realtime API using WebRTC](https://platform.openai.com/docs/guides/realtime-webrtc) voice-to-voice service.
It handles media device management, audio/video streams, and state management for the connection.
## Installation
Add the OpenAI transport package to your project:
```swift theme={null}
.package(url: "https://github.com/pipecat-ai/pipecat-client-ios-openai-realtime.git", from: "0.0.1"),
// Add to your target dependencies
.target(name: "YourApp", dependencies: [
.product(name: "PipecatClientIOSOpenAIRealtimeWebrtc", package: "pipecat-client-ios-openai-realtime")
],
```
## Usage
Create a client:
```swift theme={null}
let rtviClientOptions = RTVIClientOptions.init(
    enableMic: currentSettings.enableMic,
    enableCam: false,
    params: .init(config: [
        .init(
            service: "llm",
            options: [
                .init(name: "api_key", value: .string(openaiAPIKey)),
                .init(name: "initial_messages", value: .array([
                    .object([
                        "role": .string("user"), // "user" | "system"
                        "content": .string("Start by introducing yourself.")
                    ])
                ])),
                .init(name: "session_config", value: .object([
                    "instructions": .string("You are Chatbot, a friendly and helpful assistant who provides useful information, including weather updates."),
                    "voice": .string("echo"),
                    "input_audio_noise_reduction": .object([
                        "type": .string("near_field")
                    ]),
                    "turn_detection": .object([
                        "type": .string("semantic_vad")
                    ])
                ])),
            ]
        )
    ])
)
self.rtviClientIOS = RTVIClient.init(
    transport: OpenAIRealtimeTransport.init(options: rtviClientOptions),
    options: rtviClientOptions
)
try await rtviClientIOS.start()
```
Currently, an invalid session configuration will cause the OpenAI connection to
fail.
## API Reference
Simple Chatbot OpenAI Demo
iOS OpenAI Realtime WebRTC
Complete API documentation for the OpenAI transport implementation
# SmallWebRTCTransport
Source: https://docs.pipecat.ai/client/ios/transports/small-webrtc
A lightweight WebRTC transport for peer-to-peer connections with Pipecat for iOS
`SmallWebRTCTransport` enables peer-to-peer WebRTC connections between clients and your Pipecat application. It implements bidirectional audio and video streaming using WebRTC for real-time communication.
This transport is intended for lightweight implementations. It expects your Pipecat server to include the corresponding [`SmallWebRTCTransport` server-side](/api-reference/server/services/transport/small-webrtc) implementation.
## Installation
Add the `SmallWebRTCTransport` package to your project:
```swift theme={null}
.package(url: "https://github.com/pipecat-ai/pipecat-client-ios-small-webrtc.git", from: "0.0.1")
// Add to your target dependencies
.target(name: "YourApp", dependencies: [
.product(name: "PipecatClientIOSSmallWebrtc", package: "pipecat-client-ios-small-webrtc")
])
```
## Usage
Create a client using the `SmallWebRTCTransport`:
```swift theme={null}
import PipecatClientIOS
import PipecatClientIOSSmallWebrtc

let rtviClientOptions = RTVIClientOptions.init(
    enableMic: currentSettings.enableMic,
    enableCam: currentSettings.enableCam,
    params: RTVIClientParams(
        config: [
            .init(
                service: SmallWebRTCTransport.SERVICE_NAME,
                options: [
                    .init(name: "server_url", value: .string($PIPECAT_SERVER_URL))
                ]
            )
        ]
    )
)
self.rtviClientIOS = RTVIClient.init(
    transport: SmallWebRTCTransport.init(options: rtviClientOptions),
    options: rtviClientOptions
)
self.rtviClientIOS?.start() { result in
    switch result {
    case .failure(let error):
        // handle error
        print("Error connecting: \(error)")
    case .success(_):
        // handle success
        break
    }
}
```
## API Reference
Video transform Demo
Small WebRTC transport
Complete API documentation for the Small WebRTC transport implementation
# Callbacks and events
Source: https://docs.pipecat.ai/client/js/api-reference/callbacks
The Pipecat JavaScript client listens for messages and events from the bot via the transport layer. This allows you to respond to changes in state, errors, and other events. The client implements the RTVI standard for these communications.
## Event Handling Options
You can handle events in two ways:
### 1. Callbacks
Define handlers in the client constructor:
```typescript theme={null}
const pcClient = new PipecatClient({
  callbacks: {
    onBotReady: () => console.log("Bot ready via callback"),
    // ... other callbacks
  },
});
```
### 2. Event Listeners
Add handlers using the event emitter pattern:
```typescript theme={null}
pcClient.on(RTVIEvent.BotReady, () => console.log("Bot ready via event"));
```
Events and callbacks provide the same functionality. Choose the pattern that
best fits your application's architecture.
## Callbacks
### State and connectivity
Local user successfully established a connection to the transport.
Local user disconnected from the transport, either intentionally by calling
`pcClient.disconnect()` or due to an error.
Provides a `TransportState` string representing the connectivity state of the
local client. See [transports](../transports/transport) for state explanation.
A call to [`startBot()`](./client-methods#startbot) (i.e. a pre-connection
REST endpoint) was successful and the bot should now be started or in the
process of starting. The callback receives any data returned from your
endpoint.
The bot has been instantiated, its pipeline is configured, and it is receiving
user media and interactions. This method is passed a `BotReadyData` object,
which contains the RTVI `version` number. Since the bot is remote and may be
using a different version of RTVI than the client, you can use the passed
`version` string to check for compatibility.
Bot connected to the transport and is configuring. Note: bot connectivity does
not imply that its pipeline is ready to run yet. Please use `onBotReady`
instead.
Bot disconnected from the transport. This may occur due to session expiry, a
pipeline error or for any reason the server deems the session over.
A non-bot participant joined the session.
A non-bot participant left the session. Note: this excludes the local participant.
### Messages and errors
Receives custom messages sent from the server to the client. This provides a
generic channel for server-to-client communication. The data structure is
flexible and defined by the server implementation.
Response error when an action fails or an unknown message type is sent from
the client.
Error signalled by the bot. This could be due to a malformed config update or
an unknown action dispatch or the inability to complete a client request. The
message parameter is of type `error` and matches [the RTVI
standard](/client/rtvi-standard#error-%F0%9F%A4%96). Its `data` field includes
a `message` string that describes the error and a `fatal` boolean indicating
if the error is unrecoverable and resulted in a bot disconnection. If `fatal`
is true, the client will automatically disconnect.
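As a minimal sketch, an `onError` handler might read the `message` and `fatal` fields described above from the error's `data` payload (the destructuring shown here assumes that shape):
```typescript theme={null}
import { PipecatClient } from "@pipecat-ai/client-js";
import { DailyTransport } from "@pipecat-ai/daily-transport";

const pcClient = new PipecatClient({
  transport: new DailyTransport(),
  callbacks: {
    onError: (msg) => {
      // Per the RTVI error format, `data` carries a `message` string and a `fatal` boolean
      const { message, fatal } = (msg.data ?? {}) as { message?: string; fatal?: boolean };
      console.error("Bot error:", message);
      if (fatal) {
        console.warn("Fatal bot error; the client will disconnect automatically.");
      }
    },
  },
});
```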
### Media and devices
Lists available local media microphone devices. Triggered when a new device
becomes available, a device is removed, or in response to
`pcClient.initDevices()`.
Lists available local media camera devices. Triggered when a new device
becomes available, a device is removed, or in response to
`pcClient.initDevices()`.
Lists available local speaker devices. Triggered when a new device becomes
available, a device is removed, or in response to `pcClient.initDevices()`.
User selected a new microphone as their selected/active device.
User selected a new camera as their selected/active device.
User selected a new speaker as their selected/active device.
Error related to media devices, such as camera or microphone issues. This
could be due to permissions, device unavailability, or other related problems.
See the [DeviceError](./errors#deviceerror) section for more details about the
return type.
Media track from a local or remote participant/bot was started and playable.
Can be either an audio or video track.
Media track from a local or remote participant/bot was stopped and no longer
playable.
Media track from a local or remote participant's screenshare was started and
playable. Can be either an audio or video track.
Media track from a local or remote participant's screenshare was stopped and
no longer playable.
### Audio and Voice Activity
Local audio gain level (0 to 1).
Remote audio gain level (0 to 1). Note: if more than one participant is
connected to the transport, the `participant` property details the associated
peer/bot.
The bot started speaking/sending speech audio.
The bot stopped speaking/sending speech audio.
The local user started speaking. This method is more reliable than using audio
gain and is the result of the bot's VAD (voice activity detection) model. This
provides a more accurate result in noisy environments.
The local user stopped speaking, indicated by the VAD model.
The server has started ignoring audio from the client (server-side muting).
The client should continue sending audio normally but may want to show an
indication to the user that their input is not being processed. See [User
Input Muting](/pipecat/fundamentals/user-input-muting) for more details.
The server has stopped ignoring audio from the client (server-side muting
ended). The client can update its UI to indicate that the user's input is
being processed again.
### Transcription
Transcribed local user input (both partial and final).
Callback receives a `TranscriptData` object:
The transcribed text.
Indicates if the text is final (true) or partial (false).
The timestamp of the transcription.
The ID of the user the transcription is for.
A best-effort stream of the bot's output text, including both spoken and unspoken content. This callback is triggered as the bot aggregates the LLM's response into sentences or other logical text blocks as well as word-by-word during TTS synthesis. The callback receives a `BotOutputData` object:
The aggregated output text from the bot.
Indicates if the text has been spoken by the bot.
The aggregation type used for this output (e.g., "sentence", "code").
"sentence" and "word" are reserved aggregation types defined by the RTVI
standard. Other aggregation types may be defined by custom text aggregators
used by the server. "word" aggregated outputs are sent at the time of TTS
synthesis for real-time word-level streaming and can be used in lieu of
`onBotTtsText` if desired.
DEPRECATED in favor of `onBotOutput` in Pipecat version 0.0.95 and client-js
version 1.5.0
Finalized bot output text generated by the LLM. Sentence aggregated.
### Service-specific Events
Bot LLM search response text generated by the LLM service. This is typically
used for search or retrieval tasks.
Search capabilities are currently only supported by Google Gemini. To take
advantage of this event, your pipeline must include a
[`GoogleLLMService`](/api-reference/server/services/llm/google) and your pipeline task
should include the
[`GoogleRTVIObserver`](/api-reference/server/rtvi/google-rtvi-observer) in lieu
of the typical `RTVIObserver`.
The search result text.
The rendered content of the search result.
The origins of the search result.
The URI of the site where the search result was found.
The title of the site where the search result was found.
The individual search results.
The text of the search result.
The confidence scores for the search result.
Streamed LLM token response text generated by the LLM service.
The text of the LLM response.
LLM service inference started.
LLM service inference concluded.
If your TTS service supports streamed responses over sockets, the text
parameter contains the words from the TTS service as they are spoken. If you are
using an HTTP-based TTS service, the text parameter will contain the full text
of the TTS response.
The text of the LLM response.
TTS service started inference.
TTS service inference concluded.
### Function Calling
A function call has been initiated by the LLM. The metadata included depends
on the server's [`function_call_report_level`](/api-reference/server/rtvi/rtvi-observer#configuration) configuration.
Name of the function being called. Only included if report level is `NAME` or `FULL`.
A function call is in progress. This replaces the deprecated `onLLMFunctionCall`
callback and is the event that triggers registered
[`FunctionCallHandler`s](/client/js/api-reference/client-methods#registerfunctioncallhandler)
when a `function_name` is present.
Name of the function being called. Only included if report level is `NAME` or `FULL`.
Unique identifier for this function call.
Arguments passed to the function. Only included if report level is `FULL`.
A function call has completed or been cancelled.
Name of the function that was called. Only included if report level is `NAME` or `FULL`.
Identifier matching the original function call.
Whether the function call was cancelled before completing.
The result of the function call, if available. Only included if report level is `FULL`.
DEPRECATED in favor of `onLLMFunctionCallInProgress` in Pipecat version
0.0.102 and client-js version 1.6.0
A function call request from the LLM.
### Other
Pipeline metrics data provided by Pipecat. [Learn
more](/pipecat/fundamentals/metrics).
## Events
Each callback described above has a corresponding event that can be listened for using the `.on()` method. This allows you to handle the same functionality using either callbacks or event listeners, depending on your preferred architecture.
Here's the complete reference mapping events to their corresponding callbacks:
### State and connectivity Events
| Event Name | Callback Name | Data Type |
| ----------------------- | ------------------------- | ---------------- |
| `Connected` | `onConnected` | - |
| `Disconnected` | `onDisconnected` | - |
| `TransportStateChanged` | `onTransportStateChanged` | `TransportState` |
| `BotReady` | `onBotReady` | `BotReadyData` |
| `BotConnected` | `onBotConnected` | - |
| `BotDisconnected` | `onBotDisconnected` | `Participant` |
| `ParticipantConnected` | `onParticipantJoined` | `Participant` |
| `ParticipantLeft` | `onParticipantLeft` | `Participant` |
### Message and Error Events
| Event Name | Callback Name | Data Type |
| --------------- | ----------------- | ------------- |
| `ServerMessage` | `onServerMessage` | `any` |
| `MessageError` | `onMessageError` | `RTVIMessage` |
| `Error` | `onError` | `RTVIMessage` |
### Media Events
| Event Name | Callback Name | Data Type |
| ---------------------- | ------------------------ | ------------------------------- |
| `TrackStarted` | `onTrackStarted` | `MediaStreamTrack, Participant` |
| `TrackStopped` | `onTrackStopped` | `MediaStreamTrack, Participant` |
| `AvailableMicsUpdated` | `onAvailableMicsUpdated` | `MediaDeviceInfo[]` |
| `AvailableCamsUpdated` | `onAvailableCamsUpdated` | `MediaDeviceInfo[]` |
| `MicUpdated` | `onMicUpdated` | `MediaDeviceInfo` |
| `CamUpdated` | `onCamUpdated` | `MediaDeviceInfo` |
| `SpeakerUpdated` | `onSpeakerUpdated` | `MediaDeviceInfo` |
| `DeviceError` | `onDeviceError` | `DeviceError` |
### Audio Activity Events
| Event Name | Callback Name | Data Type |
| --------------------- | ----------------------- | --------------------- |
| `LocalAudioLevel` | `onLocalAudioLevel` | `number` |
| `RemoteAudioLevel` | `onRemoteAudioLevel` | `number, Participant` |
| `BotStartedSpeaking` | `onBotStartedSpeaking` | - |
| `BotStoppedSpeaking` | `onBotStoppedSpeaking` | - |
| `UserStartedSpeaking` | `onUserStartedSpeaking` | - |
| `UserStoppedSpeaking` | `onUserStoppedSpeaking` | - |
| `UserMuteStarted` | `onUserMuteStarted` | - |
| `UserMuteStopped` | `onUserMuteStopped` | - |
### Text and Transcription Events
| Event Name | Callback Name | Data Type |
| ------------------- | --------------------- | ---------------- |
| `UserTranscript` | `onUserTranscript` | `TranscriptData` |
| `BotOutput` | `onBotOutput` | `BotOutputData` |
| ~~`BotTranscript`~~ | ~~`onBotTranscript`~~ | `BotLLMTextData` |
| `BotLlmText` | `onBotLlmText` | `BotLLMTextData` |
| `BotTtsText` | `onBotTtsText` | `BotTTSTextData` |
### Service State Events
| Event Name | Callback Name | Data Type |
| ---------------------- | ------------------------ | -------------------------- |
| `BotLlmSearchResponse` | `onBotLlmSearchResponse` | `BotLLMSearchResponseData` |
| `BotLlmStarted` | `onBotLlmStarted` | - |
| `BotLlmStopped` | `onBotLlmStopped` | - |
| `BotTtsStarted` | `onBotTtsStarted` | - |
| `BotTtsStopped` | `onBotTtsStopped` | - |
### Function Call Events
| Event Name | Callback Name | Data Type |
| --------------------------- | ----------------------------- | ------------------------------- |
| `LLMFunctionCallStarted` | `onLLMFunctionCallStarted` | `LLMFunctionCallStartedData` |
| `LLMFunctionCallInProgress` | `onLLMFunctionCallInProgress` | `LLMFunctionCallInProgressData` |
| `LLMFunctionCallStopped` | `onLLMFunctionCallStopped` | `LLMFunctionCallStoppedData` |
| ~~`LLMFunctionCall`~~ | ~~`onLLMFunctionCall`~~ | `LLMFunctionCallData` |
### Other Events
| Event Name | Callback Name | Data Type |
| ---------- | ------------- | -------------------- |
| `Metrics` | `onMetrics` | `PipecatMetricsData` |
## Usage Example
```typescript theme={null}
import { PipecatClient, RTVIEvent } from "@pipecat-ai/client-js";
// Using callbacks
const pcClient = new PipecatClient({
  callbacks: {
    onBotReady: () => console.log("Bot ready via callback"),
    onUserTranscript: (data) => console.log("Transcript:", data.text),
  },
});

// Alternate approach: Using event listeners
pcClient.on(RTVIEvent.BotReady, () => {
  console.log("Bot ready via event");
});
```
## Transport Compatibility
# PipecatClient Constructor
Source: https://docs.pipecat.ai/client/js/api-reference/client-constructor
Setting up the PipecatClient
```javascript theme={null}
import { PipecatClient } from "@pipecat-ai/client-js";
```
`PipecatClient` is the primary component for building the client-side portion of a client-bot interaction. It is designed to work with various transport layers, such as WebRTC, WebSockets, or HTTP, allowing you to pick and choose the communication layer that best suits your application while maintaining a consistent API.
When initializing the `PipecatClient`, you must provide a transport instance
to the constructor for your chosen protocol or provider. See
[Transport](/client/js/transports) for more information. For the purpose of
this guide, we'll demonstrate using the [Daily WebRTC
transport](/client/js/transports/daily).
## Example
```typescript theme={null}
import { RTVIEvent, RTVIMessage, PipecatClient } from "@pipecat-ai/client-js";
import { DailyTransport } from "@pipecat-ai/daily-transport";
const pcClient = new PipecatClient({
  transport: new DailyTransport(),
  enableMic: true,
  enableCam: false,
  enableScreenShare: false,
  timeout: 15 * 1000,
  callbacks: {
    onConnected: () => {
      console.log("[CALLBACK] User connected");
    },
    onDisconnected: () => {
      console.log("[CALLBACK] User disconnected");
    },
    onTransportStateChanged: (state: string) => {
      console.log("[CALLBACK] State change:", state);
    },
    onBotConnected: () => {
      console.log("[CALLBACK] Bot connected");
    },
    onBotDisconnected: () => {
      console.log("[CALLBACK] Bot disconnected");
    },
    onBotReady: () => {
      console.log("[CALLBACK] Bot ready to chat!");
    },
  },
});
```
***
## API reference
### transport
An instance of the `Transport` type you will use to connect to your bot service (`PipecatClient.connect()`). Transports implement the underlying device management, connectivity, media transmission, and state logic that manage the lifecycle of your session.
As a best practice, we recommend you construct the transport inline in the
client constructor, as opposed to holding a reference to it. Access to the
transport is typically unnecessary. For advanced use cases that do require
access to the transport, we recommend doing so via the
`PipecatClient.transport` property, which provides some additional safeguards.
```typescript theme={null}
import { PipecatClient } from "@pipecat-ai/client-js";
import { DailyTransport } from "@pipecat-ai/daily-transport";
const pcClient = new PipecatClient({
  transport: new DailyTransport(),
});
```
### callbacks
Map of callback functions. See [callbacks](./callbacks).
### Media Initialization
Enable user's local microphone device.
Enable user's local webcam device. Note: Not all transports support video.
Setting this value in that case will have no effect.
Enable user's local screen share. Note: Not all transports support screen
sharing. Setting this value in that case will have no effect.
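For example, a client that starts with only the microphone live might be constructed like this (a minimal sketch using the Daily transport):
```typescript theme={null}
import { PipecatClient } from "@pipecat-ai/client-js";
import { DailyTransport } from "@pipecat-ai/daily-transport";

const pcClient = new PipecatClient({
  transport: new DailyTransport(),
  enableMic: true, // local microphone on from the start
  enableCam: false, // camera stays off
  enableScreenShare: false, // no screen share
});
```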
# Client Methods
Source: https://docs.pipecat.ai/client/js/api-reference/client-methods
The Pipecat JavaScript client provides a comprehensive set of methods for managing bot interactions and media handling. These core methods are documented below.
## Session connectivity
### startBot()
`async startBot(startBotParams: APIEndpoint): Promise`
This method hits your server endpoint to start the bot and optionally obtain the connection parameters needed for `connect()` to connect the `Transport`. It returns a Promise that resolves with the response from the server.
The `APIEndpoint` object should have the following shape:
The URL of the endpoint to connect to. This should be a valid REST endpoint.
Optional headers to include in the request to the endpoint. This can be used to pass authentication tokens or other necessary headers.
Optional request data to include in the request to the endpoint. This can be used to pass additional data to your server-side endpoint. Oftentimes, this is used to pass the initial prompt or other configuration data to initialize the bot.
Optional timeout in milliseconds for the request to the endpoint.
During the `startBot()` process, the transport state will transition through the states: "authenticating" and "authenticated".
```javascript theme={null}
try {
  await pcClient.startBot({
    endpoint: "/api/start", // Your server endpoint to start the bot
    requestData: {
      initial_prompt: "You are a pirate captain",
      llm_provider: "openai",
    },
  });
} catch (error) {
  console.error("Error starting the bot:", error);
}
```
### connect()
`async connect(connectParams): Promise`
This method initiates the connection process, optionally passing parameters that your transport class requires to establish a connection or an endpoint to your server for obtaining those parameters.
An object containing the `TransportConnectionParams` your Transport expects.
Check your transport class documentation for the expected shape of `TransportConnectionParams`. For example, the DailyTransport expects a `url` and `token`.
In 1.2.0 we deprecated support for passing a `ConnectionEndpoint` object directly to `connect()`. Instead, you should use the `startBot()` or `startBotAndConnect()` methods to fetch connection parameters from your server endpoint and then pass those parameters directly to `connect()`.
This method can be try / catched to handle errors at startup:
```typescript theme={null}
try {
  await pcClient.connect({
    webrtcUrl: "http://my-server/api/offer",
  });
} catch (error) {
  console.error("Error connecting to the bot:", error);
}
```
During the connection process, the transport state will transition through the following states: "connecting", "connected", "ready".
Calling `connect()` asynchronously will resolve when the bot and client signal
that they are ready. See [messages and events](./messages). If you want to
call `connect()` without `await`, you can use the `onBotReady` callback or
`BotReady` event to know when you can interact with the bot.
Attempting to call `connect()` when the transport is already in a 'connected'
or 'ready' state will throw an error. You should [disconnect](#disconnect)
from a session first before attempting to connect again.
### startBotAndConnect()
`async startBotAndConnect(startBotParams: APIEndpoint): Promise`
This method combines the functionality of `startBot()` and `connect()`. It first starts the bot by hitting your server endpoint and then connects the transport passing the response from the endpoint to the transport as connection parameters.
```javascript theme={null}
try {
  await pcClient.startBotAndConnect({
    endpoint: "/api/start", // Your server endpoint to start the bot
    requestData: {
      initial_prompt: "You are a pirate captain",
      llm_provider: "openai",
    },
  });
} catch (error) {
  console.error("Error starting up:", error);
}
```
It's equivalent to: `pcClient.startBot(...).then((resp) => pcClient.connect(resp))`.
### disconnect()
`async disconnect(): Promise`
Disconnects from the active session. The transport state will transition to "disconnecting" and then "disconnected".
It is common practice for bots to exit and cleanup when the client disconnects.
```typescript theme={null}
await pcClient.disconnect();
```
### disconnectBot()
`disconnectBot(): void`
Triggers the bot to disconnect from the session, leaving the client connected.
```typescript theme={null}
pcClient.disconnectBot();
```
## Messages
Custom messaging between the client and the bot. This is useful for sending data to the bot, triggering specific actions, reacting to server events, or querying the server.
For more, see: [messages and events](./messages).
### sendClientMessage()
`sendClientMessage(msgType: string, data?: unknown): void`
Sends a custom message to the bot and does not expect a response. This is useful for sending data to the bot or triggering specific actions.
A string identifying the message.
Optional data to send with the message. This can be any JSON-serializable
object.
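For example (the `set-language` message type and its payload are arbitrary, application-defined values that your bot must handle):
```typescript theme={null}
// Fire-and-forget: no response is expected from the bot
pcClient.sendClientMessage("set-language", { language: "en-US" });
```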
### sendClientRequest()
`async sendClientRequest(msgType: string, data: unknown, timeout?: number): Promise`
Sends a custom request to the bot and expects a response. This is useful for querying the server or triggering specific actions that require a response. The method returns a Promise that resolves with the data from the response.
A string identifying the message.
Optional data to send with the message. This can be any JSON-serializable
object.
Optional timeout in milliseconds for the request. If the request does not
receive a response within this time, it will reject with an RTVIMessage of
type `'error-response'`.
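For example, assuming your bot's pipeline handles a custom `get-language` request (see [Custom Messaging](./messages)):
```typescript theme={null}
try {
  // Rejects with an 'error-response' RTVIMessage if no reply arrives within 5 seconds
  const response = await pcClient.sendClientRequest("get-language", undefined, 5000);
  console.log("Bot responded with:", response);
} catch (error) {
  console.error("Request failed or timed out:", error);
}
```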
## Devices
### initDevices()
`async initDevices(): Promise`
Initializes the media device selection machinery, based on `enableCam`/`enableMic` selections and defaults (i.e. turns on the local cam/mic). This method can be called before `connect()` to test and switch between camera and microphone sources.
```typescript theme={null}
await pcClient.initDevices();
```
### getAllMics()
`async getAllMics(): Promise`
Returns a list of available microphones in the form of [`MediaDeviceInfo[]`](https://developer.mozilla.org/en-US/docs/Web/API/MediaDeviceInfo).
```typescript theme={null}
const mic_device_list = await pcClient.getAllMics();
```
### getAllCams()
`async getAllCams(): Promise`
Returns a list of available cameras in the form of [`MediaDeviceInfo[]`](https://developer.mozilla.org/en-US/docs/Web/API/MediaDeviceInfo).
```typescript theme={null}
const cam_device_list = await pcClient.getAllCams();
```
### getAllSpeakers()
`async getAllSpeakers(): Promise`
Returns a list of available speakers in the form of [`MediaDeviceInfo[]`](https://developer.mozilla.org/en-US/docs/Web/API/MediaDeviceInfo).
```typescript theme={null}
const speaker_device_list = await pcClient.getAllSpeakers();
```
### selectedMic
`selectedMic: MediaDeviceInfo | {}`
The currently selected microphone, represented as a `MediaDeviceInfo` object. If no microphone is selected, it returns an empty object.
```typescript theme={null}
const current_mic = pcClient.selectedMic;
```
### selectedCam
`selectedCam: MediaDeviceInfo | {}`
The currently selected camera, represented as a `MediaDeviceInfo` object. If no camera is selected, it returns an empty object.
```typescript theme={null}
const current_cam = pcClient.selectedCam;
```
### selectedSpeaker
`selectedSpeaker: MediaDeviceInfo | {}`
The currently selected speaker, represented as a `MediaDeviceInfo` object. If no speaker is selected, it returns an empty object.
```typescript theme={null}
const current_speaker = pcClient.selectedSpeaker;
```
### updateMic()
`updateMic(micId: string): void`
Switches to the microphone identified by the provided `micId`, which should match a `deviceId` in the list returned from [`getAllMics()`](#getAllMics).
deviceId
```typescript theme={null}
pcClient.updateMic(deviceId);
```
### updateCam()
`updateCam(camId: string): void`
Switches to the camera identified by the provided `camId`, which should match a `deviceId` in the list returned from [`getAllCams()`](#getAllCams).
deviceId
```typescript theme={null}
pcClient.updateCam(deviceId);
```
### updateSpeaker()
`updateSpeaker(speakerId: string): void`
Switches to the speaker identified by the provided `speakerId`, which should match a `deviceId` in the list returned from [`getAllSpeakers()`](#getAllSpeakers).
deviceId
```typescript theme={null}
pcClient.updateSpeaker(deviceId);
```
### enableMic(enable: boolean)
`enableMic(enable: boolean): void`
Turn on or off (unmute or mute) the client mic input.
A boolean indicating whether to enable (`true`) or disable (`false`) the
microphone.
```typescript theme={null}
pcClient.enableMic(true);
```
### enableCam(enable: boolean)
`enableCam(enable: boolean): void`
Turn on or off the client cam input.
A boolean indicating whether to enable (`true`) or disable (`false`) the
camera.
```typescript theme={null}
pcClient.enableCam(true);
```
### enableScreenShare(enable: boolean)
`enableScreenShare(enable: boolean): void`
Start a screen share from the client's device.
A boolean indicating whether to enable (`true`) or disable (`false`) screen
sharing.
```typescript theme={null}
pcClient.enableScreenShare(true);
```
### isMicEnabled
`isMicEnabled: boolean`
An accessor to determine if the client's microphone is enabled.
```typescript theme={null}
const mic_enabled = pcClient.isMicEnabled;
```
### isCamEnabled
`isCamEnabled: boolean`
An accessor to determine if the client's camera is enabled.
```typescript theme={null}
const cam_enabled = pcClient.isCamEnabled;
```
### isSharingScreen
An accessor to determine if the client is sharing their screen.
```typescript theme={null}
const screen_sharing = pcClient.isSharingScreen;
```
## Tracks (audio and video)
### tracks()
`tracks(): Tracks`
Returns a `Tracks` object with available `MediaStreamTrack` objects for both the client and the bot.
```typescript theme={null}
const live_tracks_list = pcClient.tracks();
```
**Tracks Type**
```typescript theme={null}
{
  local: {
    audio?: MediaStreamTrack;
    video?: MediaStreamTrack;
  },
  bot?: {
    audio?: MediaStreamTrack;
    video?: MediaStreamTrack;
  }
}
```
## Advanced LLM Interactions
### sendText()
`async sendText(content: string, options?: SendTextOptions): Promise`
A method to append text to the user's context. This is useful for providing text input as an alternative to audio input for the user.
The text content to send to the bot.
An optional set of options for how the bot should handle the text input.
Whether to immediately run the bot with the updated context. If `false`,
the context will be updated but the bot will not be run until the next
message or action that triggers the bot to run (like the user speaking).
Whether the bot should respond with audio. If `true`, the bot's response
will be processed by TTS and be spoken. If `false`, the bot will bypass
the TTS and respond with text only.
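A minimal sketch, sending typed user input and letting the bot respond with the default options:
```typescript theme={null}
// Appends the text to the user's context and, by default, runs the bot on it
await pcClient.sendText("Can you summarize what we've discussed so far?");
```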
### appendToContext()
`async appendToContext(context: LLMContextMessage): boolean`
In 0.0.4, we deprecated `appendToContext` in favor of `send-text` to close a
security vulnerability as well as lay the groundwork for proper support of
passing other types of context, like images and files.
A method to append data to the bot's context. This is useful for providing additional information or context to the bot during the conversation.
The context to append. This should be an object with the following shape:
The role to append the context to. Currently only "user" or "assistant"
are supported.
The content to append to the context. This can be any JSON-serializable
object.
Whether to immediately run the bot with the updated context. If `false`,
the context will be updated but the bot will not be run until the next
message or action that triggers the bot to run (like the user speaking).
### registerFunctionCallHandler()
`registerFunctionCallHandler(functionName: string, callback: FunctionCallCallback): void`
Registers a function call handler that will be called when the bot requests a function call. This is useful when the server-side function handler needs information from the client to execute the function call, or when the client needs to perform some action based on the function call being run.
The name of the function to handle. This should match the function name in the bot's context.
`type FunctionCallCallback = (fn: FunctionCallParams) => Promise`
The callback function to call when the bot sends a function call request. This function should accept the following parameters:
The name of the function being called. It should always match the name you
registered the handler under.
The arguments passed to the function call. This is a key-value object where
the keys are the argument names and the values are the argument values.
The callback should return a Promise that resolves with the result of the function call or void if no result is needed. If returning a result, it should be a `string` or `Record`.
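A minimal sketch (the `get_weather` function and its `location` argument are hypothetical, and the `functionName`/`arguments` property names reflect the parameters described above):
```typescript theme={null}
pcClient.registerFunctionCallHandler("get_weather", async (fn) => {
  // `fn.arguments` is a key-value object of the arguments provided by the LLM
  const { location } = fn.arguments as { location?: string };
  // Return a string (or Record) result that is passed back to the bot
  return JSON.stringify({ location, conditions: "sunny", temperature_c: 21 });
});
```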
## Other
### transport
`transport: Transport`
A safe accessor for the transport instance used by the client. This is useful for accessing transport-specific methods or properties that are not exposed directly on the client.
```typescript theme={null}
const transport = pcClient.transport as DailyTransport;
transport.getSessionInfo();
```
### setLogLevel()
`setLogLevel(level: LogLevel): void`
Sets the log level for the client. This is useful for debugging and controlling the verbosity of logs. The log levels are defined in the `LogLevel` enum:
```typescript theme={null}
export enum LogLevel {
NONE = 0,
ERROR = 1,
WARN = 2,
INFO = 3,
DEBUG = 4,
}
```
By default, the log level is set to `LogLevel.DEBUG`.
```typescript theme={null}
pcClient.setLogLevel(LogLevel.INFO);
```
## Transport Compatibility
# Errors
Source: https://docs.pipecat.ai/client/js/api-reference/errors
## RTVIError Type
Base `PipecatClient` error type, extends [`Error`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Error) and primarily introduces the `status` field. Most methods will try to throw an error of this type when something goes wrong. This is different from the RTVI error event and its corresponding [`onError` callback](./callbacks#param-on-error), which are used for communicating errors that are sent by the bot.
A unique identifier (or HTTP code if applicable) for the error.
A human-readable message describing the error.
## Pre-defined RTVIErrors
### ConnectionTimeoutError
Emitted when the bot does not enter a ready state within the specified timeout period during the `connect()` method call.
### StartBotError
Emitted from `startBot()` or `startBotAndConnect()` when the endpoint responds with an error or the `fetch` itself fails. This may be due to the endpoint being unavailable, or the server failing to parse the provided data.
All `StartBotError` instances will have an `error` field set to
`invalid-request-error`.
HTTP status code returned by the endpoint, if applicable.
Verbose error message returned by the endpoint, if provided. To take advantage
of this, the endpoint should return an error response with a JSON object with
an `info` field containing the error message.
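For example, a sketch of catching this error around startup (assuming `StartBotError` is exported from `@pipecat-ai/client-js`):
```typescript theme={null}
import { StartBotError } from "@pipecat-ai/client-js"; // export name assumed

try {
  await pcClient.startBotAndConnect({ endpoint: "/api/start" });
} catch (err) {
  if (err instanceof StartBotError) {
    // `status` is the HTTP code; `message` may include the endpoint's `info` text
    console.error("Bot startup failed:", err.status, err.message);
  } else {
    throw err;
  }
}
```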
### TransportStartError
Emitted when the Transport is not able to connect. You may need to check the connection parameters provided or returned from your endpoint.
### BotNotReadyError
Emitted when the client attempts to perform an action or method that requires the bot to be in a ready state, but the bot is not ready. You must call `connect()` first and wait for the bot to be ready before performing such actions.
### DeviceError
Emitted when there is an issue with acquiring or using a media device, such as a camera or microphone. This could be due to permissions issues, device unavailability, or other related problems.
List of media devices, `'cam'`, `'mic'`, and/or `'speaker'`, that are
unavailable or could not be accessed.
The `type` field will indicate what type of device error occurred. Options include:
* `"in-use"`: A device is currently in use by another application and cannot be accessed. *windows only*
* `"permissions"`: The user has not granted permission to access the media device.
* `"undefined-mediadevices"`: `getUserMedia()` is not an available API on the current platform or browser.
* `"not-found"`: The specified media device could not be found.
* `"constraints"`: The media device could not be configured with the specified constraints.
* `"unknown"`: An unknown error occurred while accessing the media device.
Additional details about the device error, if available.
### UnsupportedFeatureError
Not all Transports are created equal, and some may not support certain features. This error is thrown when a feature is requested that the current Transport does not support.
This custom field will contain the name of the unsupported feature.
# Custom Messaging
Source: https://docs.pipecat.ai/client/js/api-reference/messages
The Pipecat JavaScript client can send and receive arbitrary messages to/from the server running the bot. This page outlines and demonstrates both client and server code for passing and responding to custom messages as well as providing arbitrary data at connection time.
## Connection-Time Configuration
Oftentimes clients need to provide configuration data to the server when starting the bot. This can include things like preferred language, user preferences, initial messages, or any other data that the server needs to know about the client. This must occur before the bot is started and therefore is not part of the RTVI standard, but rather a custom implementation. That said, the `PipecatClient` makes it easy to send this data as part of the `startBot()` and `startBotAndConnect()` methods, by passing an API endpoint object with the `requestData` property. Your server endpoint can then handle this data as needed. In the example below, we demonstrate sending an initial prompt and preferred language to the server when connecting.
```javascript client theme={null}
try {
  await pcClient.startBotAndConnect({
    endpoint: '/api/start', // Your server endpoint to start the bot
    requestData: {
      initial_prompt: "You are a pirate captain",
      preferred_language: 'en-US'
    }
  });
} catch (error) {
  console.error("Error starting the bot:", error);
}
```
```python FastAPI endpoint theme={null}
def validate_request_data(body: Dict[str, Any]) -> Tuple[str, str]:
    """Validate and extract prompt and language from request data."""
    if not isinstance(body, dict):
        raise ValueError("Request body must be a dictionary")
    prompt = body.get("initial_prompt", "You are a pirate captain")
    lang = body.get("preferred_language", "en-US")
    if not isinstance(prompt, str) or not isinstance(lang, str):
        raise ValueError("Both initial_prompt and preferred_language must be strings")
    return prompt, lang


@app.post("/api/start")
async def start(request: Request) -> Dict[Any, Any]:
    """Startup endpoint that creates a room, starts the bot, and returns
    connection credentials.

    This endpoint is called by clients to kick off a session and establish
    a connection.

    Returns:
        Dict[Any, Any]: Authentication bundle containing room_url and token

    Raises:
        HTTPException: If room creation, token generation, or bot startup fails
    """
    body = await request.json()
    try:
        prompt, lang = validate_request_data(body)
    except ValueError as e:
        raise HTTPException(status_code=400, detail=f"Invalid request data: {e}")

    print("Creating room for RTVI connection", body)
    room_url, token = await create_room_and_token()
    print(f"Room URL: {room_url}")

    # Start the bot process
    try:
        proc = subprocess.Popen(
            [f"python3 -m bot -u {room_url} -t {token} -p {prompt} -l {lang}"],
            shell=True,
            bufsize=1,
            cwd=os.path.dirname(os.path.abspath(__file__)),
        )
        bot_procs[proc.pid] = (proc, room_url)
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Failed to start subprocess: {e}")

    # Return the authentication bundle in format expected by DailyTransport
    return {"url": room_url, "token": token}
```
```python bot theme={null}
import argparse
import asyncio


def extract_arguments():
    parser = argparse.ArgumentParser(description="Example")
    parser.add_argument(
        "-u", "--room-url", type=str, default=os.getenv("DAILY_SAMPLE_ROOM_URL", "")
    )
    parser.add_argument(
        "-t", "--token", type=str, default=os.getenv("DAILY_SAMPLE_ROOM_TOKEN", None)
    )
    parser.add_argument(
        "-p", "--prompt", type=str, default="You are a pirate captain"
    )
    parser.add_argument("-l", "--language", type=str, default="en-US")
    return parser.parse_args()


async def main():
    args = extract_arguments()
    print(f"room_url: {args.room_url}")

    daily_transport = DailyTransport(
        args.room_url,
        args.token,
        "Chatbot",
        DailyParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
        ),
    )

    llm = GeminiLiveLLMService(
        api_key=os.getenv("GOOGLE_API_KEY"),
        settings=GeminiLiveLLMService.Settings(
            system_instruction=SYSTEM_INSTRUCTION,
            voice="Puck",
            language=args.language,  # Pass preferred language to LLM
        ),
    )

    messages = [{"role": "system", "content": args.prompt}]
    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(
            vad_analyzer=SileroVADAnalyzer(),
        ),
    )

    pipeline = Pipeline(
        [
            daily_transport.input(),
            user_aggregator,
            llm,
            daily_transport.output(),
            assistant_aggregator,
        ]
    )
    task = PipelineTask(pipeline)

    # `rtvi` refers to the RTVIProcessor instance (its setup is omitted in this snippet)
    @rtvi.event_handler("on_client_ready")
    async def on_client_ready(rtvi):
        logger.debug("Client ready")
        # Kick off the conversation
        await task.queue_frames([LLMRunFrame()])

    @daily_transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
        logger.debug("Client disconnected")
        await task.cancel()

    runner = PipelineRunner(handle_sigint=False)
    await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())
```
## Sending Custom Messages to the Server
Once connected, you can send custom messages to the server using the `sendClientMessage` method. This is useful for triggering specific actions or sending data that the server needs to process.
```javascript client theme={null}
try {
  pcClient.sendClientMessage('set-language', { language: 'en-US' });
} catch (error) {
  console.error("Error sending message to server:", error);
}
```
```python bot theme={null}
task = PipelineTask(
    pipeline,
    params,
    observers=[RTVIObserver(rtvi)],
)


@rtvi.event_handler("on_client_message")
async def on_client_message(rtvi, msg):
    print("RTVI client message:", msg.type, msg.data)
    if msg.type == "set-language":
        language = msg.data.get("language", "en-US")
        await task.queue_frames([STTUpdateSettingsFrame(language=language)])


### Alternatively, if your message requires asynchronous processing or storing of
### state, you may want to handle it from inside a FrameProcessor, listen for a
### RTVIClientMessageFrame and push a RTVIServerResponseFrame
class CustomFrameProcessor(FrameProcessor):
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, RTVIClientMessageFrame):
            print("RTVI client message:", frame.msg_id, frame.type, frame.data)
            if frame.type == "set-language":
                language = frame.data.get("language", "en-US")
                await self.push_frame(STTUpdateSettingsFrame(language=language))
            return
        await self.push_frame(frame, direction)
```
## Requesting Information from the Server
You can also request information from the server using the `sendClientRequest` method. This is useful for querying the server for specific data or triggering an action and getting a success/failure response.
```javascript client theme={null}
try {
  const response = await pcClient.sendClientRequest('get-language');
  console.log("Current language:", response.language);
} catch (error) {
  console.error("Error requesting data from server:", error);
}
```
```python bot theme={null}
@rtvi.event_handler("on_client_message")
async def on_client_message(rtvi, msg):
print("RTVI client message:", msg.type, msg.data)
if msg.type == "get-language":
await rtvi.send_server_response(msg, {"language": get_current_language()})
else:
await rtvi.send_error_response(msg, "Unknown request type")
### Alternatively, if your message requires asynchronous processing or storing of
### state, you may want to handle it from inside a FrameProcessor, listen for a
### RTVIClientMessageFrame and push a RTVIServerResponseFrame
class CustomFrameProcessor(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, RTVIClientMessageFrame):
print("RTVI client message:", frame.msg_id, frame.type, frame.data)
if frame.type == "get-language":
data = {"language": get_current_language()}
await self.push_frame(
RTVIServerResponseFrame(
client_msg=frame,
data=data,
),
)
return
else:
await self.push_frame(
RTVIServerResponseFrame(
client_msg=frame,
error="Unknown request type"
)
)
await self.push_frame(frame, direction)
```
## Handling Custom Messages from the Server
You can handle custom messages sent from the server using the `onServerMessage` callback. This allows you to process messages that the server sends back to the client, such as notifications or updates. For full details on sending server messages from your bot, see the [RTVIProcessor custom messaging docs](/api-reference/server/rtvi/rtvi-processor#custom-messaging).
```javascript client theme={null}
pcClient.onServerMessage((message) => {
  console.log("Received message from server:", message);
  if (message.data.msg === 'language-updated') {
    console.log("Language updated to:", message.data.language);
  }
});
```
```python bot theme={null}
## From inside an Observer, call `send_server_message` directly on your rtvi instance
class CustomObserver(BaseObserver):
    async def on_push_frame(self, data: FramePushed):
        frame = data.frame
        if isinstance(frame, STTUpdateSettingsFrame):
            for key, value in frame.settings.items():
                if key == "language":
                    await rtvi.send_server_message({
                        "msg": "language-updated",
                        "language": value
                    })


### Alternatively, from inside a FrameProcessor, push a RTVIServerMessageFrame
class CustomFrameProcessor(FrameProcessor):
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, STTUpdateSettingsFrame):
            for key, value in frame.settings.items():
                if key == "language":
                    await self.push_frame(
                        RTVIServerMessageFrame(
                            data={
                                "msg": "language-updated",
                                "language": value
                            }
                        )
                    )
        await self.push_frame(frame, direction)
```
# SDK Introduction
Source: https://docs.pipecat.ai/client/js/introduction
Build web applications with Pipecat’s JavaScript client library
The Pipecat JavaScript SDK provides a lightweight client implementation that handles:
* Device and media stream management
* Connecting to Pipecat bots
* Messaging with Pipecat bots and handling responses using the RTVI standard
* Managing session state and errors
## Installation
Install the SDK and a transport implementation (e.g. Daily for WebRTC):
```bash theme={null}
npm install @pipecat-ai/client-js
npm install @pipecat-ai/[daily-transport, small-webrtc-transport, etc.]
```
## Example
Here's a simple example using Daily as the transport layer:
```javascript theme={null}
import { PipecatClient } from "@pipecat-ai/client-js";
import { DailyTransport } from "@pipecat-ai/daily-transport";
// Handle incoming audio from the bot
function handleBotAudio(track, participant) {
  if (participant.local || track.kind !== "audio") return;

  const audioElement = document.createElement("audio");
  audioElement.srcObject = new MediaStream([track]);
  document.body.appendChild(audioElement);
  audioElement.play();
}

// Create and configure the client
const pcClient = new PipecatClient({
  transport: new DailyTransport(),
  enableMic: true,
  callbacks: {
    onTrackStarted: handleBotAudio,
  },
});

// Connect to your bot
pcClient.connect({
  url: "https://your-daily-room-url",
  token: "your-daily-token",
});
```
## Explore the SDK
Configure your client instance with transport and callbacks
Core methods for interacting with your bot
Handle bot events, messages, and state changes
Daily, SmallWebRTC, WebSocket, and other transports
The Pipecat JavaScript SDK implements the [RTVI standard](/client/rtvi-standard) for real-time AI inference, ensuring compatibility with any RTVI-compatible server and transport layer.
# Daily WebRTC Transport
Source: https://docs.pipecat.ai/client/js/transports/daily
The DailyTransport class provides a WebRTC transport layer using [Daily.co's](https://daily.co) infrastructure. It wraps a Daily-JS call client to handle audio/video device management, WebRTC connections, and real-time communication between clients and bots. For complete documentation on Daily's API, see the [Daily API Reference](https://docs.daily.co/reference/daily-js).
This transport is designed for production use cases, leveraging Daily's global infrastructure for low-latency, high-quality audio and video streaming. It expects your Pipecat server to include the corresponding [`DailyTransport` server-side](/api-reference/server/services/transport/daily) implementation.
## Usage
### Basic Setup
```javascript theme={null}
import { PipecatClient } from "@pipecat-ai/client-js";
import { DailyTransport } from "@pipecat-ai/daily-transport";
const pcClient = new PipecatClient({
  transport: new DailyTransport({
    // DailyTransport constructor options
    bufferLocalAudioUntilBotReady: true, // Optional, defaults to false
    inputSettings: { video: { processor: { type: "background-blur" } } },
  }),
  enableCam: false, // Default camera off
  enableMic: true, // Default microphone on
  callbacks: {
    // Event handlers
  },
  // ...
});

await pcClient.connect({
  url: "https://your-domain.daily.co/room",
  token: "your-daily-token", // Optional, if your room requires authentication
});
```
## API Reference
### Constructor Options
```typescript theme={null}
interface DailyTransportConstructorOptions extends DailyFactoryOptions {
  bufferLocalAudioUntilBotReady?: boolean;
}
```
If set to `true`, the transport will buffer local audio until the bot is ready. This is useful for ensuring that the bot receives any audio from the user that started before the bot was ready to process it.
The `DailyTransportConstructorOptions` extends the `DailyFactoryOptions` type that is accepted by the underlying Daily instance. These options are passed directly through to the Daily constructor. See the [Daily API Reference](https://docs.daily.co/reference/daily-js/daily-call-client/properties) for a complete list of options.
While you can provide the room url and optional token as part of your
constructor options, the typical pattern is to provide them via a connection
endpoint with `startBot()` or directly as part of `connect()`. See below.
### TransportConnectionParams
On `connect()`, the `DailyTransport` optionally takes a set of [`DailyCallOptions`](https://docs.daily.co/reference/daily-js/daily-call-client/methods#dailycalloptions) to connect to a Daily room. This can be provided directly to the `PipecatClient`'s `connect()` method or via a starting endpoint passed to the `PipecatClient`'s `startBotAndConnect()` method. If using an endpoint, your endpoint should return a JSON object matching the `DailyCallOptions` type. See the [client connect()](/client/js/api-reference/client-methods#connect) documentation for more information.
```typescript client theme={null}
pcClient.connect({
  url: 'https://your.daily.co/room'
});

// OR...
pcClient.startBotAndConnect({
  endpoint: '/api/start', // Your server endpoint to start the bot
});
```
```python server theme={null}
@app.post("/api/start")
async def start(request: Request) -> Dict[Any, Any]:
print("Creating room and token for RTVI connection")
room_url, token = await create_room_and_token()
# Start the bot process
print("Starting bot subprocess")
try:
subprocess.Popen(
[f"python3 -m bot.py -u {room_url} -t {token}"],
shell=True,
bufsize=1,
cwd=os.path.dirname(os.path.abspath(__file__)),
)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Failed to start subprocess: {e}")
# Return the Daily call options in format expected by DailyTransport/Daily Call Object
return {"url": room_url, "token": token}
```
### Methods
For most operations, you will not interact with the transport directly. Most methods have an equivalent in the `PipecatClient` and should be called from the `PipecatClient`. However, there are a few transport-specific methods that you may need to call directly. When doing so, be sure to access your transport via the `transport` property of the `PipecatClient` instance.
* `preAuth()`
This is the one method meant to be called directly. It allows you to gather information about the Daily room prior to connecting. As a Daily-specific action, it is not exposed through the `PipecatClient`. It must be called prior to `connect()` and must use the same `url` and (optional) `token` that will be used on `connect()`.
```typescript theme={null}
await pcClient.transport.preAuth({
  url: "https://your.daily.co/room",
  token: "your_token",
});
const roomInfo = await pcClient.transport.dailyCallClient.room();
```
## Events
The transport implements the various [`PipecatClient` event handlers](/client/js/api-reference/callbacks). For Daily-specific events, you can attach listeners to the underlying Daily call client. For a list of available events, see the [Daily API Reference](https://docs.daily.co/reference/daily-js/events).
```typescript theme={null}
pcClient.transport.dailyCallClient.on('recording-started', (ev) => {...});
```
## Advanced
### Accessing the Daily Call
For advanced use cases, where you may need to work with the Daily call client directly, you can access it via the `dailyCallClient` property.
```javascript theme={null}
const dailyCall = pcClient.transport.dailyCallClient;
```
The Daily call client returned is safe-guarded to not allow you to call
functions which affect the call's lifecycle and will redirect you to use
either a Transport method or the `PipecatClient` to perform the equivalent
action.
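For example (a minimal sketch), informational calls pass through, while lifecycle operations should go through the `PipecatClient` instead:

```typescript theme={null}
const dailyCall = pcClient.transport.dailyCallClient;

// Informational calls are fine to make directly:
console.log(dailyCall.participants());

// Lifecycle calls like join() or leave() are guarded; use the client instead:
await pcClient.disconnect();
```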
## More Information
Simple Chatbot Demo
`DailyTransport`
`@pipecat-ai/daily-transport`
# GeminiLiveWebSocketTransport
Source: https://docs.pipecat.ai/client/js/transports/gemini
## Overview
The `GeminiLiveWebsocketTransport` class implements a fully functional [Pipecat `Transport`](./transport), providing a framework for implementing real-time communication directly with the [Gemini Live](https://ai.google.dev/api/multimodal-live) service. Like all transports, it handles media device management, audio/video streams, and state management for the connection.
Transports of this type are designed primarily for development and testing
purposes. For production applications, you will need to build a server
component with a server-friendly transport, like the
[DailyTransport](./daily), to securely handle API keys.
## Usage
### Basic Setup
```javascript theme={null}
import {
GeminiLiveWebsocketTransport,
GeminiLLMServiceOptions,
} from "@pipecat-ai/gemini-live-websocket-transport";
import { PipecatClient } from "@pipecat-ai/client-js";
const options: GeminiLLMServiceOptions = {
api_key: "YOUR_API_KEY",
initial_messages: [
// Set up initial system and user messages.
// Without the user message, the bot will not respond immediately
// and wait for the user to speak first.
{
role: "model",
content: "You are a confused jellyfish.",
},
{ role: "user", content: "Blub blub!" },
],
settings: {
temperature: 0.7,
    max_output_tokens: 1000,
},
};
const transport = new GeminiLiveWebsocketTransport(options);

let pcClient = new PipecatClient({
  transport,
callbacks: {
// Event handlers
},
});
await pcClient.connect();
```
## API Reference
### Constructor Options
#### `GeminiLLMServiceOptions`
```typescript theme={null}
interface GeminiLLMServiceOptions {
api_key: string; // Required: Your Gemini API key
initial_messages?: Array<{
// Optional: Initial conversation context
content: string;
role: string;
}>;
settings?: {
// Optional: Generation parameters
candidate_count?: number;
max_output_tokens?: number;
temperature?: number;
top_p?: number;
top_k?: number;
presence_penalty?: number;
frequency_penalty?: number;
response_modalities?: string;
speech_config?: {
voice_config?: {
prebuilt_voice_config?: {
voice_name: "Puck" | "Charon" | "Kore" | "Fenrir" | "Aoede";
};
};
};
};
}
```
### TransportConnectionParams
The `GeminiLiveWebsocketTransport` does not take connection parameters. It connects directly to the Gemini Live service using the API key provided as part of the initial configuration.
### Events
The GeminiLiveWebSocketTransport implements the various [PipecatClient event handlers](/client/js/api-reference/callbacks). Check out the docs or samples for more info.
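For example, here is a minimal sketch of listening for events via the client, using the `RTVIEvent` enum from `@pipecat-ai/client-js` (assumed to be available in your setup):

```typescript theme={null}
import { RTVIEvent } from "@pipecat-ai/client-js";

pcClient.on(RTVIEvent.BotReady, () => console.log("Bot is ready"));
pcClient.on(RTVIEvent.TransportStateChanged, (state) => console.log("Transport state:", state));
```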
## More Information
Gemini Live Basic Demo
`GeminiLiveWebsocketTransport`
`@pipecat-ai/gemini-live-websocket-transport`
# OpenAIRealTimeWebRTCTransport
Source: https://docs.pipecat.ai/client/js/transports/openai-webrtc
## Overview
The `OpenAIRealTimeWebRTCTransport` is a fully functional [Pipecat `Transport`](/client/js/transports/transport). It provides a framework for implementing real-time communication directly with the [OpenAI Realtime API using WebRTC](https://platform.openai.com/docs/guides/realtime-webrtc) voice-to-voice service. It handles media device management, audio/video streams, and state management for the connection.
Transports of this type are designed primarily for development and testing
purposes. For production applications, you will need to build a server
component with a server-friendly transport, like the
[DailyTransport](./daily), to securely handle API keys.
## Usage
### Basic Setup
```javascript theme={null}
import { OpenAIRealTimeWebRTCTransport, OpenAIServiceOptions } from '@pipecat-ai/openai-realtime-webrtc-transport';
import { PipecatClient } from '@pipecat-ai/client-js';
const options: OpenAIServiceOptions = {
api_key: 'YOUR_API_KEY',
settings: {
instructions: 'You are a confused jellyfish.',
},
initial_messages: [{ role: "user", content: "Blub blub!" }],
};
let pcClient = new PipecatClient({
  transport: new OpenAIRealTimeWebRTCTransport(options),
...
});
await pcClient.connect();
```
## API Reference
### Constructor Options
Below is the transport's type definition for the OpenAI session configuration you pass to the constructor. See the [OpenAI Realtime API documentation](https://platform.openai.com/docs/api-reference/realtime-client-events/session/update) for more details on each of the options and their defaults.
```typescript theme={null}
export type OpenAIFunctionTool = {
type: "function";
name: string;
description: string;
parameters: JSONSchema;
};
export type OpenAIServerVad = {
type: "server_vad";
create_response?: boolean;
interrupt_response?: boolean;
prefix_padding_ms?: number;
silence_duration_ms?: number;
threshold?: number;
};
export type OpenAISemanticVAD = {
type: "semantic_vad";
eagerness?: "low" | "medium" | "high" | "auto";
create_response?: boolean; // defaults to true
interrupt_response?: boolean; // defaults to true
};
export type OpenAISessionConfig = Partial<{
modalities?: string;
instructions?: string;
voice?:
| "alloy"
| "ash"
| "ballad"
| "coral"
| "echo"
| "sage"
| "shimmer"
| "verse";
input_audio_noise_reduction?: {
type: "near_field" | "far_field";
} | null; // defaults to null/off
input_audio_transcription?: {
model: "whisper-1" | "gpt-4o-transcribe" | "gpt-4o-mini-transcribe";
language?: string;
prompt?: string[] | string; // gpt-4o models take a string
} | null; // we default this to gpt-4o-transcribe
turn_detection?: OpenAIServerVad | OpenAISemanticVAD | null; // defaults to server_vad
temperature?: number;
max_tokens?: number | "inf";
  tools?: Array<OpenAIFunctionTool>;
}>;
export interface OpenAIServiceOptions {
api_key: string;
model?: string;
initial_messages?: LLMContextMessage[];
settings?: OpenAISessionConfig;
}
```
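As a hedged illustration built from the types above (the `get_weather` tool is a hypothetical example, not something provided by the package), a fuller configuration might look like this:

```typescript theme={null}
const options: OpenAIServiceOptions = {
  api_key: "YOUR_API_KEY",
  settings: {
    instructions: "You are a concise, friendly assistant.",
    voice: "coral",
    input_audio_noise_reduction: { type: "near_field" },
    turn_detection: { type: "semantic_vad", eagerness: "auto" },
    tools: [
      {
        type: "function",
        name: "get_weather", // hypothetical tool, for illustration only
        description: "Look up the current weather for a city",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      },
    ],
  },
};

const transport = new OpenAIRealTimeWebRTCTransport(options);
```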
### TransportConnectionParams
The `OpenAIRealTimeWebRTCTransport` does not take connection parameters. It connects directly to the OpenAI Realtime API using the API key provided as part of the initial configuration.
### Events
The transport implements the various [`PipecatClient` event handlers](/client/js/api-reference/callbacks). Check out the docs or samples for more info.
## More Information
OpenAI Realtime Basic Demo
`OpenAIRealTimeWebRTCTransport`
`@pipecat-ai/openai-realtime-webrtc-transport`
# SmallWebRTCTransport
Source: https://docs.pipecat.ai/client/js/transports/small-webrtc
A lightweight WebRTC transport for peer-to-peer connections with Pipecat
`SmallWebRTCTransport` enables peer-to-peer WebRTC connections between clients and your Pipecat application. It implements bidirectional audio and video streaming using WebRTC for real-time communication.
This transport is intended for lightweight implementations, particularly for local development and testing. It expects your Pipecat server to include the corresponding [`SmallWebRTCTransport` server-side](/api-reference/server/services/transport/small-webrtc) implementation.
## Usage
### Basic Setup
```javascript theme={null}
import { PipecatClient } from "@pipecat-ai/client-js";
import { SmallWebRTCTransport } from "@pipecat-ai/small-webrtc-transport";
const pcClient = new PipecatClient({
transport: new SmallWebRTCTransport({
// Optional configuration for the transport
iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
}),
enableCam: false, // Default camera off
enableMic: true, // Default microphone on
callbacks: {
// Event handlers
},
});
await pcClient.connect({
webrtcUrl: "/api/offer", // Your WebRTC signaling server endpoint
});
```
## API Reference
### Constructor Options
```typescript theme={null}
interface SmallWebRTCTransportConstructorOptions {
iceServers?: RTCIceServer[];
waitForICEGathering?: boolean;
webrtcUrl?: string;
audioCodec?: string;
videoCodec?: string;
mediaManager?: MediaManager;
}
```
#### Properties
Array of ICE server configurations for connection establishment. Default is `[{ urls: "stun:stun.l.google.com:19302" }]`.
```javascript theme={null}
// Set custom ICE servers
transport.iceServers = [
{ urls: "stun:stun.l.google.com:19302" },
{ urls: "stun:stun1.l.google.com:19302" },
];
```
If `true`, the transport will wait for ICE gathering to complete before being
considered `'connected'`.
URL of the WebRTC signaling server's offer endpoint. This endpoint may also be provided as part of `connect()`.
Note: This field used to be called `connectionUrl` in versions prior to
`1.2.0`.
Preferred audio codec to use. If not specified, your browser default will be
used.
Preferred video codec to use. If not specified, your browser default will be
used.
The media manager to use for handling local audio and video streams. This
should not be overridden unless you have a specific reason to use a different
media manager. The default is `DailyMediaManager`, which is suitable for most
use cases. Note that the `DailyMediaManager` does not use any of Daily's
services; it simply takes advantage of the extensive media support provided by
the Daily library.
### TransportConnectionParams
```typescript theme={null}
export type SmallWebRTCTransportConnectionOptions = {
webrtcUrl?: string;
};
```
On `connect()`, the `SmallWebRTCTransport` optionally takes a set of connection parameters. This can be provided directly to the `PipecatClient`'s `connect()` method or via a starting endpoint passed to the `PipecatClient`'s `startBotAndConnect()` method. If using an endpoint, your endpoint should return a JSON object matching the `SmallWebRTCTransportConnectionOptions` type, which currently expects a single `webrtcUrl` property.
```typescript client theme={null}
pcClient.startBotAndConnect({
endpoint: '/api/start', // Your server endpoint to start the bot and return the webrtcUrl
});
// OR...
pcClient.connect({
webrtcUrl: '/api/offer', // Your WebRTC offer/answer endpoint
});
```
```python server theme={null}
# See
# https://github.com/pipecat-ai/pipecat-examples/blob/main/p2p-webrtc/video-transform/server/server.py
# for a complete example of how to implement the server-side endpoint.
@app.post("/api/offer")
async def offer(request: dict, background_tasks: BackgroundTasks):
pipecat_connection = SmallWebRTCConnection(ice_servers)
await pipecat_connection.initialize(sdp=request["sdp"], type=request["type"])
@pipecat_connection.event_handler("closed")
async def handle_disconnected(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Discarding peer connection for pc_id: {webrtc_connection.pc_id}")
pcs_map.pop(webrtc_connection.pc_id, None)
background_tasks.add_task(run_bot, pipecat_connection)
answer = pipecat_connection.get_answer()
return answer
```
### Methods
For most operations, you will not interact with the transport directly. Most methods have an equivalent in the `PipecatClient` and should be called from the `PipecatClient`. However, there are a few transport-specific methods that you may need to call directly. When doing so, be sure to access your transport via the `transport` property of the `PipecatClient` instance.
Sets the preferred audio codec.
```javascript theme={null}
transport.setAudioCodec("opus");
```
Sets the preferred video codec.
```javascript theme={null}
transport.setVideoCodec("VP8");
```
## Events
The transport implements the various [`PipecatClient` event handlers](/client/js/api-reference/callbacks).
## Connection Process
The connection process follows these steps:
1. The transport negotiates a WebRTC connection with the corresponding Pipecat transport, complete with transceivers for the media and a data channel for messaging.
2. The transport sends a message to the Pipecat transport to let it know it's ready.
3. The Pipecat transport sends a message letting the client know it is ready.
## Reconnection Handling
The transport includes automatic reconnection logic:
* Up to 3 reconnection attempts after connection failures
* Detection of ICE connection state changes
* Graceful recovery from temporary disconnections
* Graceful disconnect when reconnection attempts fail
## More Information
Real-time video transformation example
`@pipecat-ai/small-webrtc-transport`
# Transport Overview
Source: https://docs.pipecat.ai/client/js/transports/transport
Transports are the means by which `PipecatClient`s communicate with their bot services. Transports implement the underlying device management, connectivity, media transmission, and state logic that manage the lifecycle of your session.
All transport packages (such as `DailyTransport`) extend from the `Transport` base class defined in the `client-js` library. You can extend this class if you are looking to implement your own or add additional functionality.
## Transport lifecycle
Each Pipecat client instance is associated with a transport instance. The client re-uses that transport instance across multiple calls to `connect()`, allowing you to connect to different bot services without needing to create a new transport or client each time.
```typescript theme={null}
import { PipecatClient } from "@pipecat-ai/client-js";
import { DailyTransport } from "@pipecat-ai/daily-transport";
const pcClient = new PipecatClient({
transport: new DailyTransport(),
...
});
await pcClient.startBotAndConnect({ endpoint: "/api/start" });
await pcClient.disconnect();
await pcClient.connect(); // re-uses url returned from previous startBotAndConnect call, skipping the endpoint
```
## Transport states
`TransportState`
Your transport instance goes through a series of states during its lifecycle. These states are:
* `Disconnected`: Transport is idle and has not yet been initialized (default state).
* `Initializing`: Transport is being initialized. This occurs in response to a `pcClient.initDevices()` call, where the transport is being set up in order to enumerate local media devices. If you call `connect()` and bypass `initDevices()`, the transport will skip this state and go directly to `Connecting`.
* `Initialized`: Transport has been initialized and is ready to connect. This state is reached after a successful `pcClient.initDevices()` call and skipped if `initDevices()` is not used.
* `Authenticating`: Your client has called `pcClient.startBot()` or `pcClient.startBotAndConnect()` and is waiting for a response from your server containing connection details for your transport (such as a session URL and token). Note: If you provide the `TransportConnectionParams` directly to `connect()` without calling either `startBot` method, the transport will skip this state and go directly to `Connecting`.
* `Authenticated`: Your client has called `pcClient.startBot()` or `pcClient.startBotAndConnect()` and has successfully received a response. If using `startBotAndConnect()`, it will quickly move into the `Connecting` state. Note: If you provide the `TransportConnectionParams` directly to `connect()` without calling either `startBot` method, the transport will skip this state and go directly to `Connecting`.
* `Connecting`: The transport is connecting to the server.
* `Connected`: The transport has successfully connected to the session and is awaiting a client-ready signal (indicating audio and video tracks are ready to be sent and received).
* `Ready`: Transport is ready and the session can begin.
* `Disconnecting`: Transport is disconnecting from the session.
* `Error`: An error occurred during the transport lifecycle. This indicates a fatal error and the transport should move quickly into the `Disconnected` state.
You can access the current transport state via `pcClient.state`, or by defining a callback or event:
```typescript theme={null}
// Callback
const pcClient = new PipecatClient({
  transport: new DailyTransport(),
  callbacks: {
    onTransportStateChange: (state) => {
      console.log(state);
    },
    //...
  },
});
// Event
pcClient.on(RTVIEvent.TransportStateChanged, (e) => console.log(e));
// Client getter
console.log(pcClient.state); // Disconnected
```
# WebSocketTransport
Source: https://docs.pipecat.ai/client/js/transports/websocket
A lightweight transport for WebSocket based connections with Pipecat
`WebSocketTransport` enables a purely WebSocket based connection between clients and your Pipecat application. It implements bidirectional audio and video streaming using a WebSocket for real-time communication.
This transport is intended for lightweight implementations, particularly for local development and testing. It expects your Pipecat server to include the corresponding [`WebSocketTransport` server-side](/api-reference/server/services/transport/websocket-server) implementation.
The `WebSocketTransport` is best suited for server-server applications and prototyping client/server apps.
For client/server production applications, we strongly recommend using a WebRTC-based transport for robust network and media handling. For more on WebRTC vs. WebSocket communication, check out [this article](https://voiceaiandvoiceagents.com/#websockets-webrtc).
## Usage
### Basic Setup
```javascript theme={null}
import { PipecatClient } from "@pipecat-ai/client-js";
import {
WebSocketTransport,
ProtobufFrameSerializer,
} from "@pipecat-ai/websocket-transport";
const pcClient = new PipecatClient({
transport: new WebSocketTransport({
serializer: new ProtobufFrameSerializer(),
recorderSampleRate: 8000,
playerSampleRate: 8000,
}),
enableCam: false, // Default camera off
enableMic: true, // Default microphone on
callbacks: {
// Event handlers
},
});
await pcClient.connect({
wsUrl: "ws://localhost:7860/ws", // Your WebSocket server URL
});
```
## API Reference
### Constructor Options
```typescript theme={null}
type WebSocketTransportOptions = {
wsUrl?: string;
serializer?: WebSocketSerializer;
recorderSampleRate?: number;
playerSampleRate?: number;
};
export interface WebSocketTransportConstructorOptions extends WebSocketTransportOptions {
mediaManager?: MediaManager;
}
```
#### Properties
URL of the WebSocket server. This is the endpoint your client will connect to
for WebSocket communication.
The serializer to use for encoding/decoding messages sent over the WebSocket
connection. The websocket-transport package provides two serializer options
(see the sketch at the end of this section):

* `ProtobufFrameSerializer`: Uses Protocol Buffers for serialization.
* `TwilioSerializer`: Uses Twilio's serialization format. Its main purpose is to allow testing bots built to work with Twilio without having to make phone calls.
Sample rate for which to encode the audio input. Default is `16000`.
Sample rate for which to decode the incoming audio for output. Default is
`24000`.
The media manager to use for handling local audio and video streams. This
should not be overridden unless you have a specific reason to use a different
media manager. The default is `DailyMediaManager`, which is suitable for most
use cases. Note that the `DailyMediaManager` does not use any of Daily's
services; it simply takes advantage of the extensive media support provided by
the Daily library.
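For example, here is a hedged sketch using the `TwilioSerializer` described above to exercise a Twilio-style bot locally (the 8000 Hz sample rates are an assumption based on Twilio's telephone audio; adjust to match your bot):

```typescript theme={null}
import { WebSocketTransport, TwilioSerializer } from "@pipecat-ai/websocket-transport";

const transport = new WebSocketTransport({
  serializer: new TwilioSerializer(),
  recorderSampleRate: 8000, // telephone-rate audio in
  playerSampleRate: 8000, // telephone-rate audio out
});
```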
### TransportConnectionParams
The `WebSocketTransport` takes the same options as the constructor: `WebSocketTransportOptions`. Anything provided here will override the defaults set in the constructor. The `wsUrl` is required to establish a connection.
```typescript client theme={null}
pcClient.connect({
  wsUrl: 'ws://localhost:7860/ws'
});
// OR...
pcClient.startBotAndConnect({
endpoint: '/api/start', // returns { wsUrl }
});
```
```python server theme={null}
# See
# https://github.com/pipecat-ai/pipecat-examples/blob/main/websocket/server/server.py
# for a complete example of how to implement the server-side endpoint.
@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
await websocket.accept()
print("WebSocket connection accepted")
try:
await run_bot(websocket)
except Exception as e:
print(f"Exception in run_bot: {e}")
@app.post("/api/start")
async def start(request: Request) -> Dict[Any, Any]:
ws_url = "ws://localhost:7860/ws"
return {"wsUrl": ws_url}
```
```python bot theme={null}
# See
# https://github.com/pipecat-ai/pipecat-examples/blob/main/websocket/server/bot_websocket_server.py
# for a complete example of a bot script using the WebSocketTransport.
from pipecat.serializers.protobuf import ProtobufFrameSerializer
from pipecat.transports.websocket.fastapi import (
FastAPIWebsocketParams,
FastAPIWebsocketTransport,
)
async def run_bot(websocket_client):
ws_transport = FastAPIWebsocketTransport(
websocket=websocket_client,
params=FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
add_wav_header=False,
serializer=ProtobufFrameSerializer(),
),
)
llm = ... # Initialize your LLM here, e.g., OpenAI, HuggingFace, etc.
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
ws_transport.input(),
user_aggregator,
llm, # LLM
ws_transport.output(),
assistant_aggregator,
]
)
task = PipelineTask(
pipeline,
params,
)
...
```
### Methods
For most operations, you will not interact with the transport directly. Most methods have an equivalent in the `PipecatClient` and should be called from the `PipecatClient`. However, there is one transport-specific method that you may need to call directly. When doing so, be sure to access your transport via the `transport` property of the `PipecatClient` instance.
If implementing your own serializer, you will need to pass the user audio
stream to the transport via this method, which takes an `ArrayBuffer` of audio
data.
```javascript theme={null}
transport.handleUserAudioStream(chunk.data);
```
## Events
The transport implements the various [`PipecatClient` event handlers](/client/js/api-reference/callbacks).
## Reconnection Handling
The `WebSocketTransport` provides reconnection handling. If the WebSocket connection is lost, it will attempt to reconnect twice. If all reconnection attempts fail, the transport will gracefully disconnect.
## More Information
Basic Agent example using a WebSocket transport
Example using a WebSocket transport to simulate a Twilio connection to a bot
`WebSocketTransport`
`@pipecat-ai/websocket-transport`
# API Reference
Source: https://docs.pipecat.ai/client/react-native/api-reference
API reference for the Pipecat React Native SDK
The Pipecat React Native SDK leverages the Pipecat JavaScript SDK for seamless integration with React Native applications.
For detailed information, please refer to the [JavaScript SDK docs](/client/js/api-reference/client-constructor).
**Just ensure you use the appropriate transport layer for React Native.**
# SDK Introduction
Source: https://docs.pipecat.ai/client/react-native/introduction
Build React Native applications with Pipecat's React Native client library
The Pipecat React Native SDK leverages the [Pipecat JavaScript SDK](/client/js/introduction) and its `PipecatClient` to provide seamless integration for React Native applications.
Since the JavaScript SDK is designed to work across both web and React Native platforms, the core functionalities remain the same:
* Device and media stream management
* Connecting to Pipecat bots
* Messaging with Pipecat bots and handling responses using the RTVI standard
* Managing session state and errors
The primary difference lies in the transport layer, which is tailored to support the unique requirements of the React Native environment.
For example, when using the SDK with React Native, you would install `RNDailyTransport` instead of `DailyTransport`.
## Installation
Install the SDK and a transport implementation. Follow the appropriate docs for each transport:
* [Daily](https://github.com/pipecat-ai/pipecat-client-react-native-transports/tree/main/transports/daily#installation)
* [SmallWebRTC](https://github.com/pipecat-ai/pipecat-client-react-native-transports/tree/main/transports/smallwebrtc#installation)
Installing the React Native transport package automatically includes the
corresponding version of the JavaScript SDK.
## Requirements
This package introduces some constraints on what OS/SDK versions your project can support:
* iOS: Deployment target >= 15
* Android: `minSdkVersion` >= 24
## Quick start
Here's a simple example using Daily as the transport layer:
```tsx theme={null}
import { RNDailyTransport } from "@pipecat-ai/react-native-daily-transport";
import { PipecatClient } from "@pipecat-ai/client-js";
// Create and configure the client
let pipecatClient = new PipecatClient({
transport: new RNDailyTransport(),
enableMic: true,
enableCam: false,
});
// Connect to your bot
await pipecatClient.startBotAndConnect({
endpoint: `${process.env.PIPECAT_API_URL || "/start"}`,
});
```
### More Examples
A basic example demonstrating how to integrate an RNDailyTransport with a
React Native project.
A more comprehensive Daily example showcasing a more feature-rich React
Native application along with a server-side bot component.
A more comprehensive SmallWebRTC example showcasing a more feature-rich
React Native application along with a server-side bot component.
## Explore the SDK
The Pipecat React Native SDK leverages the Pipecat JavaScript SDK for seamless integration with React Native applications. For detailed information, refer to our JavaScript documentation.
> Just ensure you use the appropriate transport layer for React Native.
React Native-specific API documentation
Daily and SmallWebRTC transports for React Native
Configure your client instance with transport and callbacks
Core methods for interacting with your bot
# Daily WebRTC Transport
Source: https://docs.pipecat.ai/client/react-native/transports/daily
The `RNDailyTransport` class provides a WebRTC transport layer using [Daily.co's](https://daily.co) infrastructure. It wraps a `react-native-daily-js` call client to handle audio/video device management, WebRTC connections, and real-time communication between clients and bots. For complete documentation on Daily's API, see the [Daily RN API Reference](https://docs.daily.co/reference/rn-daily-js).
This transport is designed for production use cases, leveraging Daily's global infrastructure for low-latency, high-quality audio and video streaming. It expects your Pipecat server to include the corresponding [`DailyTransport` server-side](/api-reference/server/services/transport/daily) implementation.
## Usage
### Basic Setup
```javascript theme={null}
import { RNDailyTransport } from "@pipecat-ai/react-native-daily-transport";
import { PipecatClient } from "@pipecat-ai/client-js";
// Create and configure the client
let pipecatClient = new PipecatClient({
transport: new RNDailyTransport(),
enableMic: true,
enableCam: false,
});
// Connect to your bot
await pipecatClient.startBotAndConnect({
endpoint: `${process.env.PIPECAT_API_URL || "/start"}`,
});
```
## API Reference
### Constructor Options
The `DailyTransportConstructorOptions` extends the `DailyFactoryOptions` type that is accepted by the underlying Daily instance. These options are passed directly through to the Daily constructor. See the [Daily RN API Reference](https://docs.daily.co/reference/rn-daily-js/daily-call-client/properties) for a complete list of options.
The Pipecat React Native SDK leverages the Pipecat JavaScript SDK for seamless integration with React Native applications.
For detailed information, please refer to the [JavaScript SDK docs](/client/js/transports/daily).
**Just ensure you use the appropriate transport layer for React Native.**
## More Information
Simple Chatbot Demo
`RNDailyTransport`
`@pipecat-ai/react-native-daily-transport`
# SmallWebRTCTransport
Source: https://docs.pipecat.ai/client/react-native/transports/small-webrtc
A lightweight WebRTC transport for peer-to-peer connections with Pipecat
`SmallWebRTCTransport` enables peer-to-peer WebRTC connections between clients and your Pipecat application. It implements bidirectional audio and video streaming using WebRTC for real-time communication.
This transport is intended for lightweight implementations, particularly for local development and testing. It expects your Pipecat server to include the corresponding [`SmallWebRTCTransport` server-side](/api-reference/server/services/transport/small-webrtc) implementation.
## Usage
### Basic Setup
```javascript theme={null}
import { PipecatClient } from "@pipecat-ai/client-js";
import {
RNSmallWebRTCTransport,
SmallWebRTCTransportConstructorOptions,
} from '@pipecat-ai/react-native-small-webrtc-transport';
import { DailyMediaManager } from '@pipecat-ai/react-native-daily-media-manager/src';
const options: SmallWebRTCTransportConstructorOptions = {
mediaManager: new DailyMediaManager()
};
const pcClient = new PipecatClient({
transport: new RNSmallWebRTCTransport(options),
enableCam: false, // Default camera off
enableMic: true, // Default microphone on
callbacks: {
// Event handlers
},
});
const connectParams: APIRequest = {
endpoint: baseUrl + '/start'
};
await pcClient.startBotAndConnect(connectParams);
```
## API Reference
The Pipecat React Native SDK leverages the Pipecat JavaScript SDK for seamless integration with React Native applications.
For detailed information, please refer to the [JavaScript SDK docs](/client/js/transports/small-webrtc).
**Just ensure you use the appropriate transport layer for React Native.**
## More Information
Real-time video transformation example
`@pipecat-ai/react-native-small-webrtc-transport`
# Components
Source: https://docs.pipecat.ai/client/react/components
Ready-to-use React components for Pipecat applications
The Pipecat React SDK provides several components for handling audio, video, and visualization in your application.
## PipecatClientProvider
The root component for providing Pipecat client context to your application. It also includes built-in conversation state management, so any descendant component can use the [`usePipecatConversation`](/client/react/hooks#usepipecatconversation) hook to access messages without adding a separate provider.
```jsx theme={null}
<PipecatClientProvider client={client}>
  {/* Child components can use usePipecatConversation, usePipecatClient, etc. */}
</PipecatClientProvider>
```
**Props**
A singleton instance of `PipecatClient`
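Putting it together, a minimal sketch (assuming `DailyTransport` as the transport and the provider exported from `@pipecat-ai/client-react`):

```jsx theme={null}
import { PipecatClient } from "@pipecat-ai/client-js";
import { DailyTransport } from "@pipecat-ai/daily-transport";
import { PipecatClientProvider } from "@pipecat-ai/client-react";

const client = new PipecatClient({ transport: new DailyTransport() });

function App() {
  return (
    <PipecatClientProvider client={client}>
      {/* Your app: children can use usePipecatClient, usePipecatConversation, etc. */}
    </PipecatClientProvider>
  );
}
```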
## PipecatClientAudio
Creates a new `