Custom Voices per Agent

Overview

In a typical voice setup, the main agent owns the TTS service and all agents share the same voice. But you can move TTS into each agent’s pipeline so that each agent speaks with a distinct voice. This makes handoffs more natural — the user hears a different voice when they’re transferred to a different agent.

Architecture

The key change is where TTS lives.

Default (shared TTS):

Main agent:  transport.in → STT → context_agg → BusBridge → TTS → transport.out
LLM agent:   bridge_in → LLM → bridge_out     (text frames flow through bus)

Per-agent TTS:

Main agent:  transport.in → STT → context_agg → BusBridge → transport.out  (no TTS!)
LLM agent:   bridge_in → LLM → TTS → bridge_out    (audio frames flow through bus)

With per-agent TTS, text-to-speech happens inside each agent’s pipeline. The bus carries audio frames instead of text frames.
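
The difference between the two layouts can be sketched with a toy model. This is plain Python and does not use the framework: the ToyTextFrame and ToyAudioFrame classes and the toy_llm and toy_tts functions are hypothetical stand-ins, chosen only to show which kind of frame crosses the bus in each layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTextFrame:
    text: str


@dataclass(frozen=True)
class ToyAudioFrame:
    voice: str
    text: str  # stands in for the synthesized audio payload


def toy_llm(user_text: str) -> ToyTextFrame:
    """Stand-in for the LLM stage: produces a text reply."""
    return ToyTextFrame(f"reply to {user_text!r}")


def toy_tts(frame: ToyTextFrame, voice: str) -> ToyAudioFrame:
    """Stand-in for the TTS stage: renders text as 'audio' in one voice."""
    return ToyAudioFrame(voice=voice, text=frame.text)


def shared_tts_turn(user_text: str, main_voice: str):
    """Default layout: text crosses the bus, the main agent runs TTS."""
    on_bus = toy_llm(user_text)           # LLM agent: bridge_in -> LLM -> bridge_out
    played = toy_tts(on_bus, main_voice)  # main agent: BusBridge -> TTS -> transport.out
    return on_bus, played


def per_agent_tts_turn(user_text: str, agent_voice: str):
    """Per-agent layout: audio crosses the bus, the main agent only plays it."""
    on_bus = toy_tts(toy_llm(user_text), agent_voice)  # LLM agent: LLM -> TTS -> bridge_out
    played = on_bus                                    # main agent: BusBridge -> transport.out
    return on_bus, played
```

In the shared layout every reply is rendered in the main agent's voice no matter which agent produced it. In the per-agent layout the voice is fixed before the frame reaches the bus.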

Per-agent TTS requires all agents to run in the same process (local bus). Streaming audio frames across a network bus adds too much latency for real-time conversations. For distributed setups, remote agents can achieve distinct voices by joining the same audio session directly through their own transport instead.
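
For a rough sense of how much heavier audio frames are than text frames, here is a back-of-envelope calculation. The sample rate, sample width, and chunk duration are illustrative assumptions, not values taken from the framework, and the result only estimates payload volume. It says nothing about the latency of any particular network bus.

```python
def pcm_bytes_per_second(sample_rate_hz: int, bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Raw PCM payload produced per second of speech."""
    return sample_rate_hz * bytes_per_sample * channels


def pcm_chunk_bytes(sample_rate_hz: int, chunk_ms: int, bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Raw PCM payload carried by one audio chunk of chunk_ms milliseconds."""
    return pcm_bytes_per_second(sample_rate_hz, bytes_per_sample, channels) * chunk_ms // 1000


SAMPLE_RATE_HZ = 24_000  # assumption: 24 kHz, 16-bit, mono TTS output
CHUNK_MS = 20            # assumption: audio is shipped in 20 ms chunks

audio_rate = pcm_bytes_per_second(SAMPLE_RATE_HZ)       # bytes per second of speech
chunk_size = pcm_chunk_bytes(SAMPLE_RATE_HZ, CHUNK_MS)  # bytes per chunk
chunks_per_second = 1000 // CHUNK_MS                    # bus messages per second of speech

# The same sentence as text is a single small message.
sentence = "Let me connect you with our support team."
text_size = len(sentence.encode("utf-8"))

print(f"audio: {audio_rate} B/s in {chunks_per_second} chunks of {chunk_size} B")
print(f"text:  {text_size} B for the whole sentence")
```

Under these assumptions, one second of speech is 48,000 bytes spread over 50 time-sensitive messages, while the text for a whole sentence fits in a single message of a few dozen bytes.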

Implementation

LLM agent with TTS

Pass a custom pipeline= to the LLMWorker constructor that adds a TTS service after the LLM:

import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.workers.llm import LLMWorker


class AgentWithVoice(LLMWorker):
    def __init__(self, name: str, *, llm: OpenAILLMService, voice_id: str):
        tts = CartesiaTTSService(
            api_key=os.environ["CARTESIA_API_KEY"],
            settings=CartesiaTTSService.Settings(voice=voice_id),
        )
        super().__init__(
            name,
            llm=llm,
            pipeline=Pipeline([llm, tts]),
            bridged=(),
        )

The pipeline Pipeline([llm, tts]) runs the LLM and then speaks its output locally. Because bridged=() is set, the resulting audio is shipped to the main agent over the bus and played by its transport. You never pass bus= to the constructor — the worker gets its bus when registered with the runner.

Agents with different voices

Build agents with different voice IDs:

def build_greeter() -> AgentWithVoice:
    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
        settings=OpenAILLMService.Settings(system_instruction="You are a greeter..."),
    )
    return AgentWithVoice(
        "greeter",
        llm=llm,
        voice_id="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",  # Jacqueline
    )


def build_support() -> AgentWithVoice:
    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
        settings=OpenAILLMService.Settings(system_instruction="You are support..."),
    )
    return AgentWithVoice(
        "support",
        llm=llm,
        voice_id="a167e0f3-df7e-4d52-a9c3-f949145efdab",  # Blake
    )
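
With more than a couple of agents, it can help to keep names, prompts, and voice IDs in one table and to check that no two agents share a voice. The following is a minimal sketch in plain Python. AgentSpec, AGENT_SPECS, and check_distinct_voices are hypothetical helpers, not part of the framework; the voice IDs are the two used above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical record describing one agent's name, prompt, and voice."""
    name: str
    system_instruction: str
    voice_id: str


AGENT_SPECS = [
    AgentSpec("greeter", "You are a greeter...", "9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"),  # Jacqueline
    AgentSpec("support", "You are support...", "a167e0f3-df7e-4d52-a9c3-f949145efdab"),    # Blake
]


def check_distinct_voices(specs) -> None:
    """Raise ValueError if two agents would speak with the same voice."""
    owner_by_voice = {}
    for spec in specs:
        other = owner_by_voice.get(spec.voice_id)
        if other is not None:
            raise ValueError(
                f"agents {other!r} and {spec.name!r} share voice {spec.voice_id}"
            )
        owner_by_voice[spec.voice_id] = spec.name


check_distinct_voices(AGENT_SPECS)  # passes silently: the two voices differ
```

Each spec can then feed a builder shaped like build_greeter and build_support, so adding an agent becomes a one-line change to the table.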

Main agent without TTS

Build the main transport agent’s pipeline without TTS, since each LLM agent handles its own. Audio comes back from the children over the bus and is played by the main agent’s transport:

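# Import paths for the STT, VAD, and context classes below follow recent
# Pipecat releases; adjust them if your installed version differs.
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.bus import BusBridgeProcessor
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.worker import PipelineParams, PipelineWorker
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.services.deepgram.stt import DeepgramSTTService

# `transport` and `runner` are assumed to be created earlier in your bot setup.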

MAIN_NAME = "acme"

stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])

context = LLMContext()
aggregators = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)

bridge = BusBridgeProcessor(
    bus=runner.bus,
    worker_name=MAIN_NAME,
    name=f"{MAIN_NAME}::BusBridge",
)

pipeline = Pipeline(
    [
        transport.input(),
        stt,
        aggregators.user(),
        bridge,
        # No TTS here -- agents handle their own
        transport.output(),
        aggregators.assistant(),
    ]
)

main = PipelineWorker(
    pipeline,
    name=MAIN_NAME,
    params=PipelineParams(enable_metrics=True, enable_usage_metrics=True),
)

await runner.add_workers(build_greeter(), build_support(), main)
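
The snippets above read three API keys with os.environ[...], which raises a bare KeyError if one is missing. A small pre-flight check run before building any agent gives a clearer failure. This is a minimal sketch: missing_env_vars and require_env_vars are hypothetical helpers, and the variable names are the ones used in this guide.

```python
import os

# The environment variables read by the snippets in this guide.
REQUIRED_ENV_VARS = ("OPENAI_API_KEY", "CARTESIA_API_KEY", "DEEPGRAM_API_KEY")


def missing_env_vars(names=REQUIRED_ENV_VARS, environ=None) -> list[str]:
    """Return the names that are unset or empty, in the order given."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


def require_env_vars(names=REQUIRED_ENV_VARS, environ=None) -> None:
    """Raise RuntimeError listing every missing variable at once."""
    missing = missing_env_vars(names, environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling require_env_vars() at the top of the bot's entry point reports all missing keys together instead of failing on the first service that happens to be constructed.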

Announcing transfers

With per-agent TTS, agents can announce a transfer in their own voice before handing off. Pass messages= to activate_worker to speak before the transfer, and pass args= with the messages the next agent should receive:

from pipecat.services.llm_service import FunctionCallParams
from pipecat.workers.llm import LLMWorkerActivationArgs, tool


@tool(cancel_on_interruption=False)
async def transfer_to_agent(self, params: FunctionCallParams, agent: str, reason: str):
    """Transfer the user to another agent.

    Args:
        agent (str): The agent to transfer to.
        reason (str): Why the user is being transferred.
    """
    await self.activate_worker(
        agent,
        messages=[
            {"role": "developer", "content": f"Tell the user about the transfer ({reason})."}
        ],
        args=LLMWorkerActivationArgs(messages=[{"role": "developer", "content": reason}]),
        deactivate_self=True,
        result_callback=params.result_callback,
    )

The user hears “Let me connect you with our support team” in the greeter’s voice, then the support agent starts speaking in a different voice.
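
The ordering of a handoff can be sketched with a toy model. This is plain Python and only mirrors the sequence described here; it is not how activate_worker is implemented. ToyAgent and toy_transfer are hypothetical names, and the voice labels come from the comments in the builder functions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an agent that owns its own voice."""
    name: str
    voice: str
    active: bool = False
    inbox: list = field(default_factory=list)


def toy_transfer(current: ToyAgent, target: ToyAgent, reason: str, spoken: list) -> None:
    """Record who speaks, and in which voice, during a handoff."""
    # 1. The current agent announces the transfer in its own voice (messages=).
    spoken.append((current.voice, f"Transferring you: {reason}"))
    # 2. The current agent steps aside (deactivate_self=True).
    current.active = False
    # 3. The target is activated and receives the reason as context (args=).
    target.inbox.append({"role": "developer", "content": reason})
    target.active = True
    # 4. The target's first reply comes out in the target's voice.
    spoken.append((target.voice, f"Hello from {target.name}"))


greeter = ToyAgent("greeter", "Jacqueline", active=True)
support = ToyAgent("support", "Blake")
spoken = []
toy_transfer(greeter, support, "billing question", spoken)

for voice, line in spoken:
    print(f"[{voice}] {line}")
```

The point of the model is that the announcement is attributed to the outgoing agent's voice, which is only possible because each agent carries its own TTS.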
