Overview
In a typical voice setup, the main agent owns the TTS service, so all agents share the same voice. You can instead move TTS into each agent’s pipeline so that each agent speaks with a distinct voice. This makes handoffs more natural: the user hears a different voice when they’re transferred to a different agent.
Architecture
The key change is in where TTS lives:
Default (shared TTS):
Main agent: transport.in → STT → context_agg → BusBridge → TTS → transport.out
LLM agent: bridge_in → LLM → bridge_out (text frames flow through bus)
Per-agent TTS:
Main agent: transport.in → STT → context_agg → BusBridge → transport.out (no TTS!)
LLM agent: bridge_in → LLM → TTS → bridge_out (audio frames flow through bus)
With per-agent TTS, text-to-speech happens inside each agent’s pipeline. The bus carries audio frames instead of text frames.
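The difference between the two layouts can be sketched with a toy model. This is only an illustration of where synthesis happens and what crosses the bus; `ToyTextFrame`, `ToyAudioFrame`, and `fake_tts` are hypothetical stand-ins and not part of the Pipecat API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTextFrame:
    """Hypothetical stand-in for a text frame."""
    text: str


@dataclass(frozen=True)
class ToyAudioFrame:
    """Hypothetical stand-in for an audio frame; `text` represents the audio payload."""
    voice: str
    text: str


def fake_tts(frame: ToyTextFrame, voice: str) -> ToyAudioFrame:
    """Pretend TTS: tags the text with the voice that spoke it."""
    return ToyAudioFrame(voice=voice, text=frame.text)


def shared_tts(reply: str, main_voice: str):
    """Default layout: text crosses the bus, the main agent synthesizes it."""
    on_bus = ToyTextFrame(reply)            # the LLM agent sends text
    played = fake_tts(on_bus, main_voice)   # TTS runs in the main pipeline
    return on_bus, played


def per_agent_tts(reply: str, agent_voice: str):
    """Per-agent layout: the agent synthesizes first, audio crosses the bus."""
    on_bus = fake_tts(ToyTextFrame(reply), agent_voice)  # TTS runs in the agent
    played = on_bus                                      # main agent only plays it
    return on_bus, played
```

In the shared layout every reply is played in the main agent's voice regardless of which agent produced it; in the per-agent layout the voice is fixed before the frame ever reaches the bus.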
Per-agent TTS requires all agents to run in the same process (local bus). Streaming audio frames across a network bus adds too much latency for real-time conversations. For distributed setups, remote agents can achieve distinct voices by joining the same audio session directly through their own transport instead.
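To get a rough feel for how much traffic an audio stream puts on the bus, the arithmetic below computes chunk size and chunk rate for raw PCM. The 24 kHz sample rate, 16-bit mono format, and 20 ms chunk length are assumptions chosen for illustration; they are not values taken from this page, and the actual numbers depend on the TTS service and its configuration.

```python
def pcm_stream_stats(sample_rate_hz: int, chunk_ms: int,
                     bytes_per_sample: int = 2, channels: int = 1) -> dict:
    """Return per-chunk and per-second figures for an uncompressed PCM stream."""
    samples_per_chunk = sample_rate_hz * chunk_ms // 1000
    bytes_per_chunk = samples_per_chunk * bytes_per_sample * channels
    chunks_per_second = 1000 // chunk_ms
    return {
        "samples_per_chunk": samples_per_chunk,
        "bytes_per_chunk": bytes_per_chunk,
        "chunks_per_second": chunks_per_second,
        "bytes_per_second": bytes_per_chunk * chunks_per_second,
    }


# Assumed format: 24 kHz, 16-bit mono, 20 ms chunks.
stats = pcm_stream_stats(sample_rate_hz=24_000, chunk_ms=20)
```

Under these assumptions the stream is 50 chunks per second, each of which would need its own trip across a network bus and must arrive on time for playback to stay smooth. A text reply, by contrast, is a handful of small frames.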
Implementation
LLM agent with TTS
Pass a custom pipeline= to the LLMWorker constructor that adds a TTS service after the LLM:
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.workers.llm import LLMWorker


class AgentWithVoice(LLMWorker):
    def __init__(self, name: str, *, llm: OpenAILLMService, voice_id: str):
        tts = CartesiaTTSService(
            api_key=os.environ["CARTESIA_API_KEY"],
            settings=CartesiaTTSService.Settings(voice=voice_id),
        )
        super().__init__(
            name,
            llm=llm,
            pipeline=Pipeline([llm, tts]),
            bridged=(),
        )
The pipeline Pipeline([llm, tts]) runs the LLM and then synthesizes its output to speech inside the agent. Because bridged=() is set, the resulting audio is sent to the main agent over the bus and played by its transport. You never pass bus= to the constructor; the worker gets its bus when it is registered with the runner.
Agents with different voices
Build agents with different voice IDs:
def build_greeter() -> AgentWithVoice:
    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
        settings=OpenAILLMService.Settings(system_instruction="You are a greeter..."),
    )
    return AgentWithVoice(
        "greeter",
        llm=llm,
        voice_id="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",  # Jacqueline
    )


def build_support() -> AgentWithVoice:
    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
        settings=OpenAILLMService.Settings(system_instruction="You are support..."),
    )
    return AgentWithVoice(
        "support",
        llm=llm,
        voice_id="a167e0f3-df7e-4d52-a9c3-f949145efdab",  # Blake
    )
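Since the point of this setup is that each agent sounds different, it can help to keep the voice IDs in one place and check that no two agents share one. The sketch below is a plain-Python helper, not part of Pipecat; `VOICES` and `find_shared_voices` are hypothetical names, and the IDs are the two used in the builders above.

```python
# Agent name -> Cartesia voice ID (IDs copied from the builders above).
VOICES = {
    "greeter": "9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",  # Jacqueline
    "support": "a167e0f3-df7e-4d52-a9c3-f949145efdab",  # Blake
}


def find_shared_voices(voices: dict) -> dict:
    """Return each voice ID used by more than one agent, mapped to those agents."""
    agents_by_voice = {}
    for agent, voice_id in voices.items():
        agents_by_voice.setdefault(voice_id, []).append(agent)
    return {
        voice_id: agents
        for voice_id, agents in agents_by_voice.items()
        if len(agents) > 1
    }
```

An empty result means every agent has its own voice. The builders could then read their `voice_id` from `VOICES` instead of hard-coding it, for example `voice_id=VOICES["greeter"]`.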
Main agent without TTS
Build the main transport agent’s pipeline without TTS, since each LLM agent handles its own. Audio comes back from the children over the bus and is played by the main agent’s transport:
from pipecat.bus import BusBridgeProcessor
from pipecat.pipeline.worker import PipelineParams, PipelineWorker

# Assumes `runner` and `transport` already exist, and that the STT, VAD,
# and context aggregator classes used below are imported.
MAIN_NAME = "acme"

stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])

context = LLMContext()
aggregators = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)

bridge = BusBridgeProcessor(
    bus=runner.bus,
    worker_name=MAIN_NAME,
    name=f"{MAIN_NAME}::BusBridge",
)

pipeline = Pipeline(
    [
        transport.input(),
        stt,
        aggregators.user(),
        bridge,
        # No TTS here -- agents handle their own
        transport.output(),
        aggregators.assistant(),
    ]
)

main = PipelineWorker(
    pipeline,
    name=MAIN_NAME,
    params=PipelineParams(enable_metrics=True, enable_usage_metrics=True),
)

await runner.add_workers(build_greeter(), build_support(), main)
Announcing transfers
With per-agent TTS, agents can announce a transfer in their own voice before handing off. Pass messages= to activate_worker to speak before the transfer, and pass args= with the messages the next agent should receive:
from pipecat.services.llm_service import FunctionCallParams
from pipecat.workers.llm import LLMWorkerActivationArgs, tool


@tool(cancel_on_interruption=False)
async def transfer_to_agent(self, params: FunctionCallParams, agent: str, reason: str):
    """Transfer the user to another agent.

    Args:
        agent (str): The agent to transfer to.
        reason (str): Why the user is being transferred.
    """
    await self.activate_worker(
        agent,
        messages=[
            {"role": "developer", "content": f"Tell the user about the transfer ({reason})."}
        ],
        args=LLMWorkerActivationArgs(messages=[{"role": "developer", "content": reason}]),
        deactivate_self=True,
        result_callback=params.result_callback,
    )
The user hears something like “Let me connect you with our support team” in the greeter’s voice, and then the support agent starts speaking in a different voice.
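The two message lists in the transfer tool serve different audiences: one prompts the current agent to announce the handoff, the other gives the next agent its context. A minimal sketch that builds both from a single reason is shown below; `build_transfer_messages` is a hypothetical helper, not a Pipecat function, and it only mirrors the message shapes used in the tool above.

```python
def build_transfer_messages(reason: str):
    """Return (announce, handoff) message lists for a transfer.

    announce: prompts the current agent to tell the user about the transfer.
    handoff:  context passed along to the agent being activated.
    """
    announce = [
        {"role": "developer", "content": f"Tell the user about the transfer ({reason})."}
    ]
    handoff = [{"role": "developer", "content": reason}]
    return announce, handoff


announce, handoff = build_transfer_messages("billing question")
```

Inside the tool, `announce` would be passed as messages= and `handoff` wrapped as LLMWorkerActivationArgs(messages=handoff) for args=, matching the call shown earlier.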