Pipeline Lifecycle
StartFrame
The first frame pushed into a pipeline, initializing all processors. Every processor receives this before any DataFrames or ControlFrames arrive.
- Input audio sample rate in Hz.
- Output audio sample rate in Hz.
- Whether user interruptions are allowed. Deprecated since 0.0.99: use interruption strategies instead.
- Enable performance metrics collection from processors.
- Enable tracing for pipeline execution.
- Enable usage metrics (token counts, API calls) from services.
- List of interruption strategies for the pipeline. Deprecated since 0.0.99.
- When True, only report time-to-first-byte for the initial response rather than every response.
- Optional tracing context for distributed tracing integration.
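Taken together, these fields amount to a simple configuration object handed to every processor at startup. The sketch below is illustrative only: the attribute names and defaults are assumptions chosen to mirror the descriptions above, not the frame's actual definition.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

@dataclass
class StartFrame:
    # Attribute names are hypothetical; check the real frame class.
    audio_in_sample_rate: int = 16000    # input audio sample rate in Hz
    audio_out_sample_rate: int = 24000   # output audio sample rate in Hz
    allow_interruptions: bool = False    # deprecated since 0.0.99
    enable_metrics: bool = False         # performance metrics collection
    enable_tracing: bool = False         # pipeline execution tracing
    enable_usage_metrics: bool = False   # token counts, API calls
    interruption_strategies: List[Any] = field(default_factory=list)  # deprecated
    report_only_initial_ttfb: bool = False
    tracing_context: Optional[Any] = None

frame = StartFrame(enable_metrics=True, enable_usage_metrics=True)
```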
CancelFrame
Stops the pipeline immediately, skipping any queued non-SystemFrames. Use this when you need to abort without waiting for pending work to drain, for example when the user has left the session.
- Optional reason for the cancellation.
Errors
ErrorFrame
Carries an error notification, typically pushed upstream so earlier processors can react.
- Human-readable error message.
- Whether this error is fatal and requires the bot to shut down.
- The processor that raised the error.
- The underlying exception, if one was caught.
FatalErrorFrame
An unrecoverable error requiring the bot to shut down. The fatal field is always True.
Inherits from ErrorFrame.
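A minimal sketch of the relationship between the two error frames, assuming dataclass-style fields (the real classes may differ in detail); the point is that the subclass pins fatal to True:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ErrorFrame:
    error: str                                 # human-readable message
    fatal: bool = False                        # does the bot need to shut down?
    processor: Optional[object] = None         # processor that raised the error
    exception: Optional[BaseException] = None  # underlying exception, if caught

@dataclass
class FatalErrorFrame(ErrorFrame):
    def __post_init__(self):
        # Regardless of what the caller passes, a fatal error is fatal.
        self.fatal = True

recoverable = ErrorFrame(error="TTS request timed out")
fatal = FatalErrorFrame(error="transport closed unexpectedly")
```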
Processor Pause/Resume (Urgent)
These are the SystemFrame variants of FrameProcessorPauseFrame and FrameProcessorResumeFrame. As SystemFrames, they flow through the high-priority input queue rather than the process queue, so they are not blocked by paused state or buffered frames. This makes FrameProcessorResumeUrgentFrame the correct way to resume a processor externally — the ControlFrame variant (FrameProcessorResumeFrame) would get stuck behind any DataFrames that queued up during the pause. See Control Frames for the full explanation.
FrameProcessorPauseUrgentFrame
Pauses a processor immediately, without waiting for queued frames to drain first.
- The processor to pause.
FrameProcessorResumeUrgentFrame
Resumes a paused processor immediately, releasing buffered frames. Use this instead of FrameProcessorResumeFrame when the processor may have frames queued up.
- The processor to resume.
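The queueing behavior that motivates the urgent variants can be modeled with two queues. This is a toy illustration of the mechanism described above, not the framework's actual implementation:

```python
from collections import deque

class ToyProcessor:
    """Toy model: a high-priority system queue plus a normal process queue,
    mirroring the SystemFrame vs. ControlFrame paths described above."""
    def __init__(self):
        self.system_queue = deque()    # SystemFrames: serviced even when paused
        self.process_queue = deque()   # Data/ControlFrames: blocked while paused
        self.paused = True
        self.handled = []

    def receive(self, frame, urgent=False):
        (self.system_queue if urgent else self.process_queue).append(frame)

    def step(self):
        # System frames are handled even while paused.
        while self.system_queue:
            frame = self.system_queue.popleft()
            if frame == "resume":
                self.paused = False
            self.handled.append(frame)
        # The process queue only drains once the processor is unpaused.
        while not self.paused and self.process_queue:
            self.handled.append(self.process_queue.popleft())

p = ToyProcessor()
p.receive("data-1")    # buffered behind the pause
p.receive("resume")    # ControlFrame variant: stuck behind data-1
p.step()               # nothing drains; the processor is still paused

p.receive("resume", urgent=True)  # SystemFrame variant: jumps the queue
p.step()               # resume is handled first, then the buffer drains
```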
Interruptions
InterruptionFrame
Interrupts the pipeline, discarding pending DataFrames and ControlFrames. Typically triggered when the user starts speaking during a bot response.
User Speaking State
UserStartedSpeakingFrame
Indicates that a user turn has begun. By this point, transcriptions are usually already flowing through the pipeline.
- Whether this event was emulated rather than detected by VAD. Deprecated since 0.0.99.
UserStoppedSpeakingFrame
Marks the end of a user turn. The bot’s response is triggered separately by the turn detection system.
- Whether this event was emulated rather than detected by VAD. Deprecated since 0.0.99.
UserSpeakingFrame
Emitted by the VAD processor while the user is actively speaking. Useful for UI feedback or suppressing idle timeouts.
UserMuteStartedFrame
Broadcast when one or more user mute strategies activate. User mute temporarily suppresses user input while the bot is speaking to prevent interruptions. While muted, the LLMUserAggregator drops incoming user frames (InputAudioRawFrame, TranscriptionFrame, InterimTranscriptionFrame, UserStartedSpeakingFrame, UserStoppedSpeakingFrame, VAD signals, and InterruptionFrame). Lifecycle frames (StartFrame, EndFrame, CancelFrame) are never muted.
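The drop/pass behavior just described (with the matching UserMuteStoppedFrame lifting the mute) can be sketched as a small filter; frame types are plain strings here purely for illustration:

```python
# Frames suppressed while the user is muted (per the list above):
MUTABLE = {
    "InputAudioRawFrame", "TranscriptionFrame", "InterimTranscriptionFrame",
    "UserStartedSpeakingFrame", "UserStoppedSpeakingFrame", "InterruptionFrame",
}
# Lifecycle frames are never muted:
LIFECYCLE = {"StartFrame", "EndFrame", "CancelFrame"}

class ToyMuteFilter:
    def __init__(self):
        self.muted = False
        self.passed = []

    def process(self, frame):
        if frame == "UserMuteStartedFrame":
            self.muted = True
        elif frame == "UserMuteStoppedFrame":
            self.muted = False
        elif frame in LIFECYCLE or not (self.muted and frame in MUTABLE):
            self.passed.append(frame)

f = ToyMuteFilter()
f.process("UserMuteStartedFrame")
f.process("TranscriptionFrame")   # dropped: user is muted
f.process("CancelFrame")          # lifecycle frame: always passes
f.process("UserMuteStoppedFrame")
f.process("TranscriptionFrame")   # passes again after unmute
```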
UserMuteStoppedFrame
Broadcast when all active user mute strategies deactivate, allowing user input to be processed again.
VAD Events
These frames are emitted directly by the Voice Activity Detection (VAD) processor and carry timing metadata. Higher-level speaking-state frames (UserStartedSpeakingFrame, UserStoppedSpeakingFrame) are derived from these.
VADUserStartedSpeakingFrame
VAD confirmed that speech has started.
- Timestamp in seconds when speech onset was detected.
- Wall-clock time when the frame was created.
VADUserStoppedSpeakingFrame
VAD confirmed that speech has ended.
- Timestamp in seconds when speech ended.
- Wall-clock time when the frame was created.
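The timing fields on a start/stop pair make it straightforward to derive speech duration. Field names in this sketch are assumptions based on the descriptions above:

```python
import time
from dataclasses import dataclass

@dataclass
class VADUserStartedSpeakingFrame:
    timestamp: float    # seconds when speech onset was detected
    created_at: float   # wall-clock time the frame was created

@dataclass
class VADUserStoppedSpeakingFrame:
    timestamp: float    # seconds when speech ended
    created_at: float   # wall-clock time the frame was created

start = VADUserStartedSpeakingFrame(timestamp=1.25, created_at=time.time())
stop = VADUserStoppedSpeakingFrame(timestamp=3.75, created_at=time.time())

speech_duration = stop.timestamp - start.timestamp  # 2.5 seconds of speech
```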
SpeechControlParamsFrame
Notifies processors that VAD or turn detection parameters have changed at runtime.
- Updated VAD parameters.
- Updated turn detection parameters.
Bot Speaking State
BotStartedSpeakingFrame
Emitted by the output transport when the bot begins speaking. Broadcast in both directions so processors on either side of the transport can react.
BotStoppedSpeakingFrame
Emitted by the output transport when the bot finishes speaking. Also broadcast in both directions.
BotSpeakingFrame
Emitted continuously while the bot is speaking. Processors can use this to suppress idle timeouts or drive visual indicators.
Connection Status
BotConnectedFrame
The bot has joined the transport room. Only relevant for SFU-based transports: Daily, LiveKit, HeyGen, and Tavus.
ClientConnectedFrame
A client or participant has connected to the transport.
Input Frames
Input frames carry raw data from transport sources into the pipeline. As SystemFrames, they are never discarded during interruptions: incoming user data must always be processed.
InputAudioRawFrame
Raw audio received from the transport. Inherits the audio, sample_rate, num_channels, and num_frames fields from the AudioRawFrame mixin.
Inherits from AudioRawFrame.
UserAudioRawFrame
Audio from a specific user in a multi-participant session. Inherits from InputAudioRawFrame.
- Identifier for the user who produced this audio.
InputImageRawFrame
Raw image received from the transport. Inherits image, size, and format from the ImageRawFrame mixin.
Inherits from ImageRawFrame.
UserImageRawFrame
An image from a specific user, optionally tied to a pending image request. Inherits from InputImageRawFrame.
- Identifier for the user who produced this image.
- Optional text associated with the image.
- Whether to append this image to the LLM context.
- The original request frame that triggered this image capture.
InputTextRawFrame
Text received from the transport, such as a user typing in a chat interface. Inherits the text field from TextFrame.
Inherits from TextFrame.
DTMF Input
InputDTMFFrame
A DTMF keypress received from the transport. Inherits the button field from the DTMFFrame mixin.
Inherits from DTMFFrame.
OutputDTMFUrgentFrame
A DTMF keypress for immediate output, bypassing the normal frame queue. Inherits from DTMFFrame.
Transport Messages
InputTransportMessageFrame
A message received from an external transport. The message format is transport-specific.
- The transport message payload.
OutputTransportMessageUrgentFrame
An outbound transport message that bypasses the normal queue for immediate delivery.
- The transport message payload.
Function Calling
FunctionCallsStartedFrame
Signals that one or more function calls are about to begin executing.
- Sequence of function calls that will be executed.
FunctionCallCancelFrame
Signals that a function call was cancelled, typically due to user interruption when the function’s cancel_on_interruption flag is set.
- Name of the function that was cancelled.
- Unique identifier for the cancelled function call.
User Interaction
UserImageRequestFrame
Requests an image from a specific user, typically to capture a camera frame for vision processing.
- Identifier for the user to capture from.
- Optional text prompt associated with the image request.
- Whether to append the resulting image to the LLM context.
- Specific video source to capture from.
- Function name if this request originated from a tool call.
- Tool call identifier if this request originated from a tool call.
- Callback to invoke with the captured image result.
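The request field on UserImageRawFrame (described under Input Frames) lets downstream processors correlate a captured image with the UserImageRequestFrame that asked for it. A sketch with hypothetical attribute names:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class UserImageRequestFrame:
    user_id: str                    # user to capture from
    context: Optional[str] = None   # optional text prompt for the request
    append_to_context: bool = True  # add the resulting image to the LLM context?

@dataclass
class UserImageRawFrame:
    user_id: str
    image: bytes
    request: Optional[UserImageRequestFrame] = None  # originating request

req = UserImageRequestFrame(user_id="u1", context="What is on screen?")
result = UserImageRawFrame(user_id="u1", image=b"...", request=req)
```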
STTMuteFrame
Mutes or unmutes the STT service. While muted, incoming audio is not sent to the STT provider.
- True to mute, False to unmute.
UserIdleTimeoutUpdateFrame
Updates the user idle timeout at runtime. Set to 0 to disable idle detection entirely.
- New idle timeout in seconds. 0 disables detection.
Diagnostics
MetricsFrame
Performance metrics collected from processors. Emitted when metrics reporting is enabled via StartFrame.
- List of metrics data entries.
Service Metadata
ServiceMetadataFrame
Base metadata frame broadcast by services at startup, providing information about service capabilities and configuration.
- Name of the service that emitted this metadata.
STTMetadataFrame
Metadata from an STT service, including latency characteristics used for turn detection tuning. Inherits from ServiceMetadataFrame.
- P99 latency in seconds for time-to-final-segment. Used by turn detectors to calibrate wait times.
RTVI
Frames for the Real-Time Voice Interface (RTVI) protocol, which bridges clients and the pipeline. These frames handle custom messaging between the client and server.
RTVIServerMessageFrame
Sends a server message to the connected client.
- The message data to send to the client.
RTVIClientMessageFrame
A message received from the client, expecting a server response via RTVIServerResponseFrame.
- Unique identifier for the client message.
- The message type.
- Optional message data from the client.
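The request/response round trip can be sketched as follows. The class shapes and the wire format here are assumptions for illustration; only the idea of correlating the response via the original client message comes from the descriptions in this section:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class RTVIClientMessageFrame:
    msg_id: str                 # unique identifier for the client message
    msg_type: str               # the message type
    data: Optional[Any] = None  # optional payload from the client

@dataclass
class RTVIServerResponseFrame:
    client_msg: RTVIClientMessageFrame  # original message, for correlation
    data: Optional[Any] = None
    error: Optional[str] = None

def to_wire(resp: RTVIServerResponseFrame) -> dict:
    # When error is set the client sees an error-response; otherwise a
    # server-response, correlated by the original message id.
    kind = "error-response" if resp.error else "server-response"
    payload = resp.error if resp.error else resp.data
    return {"type": kind, "id": resp.client_msg.msg_id, "data": payload}

msg = RTVIClientMessageFrame(msg_id="42", msg_type="get-status")
ok = to_wire(RTVIServerResponseFrame(client_msg=msg, data={"ok": True}))
bad = to_wire(RTVIServerResponseFrame(client_msg=msg, error="unknown type"))
```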
RTVIServerResponseFrame
Responds to an RTVIClientMessageFrame. Include the original client message frame to ensure the response is properly correlated. Set the error field to respond with an error instead of a normal response.
- The original client message this response is for.
- Response data to send to the client.
- Error message. When set, the client receives an error-response instead of a server-response.
Task Frames
Task frames provide a system-priority mechanism for requesting pipeline actions from outside the normal frame flow. They are converted into their corresponding standard frames when processed.
TaskSystemFrame
Base class for system-priority task frames.
CancelTaskFrame
Requests immediate pipeline cancellation. Converted to a CancelFrame when processed by the pipeline.
Inherits from TaskSystemFrame.
- Optional reason for the cancellation request.
InterruptionTaskFrame
Requests a pipeline interruption. Converted to an InterruptionFrame when processed.
Inherits from TaskSystemFrame.
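The task-frame-to-standard-frame conversion described above can be sketched as a simple dispatch. These are stand-in classes, not the framework's own:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CancelTaskFrame:
    reason: Optional[str] = None  # optional reason for the cancellation request

@dataclass
class InterruptionTaskFrame:
    pass

@dataclass
class CancelFrame:
    reason: Optional[str] = None

@dataclass
class InterruptionFrame:
    pass

def convert(task_frame):
    """Turn a task frame into its standard counterpart, as the pipeline
    does when it processes the task queue."""
    if isinstance(task_frame, CancelTaskFrame):
        return CancelFrame(reason=task_frame.reason)
    if isinstance(task_frame, InterruptionTaskFrame):
        return InterruptionFrame()
    raise TypeError("not a task frame")

out = convert(CancelTaskFrame(reason="user left"))
```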