Documentation Index
Fetch the complete documentation index at: https://docs.pipecat.ai/llms.txt
Use this file to discover all available pages before exploring further.
UIWorker extends LLMContextWorker with the ability to see and act on whatever the user is looking at. It connects an LLM to the client GUI over the RTVI UI channel: it receives the screen as accessibility snapshots, reacts to the user’s UI events, and acts on the page by sending commands back to the client.
A UIWorker is the delegate side of a voice/UI split. A voice layer (the main pipeline’s LLM, or a separate LLMWorker) handles speech and hands screen-relevant work to the UIWorker. Because the worker auto-injects the latest screen state into its LLM context, the conversational voice LLM stays small and screen-unaware — keeping each LLM’s context focused and efficient.
PipelineWorker connects a UIWorker to the client automatically when RTVI is enabled (the default) — no extra wiring. A working subclass needs only an LLM and a @tool that calls respond_to_job().
The client streams the screen as
ui-snapshot messages and the worker drives
it with ui-command / ui-job-group messages. See The RTVI
Standard for the wire protocol and
UIWorker patterns for the delegation and
parallel-handling patterns end to end.Configuration
Inheritsname, llm, active, bridged, and defer_tool_frames from LLMWorker, plus:
Optional pre-built
LLMContext. Seeded messages are part of the mutable
history and are cleared on each keep_history=False reset; put durable
instructions in the LLM’s system_instruction instead.Optional assistant-aggregator parameters, e.g. to enable context summarization
for
keep_history=True workers.When
True (the default), append each UI event to the context as a
<ui_event> developer message. Override render_ui_event() to change the
content, or set False to disable.When
True (the default), append the latest <ui_state> snapshot to the
context before every inference (via the LLM’s on_before_process_frame hook).
Set False to inject manually with inject_ui_state().When
False (the default), the context is cleared at the start of every job,
so each turn sees only the current <ui_state> and query — best for the
stateless-delegate role. When True, history accumulates across jobs so the
LLM can resolve multi-turn references (“the next one”, “the Pro version”), at
the cost of more tokens. Pair with context summarization to prune history.Wire-format guide appended to the LLM’s
system_instruction so it can parse
the <ui_state> / <ui_event> messages. Defaults to UI_STATE_PROMPT_GUIDE;
pass a string to override or None to disable. Living in system_instruction,
it survives context resets.Properties
Inherits all properties fromLLMContextWorker (including context, user_aggregator, assistant_aggregator, llm).
current_job
None when idle. Set when a respond turn starts and cleared when the job completes. Lets @tool methods inspect the in-flight job without threading the message through every call.
UI commands
These helpers send commands to the client. They are plain methods, not LLM tools: compose them inside a custom@tool body, or use ReplyToolMixin for the standard shape. Each is a convenience wrapper around send_command with a typed payload model from pipecat.processors.frameworks.rtvi.models.
send_command
BusUICommandMessage; when RTVI is enabled, PipelineWorker translates it into an RTVIUICommandFrame on the pipeline. Client-side handlers subscribed to RTVIEvent.UICommand (or React’s useUICommandHandler) dispatch on the command name.
| Parameter | Type | Default | Description |
|---|---|---|---|
name | str | App-defined command name (e.g. "toast", "navigate", or any app-specific name). | |
payload | Any | None | A pydantic BaseModel or dataclass (converted to a dict), a dict (forwarded as-is), or None (forwarded as {}). |
scroll_to
ref is a snapshot ref (e.g. "e42") from the latest <ui_state>.
highlight
select_text
start_offset..end_offset character sub-range when both are given.
click
disabled targets.
set_input_value
replace=True (the default) the field is overwritten; with replace=False the value is appended.
Responding to jobs
AUIWorker answers via a built-in single-flight respond job. When a requester dispatches self.job("ui", name="respond", payload={"query": "..."}), the worker clears its context (unless keep_history=True), injects the current <ui_state>, appends the query as a user message, and runs one LLM turn. A @tool ends the turn by calling respond_to_job().
respond_to_job
- default — the job responds with
{"answer": answer}for the requester’s voice LLM to phrase. tts_speak=True—answeris spoken verbatim by the requester’s TTS (and added to its context) while the job respondsNoneso the voice LLM doesn’t also speak.
answer completes the turn silently — useful for the parallel-handling pattern where a separate voice layer owns speech. No-op when no job is in flight.
| Parameter | Type | Default | Description |
|---|---|---|---|
answer | str | None | None | The worker’s answer — spoken verbatim or handed to the voice LLM. |
tts_speak | bool | False | Speak answer verbatim via the requester’s TTS instead of returning it. |
status | JobStatus | JobStatus.COMPLETED | Completion status. |
render_query
payload["query"]. Override to read a different payload shape; the returned string is appended to the context as a user message before the LLM runs.
render_ui_state
<ui_state> block (Playwright-MCP-style indented text with stable element refs). When the snapshot carries a text selection, a nested <selection ref="...">...</selection> block is appended so the LLM can resolve deictic references. Returns an empty string if no snapshot has been received. Override to customize the rendered form.
inject_ui_state
<ui_state> block to the LLM context manually. No-op when no snapshot has been received. Use this when auto_inject_ui_state=False.
render_ui_event
<ui_event name="..."> tag with a JSON-encoded payload. Override to customize the injected content.
Job groups
AUIWorker can fan work out to peer workers and surface the work to the client as a cancellable progress card. These are distinct from the inherited worker-to-worker job_group (which is invisible to the client).
ui_job_group
ui-job-group envelopes (group_started → job_update* → job_completed × N → group_completed). Use as an async with context manager to consume worker events inline.
| Parameter | Type | Default | Description |
|---|---|---|---|
*worker_names | str | Names of the workers to send the job to. | |
name | str | None | None | Optional job name for routing to named @job handlers. |
payload | dict | None | None | Optional structured data describing the work. |
timeout | float | None | None | Optional timeout (seconds) covering both the ready-wait and execution. |
cancel_on_error | bool | True | Whether to cancel the group if a worker errors. |
label | str | None | None | Human-readable label the client uses to title the in-flight card. |
cancellable | bool | True | Whether the client may cancel the group via ui-cancel-job-group. |
start_ui_job_group
ui_job_group with the same parameters. Dispatches the group in the background and returns the job_id immediately (the lifecycle still forwards to the client). Use it when a @tool wants to kick off work and unblock the voice worker.
Handling UI events
@ui_event
PipecatClient.sendUIEvent(event, payload), the matching handler runs in its own task. The handler receives the BusUIEventMessage (read message.payload for the event data).
Two handlers can’t share the same event name on the same subclass. Overrides
in subclasses take precedence over base-class definitions.
ReplyToolMixin
ReplyToolMixin exposes a single bundled reply tool covering the full standard action set, for subclasses that don’t need a custom tool schema. Compose it ahead of UIWorker:
answer (enforced by the schema, so the model can’t omit the terminator) plus optional visual and state-changing actions. It’s called exactly once per turn:
| Field | Type | Description |
|---|---|---|
answer | str (required) | The spoken reply in plain language. |
scroll_to | str | None | Snapshot ref to scroll into view. |
highlight | list[str] | None | Snapshot refs to flash briefly. |
select_text | str | None | Snapshot ref to place the page’s text selection on. |
fills | list[dict] | None | {"ref", "value"} objects to write into inputs. |
click | list[str] | None | Snapshot refs to click in order. |
scroll_to → highlight → select_text → fills → click → speak the answer. The answer is delivered as verbatim TTS (respond_to_job(answer, tts_speak=True)). Apps that want a minimal schema, app-specific commands, or the requester’s voice LLM to phrase the reply should write their own @tool reply on the UIWorker subclass instead.