# eval

> Run behavioral evals against a Pipecat agent, individually or as a suite

Run scenario-based behavioral evals. `pipecat eval run` tests scenarios against an already-running agent; `pipecat eval suite` spawns the agents listed in a manifest and runs their scenarios concurrently. Both exit `0` when everything passes and `1` otherwise.

If `pipecat-ai[cli]` is a dependency of your project, run these commands with `uv run pipecat eval`. They're also available as `python -m pipecat.evals`.

See the [Pipecat Evals guide](/pipecat/evals/overview) for concepts, the scenario format, and manifests.
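
Because both commands exit `0` only when everything passes, their exit status can gate a CI step directly. The following is a minimal sketch, not part of the Pipecat CLI: `eval_gate` is a hypothetical helper that runs whatever command it is given and reports on that command's exit status.

```shell
# Hypothetical helper (not provided by pipecat): run any command and
# report on its exit status, passing a failure status through.
eval_gate() {
  if "$@"; then
    echo "evals passed"
  else
    status=$?   # exit status of the command that just failed
    echo "evals failed (exit $status)"
    return "$status"
  fi
}

# Example call, left commented out because it needs a real manifest:
# eval_gate pipecat eval suite manifest.yaml
```

In a CI script, a call such as `eval_gate pipecat eval suite manifest.yaml || exit 1` would stop the job on the first failing suite.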

## eval run

Run one or more scenarios against an already-running agent (started with `-t eval`).

**Usage:**

```shell
pipecat eval run [OPTIONS] SCENARIOS...
```

**Arguments:**

<ParamField path="SCENARIOS..." type="path" required>
  One or more scenario YAML files.
</ParamField>

**Options:**

<ParamField path="--bot-url" type="string" default="ws://localhost:7860">
  WebSocket URL of the agent's eval transport.
</ParamField>

<ParamField path="--verbose / -v" type="flag">
  Print a line for each turn and expectation as it resolves.
</ParamField>

<ParamField path="--audio / -a" type="flag">
  Record each scenario's conversation audio (audio-mode scenarios).
</ParamField>

<ParamField path="--record-dir" type="string" default="recordings">
  Directory for `--audio` recordings: `<record-dir>/<scenario>.wav`.
</ParamField>

<ParamField path="--cache-dir" type="string">
  Directory for cached synthesized user audio. Defaults to
  `<user-cache-dir>/pipecat/tts`.
</ParamField>

<ParamField path="--no-cache" type="flag">
  Disable the user-audio cache: re-synthesize every turn (no reads or writes).
</ParamField>

<ParamField path="--timeout / -t" type="integer" default="60">
  Default per-expectation timeout in seconds, for expectations without their own
  `within_ms`.
</ParamField>

<ParamField path="--logs-dir" type="string" default=".">
  Directory for each scenario's logs: `<logs-dir>/<scenario>.eval.log` (plus
  `<logs-dir>/<scenario>.debug.log` when `--debug` is set).
</ParamField>

<ParamField path="--debug / -d" type="flag">
  Also save `<scenario>.debug.log` with the harness's full per-pipeline logs.
</ParamField>

<ParamField path="--stop-bot" type="flag">
  Cancel the agent's pipeline (exit it) after the run. By default the agent is
  left running so it can serve more scenarios.
</ParamField>

<ParamField path="--trigger-disconnect" type="flag">
  Fire the bot's `on_client_disconnect` callback when the eval client
  disconnects. Bots often cancel their pipeline in that callback, so this is
  off by default. A single scenario can still opt in through its own
  `trigger_disconnect:` field.
</ParamField>
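
The output locations described by `--logs-dir`, `--debug`, and `--record-dir` can be sketched as a small shell function. This is an illustration only: `scenario_outputs` is a hypothetical helper, and it assumes that `<scenario>` is the scenario file name without its `.yaml` extension, which this page does not state.

```shell
# Hypothetical helper: list where one scenario's outputs would land.
# Assumption: <scenario> is the scenario file name minus ".yaml".
scenario_outputs() {
  scenario_file=$1
  logs_dir=${2:-.}             # --logs-dir default
  record_dir=${3:-recordings}  # --record-dir default
  name=$(basename "$scenario_file" .yaml)
  echo "$logs_dir/$name.eval.log"    # always written
  echo "$logs_dir/$name.debug.log"   # only with --debug
  echo "$record_dir/$name.wav"       # only with --audio
}

scenario_outputs scenarios/capital_question.yaml
```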

## eval suite

Spawn the agents in a manifest and run their scenarios concurrently. Everything except the `suite:` list can be set in the manifest or overridden on the command line (the command line wins).

**Usage:**

```shell
pipecat eval suite [OPTIONS] MANIFEST_PATH
```

**Arguments:**

<ParamField path="MANIFEST_PATH" type="path" required>
  Manifest YAML listing agents and their scenarios.
</ParamField>

**Options:**

<ParamField path="--pattern / -p" type="string">
  Only run bots whose path contains this substring.
</ParamField>

<ParamField path="--scenario / -s" type="string">
  Only run the scenario with this name.
</ParamField>

<ParamField path="--name / -n" type="string">
  Name of this run's subdirectory under `runs_dir`. Defaults to a timestamp.
</ParamField>

<ParamField path="--runs-dir" type="path">
  Output base, overriding the manifest's `runs_dir`. A `<name>/` subdirectory
  with `logs/` and `recordings/` is created under it. Defaults to `eval-runs`.
</ParamField>

<ParamField path="--bots-dir" type="path">
  Override the manifest's `bots_dir` (bot paths are relative to it).
</ParamField>

<ParamField path="--scenarios-dir" type="path">
  Override the manifest's `scenarios_dir`.
</ParamField>

<ParamField path="--concurrency / -c" type="integer">
  Override the manifest's `concurrency` (how many runs execute at once).
</ParamField>

<ParamField path="--base-port" type="integer">
  Override the manifest's `base_port` (default `7900`). Each run gets
  `base_port + index`.
</ParamField>

<ParamField path="--cache-dir" type="string">
  Override the manifest's `cache_dir` for cached synthesized user audio.
</ParamField>

<ParamField path="--no-cache" type="flag">
  Disable the user-audio cache: re-synthesize every turn (no reads or writes).
</ParamField>

<ParamField path="--timeout / -t" type="integer" default="60">
  Default per-expectation timeout in seconds, for expectations without their own
  `within_ms`.
</ParamField>

<ParamField path="--spawn" type="string">
  Override the manifest's spawn template. Default:
  `"{python} {bot} -t eval --port {port}"`.
</ParamField>

<ParamField path="--python" type="string">
  Override the Python interpreter used to spawn each agent.
</ParamField>

<ParamField path="--audio / -a" type="flag">
  Record conversation audio.
</ParamField>

<ParamField path="--debug / -d" type="flag">
  Also save `<run>.debug.log` with the harness's full per-pipeline logs.
</ParamField>
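
To see how `--base-port` and `--spawn` fit together, the sketch below fills in the default spawn template for a given run. It is an approximation under stated assumptions, not the harness's actual code: `render_spawn` is a hypothetical helper, the run index is assumed to start at `0`, and the placeholders are assumed to be replaced by plain text substitution.

```shell
# Hypothetical helper: expand a spawn template for one run.
# Assumptions: run indexes start at 0; placeholders are substituted as text.
render_spawn() {
  template=$1
  python=$2
  bot=$3
  base_port=$4
  index=$5
  port=$((base_port + index))   # each run gets base_port + index
  printf '%s\n' "$template" |
    sed -e "s|{python}|$python|g" -e "s|{bot}|$bot|g" -e "s|{port}|$port|g"
}

# Third run (index 2) with the default template and base port
render_spawn "{python} {bot} -t eval --port {port}" python3 bots/support.py 7900 2
```

Giving every spawned agent its own port is what lets several runs execute at once without colliding, up to the configured `concurrency`.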

## Examples

```shell
# Run one scenario against a running agent
pipecat eval run scenarios/capital_question.yaml

# Run a batch of scenarios, verbosely
pipecat eval run scenarios/*.yaml -v

# Run a full suite
pipecat eval suite manifest.yaml

# Only the support agent, 8 runs at a time, named output dir
pipecat eval suite manifest.yaml -p support -c 8 -n nightly
```
