# MarkdownTextFilter

> Converts Markdown-formatted text to TTS-friendly plain text while preserving structure

## Overview

`MarkdownTextFilter` transforms Markdown-formatted text into plain text suitable for text-to-speech (TTS) systems. It strips formatting markers while preserving the structure of the content, including spacing and list formatting.

This filter is especially valuable for LLM-generated content, which often includes Markdown formatting that would sound unnatural if read aloud by a TTS system.

## Constructor

```python
# InputParams is a nested class, so it is referenced through MarkdownTextFilter
md_filter = MarkdownTextFilter(params=MarkdownTextFilter.InputParams())
```

<ParamField path="params" type="InputParams">
  Configuration parameters for the filter
</ParamField>

### Input Parameters

Configure the filter behavior with these options:

<ParamField path="enable_text_filter" type="bool" default="True">
  Whether the filter is active (when False, text passes through unchanged)
</ParamField>

<ParamField path="filter_code" type="bool" default="False">
  Whether to remove code blocks from the output
</ParamField>

<ParamField path="filter_tables" type="bool" default="False">
  Whether to remove Markdown tables from the output
</ParamField>

<ParamField path="filter_repeated_sequences" type="bool" default="True">
  Whether to remove repeated sequences of 5 or more identical characters from the output
</ParamField>

## Features

The filter handles these Markdown elements:

* **Basic Formatting**: Removes `*italic*`, `**bold**`, and other formatting markers
* **Code**: Removes inline code ticks and optionally removes code blocks
* **Lists**: Preserves numbered lists while removing Markdown formatting
* **Tables**: Optionally removes Markdown tables
* **Repeated Characters**: Optionally removes sequences of 5+ repeated characters
* **Whitespace**: Carefully preserves meaningful whitespace for natural speech
* **HTML**: Removes HTML tags and converts entities to their plain text equivalents
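
The HTML handling in the last bullet can be approximated with the Python standard library. This is a minimal sketch, not Pipecat's implementation: `strip_html` and `TAG_PATTERN` are hypothetical names, and the tag pattern is deliberately simple (it assumes a tag starts with `<` or `</` followed by a letter).

```python
import html
import re

# Hypothetical pattern: an opening or closing tag whose name starts with a letter.
# Requiring a letter keeps comparisons such as "5 < 7" from being treated as tags.
TAG_PATTERN = re.compile(r"</?[A-Za-z][^>]*>")


def strip_html(text: str) -> str:
    """Remove HTML tags, then convert entities to plain characters."""
    without_tags = TAG_PATTERN.sub("", text)
    # html.unescape turns "&amp;" into "&", "&lt;" into "<", and so on.
    return html.unescape(without_tags)


print(strip_html("<em>text</em>"))    # text
print(strip_html("Tom &amp; Jerry"))  # Tom & Jerry
```

Tags are stripped before entities are decoded, so an escaped tag such as `&lt;b&gt;` comes out as the literal text `<b>` rather than being removed.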

## Usage Examples

### Basic Usage with TTS Service

```python
import os

from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.utils.text.markdown_text_filter import MarkdownTextFilter

# Create the filter
md_filter = MarkdownTextFilter()

# Use with TTS service
tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    voice_id="voice_id_here",
    text_filters=[md_filter]
)
```

### Custom Configuration

```python
# Create filter that removes code blocks and tables
md_filter = MarkdownTextFilter(
    params=MarkdownTextFilter.InputParams(
        filter_code=True,
        filter_tables=True
    )
)
```

### Preserving Repeated Characters

```python
# Keep repeated characters intact (e.g., for phone extension numbers like 22222)
md_filter = MarkdownTextFilter(
    params=MarkdownTextFilter.InputParams(
        filter_repeated_sequences=False
    )
)
```
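
To see why a value such as `22222` is affected when `filter_repeated_sequences` is enabled, the rule "5 or more identical characters" can be sketched as a single regular expression. This is an illustration of the rule as documented, not Pipecat's actual code; `strip_repeated_runs` and `REPEATED_RUN` are hypothetical names.

```python
import re

# Hypothetical pattern: any character followed by at least four more
# copies of itself, i.e. a run of five or more identical characters.
REPEATED_RUN = re.compile(r"(.)\1{4,}")


def strip_repeated_runs(text: str) -> str:
    """Delete every run of five or more identical characters."""
    return REPEATED_RUN.sub("", text)


print(repr(strip_repeated_runs("Extension 22222")))  # 'Extension '
print(repr(strip_repeated_runs("Extension 2222")))   # 'Extension 2222'
```

A rule like this cannot tell decorative runs (for example a line of dashes) from meaningful ones, which is why the option can be turned off.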

## What Gets Removed

| Markdown Feature                   | Example                  | Result       |
| ---------------------------------- | ------------------------ | ------------ |
| Bold                               | `**important**`          | `important`  |
| Italic                             | `*emphasized*`           | `emphasized` |
| Headers                            | `## Section`             | `Section`    |
| Code (inline)                      | `` `code` ``             | `code`       |
| Code blocks (when enabled)         | ` ```python\ncode\n``` ` | *(removed)*  |
| Tables (when enabled)              | `\|A\|B\|\n\|--\|--\|`   | *(removed)*  |
| HTML tags                          | `<em>text</em>`          | `text`       |
| Repeated characters (when enabled) | `22222`                  | *(removed)*  |
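
The first four rows of the table can be reproduced with a few regular expressions. The sketch below is a simplified stand-in for illustration only: `strip_basic_markdown` is a hypothetical function, not part of Pipecat, and it ignores edge cases such as nested or unbalanced markers.

```python
import re


def strip_basic_markdown(text: str) -> str:
    """Remove header, bold, italic, and inline-code markers (simplified)."""
    # Headers: one to six '#' characters at the start of a line.
    text = re.sub(r"^#{1,6}\s+", "", text, flags=re.MULTILINE)
    # Bold must be handled before italic so '**' is not read as two '*'.
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)
    text = re.sub(r"\*(.+?)\*", r"\1", text)
    # Inline code: keep the content, drop the backticks.
    text = re.sub(r"`([^`]+)`", r"\1", text)
    return text


print(strip_basic_markdown("## Section"))     # Section
print(strip_basic_markdown("**important**"))  # important
print(strip_basic_markdown("*emphasized*"))   # emphasized
print(strip_basic_markdown("`code`"))         # code
```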

## Notes

* Preserves sentence structure and readability
* Maintains whitespace that affects speech prosody
* Handles streaming text with partial Markdown elements
* Converts HTML entities to plain text characters
* Tracks state so that code blocks and tables are handled consistently as text arrives
* Integrates directly with TTS services in the Pipecat pipeline
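
The state tracking mentioned in the notes above matters because a streamed code block rarely arrives in one piece. The sketch below shows the general idea under a strong assumption: text arrives one complete line at a time. `CodeBlockSkipper` and `FENCE` are hypothetical names, and this is not how Pipecat implements it.

```python
FENCE = "`" * 3  # the three-backtick marker that opens and closes a code block


class CodeBlockSkipper:
    """Hypothetical helper: drops fenced code blocks from streamed lines."""

    def __init__(self) -> None:
        # Remembered between calls so a block can span many lines.
        self._in_code_block = False

    def filter_line(self, line: str) -> str:
        """Return the line unchanged, or "" if it belongs to a code block."""
        if line.strip().startswith(FENCE):
            # A fence line toggles the state and is itself dropped.
            self._in_code_block = not self._in_code_block
            return ""
        return "" if self._in_code_block else line


skipper = CodeBlockSkipper()
stream = ["Here is code:", FENCE + "python", "print(1)", FENCE, "Done."]
spoken = [out for out in map(skipper.filter_line, stream) if out]
print(spoken)  # ['Here is code:', 'Done.']
```

Keeping the flag on the instance rather than re-scanning earlier text means each line is processed once, which suits text that arrives incrementally.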