Overview

MarkdownTextFilter converts Markdown-formatted text into plain text suitable for text-to-speech (TTS) services. It strips formatting markers while keeping the content and its structure, including meaningful spacing and numbered lists.

This filter is especially valuable for LLM-generated content, which often includes Markdown formatting that would sound unnatural if read aloud by a TTS system.

Constructor

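md_filter = MarkdownTextFilter(params=MarkdownTextFilter.InputParams())

params (MarkdownTextFilter.InputParams, optional)

Configuration parameters for the filter. When omitted, the defaults listed under Input Parameters apply.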

Input Parameters

Configure the filter behavior with these options:

enable_text_filter (bool, default: True)

Whether the filter is active. When False, text passes through unchanged.

filter_code (bool, default: False)

Whether to remove code blocks from the output.

filter_tables (bool, default: False)

Whether to remove Markdown tables from the output.
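The three options gate independent passes. As a rough sketch of those semantics (not Pipecat's implementation), the hypothetical `FilterParams` dataclass below mirrors the option names and defaults, and the hypothetical `apply_params` function runs only the option-controlled passes on a complete string: it returns the input unchanged when `enable_text_filter` is False, and otherwise drops fenced code blocks and pipe-table rows only when the matching option is set. The regular expressions are assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class FilterParams:
    # Hypothetical stand-in for MarkdownTextFilter.InputParams, same defaults.
    enable_text_filter: bool = True
    filter_code: bool = False
    filter_tables: bool = False


# Assumed pattern: a line starting with ``` through the next line starting with ```.
FENCED_BLOCK = re.compile(r"^```.*?^```[ \t]*\n?", re.MULTILINE | re.DOTALL)
# Assumed pattern: a line that starts and ends with a pipe is a table row.
TABLE_ROW = re.compile(r"^[ \t]*\|.*\|[ \t]*\n?", re.MULTILINE)


def apply_params(text: str, params: FilterParams) -> str:
    """Apply only the option-controlled passes to a complete string."""
    if not params.enable_text_filter:
        return text  # pass-through: output equals input
    if params.filter_code:
        text = FENCED_BLOCK.sub("", text)  # drop whole fenced blocks
    if params.filter_tables:
        text = TABLE_ROW.sub("", text)  # drop header, separator, and body rows
    return text


sample = "Intro\n```python\nprint(1)\n```\n| A | B |\n|---|---|\nOutro\n"
print(apply_params(sample, FilterParams(filter_code=True, filter_tables=True)))
```

Keeping each option as a separate, independently skippable pass makes the pass-through case trivial to reason about: with `enable_text_filter=False`, nothing else is consulted.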

Features

The filter handles these Markdown elements:

  • Basic Formatting: Removes *italic*, **bold**, and other formatting markers
  • Code: Removes inline code ticks and optionally removes code blocks
  • Lists: Preserves numbered lists while removing Markdown formatting
  • Tables: Optionally removes Markdown tables
  • Whitespace: Preserves meaningful whitespace so the text sounds natural when spoken
  • HTML: Removes HTML tags and converts entities to their plain text equivalents
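The inline clean-up described above can be approximated with a few regular expressions and the standard library's `html.unescape`. This is a minimal sketch under assumptions: `strip_inline_markdown` and its patterns are hypothetical, cover only `**bold**`, `*italic*`, single-backtick code, and HTML tags and entities, and do not reproduce Pipecat's exact rules. Numbered-list prefixes such as `1.` contain no formatting markers, so they pass through untouched while the formatting inside each item is removed.

```python
import html
import re

# Hypothetical patterns for illustration; not Pipecat's code.
BOLD = re.compile(r"\*\*(.+?)\*\*")
# A single * not adjacent to another *, not followed by a space (so "* item" bullets are skipped).
ITALIC = re.compile(r"(?<!\*)\*(?!\s)([^*\n]+?)\*(?!\*)")
# Single-backtick code only; triple-backtick fences are left alone.
INLINE_CODE = re.compile(r"(?<!`)`([^`\n]+)`(?!`)")
HTML_TAG = re.compile(r"</?[A-Za-z][^>]*>")


def strip_inline_markdown(text: str) -> str:
    text = BOLD.sub(r"\1", text)  # run before ITALIC so ** is not read as two *
    text = ITALIC.sub(r"\1", text)
    text = INLINE_CODE.sub(r"\1", text)
    text = HTML_TAG.sub("", text)  # remove tags, keep their inner text
    return html.unescape(text)  # e.g. "&amp;" becomes "&"


print(strip_inline_markdown("1. **Install** it\n2. Run `pytest`"))
```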

Usage Examples

Basic Usage with TTS Service

import os

from pipecat.utils.text.markdown_text_filter import MarkdownTextFilter
from pipecat.services.cartesia.tts import CartesiaTTSService

# Create the filter
md_filter = MarkdownTextFilter()

# Use with TTS service
tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    voice_id="voice_id_here",
    text_filter=md_filter
)

Custom Configuration

from pipecat.utils.text.markdown_text_filter import MarkdownTextFilter

# Create a filter that removes code blocks and tables
md_filter = MarkdownTextFilter(
    params=MarkdownTextFilter.InputParams(
        filter_code=True,
        filter_tables=True
    )
)

What Gets Removed

Markdown feature             Example                     Result
Bold                         **important**               important
Italic                       *emphasized*                emphasized
Headers                      ## Section                  Section
Code (inline)                `code`                      code
Code blocks (when enabled)   ```python\ncode\n```        (removed)
Tables (when enabled)        |A|B|\n|--|--|              (removed)
HTML tags                    <em>text</em>               text
Repeated characters          !!!!!!!!                    (removed)
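Two rows in the table, headers and repeated characters, work at the line or run level rather than inline. A hedged sketch follows: `strip_headers_and_runs` is hypothetical, and the threshold of four identical punctuation or symbol characters, as well as dropping the whole run, are assumptions rather than Pipecat's documented behavior. Restricting runs to non-word characters keeps numbers such as 100000 intact.

```python
import re

# Hypothetical: 1-6 leading # characters followed by whitespace mark a header line.
HEADER = re.compile(r"^[ \t]*#{1,6}[ \t]+", re.MULTILINE)
# Assumption: a run of 4+ identical punctuation/symbol characters is dropped entirely.
REPEATED_RUN = re.compile(r"([^\w\s])\1{3,}")


def strip_headers_and_runs(text: str) -> str:
    text = HEADER.sub("", text)  # keep the header text, drop the # markers
    return REPEATED_RUN.sub("", text)


print(strip_headers_and_runs("## Section\nWow!!!!!!!!"))
```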

Notes

  • Preserves sentence structure and readability
  • Maintains whitespace that affects speech prosody
  • Handles streaming text with partial Markdown elements
  • Converts HTML entities (for example, &amp;) to the characters they represent
  • Tracks whether the text is inside a code block or table, so blocks that span multiple streamed chunks are handled consistently
  • Plugs into Pipecat TTS services through their text_filter parameter
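Because TTS text usually arrives as a stream, a fence such as ``` can be split across chunks, and a code block can span many chunks. The sketch below shows one way to track that state: the hypothetical `StreamingCodeBlockFilter` holds back the incomplete trailing line of each chunk, toggles an inside-code-block flag on every fence line, and drops lines while the flag is set. It illustrates the idea only and is not how Pipecat implements its state tracking.

```python
class StreamingCodeBlockFilter:
    """Hypothetical sketch: drops fenced code blocks from streamed text chunks."""

    def __init__(self) -> None:
        self._in_code_block = False
        self._pending = ""  # incomplete trailing line held back between chunks

    def feed(self, chunk: str) -> str:
        parts = (self._pending + chunk).split("\n")
        self._pending = parts.pop()  # last piece may be an unfinished line
        out = []
        for line in parts:
            if line.lstrip().startswith("```"):
                # Opening or closing fence: flip state, never emit the fence.
                self._in_code_block = not self._in_code_block
                continue
            if not self._in_code_block:
                out.append(line + "\n")
        return "".join(out)

    def flush(self) -> str:
        """Emit held-back text at end of stream and reset state."""
        rest, self._pending = self._pending, ""
        inside = self._in_code_block
        self._in_code_block = False
        if inside or rest.lstrip().startswith("```"):
            return ""  # unterminated block or a trailing fence: nothing to speak
        return rest


f = StreamingCodeBlockFilter()
chunks = ["Here is code:\n``", "`python\nprint('hi')\n", "```\nDone", "."]
print("".join(f.feed(c) for c in chunks) + f.flush())
```

Holding back the last partial line delays speech until a newline or the end of the stream; a lower-latency design could hold back text only when the tail might still turn into a fence.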