MarkdownTextFilter
Converts Markdown-formatted text to TTS-friendly plain text while preserving structure
Overview
MarkdownTextFilter transforms Markdown-formatted text into plain text that is suitable for text-to-speech (TTS) systems. It removes formatting elements while preserving the content structure, including spacing and list formatting.
This filter is especially valuable for LLM-generated content, which often includes Markdown formatting that would sound unnatural if read aloud by a TTS system.
Constructor
Configuration parameters for the filter
Input Parameters
Configure the filter behavior with these options:
- Whether the filter is active (when False, text passes through unchanged)
- Whether to remove code blocks from the output
- Whether to remove Markdown tables from the output
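To make the effect of the three options concrete, here is a minimal, self-contained sketch. It is not Pipecat's implementation: `FilterOptions`, its field names, its default values, `apply_options`, and both regular expressions are hypothetical stand-ins, since the option names and defaults are not shown in this section.

```python
import re
from dataclasses import dataclass


@dataclass
class FilterOptions:
    """Hypothetical stand-in for the filter's input parameters."""

    enabled: bool = True              # when False, text passes through unchanged
    remove_code_blocks: bool = False  # assumed default, not taken from the docs
    remove_tables: bool = False       # assumed default, not taken from the docs


# A fenced block: opening fence, anything up to the next fence, optional newline.
FENCED_BLOCK = re.compile(r"```.*?```\n?", re.DOTALL)
# A table row: a whole line that starts and ends with a pipe.
TABLE_ROW = re.compile(r"^[ \t]*\|.*\|[ \t]*$\n?", re.MULTILINE)


def apply_options(text: str, options: FilterOptions) -> str:
    """Apply only the option-controlled removals to `text`."""
    if not options.enabled:
        return text
    if options.remove_code_blocks:
        text = FENCED_BLOCK.sub("", text)
    if options.remove_tables:
        text = TABLE_ROW.sub("", text)
    return text
```

With `enabled=False` the input comes back untouched, which mirrors the pass-through behavior described for the first option.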
Features
The filter handles these Markdown elements:
- Basic Formatting: Removes *italic*, **bold**, and other formatting markers
- Code: Removes inline code ticks and optionally removes code blocks
- Lists: Preserves numbered lists while removing Markdown formatting
- Tables: Optionally removes Markdown tables
- Whitespace: Carefully preserves meaningful whitespace for natural speech
- HTML: Removes HTML tags and converts entities to their plain text equivalents
Usage Examples
Basic Usage with TTS Service
Custom Configuration
What Gets Removed
Markdown Feature | Example | Result |
---|---|---|
Bold | **important** | important |
Italic | *emphasized* | emphasized |
Headers | ## Section | Section |
Code (inline) | `code` | code |
Code blocks (when enabled) | ```python\ncode\n``` | (removed) |
Tables (when enabled) | \|A\|B\|\n\|--\|--\| | (removed) |
HTML tags | <em>text</em> | text |
Repeated characters | !!!!!!! | ! |
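The always-on rows of the table above can be approximated with a few regular expressions from the Python standard library. This is a rough sketch for illustration, not the filter's actual code: `strip_markdown` is a hypothetical name, and the threshold of five repeated characters is an assumption (the table only shows that `!!!!!!!` collapses to `!`).

```python
import html
import re


def strip_markdown(text: str) -> str:
    """Rough approximation of the transformations listed in the table."""
    text = re.sub(r"^#{1,6}[ \t]+", "", text, flags=re.MULTILINE)  # headers
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)                   # bold
    text = re.sub(r"\*(.+?)\*", r"\1", text)                       # italic
    text = re.sub(r"`([^`\n]+)`", r"\1", text)                     # inline code ticks
    text = re.sub(r"<[^<>]+>", "", text)                           # HTML tags
    text = html.unescape(text)                                     # HTML entities
    # Collapse runs of 5+ identical non-space characters (assumed threshold).
    text = re.sub(r"(\S)\1{4,}", r"\1", text)
    return text


print(strip_markdown("## Hello **world**, see `x` &amp; <em>y</em>!!!!!!"))
# prints: Hello world, see x & y!
```

Plain regular expressions misfire on some inputs (for example, arithmetic such as `2 * 3 * 4` looks like italics), so a production filter needs a more careful approach than this sketch.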
Notes
- Preserves sentence structure and readability
- Maintains whitespace that affects speech prosody
- Handles streaming text with partial Markdown elements
- Converts HTML entities to plain text characters
- Uses state tracking to handle code blocks and tables
- Integrates directly with TTS services in the Pipecat pipeline
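The notes on streaming and state tracking can be illustrated with a small sketch that drops fenced code blocks from text arriving in arbitrary chunks. Everything here is hypothetical (`StreamingCodeBlockFilter`, `feed`, `flush`); it shows one way to carry state between calls and is not how Pipecat implements it.

```python
class StreamingCodeBlockFilter:
    """Drops fenced code blocks from text that arrives in chunks."""

    def __init__(self) -> None:
        self._in_code_block = False  # True while between an opening and closing fence
        self._pending = ""           # incomplete last line, held until its newline arrives

    def feed(self, chunk: str) -> str:
        """Accept a chunk and return the speakable text completed so far."""
        self._pending += chunk
        # Everything before the last newline is complete; the tail may be partial.
        *complete, self._pending = self._pending.split("\n")
        kept = []
        for line in complete:
            if line.lstrip().startswith("```"):
                # A fence line toggles the state and is never spoken.
                self._in_code_block = not self._in_code_block
                continue
            if not self._in_code_block:
                kept.append(line + "\n")
        return "".join(kept)

    def flush(self) -> str:
        """Return any trailing partial line once the stream has ended."""
        rest, self._pending = self._pending, ""
        if self._in_code_block or rest.lstrip().startswith("```"):
            return ""
        return rest


stream = StreamingCodeBlockFilter()
chunks = ["Here is code:\n``", "`python\npri", "nt(1)\n```\nDone", ".\n"]
spoken = "".join(stream.feed(c) for c in chunks) + stream.flush()
print(repr(spoken))  # prints: 'Here is code:\nDone.\n'
```

Buffering the trailing partial line is what lets the sketch recognize a fence that is split across two chunks, at the cost of holding back output until a newline arrives.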