Overview
HeyGenVideoService integrates with HeyGen to create interactive AI-powered video avatars that respond naturally in real-time conversations. The service handles bidirectional audio/video streaming, avatar animations, voice activity detection, and conversation interruptions to deliver engaging conversational AI experiences with lifelike visual presence.
- API Reference: Complete API documentation and method details
- HeyGen Docs: Official HeyGen API documentation and guides
- Example Code: Working example with interactive avatar
Installation
To use HeyGen services, install the required dependency and set the following environment variable:
- HEYGEN_API_KEY: Your HeyGen API key
Sign up for a HeyGen account at HeyGen Platform to get your API key and access interactive avatars.
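As a minimal sketch of the environment variable setup above, the key can be read once at startup and rejected early if it is missing. The helper name `load_heygen_api_key` and its optional `env` parameter are illustrative assumptions, not part of any HeyGen or service API.

```python
import os


def load_heygen_api_key(env=None):
    """Return the HeyGen API key from the environment, or raise if it is unset.

    `env` is a hypothetical hook for passing a plain dict (useful in tests);
    by default the real process environment is used.
    """
    env = os.environ if env is None else env
    key = env.get("HEYGEN_API_KEY", "").strip()
    if not key:
        raise RuntimeError("HEYGEN_API_KEY is not set")
    return key
```

Failing at startup keeps a missing key from surfacing later as a harder-to-diagnose connection error.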
Frames
Input
- TTSAudioRawFrame: Text-to-speech audio for the avatar to speak
- UserStartedSpeakingFrame: Triggers the avatar listening animation
- UserStoppedSpeakingFrame: Stops the avatar listening state
- EndFrame: Signals end of conversation
Output
- OutputImageRawFrame: Generated avatar video frames
- OutputAudioRawFrame: Avatar’s synchronized audio output
- UserStartedSpeakingFrame: Forwarded user speech events
- UserStoppedSpeakingFrame: Forwarded user speech events
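The input frames listed above can be pictured as events driving a small avatar state machine. The sketch below is an illustrative model only: it uses frame names as plain strings rather than the real frame classes, and the `AvatarState` enum, `next_state`, and `run` are hypothetical names that do not reflect the service's internals. Because the frame list includes no "speech finished" event, this model never leaves the speaking state on its own.

```python
from enum import Enum


class AvatarState(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    SPEAKING = "speaking"
    ENDED = "ended"


def next_state(state, frame_name):
    """Return the avatar state after a frame arrives (illustrative model only)."""
    if state is AvatarState.ENDED or frame_name == "EndFrame":
        return AvatarState.ENDED
    if frame_name == "UserStartedSpeakingFrame":
        # Also models an interruption: the user talks while the avatar speaks.
        return AvatarState.LISTENING
    if frame_name == "UserStoppedSpeakingFrame":
        # Only leave the listening state; other states are unaffected.
        return AvatarState.IDLE if state is AvatarState.LISTENING else state
    if frame_name == "TTSAudioRawFrame":
        return AvatarState.SPEAKING
    return state  # Frames not listed above leave the state unchanged.


def run(frame_names, state=AvatarState.IDLE):
    """Fold a sequence of frame names into a final avatar state."""
    for name in frame_names:
        state = next_state(state, name)
    return state
```

For example, `run(["TTSAudioRawFrame", "UserStartedSpeakingFrame"])` ends in the listening state, which is how this model represents the user interrupting the avatar.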
Service Features
- Interactive Avatars: Real-time conversational avatars with natural expressions
- Voice Activity Detection: Intelligent listening animations and interruption handling
- Real-time Streaming: Low-latency bidirectional audio/video communication
- Natural Conversations: Smooth interruption handling for fluid interactions
- Avatar Animations: Contextual animations based on conversation state
Usage Example
Avatar Configuration
Avatar Selection
HeyGen provides various pre-built avatars, or you can use custom avatars.
Default Configuration
If no session request is provided, the service uses the Shawn_Therapist_public avatar by default.
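The default fallback described above can be sketched as a small resolver. This is a sketch under stated assumptions: the session request is modeled as a plain dict with a hypothetical `avatar_id` key, and falling back when that key is missing or empty is an assumption of this example rather than documented behavior.

```python
DEFAULT_AVATAR_ID = "Shawn_Therapist_public"


def resolve_avatar_id(session_request=None):
    """Pick the avatar ID, using the default when no session request is given.

    `session_request` is a hypothetical dict stand-in for the real request
    object; only its "avatar_id" entry is consulted here.
    """
    if session_request is None:
        return DEFAULT_AVATAR_ID
    # Assumption: a request without an avatar ID also gets the default.
    return session_request.get("avatar_id") or DEFAULT_AVATAR_ID
```

Calling `resolve_avatar_id()` with no argument mirrors the documented case of no session request being provided.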
Integration Patterns
With Daily Transport
HeyGen works with Daily for video conferencing applications.
With WebRTC Transport
HeyGen can also be used with a WebRTC transport for peer-to-peer video communication.
Additional Notes
- Real-time Optimization: Designed for low-latency conversational interactions
- Network Requirements: Video streaming requires sufficient bandwidth for quality delivery
- Processing Requirements: Ensure adequate server resources for real-time video processing
- Session Management: Automatically handles avatar lifecycle and conversation state
- Audio Synchronization: Keeps the avatar's lip movements synchronized with generated speech
- Error Handling: Recovers from errors so that conversations can continue