Create voice agents with OpenAI audio models and Pipecat
The gpt-4o model operating in text-to-text mode has the strongest instruction following and function calling performance.

gpt-4o-audio-preview and the OpenAI Realtime API are currently beta products.

| Models | Pipecat service | OpenAI endpoint |
| --- | --- | --- |
| gpt-4o-transcribe, gpt-4o-mini-transcribe | OpenAISTTService (reference docs) | /v1/audio/transcriptions (docs) |
| gpt-4o, gpt-4o-mini, gpt-4o-audio-preview | OpenAILLMService (reference docs) | /v1/chat/completions (docs) |
| gpt-4o-realtime-preview, gpt-4o-mini-realtime-preview | OpenAIRealtimeBetaLLMService (reference docs) | |
| gpt-4o-mini-tts | OpenAITTSService (reference docs) | /v1/audio/speech (docs) |
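The mapping above between models, Pipecat services, and OpenAI endpoints can also be written down as plain data, which makes it easy to check which service a given model belongs to before wiring up a pipeline. The following is a minimal sketch using only the Python standard library. It does not import Pipecat: the service names are stored as strings, and `ServiceInfo`, `SERVICES`, `service_for_model`, `is_beta`, and the `role` labels are hypothetical names introduced here for illustration. The Realtime entry has no endpoint because none is listed above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ServiceInfo:
    """One row of the mapping: a Pipecat service name and the models listed for it."""
    role: str                  # hypothetical label for illustration (not from the guide)
    pipecat_service: str       # class name as listed above, kept as a plain string
    models: Tuple[str, ...]
    endpoint: Optional[str]    # None where no endpoint is listed above


SERVICES = (
    ServiceInfo("stt", "OpenAISTTService",
                ("gpt-4o-transcribe", "gpt-4o-mini-transcribe"),
                "/v1/audio/transcriptions"),
    ServiceInfo("llm", "OpenAILLMService",
                ("gpt-4o", "gpt-4o-mini", "gpt-4o-audio-preview"),
                "/v1/chat/completions"),
    ServiceInfo("realtime", "OpenAIRealtimeBetaLLMService",
                ("gpt-4o-realtime-preview", "gpt-4o-mini-realtime-preview"),
                None),
    ServiceInfo("tts", "OpenAITTSService",
                ("gpt-4o-mini-tts",),
                "/v1/audio/speech"),
)


def service_for_model(model: str) -> ServiceInfo:
    """Return the row whose model list contains `model`, or raise KeyError."""
    for info in SERVICES:
        if model in info.models:
            return info
    raise KeyError(f"no Pipecat service listed for model {model!r}")


def is_beta(model: str) -> bool:
    """True for the products described above as beta:
    gpt-4o-audio-preview and anything served through the Realtime API."""
    info = service_for_model(model)
    return model == "gpt-4o-audio-preview" or info.role == "realtime"


if __name__ == "__main__":
    for name in ("gpt-4o-transcribe", "gpt-4o", "gpt-4o-mini-tts"):
        row = service_for_model(name)
        print(name, row.pipecat_service, row.endpoint)
```

A lookup such as `service_for_model("gpt-4o-mini-tts").pipecat_service` returns the string `"OpenAITTSService"`. Keeping the names as strings is a deliberate simplification: the actual import paths and constructor arguments of the Pipecat classes depend on the installed Pipecat version and should be taken from its reference docs.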