OpenAI has introduced three new AI voice models tailored for real-time tasks such as reasoning, translation, and transcription. These tools let developers build advanced voice-enabled applications.
Enhanced Voice Interaction in AI Apps
Regular ChatGPT users already enjoy voice interactions alongside text input. The latest models expand this capability and target developer integration, so that third-party apps can offer seamless voice experiences.
OpenAI states these models “unlock a new class of voice apps for developers.” Developers can use them for task execution, for explaining situations such as travel delays, and for natural conversations in native languages.
The Three New Voice Models
- GPT-Realtime-2: Delivers GPT-5-class reasoning for complex requests and fluid conversations. It checks multiple sources, adapts its tone to user input, applies advanced reasoning, and handles specialized terms in fields like healthcare.
- GPT-Realtime-Translate: Provides live translation from over 70 input languages into 13 output languages, matching the speaker’s pace without delays.
- GPT-Realtime-Whisper: Offers streaming speech-to-text transcription in real time, ideal for live captions, meeting notes, and summaries that make products feel responsive and natural.
Pricing and Availability
All models are available through OpenAI’s Realtime API. Pricing is as follows:
- GPT-Realtime-2: $32 per million input tokens, $64 per million output tokens.
- GPT-Realtime-Translate: $0.034 per minute.
- GPT-Realtime-Whisper: $0.017 per minute.
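To make these rates concrete, the following is a minimal Python sketch that estimates costs from the figures listed above. The function names (`token_cost`, `minute_cost`) and the example workloads are hypothetical and chosen only for illustration; the sketch assumes simple linear billing and ignores any discounts, caching, or rounding rules OpenAI may apply.

```python
from decimal import Decimal

# Rates in USD, copied from the pricing list above.
REALTIME_2_INPUT_PER_MILLION = Decimal("32")
REALTIME_2_OUTPUT_PER_MILLION = Decimal("64")
PER_MINUTE_RATES = {
    "GPT-Realtime-Translate": Decimal("0.034"),
    "GPT-Realtime-Whisper": Decimal("0.017"),
}


def token_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate GPT-Realtime-2 cost, assuming purely linear per-token billing."""
    million = Decimal(1_000_000)
    total = (
        Decimal(input_tokens) * REALTIME_2_INPUT_PER_MILLION
        + Decimal(output_tokens) * REALTIME_2_OUTPUT_PER_MILLION
    )
    return total / million


def minute_cost(model: str, minutes: int) -> Decimal:
    """Estimate cost for a per-minute model (hypothetical helper)."""
    return PER_MINUTE_RATES[model] * Decimal(minutes)


if __name__ == "__main__":
    # Hypothetical workload: 50,000 input tokens and 20,000 output tokens.
    print("GPT-Realtime-2:", token_cost(50_000, 20_000))
    # Hypothetical workload: one hour of audio on each per-minute model.
    for name in PER_MINUTE_RATES:
        print(f"{name}:", minute_cost(name, 60))
```

Under these assumptions, an hour of live translation costs about $2.04 and an hour of transcription about $1.02, while the token-based example comes to $2.88.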
Developers can test the models in OpenAI’s Playground. A dedicated prompt enables GPT-Realtime-2 integration with Codex for agentic coding.

