OpenAI Upgrades Speech AI – More Accurate and Better Tone Control
- Kattiya Jantas
- Mar 21
- 2 min read
OpenAI continues to advance its speech AI technology with the launch of new models: gpt-4o-mini-tts, gpt-4o-transcribe, and gpt-4o-mini-transcribe. These models improve natural speech synthesis and enhance transcription accuracy, even in noisy environments.

More Realistic AI Speech with Enhanced Tone Control
One of the key highlights of OpenAI’s latest development is "gpt-4o-mini-tts", a Text-to-Speech (TTS) model that generates more natural and lifelike speech than ever before.
What makes it even more special is its ability to control tone and speaking style. Developers can instruct the AI to speak in various styles, such as:
- “Sound like an eccentric scientist”
- “Use a calm voice like a meditation teacher”
- “Speak with a smooth, professional tone”
This means businesses can create voice chatbots capable of responding with emotions, significantly enhancing user experience.
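As a rough illustration, here is a minimal sketch of how such a style instruction could be passed to the speech endpoint using the official OpenAI Python SDK. The `instructions` field carries the tone prompt; the voice name, input text, and output filename are just placeholders for this example.

```python
# Request parameters for the gpt-4o-mini-tts speech endpoint.
# The "instructions" field is where the tone/style prompt goes.
tts_request = {
    "model": "gpt-4o-mini-tts",
    "voice": "coral",  # illustrative; the API offers several built-in voices
    "input": "Welcome back! How can I help you today?",
    "instructions": "Use a calm voice like a meditation teacher.",
}

def synthesize(request, out_path="greeting.mp3"):
    """Send the TTS request and stream the resulting audio to a file.

    Requires the `openai` package and an OPENAI_API_KEY in the environment.
    """
    from openai import OpenAI

    client = OpenAI()
    with client.audio.speech.with_streaming_response.create(**request) as resp:
        resp.stream_to_file(out_path)
```

Calling `synthesize(tts_request)` would then write the spoken audio to `greeting.mp3`; swapping the `instructions` string is all it takes to change the speaking style.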
More Accurate AI Transcription with Fewer Errors & Hallucinations
OpenAI has also introduced gpt-4o-transcribe and gpt-4o-mini-transcribe, new Speech-to-Text (STT) models that replace Whisper, the previous transcription AI.
🔹 Higher accuracy: Better recognition of different accents and speech patterns, even in noisy environments.
🔹 Reduced hallucinations: Whisper sometimes generated inaccurate transcriptions by "hallucinating" words, but the new models minimize this issue.
🔹 Broader language support: coverage has expanded, although error rates remain higher in some languages, such as Tamil, Telugu, Malayalam, and Kannada.
These improvements will greatly benefit businesses relying on speech transcription, such as call centers, podcasts, and video content creators.
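For reference, a minimal transcription call with the OpenAI Python SDK might look like the sketch below. The model name matches the new releases; the audio filename is illustrative, and `response_format` is set to plain text for simplicity.

```python
# Request parameters for the new speech-to-text models.
transcribe_request = {
    "model": "gpt-4o-mini-transcribe",  # or "gpt-4o-transcribe" for top accuracy
    "response_format": "text",          # return the transcript as plain text
}

def transcribe(audio_path, request=transcribe_request):
    """Upload an audio file and return its transcript.

    Requires the `openai` package and an OPENAI_API_KEY in the environment.
    """
    from openai import OpenAI

    client = OpenAI()
    with open(audio_path, "rb") as audio_file:
        return client.audio.transcriptions.create(file=audio_file, **request)
```

A call center could run something like `transcribe("support_call.wav")` on each recording; choosing the mini model trades a little accuracy for lower cost and latency.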
OpenAI Shifts Policy – No More Open-Source Transcription Models
Unlike Whisper, which was available as an open-source model, OpenAI’s new models will not be open-source. Due to their larger size and higher computational demands, OpenAI has decided against free public access.
The company stated that releasing AI models as open-source requires careful consideration, and they are evaluating the best approach for the future.
AI Gets Closer to Human-Level Performance
🔹 More natural speech – AI can express emotions and adjust its tone.
🔹 More accurate transcription – Better handling of accents and background noise.
🔹 Fewer AI hallucinations – Reduced transcription errors.
🔹 No open-source access – Available only via OpenAI’s API.
For developers and businesses looking for better speech AI and transcription solutions, this upgrade marks another significant step toward human-like AI.
What do you think AI-powered speech should be used for? Share your thoughts in the comments!
Source: TechCrunch