OpenAI Upgrades Speech AI – More Accurate and Better Tone Control

  • Writer: Kattiya Jantas
  • Mar 21
  • 2 min read

OpenAI continues to advance its speech AI technology with the launch of new models: gpt-4o-mini-tts, gpt-4o-transcribe, and gpt-4o-mini-transcribe. These models improve natural speech synthesis and enhance transcription accuracy, even in noisy environments.

More Realistic AI Speech with Enhanced Tone Control

One of the key highlights of OpenAI’s latest development is "gpt-4o-mini-tts", a Text-to-Speech (TTS) model that generates more natural and lifelike speech than ever before.

What makes it even more special is its ability to control tone and speaking style. Developers can instruct the AI to speak in various styles, such as:


  • “Sound like an eccentric scientist”

  • “Use a calm voice like a meditation teacher”

  • “Speak with a smooth, professional tone”

This means businesses can create voice chatbots capable of responding with emotions, significantly enhancing user experience.
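As a rough sketch of how this looks in practice (assuming the official OpenAI Python SDK and an `OPENAI_API_KEY` in the environment; the helper names below are our own), the tone instruction is passed alongside the text to synthesize:

```python
def build_tts_request(text, style_instruction, voice="coral"):
    """Assemble keyword arguments for a gpt-4o-mini-tts synthesis call.

    The `instructions` field carries the tone / speaking-style prompt,
    e.g. "Sound like an eccentric scientist".
    """
    return {
        "model": "gpt-4o-mini-tts",
        "voice": voice,
        "input": text,
        "instructions": style_instruction,
    }


def synthesize_to_file(text, style_instruction, out_path="speech.mp3"):
    """Send the request and stream the resulting audio to a file."""
    from openai import OpenAI  # lazy import; requires OPENAI_API_KEY

    client = OpenAI()
    kwargs = build_tts_request(text, style_instruction)
    with client.audio.speech.with_streaming_response.create(**kwargs) as resp:
        resp.stream_to_file(out_path)


# Example usage (makes a live API call):
# synthesize_to_file(
#     "Welcome back! How can I help you today?",
#     "Use a calm voice like a meditation teacher",
# )
```

The key design point is that the style lives in a separate `instructions` field rather than being mixed into the spoken text, so the same script can be re-voiced in different tones without editing its content.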


More Accurate AI Transcription with Fewer Errors & Hallucinations

OpenAI has also introduced gpt-4o-transcribe and gpt-4o-mini-transcribe, new Speech-to-Text (STT) models that replace Whisper, the previous transcription AI.


🔹 Higher accuracy: Better recognition of different accents and speech patterns, even in noisy environments.

🔹 Reduced hallucinations: Whisper sometimes generated inaccurate transcriptions by "hallucinating" words, but the new models minimize this issue.

🔹 Broader language support: More languages are handled than before, though error rates remain noticeably higher in certain languages such as Tamil, Telugu, Malayalam, and Kannada.


These improvements will greatly benefit businesses relying on speech transcription, such as call centers, podcasts, and video content creators.
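For those businesses, a minimal transcription call might look like the sketch below (again assuming the OpenAI Python SDK; `pick_transcribe_model` is a hypothetical helper of our own for choosing between the two new models):

```python
def pick_transcribe_model(prefer_lower_cost=False):
    """Choose between the two new speech-to-text models.

    gpt-4o-mini-transcribe is the smaller, cheaper option;
    gpt-4o-transcribe targets maximum accuracy.
    """
    return "gpt-4o-mini-transcribe" if prefer_lower_cost else "gpt-4o-transcribe"


def transcribe_file(audio_path, prefer_lower_cost=False):
    """Upload an audio file and return its plain-text transcript."""
    from openai import OpenAI  # lazy import; requires OPENAI_API_KEY

    client = OpenAI()
    with open(audio_path, "rb") as audio:
        return client.audio.transcriptions.create(
            model=pick_transcribe_model(prefer_lower_cost),
            file=audio,
            response_format="text",
        )


# Example usage (makes a live API call):
# print(transcribe_file("call_recording.mp3"))
```

Because both models share the same endpoint as Whisper did, switching an existing pipeline over is largely a matter of changing the model name.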


OpenAI Shifts Policy – No More Open-Source Transcription Models

Unlike Whisper, which was released as an open-source model, OpenAI's new models will not be open-sourced. Citing their larger size and higher computational demands, OpenAI has decided not to release the model weights publicly; the models are accessible only through its API.

The company stated that releasing AI models as open-source requires careful consideration, and they are evaluating the best approach for the future.


AI Gets Closer to Human-Level Performance

  • More natural speech – AI can express emotions and adjust its tone.

  • More accurate transcription – Better handling of accents and background noise.

  • Fewer AI hallucinations – Reduced transcription errors.

  • No open-source access – Available only via OpenAI’s API.


For developers and businesses looking for better speech AI and transcription solutions, this upgrade marks another significant step toward human-like AI.

What do you think AI-powered speech should be used for? Share your thoughts in the comments!


Source: TechCrunch
