Coinciding with the introduction of the ChatGPT API, OpenAI has released the Whisper API, a hosted version of the Whisper speech-to-text model that was made available to the public in September.
Whisper is an automatic speech recognition system that, according to OpenAI, can perform “robust” transcription in multiple languages and translate from those languages into English, at a cost of $0.006 per minute. It accepts files in the M4A, MP3, MP4, MPEG, MPGA, WAV, and WEBM formats.
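For a sense of what calling the hosted API looks like, here is a minimal sketch using the official `openai` Python package's v1 client. The `transcribe` wrapper, the file names, and the cost helper are illustrative assumptions, not code from OpenAI's announcement; only the model identifier `whisper-1`, the per-minute price, and the format list come from the launch details.

```python
# Sketch: sending an audio file to the hosted Whisper API.
# Assumes the `openai` package (v1.x) is installed and OPENAI_API_KEY is set.
from pathlib import Path

# File formats the Whisper API accepts, per OpenAI's announcement.
SUPPORTED_FORMATS = {"m4a", "mp3", "mp4", "mpeg", "mpga", "wav", "webm"}

PRICE_PER_MINUTE_USD = 0.006  # OpenAI's stated Whisper API pricing

def cost_usd(minutes: float) -> float:
    """Estimate the transcription cost for a recording of the given length."""
    return round(minutes * PRICE_PER_MINUTE_USD, 4)

def transcribe(path: str) -> str:
    """Upload an audio file to the Whisper API and return the transcript text."""
    ext = Path(path).suffix.lstrip(".").lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {ext}")
    # Import here so format validation works without network access or a key.
    from openai import OpenAI
    client = OpenAI()
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(model="whisper-1", file=audio)
    return result.text
```

At the stated rate, a one-hour recording would cost roughly $0.36 to transcribe, which is what makes the hosted API competitive with running the open-source model yourself.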
Speech recognition is a crowded field: state-of-the-art systems sit at the heart of products from industry leaders like Google, Amazon, and Meta. According to Greg Brockman, president and chairman of OpenAI, Whisper’s improved handling of unique accents, background noise, and technical jargon comes from its training on 680,000 hours of multilingual and “multitask” data collected from the web.
Speaking in a video call with TechCrunch yesterday afternoon, Brockman said that releasing the model alone was not enough to spur an entire developer ecosystem around it. “While the same large model is available as open source, we’ve optimized the Whisper API to the nth degree. It’s incredibly quick and easy to use,” he said.
Brockman’s point is borne out by the many obstacles standing in the way of widespread business adoption of voice transcription software. A 2020 Statista survey found that the main barriers preventing businesses from adopting tech like speech-to-text were concerns over accuracy, accent- or dialect-related recognition issues, and cost.
Whisper has its limits, though, particularly around “next-word” prediction. Because the system was trained on a large amount of noisy data and tries to predict the next word in the audio at the same time as it transcribes the recording, OpenAI warns that Whisper’s transcriptions may contain words that were never actually spoken. The error rate also climbs for speakers of languages that are under-represented in the training data, meaning Whisper’s performance is not consistent across languages.
Unfortunately, that last problem is hardly unique to Whisper. Even the best systems have long been plagued by bias; a 2020 Stanford study found that systems from Amazon, Apple, Google, IBM, and Microsoft made far fewer errors with white users than with Black users — a word error rate of about 19% versus roughly 35%.
Despite this, OpenAI expects Whisper’s transcription capabilities to be used to enhance other apps, services, products, and tools. The AI-driven language learning app Speak already uses the Whisper API to power an in-app virtual speaking companion.
Microsoft-backed OpenAI stands to profit handsomely if it can break into the speech-to-text market in a big way: by one estimate, the market could be worth $5.4 billion by 2026, up from $2.2 billion in 2021.
“Our vision is that we really want to be this universal intelligence,” Brockman said. “We really want to, very flexibly, be able to take in whatever kind of data you have — whatever kind of task you want to accomplish — and be a force multiplier on that attention.”