(2024-10-29) Bjarnason The Risks Of OpenAI's Whisper Audio Transcription Model

Baldur Bjarnason: The risks of OpenAI's Whisper audio transcription model. This weekend a story from ABC News on issues with audio transcription machine learning models did the rounds. “Researchers say an AI-powered transcription tool used in hospitals invents things no one ever said”. The report highlights a number of very serious and real issues, but in the process glosses over a few details that might be important. ((2024-10-25) Researchers Say An AI-powered Transcription Tool Used In Hospitals Invents Things No One Ever Said)

There are two core conclusions to take away from it.

The Nabla service itself seems flawed

First, the Nabla service, an audio transcription and summarisation service that targets medical professionals, is specifically using Whisper in a context that OpenAI itself recommends against. Second, OpenAI's Whisper model itself also seems flawed.

The study seems to show a 1-2% hallucination rate, depending on speech type.

In the study, each audio segment represents, roughly, a sentence. This would mean, according to the study's results, that about 1 or 2 of every 100 sentences would seem to contain a fabrication.

A 1% rate is very easy to miss, especially because these models tend towards plausible fabrications, but it could be catastrophic at scale depending on the industry in question.

What's worrying is that the analysis seems to show a good chunk of the fabrications, around 40%, are outright harmful.

This would mean that if this tech is rolled out widely in sensitive industries such as healthcare, even with some safeguards, it would be very likely to result in serious harm or even death for a non-trivial number of people.
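To make the "catastrophic at scale" point concrete, here is a rough back-of-envelope sketch. Only the per-sentence fabrication rate and the ~40% harmful fraction come from the study as summarised above; the note length and daily volume are assumed figures purely for illustration.

```python
# Back-of-envelope estimate of how a ~1% per-sentence fabrication rate
# compounds at clinical scale. Only the 1% rate and the ~40% "harmful"
# fraction come from the study; every other number is an assumption.

fabrication_rate = 0.01      # ~1 fabrication per 100 sentences (study's lower bound)
harmful_fraction = 0.40      # ~40% of fabrications judged harmful (study)

sentences_per_note = 20      # ASSUMPTION: length of a typical clinical note/summary
notes_per_day = 5_000        # ASSUMPTION: daily volume for a mid-sized hospital system

# Probability that a single note contains at least one fabrication.
p_note_fabricated = 1 - (1 - fabrication_rate) ** sentences_per_note

# Expected number of harmful fabrications per day across the whole system.
harmful_per_day = notes_per_day * sentences_per_note * fabrication_rate * harmful_fraction

print(f"P(note contains a fabrication) ~= {p_note_fabricated:.1%}")      # ~18%
print(f"Expected harmful fabrications per day ~= {harmful_per_day:.0f}")  # ~400
```

Even at the study's lower-bound rate and with these illustrative volumes, harmful fabrications become a routine daily occurrence rather than a rare edge case, which is the scale argument being made here.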

We found no evidence of hallucinations in competing speech recognition systems such as Google Speech-to-Text (tested in April 2023) or the latest Google Chirp model (tested in December 2023)...We similarly identified exactly 0 comparable hallucination concerns among the same 187 audio segments from Amazon, Microsoft, AssemblyAI, and RevAI speech-to-text services.

The phrase "as such, we believe hallucinations to currently be an OpenAI-specific concern" deserves a call-out, as I think that might end up being a recurring pattern in the future. Whatever the "AI" industry does, OpenAI seems to be doing it with less care, less safety, and more haphazard governance.

