Researchers Say AI Transcription Tool Used In Hospitals Invents Things (apnews.com) 32
Longtime Slashdot reader AmiMoJo shares a report from the Associated Press: Tech behemoth OpenAI has touted its artificial intelligence-powered transcription tool Whisper as having near "human level robustness and accuracy." But Whisper has a major flaw: It is prone to making up chunks of text or even entire sentences, according to interviews with more than a dozen software engineers, developers and academic researchers. Those experts said some of the invented text -- known in the industry as hallucinations -- can include racial commentary, violent rhetoric and even imagined medical treatments. Experts said that such fabrications are problematic because Whisper is being used in a slew of industries worldwide to translate and transcribe interviews, generate text in popular consumer technologies and create subtitles for videos.
The full extent of the problem is difficult to discern, but researchers and engineers said they frequently have come across Whisper's hallucinations in their work. A University of Michigan researcher conducting a study of public meetings, for example, said he found hallucinations in eight out of every 10 audio transcriptions he inspected, before he started trying to improve the model. A machine learning engineer said he initially discovered hallucinations in about half of the over 100 hours of Whisper transcriptions he analyzed. A third developer said he found hallucinations in nearly every one of the 26,000 transcripts he created with Whisper. The problems persist even in well-recorded, short audio samples. A recent study by computer scientists uncovered 187 hallucinations in more than 13,000 clear audio snippets they examined. That trend would lead to tens of thousands of faulty transcriptions over millions of recordings, researchers said. Further reading: AI Tool Cuts Unexpected Deaths In Hospital By 26%, Canadian Study Finds
Testing Methodology? (Score:5, Insightful)
So what testing methods did OpenAI use to ensure this product would meet the appropriate mean time between faults for a medical environment?
Validation Methodology. (Score:3, Insightful)
So what testing methods did OpenAI use to ensure this product would meet the appropriate mean time between faults for a medical environment?
What medical environment accepted this pathetic bullshit after finding the first three reports full of imaginary medical “problems”?
Fault the controlled environment that should never have accepted a P.T. Barnum-grade attempt at selling snake oil.
Not news. (Score:5, Funny)
They've been using that AI in the billing department for years.
setting a low bar (Score:2)
near "human level robustness and accuracy."
That's damning with faint praise. Have you met some people?
Re: (Score:2)
What, again? (Score:1)
Re: (Score:2)
Re: (Score:2)
Not a dupe? (Score:4, Informative)
This is not a dupe, it's a transcription of https://tech.slashdot.org/stor... [slashdot.org]
Re: (Score:2)
This is not a dupe, it's a transcription of https://tech.slashdot.org/stor... [slashdot.org]
Might be interesting to play a telephone game with these LLM transcription services. See what every new hallucination brings. Then perform a triple modular redundancy transcription and see if that can succeed without error.
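A minimal sketch of that triple-modular-redundancy idea, assuming three independent transcription passes over the same audio and a naive word-by-word majority vote (real transcripts would need proper alignment first; the example strings are made up):

from collections import Counter
from itertools import zip_longest

def majority_transcript(t1: str, t2: str, t3: str) -> str:
    # Vote word-by-word across the three passes; where all three disagree
    # there is no majority, so keep the first word but flag it for review.
    voted = []
    for words in zip_longest(t1.split(), t2.split(), t3.split(), fillvalue=""):
        word, count = Counter(words).most_common(1)[0]
        voted.append(word if count >= 2 else f"[?{words[0]}]")
    return " ".join(w for w in voted if w)

# One hallucinated pass gets outvoted by the other two.
print(majority_transcript(
    "take two tablets daily",
    "take two tablets daily",
    "take ten tablets daily",
))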
A possible mitigation (Score:1)
Privacy rises above the other considerations. While the problem exists, it can't be solved by keeping the original audio.
Traditional TTS can be used to provide a transcription with more obvious failures. What they should do is produce both the LLM and TTS transcriptions, and then compare the two and highlight differences so that they can be made known.
At that point we won't have solved the issue, but we will be one step closer to the solution, and we will know where the likely errors in the transcription process are.
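A minimal sketch of that compare-and-highlight step, assuming a Whisper transcript plus a second, conventional speech-recognition pass (the strings below are placeholders; difflib is used only because it is in the standard library):

import difflib

whisper_text = "patient reports mild headache and was prescribed antibiotics"
second_pass_text = "patient reports mild headache"

# Word-level diff between the two transcriptions; words marked with "+"
# appear only in the Whisper output and are candidates for human review.
diff = difflib.unified_diff(
    second_pass_text.split(),
    whisper_text.split(),
    fromfile="second_engine",
    tofile="whisper",
    lineterm="",
)
print("\n".join(diff))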
Re: (Score:2)
TTS is Text To Speech, the opposite of dictation/speech-to-text
Readers Say AI Editors Used In /. Reposts Things (Score:4, Informative)
Seriously, it hasn't even been 3 days.
Dupe Dupe Dupe (Score:3)
It's like deja vu all over again....
https://tech.slashdot.org/stor... [slashdot.org]
You don't say? (Score:2)
medical people are captive of asinine procedures (Score:2)
Doctors don't think, they just follow procedures now.
The medical administrators have "MBA'd" the operation: They outsource their brain and all operational functions to these all-in-one corporate systems, and when they get cryptojacked, all hospital staff are slack jawed wondering what to do.
Best advice is don't get sick, because your needs are
Re: (Score:2)
Most doctors and nurses aren't all that happy about this situation either. The last time my mom went in for a surgery was a day that the board was going to walk through the hospital and investigate procedure. The doctors were absolutely puckered. Every little thing had to be perfect, or else. It was ridiculous, seeing a hospital run like any other big business. It wasn't about taking care of the people that day. It was about presenting well for the board. We're well past the point where patient care takes priority.
Re: (Score:2)
It's been said for a long time that machine-like thinking drains us of our intuition and other intelligences, especially the ones which are more in touch with contextual realities. Many things which are in essence good, like DEI movements, are done in a machine-like, blind, RoboCop "put down the weapon", self-defeating way, because people aren't allowed to express intuitive contextual perceptions.
Re: (Score:2)
But it seems clear you're correct that people don't have intuition or creativity... it's a type of learned helplessness.
A guy at my health club says his kids have no idea where they live; you couldn't give them directions, because they rely on Google Maps, for instance.
Smart tools make you dumber... and create a dependency.
That's why I sometimes joke I program with sticks and stones
Re: (Score:2)
But since when did outsourcing your whole operation make sense? Not much need for management when someone else is doing all the thinking. What these educated morons need to see is the downsides of the monoculture. They are supposed
Re: (Score:2)
MBAs will be the death of us all. Some of us more quickly than others. They'll kill the world for one more quarter of increased profits if we don't curb-stomp them out of our system. But apparently we're stuck in full worship mode when it comes to the worthless bastards. Profit above all. Greed is God.
Corporate BS Generator (Score:2)
What did we expect? When the sound recording quality drops, the model just wants to continue going with the usual corporate BS narrative because that's what it was trained on/for.
duplicate news (Score:1)
Malfunction (Score:2)
LLMs do not hallucinate, as that would require some kind of intelligence. LLMs malfunction, which is what this one is doing.
Re: (Score:2)
It's not even a malfunction; it's what they were designed to do. LLMs are not reasoning machines, they are language-processing machines. The snag is that the models have no good way to fix these unusual results without adding even more layers, internal correction loops (i.e., compare multiple answers and choose the best), or other complications. Then they're no longer LLMs; the LLM is now just one component of a larger model, which to me is a good thing: use the LLM as a building block instead of ramp
I prefer hallucinations vs doctor’s handwriting (Score:1)
Hang on a minute... (Score:2)
Well, I'll be touched by a BBC presenter! What a surprise!
Researchers = people who want attention (Score:2)
I still can't get over this excerpt:
"Researchers aren't certain why Whisper and similar tools hallucinate, but software developers said the fabrications tend to occur amid pauses, background sounds or music playing."
Did these "researchers" just ignore confidence scores and turn up the temperature of the model to 11? It is after all one of those articles cheerleading regulation. I'm sure that will lead to perfect STT.
"The prevalence of such hallucinations has led experts, advocates and former OpenAI employee
The "AI randomly starts to bull**** problem" (Score:2)
[ ] Solved
[X] Not Solved