
Researchers Say AI Transcription Tool Used In Hospitals Invents Things (apnews.com)

Longtime Slashdot reader AmiMoJo shares a report from the Associated Press: Tech behemoth OpenAI has touted its artificial intelligence-powered transcription tool Whisper as having near "human level robustness and accuracy." But Whisper has a major flaw: It is prone to making up chunks of text or even entire sentences, according to interviews with more than a dozen software engineers, developers and academic researchers. Those experts said some of the invented text -- known in the industry as hallucinations -- can include racial commentary, violent rhetoric and even imagined medical treatments. Experts said that such fabrications are problematic because Whisper is being used in a slew of industries worldwide to translate and transcribe interviews, generate text in popular consumer technologies and create subtitles for videos.

The full extent of the problem is difficult to discern, but researchers and engineers said they frequently have come across Whisper's hallucinations in their work. A University of Michigan researcher conducting a study of public meetings, for example, said he found hallucinations in eight out of every 10 audio transcriptions he inspected, before he started trying to improve the model. A machine learning engineer said he initially discovered hallucinations in about half of the over 100 hours of Whisper transcriptions he analyzed. A third developer said he found hallucinations in nearly every one of the 26,000 transcripts he created with Whisper. The problems persist even in well-recorded, short audio samples. A recent study by computer scientists uncovered 187 hallucinations in more than 13,000 clear audio snippets they examined. That trend would lead to tens of thousands of faulty transcriptions over millions of recordings, researchers said.
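Back-of-the-envelope check on that extrapolation (my arithmetic, not the study's), assuming the snippet sample is representative:

```python
hallucinations, snippets = 187, 13_000   # figures quoted above
rate = hallucinations / snippets         # roughly 1.4% of snippets affected
print(f"{rate:.1%} of snippets; ~{rate * 1_000_000:,.0f} faulty transcriptions per million recordings")
```

At that rate a million recordings would yield on the order of fourteen thousand faulty transcriptions, consistent with the "tens of thousands over millions of recordings" claim.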
Further reading: AI Tool Cuts Unexpected Deaths In Hospital By 26%, Canadian Study Finds


Comments Filter:
  • by Drethon ( 1445051 ) on Tuesday October 29, 2024 @09:15AM (#64902531)

    So what testing methods did OpenAI use to ensure this product would meet the appropriate mean time between faults for a medical environment?

    • So what testing methods did OpenAI use to ensure this product would meet the appropriate mean time between faults for a medical environment?

      What medical environment accepted this pathetic bullshit after finding the first three reports full of imaginary medical “problems”?

      Fault the controlled environment that should never have accepted a P.T. Barnum-grade attempt at selling “enhancing” snake oil.

  • Not news. (Score:5, Funny)

    by msauve ( 701917 ) on Tuesday October 29, 2024 @09:16AM (#64902539)
    >AI Transcription Tool Used In Hospitals Invents Things

    They've been using that AI in the billing department for years.
  • by Anonymous Coward
    Clearly they're also posting dupes [slashdot.org].
  • Not a dupe? (Score:4, Informative)

    by billybob2001 ( 234675 ) on Tuesday October 29, 2024 @09:19AM (#64902547)

    This is not a dupe, it's a transcription of https://tech.slashdot.org/stor... [slashdot.org]

    • This is not a dupe, it's a transcription of https://tech.slashdot.org/stor... [slashdot.org]

      Might be interesting to play a telephone game with these LLM transcription services. See what every new hallucination brings. Then perform a triple modular redundancy transcription and see if that can succeed without error.
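A minimal sketch of that triple modular redundancy idea, assuming three independent transcripts of the same audio are already in hand (the function and variable names are made up for illustration):

```python
from collections import Counter

def tmr_transcript(run_a: str, run_b: str, run_c: str):
    """Triple modular redundancy at the transcript level: accept the text only
    if at least two of the three runs agree exactly after trivial case and
    whitespace normalization; otherwise flag the audio for human review."""
    normalized = [" ".join(t.lower().split()) for t in (run_a, run_b, run_c)]
    text, votes = Counter(normalized).most_common(1)[0]
    return text if votes >= 2 else None  # None => no majority, send to a human

# Two runs agree; the third hallucinates an extra word and is outvoted.
print(tmr_transcript("Take two tablets daily.",
                     "take two tablets daily.",
                     "Take two hyperactivated tablets daily."))
```

Exact-match voting is deliberately coarse; word-level alignment would catch smaller disagreements, but for hallucinated whole sentences even this blunt vote flags the bad run.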

  • by Anonymous Coward

    Privacy rises above the other considerations: the problem is real, but it can't be solved by keeping the original audio around.

    Traditional STT (speech-to-text) can be used to provide a transcription with more obvious failures. What they should do is produce both the LLM and STT transcriptions, then compare the two and highlight the differences so that they can be made known.

    At that point we won't have solved the issue, but we will be one step closer to the solution, and we will know where likely errors in the transcription proces
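The compare-and-highlight step described above could be sketched with Python's difflib, assuming both transcripts are plain strings (names are illustrative, not from any real pipeline):

```python
import difflib

def transcript_disagreements(llm_text: str, stt_text: str):
    """Align two transcripts word by word and return the spans where they
    disagree, so a reviewer can check exactly those spots against the audio."""
    llm_words, stt_words = llm_text.split(), stt_text.split()
    matcher = difflib.SequenceMatcher(a=llm_words, b=stt_words)
    return [
        {"op": op,                                  # replace / insert / delete
         "llm": " ".join(llm_words[a0:a1]),
         "stt": " ".join(stt_words[b0:b1])}
        for op, a0, a1, b0, b1 in matcher.get_opcodes()
        if op != "equal"
    ]

# The LLM run invents a phrase the conventional engine never produced:
print(transcript_disagreements(
    "take two tablets of hyperactivated antibiotics daily",
    "take two tablets daily",
))
```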

  • by TigerPlish ( 174064 ) on Tuesday October 29, 2024 @09:22AM (#64902561)

    Seriously, it hasn't even been 3 days.

  • by JustAnotherOldGuy ( 4145623 ) on Tuesday October 29, 2024 @09:24AM (#64902567) Journal

    It's like deja vu all over again....

    https://tech.slashdot.org/stor... [slashdot.org]

  • We tested Copilot for some reason. It mishears one word and goes off to some weird places in its transcriptions of meetings, which typically end up longer than the meeting itself. It has no idea what's important or what we're talking about. It pretty much just makes every sentence a bullet point and then invents a bunch of BS we didn't even say.
  • Medical people are well-educated idiots. They spend all their time on iPads and computer screens trying to figure out which button to press or which field to fill in.
    Doctors don't think; they just follow procedures now.

    The medical administrators have "MBA'd" the operation: They outsource their brain and all operational functions to these all-in-one corporate systems, and when they get cryptojacked, all hospital staff are slack jawed wondering what to do.

    Best advice is don't get sick, because your needs are
    • Most doctors and nurses aren't all that happy about this situation either. The last time my mom went in for a surgery was a day that the board was going to walk through the hospital and investigate procedure. The doctors were absolutely puckered. Every little thing had to be perfect, or else. It was ridiculous, seeing a hospital run like any other big business. It wasn't about taking care of the people that day. It was about presenting well for the board. We're well past the point where patient care takes p

      • by Bongo ( 13261 )

        It's been said for a long time that machine-like thinking drains us of our intuition and other intelligences, especially the ones which are more in touch with contextual realities. Many things which are in essence good, like DEI movements, are done in a machine-like, blind, RoboCop "put down the weapon", self-defeating way, because people aren't allowed to express intuitive contextual perceptions.

        • for sure, I'm saying " the smarter the tool, the dumber the operator".. feel free to quote me...
          but it seems clear you're correct that people don't have intuition or creativity.. it's a type of learned helplessness...
          a guy at my health club says his kids have no idea where they live, you couldn't give them directions, because they rely on google maps for instance.
          Smart tools make you dumber... and create a dependency.
          That's why I sometimes joke I program with sticks and stones ... I only use bluefish and na
      • I think you are quite right about the profit motive overriding all other reasonable concerns. I can also accept that the front line health care workers in general are dismayed at having to work within these systems. We all are trapped in someone else's maze, not saying I'm exempt.

        But since when did outsourcing your whole operation make sense? Not much need for management when someone else is doing all the thinking. What these educated morons need to see is the downsides of the monoculture. They are supposed
        • MBAs will be the death of us all. Some of us more quickly than others. They'll kill the world for one more quarter of increased profits if we don't curb-stomp them out of our system. But apparently we're stuck in full worship mode when it comes to the worthless bastards. Profit above all. Greed is God.

  • What did we expect? When the sound recording quality drops, the model just wants to continue going with the usual corporate BS narrative because that's what it was trained on/for.

  • It's not interesting, and it has already been posted. https://tech.slashdot.org/stor... [slashdot.org]
  • LLMs do not hallucinate, as that is something that requires some kind of intelligence. LLMs malfunction, which is what this is doing.

    • It's not even a malfunction, it's what they were designed to do. LLMs are not reasoning machines, they are language processing machines. The snag is that the models have no good ways to fix these unusual results without adding even more layers, internal correction loops (ie, compare multiple answers and choose what is best), or other complications. Then they're no longer LLMs but the LLMs are now just one component of a larger model: which to me is a good thing, use LLM as a building block instead of ramp

  • Manual inputs will also be prone to mistakes, especially since time per patient has been shrinking constantly with no improvement in sight. Freeing up medics’ attention to do other things might help correct such mistakes. Medical services are at least 50% bureaucracy; any help there will do miracles.
  • ...are they complaining that using a service based on a generative LLM, whose sole function is to make things up according to input text, is making things up?

    Well, I'll be touched by a BBC presenter! What a surprise!
  • I still can't get over this excerpt:

    "Researchers aren't certain why Whisper and similar tools hallucinate, but software developers said the fabrications tend to occur amid pauses, background sounds or music playing."

    Did these "researchers" just ignore confidence scores and turn up the temperature of the model to 11? It is, after all, one of those articles cheerleading for regulation. I'm sure that will lead to perfect STT.

    "The prevalence of such hallucinations has led experts, advocates and former OpenAI employee
