Hospitals use transcription tool powered by hallucination-prone OpenAI model

A few months ago, my doctor showed off an AI transcription tool he used to record and summarize his patient meetings. In my case, the summary was fine, but researchers cited by ABC News have found that's not always the case with OpenAI's Whisper, which powers a tool many hospitals use; sometimes it just makes things up entirely.

Whisper is used by a company called Nabla for a medical transcription tool that, Nabla estimates, has transcribed 7 million medical conversations, according to ABC News. More than 30,000 doctors and 40 health systems use it, the outlet writes. Nabla is reportedly aware that Whisper can hallucinate and is "addressing the issue."

A group of researchers from Cornell University, the University of Washington, and others found in a study that Whisper hallucinated in about 1 percent of transcripts, making up entire sentences with sometimes violent sentiments or nonsensical phrases during silences in the recordings. The researchers, who gathered audio samples from TalkBank's AphasiaBank as part of the study, noted that silence is particularly common when someone with the language disorder aphasia is speaking.

One of the researchers, Allison Koenecke of Cornell University, posted examples of such hallucinations in a thread about the study.

The researchers found that the hallucinations also included invented medical conditions or phrases you might expect from a YouTube video, such as "Thanks for watching!" (OpenAI reportedly used Whisper to transcribe over a million hours of YouTube videos to train GPT-4.)

The study was presented in June at the Association for Computing Machinery's FAccT conference in Brazil. It's not clear whether it has been peer-reviewed.

OpenAI spokesperson Taya Christianson sent a statement to The Verge via email:

We take this issue seriously and continually work to improve, including reducing hallucinations. For use of Whisper on our API platform, our usage policies prohibit use in certain high-risk decision-making contexts, and our model card for open source usage includes recommendations against use in high-risk domains. We thank the researchers for sharing their findings.
