ChatGPT-4 Beat Doctors at Diagnosing Illness, Study Finds (nytimes.com)
Dr. Adam Rodman, a Boston-based internal medicine expert, helped design a study testing 50 licensed physicians to see whether ChatGPT improved their diagnoses, reports the New York Times. The results? "Doctors who were given ChatGPT-4 along with conventional resources did only slightly better than doctors who did not have access to the bot.
"And, to the researchers' surprise, ChatGPT alone outperformed the doctors." [ChatGPT-4] scored an average of 90 percent when diagnosing a medical condition from a case report and explaining its reasoning. Doctors randomly assigned to use the chatbot got an average score of 76 percent. Those randomly assigned not to use it had an average score of 74 percent.
The study showed more than just the chatbot's superior performance. It unveiled doctors' sometimes unwavering belief in a diagnosis they had made, even when a chatbot suggested a potentially better one.
And the study illustrated that while doctors are being exposed to the tools of artificial intelligence for their work, few know how to exploit the abilities of chatbots. As a result, they failed to take advantage of A.I. systems' ability to solve complex diagnostic problems and offer explanations for their diagnoses. A.I. systems should be "doctor extenders," Dr. Rodman said, offering valuable second opinions on diagnoses.
"The results were similar across subgroups of different training levels and experience with the chatbot," the study concludes. "These results suggest that access alone to LLMs will not improve overall physician diagnostic reasoning in practice.
"These findings are particularly relevant now that many health systems offer Health Insurance Portability and Accountability Act-compliant chatbots that physicians can use in clinical settings, often with no to minimal training on how to use these tools."
"And, to the researchers' surprise, ChatGPT alone outperformed the doctors." [ChatGPT-4] scored an average of 90 percent when diagnosing a medical condition from a case report and explaining its reasoning. Doctors randomly assigned to use the chatbot got an average score of 76 percent. Those randomly assigned not to use it had an average score of 74 percent.
The study showed more than just the chatbot's superior performance. It unveiled doctors' sometimes unwavering belief in a diagnosis they made, even when a chatbot potentially suggests a better one.
And the study illustrated that while doctors are being exposed to the tools of artificial intelligence for their work, few know how to exploit the abilities of chatbots. As a result, they failed to take advantage of A.I. systems' ability to solve complex diagnostic problems and offer explanations for their diagnoses. A.I. systems should be "doctor extenders," Dr. Rodman said, offering valuable second opinions on diagnoses.
"The results were similar across subgroups of different training levels and experience with the chatbot," the study concludes. "These results suggest that access alone to LLMs will not improve overall physician diagnostic reasoning in practice.
"These findings are particularly relevant now that many health systems offer Health Insurance Portability and Accountability Act-compliant chatbots that physicians can use in clinical settings, often with no to minimal training on how to use these tools."
Dunning-Kruger effect (Score:5, Insightful)
So the AI was 90% accurate, but most of the time doctors didn't trust it, so they went ahead with their own incorrect diagnosis? One thing I want to know is how bad the 10% that the AI missed were ... like major blunders or what? Also, what about the 26% that the doctors missed ... how severe were those errors? Anyone read the actual study? (Yes, I know it's linked, but I'm a slashdotter.)
Re:Dunning-Kruger effect (Score:5, Insightful)
In a binary classification task, there are two numbers that should be reported: the false positive and false negative rates, or alternatively recall and precision, or alternatively the full confusion matrix, etc.
The point is that comparisons of classifiers (human doctors or AI) are impossible on a single linear scale, and anyone who reports results that way is biased. The math says so.
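To make that concrete, here is a minimal sketch in Python (the labels and predictions are invented purely for illustration, not taken from the study) showing how a single accuracy number can hide the two error types:

# Toy binary task: 1 = disease present, 0 = healthy. Invented data.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
preds = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # a classifier that almost always says "healthy"

tp = sum(1 for t, p in zip(truth, preds) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(truth, preds) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(truth, preds) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(truth, preds) if t == 0 and p == 0)

accuracy = (tp + tn) / len(truth)   # 0.7  -- looks respectable
precision = tp / (tp + fp)          # 1.0  -- never cries wolf
recall = tp / (tp + fn)             # 0.25 -- misses 3 of the 4 sick patients

print(accuracy, precision, recall)

The single 70% figure says nothing about the fact that three of the four sick patients were sent home, which is exactly why at least two numbers are needed.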
Re:Dunning-Kruger effect (Score:5, Insightful)
In a binary classification task, there are two numbers that should be reported: the false positive and false negative rates, or alternatively recall and precision, or alternatively the full confusion matrix, etc.
This makes no sense. The answers were graded by a panel of expert doctors. It wasn't a binary classification task, there were multiple answers to each question.
Re: (Score:2, Informative)
In a multiple-class problem (say N possible answers) there is an N×N confusion matrix, so there are even more numbers that must be reported to compare two classifiers. Also, a multiple-class problem can always be represented as a sequence of binary classifications, so there is really no loss of generality.
In all cases, accuracy alone is not a useful way to compare two N-way classifiers or even rank a collection of them.
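As a hypothetical illustration of that last point (Python, invented numbers, not figures from the study): two 3-way classifiers can post identical accuracy while behaving very differently on the rare class.

# Hypothetical 3-class diagnosis problem: conditions A (common), B, C (rare).
truth = ["A"] * 80 + ["B"] * 15 + ["C"] * 5

# Classifier 1 nails the common classes but never predicts the rare one.
preds1 = ["A"] * 80 + ["B"] * 15 + ["B"] * 5
# Classifier 2 makes its mistakes on the common class instead.
preds2 = ["A"] * 75 + ["B"] * 5 + ["B"] * 15 + ["C"] * 5

def accuracy(truth, preds):
    return sum(t == p for t, p in zip(truth, preds)) / len(truth)

def recall_for(truth, preds, cls):
    hits = sum(1 for t, p in zip(truth, preds) if t == cls and p == cls)
    return hits / sum(1 for t in truth if t == cls)

print(accuracy(truth, preds1), accuracy(truth, preds2))  # 0.95 0.95 -- identical
print(recall_for(truth, preds1, "C"))                    # 0.0 -- misses every rare case
print(recall_for(truth, preds2, "C"))                    # 1.0 -- catches all of them

Ranking the two on accuracy alone would call them equal; a doctor (or a patient with condition C) would not.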
Re: (Score:2)
Re: Dunning-Kruger effect (Score:5, Informative)
How is it not a classifier? It takes a bunch of encoded information about signs and symptoms, and then attempts to identify the condition associated with them. Taking a large amount of fuzzy information and telling you one (or a few) things you could be looking at is pretty much exactly the definition of a classifier.
Re: (Score:2)
Re: (Score:2)
Absolutely - you know - things that don't fit the description above. For example, generating huge fuzzy outputs from relatively small inputs, like image generation.
Re: (Score:2)
Re: (Score:2)
Ironic is the subject line.
Re:Dunning-Kruger effect (Score:5, Insightful)
Also, what percentage of the 10% were blazingly wrong bull**** answers, AKA "hallucinations"?
As in "70 YO male, lifetime smoker, presents with a persistent cough and severe shortness of breath" = "gangrenous foot, immediate amputation required to save patient"
Re: (Score:3, Interesting)
Also, what percentage of the 10% were blazingly wrong bull**** answers, AKA "hallucinations"?
If they were, then that makes the human doctors look even worse.
If the incorrect ChatGPT diagnoses were reasonable, the doctors likely made the same errors, and got an additional 14% wrong.
But if the incorrect ChatGPT diagnoses were blazing wrong bull****, the doctors should've easily corrected them, and got an additional 24% wrong.
Re: (Score:2)
Also, what percentage of the 10% were blazingly wrong bull**** answers, AKA "hallucinations"?
If they were, then that makes the human doctors look even worse.
If the incorrect ChatGPT diagnoses were reasonable, the doctors likely made the same errors, and got an additional 14% wrong.
But if the incorrect ChatGPT diagnoses were blazing wrong bull****, the doctors should've easily corrected them, and got an additional 24% wrong.
Yeah, it only has to be better than the humans ... which seems to be a lower bar than expected.
Re: (Score:3)
Yeah, it only has to be better than the humans ... which seems to be a lower bar than expected.
It is a low bar indeed. So many people look at doctors as infallible. Meh. They are just people, and I'd venture a guess that the popular meme of their infallibility leads many to believe that any decision they make is correct.
I seldom go to see doctors, and as an example, my last visit during the plague year was for cellulitis. The doctor was apparently miffed that I had self-diagnosed, and wanted to prove to me it was something else. Then he was a bit more agitated that my diagnosis was correct.
Ego based diagno
Re:Dunning-Kruger effect (Score:4, Interesting)
You have lots of things right there. Especially about understanding that a doctor is a human and that you are responsible for your own body. That means that you are trying to use them as an expert to get advice rather than expecting them to fix you.
I think a useful approach for IT guys going to the doctor is to think about what you want in a good bug report. You want the full information, and you don't want any suggested diagnosis until you've heard everything. Someone comes to you with their disk full, you don't want to miss the fact that there's someone else writing to the disk at much greater speed than it should be. You don't want to hear "my disk is full, expand it for me". You want to hear "I have a 100TB disk, every time I clean out the logs and delete a month from the history it frees up 10TB, but then it fills up again in two days".
There's a reason the doctors' motto isn't "cure them all". It's "do no harm". If the doctor gives the wrong cure - chops off the wrong leg trying to get rid of your cancer, for example - that can be much worse than if they did nothing and left it for a different doctor to do the right cure. Nowadays they even changed the process for joint replacement so that the patient gets given a marker pen and writes something like "replace this knee" on their own leg.
Re: (Score:2)
Nowadays they even changed the process for joint replacement so that the patient gets given a marker pen and writes something like "replace this knee" on their own leg.
Yup - My SO had some shoulder operations over the last decade ending up with a full reverse replacement. Each time, they used the sharpie trick. A very good idea. It was a super success, her surgeon invited other residents in to observe during the follow up appointments.
I think that some doctors get annoyed with me because I approach them as an equal, not as an oracle of wisdom. Bruises da ego, I suppose.
Re: (Score:2)
If they were, then that makes the human doctors look even worse.
You are kind of assuming that ChatGPT only improved the human doctors. It could be that there were some diagnoses that the human doctors got right, but ChatGPT provided a convincing but completely wrong justification for a different diagnosis. I also wouldn't rule out an uncanny valley effect. Probably the doctors quite quickly understand that the AI doesn't know what it's talking about and just spouts from a kind of hidden script. They come to actively distrust the opinion of the AI and find it off-putting.
In a r
Re: (Score:2)
In a real sense, this probably says more bad things about AI in general than about these particular doctors.
No, it says quite a lot about the doctors.
Diagnostic medicine is the art of finding the signal in an absurd amount of noise.
If your answer to noisy signal is to discard it, then I sure as fuck don't want you treating me. Go kill someone else, thanks.
Re: (Score:2)
It's a good thing for algorithms, when they're wrong, to be spectacularly wrong. It makes it easier to identify the mistakes. Your guy with a cough is probably going to ask for a second opinion when they want to amputate his foot.
Theoretically this would also be good for the human physicians, but they tend to argue very effectively in their own favour.
Re:Dunning-Kruger effect (Score:5, Insightful)
Apart from this, doctors can be "intelligently wrong", by giving a diagnosis which is not chiseled in stone and starting a treatment that would also help related illnesses. How often has your doctor said "call me when things get worse" as he sent you home with a prescription?
Doctors do not want 100% accuracy, as the amount of work to get the last percents right is huge and they have other patients to treat. They want accuracy that is good enough.
Re:Dunning-Kruger effect (Score:5, Informative)
Part of the reason for this is that there are important concepts about being accurate and doing no harm. False positives and false negatives can be devastating. A false cancer diagnosis, for example, can ruin a patient's life, with substantial financial and psychological impact.
There's also the related important statistical concept of base rates of occurrence. If a test is 99% accurate with a 1% false positive rate, but the underlying rate of occurrence is very low, that 1% of false positives can lead to an overwhelming number of misdiagnoses. Not only do those carry a significant unnecessary burden for the patient, they create a similar burden for the healthcare system.
So, a doctor saying, "call me if it gets worse," is often thinking that it's very likely you have the flu, and much less likely that you have dengue fever. The conservative course will be the right one in the vast majority of the time. That idea is summed up in the saying famous within healthcare, "when you hear hooves, think horses, not zebras." Giving the patient an opportunity for re-review provides a path for treating the horse cases while providing a path to handle the zebras as well.
An important part of the cited test, which I'd like to read, is whether it presented cases with realistic rates of occurrence.
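For anyone who wants the arithmetic spelled out, here is a small sketch (Python, with made-up prevalence and test numbers, not figures from the study) of how a seemingly excellent test behaves when the condition is rare:

# Made-up numbers, purely for illustration.
population = 1_000_000
prevalence = 0.001        # 1 in 1,000 people actually has the disease
sensitivity = 0.99        # the test catches 99% of true cases
false_pos_rate = 0.01     # 1% of healthy people test positive anyway

sick = population * prevalence              # 1,000 people
healthy = population - sick                 # 999,000 people

true_positives = sick * sensitivity         # ~990
false_positives = healthy * false_pos_rate  # ~9,990

# Chance that a positive result really means disease (positive predictive value):
ppv = true_positives / (true_positives + false_positives)
print(round(ppv, 3))  # ~0.09 -- roughly 9 out of 10 positives are false alarms

With a 0.1% base rate, the "horses, not zebras" heuristic is doing real statistical work: most positives are noise unless they are followed up.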
Re: (Score:2, Insightful)
Re: (Score:2)
Sounds to me like his middle name is "Quack Quack," and he's chief snake oil salesman for some multi-level marketing company.
Re: (Score:2)
I dunno. I guess that he'd probably get short shrift from most MLM groups. They have better ethics than that. He should try somewhere more dubious, for example the US government that is to come.
Re: (Score:2)
He used to party with RFK jr.
Re:Dunning-Kruger effect (Score:5, Interesting)
This is not the first time this has been tried. Remember IBM Watson? It had better stats than this thing here, but unfortunately when it was wrong, it would have occasionally killed the patient. Hence the application scenario was scrapped. I bet it is the same here.
Re: (Score:2)
It had better stats than this thing here
No, it didn't.
On some cancers, it matched this. On most, it trailed wildly behind.
but unfortunately when it was wrong, it would have occasionally killed the patient.
You made this up. This is a risk for oncology in general.
Hence the application scenario was scrapped.
It most certainly fucking was not.
IBM Watson was, and is, still used.
However, it simply isn't that good, and is slowly being replaced by better things.
Why do you misinform? Were you touched in a naughty place by an LLM?
Re: (Score:2)
You really have not followed what happened, have you? And then you have to be an ad hominem-pushing asshole about it, which to me just indicates you have nothing solid.
Here is a reference for you: https://spectrum.ieee.org/how-... [ieee.org]
There are many more. Watson _failed_ in the medical space and it was due to, among other things, what today is called "hallucination".
It is also possible you are lacking another source of information I have: Several invitation-only talks with attached workshops by Watson developers a
Re: (Score:2)
You really have not followed what happened, have you?
Sure have. Everything I said was accurate, unlike your bullshit regurgitated from headlines attached to articles that you couldn't be bothered to read.
And then you have to be an ad hominem-pushing asshole about it
Using the phrase ad hominem to mean "insulting" doesn't make you sound smart. It makes you sound like you're trying to sound smart.
There was no argumentation via ad hominem here, so let that phrase rest.
which to me just indicates you have nothing solid.
Ah, finally, an actual fallacy.
If that's how you judge the correctness of people, I can tell you right now that you're very often wrong.
Here is a reference for you: https://spectrum.ieee.org/how-... [ieee.org]
Cute opinion piece,
Re: (Score:2)
To rescue your bullshit claims, go ahead and show us where it has killed people
I see your problem: You cannot read. Because I never once claimed it did kill people.
Re: (Score:2)
What you said was:
It had better stats than this thing here, but unfortunately when it was wrong, it would have occasionally killed the patient.
Which is something you pulled entirely from your ass.
Literally- you made it the fuck up.
Now slink the fuck away.
Re:Dunning-Kruger effect (Score:5, Informative)
*) The test didn't seem to subtract from the score for wrong answers. The LLM could have given a wildly wrong diagnosis and not have been penalized for it (that's what I understand from the paper).
*) The test is set up to give the LLM a bit of an advantage via the Jennings effect [go.com]; that is, not all the humans were able to finish all the test cases, and they were instructed to be slow and accurate instead of fast.
You might ask, "How did this paper pass peer review? What is wrong with you?" And the answer is they weren't trying to test LLMs against doctors. They were trying to test how well doctors worked when augmented by LLMs. They had a little side note about LLMs vs doctors, but they are fully aware (and clearly state) that this doesn't mean LLMs are better than doctors.
The main point is testing how well augmented doctors perform. The paper does good science (afaict) investigating this question. All the hype comes from the news article, and it is fake.
*tl;dr the article is hype, the paper is good.
Re: (Score:2)
Like previous attempts at doing this, trying to augment the causative understanding of doctors with ridiculously superior classification is a no-brainer, because it's the part that doctors, being human, statistically suck at.
People create this mythology around human pattern matching ability- but the fact is, we're flat out terrible at objectively matching. There's little evolutionary need for science. If a false pattern match turns into any kind of be
Re: (Score:2)
Augmentation of doctors with LLMs seems obvious to me. Like previous attempts at doing this, trying to augment the causative understanding of doctors with ridiculously superior classification is a no-brainer, because it's the part that doctors, being human, statistically suck at.
Ok, that's an intuition you've had that leads to a reasonable hypothesis. Sounds good.
Under the scrutiny provided by the study at hand, the hypothesis is unsupported. Maybe another study will provide better results.
Re: (Score:2)
Under the scrutiny provided by the study at hand, the hypothesis is unsupported. Maybe another study will provide better results.
I agree. All it really shows is that doctors continue to suck, and we need to find a way to prop up their very human failures so that their very excellent medical understanding can shine.
Re: (Score:2)
Re: (Score:2)
I think most people would disagree that "doctors suck,"
Hard to argue that doctors don't have a captive audience ;)
where is that coming from?
It's nothing personal- it's just numbers.
In general, MDs simply aren't very good [nih.gov] at their job.
Now that's not to say "I could do better" or whatever. And certainly some excel far more than the median, but the median is quite bad. Half of all doctors are even worse than that.
There's no need to worship these people. We need to invent the tools to make them better at what they do.
Re: (Score:2)
Re: (Score:2)
One thing I want to know is how bad the 10% that the AI missed were....
That's a good question. But I think a more pertinent one is, "how does a statistical likelihood of one letter following another lead to an accurate diagnosis?" To me, the most likely answer is twofold:
1) Lots of medical training data of diagnoses by humans. Without humans, LLMs are worthless. This is what AI proponents tend to sweep under the rug. Without continuous human output (data) to serve as LLM input, the LLMs will fall apart since LLM output cannot be used as LLM input without severe degradatio
Re: (Score:2)
There are dozens of ways AI can surpass humans even though humans taught it. Example 1: When a patient comes in with certain symptoms or test results, the doctor may tell them, don't worry, it's not cancer. But then eventually it does turn out to be cancer. The AI learns that if a patient has certain symptoms it's associated with cancer, so it can detect it earlier. Doctors usually make the mistaken diagnosis in the early stage, but then later you can't ignore it. The doctor may be in the habit of telling a pat
Re: (Score:2)
When a result seems very mysterious, it's often a good idea to consider whether your starting assumptions are true. Often you will discover that they are not.
Re: (Score:2)
understanding is not possible for an LLM
This is a religious argument.
You can't back it up with any kind of objective measure.
You are otherwise correct that current training methodologies do in fact require human output.
Re: (Score:3)
Article worth a read to see the flaws. As you say, no analysis on the severity of mis-diagnosis by any party. Also, median doc e
Doctors have used AI for decades (Score:3)
Re: (Score:1)
When I hear something like this about AI, the first thing I conclude is someone has heavily fudged the test to give the AI an unfair advantage, like they only chose inexperienced doctors to participate, or the test studies they were reviewing were also in the AI's training data so the real question is why it was only 90% accurate.
Study on ChatGPT-4 ... (Score:5, Insightful)
The doctors were fed information about the patients that was already suitable for giving to ChatGPT ... not required to gather the information themselves
So the largest part of a doctor's job was omitted, and replaced with data tailored for machines
The researchers gave the doctors little or no instruction on how to use ChatGPT, but then compared their results to the researchers using it with all their own ChatGPT skills ...
Study finds that people who know how to get the best out of ChatGPT use it well ... and doctors, when taken out of their normal environment, do not do as well ...
Re: Study on ChatGPT-4 ... (Score:4, Informative)
Re: Study on ChatGPT-4 ... (Score:4, Insightful)
Re: (Score:2)
I think that was the point of it being used as a tool by the doctor. ChatGPT itself cannot get the patient history, but, from what I understood, comparing it being used by a doctor as an assistant versus being fed the history and used directly showed that the doctors were not using (or trusting) the tool, because with such a low N, 74% to 76% means there was no difference.
I don't think training the doctors to use it is the issue, because using an LLM is as straightforward as it gets, it is just writing it do
Re:Study on ChatGPT-4 ... (Score:5, Interesting)
AI isn't the relevant problem here. (Score:1, Interesting)
The problem is that doctors are making elementary errors, failing to verify, and putting ego and large numbers of consultations a day over and above the wellbeing of patients.
That, to me, is gross malpractice.
The correct answer is not necessarily more AI, but that might well be the end result. The correct answer is to require doctors to recertify through such test cases, withdrawing their license to practice if the success rate is under 90%.
AI is, ultimately, just using differential diagnosis, because that's
Re: (Score:3)
Re: (Score:2)
Re:AI isn't the relevant problem here. (Score:4, Interesting)
The problem is that doctors are making elementary errors, failing to verify, and putting ego and large numbers of consultations a day over and above the wellbeing of patients.
TFA does not contain enough information to draw that conclusion.
The correct answer is not necessarily more AI
AI will be part of the solution.
TFA says that ChatGPT reduced misdiagnoses from 26% to 24%. Two percent might not seem like much, but in a $5 trillion industry, it's a lot.
Doctors will do much better if they're trained to use AI technology. It should be incorporated into medical school curriculum.
Re: (Score:3)
Doctors will do much better if they're trained to use AI technology. It should be incorporated into medical school curriculum.
Exactly this. There has to be a correct procedure. Probably 1) examine the patient and record observations in (electronic) notes, 2) do the diagnosis yourself, 3) feed the notes and current diagnosis to the AI system and get it to suggest alternatives with some kind of probabilities and links to official statements of those diagnoses, 4) rethink and re-examine everything with the new knowledge.
By bringing AI in late in the process you allow the human and AI bias to be independent and ensure clean verification and training data.
Re: (Score:2)
Re: (Score:2)
I think it's a case of sloppy headline writing. AI isn't better than doctors at diagnosing, it's better at guessing *from text descriptions* than doctors are. Doctors normally examine the patient, generate multiple hypotheses and then test those hypotheses with further diagnostic procedures.
What was actually evaluated. (Score:5, Insightful)
Re: (Score:2)
What it shows is that people, to get proper treatment, need direct patient contact with a doctor.
The study described in TFA does not show that.
The error rate of doctors with direct contact was not compared to those without.
Re: What was actually evaluated. (Score:1)
Re: (Score:2)
Who gave the reference diagnosis then? The one that was 100% accurate?
The reference diagnosis is determined retrospectively from the patient outcome.
Re: (Score:2)
The error rate of doctors with direct contact was not compared to those without.
The error rate of doctors with direct contact was also not compared to the AI error rate. It wasn't part of the study at all. Which makes the study pretty useless.
Re: (Score:2)
I just looked up the paper, and that's exactly what it shows. This study was conducted online. The doctors never saw the patients.
And where did they get the "true" diagnosis the doctors were supposed to match? From the doctor who actually examined the patient and treated them.
Re: (Score:1)
What it shows is that people, to get proper treatment, need direct patient contact with a doctor. This is what doctors are taught, and expected to do. An LLM or online consultation will not replace that.
Wrong. AI can be taught the entire medical history. Everything we know about medicine. Ever. And you don't go to see a human doctor to talk to them about your diagnosis. You go to the doctor and both of you "talk" to the results of the tests you took. Which again, is something that can be automated. AI can also be taught what "low" or "high" means when reading a blood report. Just like the human does.
Test. Review Results. Diagnose. If a $50K car diagn
Re:What was actually evaluated. (Score:4, Interesting)
Test. Review Results. Diagnose. If a $50K car diagnostic scanner can do that, I don't see why AI can't in medicine.
A car diagnostic scanner does not simply plug in and diagnose the vehicle, except for a small subset of tests. At best they have guided diagnosis and the technician has to perform various tests. This is exactly like the scenario in which the doctor is using a software agent to assist with diagnosis because in both cases, you need a trained professional to operate the equipment and perform the final diagnosis. They have to know enough to fact-check the machine, just like I know enough to recognize when Google gives me a completely bullshit answer to a technical question. Google doesn't know, though; their answer is written as if it were correct whether it is or not. The same is true of every one of these tools.
Re: (Score:1)
The car diagnostic scanner, was once untrusted too. Until it wasn’t.
The same will eventually be true of AI. Once AI learns "high" and "low" parameters and is trained on what to do next (not unlike the highly-trained human following the expert machine), it won't have to re-learn it. Better yet, it won't ever forget it. Unlike human brains do.
Your concerns, have an expiration date.
Re: (Score:3)
The car diagnostic scanner, was once untrusted too. Until it wasn't.
The people who know something about automotive diagnostics still don't trust it, which is how we can tell you know fuck-all about this subject.
Re: (Score:2)
Test. Review Results. Diagnose. If a $50K car diagnostic scanner can do that, I don't see why AI can't in medicine.
That's easy if you can run a complete set of tests and don't have false positive results from those tests. In real life, test results are ambiguous, better tests are expensive, doctors have to accept inputs like "it hurts when I do this" as a starting point, patients don't want to admit how little they exercise or how badly they eat, and so on.
Re: (Score:2)
Mechanics have to do the same thing! If I run a test on the Sprinter and it says there's a short in the EGR wiring I have to figure out if there actually is a short, or if it's actually a failed ground making it look like that, or if the EGR has seized up with soot and the motor that drives it is stalling out... The whole idea that you can just plug into even a car and have the scanner spit out the answer is horse shit. The scanner is used in conjunction with the manual and it has a whole series of troubles
Re: (Score:2)
You are exactly correct.
I am responding out of some sympathy. Other responses to your post don't seem to quite get it.
For instance:
The study described in TFA does not show that. The error rate of doctors with direct contact was not compared to those without.
Well, technically he is correct.
But I think he missed what you were saying.
The study used 105 validated cases where real doctors had made correct diagnoses which are the basis for a standard training set for computer assisted diagnosis. From those 105 standard resource cases, the authors found 50 that best fit the intent of their study, and then they whittled it down to 6 to a
Re: (Score:2)
This is a really common flaw in studies of this sort. They present the same data to the doctor and the AI, the AI does a better job of diagnosis, and they declare, "AI is better than doctors." You see papers like that all the time. But it's not true, because doctors do much more than read case reports before diagnosing real patients. Since the AI can't talk to the patient, observe their behavior, or conduct an exam, they make it "fair" by not letting the doctor do it either.
failure to understand procedure (Score:4, Insightful)
I don't trust these conclusions *at all*.
AI, and machine learning, as performed by computer scientists, completely miss the meaning of data and protocol.
In machine learning/AI, a computer scientist will try to maximize a single aggregate score (accuracy, AUC, and so on). This is frequently seen when a dataset of 1,000,000 tests (99% controls, 1% cases) yields its best accuracy when everything is predicted as ANYTHING -> CONTROL. For a doctor, the 1% of cases are the difficult part, not the 99% of controls.
A doctor should operate by a hierarchy of diagnoses. If you show up at the clinic with a bleeding ass, would you like the doctor to aim for maximum prediction score (there's a 95% chance it's nothing) or would you like your doctor to ass-ume the worst and schedule a colonoscopy for you? I would rather the second option, something the AI, and the people organizing this study, completely miss.
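A quick sketch of the degenerate case described above (Python, invented numbers): on a dataset that is 99% controls, the lazy "call everything a control" rule posts a great-looking aggregate score while catching zero actual cases.

# Invented dataset: 1,000,000 screenings, 1% of them actual cases.
n_controls, n_cases = 990_000, 10_000
truth = [0] * n_controls + [1] * n_cases

# The degenerate classifier: predict "control" for everyone.
preds = [0] * len(truth)

accuracy = sum(t == p for t, p in zip(truth, preds)) / len(truth)
recall = sum(1 for t, p in zip(truth, preds) if t == 1 and p == 1) / n_cases

print(accuracy)  # 0.99 -- looks excellent on paper
print(recall)    # 0.0  -- every single case is missed

Which is the bleeding-ass scenario: a scoring scheme that rewards the 99% easy calls tells you nothing about whether the 1% that matter get caught.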
Re: (Score:3)
Billions (Score:1)
Re: Billions (Score:2)
Re: (Score:2)
AI is just a tool (Score:2)
Pattern Recognition (Score:3)
For me, the takeaway is, LLMs can be a useful adjunct to diagnosis by a physician to help identify a range of possible causes, not replace a physician.
The real lesson healthcare CEOs may learn is "we can replace physicians with non-physicians and increase profits because Chat-GPT is better and cheaper than physicians."
Would you rather? (Score:1)
Re: (Score:3)
I would like to get treatment from a compassionate human with access to an experienced robot.
When the compassionate human fucks up in a very human way and attempts to dismiss the life-ending mistake with a compassionate apology, society will find the legal arguments against using humans in the future, quite compelling.
Human liability, will become all that matters from a risk mitigation perspective if we allow the current legal system to continue. And we will.
Re: (Score:2)
The diagnosis can benefit from AI.
The treatment can come from the human.
The AI can help monitor the efficacy and side effects of the treatment.
The human can determine where to go from there.
Though I will point out that a big part of being an effective doctor, especially in areas like surgery, is that one must, by necessity, lose certain elements of compassion and empathy for humans. Cutting people open and removing bits of them, for example.
Re: (Score:2)
My mother was a very nice pleasant person with a bit of a temper. After being ignored for 5 hours in a hospital corridor, the nurse told her she needed to get up so they could run a test. She told the nurse to go to hell.
Nurse went to doctor, told doctor patient was being unreasonable, so they hit her with a dose of an anti-psychotic medication called "Haldol"-- which the hospital's EMR system had documented my mother as being particularly sensitive to. She had an unusual, but not unheard of, reaction to
Having dealt with doctors (Score:1)
And their mush-brained indifference and arrogance, I think a potato has a better chance at diagnosing than a doctor.
You've Got Leprosy! (Score:3)
You've Got Leprosy!
The Gambler, the Nun and the LLM (Score:2)
That's where AI should be good (Score:3)
I would bet that a closer study shows: Much higher failure rate when patterns are less clear (for example, if you diagnose based on skin discolouration, black patients will often be misdiagnosed by less experienced doctors, and by an AI).
Not trustworthy without restricted training (Score:1)
AI has a proven track record of making shit up. Medical AI could be possible IF and ONLY IF they restrict the training data to trusted and verified medical sources. Otherwise you'll get medical "opinions" from some rando on reddit or tumblr mixed into the underlying model.
Glaring flaw (Score:2)
The Wisdom of Crowds (Score:2)
Given the AI would have been trained on human diagnosis, the answers it spits out should be a sort of vague merging of all the inputs. So if you had all those doctors doing all those diagnoses together, on average they'd collectively do as well as the AI.
To me, this suggests that AI isn't the answer; what you want is an expert system assembled step-by-step by humans with clear chains of reasoning that are easy to update as new medical knowledge becomes available.
If you want to put an AI chatbot front end o
Re: (Score:2)
You've described an AI expert system. They exist, and they work okay, but they can be brittle. Setting up those "clear chains of reasoning" by hand is a big job and it's easy to miss something.
By "AI" I assume you actually mean something like GPT 4, a different kind of AI system, which is what they u
The one sector it really needs to be pushed (Score:2)
poobah (Score:2)
Did the chatbot look at and interview the patient? Or was this 100% on paper?
Even the authors didn't think LLMs should diagnose (Score:1)
Nothing we haven't heard before (Score:2)
Now throw something novel or undiscovered at it (Score:2)
Sure. Most of this is settled science and humans sometimes overvalue their intuition and expertise.
Now throw something new at it. Have it discover a new diagnosis ("I don't know what this is"), a systemic autoimmune syndrome, a form of cancer never seen before.
I can guarantee you one thing it never said: "I don't know what this is."
Driving with ChatGPT is driving in the rear-view mirror. It's great for middle of the bell curve stuff, certainly better than humans, but it will never make progress.
Physicians are selected for memorization (Score:2)
A great example is O
Not even a broken arm IME (Score:5, Interesting)
I had to argue with him to get an X-ray. When radiology came back showing the obvious mess in there, he defended himself because I had reported little pain in general and didn't jump when he manipulated the area.
I learned a valuable lesson at only 19: I'm on my own. Fast forward 35 years, and another doctor almost killed me by grossly overprescribing (6X the recommended dose) a drug that it turns out I was allergic to. If I wasn't in California, I'd have sued his ass off. But two attorneys set me straight: California's somewhat recent tort reform made it next to impossible to succeed in most malpractice suits. So I learned another lesson: I'm on my own, and no one gives a shit.