
Microsoft's New AI Tool Outperforms Doctors 4-to-1 in Diagnostic Accuracy (wired.com)

Microsoft's new AI diagnostic system achieved 80% accuracy in diagnosing patients compared to 20% for human doctors, while reducing costs by 20%, according to company research published Monday. The MAI Diagnostic Orchestrator queries multiple leading AI models including OpenAI's GPT, Google's Gemini, Anthropic's Claude, Meta's Llama, and xAI's Grok in what the company describes as a "chain-of-debate style" approach.

The system was tested against 304 case studies from the New England Journal of Medicine using Microsoft's Sequential Diagnosis Benchmark, which breaks down each case into step-by-step diagnostic processes that mirror how human physicians work. Microsoft CEO of AI Mustafa Suleyman called the development "a genuine step toward medical superintelligence."
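The "chain-of-debate" orchestration described above can be sketched roughly as follows. This is a hypothetical reconstruction from the summary alone: the panel stubs, the prompt format, and the majority-vote aggregation are assumptions for illustration, not details of Microsoft's actual MAI Diagnostic Orchestrator.

```python
from typing import Callable, Dict

Model = Callable[[str], str]  # takes a prompt, returns a proposed diagnosis

def chain_of_debate(case: str, panel: Dict[str, Model], rounds: int = 2) -> str:
    """Each model proposes a diagnosis, then revises after seeing the others'."""
    proposals = {name: model(case) for name, model in panel.items()}
    for _ in range(rounds - 1):
        transcript = "\n".join(f"{n}: {d}" for n, d in proposals.items())
        prompt = (f"{case}\n\nOther panelists proposed:\n{transcript}\n"
                  "Revise your diagnosis if warranted.")
        proposals = {name: model(prompt) for name, model in panel.items()}
    # Assumed aggregation rule: simple majority vote over the final proposals
    final = list(proposals.values())
    return max(set(final), key=final.count)

# Stub "models" standing in for real LLM API calls
panel = {
    "model_a": lambda p: "influenza",
    "model_b": lambda p: "influenza",
    "model_c": lambda p: "lupus",
}
print(chain_of_debate("Patient presents with fever and myalgia for 3 days", panel))
# prints "influenza" (2 of 3 panelists agree)
```

In a real system the stubs would be replaced by API calls to the individual models, and the aggregation step would itself likely be model-driven rather than a simple vote.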


Comments:
  • Great we heard the same shit five years ago
    • Not a very constructive FP with a vacuous Subject, too. Were you just seized by the uncontrollable urge to FP something?

      I have three linked takes.

      The first take is that diagnosis is quite difficult. I think that is partly a matter of excessive specialization to deal with the overload of medical knowledge, but one of the negative repercussions is that many doctors avoid diagnoses. Also related to the flawed economic model, but it's relatively safe (and too profitable) to treat the symptoms without worrying t

    • by drnb ( 2434720 ) on Monday June 30, 2025 @01:44PM (#65486456)

      Great we heard the same shit five years ago

      Software assisting doctors, radiologists, etc has been going on for decades. For example bringing that suspicious "blob" in the medical imagery to the attention of the radiologist or doctor.

      There have also been "AI" expert systems for medical diagnosis for decades. Again, assisting, not replacing doctors.

      There have also been "AI" expert systems for medications, in particular drug interactions. Again, assisting, not replacing doctors and nurses.

      These new "AI" systems will likely continue as the previous "AI" systems, assisting, not replacing.

      • Taking shortcuts. I don't know what the current state is but the last time I looked which admittedly was when it was called machine learning instead of AI the system was fed a bunch of data consisting of things humans had already diagnosed and initially it looked like it was doing an amazing job until somebody pointed out that it had figured out that the slides that had the diseased parts also happened to have some framing that the slides with the healthy parts didn't have and that the AI was just using tha
        • by drnb ( 2434720 )
          "AI" is kind of a big bucket marketing phrase that tons of things get tossed into. It's been that way for decades. Possibly from the start. ML is just the latest tool, it saves the developer some time with respect to having to develop custom algorithms to address an "AI" topic problem. Computer Vision for example, doing "AI" with human developed algorithms for decades. ML is an awesome tool to add to the mix, but the cost? Not knowing how the decision was made?

          I don't know if the following story is real
      • There have also been "AI" expert systems for medical diagnosis for decades. Again, assisting, not replacing doctors.

        This. Been hearing this for a long time.

        Textbooks didn't replace doctors; expecting someone to self-diagnose from a textbook sounds unreliable. Self-diagnosing from an expert system, too. An AI isn't much different, just a search engine with a medical index. I see it as the textbook: it might point you in a direction, but it doesn't have experience.

        • by drnb ( 2434720 )

          There have also been "AI" expert systems for medical diagnosis for decades. Again, assisting, not replacing doctors.

          This. Been hearing this for a long time.

          Textbooks didn't replace doctors; expecting someone to self-diagnose from a textbook sounds unreliable. Self-diagnosing from an expert system, too. An AI isn't much different, just a search engine with a medical index. I see it as the textbook: it might point you in a direction, but it doesn't have experience.

          In the area of software development, it's pretty much a replacement for looking things up in a textbook too. What algorithm is better for this sort of data, getting sample code for some well-known and studied algorithm. Sometimes it can recognize a problem that can be solved by gluing together several such well-known and documented algorithms. Which reading industry literature can probably inform you of as well.

      • Software assisting doctors, radiologists, etc has been going on for decades.

        I see a similar tension between humans and machines for calling baseball balls and strikes. The machines are pretty accurate and far more consistent. There may be a few umpires that are better than the machines, but the luck of the draw determines which umpire you get for a given game.

        The umpire union is influential, and obviously the umpires don't want to lose their jobs. The compromise right now is to have the machines call every pitch but only inform the umpires secretly. The umpires can use or ignore th

    • Take all studies like this with a grain of salt. A doctor doesn't diagnose a patient by reading a case study. They do it by talking to the patient, examining them, deciding what tests to order, etc. This is a contrived comparison that has little connection to how doctors actually work.

      • Note that no actual patients were diagnosed, so it's impossible to say that the AI is better than actual doctors in diagnosing real live human beings.

        Case studies are cases that are deliberately selected to be not like what doctors see every day (because why would doctors want to read about what they see every day?). But actual doctors have to diagnose what they see every day. If the AI is trained on case studies where the patient has an incredibly rare disease that hits one person in twenty million, the AI

        • If the AI is trained on case studies where the patient has an incredibly rare disease that hits one person in twenty million, the AI will be biased to find outré and unusual diseases, and miss "this patient has the flu."

          AI> Maybe it's lupus?...

        • NEJM exceptional case examples from memory: 1) patient presents with spectacularly high cholesterol, is eventually discovered to compulsively eat over 6 dozen eggs per day; 2) cluster of terrible cases where brain imaging revealed large scale destruction of brain, from domoic acid contamination of shellfish; 3) epidemiological tracing of tuberculosis transmission from an infected person on an airline flight, including seat maps showing locations of index case and infected persons.
      • > Take all studies like this with a grain of salt. A doctor doesn't diagnose a patient by reading a case study. They do it by talking to the patient, examining them, deciding what tests to order, etc. This is a contrived comparison that has little connection to how doctors actually work.

        LLMs can do DDXs (differential diagnoses) based on getting the primary complaint and asking follow-up questions. They will then suggest what tests to order, etc. While they can't do a physical exam, they are better than doctors at all other aspects.

    • Great we heard the same shit five years ago

      I won't be impressed until the AI has better recommendations than 4 out of 5 dentists.

    • Great we heard the same shit five years ago

      Only five? I worked with people over 20 years ago who did this. No AI was sued out of existence or lost its "license." No doctors were replaceable.

  • What I want to know is: what was the difference between the LLMs and the doctors? Especially when they claim the LLMs did things step by step as those doctors did.

    Were the doctors under the usual work pressure, fatigued, etcetera?

  • by SouthSeb ( 8814349 ) on Monday June 30, 2025 @01:34PM (#65486424)

    In spite of Suleyman's exaggeration, this is actually a good use case of "AI" exactly because it's not about intelligence.

    Diagnosis is a very algorithmic activity and requires memorizing an immense repertoire of medical literature. Doctors could make use of this tool to quickly narrow down possibilities and accelerate the process, saving precious time and resources for patients.

  • by backslashdot ( 95548 ) on Monday June 30, 2025 @01:34PM (#65486428)

    Quality is more important than quantity. Who was missing the important diagnoses? As in, if the AI missed diagnosing people with cancer versus humans missing all the flu diagnoses, which would you rather have?

    Note, I haven't read the article .. just going by the headline. Just pointing out that just because the "error rate" of humans is higher doesn't mean humans are less useful than AI.

    • Quality is more important than quantity.

      Not in modern bean-counter business management. All the product/service has to do is pass the bar of "good enough" and how much of it you can accomplish is the primary metric after that. For many businesses now the good enough bar is just what the competitor offers. As long as they are not significantly better in the same price class there's no reason to do better.

      And since healthcare is a for-profit venture most places, that's what's used. How many people can we get in/out the door in a day?

    • by Tailhook ( 98486 ) on Monday June 30, 2025 @01:54PM (#65486488)

      Quality

      This presumes we have quality. Do you believe that, without doubt? I don't. I have a lifetime of anecdotal evidence of failures by doctors, personally and among family, friends and others. Without (hopefully) inviting a deluge of corroboration, I can assure you the people reading this now can bury us in such stories.

      Beyond that, we are in desperate need of lower cost solutions for medicine. You're free to attribute the extreme costs we see however you wish, but finger pointing won't fix it: the powers and interests involved aren't listening. What is needed is a disruption, and this looks like a real possibility. I, at least, don't immediately dismiss it with AMA FUD.

      • I wasn't advocating dismissing it at all. Just that we have to make sure all the metrics are correct and comparisons within full context before ditching something wholesale. I'm saying make sure we get it right, that's all.

        • by Tailhook ( 98486 )

          I'm saying make sure we get it right

          I am saying I have no patience for the drearily predictable "quality" and "safety" FUD. There are severe problems in healthcare. Bad enough to risk neglecting our worship of medical authority. Bad enough to risk suffering possible unknown failures as an alternative to our chronic known failures.

          • by MrNaz ( 730548 )

            While I accept your point, I feel it necessary to add that the problems you are referring to, excessive cost and poor quality, are American problems. The rest of the civilised world has low cost or free healthcare and doctors that aren't ground into apathy by the capitalist machine.

            So the rest of the world would like to be cautious because we like what we have. Unlike Americans we do have something to lose.

  • by marcle ( 1575627 ) on Monday June 30, 2025 @01:35PM (#65486432)

    For one thing, doctors only 20% accurate? I know they make lots of mistakes, but that figure seems suspiciously low, and the article (and links) seem light on the specifics.

    After all, this is Microsoft tooting their own horn, and of course we believe every word. /s

    • More than likely, the problem is the lack of randomness in the cases selected for the study. Maybe they picked cases that were difficult for doctors, instead of cases that are *typical* for doctors.

    • They used Dr. Nick Riviera as the comparison.

      "now, the symptoms you describe point to 'bonus eruptus', it's a terrible disorder where the skeleton tries to leap out the mouth and escape the body."

    • For one thing, doctors only 20% accurate? I know they make lots of mistakes, but that figure seems suspiciously low,

      It's low because they were tested on puzzle cases that are deliberately selected to be hard.

      It's like saying most people are ok at commonplace arithmetic in everyday life. So how come their accuracy rate is only 20% in solving puzzles in The Scientific American Book of Mathematical Puzzles?

      • It's like saying most people are ok at commonplace arithmetic in everyday life. So how come their accuracy rate is only 20% in solving puzzles in The Scientific American Book of Mathematical Puzzles?

        The general population has been shown to be effectively innumerate.

        If you're a programmer, you're likely part of that population. I'd hate to leave anyone out.

        WTF is The Scientific American Book of Mathematical Puzzles?

        Yes, I looked it up, but that was my initial reaction. Why would an Internet douchebag require strangers not only to know some obscure thing but also to test and score well in it? What next? Do I need to know My Little Pony trivia, too? I'd hate to be ostracized by such a troll . . . (*gasp*)

  • Here, this one goes in your butt, and this one in your mouth..... no wait, THIS one goes in your butt and THIS one in your mouth. Don't forget to drink your Brawndo.

  • I only skimmed the article, but am I the only person who thinks that, if we had a situation or field of diagnosis where doctors were only getting it right 20% of the time, we would throw some research/education/analysis at it? Because 20% correct (or 80% incorrect) seems kinda concerning and I would think would lead to a lot of brouhaha or lawsuits?
    Maybe it's just me.

    • What do you call a doctor who graduated last in their class?

      Doctor.

    • by dgatwood ( 11270 )

      I only skimmed the article, but am I the only person who thinks that, if we had a situation or field of diagnosis where doctors were only getting it right 20% of the time, we would throw some research/education/analysis at it? Because 20% correct (or 80% incorrect) seems kinda concerning and I would think would lead to a lot of brouhaha or lawsuits? Maybe it's just me.

      I'm assuming this is based on edge cases, e.g. medical images where cancer was just barely starting to appear, situations where lupus is mistaken for rheumatoid arthritis, etc., in which case the human rate of correct diagnosis could indeed be very low, precisely because they were chosen from cases where humans had made mistakes before.

      If that is the case, then the question becomes whether the model is over-trained on these edge cases and would generate false positives, would miss obvious diagnoses, etc.

      • situations where lupus is mistaken for rheumatoid arthritis,

        It's never lupus.* Until it is.

        * I wanted to post a video of House saying it's not lupus, but for some reason YT is now requiring me to sign in to confirm I'm not a bot. Sorry for not posting the reference.
    • I think the devil is in the details.

      Usually, the process of diagnosing a disease or problem consists of a series of doctor visits, with follow-up tests, ruling out one possible condition at a time until the correct diagnosis is found.

      Maybe the 20% number refers to getting it exactly right on the first try. Maybe it's more about the selection of cases that were not random. But I agree, the percentages are suspect.

  • This tool could really ease the load on the healthcare system, especially public health programs and emergency rooms.
  • The system was tested against 304 case studies from the New England Journal of Medicine

    Just to check, the training of the relevant "AI" system was audited to make sure it did not see these cases during training?

    Then again, if the doctors had a subscription to the NEJM, then including it in the training data would be fair.

  • ...doctors don't make diagnoses based solely on text, they examine the patient.
    After years of practice, doctors develop useful diagnostic instincts.
    Also, the "304 case studies from the New England Journal of Medicine" are probably incomplete, written quickly to satisfy legal requirements, and may omit key insights.

    There is a LOT more to medicine than just text

  • I'm not really interested in paying Wired to read it.

  • Sure, that will do it.

  • I bet there is a near 100% chance the LLMs had all of these medical journals and the exact cases in their training data... Give a doctor the same "advantage" (re-diagnosing old cases) and I bet he would perform better than 80%.

    This is just a contrived AI "test" from the company that is desperate to sell you copilot. Like p-hacking.

  • by ihadafivedigituid ( 8391795 ) on Monday June 30, 2025 @02:34PM (#65486656)
    My experience, going back to the 70s, is that I have to be super assertive to doctors because I've had too many life-threatening fuckups and other nonsense.

    Obvious shit like: my arm is broken and displaced, ER doc wanted to send me off with Tylenol and no X-ray. Or: yo-yo fever up to 104 degrees for three days, severe body pain etc etc ... doc says eh, you have some bug that's going around. I insist, doc takes chest x-ray showing large pneumonia spot deep in a lung. I could go on and on--I am only alive in spite of doctors, I swear.
  • Read the actual article: what they did was tune a system to get the best results possible on ~80 case studies, based on rules they devised for success.

    That is not, even a little, the same as having it evaluate patients.

    These articles are just exhausting. The tech is cool, but no, they don't have an 80% accuracy in diagnosing patients compared to a doctor's 20%.

  • according to company research published Monday.

    says MS funded study on how fantastic it performs

  • Given how long it takes our health system in the UK to diagnose anything, I can see how many people would be sceptical about going to an AI doctor, but many would give in out of sheer desperation.

    If you're not familiar with how anything outside of your GP works (which is pretty much everything), any specialist care requires a referral. Because of massive backlogs everywhere, your wait to be seen for the first time by a specialist is usually anywhere between 3 and 12 months. Then, after the first visit they

  • Is that saying AI is good or that Doctors are BAD?
  • And yet I still distrust the AI diagnosis versus the South Asian meat bag with the degree from University of Rwanda.

  • Testing against the "New England Journal of Medicine" is a very poor test, since the LLMs were likely trained on that data. It would be much more interesting to test on real diagnosed patients: have the LLMs and a doctor each make a diagnosis, then check which is correct.

    Testing an LLM against training data is not a good test for the real world.

  • It seems like this could lead to a big advantage for telemedicine and potentially be much cheaper for the customer. "getting to the diagnosis and getting to that diagnosis very cost effectively", I like that.

    "replicates the way human physicians diagnose disease—by analyzing symptoms, ordering tests, and performing further analysis until a diagnosis is reached"

    Some things obviously can't be properly seen via webcam and may not be a candidate for this but for many ailments it could work well, at least a

  • My father had a saying: when you hear hoofbeats, it's not zebras. You look for horses first.

    There is not one type of accuracy, but two: the chance of false positives and the chance of false negatives. Most of the time you care more about false positives (hey, this test says you have a deadly disease when you don't) than about false negatives (sorry, we failed to catch the fact that you have the disease).

    Example: Deadly disease is rare - only happens 4% of the time. Out of 1000 people, 40 people actually have
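For what it's worth, the base-rate arithmetic the comment starts can be completed under assumed test characteristics; the 95% sensitivity and 5% false-positive rate below are illustrative, not taken from the comment or the article.

```python
# Base-rate arithmetic for a rare disease. The prevalence matches the
# comment above; sensitivity and false-positive rate are assumed.
population = 1000
prevalence = 0.04           # 4% actually have the disease -> 40 people
sensitivity = 0.95          # assumed: test catches 95% of true cases
false_positive_rate = 0.05  # assumed: test flags 5% of healthy people

sick = population * prevalence                               # 40
true_positives = sick * sensitivity                          # 38
false_positives = (population - sick) * false_positive_rate  # 48
ppv = true_positives / (true_positives + false_positives)
print(f"Share of positive tests that are real: {ppv:.0%}")   # 44%
```

Even with a fairly good test, most of the positives come from the much larger healthy pool, so barely half the positive results are real, which is the point the comment is driving at.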

  • I have never had the same diagnosis from two doctors. I had a room of doctors completely disagree on a treatment: some said bone death was imminent without medicine, while others thought it would resolve itself. Medicine is definitely not an exact science and needs a serious change in attitude to start healing people instead of focusing on how to charge more money.
