
AI Surpasses Doctors In Spotting Early Breast Cancer Signs In NHS Trial

An AI tool named Mia, tested by the NHS, successfully detected signs of breast cancer in 11 women that had been missed by human doctors. The BBC reports: The tool, called Mia, was piloted alongside NHS clinicians and analyzed the mammograms of over 10,000 women. Most of them were cancer-free, but it successfully flagged all of those with symptoms, as well as an extra 11 the doctors did not identify. At their earliest stages, cancers can be extremely small and hard to spot. The BBC saw Mia in action at NHS Grampian, where we were shown tumors that were practically invisible to the human eye. But, depending on their type, they can grow and spread rapidly.

Barbara was one of the 11 patients whose cancer was flagged by Mia but had not been spotted on her scan when it was studied by the hospital radiologists. Because her 6mm tumor was caught so early, she had an operation but only needed five days of radiotherapy. Breast cancer patients with tumors that are smaller than 15mm when discovered have a 90% survival rate over the following five years. Barbara said she was pleased the treatment was much less invasive than that of her sister and mother, who had previously also battled the disease. Without the AI tool's assistance, Barbara's cancer would potentially not have been spotted until her next routine mammogram three years later. She had not experienced any noticeable symptoms.
"These results are encouraging and help to highlight the exciting potential AI presents for diagnostics. There is no question that real-life clinical radiologists are essential and irreplaceable, but a clinical radiologist using insights from validated AI tools will increasingly be a formidable force in patient care." said Dr Katharine Halliday, President of the Royal College of Radiologists.
  • by dgatwood ( 11270 ) on Saturday March 23, 2024 @02:16AM (#64338225) Homepage Journal

    There is no question that real-life clinical radiologists are essential and irreplaceable.

    "You keep using that word. I do not think it means what you think it does."

    What this shows is that software can do at least this part of the job better than humans, which is to say that at least for this purpose, software at least arguably could already replace the person, ignoring legal risks associated with doing so, union grievances, and other non-medical concerns.

    More to the point, it stands to reason that eventually, AI will likely be able to do every part of that job better than humans. After all, this is the sort of task that computer vision is particularly good at. There will definitely have to be humans in the loop for a while, if only to confirm that it isn't hallucinating, but it is probably just a matter of time before that job is automated away. Whether that will take five years or a hundred, I couldn't say, but I suspect it is closer to the former than the latter.

    Saying that they are "essential and irreplaceable" seems less like a statement of fact and more like a statement intended to keep people from dropping out of those programs in medical school and driving up the cost of labor in the short term before the tech can fully replace them, and to keep people from unionizing and putting up roadblocks to AI adoption.

    • This does seem inevitable given enough time, but in the case of doctors, this really does seem like a tool that's not gonna be hitting anywhere near the center of their expertise for a long, long time. This is a niche usage, really, though AI is already improving productivity and accuracy in some ways. In medical research, protein folding has been absolutely revolutionized by new AI algorithms in the last several years.

      A lot of people don't know about this, but in 2020, most of the older doctors in

      • by shilly ( 142940 ) on Saturday March 23, 2024 @06:38AM (#64338519)

        1. It is not accurate, even remotely, to describe diagnostic radiography as niche. It's a core part of medicine, and even if you just narrow it to breast cancer, given the scale of the NHS breast cancer screening program, it's a big chunk of NHS resource.

        2. It is not accurate, even remotely, to say that "most older doctors" died during Covid. Heroic, yes. More than half dead? Absolutely not true. There has been a post-Covid productivity crunch for medicine, but that's really an acceleration of existing trends, nothing new.

        • 1. Niche is maybe not the wrong word, but what I mean is that it's a small part of what doctors do.

          2. Going by what a doctor at a hospital told me. He said the old guys "all died off" and that the whole industry was struggling a lot to replace their expertise. Maybe he was exaggerating, but a lot of them died. Maybe he meant really old ones, I'm not aware of specific numbers, are you?

          • by Anonymous Coward

            2. Going by what a doctor at a hospital told me. He said the old guys "all died off" and that the whole industry was struggling a lot to replace their expertise. Maybe he was exaggerating, but a lot of them died. Maybe he meant really old ones, I'm not aware of specific numbers, are you?

            Covid death rates among healthcare workers were several times lower than those of the general population.
            https://www.ncbi.nlm.nih.gov/p... [nih.gov]

            The pandemic did a number on the industry generally, with fewer people seeking medical treatment because of misguided policy choices and patient fear of hospital-acquired infection.

            Posting as AC as this is off-topic.

            • 2. Going by what a doctor at a hospital told me. He said the old guys "all died off" and that the whole industry was struggling a lot to replace their expertise.

              This is one of the problems with anecdote. It's possible that this hospital had a higher proportion of health-care worker deaths than most. Luckily, we now live In a World Where we can sometimes address questions using actual data. Excess Mortality Among US Physicians During the COVID-19 Pandemic [jamanetwork.com]:

              From March 2020 through December 2021, there were 4,511 deaths (representing 622 [95% CI, 476-769] more deaths than expected) among a monthly mean (SD) of 785,631 (8,293.5) physicians.... There were 43 (95% CI, 33-53) excess deaths per 100,000 person-years.

              There was a strong age gradient among active physicians providing direct patient care, with excess deaths per 100,000 person-years of 10 (95% CI, 3-17) in the youngest group and 182 (95% CI, 98-267) in the oldest group (Figure, A). Within all age groups, physicians had substantially lower excess mortality than the general population (Figure, A). Nonactive physicians had the highest excess deaths per 100,000 person-years (140; 95% CI, 100-181) compared with active physicians providing direct patient care (27; 95% CI, 18-35) and active physicians not providing direct patient care (22; 95% CI, –8 to 51) but a substantially lower excess mortality rate than the general population (294; 95% CI, 292-296).

              • Hmm, he claimed it was like that everywhere, not just his hospital. Also, he was referring to old physicians, which is where the deaths would have been concentrated, especially since they make up a relatively small share of that population. And especially if hospital precautions were particularly effective for younger people.

                So I don't see how that blurb necessarily conflicts with that doctor's claim.

      • not gonna be hitting anywhere near the center of their expertise for a long, long time.

        Not true at all. Radiology is a core medical specialty, and this research confirms that AI does it better in every way.

        The AI identified ALL of the positives that the human radiologists found, plus more they missed.

        The only things human radiologists add to the process are errors and delay.

        The only reasons we still have human radiologists are inertia, vested interests, and legal liability.

        • Well, the article didn't say anything about false positives, which have been a problem with such AI in the past. Frankly, the article was short on information and sounded like marketing.

          Anyway, doctors do a heck of a lot more than analyze scans, and this doesn't even remove doctors from that responsibility, it just helps. Dealing with patients and diagnosis is what I consider their core competency, and that's what AI sucks at the most.

        • by jvkjvk ( 102057 )

          What is the false positive rate? You say they identified all the positives plus more, but if they flagged 100% of scans as positive, that would also be true.

          If the false positive rate is too high, this is actually worse care, overall, than radiologists, despite higher detection rates.
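
          A minimal sketch of that point, with purely illustrative numbers (none of them from the study): a reader that flags every scan trivially catches every cancer, but its recall rate and positive predictive value make it useless as a screen.

          def screening_stats(n_scans, n_cancers, n_flagged, n_cancers_caught):
              # Sensitivity: share of the real cancers the reader flagged.
              sensitivity = n_cancers_caught / n_cancers
              # Recall rate (screening sense): share of all women called back.
              recall_rate = n_flagged / n_scans
              # Positive predictive value: share of flagged cases that are real.
              ppv = n_cancers_caught / n_flagged
              return sensitivity, recall_rate, ppv

          # "Flag everything" on 10,000 scans with 60 cancers (made-up prevalence):
          print(screening_stats(10_000, 60, 10_000, 60))  # (1.0, 1.0, 0.006)
          # A more selective reader: 540 recalls, 55 cancers caught:
          print(screening_stats(10_000, 60, 540, 55))     # (~0.92, 0.054, ~0.10)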

          • https://www.amazon.com/Mammogr... [amazon.com]
            "'This book gives plenty of examples of ad hominem attacks, intimidation, slander, threats of litigation, deception, dishonesty, lies and other violations of good scientific practice. For some years I kept a folder labeled Dishonesty in breast cancer screening on top of my filing cabinet, storing articles and letters to the editor that contained statements I knew were dishonest. Eventually I gave up on the idea of writing a paper about this collection, as the number of examp

            • "G-BOMBS: The anti-cancer foods that should be in your diet right now"
              https://www.drfuhrman.com/blog... [drfuhrman.com]
              "Looking for the biggest bang for your caloric buck? Remember the acronym G-BOMBS, which stands for Greens, Beans, Onions, Mushrooms, Berries and Seeds. These foods fuel your body with protective micronutrients and phytochemicals that support your immune defenses and have a wide range of health-promoting effects. And hereâ(TM)s a bonus: Theyâ(TM)re delicious!"

              For anyone worried about any type of

            • From the stories I've seen, false positives have been the main problem so far with AI diagnosis. The question is, are they worse than human doctors, or better? This story is so vague and mentions nothing about the false positive rate.

              If the rate is 9 out of 10 for humans WRT mammography, then it seems AI would help a lot. In any case, AI will improve over time. Humans won't.

          • I didn't say "they identified all the positives plus more." I said the article tells us nothing about the false positives. But they are the main issue with AI here, up till this point, even if humans are worse overall. I fully expect AI will eclipse human accuracy in time, especially if it can take into account other information, since human doctors are so time constrained they just can't take all factors into account because they can't be bothered to read a patient's entire medical history every single tim

    • by evanh ( 627108 )

      Spotting this stuff is just one small skill. It's certainly not a job.

      Great that these tools can do something useful finally ... after 50 years of trying.

    • What this shows is that software can do at least this part of the job better than humans, which is to say that at least for this purpose, software at least arguably could already replace the person

      No. The software can replace the task; that is a very big difference from replacing the person. For that, the software would need to replace *all* their tasks. Clinical radiologists do more than sit and stare at mammograms. It's a wide-ranging field made up not just of identifying cancers in certain images, but also of making decisions on how to scan, what technique to use, how to get a better view if something is uncertain, and recommending courses of treatment.

      They also look for more than just what they are told, it

    • What this shows is that software can do at least this part of the job better than humans, which is to say that at least for this purpose, software at least arguably could already replace the person, ignoring legal risks associated with doing so, union grievances, and other non-medical concerns.

      That is not what the article claims and the reported results of the trial clearly do not support this conclusion.

      In a cancer detection problem, there are two types of errors. The first kind of error is detecting c

      • Yep. This. Thanks for taking the time to lay it out so clearly!
      • What is difficult is to make a machine which makes *both* errors as small as possible.

        It doesn't make sense to directly optimize two numbers. https://en.wikipedia.org/wiki/... [wikipedia.org] Generally it's best to get it down to a single number. In this case, it could be done by estimating the percentage of women who receive tests who have cancer (using historical information) and then minimize the probability that the test is wrong.

        A better approach would be to realize the two types of errors are not equivalent.
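
        As a rough sketch of both suggestions (the prevalence and costs below are assumptions, not figures from the article): collapse the two error rates into one expected cost per screened woman. With equal costs this is just the probability the test is wrong; weighting a missed cancer more heavily expresses the point that the two errors are not equivalent.

        def expected_cost(fnr, fpr, prevalence, cost_fn=50.0, cost_fp=1.0):
            # fnr: false-negative rate (missed cancers / actual cancers)
            # fpr: false-positive rate (needless recalls / healthy women)
            # prevalence: assumed fraction of screened women who have cancer
            # cost_fn, cost_fp: assumed relative harm of each error type
            return prevalence * fnr * cost_fn + (1 - prevalence) * fpr * cost_fp

        # Two hypothetical readers at an assumed 0.7% prevalence:
        conservative = expected_cost(fnr=0.10, fpr=0.054, prevalence=0.007)
        aggressive = expected_cost(fnr=0.05, fpr=0.130, prevalence=0.007)
        print(conservative, aggressive)  # which is "better" depends entirely on the chosen costs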

        • It doesn't make sense to directly optimize two numbers.

          That is of course the point. The detection problem is two-dimensional, and has the structure of a partially ordered set. Creating an arbitrary surrogate function of the two numbers merely changes the true problem into an easier special case with a strong bias. You're free to invent something that matches your personal preferences, but it won't convince anyone else I'm afraid.

          The utility/cost function approach is great in theory because it is the sim

          • Creating an arbitrary surrogate function of the two numbers merely changes the true problem into an easier special case with a strong bias. You're free to invent something that matches your personal preferences, but it won't convince anyone else I'm afraid.

            Typically it's not arbitrary, but some might say it's cruel to put a value on human life. However, it is necessary to make a decision. In terms of this discussion, this is an issue that both humans and AI (or more accurately the people that design th

            • 1) Arbitrary is precisely that: the choice is an individual one, nothing more, nothing less. It's not a question of cruelty either, it's precisely an issue of disagreements which cannot logically be reconciled: the proposals form a partially ordered set [wikipedia.org]

              2) Absolutely. The performance of human detectors is also partially ordered. In fact, *all* binary classifiers for this problem are partially ordered, the theory is well established and goes back to the 1940s. That is one reason nobody wastes time trying

              • Absolutely. The performance of human detectors is also partially ordered. In fact, *all* binary classifiers for this problem are partially ordered, the theory is well established and goes back to the 1940s. That is one reason nobody wastes time trying to find "the best human detector".

                So your problem is not with the AI but with multivariate optimization. This issue goes back before the 1940s. Vilfredo Pareto died in 1923, but it was likely studied before that. And there is a lot of research on these t
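
                A tiny sketch of the partial-order point being argued here (the operating points are made up): one classifier dominates another only if it is no worse on both error rates; otherwise the two are simply incomparable until someone chooses a trade-off.

                def dominates(a, b):
                    # a and b are (false_positive_rate, false_negative_rate) pairs
                    return a[0] <= b[0] and a[1] <= b[1] and a != b

                reader_1 = (0.054, 0.12)  # fewer recalls, more missed cancers (illustrative)
                reader_2 = (0.130, 0.05)  # more recalls, fewer missed cancers (illustrative)
                reader_3 = (0.200, 0.15)  # worse on both axes

                print(dominates(reader_1, reader_2), dominates(reader_2, reader_1))  # False False: incomparable
                print(dominates(reader_1, reader_3))  # True: reader_3 is strictly worse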

    • by dvice ( 6309704 )

      I think there are several points that people miss
      1. AI is getting better at getting better, and that is happening really fast. Every year, it takes less time for AI to learn a new skill from zero to better than human level. This can be seen from this graph: https://ourworldindata.org/ima... [ourworldindata.org]
      Currently DeepMind is working with multimodal systems that will take these improvements to the next level, because you can train just parts of the AI individually.
      2. If a machine can do a single task in a job that takes

    • It's probably a statement that comes from the experience that women have with breast radiologists, who also biopsy the lesions they find, discuss really difficult diagnoses & options, and may come to care for them longitudinally. Many radiologists perform image guided procedures, but breast radiologists typically have the most regular, ongoing care of women in breast cancer screening and cancer treatment. That being said, your local hospital would grind to a halt without image guided biopsy, dr
    • It's just kowtowing to the powerful.

    • There has been a system that did much better than clinical radiologists before... it was a collection of pigeons that were trained to react to what looked like cancerous cells on mammograms...

      This is something humans are not good at: looking at hundreds of slides to spot tiny anomalies when most of them will not have any...

  • If we stop doing something and replace real experts with AI, then we will have no benchmark against which to measure how good the AI is.
    • You just need a dozen experts. Not tens of thousands.
      • But you need tens of thousands of experts to cultivate a community of practitioners that can do research, have insights, critically examine current & prior practices, & come up with new ways of looking at problems & solving them. Maybe AI can assist with these under certain circumstances but they can't replace experts.
    • If we stop doing something and replace real experts with AI, then we will have no benchmark against which to measure how good the AI is.

      The benchmark is how the disease progresses.

      If the AI says it's cancer, a biopsy can confirm.

      If the AI says it's not cancer and the tumor grows, then it was wrong.

      We know the false-positive and false-negative error rates for human radiologists, so we can compare them to the AI.
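
      A minimal sketch of that bookkeeping, under the assumption that biopsy confirms flagged cases and interval cancers reveal the missed ones; the records below are hypothetical.

      # Each record: (tool_flagged_the_scan, cancer_confirmed_by_follow_up)
      cases = [
          (True, True),    # true positive: biopsy confirms the flag
          (True, False),   # false positive: biopsy comes back clear
          (False, False),  # true negative: no interval cancer appears
          (False, True),   # false negative: a tumor turns up before the next screen
      ]

      tp = sum(1 for flagged, cancer in cases if flagged and cancer)
      fp = sum(1 for flagged, cancer in cases if flagged and not cancer)
      fn = sum(1 for flagged, cancer in cases if not flagged and cancer)
      tn = sum(1 for flagged, cancer in cases if not flagged and not cancer)

      false_positive_rate = fp / (fp + tn)
      false_negative_rate = fn / (fn + tp)
      print(false_positive_rate, false_negative_rate)  # compare against the human readers' known rates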

  • by GrumpySteen ( 1250194 ) on Saturday March 23, 2024 @04:01AM (#64338321)

    It's easy to catch more cancers than doctors do if you crank up the settings and flag everything that even remotely looks like cancer.

    I found what appears to be the study [nih.gov] and Mia flagged 13.0% of the scans vs a human flagging just 5.4%. That's a lot of false positives that had to be examined by humans to find 11 real positives.

    Mia might be a useful tool someday, but the press on this is putting way too positive a spin on the results.
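
    Back-of-the-envelope arithmetic behind that point, using the recall percentages quoted above and the roughly 10,000-woman cohort from the BBC piece; whether those figures all describe the same dataset is not clear from the article, so treat this as illustrative only.

    scans = 10_000             # approximate cohort size mentioned in the BBC summary
    ai_recall_rate = 0.130     # fraction of scans Mia flagged (per the linked study)
    human_recall_rate = 0.054  # fraction flagged by human readers

    extra_recalls = (ai_recall_rate - human_recall_rate) * scans
    extra_cancers_found = 11   # additional cancers quoted in the summary

    print(extra_recalls)                        # ~760 additional women called back
    print(extra_recalls / extra_cancers_found)  # ~69 extra recalls per extra cancer found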

    • Conveniently, you leave out that initially the false positive rate was 47% but plummeted to 13% after a software upgrade. Here's a question: what was the false positive rate of humans the first time they were checking for cancer?

      It's almost as if when you're doing something for the first time you'll make mistakes and have to learn from those mistakes to get better at doing it.

    • Uh, which is worse... being told you have cancer when you actually don't... or being told you don't have cancer when you do? (And inevitably finding out later that oops, it's untreatable now.)

      • Actually both are very bad results; one might be worse than the other (for the person involved), but screening programmes aren't viable or acceptable if the false positive rate is high, because the follow-up is potentially devastating emotionally and involves lots of even more invasive tests. Look up the widely accepted Wilson criteria for screening, which include the need for tests to be specific as well as sensitive.

      • by jsonn ( 792303 )
        Many tumors are benign, as they grow extremely slowly (if at all) and are well isolated. Half of all women, for example, have fibrocystic changes that show up in this screening and are completely harmless and non-cancerous. The follow-up diagnosis is intrusive and slow, resulting in a lot of stress. Even ignoring the cost in a failed healthcare system like the USA, false positives have a very real material impact. This is why breast cancer screenings outside risk groups have been reduced a lot since the 80s - the kinds
    • by shilly ( 142940 )

      There are two potential arguments to be made regarding false positives:
      1. Every false positive causes harm, e.g. invasive tests, pointless treatment, emotional harm, etc. So we want false positive rates as low as possible
      2. Every false positive cuts productivity, because time and effort are spent on this case instead of others

      On the former, no question, high false positive rates are unequivocally a bad thing. But the balance between sensitivity and specificity is tricksy, and there's plenty of harm caused by f

      • It's scenario-specific, but in general, if you had to choose between false positives and false negatives, false positives are better. In most cases the treatment of a stage I cancer is mild, whereas the cost of an ignored cancer is nearly always catastrophic. Let's not forget most cancers caught in stage I have a 95% cure rate, but if caught in stage IV they are 95% incurable.

        • by shilly ( 142940 )

          It really isn't as simple as that for whole populations and breast cancer screening. To understand the balance of benefits and harms, you have to determine the correct age cohorts, screening intervals, etc. Tx for breast cancer is aggressive whatever the stage and it's a very scary disease to be told you've got. Plenty of women opt for radical lumpectomies or mastectomies, which are obviously major and traumatic surgeries. And the relative number of lives saved is lower than many expect. On balance, it's st

    • Indeed. False positives are a real killer. And it seems there are a lot, making this essentially unusable.

      Lying by misdirection.

      • In most scenarios, very few people would die from a treatment regime aimed at stage I cancer (assuming they won't bother with a human confirmation of the AI-flagged tumor). It would be annoying, scary (mitigated somewhat because you inform them there's a chance of a false positive), and a waste of time for sure, but the resulting death rate and overall expenses won't match the harm caused by false negatives. People given a false negative would be at much greater risk of dying.

        • by gweihir ( 88907 )

          That is not the issue. The issue is that with a false positive rate this high (and I am talking about the improved rate they report after tuning the system), massive effort for result validation becomes necessary. Otherwise that false positive rate simply overwhelms the treatment possibilities, and then people die because they have to wait too long to be identified as a true positive. In addition, a false positive rate this high qualifies directly as medical malpractice, which opens the door for civil liabili

  • Let's see: if I have to choose between AI saving my life vs. some dude having a job, I'm sorry but I'm choosing the AI. I won't mind paying that dude something to sit home and watch Netflix... but you're going to have to repeal the second amendment if you think that second-class healthcare is an acceptable price for keeping people employed. Figure out politically how people can get the AI's paycheck.

    • This would be more likely to flag you as having cancer than a human would, and so you would be subjected to unnecessary surgery and treatment...

      I prefer a combination of both: get the AI to do the grunt work, and let the human do what they are good at, checking the small number flagged by the AI in detail.

  • by az-saguaro ( 1231754 ) on Saturday March 23, 2024 @05:42AM (#64338435)

    Once again, a public media outlet is reinterpreting a technical paper for hearts-minds-money.
    https://www.bbc.com/news/techn... [bbc.com]
    Although in fairness, the outlet is the BBC, so maybe not money, but some sort of feel-good do-good reporter-earns-a-brownie-point sensationalistic story.

    The real story is at
    https://www.ncbi.nlm.nih.gov/p... [nih.gov]
    It is super cut-and-dried statistics, not clearly written; it took three or four reads to understand some of the contradictory or poorly clarified things.
    And, for me, it smacks of a lot of statistical double-speak. THAT IS NOT to imply it is false or misleading. It seems to be a very well done study, accurate, honest, but the Brits, or at least "those guys" on the paper, ought to learn to write English with some clarity.

    Here is the article abstract:

    Impact of Different Mammography Systems on Artificial Intelligence Performance in Breast Cancer Screening
    Abstract
    Artificial intelligence (AI) tools may assist breast screening mammography programs, but limited evidence supports their generalizability to new settings. This retrospective study used a 3-year dataset (April 1, 2016–March 31, 2019) from a U.K. regional screening program. The performance of a commercially available breast screening AI algorithm was assessed with a prespecified and site-specific decision threshold to evaluate whether its performance was transferable to a new clinical site. The dataset consisted of women (aged approximately 50–70 years) who attended routine screening, excluding self-referrals, those with complex physical requirements, those who had undergone a previous mastectomy, and those who underwent screening that had technical recalls or did not have the four standard image views. In total, 55,916 screening attendees (mean age, 60 years ± 6 [SD]) met the inclusion criteria. The prespecified threshold resulted in high recall rates (48.3%, 21,929 of 45,444), which reduced to 13.0% (5,896 of 45,444) following threshold calibration, closer to the observed service level (5.0%, 2,774 of 55,916). Recall rates also increased approximately threefold following a software upgrade on the mammography equipment, requiring per-software-version thresholds. Using software-specific thresholds, the AI algorithm would have recalled 277 of 303 (91.4%) screen-detected cancers and 47 of 138 (34.1%) interval cancers. AI performance and thresholds should be validated for new clinical settings before deployment, while quality assurance systems should monitor AI performance for consistency.

    They wanted to see if an AI assistive tool, trained on the data at one site, could work well or be retrained for another site. IT WAS NOT to test the validity or accuracy of AI in making better breast cancer diagnoses.

    Of note is the idea of screening and recall rates. Mammogram screens are done for women who have no reason to think there is disease; they are just getting the preventive screens that might catch a sub-clinical lesion in its early stages. Mammograms by themselves are not diagnostic, just suggestive. On a positive study, suspicious lesions are "recalled": the woman is notified and invited back for further diagnosis-making studies such as biopsy.

    In the U.S., the proper recall rate is considered to be about 12%. Anything less, and suspicious lesions that the average radiologist should have picked up are being overlooked. More than that, and non-malignant changes are being flagged, resulting in unnecessary additional testing. In England, the expected recall rate is 9% using their mandatory "double reading with arbitration" method, which is basically a two-out-of-three majority read that improves accuracy over the 12% single-reader benchmark.

    The AI was a commercial product made by Canon Medical Research Europe.
    The study is comparing the AI predictions against a data set of already worked-up and confirmed patients and dia

    • To add to the confusion, "recall" has a specific meaning in this context, and a specific and entirely different meaning in AI [wikipedia.org]: "Recall (also known as sensitivity) is the fraction of relevant instances that were retrieved," also known as true positive rate = (number of true positives) / ( true positives + false negatives )
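
      A quick check of the two senses of "recall", using only the figures from the abstract quoted above:

      # Screening sense: fraction of all screened women called back for further tests.
      screening_recall_rate = 5_896 / 45_444  # ~13.0% (after threshold calibration)

      # Machine-learning sense: sensitivity, the fraction of true cancers the tool flagged.
      ml_recall = 277 / 303                   # ~91.4% of screen-detected cancers

      print(f"{screening_recall_rate:.1%}  {ml_recall:.1%}")  # 13.0%  91.4%
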
    • The real story at is at https://www.ncbi.nlm.nih.gov/p [nih.gov]... It is super cut-and-dry statistics, not clearly written, took three or four reads to understand some of the contradictory or poorly clarified things.

      Well, that directly contradicts the BBC article, since they say:

      The Mia trial is just one early test, by one product in one location. The University of Aberdeen independently validated the research, but the results of the evaluation have not yet been peer reviewed.

      I'm sure there's lots of published a

  • at finding "invisible" gorillas?
    https://www.ncbi.nlm.nih.gov/p... [nih.gov]

  • According to a January 10, 2024 report in The Guardian, the UK has some of the worst cancer survival rates in the developed world.
    https://www.theguardian.com/so... [theguardian.com]

  • My unprofessional medical opinion is that this success is totally based on the number of bare breasts in Google Search results.
