Stats Science

Misleading Results From Widely-Used Machine-Learning Data Analysis Techniques (bbc.com) 23

Long-time Slashdot reader kbahey writes: The increased reliance on machine-learning techniques used by thousands of scientists to analyze data is producing results that are misleading and often completely wrong, according to the BBC.

Dr. Genevera Allen from Rice University in Houston said that the increased use of such systems was contributing to a "crisis in science".

She warned scientists that if they didn't improve their techniques they would be wasting both time and money. Her research was presented at the American Association for the Advancement of Science in Washington.


This is the oft-discussed 'reproducibility problem' in modern science.

The BBC writes that this irreproducibility happens when experiments "aren't designed well enough to ensure that the scientists don't fool themselves and see what they want to see in the results." But machine learning now has apparently become part of the problem.

Dr. Allen asks "If we had an additional dataset would we see the same scientific discovery or principle...? Unfortunately the answer is often probably not."
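
The "additional dataset" question can be illustrated with a small simulation. The sketch below is not Dr. Allen's method or anything described in the BBC article; it is a hypothetical setup in which every feature and the outcome are independent noise, so there is nothing real to discover. The function names (`noise_dataset`, `discover_then_replicate`) and the sample sizes are assumptions chosen only for illustration.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def noise_dataset(rng, n_samples, n_features):
    """Hypothetical dataset: features and outcome are independent Gaussian noise."""
    features = [[rng.gauss(0, 1) for _ in range(n_samples)]
                for _ in range(n_features)]
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    return features, outcome


def discover_then_replicate(rng, n_samples=20, n_features=200):
    """Pick the 'best' feature in dataset A, then re-check it in dataset B."""
    feats_a, out_a = noise_dataset(rng, n_samples, n_features)
    corr_a = [abs(pearson(f, out_a)) for f in feats_a]
    # The "discovery": the feature most correlated with the outcome in A.
    best = max(range(n_features), key=corr_a.__getitem__)

    # A fresh, independent dataset drawn the same way.
    feats_b, out_b = noise_dataset(rng, n_samples, n_features)
    corr_b = abs(pearson(feats_b[best], out_b))
    return corr_a[best], corr_b


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        found, replicated = discover_then_replicate(rng)
        print(f"|r| in dataset A: {found:.2f}   same feature in dataset B: {replicated:.2f}")
```

Because the winning feature is chosen after looking at dataset A, its correlation there is inflated by the search over many candidates; on the independent dataset B the same feature has no such advantage. Any pattern-finding procedure applied to many candidates with few samples is exposed to the same effect.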


  • Something I never liked about machine learning, and 'newfangled AI' in general, is how opaque it is: you get fast, interesting results, but you cannot explain how you got them or defend them directly. But GOFAI techniques are out of style right now, and it is getting worse as GOFAI systems are so much slower and resource intensive not to mention require so much more domain knowledge to set up and just can not compete with the sexy instant gratification that machine learning can give you.. or give your custo
    • I had to look up "GOFAI". Turns out it refers to symbolic reasoning type systems (good old-fashioned AI).

      y. But GOFAI techniques are out of style right now, and it is getting worse as GOFAI systems are so much slower and resource intensive not to mention require so much more domain knowledge to set up and just can not compete with the sexy instant gratification that machine learning can give you..

      Well yeah... and symbolic systems haven't delivered the results. Take for example the bit of machine learning whi

      • by jythie ( 914043 )
        True, they struggle to, as you say, deliver results. But what they do produce, you can explain and validate. ML gives you answers, quickly, which is great as long as the answers don't actually matter. That has made it great for recommendations and toys, but is really alarming when applied to more serious stuff.

        Moving further away from consumer products and services, there is the worrying limitation of ML that, well, you don't really learn anything from it. Seeing it used in research always worries me s
        • That has made it great for recommendations and toys, but is really alarming when applied to more serious stuff.

          I don't entirely agree. I mean I accept your point, but I don't agree with the conclusion in all cases. Take self driving cars for example. These are heavily dependent on ML, and certainly important.

          I reckon though we don't *need* to know why exactly it does the things it does, because humans aren't statistically very good drivers. The aim isn't to replace a perfect system with a cheaper one, it's

    • Re:Black Box (Score:4, Interesting)

      by epine ( 68316 ) on Saturday February 16, 2019 @09:00PM (#58132958)

      sexy instant gratification

      Deep neural networks barely made it through a decade-long siege of Leningrad where it became so unfashionable it was almost left to die in the snow. Is that your definition of "instant gratification"?

      Humans are equally terrible at articulating many of our fundamental skills. Even grand master chess players only manage to articulate a pedagogical narrative, and not the real thing.

      It does bug me sometimes that people forget that 90% of the reason we like our machines is they provide complementary abilities: massive databases with total recall, blindingly fast arithmetic, rarely ever making an error, sub-microsecond reaction times rather than tens of milliseconds. Where we're at now is substituting mechanical systems that overlap key human competences, where the mechanical system is nowhere near as good on many dimensions, but nowhere near as erratic as human performance, either.

      Finally, wherever did this idea originate that big messy systems were going to have clean analytic decompositions?

      Back in the 1950s the excuse for this view was that when you only have a hammer, everything looks like a nail. When you're limited to a few kilobytes of memory, the computer is applicable to a few classes of extremely analytic systems, where no part is giant and messy. But actually, DNN systems for machine translation require hundreds of megabytes. Because human language is extremely messy. In NLP, the GOFAI agenda was only ever aimed at some kind of highly constrained conlang, which encapsulated a dense, proposition nucleus (completely bereft of metaphor) entirely unlike any human language ever spoken.

      At no point in the last forty years have I not regarded GOFAI as some kind of adolescent SF fantasy reified.

      Do you look at Winograd's work from 1970 and see a glass half full or a glass half empty? It was cool for its day, but as a software engineer, I always thought to myself "this dog doesn't scale". And I was right. There was no era of SHRDLU 2.0 or SHRDLU 3.0. The analytic complexity in this domain scaled far faster than the analytic ingenuity of Terry Winograd's graduate students.

      So much for Lisp. Then along came Prolog: another scaling disaster.

      Perhaps once we refine the DNN and invent the first DNN rectifier (mapping a messy world onto a clean, orthogonal conceptual world) maybe we'll finally find a good home for the kind of cleverness we once thought of as the whole AI cheese plate.

  • Not surprised (Score:4, Interesting)

    by Anonymous Coward on Saturday February 16, 2019 @04:09PM (#58132106)

    I worked as a ML researcher in a science lab. Was often asked for results they wanted rather than good methodology, which I pushed back hard on, but the lab frequently contracted out analysis and then chose which results they liked best for publication. They got a few publications in Nature. Don't trust the ML results of any science paper unless they fully present and you understand their data, methodology, and statistics, and even then take things with a grain of salt.

    • Pretty much my take. This repeatability problem has absolutely nothing to do with ML and everything to do with researchers hammering a dataset into a form that says what they wanted to say when they first started the research. Something that has been happening since before the invention of the transistor.

      Only reason this is coming to light now is the likes of sci hub made the garbage widely accessible, rather than to only those few who already knew it was probably bs before they opened it.

  • Yes, well, that is what one does, look for patterns in the data. But the idea is that the data is a good representation of the real world, and that the patterns you find can be generalised to something useful. If you are just drawing conclusions from whatever your machine learning algorithm finds in the data, you need to look over your method, research questions and evaluation.

    (Disclaimer, the article doesn't give any details, and briefly mentions astronomy and biomedical research, areas I am not too familia

  • It was a common phenomenon to be observed ever since a complex methodology existed that researchers, especially the most successfully extraverted, did not understand what they were doing, analysis-wise. But it is a relief, I suppose, for machine-learning methodologists, that they for sure find the find-what-I-want switch easily. Amen ...

