AI Science Technology

AI Trained On Old Scientific Papers Makes Discoveries Humans Missed (vice.com) 149

An anonymous reader quotes a report from Motherboard: In a study published in Nature on July 3, researchers from the Lawrence Berkeley National Laboratory used an algorithm called Word2vec to sift through scientific papers for connections humans had missed. Their algorithm then spit out predictions for possible thermoelectric materials, which convert heat to energy and are used in many heating and cooling applications. The algorithm didn't know the definition of thermoelectric, though. It received no training in materials science. Using only word associations, the algorithm was able to provide candidates for future thermoelectric materials, some of which may be better than those we currently use.

To train the algorithm, the researchers assessed the language in 3.3 million abstracts related to materials science, ending up with a vocabulary of about 500,000 words. They fed the abstracts to Word2vec, which used machine learning to analyze relationships between words. Using just the words found in scientific abstracts, the algorithm was able to understand concepts such as the periodic table and the chemical structure of molecules. The algorithm linked words that were found close together, creating vectors of related words that helped define concepts. In some cases, words were linked to thermoelectric concepts but had never been written about as thermoelectric in any abstract they surveyed. This gap in knowledge is hard to catch with a human eye, but easy for an algorithm to spot. After showing its capacity to predict future materials, the researchers took their work back in time, virtually. They scrapped recent data and tested the algorithm on old papers, seeing if it could predict scientific discoveries before they happened. Once again, the algorithm worked.
"In one experiment, researchers analyzed only papers published before 2009 and were able to predict one of the best modern-day thermoelectric materials four years before it was discovered in 2012," the report adds.
  • by Anonymous Coward

    some of which may be better than those we currently use.

    I love to jump to "ultimate" conclusions as much as the next Trumptard but how about we test some of these candidates before we declare Mission Accomplished this time, eh? You know, see if they actually ARE better?

    THEN write the article claiming that, right?

    • by Anonymous Coward

      The AI did discover a material tested and verified as very good in 2012 when fed data only from papers pre-2009.

    • some of which may be better than those we currently use.

      I love to jump to "ultimate" conclusions as much as the next Trumptard but how about we test some of these candidates before we declare Mission Accomplished this time, eh? You know, see if they actually ARE better?

      THEN write the article claiming that, right?

      That's what we get a lot of in write-ups about 'AI' these days.

    • Re: (Score:2, Informative)

      Comment removed based on user account deletion
      • The issue is more about how many predictions it made relative to the one that was confirmed. Someone has to prioritize and test the options, and simply enumerating all possible permutations doesn't help. Correlative confirmation also doesn't prove anything. Even a blind squirrel finds a nut.

  • by RightwingNutjob ( 1302813 ) on Tuesday July 09, 2019 @10:36PM (#58899690)
    that the moon is made of green cheese, but that wasn't as interesting.

    This approach would be more interesting in something like drug discovery, where the way the magic happens is to try the possible molecules you can synthesize and see what works. People write whole dissertations on the effect of one molecule on one mechanism in one type of cell, and more importantly, they also document when there is weak or no correlation between molecules or enzymes and results. These are all little piecemeal results, but one can imagine it is a very ripe area for this kind of analysis, since both positive and negative results are documented... if one doesn't confine the search to papers that only get published to tout good news.
    • by phantomfive ( 622387 ) on Tuesday July 09, 2019 @10:41PM (#58899710) Journal
      If it helps you sort through lots of data, it can still be helpful, even if it comes up with some false positives. Word2vec helps give context for words.

      For example, if word A (or material A) is used in the context of a material with certain electrical properties, what other materials (or words) are used in a similar context? From what I can understand based on the parts I can read, it was able to recognize a category of words used in similar contexts, and a researcher was able to label this category as "periodic table elements." That isn't to say it 'understands' what periodic table elements are, just that they are words that happen in a similar context.

      Basically, this is a regular expression on steroids.
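      A toy query against pretrained vectors shows the effect described above. GloVe vectors trained on Wikipedia stand in here for the paper's abstract-trained embeddings; this is an illustration, not the study's model.

      ```python
      import gensim.downloader as api

      # Pretrained general-purpose vectors (a stand-in for embeddings
      # trained on materials-science abstracts).
      vectors = api.load("glove-wiki-gigaword-50")

      # "helium" scores close to words that merely occur in similar
      # contexts, i.e. other chemical elements -- no definitions needed.
      print(vectors.most_similar("helium", topn=5))
      ```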
      • by Layzej ( 1976930 )

        it was able to recognize a category of words used in similar contexts, and a researcher was able to label this category as "periodic table elements." That isn't to say it 'understands' what periodic table elements are, just that they are words that happen in a similar context.

        Word vectorization is fairly nifty. With well vectorized words you should be able to subtract "man" from "king", add "woman", and end up somewhere in the neighbourhood of "queen". It can reveal information about word relationships beyond simply grouping words with a similar context.
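        That arithmetic is easy to check against pretrained vectors (again GloVe as a stand-in, purely for illustration):

        ```python
        import gensim.downloader as api

        vectors = api.load("glove-wiki-gigaword-50")

        # vector("king") - vector("man") + vector("woman") lands near "queen".
        print(vectors.most_similar(positive=["king", "woman"],
                                   negative=["man"], topn=3))
        ```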

      • "a researcher was able to label this category as "periodic table elements." I'm pretty sure it went further than that; it was able to label elements more or less by column in the periodic table (like halogens, noble gases, alkali metals, transition metals, and perhaps things in between). I'd be surprised if it couldn't.

        "Basically, this is a regular expression on steroids." I suppose you could say that, but it's much more statistically-based, i.e. it figures out *similar* but non-identical contexts, formi

  • "May" (Score:5, Insightful)

    by gurps_npc ( 621217 ) on Tuesday July 09, 2019 @10:37PM (#58899698) Homepage

    Looks like they wrote a program, got "results" - or should I say OUTPUT - but have not actually tested to see if the results actually work yet.

    Until they actually prove the program found things, rather than finding things that some humans think are worth investigating, it means nothing.

    Any idiot (such as this particular idiot) can write software to search through a database and come up with a result that looks good. It needs to be proven to actually be good or you have done nothing worthwhile.

    • Re:"May" (Score:5, Informative)

      by piojo ( 995934 ) on Tuesday July 09, 2019 @10:52PM (#58899744)

      Until they actually prove the program found things, rather than finding things that some humans think are worth investigating, it means nothing.

      From the summary:

      "In one experiment, researchers analyzed only papers published before 2009 and were able to predict one of the best modern-day thermoelectric materials four years before it was discovered in 2012," the report adds.

      So it's not perfect proof, as we can't exclude the possibility of over-fitting, but the criterion you mentioned has already been met.

      • by Anonymous Coward

        It just means that it found the correlations quicker than humans did. What that doesn't tell us, however, is whether or not humans were actually looking for those correlations during those four years.

        Data mining is not really AI. This is just an algorithm that's able to process vast amounts of data without pesky things like eating and sleeping that humans require. It's a nice tool to augment what humans are doing, but it does not replace human scientists. The scientists are still the ones feeding it t

      • by Anonymous Coward

        "Papers published before 2009" to predict 2012 discoveries excludes overfitting in my book, unless the 2012 work was just rehashing the papers.
        The main thing about these papers is that no one reads them, especially behind paywalls. A professional can realistically take in maybe five a month in their own field, and they'll never read results in other fields that overlap and apply. You NEED bots to take the wide view and correlate all the pieces for presentation to the pros.

        • by piojo ( 995934 )

          Papers published before 2009 to predict 2012 discoveries excludes overfitting in my book, unless the 2012 work was just rehashing the papers.

          Using the same dataset for training and testing is the problem, not the dates. It would prevent over-fitting if 2006-2007 papers were used to develop the algorithm until they could predict a 2010 discovery, then the same algorithm was shown to be able to predict a 2012 discovery based on the 2008-2009 papers. The problem is that when you tweak a stock market algorithm enough that it could have predicted the 2008 stock market crash, it won't necessarily be able to predict any other stock market crash. The pr
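          A sketch of the kind of temporal holdout described above, with hypothetical names and years (not the paper's actual protocol):

          ```python
          # Illustrative temporal holdout: tune on an early slice, then
          # confirm once on a later, untouched slice.
          papers = [{"year": y, "tokens": ["..."]} for y in range(2000, 2010)]

          dev_corpus = [p for p in papers if p["year"] <= 2007]
          # ...tune the method here until it predicts a 2010 discovery.

          test_corpus = [p for p in papers if p["year"] <= 2009]
          # ...then run the frozen method once against 2012 discoveries.
          ```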

    • Re:"May" (Score:5, Interesting)

      by HiThere ( 15173 ) <charleshixsn.earthlink@net> on Tuesday July 09, 2019 @11:02PM (#58899750)

      Actually, one good prediction was reported in the summary. Trained on data from before 2009, it predicted a material that has since been discovered, and is good enough that it's in use.

      Computer scientists aren't generally materials engineers, so they probably don't have access to ways to synthesize and test the things that haven't been found useful. But this sounds like a good way to improve the efficiency of finding good targets to test.

      • Re:"May" (Score:4, Insightful)

        by Rockoon ( 1252108 ) on Tuesday July 09, 2019 @11:56PM (#58899918)

        Actually, one good prediction was reported in the summary

        ..out of how many predictions?

        The devil is in the details.

        • by Anonymous Coward on Wednesday July 10, 2019 @12:34AM (#58900010)
          Five.

          It was fifth in the 2009 dataset.

          Numbers two and four were also found to be thermoelectric.

          Figure 3 from the paper [springernature.com]
          b, The top five predictions from the year 2009 dataset, and evolution of their prediction ranks as more data are collected. The marker indicates the year of first published report of one of the initial top five predictions as a thermoelectric
          • Five.

            b, The top five predictions from the year 2009 dataset

            So it wasn't 5 predictions. Instead they scored more than 50 predictions, clear as day in that image you linked to. The prediction wasn't "these are the top 5" -- the predictions were very numerous, and the "score" that makes them "top 5" isn't based on prediction but on after-the-fact post-analysis of more than 50 predictions.

            This is data dredging.

            • Re: (Score:2, Informative)

              by Anonymous Coward

              The prediction wasn't "these are the top 5" -- the predictions were very numerous, and the "score" that makes them "top 5" isn't based on prediction but on after-the-fact post-analysis of more than 50 predictions.

              No, the ranking was not based on after-the-fact post-analysis.

              A total of 9,483 compounds overlap between the two datasets, of which 7,663 were never mentioned alongside thermoelectric keywords in our text corpus and can be considered candidates for prediction. To obtain the predictions, we ranked each of these 7,663 compounds by the dot product of their normalized output embedding with the word embedding of ‘thermoelectric’ (see Supplementary Information sections S1 and S3 regarding the use o
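              That ranking step is straightforward to sketch; the names below are illustrative, not the paper's released code:

              ```python
              import numpy as np

              def rank_candidates(candidates, target_vec, top_k=5):
                  """Rank compounds by the dot product of their normalized
                  embedding with a target embedding ('thermoelectric')."""
                  target = target_vec / np.linalg.norm(target_vec)
                  scored = [(name, float(np.dot(v / np.linalg.norm(v), target)))
                            for name, v in candidates.items()]
                  return sorted(scored, key=lambda p: p[1], reverse=True)[:top_k]

              # Toy usage with random vectors for the 7,663 candidates.
              rng = np.random.default_rng(0)
              cands = {f"compound_{i}": rng.normal(size=200) for i in range(7663)}
              print(rank_candidates(cands, rng.normal(size=200)))
              ```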

        • Actually, one good prediction was reported in the summary

          ..out of how many predictions?

          The devil is in the details.

          And the training set was probably specifically construed to see if the AI could guess the invention, so the choice of data could easily also create a Clever Hans effect.

      • Don't get me wrong -- it's very, very encouraging that the AI predicted a material that has in fact been discovered. I'm really excited by that result.

        But what about false positives? This algorithm could very well predict lots of things. How many of them actually get verified?

        (It must be said of course that human beings have a somewhat embarrassing rate of false positives.)

        • by HiThere ( 15173 )

          I'm rather reluctant to call this an AI. It seems to be just a classifier. Classifiers are a necessary part of an AI, but I don't think they're sufficient. It was explicitly mentioned (in the summary) that this was done purely by statistical correlation without any model of what was behind the correlation.

  • I feel the write-up is missing the most important piece of information: when it spat out the list of predictions, how long was it? Was it a list of one item, or a list of thousands of items, and hence "Oh, it got that prediction right, it's on the list"? "How long is the list?" "Just a short list, only a couple of thousand items."

  • by SirAstral ( 1349985 ) on Tuesday July 09, 2019 @11:35PM (#58899858)

    Stop calling a matching algorithm an AI.

    Finding things that match in context is nowhere near AI. AI is something else entirely. AI is something that learns by itself, NOT something that finds a match and makes a prediction. Humans are the ones learning here, not the algorithm.

    According to Slashdot my toaster has AI because it knows when to pop my bread up before it gets burned!

    • by phantomfive ( 622387 ) on Tuesday July 09, 2019 @11:41PM (#58899872) Journal
      It is AI: it's weak AI, and doesn't pretend to be anything else. (If it did pretend to be something else, wouldn't that be a sign of intelligence?)
      • Re: (Score:2, Informative)

        by SirAstral ( 1349985 )

        no, pretending to be anything is not a sign of intelligence. We have all sorts of plants, animals, insects, etc... that are able to pretend shit and they are not busting out theories on creation.

        Stop altering the definition of what AI is. AI is specifically developing an intelligence that is equivalent to... well, let's have a dictionary tell you, because you actually need to read one.

        https://www.merriam-webster.co... [merriam-webster.com]

        artificial intelligence noun
        Definition of artificial intelligence

        1 : a branch of computer scien

        • no, pretending to be anything is not a sign of intelligence

          It is literally the 2nd definition you quoted: "2 : the capability of a machine to imitate intelligent human behavior". As long as you get the results you want, how are you going to tell the difference between real intelligence and "pretended" intelligence? And why would you even care about the difference?

          • I will tell you again: pretending to be anything is not a sign of intelligence. Keep in mind we are talking about AI here, so basic intelligence is not the question. Insects are not intelligently mimicking things; they acquired these attributes through evolution, not intelligence. Their usage may require some form of intelligent response to stimuli, but they did not intelligently create the features that provide the mimicry.

            You have a fundamental misunderstanding of what is going on here.

            The Intelligence

            • by religionofpeas ( 4511805 ) on Wednesday July 10, 2019 @02:33AM (#58900206)

              A car is not intelligent just because you turned the steering wheel right and it went right.

              I never said that, did I? Please learn to read and address only the stated argument. My argument is that a machine that reaches the same results as a human using their intelligence is also intelligent. For instance, a self-driving car that can drive any random trip from A to B just as well as a normal human driver should be considered intelligent, no matter how it's implemented. Just following simple instructions like "turn right" is not a sign of intelligence, neither in humans nor in cars.

            • Comment removed based on user account deletion
        • Comment removed based on user account deletion
        • by Shaitan ( 22585 )

          "no, pretending to be anything is not a sign of intelligence. We have all sorts of plants, animals, insects, etc... that are able to pretend shit and they are not busting out theories on creation."

          You don't think there are intelligent plants/animals/insects just because they don't spout out theories on creation in a way humans understand? Wow.

          "Algorithms are just as intelligent as bricks. When you fully understand what that means, you hopefully will figure out what I am saying."

          I understand what you are say

    • These kinds of replies happen in literally every single Slashdot story about AI. It's almost like they are posted by a simple bot.

    • by Anonymous Coward

      So you think human intelligence is some magical fairy thing that works according to no knowable mechanism?
      AI isn't some weird incomprehensible thing; it's always just going to be mechanics, because all intelligence is just mechanics at a large scale.
      AI has always been things like expert systems and pattern matchers. What did you expect it to be?

    • by Shaitan ( 22585 )

      "finding things that match in context is nowhere near AI."

      Modern AI thinking is probabilistic, with a heavy dose of error-checking and bias toward false-positives to varying degrees. You are talking about a matching algorithm that is being used as a tool by this AI not the AI itself.

      "According to Slashdot my toaster has AI because it knows when to pop my bread up before it gets burned!"

      Does it do so by taking some sort of feedback of toasts and learning how to make better toast? Isn't that what you'd do if

    • I concur 100%.

      Glorified Table Lookup is NOT fucking Artificial Intelligence (A.I.)

      I call it Artificial Ignorance (a.i) at best.

  • If you spit out enough BS, sometimes it sounds like you're on to something. https://www.atrixnet.com/bs-ge... [atrixnet.com]
  • You keep using that word [in relation to AI]. I do not think it means what you think it means.
  • by Anonymous Coward

    There is a company in Switzerland, Iprova, which has been doing this for real companies for many years. Without spilling too many beans: in short, they have built a huge database of scientific articles and interesting articles in many technical areas, built the kind of semantic web the researchers did, and then apply a customer's requirements for inventions to that semantic web to get out hints of inventions. These are sort of "needles in a haystack", i.e. a specialist in that field must then look at the

  • by guacamole ( 24270 ) on Wednesday July 10, 2019 @04:42AM (#58900398)

    We just need to use this algorithm to find out what it is.

    • We just need to use this algorithm to find out what it is.

      Actually, what we've learned over the last couple of decades strongly suggests that there is no one cure for cancer, because cancer isn't one thing. There are many different kinds of cancer, some with clearly different causal agents (many caused by bacterial or viral infections), and they respond differently to different treatments. We've made enormous progress in treatment of some kinds of cancer, less in others.

      Which isn't to say that similar machine learning techniques couldn't provide useful results

  • I grew up reading Asimov, Clarke, and Philip K. Dick. I never thought I would be living in it.
  • It predicted the past, successfully.
    What about the future?

  • "In one experiment, researchers analyzed only papers published before 2009 and were able to predict one of the best modern-day thermoelectric materials four years before it was discovered in 2012," the report adds.

    That's a rather poor result.

    Ask any researcher, and you'll probably learn after a sharp intake of breath that the standard gap between "predicted" (first enthusiastic conversation over beers with your thesis advisor) and "discovered" (aka formally staked out with a series of peer-reviewed papers i
