AI Trained On Old Scientific Papers Makes Discoveries Humans Missed (vice.com) 149
An anonymous reader quotes a report from Motherboard: In a study published in Nature on July 3, researchers from the Lawrence Berkeley National Laboratory used an algorithm called Word2vec to sift through scientific papers for connections humans had missed. Their algorithm then spat out predictions for possible thermoelectric materials, which convert heat into electricity and are used in many heating and cooling applications. The algorithm didn't know the definition of thermoelectric, though. It received no training in materials science. Using only word associations, the algorithm was able to provide candidates for future thermoelectric materials, some of which may be better than those we currently use.
To train the algorithm, the researchers assessed the language in 3.3 million abstracts related to materials science, ending up with a vocabulary of about 500,000 words. They fed the abstracts to Word2vec, which used machine learning to analyze relationships between words. Using just the words found in scientific abstracts, the algorithm was able to understand concepts such as the periodic table and the chemical structure of molecules. The algorithm linked words that were found close together, creating vectors of related words that helped define concepts. In some cases, words were linked to thermoelectric concepts but had never been described as thermoelectric in any abstract surveyed. This gap in knowledge is hard to catch with a human eye, but easy for an algorithm to spot. After showing its capacity to predict future materials, the researchers took their work back in time, virtually: they scrapped recent data and tested the algorithm on older papers, seeing whether it could predict scientific discoveries before they happened. Once again, the algorithm worked. "In one experiment, researchers analyzed only papers published before 2009 and were able to predict one of the best modern-day thermoelectric materials four years before it was discovered in 2012," the report adds.
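The pipeline described above amounts to off-the-shelf word-embedding training followed by a similarity ranking. Here is a minimal, runnable sketch of the idea using the gensim library, with a toy corpus and illustrative hyperparameters rather than the paper's actual preprocessing or settings:

    # Sketch: train word embeddings on (toy) abstracts, then rank candidate
    # formulas by cosine similarity to the word 'thermoelectric'.
    from gensim.models import Word2Vec

    # Toy stand-in for the 3.3 million tokenized abstracts used in the study.
    abstracts = [
        ['thermoelectric', 'performance', 'of', 'Bi2Te3', 'alloys'],
        ['Bi2Te3', 'exhibits', 'low', 'thermal', 'conductivity'],
        ['CuGaTe2', 'exhibits', 'low', 'thermal', 'conductivity'],
        ['band', 'structure', 'of', 'CuGaTe2', 'and', 'PbTe'],
    ]

    model = Word2Vec(abstracts, vector_size=50, window=5, min_count=1, sg=1, seed=1)

    # The ranking is meaningless on a toy corpus, but the mechanics are the same.
    for formula in ('Bi2Te3', 'CuGaTe2', 'PbTe'):
        print(formula, model.wv.similarity('thermoelectric', formula))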
"Makes discoveries" - not yet dammit, not yet. (Score:2, Insightful)
some of which may be better than those we currently use.
I love to jump to "ultimate" conclusions as much as the next Trumptard but how about we test some of these candidates before we declare Mission Accomplished this time, eh? You know, see if they actually ARE better?
THEN write the article claiming that, right?
Re: "Makes discoveries" - not yet dammit, not yet. (Score:2, Informative)
The AI did discover a material tested and verified as very good in 2012 when fed data only from papers pre-2009.
Re: (Score:1)
some of which may be better than those we currently use.
I love to jump to "ultimate" conclusions as much as the next Trumptard but how about we test some of these candidates before we declare Mission Accomplished this time, eh? You know, see if they actually ARE better?
THEN write the article claiming that, right?
That's what we get a lot of in write-ups about 'AI' these days.
Re: (Score:2, Informative)
Re: "Makes discoveries" - not yet dammit, not yet. (Score:1)
The issue is more about how many predictions it made relative to the one that was confirmed. Someone has to prioritize and test the options, and simply enumerating all possible permutations doesn't help. Correlative confirmation also doesn't prove anything. Even a blind squirrel finds a nut.
Re: (Score:2)
One assumes they also predicted (Score:4, Insightful)
This approach would be more interesting in something like drug discovery, where the mechanism by which magic happens is to try every molecule you can synthesize and see what works. People write whole dissertations on the effect of one molecule on one mechanism in one type of cell, and, more importantly, they also document when there is weak or no correlation between molecules or enzymes and results. These are all little piecemeal results, but one can imagine it is a very ripe area for this kind of analysis, since both positive and negative results are documented... if one doesn't confine the search to papers that only get published to tout good news.
Re:One assumes they also predicted (Score:5, Insightful)
For example, if word A (or material A) is used in the context of a material with certain electrical properties, what other materials (or words) are used in a similar context? From what I can understand based on the parts I can read, it was able to recognize a category of words used in similar contexts, and a researcher was able to label this category as "periodic table elements." That isn't to say it 'understands' what periodic table elements are, just that they are words that happen in a similar context.
Basically, this is a regular expression on steroids.
Re: (Score:3)
it was able to recognize a category of words used in similar contexts, and a researcher was able to label this category as "periodic table elements." That isn't to say it 'understands' what periodic table elements are, just that they are words that happen in a similar context.
Word vectorization is fairly nifty. With well vectorized words you should be able to subtract "man" from "king", add "woman", and end up somewhere in the neighbourhood of "queen". It can reveal information about word relationships beyond simply grouping words with a similar context.
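For anyone who wants to try that, here's a quick sketch using gensim's downloadable pretrained vectors; the dataset name is a standard gensim bundle, and the exact neighbours returned will depend on the vectors used:

    import gensim.downloader as api

    # ~130 MB download of pretrained GloVe vectors on first use.
    vectors = api.load('glove-wiki-gigaword-100')

    # king - man + woman ~= queen
    print(vectors.most_similar(positive=['king', 'woman'], negative=['man'], topn=3))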
Re: (Score:2)
"a researcher was able to label this category as "periodic table elements." I'm pretty sure it went further than that; it was able to label elements more or less by column in the periodic table (like halogens, noble gases, alkali metals, transition metals, and perhaps things in between). I'd be surprised if it couldn't.
"Basically, this is a regular expression on steroids." I suppose you could say that, but it's much more statistically-based, i.e. it figures out *similar* but non-identical contexts, formi
"May" (Score:5, Insightful)
Looks like they wrote a program and got "results" - or should I say OUTPUT - but they have not actually tested to see whether the results actually work yet.
Until they actually prove the program found things, rather than finding things that some humans think are worth investigating, it means nothing.
Any idiot (such as this particular idiot) can write software that searches through a database and comes up with a result that looks good. It needs to be proven to actually be good, or you have done nothing worthwhile.
Re:"May" (Score:5, Informative)
Until they actually prove the program found things, rather than finding things that some humans think are worth investigating, it means nothing.
From the summary:
"In one experiment, researchers analyzed only papers published before 2009 and were able to predict one of the best modern-day thermoelectric materials four years before it was discovered in 2012," the report adds.
So it's not perfect proof, as we can't exclude the possibility of over-fitting, but the criterion you mentioned has already been met.
Re: (Score:1)
It just means that it found the correlations quicker than humans did. What that does not tell us, however, is whether humans were actually looking for those correlations during those four years.
Data mining is not really AI. This is just an algorithm that can process vast amounts of data without pesky things like the eating and sleeping that humans require. It's a nice tool to augment what humans are doing, but it does not replace human scientists. The scientists are still the ones feeding it the data.
Re: (Score:2)
Voting for Trump?
Re: "May" (Score:1)
"Papers published before 2009" to predict 2012 discoveries excludes overfitting in my book, unless the 2012 work was just rehashing the papers.
The main thing about these papers is that no one reads them, especially behind paywalls. A professional could realistically take in like 5 a month in their own field, and they'll never read results in other fields that overlap and apply. You NEED bots to take the wide view and correlate all the pieces for presentation to the pros.
Re: (Score:2)
Papers published before 2009 to predict 2012 discoveries excludes overfitting in my book, unless the 2012 work was just rehashing the papers.
Using the same dataset for training and testing is the problem, not the dates. Over-fitting would be excluded if 2006-2007 papers were used to develop the algorithm until it could predict a 2010 discovery, and then the same algorithm were shown to predict a 2012 discovery based on the 2008-2009 papers. The problem is that when you tweak a stock market algorithm enough that it could have predicted the 2008 stock market crash, it won't necessarily be able to predict any other stock market crash. The pr
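In code terms, the safeguard described above is an ordinary walk-forward split: tune against an early cutoff, then evaluate the frozen recipe exactly once against a later one. A minimal sketch with toy records (the year fields and counts are illustrative, not from the paper):

    # Walk-forward validation: develop against one historical window, then
    # confirm against a later, untouched window.
    abstracts = [
        {'year': 2005, 'tokens': ['...']},
        {'year': 2008, 'tokens': ['...']},
        {'year': 2011, 'tokens': ['...']},
    ]  # toy records; the real corpus has millions

    def corpus_before(cutoff):
        return [a['tokens'] for a in abstracts if a['year'] < cutoff]

    dev_corpus = corpus_before(2008)   # tune hyperparameters against 2010 outcomes
    test_corpus = corpus_before(2010)  # one frozen-model check against 2012 outcomes
    print(len(dev_corpus), len(test_corpus))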
Re:"May" (Score:5, Interesting)
Actually, one good prediction was reported in the summary. Trained on data from before 2009, it predicted a material that has since been discovered, and is good enough that it's in use.
Computer scientists aren't generally materials engineers, so they probably don't have the means to synthesize and test the candidates that haven't yet been investigated. But this sounds like a good way to improve the efficiency of finding good targets to test.
Re:"May" (Score:4, Insightful)
Actually, one good prediction was reported in the summary
The devil is in the details.
..out of how many predictions? (Score:5, Informative)
It was fifth in the 2009 dataset.
Numbers two and four were also found to be thermoelectric.
Figure 3 from the paper [springernature.com]
b, The top five predictions from the year 2009 dataset, and evolution of their prediction ranks as more data are collected. The marker indicates the year of first published report of one of the initial top five predictions as a thermoelectric
Re: (Score:1)
Five.
b, The top five predictions from the year 2009 dataset
So it wasn't 5 predictions. Instead they scored more than 50 predictions, clear as day in the image you linked to. The prediction wasn't "these are the top 5" -- the predictions were very numerous, and the "score" that makes them the "top 5" isn't based on prediction but on after-the-fact post-analysis of more than 50 predictions.
This is data dredging.
Re: (Score:2, Informative)
The prediction wasn't "these are the top 5" -- the predictions were very numerous, and the "score" given that makes them "top 5" isnt based on prediction but instead based on after-the-fact post-analysis of more than 50 predictions.
No, the ranking was not based on after-the-fact post-analysis.
A total of 9,483 compounds overlap between the two datasets, of which 7,663 were never mentioned alongside thermoelectric keywords in our text corpus and can be considered candidates for prediction. To obtain the predictions, we ranked each of these 7,663 compounds by the dot product of their normalized output embedding with the word embedding of 'thermoelectric' (see Supplementary Information sections S1 and S3 regarding the use o
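The ranking step quoted above is just a normalized dot product. A numpy sketch with random stand-in embeddings (the real vectors would come from the trained model):

    import numpy as np

    rng = np.random.default_rng(0)
    dim = 200
    thermoelectric = rng.standard_normal(dim)
    candidates = {name: rng.standard_normal(dim)
                  for name in ('CuGaTe2', 'SnSe', 'PbTe')}

    def score(output_embedding, query):
        # Dot product of the normalized output embedding with the query embedding.
        v = output_embedding / np.linalg.norm(output_embedding)
        return float(v @ query)

    ranked = sorted(candidates, key=lambda n: score(candidates[n], thermoelectric),
                    reverse=True)
    print(ranked)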
Re: (Score:2)
Actually, one good prediction was reported in the summary
The devil is in the details.
And the training set was probably specifically construed to see if the AI could guess the invention, so the choice of data could easily also create a Clever Hans effect.
Re: (Score:2)
Don't get me wrong -- it's very, very encouraging that the AI predicted that material that in fact has been discovered. I'm really excited by that result.
But what about false positives? This algorithm could very well predict lots of things. How many of them actually get verified?
(It must be said of course that human beings have a somewhat embarrassing rate of false positives.)
Re: (Score:2)
I'm rather reluctant to call this an AI. It seems to be just a classifier. Classifiers are a necessary part of an AI, but I don't think they're sufficient. It was explicitly mentioned (in the summary) that this was done purely by statistical correlation without any model of what was behind the correlation.
I feel the write up is missing the most important (Score:1)
I feel the write-up is missing the most important piece of information: when it spat out the list of predictions, how long was it? Was it a list of one item or a list of thousands of items, and hence "Oh, it got that prediction right, it's on the list"? How long is the list? "Just a short list, only a couple of thousand items."
The link from Vice takes you to the full paper. (Score:5, Informative)
Re: (Score:1)
Re:Math papers are the holy grail for this (Score:5, Interesting)
I think mathematicians have great job security due to Gödel's Incompleteness Theorem. TL;DR: math cannot be fully mechanized, although computers are indisputably a powerful assistant (even colleague?) for future mathematicians.
Re: (Score:3)
I think mathematicians have great job security due to Gödel's Incompleteness Theorem.
No, not really. Both human mathematicians and computer systems have the same theoretical limitations. Computers have an advantage that their practical limits can be improved.
Re: Math papers are the holy grail for this (Score:4, Interesting)
So can humans. Not in speed, but in knowledge.
To a degree, yes. But computers have the potential to improve faster and longer. Compare with chess, for example. Human chess knowledge advances only slowly. Arguably, the current world #1, Magnus Carlsen, is a bit stronger than former #1 Garry Kasparov, but the difference is small either way. The progress made in computer chess over the same time frame is much bigger. The Deep Blue computer that barely beat Kasparov can easily be wiped off the board by a program running on a smartphone.
Re: (Score:1)
Yes, Carlsen is objectively better than Kasparov.
A big part of this is Carlsen has better training tools, including access to every game that Kasparov ever played.
A better question is: "Is Carlsen innately better than Kasparov?" Or, alternatively, if Kasparov had had access to the same training materials Carlsen had, who would be the better player at the same age?
And the answer is probably still Carlsen.
And the most likely reason is that Carlsen is drawn from a wider pool of potential chess players.
Re: (Score:1)
While it might be possible to improve computers to match the human mode of thinking more closely, as a tool/aid such a computer would be very error-prone and not that effective.
That is, an effective discovery process is drastically different from analyzing existing data. The difference is similar to a sorting algorithm vs. a shortest-path search.
Re: (Score:2)
The key difference is that computers are deterministic but humans are not
There are deterministic systems that will give you exactly the same output for the same input and initial state. There are also systems with some random component that will give you pure noise, unrelated to anything.
The 3rd kind of system is an "oracle", something that gives you the right answer out of the blue.
If it is your claim that humans possess such an oracle, I would like to see your evidence for it, or at least a plausible mechanism how it fits in our current understanding of physics.
Re: (Score:2)
Computers think 1 + 1 equals 2. Humans think 1 + 1 is likely 2, unlikely to be 3 or 1, and that the correct answer has to be between 0 and 4. The human mode of thinking is more conducive to the discovery process, as it considers a broader set of (often incorrect) answers.
Re: (Score:2)
That's a very old-fashioned assumption about how computers work. Go look at how AlphaZero plays a game and finds attacking plans. It plays like a super-human rather than an old-fashioned computer. It is intuitive and probabilistic, with a heavy dose of error checking.
Re: (Score:2)
"Human thinking is probabilistic, with a heavy dose of error-checking and biased toward false-positives. "
Modern AI thinking is probabilistic, with a heavy dose of error-checking and a bias toward false positives to varying degrees. You aren't making your case here, but maybe you just aren't aware of how modern AI functions.
ha ha ha.... Again folks... NOT AI (Score:4, Insightful)
Stop calling a matching algorithm an AI.
Finding things that match in context is nowhere near AI. AI is something else entirely. AI is something that learns by itself, NOT something that finds a match and makes a prediction. Humans are the ones learning here, not the algorithm.
According to Slashdot my toaster has AI because it knows when to pop my bread up before it gets burned!
Re:ha ha ha.... Again folks... NOT AI (Score:4, Insightful)
Re: (Score:2, Informative)
No, pretending to be anything is not a sign of intelligence. We have all sorts of plants, animals, insects, etc. that are able to pretend shit, and they are not busting out theories on creation.
Stop altering the definition of what AI is. AI is specifically developing an intelligence that is equivalent to... well, let's have a dictionary tell you, because you actually need to read one.
https://www.merriam-webster.co... [merriam-webster.com]
artificial intelligence noun
Definition of artificial intelligence
1 : a branch of computer science dealing with the simulation of intelligent behavior in computers
2 : the capability of a machine to imitate intelligent human behavior
Re: (Score:3)
no, pretending to be anything is not a sign of intelligence
It is literally the 2nd definition you quoted: "2 : the capability of a machine to imitate intelligent human behavior". As long as you get the results you want, how are you going to tell the difference between real intelligence and "pretended" intelligence? And why would you even care about the difference?
Re: (Score:2)
Would you care about the difference if I pretended to beat the shit out of you with a baseball bat or really did it?
Yes, because the end result would be different.
If a human can solve a problem using intelligence, and a computer can solve the same problem with "pretend" intelligence, producing the same result, then I don't care about the distinction.
Intelligence is about the results, not the process. Our brains evolved from primitive nerve cells only because, along the evolutionary pathway, the bigger and more intelligent brain had better survival results.
Re: (Score:3)
Put a different problem in front of the human and the computer. The computer, with differential equations tweaked up to match one task, will fail utterly and not even recognise that the domain it's operating in has changed, where it is, or what it's doing.
Then the result is not the same, is it?
If you asked me in Japanese to fold a simple hat, a brooch, or a pterodactyl out of a paper weather report, I would utterly fail as well, even though a Japanese school kid could easily do that task.
Re: (Score:2)
You could show a human how to do those tasks, and they could perform them.
Assuming the sensory apparatus existed, the AI system could not.
Also, the human could, given time, learn Japanese and eventually perform the task.
I don't believe the AI system could.
Re: (Score:2)
I will tell you again: pretending to be anything is not a sign of intelligence. Keep in mind we are talking about AI here, so basic intelligence is not a factor. Insects are not intelligently mimicking things; they acquired these attributes through evolution, not intelligence. Their use may require some form of intelligent response to stimuli, but the insects did not intelligently create the features that provide the mimicry.
You have a fundamental misunderstanding of what is going on here.
The Intelligence
Re:ha ha ha.... Again folks... NOT AI (Score:4, Insightful)
A car is not intelligent just because you turned the steering wheel right and it went right.
I never said that, did I? Please learn to read and address only the stated argument. My argument is that a machine that reaches the same results as a human using their intelligence is also intelligent. For instance, a self-driving car that can drive any random trip from A to B just as well as a normal human driver should be considered intelligent, no matter how it's implemented. Just following simple instructions like "turn right" is not a sign of intelligence, neither in humans nor in cars.
Re: (Score:2)
FTFY
Re: (Score:2)
put the same unmodified algorithm in a semi and it will consistently drive through walls without ever considering that what it is doing is incorrect.
That depends on how it's made and trained. It could have an accelerometer, and know that any value in excess of, say, 2g is incorrect.
Humans don't operate well outside their normal domain either. Put a car driver behind the controls of a fighter plane (or even a sailboat) and they will crash too. The difference is that humans are trained on many more domains than a self-driving car, and can transfer knowledge from one domain to another. Cross-domain knowledge transfer is a hot topic in AI research, and
Re: (Score:1)
Re: (Score:2)
Re: (Score:2)
"no, pretending to be anything is not a sign of intelligence. We have all sorts of plants, animals, insects, etc... that are able to pretend shit and they are not busting out theories on creation."
You don't think there are intelligent plants/animals/insects just because they don't spout theories on creation in a way humans understand? Wow.
"Algorithms are just as intelligent as bricks. When you fully understand what that means, you hopefully will figure out what I am saying."
I understand what you are saying
Re: (Score:2)
These kinds of replies happen in literally every single Slashdot story about AI. It's almost like they are posted by a simple bot.
Re: (Score:1)
Because stupid people like you believe everything is AI when it's not, and you need someone to tell you how stupid you are.
Re: (Score:2)
And you think repeating the same old argument will work this time?
Re: (Score:2)
Re: (Score:1)
So you think human intelligence is some magical fairy thing that works according to no knowable mechanics?
AI isn't some weird incomprehensible thing; it's always just going to be mechanics, because all intelligence is just mechanics at a large scale.
AI has always been things like expert systems and pattern matchers. What did you expect it to be?
Re: (Score:2)
Re: (Score:2)
Human intelligence is certainly not magical, but at this point in our understanding of the underlying principles it is indistinguishable from magic.
Re: (Score:2)
But we don't necessarily need to understand the underlying "magic" in order to duplicate it. Mother Nature never knew what she was doing either.
All you need for progress is to find a single problem that AI can't solve correctly, and then fix that. If the problem is too hard to fix, take a small step back, and find a simpler problem. Rinse and repeat.
Re: (Score:2)
"finding things that match in context is nowhere near AI."
Modern AI thinking is probabilistic, with a heavy dose of error-checking and a bias toward false positives to varying degrees. You are talking about a matching algorithm that is being used as a tool by this AI, not the AI itself.
"According to Slashdot my toaster has AI because it knows when to pop my bread up before it gets burned!"
Does it do so by taking some sort of feedback about the toast and learning how to make better toast? Isn't that what you'd do if you were making the toast yourself?
Re: (Score:2)
I concur 100%.
Glorified Table Lookup is NOT fucking Artificial Intelligence (A.I.)
I call it Artificial Ignorance (a.i) at best.
holisticly recaptiualize viral actions with B.S. (Score:2)
Re:holisticly recaptiualize viral actions with B.S (Score:4, Funny)
"dramatically plagiarize covalent vortals"
... I think this sums up slashdot pretty well.
"understands" (Score:1)
Re: (Score:1)
Is this the quality of slashdot now?
yes
There are already companies doing this (Score:1)
There is a company in Switzerland, Iprova, which has been doing this for real companies for many years. Without spilling too many beans: in short, they have built a huge database of scientific articles and interesting articles in many technical areas, built the kind of semantic web the researchers did, and then applied customer requirements for inventions to that semantic web to get out hints of inventions. These are sort of "needles in a haystack", i.e. a specialist in that field must then look at the
The cure for cancer has already been published (Score:3)
We just need to use this algorithm to find out what it is.
Re: (Score:2)
We just need to use this algorithm to find out what it is.
Actually, what we've learned over the last couple of decades strongly suggests that there is no one cure for cancer, because cancer isn't one thing. There are many different kinds of cancer, some with clearly different causal agents (many caused by bacterial or viral infections), and they respond differently to different treatments. We've made enormous progress in treatment of some kinds of cancer, less in others.
Which isn't to say that similar machine learning techniques couldn't provide useful results
Singularity is here (Score:2)
Fantastic! (Score:2)
It predicted the past, successfully.
What about the future?
essence of grad (Score:2)
That's a rather poor result.
Ask any researcher, and you'll probably learn, after a sharp intake of breath, that the standard gap between "predicted" (first enthusiastic conversation over beers with your thesis advisor) and "discovered" (aka formally staked out with a series of peer-reviewed papers i