Google's DeepMind Predicts 3D Shapes of Proteins (theguardian.com)
Google's DeepMind is using an AI program, called AlphaFold, to predict the 3D shapes of proteins, the fundamental molecules of life. "DeepMind set its sights on protein folding after its AlphaGo program famously beat Lee Sedol, a champion Go player, in 2016," reports The Guardian. The company says "It's never been about cracking Go or Atari, it's about developing algorithms for problems exactly like protein folding." From the report: DeepMind entered AlphaFold into the Critical Assessment of Structure Prediction (CASP) competition, a biennial protein-folding olympics that attracts research groups from around the world. The aim of the competition is to predict the structures of proteins from lists of their amino acids which are sent to teams every few days over several months. The structures of these proteins have recently been cracked by laborious and costly traditional methods, but not made public. The team that submits the most accurate predictions wins. On its first foray into the competition, AlphaFold topped a table of 98 entrants, predicting the most accurate structure for 25 out of 43 proteins, compared with three out of 43 for the second placed team in the same category.
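The "most accurate structure for 25 out of 43 proteins" figure is essentially a per-target tally of which team had the best score. The report doesn't say which accuracy measure was used, so `score` below is a placeholder (assumed: higher means more accurate), and the team names and numbers are made up for illustration.

```python
from collections import Counter


def count_best_predictions(scores):
    """Count, per team, the targets on which it had the top score.

    scores maps a target id to {team: score}. Higher is assumed to be
    more accurate; ties credit every tied team.
    """
    wins = Counter()
    for by_team in scores.values():
        best = max(by_team.values())
        for team, score in by_team.items():
            if score == best:
                wins[team] += 1
    return wins


# Hypothetical scores for three targets (not real CASP data).
scores = {
    "T1": {"team_a": 71.0, "team_b": 64.5, "team_c": 50.2},
    "T2": {"team_a": 55.0, "team_b": 58.3, "team_c": 40.1},
    "T3": {"team_a": 80.4, "team_b": 62.0, "team_c": 80.4},
}
wins = count_best_predictions(scores)
print(wins.most_common())
```

How ties are credited is a design choice; a real ranking would have to state its own rule.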
To build AlphaFold, DeepMind trained a neural network on thousands of known proteins until it could predict 3D structures from amino acids alone. Given a new protein to work on, AlphaFold uses the neural network to predict the distances between pairs of amino acids, and the angles between the chemical bonds that connect them. In a second step, AlphaFold tweaks the draft structure to find the most energy-efficient arrangement. The program took a fortnight to predict its first protein structures, but now rattles them out in a couple of hours.
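The report gives no detail on how the draft structure is "tweaked", so this is only a toy illustration of the idea, not AlphaFold's method. It treats a matrix of pairwise distances as if it were the network's prediction and runs plain gradient descent on 3D coordinates to minimize a simple harmonic "energy", the sum of squared differences between current and predicted distances. The energy function, step size, and the 3.8 Å random-walk toy chain are all assumptions, and the predicted bond angles are ignored entirely.

```python
import math
import random


def pair_energy(coords, target):
    """Toy 'energy': sum over pairs of (current distance - target distance)^2."""
    n = len(coords)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += (math.dist(coords[i], coords[j]) - target[i][j]) ** 2
    return total


def pair_energy_grad(coords, target):
    """Analytic gradient of pair_energy with respect to every coordinate."""
    n = len(coords)
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            diff = [a - b for a, b in zip(coords[i], coords[j])]
            d = math.sqrt(sum(c * c for c in diff)) or 1e-9  # avoid divide-by-zero
            scale = 2.0 * (d - target[i][j]) / d
            for k in range(3):
                grad[i][k] += scale * diff[k]
                grad[j][k] -= scale * diff[k]
    return grad


def refine(coords, target, steps=2000, lr=0.01):
    """Plain gradient descent on the coordinates (stand-in for real refinement)."""
    x = [list(c) for c in coords]
    for _ in range(steps):
        g = pair_energy_grad(x, target)
        for i in range(len(x)):
            for k in range(3):
                x[i][k] -= lr * g[i][k]
    return x


def toy_chain(n, step=3.8, seed=0):
    """Hypothetical 'native' backbone: a random walk with fixed-length steps."""
    rng = random.Random(seed)
    chain = [[0.0, 0.0, 0.0]]
    for _ in range(n - 1):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        chain.append([p + step * c / norm for p, c in zip(chain[-1], v)])
    return chain


native = toy_chain(8)
# Stand-in for the network's predicted distances: exact distances of the toy chain.
target = [[math.dist(a, b) for b in native] for a in native]
rng = random.Random(1)
draft = [[c + rng.gauss(0.0, 1.0) for c in p] for p in native]  # noisy draft
refined = refine(draft, target)
print(f"draft energy {pair_energy(draft, target):.3f} -> "
      f"refined {pair_energy(refined, target):.3f}")
```

With a step size this small the energy should fall steadily, but plain gradient descent can still stall in a local minimum, and distances alone cannot tell a structure from its mirror image.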
They took our jobs! (Score:3, Interesting)
DeepMind is moving out of the realm of curiosity (games) into areas that employ people with a high degree of specialization. Google's team of 10 people produced a better result with 2 years of work than the entire academic field has been able to produce in the last 30. Granted, they had prior work to inform them. Anyway, this is interesting because this kind of development could put the PhDs in my lab out of a job, and they thought the truck drivers would be first to get automated!
Re: (Score:2)
Re: (Score:1)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
"Since when was 25 out of 43 "aced"?
When I was in school that would have been called "failed". Just because it got 22 more correct than second place doesn't change the fact that it got less than 60% correct."
Second place had 7% correct. The winner got more than eight times as many correct answers as second place (about 733% more).
That's not bad IMHO.
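The percentages argued over here can be checked directly from the counts in the report (25 and 3 correct out of 43 targets):

```python
def compare_wins(winner, runner_up, total):
    """Express two win counts out of `total` as percentages and ratios."""
    return {
        "winner_pct": 100 * winner / total,
        "runner_up_pct": 100 * runner_up / total,
        "times_as_many": winner / runner_up,
        "pct_more": 100 * (winner - runner_up) / runner_up,
    }


stats = compare_wins(25, 3, 43)
for key, value in stats.items():
    print(f"{key}: {value:.1f}")
```

"Eight times as many" and "800% more" are different claims: 25 is about 8.3 times 3, which is roughly 733% more.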
Re: (Score:2)
Who is the bigger idiot, these nutjobs or the one on a mission to correct every nutjob on the net? Just chill. Why waste good time chasing these nutjobs?
Best of luck buddy.
Re: (Score:2)
It only works in cases that don't matter as much (Score:2)
The method they used only works when there are a gazillion similar sequences. It doesn't work for a unique sequence. So it's not an "ab initio" method; it's a fold-recognition method that first recognizes the contacts, then does free-form folding to fit them. But it can't infer contacts without massive sequence alignments to other proteins. Thus it has great value in those cases, but other methods work in all cases, not just that special case.
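To make the dependence on alignments concrete: a crude, commonly cited way to infer contacts from a multiple sequence alignment is to score each pair of columns by mutual information, on the idea that positions in contact tend to mutate together. This is a minimal sketch with a made-up four-sequence alignment; real contact predictors use far larger alignments and stronger statistics (e.g. direct coupling analysis), and nothing here is DeepMind's method.

```python
import math
from collections import Counter
from itertools import combinations


def column_mi(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


def top_pairs(msa, k):
    """Rank column pairs by MI; high-scoring pairs are candidate contacts."""
    length = len(msa[0])
    scored = [(i, j, column_mi(msa, i, j)) for i, j in combinations(range(length), 2)]
    return sorted(scored, key=lambda t: t[2], reverse=True)[:k]


# Hypothetical alignment: columns 0 and 3 co-vary (A pairs with L, G with V).
msa = ["ACDL", "GCEV", "ACFL", "GCDV"]
print(top_pairs(msa, 3))
```

With a single sequence every column is constant and every score is zero, which is one way to see why this family of methods needs many related sequences.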
Re: (Score:3)
You think that's bad? Radiologists are already significantly better with AI. Give it a few more iterations and you'll only need a few of the best radiologists to handle the edge cases; then it's all machine learning on outliers.
Sorry about that fellowship you did - back to primary care with you - don't forget to swap out that BMW for a Prius.
Re: (Score:2)
More like swap out that Maserati for a BMW. The primary care types do pretty well too, but it's hard to match the throughput of a good radiologist.
The problem with primary care physicians is that the part of their job that's not vulnerable to machine learning is done better by nurses. Surgeons should have job security for a while.
Re: (Score:3)
Google's team of 10 people produced a better result with 2 years of work than the entire academic field has been able to produce in the last 30
That's not a correct reading of the results. First, previous efforts are based on a putative understanding of how proteins fold. Obviously, this understanding is incomplete, or the physics-based methods would perform better. (Even statistical potentials, like those in Rosetta, are physics-based in important ways.) Second, DeepMind isn't even on the radar in the server component of CASP. The server competition is intrinsically more difficult because it requires robust software that isn't highly dependent on user p
Research Paper Needed (Score:1)
I'm looking forward to the research paper to address key questions. What resources (training, inference) did Google use, and how do they compare to the competition? Was this mostly a machine learning problem with big data, or a big data problem with some machine learning? Is there a GitHub repository yet?
Re: (Score:2)
One interesting question concerns the claim that they generate shapes ab initio while using a neural network. I wonder how much the network has been trained to recognize existing (evolutionary dependent) protein families and their patterns vs. a new random sequence folder. The former may be just as useful in practice but may teach us a bit less about the mechanics of f
Re: (Score:2)
Re: (Score:2)
I wonder how much the network has been trained to recognize existing (evolutionary dependent) protein families and their patterns vs. a new random sequence folder.
That's why they should use the historical validation approach! Train on structures solved before 2005, then predict only novel folds solved after 2005. Perform well in that context and I'll be impressed.
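The historical validation idea can be sketched as a date-based split that also filters the test set down to folds never seen in training. The record fields (`name`, `fold`, `year_solved`) and the toy entries are hypothetical; a real benchmark would need an actual fold classification to decide what counts as "novel".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SolvedStructure:
    name: str         # hypothetical identifier
    fold: str         # hypothetical fold label from some fold classification
    year_solved: int


def historical_split(structures, cutoff_year):
    """Train on structures solved before cutoff_year; test only on later
    structures whose fold never appears in the training set."""
    train = [s for s in structures if s.year_solved < cutoff_year]
    seen_folds = {s.fold for s in train}
    novel = [
        s for s in structures
        if s.year_solved >= cutoff_year and s.fold not in seen_folds
    ]
    return train, novel


structures = [
    SolvedStructure("p1", "fold_A", 1998),
    SolvedStructure("p2", "fold_B", 2003),
    SolvedStructure("p3", "fold_A", 2009),  # later, but a known fold: excluded
    SolvedStructure("p4", "fold_C", 2011),  # later and a novel fold: kept
]
train, novel = historical_split(structures, 2005)
print([s.name for s in train], [s.name for s in novel])
```

Dropping later structures with already-seen folds is what separates this from a plain time split: it tests prediction of genuinely new folds rather than recognition of families the model has already met.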
The former may be just as useful in practice but may teach us a bit less about the mechanics of folding.
Unlike the physics-based and statistical potential methods, can the DeepMind approach ever contribute to understanding how proteins fold? IMHO that's an open question, and one that's critical to their presumably forthcoming publication. For example, do their feature weights say something interesting about c
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:3)
Re: (Score:3)
Re: (Score:2)
I wish I had mod points for this.
Related: I wonder how well humans familiar with folding motifs and all the confounding factors present in nature would do vs. the models. While most chemists rely on modeling, NMR, and crystallography, the techs running these systems all have intuitions built up from years of generating structures.
Would some of them outperform the models in the same way Google's approach did?
-Chris
Re: (Score:2)
Results matter more than a 'peers' opinion of the results.
You misunderstand the process. Peer opinions are based on the results. They are also based on years of study leading to an appreciation of what results are actually 1) interesting and 2) useful. These are crude words for the distinction, but to illustrate, if AlphaFold were to work perfectly it would only be useful. It wouldn't improve understanding and thereby advance science beyond making some specific current task potentially easier. (Even if it might be really great for engineering).
If the training set contains all the magic rules
There's good reason
Re: (Score:2)
This, right here.
A.I. is not some kind of magic bullet that solves all problems. Far from it, since all models depend deeply on the training data fed to them. In this simple sine wave example, it is trivial to come up with something outside of the training data, which shows quite clearly that not all problems are well suited to machine learning.
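The sine wave example the commenter refers to isn't reproduced in the thread, so this is a hypothetical reconstruction of the general point, not the original. A k-nearest-neighbour regressor is trained only on sin(x) over [0, π]; it is accurate inside that range but badly wrong at 1.5π, because it can only echo values it has already seen. The model choice, sample count, and test points are all assumptions.

```python
import math


def make_training_data(lo, hi, n):
    """Sample sin(x) at n evenly spaced points on [lo, hi]."""
    xs = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    return [(x, math.sin(x)) for x in xs]


def knn_predict(train, x, k=2):
    """k-nearest-neighbour regression: average y over the k closest training x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


train = make_training_data(0.0, math.pi, 50)  # only half a period is ever seen
inside, outside = 1.0, 1.5 * math.pi          # `outside` lies beyond the training range
err_inside = abs(knn_predict(train, inside) - math.sin(inside))
err_outside = abs(knn_predict(train, outside) - math.sin(outside))
print(f"error inside range: {err_inside:.3f}, outside: {err_outside:.3f}")
```

A more flexible model would fail differently outside the training range, but no model can be trusted to get a region it has never been shown right.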
In terms of Alpha
Re: (Score:1)