DeepMind Uncovers Structure of 200 Million Proteins in Scientific Leap Forward (theguardian.com)
AI has deciphered the structure of virtually every protein known to science, paving the way for the development of new medicines or technologies to tackle global challenges such as famine or pollution. From a report: Proteins are the building blocks of life. Formed of chains of amino acids, folded up into complex shapes, their 3D structure largely determines their function. Once you know how a protein folds up, you can start to understand how it works, and how to change its behaviour. Although DNA provides the instructions for making the chain of amino acids, predicting how they interact to form a 3D shape was trickier and, until recently, scientists had only deciphered a fraction of the 200m or so proteins known to science. In November 2020, the AI group DeepMind announced it had developed a program called AlphaFold that could rapidly predict this information using an algorithm. Since then, it has been crunching through the genetic codes of every organism that has had its genome sequenced, and predicting the structures of the hundreds of millions of proteins they collectively contain.
Last year, DeepMind published the protein structures for 20 species -- including nearly all 20,000 proteins expressed by humans -- on an open database. Now it has finished the job, and released predicted structures for more than 200m proteins. "Essentially, you can think of it as covering the entire protein universe. It includes predictive structures for plants, bacteria, animals, and many other organisms, opening up huge new opportunities for AlphaFold to have an impact on important issues, such as sustainability, food insecurity, and neglected diseases," said Demis Hassabis, DeepMind's founder and chief executive. Scientists are already using some of its earlier predictions to help develop new medicines.
Uh Oh (Score:2)
AI has come back to "finish the job".
Re: (Score:2)
I for one welcome our welcomer-eating AI overlords.
Re: (Score:2)
Yum.
Re: (Score:1)
Now with extra Handshake Marshmallow Berries!
Re: (Score:1)
AI has come back to "finish the job".
Sarah Connor, we need you! Please come back.
Sort of like what was done previously in 2005 (Score:5, Informative)
Previously, the World Community Grid made predictions about every possible gene recorded in the databases using the Rosetta algorithm. This was done starting in 2005, over a decade ago, and without machine learning.
https://www.technologyreview.c... [technologyreview.com]
What's changed is that there's a lot more sequence data deposited in the databases. The more sequences you have, the easier it becomes to spot remote sequence similarities and to infer conserved residue-residue contacts in proteins, which makes fold prediction much simpler. AlphaFold relies heavily on protein similarity, so it has historically been poor at predicting novel structures; it is better at predicting when a sequence has a kind of fold it has already seen in a different sequence. The earlier work didn't have the benefit of all that additional sequence, so it used a method that didn't rely on discovering accidental similarity but instead predicted de novo, and thus was just as good at predicting unique structures as shared, similar ones.
What's also changed is that larger proteins can now be tackled. This is again because of additional sequence information, which helps divide proteins into independent folding units, and because of the inherently superior efficiency of the ML approaches. This matters because ab initio structure search scales badly (exponentially) with protein length, so ML methods that more or less guess the answer directly, rather than "folding up" the protein on a potential energy surface, scale better.
But in the final analysis, remember these are predictions, not actual structures. They cannot all be correct, because many proteins have multiple conformations, and some predictions will simply be wrong, especially for unique new folds. Still, each time a new method comes along the fraction of bad predictions tends to go down, so new and better methods are always an improvement. Moreover, once you have a prediction it's a simple and fast calculation to check whether it has a good potential energy on a well-vetted potential energy surface model (e.g. Rosetta, or things like Amber).
So getting predictions fast really accelerates this, because the potential energy validation step can weed out the bad predictions.
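The two points above, that exhaustive ab initio search blows up exponentially with chain length while checking one candidate's energy is cheap, can be illustrated with a toy sketch. It swaps in the 2D HP lattice model (a standard textbook simplification: each residue is hydrophobic "H" or polar "P", and every non-bonded H-H contact scores -1). This is not Rosetta, Amber, or anything AlphaFold does, and all function names here are hypothetical.

```python
from itertools import product

# Unit steps on a 2D square lattice (a toy stand-in for real 3D geometry).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def to_coords(moves):
    """Place residue 0 at the origin and follow the moves to place the rest."""
    x, y = 0, 0
    coords = [(x, y)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords


def is_self_avoiding(coords):
    """A physical chain cannot put two residues on the same lattice site."""
    return len(set(coords)) == len(coords)


def hp_energy(seq, coords):
    """HP-model energy: -1 for every pair of non-consecutive H residues
    on neighbouring lattice sites (lower means more stable)."""
    pos = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in MOVES.values():
            j = pos.get((x + dx, y + dy))
            # j > i + 1 counts each contact once and skips chain neighbours
            if j is not None and j > i + 1 and seq[j] == "H":
                energy -= 1
    return energy


def count_conformations(n_residues):
    """Brute-force search: try every move string and count the self-avoiding
    ones. The loop runs 4**(n_residues - 1) times, i.e. exponentially."""
    return sum(
        is_self_avoiding(to_coords(moves))
        for moves in product(MOVES, repeat=n_residues - 1)
    )


def filter_predictions(seq, predictions, max_energy):
    """Cheap validation: drop predicted folds that are physically impossible
    or whose energy is above the threshold. Cost is polynomial per candidate."""
    kept = []
    for moves in predictions:
        coords = to_coords(moves)
        if is_self_avoiding(coords) and hp_energy(seq, coords) <= max_energy:
            kept.append(moves)
    return kept
```

Calling `count_conformations(n)` for n = 2 to 8 gives 4, 12, 36, 100, 284, 780, 2172, nearly tripling with each added residue. By contrast, `filter_predictions("HPPH", ["RUL", "RRR", "RLR"], max_energy=-1)` checks each candidate in a few steps and keeps only `"RUL"`, the U-shaped fold that brings the two H residues into contact ("RLR" collides with itself and "RRR" has no contact).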
Re: (Score:3)
And of course, even if you know how to fold a protein, that is just the primary structure, not the tertiary protein structure.
Re: (Score:3)
Wait, so they weren't talking about the "200m(eter) or so proteins known to science"? :-)
Re: '200M' versus '200m' (Score:2)
By any measure, those are some big-ass proteins!
Re: (Score:2)
Those are SI prefixes and clearly this is being used as a suffix.
Uh oh! (Score:3)
Do we stop Folding@Home? (Score:1)
Re: Do we stop Folding@Home? (Score:4, Informative)
I had the same question. Looks like AlphaFold2 only gives part of the answer to the protein folding problem:
https://www.reddit.com/r/Folding/comments/osu6y5/does_alphafold_make_fh_obsolete_i_keep_seeing_new/
If that is true, it will be interesting to see how these two projects can coordinate with each other to focus their efforts on the missing pieces.
Re: (Score:2)
If so, any suggestions for my computer's next hobby?
I have not practiced creative origami since I was in grade school. /play-on-words
Be sensible (Score:5, Insightful)
And how did we analyse and see how accurate the output was?
Re:Be sensible (Score:4)
AlphaFold has changed the game for the CASP contest. It's very close to 90% accurate, which is considered the threshold for what we can even verify.
I'm assuming these 200M haven't been verified, and it's just accepted that there's a small margin of error relative to X-ray crystallography and NMR.
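CASP's headline score is GDT_TS, which is presumably what "close to 90%" refers to here (that mapping is an assumption). Roughly, it compares a prediction with the experimental structure and averages the fraction of alpha-carbons lying within 1, 2, 4 and 8 Å of their true positions, on a 0-100 scale. The sketch below assumes the two structures are already superimposed; the official metric searches over many superpositions, so treat this as an approximation. The function name is hypothetical.

```python
import math

# Distance cutoffs (in angstroms) used by GDT_TS.
CUTOFFS_ANGSTROM = (1.0, 2.0, 4.0, 8.0)


def gdt_ts_presuperposed(predicted, reference):
    """Simplified GDT_TS (0-100) for two equal-length lists of (x, y, z)
    alpha-carbon coordinates that are ALREADY superimposed."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    dists = [math.dist(p, r) for p, r in zip(predicted, reference)]
    # Fraction of residues within each cutoff, then the mean of the four.
    fractions = [sum(d <= c for d in dists) / len(dists) for c in CUTOFFS_ANGSTROM]
    return 100 * sum(fractions) / len(CUTOFFS_ANGSTROM)
```

For example, with a four-residue reference along the x-axis and predicted residues displaced by 0.5, 1.5, 3.0 and 10.0 Å, the per-cutoff fractions are 1/4, 2/4, 3/4 and 3/4, giving a score of 56.25; a perfect match scores 100.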
Re: (Score:2)
I was wondering the exact same thing.
A human algorithm is applied to AI and a huge number of previously unknown protein maps are generated. Yay, right? Who will be the first to test these new maps out?
Hmm (Score:2)
“It took us quite a long time to go through this massive database of structures, but opened this whole array of new three-dimensional shapes we’d never seen before that could actually break down plastics,”
Cool. What do we do with the byproducts?
Re: (Score:2)
Depends what they are. With some luck (and/or careful choice of protein) they'll be something we can use as feedstock to make something useful, like methanol or some such.
Re: (Score:2)
> Cool. What do we do with the byproducts?
Plastic is pretty much built from long chains of C2H4 (ethylene), which can be used to make new plastic or you can just burn it for fuel. You can also break it down into carbon and hydrogen, which you can use to make diamonds and Hindenburgs.
Exploded view drawings (Score:2)
What does this library of proteins mean? Imagine an alien coming to Earth with a mission to investigate our tech. The protein library is like giving the alien exploded-view drawings of all our inventions. The alien still doesn't know what a car does or what it is used for, but it can see all kinds of details about it, which can help it understand it better.
Keep in mind that the traditional method of getting this data via crystallography takes 4-6 months at an average cost of $100K. Now you can just search it from
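For scale, here is a back-of-envelope calculation applying the quoted figures ($100K and 4-6 months per structure) to all 200 million predicted proteins. It naively assumes every structure costs the average and is solved one after another; real costs vary widely and labs work in parallel, so this only shows orders of magnitude. Names are illustrative.

```python
# Figures quoted in the comment; applying them to every protein is an assumption.
N_PROTEINS = 200_000_000
COST_PER_STRUCTURE_USD = 100_000
MONTHS_PER_STRUCTURE = (4, 6)


def total_cost_usd(n=N_PROTEINS, cost=COST_PER_STRUCTURE_USD):
    """Total experimental cost if every structure cost the average."""
    return n * cost


def total_lab_years(n=N_PROTEINS, months=MONTHS_PER_STRUCTURE):
    """Serial lab time as a (low, high) range in years."""
    return tuple(n * m / 12 for m in months)


print(f"${total_cost_usd():,}")  # $20,000,000,000,000, i.e. $20 trillion
print(total_lab_years())         # about 66.7 million to 100 million years
```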
Those are predictions. How many are validated? (Score:4, Insightful)
An AI can only predict what the structure will be (well, unless it controls a robot-operated lab). After making the predictions you need to validate them. I'm sure that some have been, but the summary doesn't mention that. And the link is to The Guardian, so I'm rather sure that it doesn't mention that either. (A brief glance sort of confirms that guess.)
There's an article in Nature https://www.nature.com/article... [nature.com] where they claim the results are "highly accurate". I'm not quite sure what that means. They may be claiming 95% accuracy, or perhaps 95% better than a prior attempt. Or perhaps 95% better along some specific dimension. Perhaps someone more qualified than I am in molecular biology (i.e. I'm *NOT* qualified) could explain.
Perhaps this is a better link: https://www.icr.ac.uk/blogs/th... [icr.ac.uk]
Re: (Score:2)
In any case, it's only the primary structure, not the secondary or tertiary structure.
Re: (Score:2)
The tech works well for weapons too (Score:2)
AI dreams (Score:2)
DeepMind has published 200M of its hallucinations. In other news, so has Kylie Jenner.