AI Science

In a Major Scientific Breakthrough, AI Predicts the Exact Shape of Proteins (fortune.com) 62

Researchers have made a major breakthrough using artificial intelligence that could revolutionize the hunt for new medicines. The scientists have created A.I. software that uses a protein's DNA sequence to predict its three-dimensional structure to within an atom's width of accuracy. weiserfireman shares a report: The achievement, which solves a 50-year-old challenge in molecular biology, was accomplished by a team from DeepMind, the London-based artificial intelligence company that is part of Google parent Alphabet. Until now, DeepMind was best known for creating A.I. that could beat the best human players at the strategy game Go, a major milestone in computer science. DeepMind achieved the protein shape breakthrough in a biennial competition for algorithms that can be used to predict protein structures. The competition asks participants to take a protein's DNA sequence and then use it to determine the protein's three-dimensional shape. Across more than 100 proteins, DeepMind's A.I. software, which it called AlphaFold 2, was able to predict the structure to within about an atom's width of accuracy in two-thirds of cases and was highly accurate in most of the remaining one-third of cases, according to John Moult, a molecular biologist at the University of Maryland who is director of the competition, called the Critical Assessment of Structure Prediction, or CASP. It was far better than any other method in the competition, he said.

  • how they cheated.

    • Very little information has been released on this so far.

      One very Google-y way to cheat on this, though, especially as RCSB [rcsb.org] continues to grow at a very rapid clip, is to start with homology. Find the closest protein to the one you have been given (after all, you're starting with the primary sequence), map your sequence onto it, and refine from there. This is a widely accepted way to go about it, hence not really cheating, but it is also something that Google would be expected to be really good at.
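A rough sketch of the "find the closest known protein" step described above. Everything here is an assumption for illustration: the sequences and template names are made up, and `difflib.SequenceMatcher` is only a stand-in for a real alignment tool, not what DeepMind or anyone else in CASP is known to use.

```python
from difflib import SequenceMatcher

# Hypothetical toy "database" of solved structures: name -> amino acid sequence.
# These sequences are invented for illustration; they are not real RCSB entries.
KNOWN_STRUCTURES = {
    "template_A": "MKTAYIAKQRQISFVKSHFSRQ",
    "template_B": "GSHMLEDPVDAFKLLQEAGYTV",
    "template_C": "MKTAYLAKQRNISFVKAHFSRQ",
}


def similarity(a, b):
    """Rough similarity in [0, 1]; a stand-in for a real alignment score."""
    return SequenceMatcher(None, a, b).ratio()


def closest_template(query, database):
    """Return the name and score of the most similar known sequence."""
    best = max(database, key=lambda name: similarity(query, database[name]))
    return best, similarity(query, database[best])


# Hypothetical query: template_A with a single substitution near the end.
query = "MKTAYIAKQRQISFVKSHFARQ"
name, score = closest_template(query, KNOWN_STRUCTURES)
print(name, round(score, 2))
```

The chosen template would then be the starting point for mapping and refinement, which is the hard part and is not shown here.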
  • Soon ... (Score:5, Funny)

    by PPH ( 736903 ) on Monday November 30, 2020 @01:57PM (#60779250)

    ... AI will help you re-fold a road map.

  • The article makes no mention of which proteins were used for this, in particular which types of proteins. I say this as a biochemist with a particular interest in transmembrane proteins, whose structures are much more difficult to predict. We have lots of cytoplasmic proteins whose structures we have already solved, and they could be great for training an algorithm. However, transmembrane proteins present a different problem, as they essentially fold inside-out with regard to hydrophobicity where they cross the membrane. Because most transmembrane proteins have both hydrophilic and hydrophobic domains, solving them is much more difficult than solving their soluble counterparts, which do not interact directly with a membrane.
    • by backslashdot ( 95548 ) on Monday November 30, 2020 @02:16PM (#60779324)

      This was CASP14, it's legit. It includes transmembrane proteins.

    • Here you go (Score:5, Informative)

      by heteromonomer ( 698504 ) on Monday November 30, 2020 @03:38PM (#60779652)

      CASP competition doesn't classify proteins into various types. They only categorize based on type of prediction (e.g. ab initio versus homology modeling etc). But they do list the actual proteins that were the targets for prediction. See list here.

      https://www.predictioncenter.o... [predictioncenter.org]

    • I'm more concerned that you can hide most anything behind "AI" in this context. You could take the best "non-AI" method, put an optimized arbitrary algorithm (= trained machine = AI) on top of it with some data added in, and you'd get something "better". But I don't think it is really more than the sum of its parts, i.e. it's not a predictive model, it's just a rule of thumb for adjusting the output of an actual predictive model, with real physics and chemistry in it, to get the observed structures in nat

  • ... the egg!!

  • by chill ( 34294 ) on Monday November 30, 2020 @02:04PM (#60779292) Journal

    Geek minds want to know. How does this impact Folding @ Home [foldingathome.org]?

    • I expect all the work units will soon be GPU/NPU-accelerated neural net programs, and progress will be far faster.

  • by nospam007 ( 722110 ) * on Monday November 30, 2020 @02:05PM (#60779296)

    Folding@home is dead?

    • No. Assuming DeepMind shares its work, Folding@home will become even better and more useful, making it even more critical.

    • I wouldn't say dead. But they can focus on other challenges more amenable to distributed computing and first-principles-based algorithms (e.g. docking, complex prediction, etc.). Actually, I would think DeepMind could go after protein-protein complexes as well.

  • Beware... (Score:4, Insightful)

    by Thelasko ( 1196535 ) on Monday November 30, 2020 @02:51PM (#60779420) Journal
    My experience with machine learning has shown me that it's great at filling in blanks (interpolation), but it can really fall on its face at blazing new trails (extrapolation). So if the protein is similar to two proteins it has seen before, it probably does great. However, it could go very wrong on a protein that is very different from samples in the training data.
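A toy illustration of the interpolation versus extrapolation point above. This is a minimal sketch under stated assumptions: a 1-nearest-neighbour predictor stands in for "machine learning", and the target function y = x^2 and the test points are arbitrary choices, unrelated to AlphaFold.

```python
# Training data: y = x^2, sampled only on the interval [0, 1].
train_x = [i / 10 for i in range(11)]
train_y = [x * x for x in train_x]


def predict(x):
    """1-nearest-neighbour: return the label of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def abs_error(x):
    """Absolute error of the prediction against the true function x^2."""
    return abs(predict(x) - x * x)


interp_error = abs_error(0.52)  # inside the sampled range
extrap_error = abs_error(3.0)   # far outside the sampled range
print(interp_error, extrap_error)
```

Inside the sampled range the error stays small because a similar training point is always nearby. Outside it, the predictor can only repeat the nearest edge value, so the error grows with the distance from the training data.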
    • by lorinc ( 2470890 )

      Your experience reflects the theory very well. ML theory guarantees generalization capabilities under some assumptions on the training process that we cannot have in practice. For example, the training examples have to be independent and identically distributed, which is never the case in practice because we tend to collect data in clumps, that is, in a non-random way. That alone tends to create an imbalance between areas where we have many observations (w.r.t. the natural distribution) and areas where we d

  • So, I'm guessing that determining the structure from first principles is not possible or practical. Is that really the case?
    • You are correct. It is not practical. The current energy functions are not perfect. Even if you assume that they are good enough, the compute power needed doesn't exist yet. When I was in grad school I was of the feeling that only quantum computing can solve this problem (because in reality, protein folding is a kind of quantum computational problem, which is just collapsed into a good enough solution - or so I thought).

  • Some background is missing here. The shape of a protein is very relevant, since it determines how it interacts with other elements, i.e. its behavior and function. The shape depends on the specific sequence of amino acids that make up the protein. However, simulating the folding of the molecules to get the final shape is a very complex and resource-intensive problem.

    IBM developed the Blue Gene [wikipedia.org] supercomputers two decades ago, motivated in part by these complex simulations (the other moti

    • Blue Gene etc. were being applied to fold proteins based on first principles (i.e. physics and numerical methods). DeepMind, however, has sidestepped that whole process of solving through fundamental understanding and gone straight to the solution. The good things, however, are:

      (1) It does use some of our fundamental learnings about protein structure.
      (2) We get to solve more applied problems, leaving the physics based methods to continue to develop, which will probably have other applications (like de novo design of cat

    • by Tablizer ( 95088 )

      However, simulating the folding of the molecules to get the final shape is a very complex and resource-intensive problem.

      It seems to me it should be relatively simple. What am I missing? You'd have a two-column "rule list" where each amino acid joint (pair) produces "left turn 30 degrees", "right turn 62 degrees", etc. (i.e. vectors). I imagine sometimes the structure would "bump into" itself, but handling that is just part of the simulation. What's an example of the complexity bottleneck(s)?

      • by Tablizer ( 95088 )

        Correction, 3 columns: 1) Amino-acid type "A"; 2) Amino-acid type "B"; 3) 3D "turn" vector relative to "A" (maybe with an offset distance).

        • You're not far off. However, the protein has maybe 200-300 amino acids (if it's a smallish one). Assuming you quantize your turn angles to 30 degrees (and in reality you'll need much better resolution than that), that's a search space of 12^200. That's your complexity, right there.
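To put a number on the 12^200 figure in the comment above, here is a back-of-the-envelope sketch. It follows the comment's simplification of one quantized turn angle per residue; the rate of one billion conformations per second is a made-up assumption, not a benchmark of any real machine.

```python
import math


def conformations(residues, angle_step_degrees):
    """Number of chains if each residue picks one turn angle from a fixed grid."""
    choices_per_residue = 360 // angle_step_degrees
    return choices_per_residue ** residues


n = conformations(200, 30)  # 12 ** 200, as in the comment above
digits = len(str(n))        # number of decimal digits in the count

# Hypothetical rate: one billion conformations checked per second.
seconds = n // 10**9
seconds_per_year = 3600 * 24 * 365
years_log10 = math.log10(seconds) - math.log10(seconds_per_year)

print(f"{digits} digits, roughly 10^{years_log10:.0f} years of brute force")
```

Even at that generous rate, exhaustive enumeration is hopeless, which is why practical methods have to prune or learn their way through the space rather than walk it.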
          • by Tablizer ( 95088 )

            I'm not clear on where the uncertainty is. The sequences of the proteins are all known, aren't they? The resulting angles between acid pairs are all known, correct? If reality doesn't match the look-up model, why?

            • No, the connection between any two pairs of amino acids has two rotatable bonds (giving the so-called phi and psi angles). These aren't freely rotatable (google "Ramachandran plot" for the gory details), but there's a lot of flexibility there for most amino acid pairs. Proteins are far from being structurally rigid: they (mostly) stay in one shape as a result of interactions between the amino acid side chains. As a result, you're back into the massive search space problem, which is what makes the reported r
  • I have to imagine anyone with the resources is throwing neural networks at every remotely promising problem, simultaneously. I think we're just seeing the early trickle, the remaining low-hanging fruit that had somehow been missed. It remains to be seen how much of that there is to pick off before the problems escalate in difficulty to another "can't touch this" level, requiring still more hardware advances.

  • Is it some kind of Fortune policy where *every link* only points to other Fortune articles? How about, IDK, a link to the actual paper, or the competition in which this occurred, or,....anything?

    A little Google-fu turns up this link on Google's AI blog: https://deepmind.com/blog/arti... [deepmind.com].

    And here are the CASP14 competition results just released: https://predictioncenter.org/c... [predictioncenter.org]

  • by Carcass666 ( 539381 ) on Monday November 30, 2020 @04:22PM (#60779860)
    AI will analyze Slashdot's code and determine how to make it properly support unicode characters, including directional single and double quotation marks.
  • by iikkakeranen ( 6279982 ) on Monday November 30, 2020 @05:01PM (#60780012)

    Laundry!
