In a Major Scientific Breakthrough, AI Predicts the Exact Shape of Proteins (fortune.com)

Posted by msmash on Monday November 30, 2020 @02:48PM from the closer-look dept.

Researchers have made a major breakthrough using artificial intelligence that could revolutionize the hunt for new medicines. The scientists have created A.I. software that uses a protein's DNA sequence to predict its three-dimensional structure to within an atom's width of accuracy. weiserfireman shares a report: The achievement, which solves a 50-year-old challenge in molecular biology, was accomplished by a team from DeepMind, the London-based artificial intelligence company that is part of Google parent Alphabet. Until now, DeepMind was best known for creating A.I. that could beat the best human players at the strategy game Go, a major milestone in computer science. DeepMind achieved the protein shape breakthrough in a biennial competition for algorithms that can be used to predict protein structures. The competition asks participants to take a protein's DNA sequence and then use it to determine the protein's three-dimensional shape. Across more than 100 proteins, DeepMind's A.I. software, which it called AlphaFold 2, was able to predict the structure to within about an atom's width of accuracy in two-thirds of cases and was highly accurate in most of the remaining one-third of cases, according to John Moult, a molecular biologist at the University of Maryland who is director of the competition, called the Critical Assessment of Structure Prediction, or CASP. It was far better than any other method in the competition, he said.

Google? That makes me wonder (Score:2)

by nagora ( 177841 ) writes:

how they cheated.
- Re:Google? That makes me wonder (Score:4, Insightful)
  
  by damn_registrars ( 1103043 ) writes: <damn.registrars@gmail.com> on Monday November 30, 2020 @03:18PM (#60779328) Homepage Journal
  
  Very little information has been released on this so far.
  
  One very google-y way to cheat on this though - especially as RCSB [rcsb.org] continues to grow at a very rapid clip - is to start with homology. Find the closest protein to the one you have been given - after all you're starting with primary sequence - and then map it to there and refine after that. This is a widely accepted way to go about it - hence not really cheating - but also something that google would be expected to be really good at.
  
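The "start with homology" idea described above (find the closest known protein by primary sequence, then map onto it and refine) can be sketched roughly as follows. This is only a minimal illustration, not how AlphaFold 2 or any CASP entrant works: the template names and sequences are made-up toy data, and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for simplicity. Real pipelines search databases such as RCSB with dedicated tools and substitution matrices.

```python
# Toy nearest-template search by global alignment score (Needleman-Wunsch).
# All sequences and names below are hypothetical examples, not real entries.

def alignment_score(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -1) -> int:
    """Return the best global alignment score of a and b (score only)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def closest_template(query: str, templates: dict) -> str:
    """Return the name of the template whose sequence aligns best to query."""
    return max(templates, key=lambda name: alignment_score(query, templates[name]))


TOY_TEMPLATES = {
    "template_a": "MKTAYIAKQRQISFVKSHFSRQ",
    "template_b": "GSHMLEDPVDAFKLLNQ",
    "template_c": "MKTAYLAKQRQISFIKSHFSR",
}

query = "MKTAYIAKQRQLSFVKSHFSRQ"
best = closest_template(query, TOY_TEMPLATES)
print(best, alignment_score(query, TOY_TEMPLATES[best]))
```

The chosen template would then be the starting structure for the "map it to there and refine" step, which is the hard part and is not shown here.
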
Soon ... (Score:5, Funny)

by PPH ( 736903 ) writes: on Monday November 30, 2020 @02:57PM (#60779250)

... AI will help you re-fold a road map.

- Re: (Score:2)
  
  by JaredOfEuropa ( 526365 ) writes:
  
  A what now?
  - Comment removed (Score:5, Funny)
    
    by account_deleted ( 4530225 ) writes: on Monday November 30, 2020 @03:11PM (#60779316)
    
    Comment removed based on user account deletion
    
  - Re: (Score:2)
    
    by PPH ( 736903 ) writes:
    
    Get off my lawn.
  - Re: (Score:2)
    
    by tlhIngan ( 30335 ) writes:
    
    An offline version of Google Maps for the time when you're stuck in the middle of nowhere without a data connection.
    Or until a few years ago, when traveling (because data roaming charges were extremely high). Sure you could get the map part, but actual routing and turn by turn requires data access.
    Still a reason why standalone GPS units still exist since the routing and turn by turn take place on device. (But of course, lacks the construction zone routing and frequent updates that hopefully try to avoid put
    - Re: (Score:2)
      
      by Krishnoid ( 984597 ) writes:
      
      Offline version [google.com] of Google Maps? Wow, Google's AI was predictive enough to know we would be posting about it! I'm pretty sure it has the turn-by-turn directions now too.
      If only they could have predicted that people might actually want to use it widely and added a MicroSD slot to their phones and tablets to store those (decently-sized) maps.
    - Re: (Score:2)
      
      by K. S. Kyosuke ( 729550 ) writes:
      
      You mean like the navigating program that I have on my phone that uses open data? What about it?
      - Re: (Score:2)
        
        by Tailhook ( 98486 ) writes:
        
        What about it?
        That was fine and all but now my phone folds.... wth!!
    - Re: (Score:2)
      
      by account_deleted ( 4530225 ) writes:
      
      Comment removed based on user account deletion
  - Re: (Score:2)
    
    by sonamchauhan ( 587356 ) writes:
    
    You know when you print out a Google map? Then you print out separate sections and tape them together?
    Something like that
- Re:Soon ... (Score:4, Funny)
  
  by dgatwood ( 11270 ) writes: on Monday November 30, 2020 @04:02PM (#60779470) Homepage Journal
  
  ... AI will help you re-fold a road map.
  By driving the car while you do it.
  
  - Re: (Score:2)
    
    by fluffernutter ( 1411889 ) writes:
    
    There is no way that is coming soon.
- Re: (Score:1)
  
  by Tablizer ( 95088 ) writes:
  
  I just stepped on it. Worked every time! Downside is my dates would look at me funny, but I got used to that from other things.
- Re: (Score:2)
  
  by backslashdot ( 95548 ) writes:
  
  It's been a while since I had to do that. Brings back memories. GPS has really changed the world. Do gas stations even carry road maps anymore?
  - Re: (Score:2)
    
    by cusco ( 717999 ) writes:
    
    Very few do, mostly truck stops. AAA is one of the few places to get actual paper maps any more.
Really need more information here (Score:5, Interesting)

by damn_registrars ( 1103043 ) writes: <damn.registrars@gmail.com> on Monday November 30, 2020 @02:58PM (#60779258) Homepage Journal

The article makes no mention of what proteins were used for this - in particular what types of proteins. I say this as a biochemist with a particular interest in transmembrane proteins, which are much more difficult to predict structure for. We have lots of cytoplasmic proteins whose structures we have already solved; they could be great for training an algorithm. However, the transmembrane proteins present a different problem, as they essentially fold inside-out with regard to hydrophobicity where they cross the membrane. Since most transmembrane proteins have both hydrophilic and hydrophobic domains, solving them is much more difficult than solving their soluble counterparts that do not interact directly with a membrane.

- Re:Really need more information here (Score:4, Insightful)
  
  by backslashdot ( 95548 ) writes: on Monday November 30, 2020 @03:16PM (#60779324)
  
  This was CASP14, it's legit. It includes transmembrane proteins.
  
  - Re:Really need more information here (Score:4, Interesting)
    
    by damn_registrars ( 1103043 ) writes: <damn.registrars@gmail.com> on Monday November 30, 2020 @06:24PM (#60780132) Homepage Journal
    
    This was CASP14, it's legit. It includes transmembrane proteins.
    CASP14 is a good data set, for sure. However we don't know how the Google AI fared on the transmembrane proteins from the data set, as we don't know which proteins it did really well on and which ones it did not do well on. They're patting themselves on the back for what they did - which certainly they did well with - but they aren't saying what is in each set.
    
- Here you go (Score:5, Informative)
  
  by heteromonomer ( 698504 ) writes: on Monday November 30, 2020 @04:38PM (#60779652)
  
  CASP competition doesn't classify proteins into various types. They only categorize based on type of prediction (e.g. ab initio versus homology modeling etc). But they do list the actual proteins that were the targets for prediction. See list here.
  https://www.predictioncenter.o... [predictioncenter.org]
  
- Re: (Score:2)
  
  by Xylantiel ( 177496 ) writes:
  
  I'm more concerned that you can hide most anything behind "AI" in this context. You could take the best "non-AI" method, put an optimized arbitrary algorithm (= trained machine = AI) on top of it with some data added in, and you'd get something "better". But I don't think it is really more than the sum of its parts. i.e. it's not a predictive model, it's just a rule of thumb for adjusting the output of an actual predictive model, with real physics and chemistry in it, to get the observed structures in nat
We found ... (Score:2)

by Kiliani ( 816330 ) writes:

... the egg!!
- Re: (Score:1)
  
  by Tablizer ( 95088 ) writes:
  
  I'll donate proteins! I make plenty while...um
Folding at Home? (Score:3)

by chill ( 34294 ) writes: on Monday November 30, 2020 @03:04PM (#60779292) Journal

Geek minds want to know. How does this impact Folding @ Home [foldingathome.org]?

- Re: (Score:2)
  
  by GameboyRMH ( 1153867 ) writes:
  
  I expect all the work units will soon be GPU/NPU-accelerated neural net programs, and progress will be far faster.
So... (Score:3)

by nospam007 ( 722110 ) * writes: on Monday November 30, 2020 @03:05PM (#60779296)

Folding@home is dead?

- Re: (Score:2)
  
  by backslashdot ( 95548 ) writes:
  
  No, assuming Deepmind shares .. folding @home will become even better and more useful making it even more critical.
- Re: (Score:2)
  
  by heteromonomer ( 698504 ) writes:
  
  I wouldn't say dead. But they can focus on other challenges more amenable to distributed computing and first principle based algorithms (e.g. docking, complex prediction etc). I would think Deep Mind could go after protein-protein complexes as well actually.
Comment removed (Score:3)

by account_deleted ( 4530225 ) writes: on Monday November 30, 2020 @03:30PM (#60779350)

Comment removed based on user account deletion

- Re:My question is..., (Score:5, Informative)
  
  by ceoyoyo ( 59147 ) writes: on Monday November 30, 2020 @03:39PM (#60779382)
  
  That shouldn't be particularly difficult. In fact, it's likely that the websites you visit can identify you as you, individually.
  The reason you have to do stupid captchas isn't technological. It's Google getting some work units out of you increasing the size of their training set.
  
  - Re: (Score:2)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
    - Re:My question is..., (Score:4, Funny)
      
      by ceoyoyo ( 59147 ) writes: on Monday November 30, 2020 @03:59PM (#60779462)
      
      This will be good data for the emotion engine.
      
      - Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
      - Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
        
        Re: (Score:2)
        
        by ceoyoyo ( 59147 ) writes:
        
        Ha ha, thanks for the setup. I'll add Wayne and Shuster because I'm Canadian and old enough that their end of career specials were on when I was a kid. Sadly, there's not much "on the road" happening right now.
        
        Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
        
        Re: (Score:2)
        
        by ceoyoyo ( 59147 ) writes:
        
        Reunion tour in 10?
        
        Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
        
        Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
        
        Re: (Score:2)
        
        by ceoyoyo ( 59147 ) writes:
        
        Lol. It's been a ride.
        
        Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
Beware... (Score:4, Insightful)

by Thelasko ( 1196535 ) writes: on Monday November 30, 2020 @03:51PM (#60779420) Journal

My experience with machine learning has shown me that it's great at filling in blanks (interpolation), but it can really fall on its face at blazing new trails (extrapolation). So if the protein is similar to two proteins it has seen before, it probably does great. However, it could go very wrong on a protein that is very different from samples in the training data.

- Re: (Score:2)
  
  by lorinc ( 2470890 ) writes:
  
  Your experience reflects very much the theory. ML theory guarantees generalization capabilities under some assumptions on the training process that we cannot have in practice. For example, the training examples have to be independent and identically distributed, which is never the case in practice because we tend to collect data in clumps, that is, in a non-random way. That alone tends to create an imbalance between areas where we have many observations (w.r.t. the natural distribution) and areas where we d
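
The interpolation versus extrapolation point in this thread can be illustrated with a deliberately tiny model. The sketch below is an assumption-laden stand-in (a 1-nearest-neighbour regressor trained on samples of x squared between 0 and 1), not anything resembling a protein structure predictor. It only shows the general behaviour: small error between training points, large error far outside them.

```python
# Toy demonstration: a model that only "knows" the range [0, 1].
# The target function and the 1-nearest-neighbour model are illustrative choices.

def target(x: float) -> float:
    """Ground truth the model is trying to learn."""
    return x * x


def nearest_neighbour_predict(train: list, x: float) -> float:
    """Predict by returning the label of the closest training input."""
    _, nearest_y = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_y


# Training inputs 0.0, 0.1, ..., 1.0 with their true labels.
train = [(i / 10, target(i / 10)) for i in range(11)]

# Query inside the training range (interpolation).
inside_error = abs(nearest_neighbour_predict(train, 0.55) - target(0.55))

# Query far outside the training range (extrapolation).
outside_error = abs(nearest_neighbour_predict(train, 3.0) - target(3.0))

print(f"inside error:  {inside_error:.4f}")
print(f"outside error: {outside_error:.4f}")
```

How far a given protein sits from the training distribution is much harder to judge than in this one-dimensional case, which is part of why the question raised above is a fair one.
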
First Principles (Score:2)

by RoccamOccam ( 953524 ) writes:

So, I'm guessing that determining the structure from first principles is not possible or practical. Is that really the case?
- Re: (Score:3)
  
  by heteromonomer ( 698504 ) writes:
  
  You are correct. It is not practical. The current energy functions are not perfect. Even if you assume that they are good enough, the compute power needed doesn't exist yet. When I was in grad school I was of the feeling that only quantum computing can solve this problem (because in reality, protein folding is a kind of quantum computational problem, which is just collapsed into a good enough solution - or so I thought).
Background (Score:2)

by enriquevagu ( 1026480 ) writes:

Some background is missing here. The shape of a protein is very relevant, since it determines how it interacts with other elements, i.e. its behavior and function. The shape depends on the specific sequence of amino acids that compose the genetic sequence of the protein. However, simulating the folding of the molecules to get the final shape is a very complex and resource-intensive problem.
IBM developed the Blue Gene [wikipedia.org] supercomputers two decades ago, motivated in part by these complex simulations (the other moti
- Re: (Score:3)
  
  by heteromonomer ( 698504 ) writes:
  
  Blue Gene etc were being applied to fold proteins based on first principles (i.e. physics and numerical methods). Deep Mind however has side-stepped that whole process of solving through fundamental understanding and got to the solution. The good things however are:
  (1) It does use some of our fundamental learnings about protein structure.
  (2) We get to solve more applied problems, leaving the physics based methods to continue to develop, which will probably have other applications (like de novo design of cat
- Re: (Score:1)
  
  by Tablizer ( 95088 ) writes:
  
  However, simulating the folding of the molecules to get the final shape is a very complex and resource-intensive problem.
  It seems to me it should be relatively simple, what am I missing? You'd have a two-column "rule list" where each amino-acid joint (pair) produces "left turn 30 degrees", "right turn 62 degrees", etc. (i.e. vectors.) I imagine sometimes the structure would "bump into" itself, but handling that is just part of the simulation. What's an example of the complexity bottleneck(s)?
  - Re: (Score:1)
    
    by Tablizer ( 95088 ) writes:
    
    Correction, 3 columns: 1) Amino-acid type "A"; 2) Amino-acid type "B"; 3) 3D "turn" vector relative to "A" (maybe with an offset distance).
    - Re: (Score:3)
      
      by at0mjack ( 953726 ) writes:
      
      You're not far off. However, the protein has maybe 200-300 amino acids (if it's a smallish one). Assuming you quantize your turn angles to 30 degrees (and in reality you'll need much better resolution than that) that's a search space of 12^200. That's your complexity, right there.
      - Re: (Score:1)
        
        by Tablizer ( 95088 ) writes:
        
        I'm not clear on where the uncertainty is. The sequences of the proteins are all known, aren't they? The resulting angles between acid pairs are all known, correct? If reality doesn't match the look-up model, why?
        
        Re: (Score:2)
        
        by at0mjack ( 953726 ) writes:
        
        No, the connection between any two pairs of amino acids has two rotatable bonds (giving the so-called phi and psi angles). These aren't freely rotatable (google "Ramachandran plot" for the gory details), but there's a lot of flexibility there for most amino acid pairs. Proteins are far from being structurally rigid: they (mostly) stay in one shape as a result of interactions between the amino acid side chains. As a result, you're back into the massive search space problem, which is what makes the reported r
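
To put a number on the search-space estimate given earlier in this thread (turn angles quantized to 30-degree steps, about 200 joints), here is a small back-of-the-envelope calculation. The enumeration rate of 10^18 conformations per second is a purely hypothetical figure picked to be generous, and one angle per joint follows the simplified estimate in the thread rather than the two (phi and psi) mentioned above.

```python
# Back-of-the-envelope size of the brute-force conformation search.
# The enumeration rate is a hypothetical assumption, not a measured figure.

SECONDS_PER_YEAR = 365 * 24 * 3600


def conformation_count(states_per_joint: int, joints: int) -> int:
    """Number of distinct conformations if every joint is independent."""
    return states_per_joint ** joints


def years_to_enumerate(count: int, per_second: int) -> int:
    """Whole years needed to visit every conformation at a fixed rate."""
    return count // (per_second * SECONDS_PER_YEAR)


states = 360 // 30                       # 12 allowed angles per joint
count = conformation_count(states, 200)  # 12^200
digits = len(str(count))
years = years_to_enumerate(count, 10**18)

print(f"12^200 has {digits} decimal digits")
print(f"years to enumerate has {len(str(years))} decimal digits")
```

Using two angles per joint instead of one would square the count, so the one-angle figure is, if anything, an underestimate under these assumptions.
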
Spaghetti on the wall (Score:2)

by Mal-2 ( 675116 ) writes:

I have to imagine anyone with the resources is throwing neural networks at every remotely promising problem, simultaneously. I think we're just seeing the early trickle, the remaining low-hanging fruit that had somehow been missed. It remains to be seen how much of that there is to pick off before the problems escalate in difficulty to another "can't touch this" level, requiring still more hardware advances.
External links? (Score:2)

by balaam's ass ( 678743 ) writes:

Is it some kind of Fortune policy where *every link* only points to other Fortune articles? How about, IDK, a link to the actual paper, or the competition in which this occurred, or,....anything?
A little Google-fu turns up this link on Google's AI blog: https://deepmind.com/blog/arti... [deepmind.com].
And here are the CASP14 competition results just released: https://predictioncenter.org/c... [predictioncenter.org]
Soon... (Score:3)

by Carcass666 ( 539381 ) writes: on Monday November 30, 2020 @05:22PM (#60779860)

AI will analyze Slashdot's code and determine how to make it properly support unicode characters, including directional single and double quotation marks.

Next AI folding challenge (Score:4, Funny)

by iikkakeranen ( 6279982 ) writes: on Monday November 30, 2020 @06:01PM (#60780012)

Laundry!

old news (Score:1)

by belibem ( 933117 ) writes:

https://www.google.com/amp/s/m... [google.com]
