Biotech AI Science

International Challenge To Computationally Interpret Protein Function 59

Shipud writes "We live in the post-genomic era, when DNA sequence data is growing exponentially. However, for most of the genes that we identify, we have no idea of their biological functions. They are like words in a foreign language, waiting to be deciphered. The Critical Assessment of Function Annotation, or CAFA, is a new experiment to assess the performance of the multitude of computational methods developed by research groups worldwide and help channel the flood of data from genome research to deduce the function of proteins. Thirty research groups participated in the first CAFA, presenting a total of 54 algorithms. The researchers participated in blind-test experiments in which they predicted the function of protein sequences for which the functions are already known but haven't yet been made publicly available. Independent assessors then judged their performance. The challenge organizers explain: 'The accurate annotation of protein function is key to understanding life at the molecular level and has great biochemical and pharmaceutical implications; however, with its inherent difficulty and expense, experimental characterization of function cannot scale up to accommodate the vast amount of sequence data already available. The computational annotation of protein function has therefore emerged as a problem at the forefront of computational and molecular biology.'"
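
As a rough illustration of the blind-test setup described in the summary, the sketch below scores one method's predicted function labels against held-out annotations using set-based precision and recall. The protein IDs, the function labels, and the choice of metric are all assumptions made for illustration; the summary does not say how the CAFA assessors actually scored submissions.

```python
def precision_recall(predicted, actual):
    """Set-based precision and recall for one protein's function labels."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions, held_out):
    """Average F1 over every protein in the held-out (blind) set.

    A protein with no submitted prediction scores 0.
    """
    scores = []
    for protein_id, actual in held_out.items():
        p, r = precision_recall(predictions.get(protein_id, ()), actual)
        scores.append(f1_score(p, r))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical data: annotations known to the assessors but not yet public,
# and one method's predictions for the same proteins.
held_out = {
    "P1": {"kinase activity", "ATP binding"},
    "P2": {"DNA binding"},
}
predictions = {
    "P1": {"kinase activity"},
    "P2": {"DNA binding", "ATP binding"},
}

average_f1 = evaluate(predictions, held_out)
print(f"average F1: {average_f1:.3f}")
```

The point of the blind test is that `held_out` is not available to the predictor when `predictions` is produced, so a method cannot simply look up the answer.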

  • Dang. (Score:1, Offtopic)

    by Syelnicar ( 2831945 )
    Here I was, hoping for another Folding@Home.
    • I wouldn't rule it out. My understanding was that folding at home was brute force taking these sequences, testing all possible conformations, and seeing what was the lowest energy conformation. That's still what happens to actual proteins when they fold up, so it's not like the approach doesn't make sense.

      It's possible that some protein out there will cure a lot of cancers. It could be in platypus, or in some fungus in a desert, some coral, or some other exotic species. We're never going to test all
      • My understanding was that folding at home was brute force taking these sequences, testing all possible conformations, and seeing what was the lowest energy conformation.

        Incorrect. Folding@Home uses proteins whose structure (and usually function) is already exceptionally well characterized. That's how they can tell if their simulation actually worked. The point of the project isn't to predict the structure, because that's still extraordinarily difficult to do by purely physical simulation (as opposed to m

    • by Tablizer ( 95088 )

      That's when my wife asks me to help out with the laundry.

  • by tsa ( 15680 ) on Sunday February 03, 2013 @10:48PM (#42782537) Homepage

    I am not a biologist so forgive me my ignorance but when people say that DNA is the blueprint for an organism I never understand how a bunch of proteins can determine an organism's shape and behavior. Aren't there more factors that determine those things, like the surroundings in which the DNA is used, like chemicals that the growing organism is surrounded with, temperature, etc?

    • by tsa ( 15680 )

      BTW I do know that DNA codes for proteins and that the proteins plus certain self-assembly mechanisms account for most of the work done in growing an organism. But there my knowledge ends.

    • by Anonymous Coward

      I am not a biologist so forgive me my ignorance but when people say that DNA is the blueprint for an organism I never understand how a bunch of proteins can determine an organism's shape and behavior. Aren't there more factors that determine those things, like the surroundings in which the DNA is used, like chemicals that the growing organism is surrounded with, temperature, etc?

      I think the details are not fully understood. However, in answer to your question I think nature and nurture both play a role. Lots of research has been done on identical twins who have the same DNA. Lots of research has also been done on dizygotic twins who do not share the same DNA. We know that identical twins look the same until the environment changes them. For example, if one of the twins works out to become a body builder, the pair will look quite different. We also know dizygotic twins look di

    • The proteins are what move and shape the cells and thereby the organism. Literally. The proteins are what a lot of the cell is made up of, they're what gives the cell its structure, and they're all the motors in the cell. Cells are mostly water, and they have lipid envelopes, but what makes them more than bubbles is proteins, which are set by DNA. Environment, like nutrition, can have dramatic effects on the final product, but genetics is really what determines what the product is. There's no combinati
    • I am not a biologist so forgive me my ignorance but when people say that DNA is the blueprint for an organism I never understand how a bunch of proteins can determine an organism's shape and behavior. Aren't there more factors that determine those things, like the surroundings in which the DNA is used, like chemicals that the growing organism is surrounded with, temperature, etc?

      You're absolutely right. Microenvironment -- the cell's chemical, mechanical, and physical environment, determines which genes are

    • by EvilSS ( 557649 )
      Think of it like building a chain restaurant. You don't grab a blueprint of a building and run off and "poof", you have a fully functioning business. There is a whole process that surrounds it. Find a location, get permits, contract out the work, hire staff, advertise, etc. With a chain, the process is fairly standardized each time, with some minor (hopefully, at least at the individual level) variations. It's kind of the same with an organism. The DNA isn't so much the blueprint, it's the entire project
    • by Anonymous Coward

      Biologist here.

      Proteins do all the work. Here's the background:

      DNA data is transcribed (think of DNA as a sequence of information, stream of bytes, if that helps) to mRNA (the m stands for 'messenger'). The DNA has twice the redundancy, if you will, as the mRNA. The DNA is for long-term storage, and the mRNA serves as a template for protein production. DNA is read to make mRNA, which is in turn read (executed, perhaps? I'm bad at analogies) to create proteins. There are molecular machines that perform
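
The DNA-to-mRNA-to-protein pipeline described in this comment can be sketched in a few lines. This is a toy model under stated assumptions: the example sequence is made up, the codon table is a small subset of the standard genetic code (just enough for the example), and real cells add steps such as splicing that are ignored here.

```python
# Partial standard codon table (mRNA codon -> one-letter amino acid).
# Only the codons needed for the example are listed.
CODON_TABLE = {
    "AUG": "M",                                      # methionine, usual start
    "UUU": "F", "UUC": "F",                          # phenylalanine
    "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G",  # glycine
    "AAA": "K", "AAG": "K",                          # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",              # stop codons
}


def transcribe(coding_strand):
    """Return the mRNA for a DNA coding strand: same sequence, U instead of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Read codons from the first AUG until a stop codon or the end."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")  # X: not in this table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "CCATGTTTGGCAAATAGCC"  # hypothetical sequence
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna, "->", protein)
```

In the byte-stream analogy, `transcribe` is the copy out of long-term storage and `translate` is the "execution" step that turns the message into a product.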

  • That is all nice, but most of these prediction algorithms are based on one or more of the following assumptions, which are not always true:

    • 1. We have accurate mapping of the genes.
    • 2. We can predict the protein sequence from the sequence of the gene.
    • 3. One protein can not be the product of two genes.
    • 4. We have a good understanding of what the functions of the proteins in the training set are.
    • 5. If two proteins have similar sequence, they must have similar functions.
    • 6. One protein has one function.
    • 7. A prote
    • Re:Assumptions (Score:4, Informative)

      by the biologist ( 1659443 ) on Monday February 04, 2013 @01:17AM (#42783249)

      1. We have accurate mapping of the genes.

      We have a pretty good idea on this one. Specific polymerases have specific sequences which they respond to, defining the start sequences of genes. It is possible we have missed some polymerase, but the likelihood is low given the extensive searches which have been done for them. As well, regions which are genes have a distinctively different character than regions which are not genes (at least in the general sense).

      2. We can predict the protein sequence from the sequence of the gene.

      We also have a pretty good idea about this, due to decades and decades of biologists trying to figure out the answer to this problem. The genetic code turns out to differ in some organisms from what we think of as the default. Sometimes multiple amino acids are coded for by the same sequence of bases, and so multiple proteins are produced from the identical coding region of DNA. Sometimes proteins are produced with modified amino acids, which are not explicitly coded for in the DNA of the gene, but rather by the activity of other proteins defined elsewhere by DNA. (This is a stochastic process and interference in the distribution of outcomes can sometimes result in pathological consequences.) In some organisms, the DNA is decompressed into RNA which is then translated into protein in a more typical way. (Extra bases are incorporated into the RNA in a repeatable way that results in amino acids added which were not defined in the sequence of DNA of the gene being added to proteins.) There's a whole bunch of stuff on alternate splicing, which we explicitly know that we don't know how to predict, that produces variations in protein sequence from a single gene sequence.

      3. One protein can not be the product of two genes.

      There are plenty of ways in which two separate genes can produce an identical protein. This actually happens ALL THE TIME in mammals, since we have two copies of every gene and most of these pairs have identical sequence. Even if the genes produce the identical protein through different mechanisms, if the protein is identical... then the protein is identical.

      4. We have a good understanding of what the functions of the proteins in the training set are.

      We do have a good idea of what the functions of the proteins in the training set are. See all of molecular biology for your citations.

      5. If two proteins have similar sequence, they must have similar functions.

      This is explicitly known to be false and is not expected under the evolutionary model. Look up the category of proteins known as 'crystallins' for a specific case counter to your assumption.

      6. One protein has one function.

      It is generally thought that there is a primary function for every protein. All things in biology are fuzzy, such that every protein probably has secondary side reactions or functions which may or may not be biologically relevant. (Arsenic is poisonous to us because our enzymes have a hard time distinguishing it from phosphorus, so the enzymes which incorporate phosphorus also 'function' to incorporate arsenic.)

      7. A protein has a function.

      Any protein synthesized by a cell costs energy. Under the evolutionary model of biology, proteins which don't have a function should have been discarded because their synthesis was wasting energy. That said, lots and lots of proteins are continuously created and then rapidly degraded because they were improperly folded or had other problems which brought them to the attention of intracellular systems with the 'function' of degrading such errant protein and returning their components to the cell for more productive use. Some genetic diseases are the consequence of the buildup of proteins which are otherwise non-symptomatic, but don't get degraded properly by the degradation systems.
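
Point 5 above turns on what "similar sequence" means. One common and very simple measure is percent identity between two sequences that have already been aligned. The sketch below assumes equal-length aligned strings with "-" marking gaps; the two sequences are made up. As point 5 says, a high score here does not by itself establish shared function.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned columns where both sequences carry the same residue.

    Columns where both sequences have a gap are skipped; a gap opposite a
    residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned protein fragments (one-letter amino acid codes).
identity = percent_identity("MKTAYIAK", "MKTGYI-K")
print(f"{identity:.1f}% identical")
```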

      • In short, biologists are aware of the limitations of their assumptions and have some solid idea as to when their assumptions are valid or not.

        Doing the bioinformatics will help a researcher sort through the dramatically large number of gene sequences to find a set which is likely enriched for the characteristic they are looking for. They know they will miss interesting cases which don't match the models used. Without these sorts of predictions, they would have to rely on random guessing as a strategy wi

      • As I am not a biologist, feel free to correct anything I say here.

        ---

        As I understand it, the ribosome is responsible for taking the RNA and creating the protein. IIRC, it also folds the protein. A single protein can be folded in a few different ways to produce different building blocks.

        There is also some sort of checking mechanism that checks for proper sequence and proper folding. If there is an error, the constructed/folded protein is broken down and the process is retried. Sometimes, the error ch

        • The ribosome is a complex of protein and ribosomal RNA (rRNA). The catalytic subunit of the ribosome, which adds new amino acids to the nascent protein, is the rRNA. A single protein can be folded an infinite number of ways, but only a small subset of that possibility is stable. Proteins which have failed to fold 'properly' will be bound by 'heat shock proteins' (HSPs) which assist the new protein in folding. These complexes provide some buffering against the problems of incorrectly manufactured or

      • by pesho ( 843750 )

        1. We have accurate mapping of the genes.

        We have a pretty good idea on this one. Specific polymerases have specific sequences which they respond to, defining the start sequences of genes. It is possible we have missed some polymerase, but the likelihood is low given the extensive searches which have been done for them. As well, regions which are genes have a distinctively different character than regions which are not genes (at least in the general sense).

        You justify an assumption with an assumption. The core promoter sequences are so degenerate that they can be found pretty much anywhere. This has led to misannotation of long genes as multiple single genes. There are a number of other causes of annotation errors.

        • Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies Alexandra M. Schnoes, Shoshana D. Brown, Igor Dodevski, Patricia C. Babbitt
        • Misannotations of rRNA can now generate 90% false positive protein matc
        • You justify an assumption with an assumption. The core promoter sequences are so degenerate that they can be found pretty much anywhere. This has led to misannotation of long genes as multiple single genes. There are a number of other causes of annotation errors.

          • Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies Alexandra M. Schnoes, Shoshana D. Brown, Igor Dodevski, Patricia C. Babbitt
          • Misannotations of rRNA can now generate 90% false positive protein matches in metatranscriptomic studies. Tripp HJ, Hewson I, Boyarsky S, Stuart JM, Zehr JP.

          There are also numerous examples of manually curated entries that are wrong because people studied non-existent proteins as a result of cloning artifacts or ignoring nonsense-mediated decay (NMD). Here is one example where transcripts containing unspliced introns, which are eliminated by NMD, have been studied and ascribed a function: Zhu J, Chen X. MCG10, a novel p53 target gene that encodes a KH domain RNA-binding protein, is capable of inducing apoptosis and cell cycle arrest in G(2)-M. Mol Cell Biol. 2000 Aug;20(15):5602-18. (accessions AF257770, AF257771)

          Those long single genes which are sometimes misannotated as a series of smaller genes... are sometimes transcribed as a long single gene and sometimes as a series of smaller genes. You've primarily pointed out that biology is hard and that most published papers are full of crap.

          Your pretty good idea is applicable to about 60% of the long reading frames and even less applicable to short ORFs: Ingolia NT, Lareau LF, Weissman JS. Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell. 2011 Nov 11;147(4):789-802. Mind you, this does not include processes like RNA editing, which can further complicate how we predict protein sequence based on gene sequence.

          This counter-argument doesn't counter my argument.

          I wasn't commenting on ploidy. I had in mind things like trans-splicing, where you assemble mature RNA from transcripts that belong to different genes, sometimes located on different chromosomes, or the way protozoan genomes are rearranged prior to expression in the macronucleus.

          I wasn't commenting on ploidy either. Protozoans do things in all sorts of ways, most of which we have no idea about... and don't care about for the most part. The knowledg

  • ...a post-genomic world be one in which we had stopped fiddling with genes and DNA and such?

    Aren't we more in the midst of a Genomics Revolution?

    Or more accurately, we are in the infancy of the Genomics Revolution.

  • This TED.com talk by Danny Hillis is informative on this topic: http://www.ted.com/talks/danny_hillis_two_frontiers_of_cancer_treatment.html "Danny Hillis makes a case for the next frontier of cancer research: proteomics, the study of proteins in the body. As Hillis explains it, genomics shows us a list of the ingredients of the body -- while proteomics shows us what those ingredients produce. Understanding what's going on in your body at the protein level may lead to a new understanding of how cancer happ
