Biotech AI Science

International Challenge To Computationally Interpret Protein Function

Posted by samzenpus
from the working-together dept.
Shipud writes "We live in the post-genomic era, when DNA sequence data is growing exponentially. However, for most of the genes that we identify, we have no idea of their biological functions. They are like words in a foreign language, waiting to be deciphered. The Critical Assessment of Function Annotation, or CAFA, is a new experiment to assess the performance of the multitude of computational methods developed by research groups worldwide and help channel the flood of data from genome research to deduce the function of proteins. Thirty research groups participated in the first CAFA, presenting a total of 54 algorithms. The researchers participated in blind-test experiments in which they predicted the function of protein sequences for which the functions are already known but haven't yet been made publicly available. Independent assessors then judged their performance. The challenge organizers explain: 'The accurate annotation of protein function is key to understanding life at the molecular level and has great biochemical and pharmaceutical implications; however, with its inherent difficulty and expense, experimental characterization of function cannot scale up to accommodate the vast amount of sequence data already available. The computational annotation of protein function has therefore emerged as a problem at the forefront of computational and molecular biology.'"
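
The summary does not say how the assessors scored each submission. As a rough sketch only, one common way to score set-valued predictions is precision and recall, assuming each protein's functions are represented as a set of term labels. The function name and the labels below are hypothetical and are not taken from CAFA.

```python
def precision_recall(predicted, known):
    """Score one protein's predicted function terms against the withheld known terms."""
    predicted, known = set(predicted), set(known)
    true_positives = len(predicted & known)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    return precision, recall


# Placeholder labels standing in for real function terms.
predicted_terms = ["term_A", "term_B"]
withheld_terms = ["term_A", "term_C"]
print(precision_recall(predicted_terms, withheld_terms))  # (0.5, 0.5)
```

In a blind test of this kind, the withheld terms would only be revealed to the assessors after the predictions were submitted.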

  • Re:Assumptions (Score:4, Informative)

    by the biologist (1659443) on Monday February 04, 2013 @02:17AM (#42783249)

    1. We have accurate mapping of the genes.

    We have a pretty good idea on this one. Specific polymerases respond to specific sequences, which define the start of genes. It is possible we have missed some polymerase, but the likelihood is low given the extensive searches that have been done for them. Also, regions that are genes have a distinctly different character from regions that are not (at least in the general sense).

    2. We can predict the protein sequence from the sequence of the gene.

    We also have a pretty good idea about this, thanks to decades and decades of biologists working on the problem. The genetic code turns out to differ in some organisms from what we think of as the default. Sometimes the same sequence of bases codes for more than one amino acid, so multiple proteins are produced from the identical coding region of DNA. Sometimes proteins are produced with modified amino acids, which are not explicitly coded for in the DNA of the gene but result from the activity of other proteins defined elsewhere in the DNA. (This is a stochastic process, and interference in the distribution of outcomes can sometimes have pathological consequences.) In some organisms, the DNA is 'decompressed' into RNA, which is then translated into protein in the more typical way. (Extra bases are incorporated into the RNA in a repeatable way, so the protein gains amino acids that were not defined in the DNA sequence of the gene.) There's also a whole bunch of stuff on alternative splicing, which we explicitly know we don't know how to predict, that produces variations in protein sequence from a single gene sequence.
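
    To make the point about differing genetic codes concrete, here is a minimal sketch that translates the same DNA string under the standard code and under the vertebrate mitochondrial code, which reassigns four codons. The function and variable names are illustrative, the toy sequence is made up, and the sketch ignores everything else mentioned above (RNA editing, modified amino acids, splicing).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in the conventional TCAG codon order
# (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
STANDARD_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AMINO_ACIDS)
}

# Vertebrate mitochondrial code: four codons differ from the standard table.
VERTEBRATE_MITO_CODE = dict(STANDARD_CODE, AGA="*", AGG="*", ATA="M", TGA="W")


def translate(dna, code):
    """Translate a coding DNA string codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = code[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


toy_gene = "ATGATATGAAGATAA"  # made-up sequence, not a real gene
print(translate(toy_gene, STANDARD_CODE))         # MI
print(translate(toy_gene, VERTEBRATE_MITO_CODE))  # MMW
```

    The same fifteen bases give a two-residue peptide under one table and a three-residue peptide under the other, so predicting the protein requires knowing which code applies.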

    3. One protein can not be the product of two genes.

    There are plenty of ways in which two separate genes can produce an identical protein. This actually happens ALL THE TIME in mammals, since we have two copies of nearly every gene and most of these pairs have identical sequences. Even if the genes produce the identical protein through different mechanisms, if the protein is identical... then the protein is identical.

    4. We have a good understanding of what the functions of the proteins in the training set are.

    We do have a good idea of what the functions of the proteins in the training set are. See all of molecular biology for your citations.

    5. If two proteins have similar sequence, they must have similar functions.

    This is explicitly known to be false and is not expected under the evolutionary model. Look up the category of proteins known as 'crystallins' for a specific case counter to your assumption.
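
    For illustration, the assumption being criticized amounts to something like the naive annotation transfer below: find the most similar annotated sequence and copy its label. This is only a sketch with hypothetical names and made-up toy sequences, and it assumes the sequences are already aligned to equal length. It is not a method described in the story.

```python
def percent_identity(a, b):
    """Percent of positions with the same residue in two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def transfer_annotation(query, annotated):
    """Copy the label of the most similar annotated sequence (naive nearest neighbour)."""
    best_seq = max(annotated, key=lambda seq: percent_identity(query, seq))
    return annotated[best_seq], percent_identity(query, best_seq)


# Toy sequences and labels, invented for this example.
annotated = {
    "MKTAYIAKQR": "function A",
    "GSHMLEDPVA": "function B",
}
print(transfer_annotation("MKTAYIAKQL", annotated))  # ('function A', 90.0)
```

    The sketch returns "function A" at 90% identity no matter what the query protein actually does in the cell, which is exactly the failure mode a case like the crystallins exposes.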

    6. One protein has one function.

    It is generally thought that there is a primary function for every protein. All things in biology are fuzzy, such that every protein probably has secondary side reactions or functions which may or may not be biologically relevant. (Arsenic is poisonous to us because our enzymes have a hard time distinguishing it from phosphorus, so the enzymes which incorporate phosphorus also 'function' to incorporate arsenic.)

    7. A protein has a function.

    Any protein synthesized by a cell costs energy. Under the evolutionary model of biology, proteins which don't have a function should have been discarded because their synthesis wastes energy. That said, lots and lots of proteins are continuously created and then rapidly degraded because they were improperly folded or had other problems which brought them to the attention of intracellular systems with the 'function' of degrading such errant proteins and returning their components to the cell for more productive use. Some genetic diseases are the consequence of the buildup of proteins which would otherwise cause no symptoms but don't get degraded properly by the degradation systems.
