International Challenge To Computationally Interpret Protein Function
Shipud writes "We live in the post-genomic era, when DNA sequence data is growing exponentially. However, for most of the genes that we identify, we have no idea of their biological functions. They are like words in a foreign language, waiting to be deciphered. The Critical Assessment of Function Annotation, or CAFA, is a new experiment to assess the performance of the multitude of computational methods developed by research groups worldwide and to help channel the flood of data from genome research into deducing the function of proteins. Thirty research groups participated in the first CAFA, presenting a total of 54 algorithms. The researchers took part in blind-test experiments in which they predicted the function of protein sequences whose functions are already known but have not yet been made publicly available. Independent assessors then judged their performance. The challenge organizers explain: 'The accurate annotation of protein function is key to understanding life at the molecular level and has great biochemical and pharmaceutical implications; however, with its inherent difficulty and expense, experimental characterization of function cannot scale up to accommodate the vast amount of sequence data already available. The computational annotation of protein function has therefore emerged as a problem at the forefront of computational and molecular biology.'"
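The summary doesn't say how the assessors scored the 54 algorithms. Here is a minimal sketch of the general idea behind a blind test, assuming simple per-protein precision and recall over predicted function labels, averaged across the held-back proteins. The protein IDs, labels, and the functions `precision_recall` and `score_method` are hypothetical and are not CAFA's actual metric or data.

```python
# Toy blind-test scoring: compare a method's predicted function labels against
# the "true" labels the assessors held back. All names and labels are hypothetical;
# this is plain precision/recall, not CAFA's official evaluation.

def precision_recall(predicted: set[str], truth: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one protein's predicted label set."""
    hits = len(predicted & truth)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


def score_method(predictions: dict[str, set[str]],
                 held_back: dict[str, set[str]]) -> tuple[float, float]:
    """Average precision and recall over every protein the assessors held back.

    A protein the method made no prediction for counts as an empty prediction
    (precision 0, recall 0).
    """
    pairs = [precision_recall(predictions.get(p, set()), truth)
             for p, truth in held_back.items()]
    n = len(pairs)
    return (sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n)


if __name__ == "__main__":
    held_back = {
        "protA": {"kinase", "membrane"},
        "protB": {"transport"},
    }
    predictions = {
        "protA": {"kinase", "nucleus"},  # one right, one wrong
        "protB": {"transport"},          # exactly right
    }
    print(score_method(predictions, held_back))  # (0.75, 0.75)
```

Because the true labels are only revealed after predictions are submitted, a method can't tune itself to the answers. That is the point of the "already known but not yet public" setup.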
Re:No idea... like words in a foreign language (Score:5, Insightful)
Re:A plan of action (Score:5, Insightful)
Without a good plan, we'll be at it for decades. Here's what I think genomic researchers should do.
Genes (and proteins) are obviously organized hierarchically, which means there must be a control hierarchy in there somewhere. To unravel and properly classify the genome, researchers must first identify and understand the hierarchical control system. Only then can they begin to populate the branches with the correct genes.
After the tree is completely built and all the genes have found their correct locations on the tree, then it's a matter of going through the tree from the top down and switching the branches of the tree off/on one at a time to see what happens. It's hard but it can be done.
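The top-down "switch branches off one at a time" plan can be sketched as a breadth-first knockout screen over a toy tree. The sketch assumes exactly what the plan assumes: a single clean hierarchy whose leaves are genes. The `Node` class, the gene names, and the `phenotype` callback are all hypothetical stand-ins for a real experiment.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    """A hypothetical control node; leaves stand in for individual genes."""
    name: str
    children: list["Node"] = field(default_factory=list)


def genes_under(node: Node) -> set[str]:
    """Collect the names of all leaves (genes) in this node's subtree."""
    if not node.children:
        return {node.name}
    out: set[str] = set()
    for child in node.children:
        out |= genes_under(child)
    return out


def knockout_screen(root: Node,
                    phenotype: Callable[[set[str]], str]) -> list[tuple[str, str]]:
    """Walk the tree top-down (breadth-first), switch off one branch at a time,
    and record the phenotype observed with that branch's genes disabled."""
    all_genes = genes_under(root)
    results = []
    queue = deque(root.children)
    while queue:
        node = queue.popleft()
        active = all_genes - genes_under(node)  # every gene except this branch
        results.append((node.name, phenotype(active)))
        queue.extend(node.children)             # then descend one level
    return results


if __name__ == "__main__":
    tree = Node("root", [Node("A", [Node("g1"), Node("g2")]),
                         Node("B", [Node("g3")])])
    # Toy rule: the organism is viable only while gene g1 is switched on.
    def toy_phenotype(active: set[str]) -> str:
        return "viable" if "g1" in active else "lethal"
    for branch, outcome in knockout_screen(tree, toy_phenotype):
        print(branch, outcome)
```

With the toy rule, branch `A` and gene `g1` come out lethal and everything else viable. That is the kind of top-down narrowing the plan hopes for. The reply below explains why real regulation rarely fits a single tree like this.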
Unfortunately, there doesn't have to be "a" control hierarchy. Each subsystem can have its own hierarchy (or none) that uses its own unique control mechanisms; they don't have to operate by the same rules, and they can mess with each other by lots of different ad hoc means. And that's just the genes: the proteins are much harder to model, at least as far as useful predictions go.
It's been ad hoc with no code review for over 3 billion years.