
Sequencing a Human Genome In a Week 101

Posted by kdawson
from the data-data-everywhere dept.
blackbearnh writes "The Human Genome Project took 13 years to sequence a single human's genetic information in full. At Washington University's Genome Center, they can now do one in a week. But when you're generating that much data, just keeping track of it can become a major challenge. David Dooling is in charge of managing the massive output of the Center's herd of gene sequencing machines, and making it available to researchers inside the Center and around the world. He'll be talking about his work at OSCON, and gave O'Reilly Radar a sense of where the state of the art in genome sequencing is heading. 'Now we can run these instruments. We can generate a lot of data. We can align it to the human reference. We can detect the variance. We can determine which variance exists in one genome versus another genome. Those variances that are cancerous, specific to the cancer genome, we can annotate those and say these are in genes. ... Now the difficulty is following up on all of those and figuring out what they mean for the cancer. ... We know that they exist in the cancer genome, but which ones are drivers and which ones are passengers? ... [F]inding which ones are actually causative is becoming more and more the challenge now.'"

Comments Filter:
  • by QuantumG (50515) * <qg@biodome.org> on Monday July 13, 2009 @07:45PM (#28684431) Homepage Journal

    Typically they sequence every base at least 30 times.

  • by blackbearnh (637683) * on Monday July 13, 2009 @07:46PM (#28684443)
    I wondered the same thing, so I asked. From the article: And between two cells, one cell right next to the other, they should be identical copies of each other. But sometimes mistakes are made in the process of copying the DNA. And so some differences may exist. However, we're not currently sequencing single cells. We'll collect a host of cells and isolate the DNA from a host of cells. So what you end up with when you read the sequence out on these things is, essentially, an average of this DNA sequence. Well, I mean it's digital in that eventually you get down to a single piece of DNA. But once you align these things back, if you see 30 reads that all align to the same region of the genome and only one of them has an A at that position and all of the others have a T at that position, you can't say whether that A was actually some small change between one cell and its 99 closest neighbors or whether that was just an error in the sequencing. So it's hard to say cell-to-cell how much difference there is. But, of course, that difference does exist; that's mutation, and that's what eventually leads to cancer and other diseases.
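    The situation described above — 30 reads covering a position, with one discordant base that could be either a rare variant or a sequencing error — is essentially a consensus-calling problem. A minimal sketch of the idea (the threshold and function are hypothetical illustrations, not the Genome Center's actual pipeline):

    ```python
    from collections import Counter

    def call_consensus(pileup, min_fraction=0.2):
        """Call a consensus base from the reads covering one position.

        pileup: list of bases observed at this position, one per read.
        Minority bases seen in fewer than min_fraction of reads are
        treated as likely sequencing errors and dropped; more frequent
        minority bases are reported as candidate variants.
        """
        counts = Counter(pileup)
        total = len(pileup)
        consensus, _ = counts.most_common(1)[0]
        candidates = {b: c for b, c in counts.items()
                      if b != consensus and c / total >= min_fraction}
        return consensus, candidates

    # 30x coverage: 29 reads say T, 1 says A.
    # The lone A falls below the threshold, exactly the ambiguity above.
    base, variants = call_consensus(["T"] * 29 + ["A"])
    ```

    With a single discordant read out of 30 the call is just "T" with no candidate variants; only a base seen in a substantial fraction of reads would be reported, which is why deep coverage matters.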
  • by SlashBugs (1339813) on Monday July 13, 2009 @08:00PM (#28684603)
    Data handling and analysis is becoming a big problem for biologists generally. Techniques like microarray (or exon array) analysis can tell you how strongly a set of genes (tens of thousands, with hundreds of thousands of splice variants) are being expressed under given conditions. But actually handling this data is a nightmare, especially as a lot of biologists ended up there because they love science but aren't great at maths. Given a list of thousands of genes, teasing the statistically significantly different genes out of the noise is only the first step. Then you have to decide what's biologically important (e.g. what's the prime mover and what's just a side-effect), and then you have a list of genes which might have known functions but more likely have just a name or even a tag like "hypothetical ORF #3261", for genes that are predicted by analysis of the genome but have never been proved to actually be expressed. After this, there's the further complication that these techniques only tell you what's going on at the DNA or RNA level. The vast majority of genes only have effects when translated into protein and, perhaps, further modified, meaning that you can't be sure that the levels you're detecting by the sequencing (DNA level) or expression analysis chips (RNA level) actually reflect what's going on in the cell.

    One of the big problems studying expression patterns in cancer specifically is the paucity of samples. The genetic differences between individuals (and tissues within individuals) mean there's a lot of noise underlying the "signal" of the putative cancer signatures. This is especially true because there are usually several genetic pathways that a given tissue can take to becoming cancerous: you might only need mutations in a small subset of a long list of genes, which is difficult to spot by sheer data mining. While cancer is very common, each type of cancer is much less so; the paucity of available samples of a given cancer type in a given stage therefore makes reaching statistical significance very difficult. There are some huge projects underway at the moment to collate all cancer labs' samples for meta-analysis, dramatically increasing the statistical power of the studies. A good example of this is the Pancreas Expression Database [pancreasexpression.org], which some pancreatic cancer researchers are getting very excited about.
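    Teasing the significant genes out of tens of thousands of parallel tests, as described above, usually means correcting for multiple comparisons. A sketch of the standard Benjamini-Hochberg false-discovery-rate procedure (the p-values are made-up illustrations, not from any real array):

    ```python
    def benjamini_hochberg(pvalues, fdr=0.05):
        """Return indices of tests significant at the given FDR.

        Benjamini-Hochberg: sort the m p-values ascending and find the
        largest rank k with p_(k) <= (k/m) * fdr; every test at or below
        that rank is declared significant.
        """
        m = len(pvalues)
        order = sorted(range(m), key=lambda i: pvalues[i])
        cutoff = 0
        for rank, idx in enumerate(order, start=1):
            if pvalues[idx] <= rank / m * fdr:
                cutoff = rank
        return sorted(order[:cutoff])

    # Four "genes": the two strong signals survive correction,
    # the borderline one (p = 0.04) does not.
    hits = benjamini_hochberg([0.001, 0.008, 0.04, 0.9])
    ```

    On a real array with 20,000+ genes this kind of correction is what keeps a naive p < 0.05 threshold from flagging a thousand false positives by chance alone.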
  • DNA is digital (Score:2, Informative)

    by EndoplasmicRidiculus (1181061) on Monday July 13, 2009 @09:25PM (#28685195)
    Four bases and not much in between.
  • by cbailster (806961) on Tuesday July 14, 2009 @06:56AM (#28688593)

    Fingerprinting doesn't rely on DNA sequencing, but does rely on the DNA sequence being different between people. Everyone's DNA contains subtle differences (particularly in the non-coding DNA regions). These differences can be exploited by various laboratory techniques to produce small pieces of DNA which will be of different sizes because of these differences. When these fragments of DNA are run down a suitable gel (usually agarose, a substance derived from seaweed) under an electric current the fragments will separate by size. The pattern of fragments formed will be unique for each individual.

    Several fingerprinting techniques rely on what most programmers would best recognise as regular expression matching. For example, there are enzymes in biology which will recognise certain DNA sequences but not others, and will cut the DNA in two wherever this sequence is matched (in Perl:

    my @dna_fragments = split /GAATTC/, $my_dna;

    is the equivalent of what an enzyme called EcoRI [wikipedia.org] does). Not everyone will have the same number of occurrences of this sequence in their DNA, nor will they be in the same places, so the number and size of fragments will differ. By using a suitable range of such enzymes you can generate a pattern of DNA fragments that is sufficiently distinctive to identify a single person amongst a population of several billion.

    For more information, take a look at DNA Profiling [wikipedia.org] on Wikipedia.
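    The Perl split above is a good first approximation, though it discards the recognition site itself; EcoRI actually cuts between the G and the first A, so each fragment keeps part of the site and the fragment sizes come out differently. A sketch of computing the fragment sizes a gel would separate (Python here, same idea as the Perl; the function name and toy sequence are my own):

    ```python
    import re

    def digest(dna, site="GAATTC", cut_offset=1):
        """Return fragment lengths from a restriction digest.

        The enzyme cuts cut_offset bases into each occurrence of site;
        EcoRI cuts G^AATTC, i.e. offset 1 on the top strand. Unlike a
        plain split on the site, the site's bases stay in the fragments,
        so the lengths match what actually runs down the gel.
        """
        cuts = [m.start() + cut_offset for m in re.finditer(site, dna)]
        bounds = [0] + cuts + [len(dna)]
        return [b - a for a, b in zip(bounds, bounds[1:])]

    # Two EcoRI sites in a 21-base toy sequence -> three fragments;
    # their size pattern is what distinguishes one sample from another.
    sizes = digest("AAAGAATTCCCCCGAATTCAA")
    ```

    Running several enzymes over the same sample and comparing the resulting size patterns is, in effect, the fingerprint.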
