All trademarks and copyrights on this page are owned by their respective owners. Comments are owned by the Poster. The Rest © 1997-2009 Geeknet, Inc.
DNA GATC (Score:5, Funny)
Functions that don't do anything, no comments, worst piece of code ever!
I say we fork and refactor the entire project.
Re:DNA GATC (Score:5, Interesting)
'I say we fork and refactor the entire project.'
You mean like this?
http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=16729053 [nih.gov]
How do you know it's NOT comments? (Score:2)
Functions that don't do anything, no comments, worst piece of code ever!
Most of it doesn't code for proteins or do any of the other things that have been reverse-engineered so far. How do you know it's NOT comments?
(And if terrestrial life was engineered and it IS comments, do they qualify as "holy writ"?)
Re: (Score:2)
There was some SF book I read in which it was explained that the comments were "made by a demo version of creature editor", and that was the reason humans die after 100 years. Some hacker then found a way to reset the demo counter and thus make people live forever.
Re: (Score:2)
Like this girl [go.com]?
Re: (Score:2)
How do you know it's NOT comments?
Come on, how many programmers do you know who write comments, meaningful or not? I personally have a massive descriptive dialogue running down the side. "Real" programmers have told me that is excessive. Looking at their code, I find one comment every 20 to 50 lines, and descriptive identifiers like i, x or y. The genome will be just like that. (Also, any big project ends up with lots of dead code. Yes, I know the compiler identifies that, but...)
Re: (Score:2)
Re: (Score:2)
At least it's backed up well. 3 backups of almost everything ain't bad.
Two strands on each chromosome... I'm probably in the wrong crowd of nerds...
Re: (Score:3, Funny)
Re: (Score:2)
Actually it's just the arrogance of some scientists, who later found out that all those parts that seemingly did nothing were in fact just as relevant, just in a different way. Whoops!
Here's what I want to know... (Score:3, Insightful)
Re:Here's what I want to know... (Score:4, Informative)
Typically they sequence every base at least 30 times.
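Why so many passes? A common back-of-the-envelope model (the Lander-Waterman/Poisson assumption) treats the number of reads covering each base as Poisson-distributed with mean c, the average coverage. Real coverage is lumpier than this because reads don't land uniformly, so treat the numbers as optimistic. The sketch below, with a hypothetical function name, estimates the fraction of bases covered by at least k reads under that assumption.

```python
import math


def fraction_covered(mean_depth: float, min_reads: int = 1) -> float:
    """Expected fraction of bases seen at least `min_reads` times.

    Assumes per-base depth ~ Poisson(mean_depth), i.e. reads land
    uniformly at random; real sequencing data is less even than this.
    """
    # P(depth >= k) = 1 - sum_{i<k} e^{-c} * c^i / i!
    below = sum(
        math.exp(-mean_depth) * mean_depth**i / math.factorial(i)
        for i in range(min_reads)
    )
    return 1.0 - below


if __name__ == "__main__":
    for depth in (1, 5, 10, 30):
        print(f"{depth:>2}x: >=1 read {fraction_covered(depth):.4f}, "
              f">=10 reads {fraction_covered(depth, 10):.4f}")
```

Even under this idealized model, a 10x average leaves about 46% of bases with fewer than 10 reads, which is one reason to aim for roughly 30x when you want every position called with confidence.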
Re:Here's what I want to know... (Score:4, Informative)
Re: (Score:2)
It can get even more complicated, too: if you have 10x coverage of a position and 9 reads say T while 1 says G, it may be an allelic variation. There's a one in 16 chance this'll happen randomly instead of the 5 Ts and 5 Gs you expect.
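The odds of a lopsided split like this can be estimated with a binomial model. The sketch below assumes a heterozygous site, independent reads that each sample either allele with probability 1/2, and no sequencing errors; the function names are hypothetical.

```python
from math import comb


def split_probability(depth: int, minor_reads: int, p_allele: float = 0.5) -> float:
    """Probability that exactly `minor_reads` of `depth` reads show one given allele.

    Assumes each read independently samples that allele with probability
    p_allele and that there are no sequencing errors.
    """
    return (comb(depth, minor_reads)
            * p_allele**minor_reads
            * (1 - p_allele) ** (depth - minor_reads))


def at_least_as_lopsided(depth: int, minor_reads: int) -> float:
    """P(the rarer allele shows up in <= minor_reads reads), either allele, p = 0.5."""
    if not minor_reads < depth / 2:
        raise ValueError("minor_reads must be below half the depth")
    # One tail (few G reads), doubled for the mirror case (few T reads).
    one_tail = sum(split_probability(depth, k) for k in range(minor_reads + 1))
    return 2 * one_tail
```

Under these assumptions, exactly 9 T and 1 G at 10x has probability 10/1024 (about 1 in 100), a 9:1 or more extreme split toward either allele has probability 22/1024 (about 1 in 47), and a perfect 5:5 split occurs only about a quarter of the time. The real odds shift once sequencing error and sampling bias are modelled.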
Re: (Score:2)
Re: (Score:3, Interesting)
"Suppose they sequence a specific human's genome. Now they do it again. Will the two sequences be the same?"
Not [wikipedia.org] necessarily [wikipedia.org]. ;-)
Re: (Score:2)
Suppose they sequence a specific human's genome. Now they do it again. Will the two sequences be the same?
They should be. An individual's genome does not change over time. Gene expression can change, which can itself lead to significant problems such as cancer.
Re: (Score:2)
The genome sure as hell changes
Not necessarily. The genome refers specifically to the genes encoded by DNA; mutations can also occur in the non-coding regions. Indeed the non-coding regions are often the most critical for gene expression.
Hence a non-genomic mutation can have a profound effect on gene expression.
lots of mutations happening all the time in probably every cell of our body
Also not necessarily true. For example, a non-dividing cell has no reason to duplicate its own genome, hence it has almost no chance to acquire mutations.
That in turn causes your gene expressions to change since they're also, to a large extent, controlled by the genome.
As I already described, much of gene expression is regulated by non c
Re: (Score:2)
The genome refers specifically to the genes encoded by DNA;
Genome (gen[e] + [chromos]ome) refers to all the genetic material, coding and non-coding, gene or not. It has meant that for the past 90 years, since before non-coding regions were even thought of, I'm guessing, and last I checked it hasn't been redefined. You have very nicely explained why having it refer only to coding regions is stupid. If I'm wrong, I'd love to know, but you'll need to provide a reference.
The definition of a gene, on the other hand, may be changing to include both coding and non-coding regions.
Re: (Score:2)
Just to add, so we're all clear on definitions: as I understand it, we're talking about these regions of DNA here:
a) RNA/protein-coding region
b) transcribed non-coding regions (introns)
c) regulatory region
d) unknown/junk/other non-coding region
I suspect there's some additional stuff I missed but it's been a while since I cared too much about this.
In my molecular biology classes "gene" was used to refer to a, b and c as they relate to a protein. To be honest that someone would t
Re: (Score:2)
I would suggest that you spend some time studying the topic in more detail before you make comments on /.
At all the genome conferences I've been to the "genome" includes everything -- the chromosome number and architecture, the coding, regulatory & non-coding regions (tRNA, rRNA, miRNA/siRNA, telomere length, etc.). But the non-coding, highly variable parts of the genome can be considered part of the "big picture" because the amount of *really* junk DNA may function as a free radical "sink" which prot
Re: (Score:2)
Suppose they sequence a specific human's genome. Now they do it again. Will the two sequences be the same?
Are you talking about different individuals? There will be differences, yes, but most of them should be in non-coding regions. The actual protein-coding regions should be nearly identical. I only work with a few DNA sequences that code for proteins, so that's all I'd be interested in, but there are other medical applications for which the variation in non-coding regions would be important.
How about storing it in analog format? (Score:5, Funny)
Just store all that data as a chemical compound. Maybe a nucleic acid of some kind? Using two long polymers made of sugars and phosphates? I bet the whole thing could be squeezed into something smaller than the head of a pin!
DNA is digital (Score:2, Informative)
Re: (Score:2)
Plus histones, methylation, imprinting, a few thousand proteins, and a few pieces of RNA to bootstrap the transcription/translation.
Data analysis a rapidly growing problem in Biology (Score:5, Informative)
One of the big problems in studying expression patterns in cancer specifically is the paucity of samples. The genetic differences between individuals (and between tissues within individuals) mean there's a lot of noise underlying the "signal" of the putative cancer signatures. This is especially true because there are usually several genetic pathways by which a given tissue can become cancerous: you might only need mutations in a small subset of a long list of genes, which is difficult to spot by sheer data mining. While cancer is very common, each type of cancer is much less so; therefore the paucity of available samples of a given cancer type at a given stage makes reaching statistical significance very difficult. There are some huge projects underway at the moment to collate all cancer labs' samples for meta-analysis, dramatically increasing the statistical power of the studies. A good example of this is the Pancreas Expression Database [pancreasexpression.org], which some pancreatic cancer researchers are getting very excited about.
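How much pooling samples helps can be made concrete with a standard normal-approximation power formula. This sketch assumes a simple two-group comparison of a single expression value with unit variance and a hypothetical standardized effect size d; real expression studies test thousands of genes at once and need multiple-testing correction, which it ignores.

```python
from statistics import NormalDist


def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference (Cohen's d).
    The tiny contribution from the opposite tail is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # With unit variance, the difference of two means has standard error sqrt(2 / n).
    shift = effect_size * (n_per_group / 2) ** 0.5
    return z.cdf(shift - z_crit)


if __name__ == "__main__":
    # Hypothetical: a single lab's handful of samples vs. a pooled collection.
    for n in (10, 30, 64, 200):
        print(n, round(approx_power(0.5, n), 3))
```

With a moderate effect (d = 0.5), 10 samples per group give only about 20% power, while roughly 64 per group reach about 80%, the kind of jump that pooling samples across labs makes possible.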
Re:Data analysis a rapidly growing problem in Biol (Score:2)
The vast majority of genes only have effects when translated into protein
That depends on your definition. If you define a gene as "stretch of DNA that is translated into protein," which until fairly recently was the going definition, then of course your statement is tautologically true (replacing "the vast majority of" with "all.") But if you define it as "a stretch of DNA that does something biologically interesting," then it's no longer at all clear. Given the number of regulatory elements not directly
Buttload of data (Score:2, Interesting)
Humans have ~810.6 MiB of DNA (Score:2, Interesting)
Re: (Score:2)
Re: (Score:2)
So, what's going on here? Are the file formats used to store this data *that* bloated?
<genome species="human">... ;-)
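A figure like ~810.6 MiB comes from storing each base in 2 bits (four bases per byte), whereas plain-text formats spend a full byte per base. The sketch below shows that packing; it assumes an uppercase A/C/G/T-only sequence (ambiguous bases such as N are not handled), and the function names are hypothetical.

```python
# Assumed 2-bit code; only uppercase A, C, G, T are supported in this sketch.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a DNA string into 4 bases per byte (2 bits each)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= CODE[base] << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Inverse of pack(); `length` is needed because the last byte may be partly unused."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length))


def packed_mib(n_bases: int) -> float:
    """Size in MiB of n_bases stored at 2 bits per base."""
    return ((n_bases + 3) // 4) / 2**20
```

For an assumed 3.4 billion bases, packed_mib gives about 810.6 MiB, in line with the comment title; a one-byte-per-base text file of the same sequence is four times larger, before any quality scores or raw images are stored.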
Re: (Score:2, Interesting)
I also manage a Next-gen Sequencing Machine (Score:3, Interesting)
Next-gen sequencing eats up huge amounts of space. Every run on our Illumina Genome Analyzer II machine takes up 4 terabytes of intermediate data, most of which comes from the 100,000+ or so 20 MB bitmap images taken of the flowcells. All that data is an ass load of work to process. Just today I got a little lazy with my Perl programming and let the program run unsupervised... and it ate up 32 GB of RAM and froze the server. It took Red Hat 3 full hours to decide it had had enough of the swapping and kill the process.
For people not familiar with current-generation sequencing machines: they produce reads of 30-80 bp and use alignment programs to match the reads up to species databases. The reaction/imaging takes 2 days, prep takes about a week, processing images takes another 2 days, and alignment takes about 4. The Illumina machine achieves higher throughput than the ABI ones but gives shorter reads; we get about 4 billion nt per run if we do everything right. Keep in mind, though, that the 4 billion mentioned in the summary is misleading: the read coverage distribution is not uniform (i.e., you do not cover every nucleotide of the 3-billion-nt human genome). To ensure 95%+ coverage, you'd have to use 20-40 runs on the Illumina machine... in other words, about 6-10 months of non-stop work to get a reasonable degree of coverage over the entire human genome (at which point you can use programs to "assemble" the reads into a contiguous genome). WashU is very wealthy, so they have quite a few of these machines available at any given time.
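Matching short reads back to a reference is the alignment step mentioned above. As a toy illustration only, assuming exact matches on a single reference strand and hypothetical function names, a hash index of reference k-mers can propose candidate positions that are then verified against the full read. Production aligners use far more elaborate indexes and tolerate mismatches and indels.

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read: str, reference: str, index: dict[str, list[int]], k: int) -> list[int]:
    """Return reference positions where `read` matches exactly.

    Uses the read's first k bases as a seed, then checks the whole read.
    Mismatches, indels and the reverse strand are not handled.
    """
    if len(read) < k:
        raise ValueError("read is shorter than the seed length k")
    return [
        pos
        for pos in index.get(read[:k], [])
        if reference[pos:pos + len(read)] == read
    ]
```

The index is built once and reused for millions of reads, which is why the seed-then-verify split matters: each lookup touches only a few candidate positions instead of scanning the whole genome.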
The main problem these days is that processing that much data requires a huge amount of computer know-how (writing software and algorithms, installing software, using other people's poorly documented programs) and a good understanding of statistics and algorithms, especially when it comes to efficiency. Another problem they never mention is artifacts from the chemical protocol: just the other day we found a very unusual anomaly indicating that the first third of the bases in all our reads was absolute crap (usually only the last few bases are unreliable). It turned out that our slight modification of the Illumina protocol, made to tailor it to studying epigenomic effects, had quite large effects on the sequencing reactions later on. Even for good reads, a lot of the bases can be suspect, so you have to do a huge amount of averaging, filtering, and statistical analysis to make sure your results/graphs are accurate.
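Per-base reliability is usually expressed as a Phred quality score (an assumption here, since the comment doesn't name a format): Q = -10·log10(P_error), so Q20 means a 1% chance the base is wrong. The sketch below, with hypothetical function names and thresholds, shows three common filtering steps: trimming low-quality 3' ends, dropping reads with poor mean quality, and averaging quality by read position across a run.

```python
def error_probability(q: int) -> float:
    """Phred scale: Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


def trim_3prime(seq: str, quals: list[int], min_q: int = 20) -> tuple[str, list[int]]:
    """Drop trailing bases whose quality is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def passes_filter(quals: list[int], min_mean_q: float = 25.0) -> bool:
    """Keep a read only if it is non-empty and its mean quality clears the threshold."""
    return bool(quals) and sum(quals) / len(quals) >= min_mean_q


def mean_quality_by_position(all_quals: list[list[int]]) -> list[float]:
    """Average quality at each read position across many reads."""
    if not all_quals:
        return []
    length = max(len(q) for q in all_quals)
    means = []
    for pos in range(length):
        vals = [q[pos] for q in all_quals if pos < len(q)]
        means.append(sum(vals) / len(vals))
    return means
```

End trimming alone would not catch a problem like a bad first third of each read; a per-position summary across the whole run would show it as a dip at the start instead of the usual decline toward the end.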
Genome as a cause? (Score:2)
Well, how about pollution, processed food, and all that trash being the main reason we get cancer?
Cancer was not even a known disease a century ago, because nobody had it. (And if people get cancer now, well before the average age of death a century ago, then it can't be because we now live longer.)
But I guess there is no money in that. Right?
wow. read a book. (Score:3, Insightful)
First, many kinds of cancer were known a century ago. Tumors and growths were not unheard of. Most childhood cancers killed quickly and were not diagnosed as any specific disease other than "wasting away". When the average lifespan was 30-40 years, a great many other cancers did not appear because people didn't live long enough to die from them.
As we cure "other" diseases, cancers become more likely causes of death. Cells fail to divide perfectly; some may go cancerous, others simply don't produce as
I want to copyright my dna. Then, it can't be.... (Score:2)
...used against me for anything without violating the DMCA. Decoding it for some forensic lab's paternity test or a future insurance company's medical-cost profile would become unlawful, and I'm sure the RIAA would help me with the cost of prosecuting the lawsuit.
Where's Nedry ? (Score:2)
Check the vending machines !
Re: (Score:2)
Illumina will sequence your genome for $48,000.
http://scienceblogs.com/geneticfuture/2009/06/illumina_launches_personal_gen.php [scienceblogs.com]
Details.
Re: (Score:3, Interesting)
A whole human genome will fit on a CD.
If you just transmit the diffs from the generic human, you could put it in an e-mail.
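The diff idea can be sketched as storing only the positions where one person's sequence differs from a shared reference. This toy version assumes the sample is already aligned and the same length as the reference (substitutions only; insertions and deletions would need a richer format), and the function names are hypothetical.

```python
def diff_against_reference(reference: str, sample: str) -> list[tuple[int, str]]:
    """List (position, base) wherever the sample differs from the reference.

    Assumes both sequences are aligned and equal in length, so only
    substitutions are recorded.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [(i, b) for i, (a, b) in enumerate(zip(reference, sample)) if a != b]


def apply_diff(reference: str, diff: list[tuple[int, str]]) -> str:
    """Rebuild the sample from the reference plus its recorded substitutions."""
    bases = list(reference)
    for pos, base in diff:
        bases[pos] = base
    return "".join(bases)
```

Because two people's genomes differ at only a small fraction of positions, a list like this is far smaller than the full sequence, while the reference itself only has to be distributed once.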
Re:Passing this data back to the scientist (Score:4, Insightful)
I suppose it's worth noting that the intermediate (raw) data sets can get pretty large. They are actually getting larger as the trend moves toward shorter, less informative "reads", which require more of them to recover the connective information and to recover from errors and duplications. However, that's a trend with a stopping point. While more reads are better, at some point there is almost no added value from more reads, so that is the maximum amount of data you need to collect; it won't ever increase. Meanwhile, hard drive and network speeds will go up by factors of ten.
Thus the storage issues here are well tolerated at present and will soon become trivial.
Re: (Score:2)
This actually suggests that perhaps we should start transmitting into space, or sending on spacecraft, every genome and gene ever sequenced, even the ones hauled out of the ocean whose source organisms we don't know. You send that, plus the molecular composition of DNA and the molecular structure of the ribosome and tRNA.
While there's more to a cell than just that, it's well known that in vitro you can get transcription of the DNA from just that. It won't be too long, I suspect, before you could co
Re: (Score:2)
yes, let's give those aliens something to experiment on, so they can figure out what bugs to send our way to exterminate us :)
Money well spent (Score:5, Insightful)
We pissed away $3 billion dollars and 13 years of time, when we could have waited a few more years and got it done in a week, and much, much cheaper. What a waste of time and money that was....
I know I'm being trolled, but you're an idiot. It's pretty obvious that the ability to sequence the genome in a week could only result from techniques developed and information gathered in the original Human Genome project.
Re: (Score:2)
It doesn't really look like a troll, more a facetious back-handed compliment.
Re:We pissed away $3 billion dollars (Score:5, Insightful)
What's funny is that there actually are people who think like that. Apparently if we just sit around and wait, things will get better. I call this the dark side of the "invisible hand" of the market: because it is invisible, people forget how it comes about. In order to get improvement in technology you need a market for that technology. And, typically, you need some loss-leader to create the market in the first place. Government funding serves this purpose well.
Re: (Score:2)
What's funny is that there actually are people who think like that. Apparently if we just sit around and wait, things will get better. I call this the dark side of the "invisible hand" of the market: because it is invisible, people forget how it comes about. In order to get improvement in technology you need a market for that technology. And, typically, you need some loss-leader to create the market in the first place. Government funding serves this purpose well.
The sad thing is that this seems to be pretty much par for the course. If only we wait just a little while and skip all those annoying intermediate steps, we will soon have fantastically good rockets / fusion reactors / whatever else without having to pay anything...
Re:Moore's law at work? (Score:4, Interesting)
Re: (Score:2)
What does "sequence a genome" actually mean? The name "sequence" suggests that it has something to do with the "order" of something. Your post makes it sound like sequencing is something done before the computer gets ahold of the data. Can you explain for us genetics laypersons what the heck "sequencing" is? Tnx.
Very simply: Your DNA is stored in chromosomes. Each chromosome contains DNA in tight bundles with lots of weird secondary and tertiary structure. Suppose that you took all the chromosomes from o
Re: (Score:2)
If it has taken 13 years until recently to properly sequence the genome of a single human, how has it been possible to do DNA "fingerprinting" e.g. for crime investigations? Is the actual sequencing not required there?
Re: (Score:2, Informative)
Fingerprinting doesn't rely on DNA sequencing, but does rely on the DNA sequence being different between people. Everyone's DNA contains subtle differences (particularly in the non-coding DNA regions). These differences can be exploited by various laboratory techniques to produce small pieces of DNA which will be of different sizes because of these differences. When these fragments of DNA are run down a suitable gel (usually agarose, a substance derived from seaweed) under an electric current the fragments