

Celera Opens Up DNA Database
greenplato writes "Thirty billion base pairs from the sequences of humans, mice, and rats that were available only by subscription to Celera's DNA database are being put into the public domain. Celera will donate this information to a 'federally run database,' presumably GenBank. Francis Collins, head of the National Human Genome Research Institute, notes that 'data just wants to be public.' Stories in BusinessWeek and The New York Times."
Shouldn't that be (Score:5, Insightful)
Re:Shouldn't that be (Score:1, Offtopic)
Here in the USA, no. Yet another reason why it doesn't pay to be a grammar nazi.
Re:Shouldn't that be (Score:2, Funny)
Re:Shouldn't that be (Score:3, Funny)
Okay, it's probably just me but when I read that I had a vision of Brent Spiner rattling the bars of a cage yelling "Picard, get your bald ass down here, Data want to be free!"
Re:Shouldn't that be (Score:2)
(On the flip-side, this is excellent news. Researchers have a long history of putting things in the public domain - they have been the main driving force behind the idea - and it is most excellent that commercial researchers are beginning to realize that this isn't purely by chance.)
Re:Shouldn't that be (Score:2)
In British English, the populace want to be free. In American English, the populace wants to be free. Limeys think that a collection is a set; Yanks think it's a singular.
Re:Shouldn't that be (Score:2)
from the summary (Score:5, Funny)
Data hates when you anthropomorphize it.
Re:from the summary (Score:1)
Re:from the summary (Score:1)
Re:from the summary (Score:2)
Re:from the summary (Score:3, Funny)
Re:from the summary (Score:2)
I thought he wanted to be human!?
I don't think it wants to be free. (Score:3, Insightful)
Re:I don't think it wants to be free. (Score:1)
If that has already happened, then I can see why they are releasing the information.
Okay, I just RTFA and it turns out that the subscriptions just weren't profitable enough to keep going.
Re:I don't think it wants to be free. (Score:4, Interesting)
Considering the millions of dollars that Celera invested in gene sequencing, it should at least have the opportunity to make back that money.
If he were creating something new then perhaps, but it was just a land grab. The DNA was there and they tried to patent as much of it as possible. It reminds me of the Eddie Izzard skit when the Europeans claim America and the Indians say, "but it's here, you know, we're using it, how can it be yours?" And the Europeans say, "but ah, have you got a flag?"
Replace flag with patent. You might as well say that the Spaniards spent a lot of money colonizing Peru so they deserved all the gold. This is DNA! It belongs to no individual or corporation. I want access to my source code for whatever purposes I choose.
Oh No! (Score:5, Funny)
Re:Oh No! (Score:1)
Or would they just use the return value 1?
Re:Oh No! (Score:2)
Gnu's
Not
Urkel.
Re:Oh No! (Score:1)
Where's the data? (Score:2)
What about patents? (Score:3, Insightful)
Re:What about patents? (Score:3, Informative)
In a word, no.
You can't generally patent "found" sequences. You have to create or assemble something novel. The raw sequence of the human genome is not patentable. Inserting novel or transgenic genes into the human genome might be, but that's still science fiction.
Re:What about patents? (Score:5, Insightful)
I wish that were not the case. However, there are many gene patents in existence. The trick is that now you have to show a function for that gene - although bioinformatics is sophisticated (or rather, automated) enough that you can come up with a plausible-sounding function without ever doing benchwork.
What's really being patented is the medical application of these sequences. For instance, Company X discovers that gene Y is overexpressed in cancer Z. They take out a patent on gene Y based on this discovery. That means that no one else can pursue gene Y as a therapeutic target. Moreover, in one case testing for a specific mutation to detect cancer was covered by a patent. This is a very simple piece of labwork being covered, which any competent cancer researcher could have figured out.
The end result is that patents are being awarded for hard work, not for novelty and invention. Throw enough money at a subject, and you'll get data but not necessarily results. Since companies (or academics) can now patent just the data, if someone else gets "lucky" and comes up with an actual result the patent holders can sue the tar out of them if they try to make money off it. (Or even if they don't, as in the case of the breast cancer gene; the company wanted people to pay three times as much for its own testing kit.)
You may soon be able to patent single-nucleotide polymorphisms (SNPs), which may be involved in differential drug responses. Back when I was in college we had a guest lecturer who was a biotech patent attorney, and he said he thought SNPs should definitely be patentable. In any case, there is a world of difference between patenting a cancer drug, and patenting a gene (or a FUCKING POINT MUTATION) that may, in the future, be a drug target.
Since most of the human genome is noncoding, I suspect it will be harder to patent pieces of it. I also suspect that some asshole will try anyway.
Re:What about patents? (Score:2)
So, that's 10 SNPs to patent - except most of them were published papers coming out of academia, so they can't be patented. Now, if you can create a drug that acts to affect the changed
Re:What about patents? (Score:2)
Ahhh, an excellent demonstration of my point. These cumdumpsters have patented primers used to amplify specific regions of DNA that may be of clinical interest. Here's the trick with primers: you look at the sequence, and pick an optimal primer pair to amplify that sequence via PCR. They need to be specific to a single genomic locus, and have a certain melting temperature. People I work with who do cloning hate primer design.
So, it must be a pretty coo
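To make the primer constraints mentioned above concrete, here is a minimal sketch of the two checks: a rough melting temperature and a count of how many places a primer matches a template. It assumes the simple Wallace rule (2 degrees per A/T, 4 degrees per G/C), which is only a rule of thumb for short oligos; real primer-design tools use more detailed thermodynamic models. The function names and the example sequences are hypothetical, not taken from any patent or tool discussed here.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature in deg C via the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("primer must contain only A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def count_sites(template: str, primer: str) -> int:
    """Count exact (possibly overlapping) occurrences of primer in template."""
    t, p = template.upper(), primer.upper()
    count = 0
    start = t.find(p)
    while start != -1:
        count += 1
        start = t.find(p, start + 1)
    return count


# Hypothetical example: a 20-mer with 10 A/T and 10 G/C bases.
primer = "ACGTACGTACGTACGTACGT"
print(wallace_tm(primer))  # rough Tm estimate in deg C

# A primer is only useful if it matches exactly one place in the target.
template = "TTTT" + primer + "GGGG"
print(count_sites(template, primer))
```

A real specificity check would search the whole genome and both strands, and would tolerate near matches; the exact-match count here only illustrates the idea of "specific to a single locus".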
Re:What about patents? (Score:2)
Celera spent the time to categorize and sort the readings of thousands of human individuals into a comprehensive statistical analysis of the genome, and then sold the results.
They're no more evil than the Encyclopedia Britannica.
Again? (Score:5, Insightful)
Re:Again? (Score:1)
However, until people are willing to pay for research to be done for the common good things will not change. Given the severe underfunding of the NSF and other agencies it is clear that the public does not care about the current situation.
So if the public is unwilling to fund research and there is no IP protection to encourage the priva
Re:Again? (Score:4, Insightful)
Re:Again? (Score:2)
I think that on these things, companies should be given limited access - perhaps for a few years, so that they can capitalize on their investment. After about 5 years or so, they'd better make it public domain.
Of course, in that case, companies will wait for a good while before making it public that they indeed do have the data.
Re:Again? (Score:2)
Great idea, isn't it? It's called "patents", and they have thought of it a while ago. The problem is mostly with the current implementation.
Re:Again? (Score:2)
Re:Again? (Score:2)
You can copyright your writings about a fact, or your pictures of that fact, or your rantings about your discoveries of the fact, but not the fact itself.
You could trademark a fact, or even patent a method for discovering a fact.
However, most companies depend on trade secrets and licenses for these situations.
Re:Again? (Score:2)
Re:Again? (Score:2)
Copyrights have the exact same function and purpose as patents - to provide a temporary monopoly to the author/inventor as an incentive. They just cover different domains and function differently in practice.
Go back to Russia, you uneducated potato eater.
If it's all the same to you (even despite my awful patents vs. copyright faux pas), I would rather not - it is a fairly ghastly place.
Re:Again? (Score:2)
Copyrights have the exact same function and purpose as patents - to provide a temporary monopoly to the author/inventor as an incentive. They just cover different domains and function differently in practice.
That is wrong imho.
Copyright protects you from copying
E.g. if I write a pirate story, you still can write pirate stories, you are free to compete.
A patent is issued on a process/method for crafting a certain piece. Again you don't really
Re:Again? (Score:2)
Copyright protects you from copying
Copyright gives you a monopoly on the particular work you wrote, i.e. no one else may publish it. Might seem obvious nowadays, but before the concept of copyright (and more importantly before international copyrights) anyone could get a hold of your manuscript or (more likely) an already printed copy, set it to type and sell it, leaving you with zilch.
Sure patents are somewhat different in
Should... (Score:1)
Are you sure you don't want to add "make love not war" to your rant?
The data generated would not EXIST had not investors (read people) put millions of dollars into the company to hire the researchers, buy the equipment, and develop and analyze the data. Odd that, at some point, they'd hoped to get their money back.
Some people, unlike most here it seems, understand that INFORMATION is not free, that it costs time and money and
Re:Should... (Score:2)
And there WAS a public project to sequence the human genome which did rather well. If you want the data to be public then the public has to pay for it, or have some altruistic individual pay for it. Can't get something for nothing.
The parent really should keep the following in mind: if the data wasn't private then there would be no Celera data (s
Re:Again? (Score:2)
Why should it? They spent tremendous amounts of effort and money discovering and cataloguing that data. Should the Britannica be public domain?
You could always sequence the genome yourself; nobody's stopping you.
Re:Again? (Score:2)
Re:Again? (Score:1)
really, at what point did we convince even ourselves that we could put a pricetag on life. not only did we ruin it for ourselves, but we've completely fucked generations to come not only with the laws that we've allowed to be passed, but also putting the notion into our society that doing wrong is acceptable and even merited if the end result is a 'successful' business. what have we really succeeded in? what's even worse is the consensus of the american public now:
Re:Again? (Score:2)
That's silly. Everybody puts a pricetag on life. It is just wrong to actually admit you do it and publicly debate what the price should be set at.
Take environmental regulations. Your drinking water has some limit on various heavy metals - say Cd is 5ppb or something like that (just making numbers up). Suppose 1 life per decade could be saved by lowering that to 1ppb - should we do it? Maybe that lower threshold wou
One problem... (Score:1, Redundant)
Who holds the patent for "viewing alpha sequences comprised of the letters G, A, T, and C, superimposed on a dual helix-shaped structure...on the internet"?
Re:One problem... (Score:1)
Curious (Score:4, Interesting)
Re:Curious (Score:2)
Re:Curious (Score:2)
Re:Curious (Score:3, Interesting)
> unprotectable
The data itself was never protected in any way: you've always been free to read your own DNA. The database that Celera owned was protected as a trade secret. You could only look at it after signing a contract in which you agreed not to disclose what you saw.
Re:Curious (Score:2)
And under copyright. Anyone else is free to duplicate a private genome database if they're willing to spend millions of dollars on sequencing. However, you couldn't take someone else's proprietary database and redistribute it. I assume the trade secrets were any specific annotations that Celera had made - for instance, you couldn't subscribe and then start blabbing about their annotations, or re-annotating the public database based on their
Re:Curious (Score:2)
Annotations themselves are facts, and can be reproduced simply by mentioning in a paper that "gene X has been found to be overexpressed in cell line Y." The form in which it appears in the database may be copyrighted, but there is no pre-existing barrier to reporting this in an article. Keeping such information a trade secret under an NDA will prevent it being released into the literature. (Because once it's in the literature, it wi
Re:Curious (Score:2)
Please make the distinction between copyrighting the data and copyrighting one instance of the data. I can copyright a photograph of a magnetic field, if I want to. That doesn't stop you from making one, but it does stop you from copying mine.
The only difference here is the tremendous
Free data - or unable to sell it? (Score:5, Interesting)
I work for a biotech company with a database which we've been trying to sell subscriptions to for a few years. The prevailing experience with trying to sell the database is that people are very reluctant to shell out the cash to access the data.
I think this is a symptom of trying to sell data to academic institutions. The problems with selling to academic institutions are twofold. Firstly, the universities don't have the cold hard cash to spend on the databases, so any cost over free is too expensive. Secondly, there is the free/open culture within universities that almost punishes commercial ventures for trying to build a business around adding some kind of value to the data (such as convenience or quality of data).
Because of the lack of sales for this database, we're considering handing the data over to a large government body so that they can maintain it, because the company can't simply afford to maintain the database - it costs a lot of money to hire talented people to do database curation.
So when Celera say that "data wants to be free", I think they mean "We'd sell you this data to try and recoup our investment, but we're resigned to the fact that you're not going to buy it".
Re:Free data - or unable to sell it? (Score:1)
It's a wonder that A) Celera hasn't started suing other parties with similar datasets or B) The **AA hasn't validated this line of reasoning and stopped suing filesharers.
Re:Free data - or unable to sell it? (Score:5, Informative)
I would not have stated it that way. The real reason is that academics hate to leave anything unpublished. If they're constrained by copyright law or some NDA, they can't tell everyone about the fabulous new work they've been doing - or at the very least, it becomes much more difficult.
I worked in bioinformatics at a university for several years, and much of what we did was take existing databases and analyze them, then publish the results online as our own database of annotations. As part of this, we reproduced much of the original database in modified form - and all we had to do was cite the original authors and describe our methods/sources. If the databases we used had not been public, none of these projects would have happened. In some cases, we had to ignore private databases that we had limited access to because we were not allowed to reproduce any of their data.
This is only cultural to the extent that academia thrives on publications. We're not out to punish anyone from trying to make an honest buck (lots of people here collaborate with or consult for companies), but we literally can't afford, professionally, to limit ourselves in accordance with restrictions on databases. So why pay money for something we can't legally use in the manner to which we're accustomed?
Re:Free data - or unable to sell it? (Score:2)
It just seems like a huge waste of money to duplicate the database, and the data is not patented (just copyrighted), so that shouldn't stop its usage in research.
See my sig. Think about it.
---
Large public or private organisations paying recurring, per-seat licensing for software are being economically stupid.
...but they still own the patents to the genes? (Score:2)
Re:...but they still own the patents to the genes? (Score:2)
Finally... (Score:2, Interesting)
Threat of being sued? (Score:1)
Personally, I think the real reason is the companies can't make a profit by simply having the "standard definition" and it's effectively useless to them.
To 99.99999% of the population, these base pair sequences could be random bits, and we wouldn't know a chromosome if it came up and bit us on the ass.
They are holding a single sample of data, when in reality what's needed is the variation patterns based upon this starting point. We could start to see just how di
Some stuff you just can't sell (Score:3)
Celera saw the writing on the wall. Everyone is using the public reference assembly because it's free, and in terms of contents the two are merging toward a complete consensus as they approach total coverage. You can only make money selling this kind of information while vast portions of the genome remain unknown or unavailable, and that's not true anymore.
Plus using a different assembly than other researchers cuts you off. When we import data from dbSNP, for example, we regularly drop references to positions specified in reference to Celera contigs. (Not much of a problem, since they're in the vast minority.) The Celera assembly has not been freely downloadable and redistributable, and we haven't been including a copy of it in our software (we always include a current public assembly build). Now that this has happened, I think the next build of the public assembly is going to be really good.
Be afraid... (Score:1)
important thing to ask is (Score:1)
I know what data really wants (Score:1)
The human genome project (Score:4, Interesting)
Excellent PBS video on race between government and Celera to crack the human genome:
http://www.pbs.org/wgbh/nova/genome/program.html [pbs.org]
Mirrors please..
Here we go... (Score:1, Interesting)
*zing* (Score:1)
Just don't tell Senator Santorum... (Score:2)
Oh wait, there's no corporation for him to whore himself out to. Maybe this will actually see daylight.
In case it gets slashdotted.... (Score:5, Funny)
acgcggcgatgcgtacatagctagcgctgcatagatcgactatgacgat
Re:In case it gets slashdotted.... (Score:2)
Re:In case it gets slashdotted.... (Score:2)
Re:In case it gets slashdotted.... (Score:4, Informative)
Query: 103 catcagctactatgtagctacgatc 127
Sbjct: 84163 catcagctactttgtagctacgatc 84187
The quality of match is rated at E=0.65, which means that you would expect about 0.65 matches this good by chance in a database of this size, or roughly a 48% chance of seeing at least one. (E value will change slightly if you search different databases.)
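For anyone who wants to check the numbers, here is a small sketch that compares the two aligned strings from the BLAST output above and converts the E value into a chance of at least one random hit. It assumes the usual Poisson approximation, P = 1 - exp(-E), and the function names are made up for illustration; this is not how BLAST itself computes scores.

```python
import math


def compare_aligned(query: str, subject: str):
    """Return (identical positions, list of 0-based mismatch offsets) for an ungapped alignment."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    mismatches = [i for i, (q, s) in enumerate(zip(query, subject)) if q != s]
    return len(query) - len(mismatches), mismatches


def p_at_least_one_hit(e_value: float) -> float:
    """Chance of one or more random hits, assuming hits follow a Poisson distribution."""
    return 1.0 - math.exp(-e_value)


# The two aligned strings quoted in the BLAST output above.
query = "catcagctactatgtagctacgatc"
sbjct = "catcagctactttgtagctacgatc"

matches, mismatches = compare_aligned(query, sbjct)
print(f"{matches}/{len(query)} identities, mismatch offsets: {mismatches}")
print(f"P(at least one chance hit) = {p_at_least_one_hit(0.65):.2f}")
```

For small E values the probability and the E value are nearly equal, which is why the two often get used interchangeably; at E=0.65 the difference is noticeable.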
Try searching for the sequence yourself here under Nucleotide-nucleotide BLAST (blastn) [nih.gov]
If you want to see the real thing, you can browse one version of the "real" human genome here [nih.gov]. If you click on the blue chromosome 1, and then "Download/View Sequence/Evidence", then "display", you can see the repeating "telomere" sequence at the beginning of chromosome 1.
Re:In case it gets slashdotted.... (Score:2)
Does anyone remember... (Score:2, Interesting)
Re:Does anyone remember... (Score:1)
However the progenitors had the foresight (rightly s
Re:Does anyone remember... (Score:1, Insightful)
Jim Kent did not sequence anything. Big machines run by lots of people around the world bought with your tax dollars did that
Well this is a bit embarrassing (Score:5, Insightful)
Anyway, Celera seems to epitomize the way large projects like this become free: they sink billions upon billions of dollars into a project which is soon supplanted by a better free (though, of course, government funded) alternative, and after years of unsuccessfully trying to sell it, release it for free for a bit of good PR.
But then again, they've made a huge contribution to the field overall; Craig Venter may be an arrogant prick, but he gets shit done, while Francis Collins mostly waxes poetic about the bright future of genomics.
Well, that seems like enough venting about the sad state of research.
Re:Well this is a bit embarrassing (Score:2)
Re:Well this is a bit embarrassing (Score:2)
More important are the sequencing techniques that were developed in that first decade. They are a far more important contribution to the field than the completion of the one genome (which is really just a lot of very tedious work).
Craig's sequence? (Score:1, Insightful)
In all seriousness however, Celera's sequences essentially suck anyway. The public projects have handily beaten them and their sequencing methods have been deemed inferior (see last October's issue of Nature). They are not adding any scientific value by releasing their versions of these three genomes.
30 Billion Base Pairs (Score:3, Insightful)
-Bio major/Nerd
Read the opening again. (Score:1)
The information was most likely taken from a press release by Celera. Press releases tend to lean toward hyperbole so long as they remain technically truthful. Either there were a heck of a lot of mouse and rat genomes, which alongside the human totaled 30Gbp, or much of the data is redundant.
Re:30 Billion Base Pairs (Score:2)
Bionerd indeed.
The Common Thread - a book. (Score:1)
The book is very readable, and from my own experiences rings of the truth.
It's already free (Score:5, Informative)
Read that as: (Score:2)
OK, heart rate is lowering now...
Re:'Bout Time (Score:4, Insightful)
Let's see, the one company that pioneered genome research with reliable and extremely efficient shotgun sequencing is now an evil corporation because it wanted to use its investments in research for developing novel therapeutics. Which in the end benefits humankind. Please...
Re:'Bout Time (Score:3, Informative)
I worked for a small biotech company that became a part of Celera. They are doing a g
Re:'Bout Time (Score:4, Insightful)
Upper management may or may not be rotten, but you don't really explain what was "evil" about their actions.
Re:'Bout Time (Score:2)
Re:'Bout Time (Score:3, Insightful)
Celera relied on the "free research" of the NIH. They extended that research with their own technique, and then patented the result of the joint data.
Re:'Bout Time (Score:2)
Fixed: Car companies rely on the "free roads" of the federal government. They extend that infrastructure with their own cars, and then profit off the result of the joint use.
How evil of them!
Re:'Bout Time (Score:2)
Re:'Bout Time (Score:2)
Re:'Bout Time (Score:2)
He just possessed it and had a "license" to it.
If anyone should hold the copyright it should be God and his parents.
Re:'Bout Time (Score:2)
Re:'Bout Time (Score:2)
Celera's advantage was/is that the data was of higher quality and their database was curated better and had a higher reliability.
Now the public databases have become good enough that you don't need to use Celera's tools. I still find that the public databases are a bit of a mess but they are good enough to get the job done.
Re:'Bout Time (Score:1)
Because of Celera's choice of approach (whole genome shotgun) they could not even successfully assemble the millions of small stretches of sequence into the chromosomes. They resorted to using the public sequence to assemble their own data, very much like using another person'
Re:'Bout Time (Score:3, Informative)