Google To Offer Free Database Storage for Scientists
An anonymous reader writes "Google has revealed a new project aimed at the scientific community. Called Palimpsest, the site research.google.com will play host to 'terabytes of open-source scientific datasets'. It was originally previewed for scientists last August. 'Building on the company's acquisition of the data visualization technology, Trendalyzer, from the oft-lauded, TED-presenting Gapminder team, Google will also be offering algorithms for the examination and probing of the information. The new site will have YouTube-style annotating and commenting features.'"
mining for ads (Score:5, Funny)
I'd be curious what their algorithms think my data says I want to buy...
Re: (Score:3, Interesting)
Re: (Score:2, Insightful)
Re:mining for ads (Score:5, Informative)
Re: (Score:2)
This is for big mathematical stuff.
Re: (Score:2)
Got to be careful, a passing interest in bacteriological research might land you on the extreme better-safe-than-sorry terror watch list. Where they systematically dismantle your household in search of dangerous substances, for untidy and scruf
And in Redmond.... (Score:1)
Google doing this. And they use Linux "suitcases" for transport.
Hide the chairs.
Re:And in Redmond.... (Score:4, Funny)
Re:And in Redmond.... (Score:4, Informative)
Or 4 x 1TB hard drives ($180 ea) gives you $720, so throw in $10 to boot the os off a usb key.
Cheap linux box? Well, you don't need to supply a monitor, keyboard, mouse, speakers, or even much ram - you do the math.
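For anyone who would rather not do the math by hand, here is a minimal sketch of it. It uses only the figures quoted above (four 1TB drives at $180 each, plus a $10 USB key); the function and constant names are made up for illustration, and the cost of the rest of the box is left out because no figure is given for it.

```python
# Rough cost of the storage described above: drives plus a USB boot key.
# The rest of the box (case, board, PSU) is excluded; no price is given for it.
DRIVE_COUNT = 4
DRIVE_PRICE_USD = 180   # per 1 TB drive, figure from the comment
USB_KEY_USD = 10        # boot device, figure from the comment


def storage_cost(drive_count=DRIVE_COUNT, drive_price=DRIVE_PRICE_USD,
                 usb_key=USB_KEY_USD):
    """Return (total_usd, usd_per_tb), assuming each drive holds 1 TB."""
    total = drive_count * drive_price + usb_key
    return total, total / drive_count


total, per_tb = storage_cost()
print(f"${total} total, ${per_tb:.2f} per TB")
```

Swap in your own drive count and prices to see how the per-terabyte figure moves.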
OMG WTF THIS SUX (Score:5, Funny)
And hopefully the commentary will be just as insightful and poignant!
oblig (Score:5, Funny)
Is this [xkcd.com] the future of scientific discourse?
Re: (Score:1)
Scientific Research (Score:2, Funny)
Are they insane? (Score:5, Funny)
Re:Are they insane? (Score:4, Funny)
Re: (Score:3, Funny)
Re: (Score:2, Flamebait)
Re: (Score:2, Interesting)
Even so, though, unions only have a bad rep in America. Interestingly, America is also the country with the greatest number of stress-related illnesses in the western world (more than twice as many heart attacks from stress as in England), and that is tied to their self-destructive yet amazingly narcissistic "work ethic" which simultaneously creates unbearable stresses on the human frame whilst producing only minimal extra productivity. Tra
Re: (Score:2)
I realize Slashdot attracts anti-social nerds who often have weird
Re: (Score:2)
Re: (Score:3, Informative)
Re: (Score:2)
Re:Are they insane? (Score:5, Funny)
So that these geeks can have normal relationships.
Re: (Score:2, Funny)
So that these geeks can have normal relationships.
Re: (Score:1)
No scientists in the database (Score:1)
Re: (Score:2)
Ahhhh.... (Score:2)
http://it.slashdot.org/article.pl?sid=08/01/18/1813248 [slashdot.org]
Fantastic for Students and New Researchers (Score:5, Informative)
"Designed a model for the dataset on the CD-ROM included with the Modeling Organic Systems textbook"
"Designed a model for the WISK-III heart output dataset published in 2006."
New entrants to a field would have instant access to enormous amounts of data very quickly and easily. Although the big kudos comes when you can do totally original work (new data, new analysis), a researcher who could come up with a new critique of older papers and studies would definitely get themselves noticed.
Overall, this is a really positive step for everyone on the lower rungs of the scientific ladder, and especially positive for those with limited resources.
Re:Fantastic for Students and New Researchers (Score:5, Insightful)
Isn't this information more likely to be capitalized upon by those who already dominate the commercialization of research?
Yes, noobs would have enormous amounts of raw material at their disposal, but wouldn't they find applications derived from this data already covered by patents that were distilled from the data sets through analysis performed by labs full of trained corporate monkeys before they can get their own foot in the door of innovation?
I would love to awaken one day and find that I am just being a jaded fool, but I believe developments like this will help the commercialized overlords more than anyone else as they are the ones with sufficient resources to throw at privatizing the results of scientific research.
Re:Fantastic for Students and New Researchers (Score:4, Insightful)
You cannot patent mere data, or interpretations of data. Patents are for machines, processes, and the like. Of course, the publication of data doesn't preclude people from patenting a chemical process that results in a specific gene, but this is already happening elsewhere.
In fact, I suspect the entire point of this is for Google to take over maintenance of the Genomic Databases and create new such databases. Many times the academic databases are poorly maintained, and certainly not compatible, despite the very similar contents. There are already efforts to make them more compatible, but Google appears to be able to offer some very neat stuff on top of it all. The silliness about shipping RAID arrays mostly seems to be for unis not already hooked up to I2.
Re:Fantastic for Students and New Researchers (Score:5, Insightful)
Re:Fantastic for Students and New Researchers (Score:4, Insightful)
But in that case, would you want to go anywhere close to someone else's data, for the risk of "contaminating" your research and perhaps end up in a protracted brawl over discovery rights?
I mostly agree with everybody else: it's a neat idea but for a lot of people it's not going to fly.
The one area I think it could be good is for datasets that are already open and that are meant to be shared. In vision research, for instance, or in various fields in machine learning there's quite a lot of sort-of-standard test data sets created by various groups that can make it easier to compare models directly. Having all of those collected in one place would certainly make it easier to find and actually use them rather than reinventing the wheel once again.
Re:Fantastic for Students and New Researchers (Score:4, Insightful)
Re: (Score:3, Informative)
My favorite: near-real-time medium-resolution satellite images from NASA: http://rapidfire.sci.gsfc.nasa.gov/ [nasa.gov]
Re: (Score:2)
That's so twentieth century. The scarce resource these days is not data, it is mindshare in the science community. In the 1990s, many of the SOHO [nasa.gov] instruments experimented with opening up their data sets to all comers immediately, and those instrument teams have generated about an order of magnitude more publications than their less-forward-thinking cohorts.
You should be so
Re: (Score:3, Interesting)
Re: (Score:2)
Nothing! On the other hand, it would be a pretty foolish person who tried to do that -- if you made the data you're likely the only one who truly understands it. Other threads in this discussion talk about that problem in the context of elementary particles. For solar observations it is similar -- there are plenty of "gotchas" in every data set, and you'd better be working with the instrument team if you want t
Re: (Score:2)
Hmmm, I seem to have omitted an embarrassing "don't", as in "if you don't want to make a fool of yourself.".
Re: (Score:2, Informative)
This is exactly why this system is likely to fail. No scientist is go
Re: (Score:2)
20th century or not, the fact is that if I don't publish papers with my name as first or last author I don't get tenure.
I'm intrigued. What is significant about last author in your field? For us, contributors are listed in alphabetical order when their contributions are equal and in order of their contributions when they are not. The last author is always the guy who did the least work (maybe just proof-read, but for various political reasons still gets his name on the paper) or the guy[1] with the surname that comes last alphabetically.
Re: (Score:2)
What field are you talking about?
Re: (Score:1)
Re: (Score:2)
Where I come from, that's called plagiarism, and is not only a serious academic offense at every school I've ever heard of, but is also an infringement of copyright law.
Of course, if you can't be bothered trying to protect your copyrights because you're too busy doing other things, then you just have to have enough faith that the segment of population that would be interested in your data isn't particularly
Re:Fantastic for Students and New Researchers (Score:5, Insightful)
1) Trivially, 3TB is nowhere near enough to store my data.
Bit of a non-issue for the overall concept, but if Google wants my data, they really are going to have to up the storage by a few orders of magnitude.
2) As others stated, we work really, really hard to acquire our data; research is about 10% inspiration, 90% perspiration. We are not giving up our data till we have milked it for all it's worth.
This again is solvable: we release our data after we have all the publishable results we can think of and then let others have a crack. Somebody might find something useful, and if not, well, it's great for younger scientists as you say. At the very least, people can reconfirm results at a later date more easily. Main reason I like it.
3) The deal killer, for my field and I suspect others: it is really, really difficult to understand our data and it's really easy to misinterpret it.
New particles have been "discovered" so many times by grad students (and some professors who should know better) in particle physics data that I'm terrified of what somebody with no training outside the system might conclude from the data. At CDF (a Fermilab expt) it took us (800 physicists) about 2-3 years to understand the data from the experiment enough to get proper physics results out of it. Even now, it takes a newcomer about a year to get up to speed, and that's with help from all the experts. But it's very easy to think you understand things after a few weeks when in fact you're missing some incredibly subtle point, and so I'm sure we would be flooded by bogus results due to misinterpretations of the data if we released it.
Anyway this all comes from a particle physics view point but I suspect quite a few other fields will be similar.
Re:Fantastic for Students and New Researchers (Score:5, Informative)
As for the few geniuses who can handle the data better than any of us, yes, it's a noble idea and it sounds nice in practice. However, these geniuses are still going to have to slog through the data, and it's still going to be hard, even for them, to do it by themselves. It's not something some whiz kid will pick up and by the afternoon have a Nobel Prize. However, if they are really interested, they can stop by their local particle physics lab and talk to the people there. It's not as if we don't ever give out our data; lots of students (undergraduates and 6th formers (high schoolers for yanks)) over the years have been given a copy and helped to understand it. If you want it badly enough you'll probably get some sort of access to old data. Sure, some may fall through the cracks, but that's unavoidable.
Also incidentally the most bogus results I'm afraid of are not from the general public but from our theoretical colleagues who are actually the people we are most concerned about hiding the data from
Re: (Score:3, Interesting)
You sound very intelligent and I'm sure you're correct. But I couldn't help but think how much that sounds like the reasons why the Catholic Church conducted mass in Latin for so long, and why they were initially reluctant to have the Bible translated to English.
Re: (Score:2)
Re: (Score:2)
Yeah, and look how that turned out! We end up with complete Christian loonies [wikipedia.org] instead of reasonable Catholics. (sarcasm intended)
Re: (Score:2)
Nonsense, scientific experiments are supposed to be carried out in a reproducible way, meaning that if the guy who wrote a paper won't give you your data you should be able to just go do the experiment yourself. If the GP was arguing scientists shou
Re: (Score:2)
Nonsense, scientific experiments are supposed to be carried out in a reproducible way, meaning that if the guy who wrote a paper won't give you your data you should be able to just go do the experiment yourself.
Except the grandparent was talking about particle physics. For any given experiment, there are likely to be at most two sites in the world where it can be reproduced and you need to book time years in advance to use them and often justify why your experiment is worth performing. If the reason for performing it is 'I don't trust this guy's results' you may well be denied. This means, unless you have a few billion dollars sitting around to build your own particle accelerator, you can't reproduce the exper
Re: (Score:2, Insightful)
Re: (Score:2, Insightful)
Yes, that's exactly the point. I am a physics student and the first thing that was told to us before we began our first lab course was: "Don't throw away any data! Even if you think it's unimportant, equipment failure, ...". New discoveries have been postponed for years because someone simply threw away data which seemed to be unimportant at this time. There's simply no way of telling if some data set is essential or not. If you're thinking this
Re: (Score:1)
Main problem, publication rights (Score:1)
The scientists who collect the data are often other people than those who analyze the data, and fit them to the models. As long as everybody is working on the same project, it is possible to ensure that the people who collect the data will be listed as authors in the papers, even if they are
Re: (Score:1)
http://cdp.ucar.edu/home/home.htm [ucar.edu]
I imagine other public domain data is already available if you just know where to look. Google might help by providing a consistent interface, and more well known portal, but we've put a lot of effort into organizing and making available this data in its present f
just wait for the *clerical error* over at the DOJ (Score:1)
Is it limited? (Score:2)
Re: (Score:2)
Re: (Score:1)
It'll All End In Tears (Score:5, Insightful)
This is a Bad Idea. Too much of the world now depends on Google. And people are running to Google, willing to give their data and identity.
/me shakes walking stick and creeps back into cave.
Re:It'll All End In Tears (Score:4, Funny)
Re: (Score:1)
Do you have any datasets to back up this claim?
I'm old and grumpy now. I don't need no stinking data. Get off my lawn!
Horrible Idea - What are the TOS? (Score:5, Insightful)
Re:Horrible Idea - What are the TOS? (Score:5, Informative)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Doesn't matter how foolish the scientists are, as the contracts will have to be vetted by the various University legal departments. I'm quite confident that the lawyers will be very careful about their legal rights.
Re: (Score:1)
I assume you're not aware that they already do just that when they publish an article in most scientific journals? The publisher owns the copyright to the article, not the authors.
Comment System? (Score:3, Funny)
I'm looking forward to "OMG, ur resrch is teh sux" comments and "CHEEP FUNDING M0RTG4GE" spam from elite universities around the world.
Google Everything (Score:3, Interesting)
Re: (Score:2, Interesting)
Not necessarily
Now having said that, as I look at my credit card's online statement, I see several days of Avis car rental c
Re: (Score:2)
An alternative to (Score:2)
The Storage@Home thing that was mentioned, albeit possibly in the comments, a while back. I'm not sure, at all, whether or not the Folding@Home data is meant to be public domain but, were it so, then it'd be a preferable solution in part to using a p2p style storage alternative.
Of course the three terabyte limit might cause problems there.
Adsense revenue! (Score:1)
Why do they call it (Score:1)
Re: (Score:1)
Are they planning to routinely overwrite your data?
My first reaction as well. Yes, the name does seem oddly inapposite. I guess not that many scientists are also classical/medieval scholars.
Oh Really? (Score:1)
Big deal ... terabytes are tiny these days. (Score:2)
The Solar Dynamics Observatory, due to launch into geosynchronous orbit next summer, is a three petabyte mission.
ManyEyes (Score:1)
Authoritative link? (Score:1)
how about (Score:2)
Rainbow Tables? (Score:1)