
Google To Offer Free Database Storage for Scientists 107

An anonymous reader writes "Google has revealed a new project aimed at the scientific community. Called Palimpsest, the site research.google.com will play host to 'terabytes of open-source scientific datasets'. It was originally previewed for scientists last August. 'Building on the company's acquisition of the data visualization technology, Trendalyzer, from the oft-lauded, TED presenting Gapminder team, Google will also be offering algorithms for the examination and probing of the information. The new site will have YouTube-style annotating and commenting features.'"

  • by spud603 ( 832173 ) on Saturday January 19, 2008 @05:36PM (#22112652)
    So will they be mining the data for contextual ads?
    I'd be curious what their algorithms think my data says I want to buy...
    • Re: (Score:3, Interesting)

      by Seto89 ( 986727 )
      It managed to pick ads accurately even when I viewed GPG-encrypted [wikipedia.org] emails through the web interface - it gave links to proprietary PGP, some Fedora-related sites and a page about encryption - all that from a standard header and encrypted text...
      • Re: (Score:2, Insightful)

        by Anonymous Coward
        This is more than likely "tweaked" by a savvy google employee. Think of it as the way "750 ml in shots" gives you the right answer. It's clever, but "it" didn't manage to do it; it was just some Google engineer's Friday project which made it to release, because google isn't entirely soulless yet.
    • by rtb61 ( 674572 )
      More likely contextual patents based upon data mining scientific research. Also it gives them valued data to sell to three-letter government agencies about what particular scientists are working on and, more importantly, what other people are interested in that research.

      Got to be careful, a passing interest in bacteriological research might land you on the extreme better-safe-than-sorry terror watch list. Where they systematically dismantle your household in search of dangerous substances, for untidy and scruf

  • From TFA, to get massive amounts of data to Google:

    (Google people) are providing a 3TB drive array (Linux RAID5). The array is provided in a "suitcase" and shipped to anyone who wants to send their data to Google.

    Google doing this. And they use Linux "suitcases" for transport.

    Hide the chairs.

  • by User 956 ( 568564 ) on Saturday January 19, 2008 @05:40PM (#22112690) Homepage
    The new site will have YouTube-style annotating and commenting features.

    And hopefully the commentary will be just as insightful and poignant!
  • oblig (Score:5, Funny)

    by qw0ntum ( 831414 ) on Saturday January 19, 2008 @05:44PM (#22112728) Journal
    So we're going to have YouTube-like commenting?

    Is this [xkcd.com] the future of scientific discourse?
  • by Anonymous Coward
    This should come in handy for my research on normal variants of the female mammary glands.
  • by Hognoxious ( 631665 ) on Saturday January 19, 2008 @05:49PM (#22112778) Homepage Journal
    Why would you want to store a scientist in a database?
    • by jd ( 1658 ) <imipakNO@SPAMyahoo.com> on Saturday January 19, 2008 @07:13PM (#22113378) Homepage Journal
      Because you can then replicate the really good ones. I would have thought that obvious.
    • Re: (Score:3, Funny)

      Might be a way to get them to join a union.
      • Re: (Score:2, Flamebait)

        by CastrTroy ( 595695 )
        That'll never happen. Scientists know that unions are for people who hate their job, and don't actually want to do any work. Scientists, at least most that I've met, love their jobs, and love to actually work while at their job.
        • Re: (Score:2, Interesting)

          by jd ( 1658 )
          The whooshing sound you heard was the set logic joke flying overhead.

          Even so, though, unions only have a bad rep in America. Interestingly, America is also the country with the greatest number of stress-related illnesses in the western world (more than twice as many heart attacks from stress as in England), and that is tied to their self-destructive yet amazingly narcissistic "work ethic" which simultaneously creates unbearable stresses on the human frame whilst producing only minimal extra productivity. Tra

          • Uh, wow, as a reply to "scientists don't like Unions" you state "The United States won't last as long as the Roman Empire" and continue on with a lengthy, somewhat nonsensical anti-US rant. It really had very little to do with what the poster stated, or with Google offering free database storage - obviously you're looking for the slightest provocation to rave on against the US, whether or not it has anything to do with the subject at hand.

            I realize Slashdot attracts anti-social nerds who often have weird

          • unions only have a bad rep in America.
            Not really true, some of us in olde England can remember the "Winter of Discontent" and we'd rather not have another one. And in France they're only popular among their own members (largely in the public sector); the rest of the populace know that they'll have to pick up the tab, one way or another.
        • Re: (Score:3, Informative)

          by ryanov ( 193048 )
          I like my job, I'm a sysadmin, I'm on call right now, and I'm a committee chair for my union. Guess you don't know everything.
      • by jd ( 1658 )
        It also means you can subtract differences. Creationist scientists won't like it, as it's possible to have alternative views.
    • by jma05 ( 897351 ) on Saturday January 19, 2008 @08:04PM (#22113670)
      > Why would you want to store a scientist in a database?

      So that these geeks can have normal relationships.
      • Re: (Score:2, Funny)

        > Why would you want to store a scientist in a database?

        So that these geeks can have normal relationships.
        But they probably won't perform as well as before normalization. After all, there will be a performance hit in joining tables.

    • Maybe the scientist studies cryonics...
    • No idea where you got that idea. As they wrote, the database is free for scientists.
  • ... so that explains why the RDBMS dudes were bitching about mapreduce t'other day:

    http://it.slashdot.org/article.pl?sid=08/01/18/1813248 [slashdot.org]
  • by cheesethegreat ( 132893 ) on Saturday January 19, 2008 @05:56PM (#22112840)
    If this actually happens, and researchers are willing to make their data-sets open source, it would be a huge boon for budding researchers. It would allow students to do more than just work with a sample dataset out of a textbook. Graduate students learning how to do advanced modeling would be able to work with real datasets, vastly improving their skillset and employability. Just consider these two lines on a CV, and ask yourself which one jumps out at you.

    "Designed a model for the dataset on the CD-ROM included with the Modeling Organic Systems textbook"

    "Designed a model for the WISK-III heart output dataset published in 2006."

    New entrants to a field would have instant access to enormous amounts of data very quickly and easily. Although the big kudos comes when you can do totally original work (new data, new analysis), a researcher who could come up with a new critique of older papers and studies would definitely get themselves noticed.

    Overall, this is a really positive step for everyone on the lower rungs of the scientific ladder, and especially positive for those with limited resources.
    • by ushering05401 ( 1086795 ) on Saturday January 19, 2008 @06:04PM (#22112904) Journal
      I feel your optimism, and support this idea, but the cynical side of me must speak out.

      Isn't this information more likely to be capitalized upon by those who already dominate the commercialization of research?

      Yes, noobs would have enormous amounts of raw material at their disposal, but wouldn't they find applications derived from this data already covered by patents that were distilled from the data sets through analysis performed by labs full of trained corporate monkeys before they can get their own foot in the door of innovation?

      I would love to awaken one day and find that I am just being a jaded fool, but I believe developments like this will help the commercialized overlords more than anyone else as they are the ones with sufficient resources to throw at privatizing the results of scientific research.
      • by xenocide2 ( 231786 ) on Saturday January 19, 2008 @06:21PM (#22113032) Homepage

        Isn't this information more likely to be capitalized upon by those who already dominate the commercialization of research?
        Can't it be both? It's not like by subscribing you're depriving others. And the data uploaded will be made freely available.

        You cannot patent mere data, or interpretations of data. Patents are for machines, processes, and the like. Of course, the publication of data doesn't preclude people from patenting a chemical process that results in a specific gene, but this is already happening elsewhere.

        In fact, I suspect the entire point of this is for Google to take over maintenance of the Genomic Databases and create new such databases. Many times the academic databases are.. poorly maintained, and certainly not compatible, despite the very similar contents. There's already efforts to make them more compatible, but Google appears to be able to offer some very neat stuff on top of it all. The silliness about shipping RAID arrays mostly seems to be for unis not already hooked up to I2.

    • by cortex ( 168860 ) <neuraleng@gmail.com> on Saturday January 19, 2008 @06:22PM (#22113042)
      As a neural engineering researcher who routinely generates terabyte-size datasets, I have to say that I both like this idea and think it is unlikely to succeed. I would love to have a place to store large datasets and access them from wherever I am. However, since these datasets will be open sourced, I will be extremely unlikely to put any dataset on google until I am certain I have extracted all of the publishable findings from it. I think that most researchers, after putting years of effort and a lot of money into acquiring a dataset, will also think twice about open sourcing their data. If the TOS were to include some means for controlling publications which resulted from analysis of the data, then it might be more likely to succeed.
      • by JanneM ( 7445 ) on Saturday January 19, 2008 @07:19PM (#22113414) Homepage
        If the TOS were to include some means for controlling publications which resulted from analysis of the data, then it might be more likely to succeed.

        But in that case, would you want to go anywhere close to someone else's data, for the risk of "contaminating" your research and perhaps end up in a protracted brawl over discovery rights?

        I mostly agree with everybody else: it's a neat idea but for a lot of people it's not going to fly.

        The one area I think it could be good is for datasets that are already open and that are meant to be shared. In vision research, for instance, or in various fields in machine learning there's quite a lot of sort-of-standard test data sets created by various groups that can make it easier to compare models directly. Having all of those collected in one place would certainly make it easier to find and actually use them rather than reinventing the wheel once again.

      • I will be extremely unlikely to put any dataset on google until I am certain I have extracted all of the publishable findings from it

        That's so twentieth century. The scarce resource these days is not data, it is mindshare in the science community. In the 1990s, many of the SOHO [nasa.gov] instruments experimented with opening up their data sets to all comers immediately, and those instrument teams have generated about an order of magnitude more publications than their less-forward-thinking cohorts.

        You should be so

        • Re: (Score:3, Interesting)

          by cortex ( 168860 )
          20th century or not, the fact is that if I don't publish papers with my name as first or last author I don't get tenure. I'd be happy to have people publish papers using my data as long as I have already gotten a few first author papers out of it. Of course, that would only apply to my data that is several years old. Also, what is to stop someone from publishing using my data and not having me as an author at all? The TOS to access the data are going to be very important.
          • ...what is to stop someone from publishing using my data and not having me as an author at all?

            Nothing! On the other hand, it would be a pretty foolish person who tried to do that -- if you made the data you're likely the only one who truly understands it. Other threads in this discussion talk about that problem in the context of elementary particles. For solar observations it is similar -- there are plenty of "gotchas" in every data set, and you'd better be working with the instrument team if you want t

            • and you'd better be working with the instrument team if you want to make a fool of yourself.


              Hmmm, I seem to have omitted an embarrassing "don't", as in "if you don't want to make a fool of yourself.".

            • Re: (Score:2, Informative)

              by cortex ( 168860 )

              Nothing! On the other hand, it would be a pretty foolish person who tried to do that -- if you made the data you're likely the only one who truly understands it. Other threads in this discussion talk about that problem in the context of elementary particles. For solar observations it is similar -- there are plenty of "gotchas" in every data set, and you'd better be working with the instrument team if you want to make a fool of yourself.

              This is exactly why this system is likely to fail. No scientist is go

          • 20th century or not, the fact is that if I don't publish papers with my name as first or last author I don't get tenure.

            I'm intrigued. What is significant about last author in your field? For us, contributors are listed in alphabetical order when their contributions are equal and in order of their contributions when they are not. The last author is always the guy who did the least work (maybe just proof-read, but for various political reasons still gets his name on the paper) or the guy[1] with the surname that comes last alphabetically.

            • by maubp ( 303462 )
              In Biology and Chemistry at least, the supervisor or project leader is often named last, while the student/researcher who did the bulk of the grunt work is named first. Of course, it wasn't always this way.

              What field are you talking about?
            • by cortex ( 168860 )
              In Neuroscience, Neural Engineering, Biomedical Engineering... The first author is the person who did the most work and wrote the paper. The last author is typically the Principal Investigator (PI), i.e. the lab/project supervisor who wrote the grant that funded the project. While the PI is usually the person who does the least of the day-to-day work, they are often the person who makes the most intellectual contribution in terms of experimental design and problem solving. Other authors are typically lis
          • by mark-t ( 151149 )

            Also, what is to stop someone from publishing using my data and not having me as an author at all?

            Where I come from, that's called plagiarism, and is not only a serious academic offense at every school I've ever heard of, but is also an infringement of copyright law.

            Of course, if you can't be bothered trying to protect your copyrights because you're too busy doing other things, then you just have to have enough faith that the segment of population that would be interested in your data isn't particularly

    • by Gromius ( 677157 ) on Saturday January 19, 2008 @06:59PM (#22113286)
      As a researcher myself (particle physics), I echo others' comments in this thread that a) it's a nice idea but b) it isn't going to happen. There are three main problems: the first two are solvable, the third isn't.

      1) trivially, 3TB is nowhere near enough to store my data

      Bit of a non-issue for the overall concept but if google wants my data, they really are going to have to up the storage by a few orders of magnitude.

      2) as others stated, we work really really hard to acquire our data, research is about 10% inspiration, 90% perspiration. We are not giving up our data till we have milked it for all it's worth.

      This again is solvable, we release our data after we have all the publishable results we can think of and then let others have a crack. Somebody might find something useful and if not, well it's great for younger scientists as you say. At the very least, people can reconfirm results at a later date easier. Main reason I like it.

      3) The deal killer, for my field and I suspect others, it is really really difficult to understand our data and it's really easy to misinterpret it.

      New particles have been "discovered" so many times by grad students (and some professors who should know better) in particle physics data that I'm terrified of what somebody with no training outside the system might conclude from the data. At CDF (a Fermilab expt) it took us (800 physicists) about 2-3 years to understand the data from the experiment enough to get proper physics results out of it. Even now, it takes a newcomer about a year to get up to speed and that's with help from all the experts. But it's very easy to think you understand things after a few weeks when in fact you're missing some incredibly subtle point and so I'm sure we would be flooded by bogus results due to misinterpretations from the data if we release it.

      Anyway this all comes from a particle physics view point but I suspect quite a few other fields will be similar.
      • Re: (Score:3, Interesting)

        by Unoti ( 731964 )

        But it's very easy to think you understand things after a few weeks when in fact you're missing some incredibly subtle point and so I'm sure we would be flooded by bogus results due to misinterpretations from the data if we release it.

        You sound very intelligent and I'm sure you're correct. But I couldn't help but think how much that sounds like the reasons why the Catholic Church conducted mass in Latin for so long, and why they were initially reluctant to have the Bible translated to English.

        • Terabytes of scientific data don't purport to hold the answers to life, the universe, and everything. A limit on CP violation probably doesn't have anything to do with getting into heaven.
        • But I couldn't help but think how much that sounds like the reasons why the Catholic Church conducted mass in Latin for so long, and why they were initially reluctant to have the Bible translated to English.

          Yeah, and look how that turned out! We end up with complete Christian loonies [wikipedia.org] instead of reasonable Catholics. (sarcasm intended)
        • You sound very intelligent and I'm sure you're correct. But I couldn't help but think how much that sounds like the reasons why the Catholic Church conducted mass in Latin for so long, and why they were initially reluctant to have the Bible translated to English.

          Nonsense, scientific experiments are supposed to be carried out in a reproducible way, meaning that if the guy who wrote a paper won't give you your data you should be able to just go do the experiment yourself. If the GP was arguing scientists shou

          • Nonsense, scientific experiments are supposed to be carried out in a reproducible way, meaning that if the guy who wrote a paper won't give you your data you should be able to just go do the experiment yourself.

            Except the grandparent was talking about particle physics. For any given experiment, there are likely to be at most two sites in the world where it can be reproduced and you need to book time years in advance to use them and often justify why your experiment is worth performing. If the reason for performing it is 'I don't trust this guy's results' you may well be denied. This means, unless you have a few billion dollars sitting around to build your own particle accelerator, you can't reproduce the exper

      • Re: (Score:2, Insightful)

        by dogmod ( 702959 )
        Seems to me that each of these deal killers is a red herring. 1. My data set is too large = I have no idea what's essential. 2. I worked too hard to get this data, I'm not going to give it away = I'm a mediocre scientist competing against a lot of other mediocre scientists - this data might be my one chance to win the lottery. Oh, and just for the record, I don't really give a shit about the progress of my field - fuck 'em, fuck 'em all, me, me, me. 3. Newbies will misunderstand my data and pervert it = Wha
        • Re: (Score:2, Insightful)

          by tenco ( 773732 )
          1. My data set is too large = I have no idea what's essential.

          Yes, that's exactly the point. I am a physics student and the first thing that was told to us before we began our first lab course was: "Don't throw away any data! Even if you think it's unimportant, equipment failure, ...". New discoveries have been postponed for years because someone simply threw away data which seemed to be unimportant at this time. There's simply no way of telling if some data set is essential or not. If you're thinking this

    • Keep Dreaming. It's hard enough to get the average researcher to make sure he or she includes accession numbers for mandatory deposition of data related to publication. Getting them to contribute to a big community database is sheer fantasy. Plenty of opportunities for this already exist. Centralizing it won't help matters much. Scientists are just like anyone else. They need to make a buck and they don't give away products (data) for free and they certainly don't go out of their way to make it accessible.
    • I have been in a couple of large scientific projects, and the main problem with making the data public has been to ensure that the researchers who collect the data are getting "author credit" in scientific publications.

      The scientists who collect the data are often other people than those who analyze the data, and fit them to the models. As long as everybody is working on the same project, it is possible to ensure that the people who collect the data will be listed as authors in the papers, even if they are
    • by pjp6259 ( 142654 )
      Some data is already available for students. The organization I work for (National Center for Atmospheric Research), has almost 8 TB of data freely available to anyone that isn't trying to make a profit off of it:
      http://cdp.ucar.edu/home/home.htm [ucar.edu]

      I imagine other public domain data is already available if you just know where to look. Google might help by providing a consistent interface, and more well known portal, but we've put a lot of effort into organizing and making available this data in its present f
  • and watch in ecstasy as one of Google's suitcase drives SLURPS up the FBI's *real* datasets on 9/11, Elvis... oh, and that schematic for a site-to-site transporter beam that I knocked up a while back, which they somehow stole off my google docs.
  • Researchers I know would fill up a yottabyte if they were allowed to. I hope Google has plans for keeping growth of the datasets under control.
    • by jd ( 1658 )
      There are those who would argue that this is an open invitation by Google for scientists to try and DDoS their systems, and those who would argue it might not be a bad thing if they succeed. Personally, I disagree with the last part, but DO think that this could lead to Google developing vastly superior search technology. They can search gigantic data sets, sure, but the percentage of false hits is way too high. When you move into scientific data and multi-dimensional non-simply-connected non-linear search
    • Comment removed based on user account deletion
  • by turgid ( 580780 ) on Saturday January 19, 2008 @06:10PM (#22112940) Journal

    This is a Bad Idea. Too much of the world now depends on Google. And people are running to Google, willing to give their data and identity.

    /me shakes walking stick and creeps back into cave.

  • by teknopurge ( 199509 ) on Saturday January 19, 2008 @06:11PM (#22112946) Homepage
    Does google get ownership of anything that is uploaded? I wonder how foolish scientists will be as to unknowingly forfeit their copyrights, IP, etc.
    • by hostguy2004 ( 818334 ) on Saturday January 19, 2008 @06:56PM (#22113272)
      Google are offering this service to store PUBLIC DOMAIN data. If people don't want to release the data as public domain, then this aint the service for them. See http://en.wikipedia.org/wiki/Public_Domain [wikipedia.org]
      • Dear Scientists; please store your sensitive nuclear data with Google. We promise not to give it to the Chinese. Our company motto: Do No Evil You Can Get Caught Profiting From.
      • by mark-t ( 151149 )
        Where do you see that it says public domain? I see nothing in the article or on google's research page that suggests they must surrender their copyright(s).
    • Does google get ownership of anything that is uploaded? I wonder how foolish scientists will be as to unknowingly forfeit their copyrights, IP, etc.

      Doesn't matter how foolish the scientists are, as the contracts will have to be vetted by the various University legal departments. I'm quite confident that the lawyers will be very careful about their legal rights.

    • by Elendil ( 11919 )
      > I wonder how foolish scientists will be as to unknowingly forfeit their copyrights, IP, etc.

      I assume you're not aware that they already do just that when they publish an article in most scientific journals? The publisher owns the copyright to the article, not the authors.
  • by dysfunct ( 940221 ) * on Saturday January 19, 2008 @07:22PM (#22113420)

    [...] YouTube-style annotating and commenting features.

    I'm looking forward to "OMG, ur resrch is teh sux" comments and "CHEEP FUNDING M0RTG4GE" spam from elite universities around the world.

  • Google Everything (Score:3, Interesting)

    by Dirtside ( 91468 ) on Saturday January 19, 2008 @07:31PM (#22113462) Journal
    The other day my wife said she wants there to be Google Bank. They'd certainly get the online banking thing done right...
    • Re: (Score:2, Interesting)

      by ScrewMaster ( 602015 )
      The other day my wife said she wants there to be Google Bank. They'd certainly get the online banking thing done right...

      Not necessarily ... nobody in their right mind would trust the Google File System to anything remotely mission critical (not even Google: last I heard they use Oracle for all their in-house data processing needs.) Banks actually do pretty well keeping track of financial data.

      Now having said that, as I look at my credit card's online statement, I see several days of Avis car rental c
    • And give Google access to all my financial data? Over my dead body.
  • The Storage@Home thing that was mentioned, albeit possibly in the comments, a while back. I'm not sure, at all, whether or not the Folding@Home data is meant to be public domain but, were it so, then it'd be a preferable solution in part to using a p2p style storage alternative.

    Of course the three terabyte limit might cause problems there.

  • www.eBay.com: Buy new and used Plutonium [ebay.com] for your research, now at eBay!
  • Palimpsest [wikipedia.org]? Are they planning to routinely overwrite your data?
    • Are they planning to routinely overwrite your data?

      My first reaction as well. Yes, the name does seem oddly inapposite. I guess not that many scientists are also classical/medieval scholars.

  • I'd say the most useful part would be to find correlative information from disparate fields. The nice thing about a single repository with a single interface is that you can find ALL the data you may need to investigate an interesting hypothesis. Like my current senior thesis on economic activity and its correlation with water usage. It's attempting to bring two spatial data sets into a single framework. All the information is out there, but it's rare to find any published papers about it, let alone an
  • Why, I made three terabytes in just 15 hours of solar observing last summer.

    The Solar Dynamics Observatory, due to launch into geosynchronous orbit next summer, is a three petabyte mission.
  • This sounds like Google is creating a ManyEyes site for the scientist set. http://services.alphaworks.ibm.com/manyeyes/app [ibm.com] it's a lot of fun, but I don't see the Google version making neat things like word trees of the Grimm Fairy Tales like I did here: http://services.alphaworks.ibm.com/manyeyes/view/SmAgULsOtha65G-s4kxXL2- [ibm.com]
  • I couldn't find anything about it from an authoritative source, like Google. Does anyone have a better link?
  • Do you think social sciences could benefit from this as well (that is, if they can get over their fears of opening their data to others)? And if yes, how?
  • Maybe for the first time we'll have gigabytes of rainbow tables for free...
