Space

Creating The UniServer 81

bmongar writes "DrDobbs has an article about a project for a mirrored universal astronomy database. Jim Gray basically wants a network of observatories around the world to publish their data and mirror other observatories' data, creating a quadruple-redundant system of data, all available online. He wants to create a new type of astronomer, the astronomer that is a data miner." As the article also says, the guy behind this is the guy behind the TerraServer as well.
This discussion has been archived. No new comments can be posted.

  • Now, all we have to do is add a piece of Fairy Cake, a marker that says "You are Here", and ship the whole thing to Frogstar, and we're set....
  • this [slashdot.org] they wouldn't need so much storage...
  • However, it leaves out a *TON*. Like, what technology are they going to use to DO data mining? What database will run this monster? Which OS will it run on?

    Well, as the guy is part of the Microsoft Research Team, I'd guess Whistler Version 7 running Access 2050 :)

  • Well there's an insightful AC.

    It would help me if you pointed out what you thought was naive and why.

    There is a whole gaggle of scientific work that at first seemed totally worthless commercially but eventually had commercial uses within 100 years or even much, much faster.
  • So what will we call the server and client software? NapSTAR?

    Seriously though -- if there really are lot of amateur astronomers out there snapping digital pictures of comets, would there be any benefit to creating an automatically indexed peer to peer server scheme?

  • For those who don't know, he's been involved in writing commercial DBMS systems and publishing research papers for decades; very important in the field

    His book with Andreas Reuter, Transaction Processing : Concepts and Techniques [amazon.com] is terrific.

  • Sorry for all the inconvenience.
  • It makes me wonder. Researchers usually have their own datasets, and they spend gobs of time working on them with their specialized programs. It seems to me that the really valuable stuff out there is in these closed datasets, not in the everyday stuff that's available on the internet.

    As an example, you can get a gazillion CD-ROMs with the Magellan data from Venus. But what good is that raw data? Not much. You'd probably want to get a look at the data on the Venus-geologist's computer instead, because it's been analyzed and selected and generally picked over to produce something meaningful.
  • And finally, we are discovering universes
    that are farther away and therefore younger than any we have previously discovered

    I REALLY hope you meant to say galaxies instead of universes. I am a physics major with a healthy lean towards computational cosmology, and I would really really hope that if we had discovered entire new universes, I would know about it. SDSS, however, has discovered quite a few new galaxies, so I will assume that is what you were referring to

  • If you start out with mounds of raw data you aren't a scientist.

    A scientist starts out with a hypothesis.


    This is not true. The father of modern science, Francis Bacon, believed that science should be done by collecting as much data as possible and seeing what conclusions the data support.

    Hypothesis-driven research is actually in a sense cheating, because in such research the data gathered is biased -- the researcher is not considering all the data which could bear upon the situation, but only those data which the researcher believes could support or refute a preconceived hypothesis. Nevertheless, hypothesis-driven research is the norm in science because, until recently, that was the only efficient way to do science.

    But with new techniques in data mining, we can begin to recapture the promise of Baconian science.
  • I couldn't agree more! There's nothing wrong with dreaming about Utopia. It's just that you'd have to eliminate approx. 99 percent of the human race to actually achieve it. Somebody deny the fact that for every Utopian dreamer there are 33 pieces of shit and 66 ignorant morons running around...
  • Please contact the Universe Master at...

    ...root@omniverse.god?
    ...yhwh@creation.org?
    ...voice@burningbush.com
    ???

  • Sorry, I'm the kind of person who reads a lot of "Discover" articles, so I know the basics but not many details. I did mean galaxies, because I had just read an article on our discovery of galaxies that are older than previously thought, therefore challenging our current ideas about the creation of the universe.
  • I used to love astronomy as a kid. In fact, I got a scholarship and went off to Drake [drake.edu] thinking I would major in it. However, I loved it more than it loved me, I think.

    Anyway, while I was a kid, too poor to buy a telescope, I used to read astronomy books voraciously, and take notes on any stellar data I came across. Originally on ruled paper, I eventually transferred all of it to an AppleWorks db (on the Apple ][) in high school, and then into Excel (3? 4?) when I got to college. I used this to plot some very nice color H-R diagrams. This is the kind of project I really could have gotten excited about.

  • You'd probably want to get a look at the data on the Venus-geologist's computer instead, because it's been analyzed and selected and generally picked over to produce something meaningful.

    That would be true only if you were interested in doing the same type of calculations. If you wanted to do something different, you might want to run a different calculation on something he had thrown out, or aggregated in a way that ruined your calculations.

  • I wonder what research tools are proposed?

    Check out the National Virtual Observatory [caltech.edu] (really should be International VO) . This is not a M$ project; it's a new effort among astronomical data centers to do a lot of what you're asking about.
    -- tdk

  • Isn't that a server in a small cabin in Montana that sends logic bombs to more technologically advanced servers?

    No wait, that's the Un*a*Server...

    Garg
  • Astronomy, nitwit, not Astrology.
  • by tjwhaynes ( 114792 ) on Wednesday November 22, 2000 @06:40AM (#607192)

    Your average astronomer is already a major data miner. From the Hubble Deep Field to the images taken in the back yard with a home-built CCD camera, much of modern observational astronomy is entirely built around being able to mine those images for correspondence, object attributes, and clustering in position, colour, or some other feature. Even a basic catalogue built off a single-wavelength plate will assign position, size, brightness, orientation, semi-major and semi-minor size, positional error, orientation error, brightness error, isophotal brightness, local background level and half-a-dozen other attributes to each object in the catalogue. There may be several thousand objects in a single frame. Making sense of this data set requires time, some ideas about what you are searching for, and some luck.

    All that said, you'd be missing a lot as an astronomer if all you looked at was optical images. Going to other images for the same area of sky, be it infra-red, radio, x-ray and so on, will give you a deeper insight into the likely environment of your object and also into any likely confusions due to multiple structures along the line of sight.

    So having a vast data repository is important, and astronomers have had the tools to go and query multiple surveys at multiple wavelengths for several years. So there is nothing new here either from a data access point of view. The only really new thing in this proposal is to collate all the data together onto four super-mirrors and ensure that these supermirrors remain in sync, so if one system dies, it can be restored from the other mirrors without having to go back to tape backups.

    Cheers,

    Toby Haynes
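
    The "restore from the other mirrors" idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: each mirror is modelled as an in-memory dictionary, the names `Mirror`, `digest` and `repair` are hypothetical, and the majority-vote rule is one possible policy, not something the article specifies.

```python
import hashlib
from collections import Counter
from typing import Dict, List

# Assumption: a mirror is modelled as object id -> raw bytes.
Mirror = Dict[str, bytes]


def digest(blob: bytes) -> str:
    """Content fingerprint used to compare copies across mirrors."""
    return hashlib.sha256(blob).hexdigest()


def repair(mirrors: Dict[str, Mirror]) -> Dict[str, List[str]]:
    """Restore missing or divergent objects from the most common copy.

    Returns, per mirror name, the ids of the objects that were rewritten.
    Ties are broken in favour of the first copy seen, which is a
    simplification a real system would have to think harder about.
    """
    repaired: Dict[str, List[str]] = {name: [] for name in mirrors}
    all_ids = set().union(*(m.keys() for m in mirrors.values()))
    for oid in sorted(all_ids):
        copies = [m[oid] for m in mirrors.values() if oid in m]
        votes = Counter(digest(c) for c in copies)
        winner, _ = votes.most_common(1)[0]
        good = next(c for c in copies if digest(c) == winner)
        for name, mirror in mirrors.items():
            if oid not in mirror or digest(mirror[oid]) != winner:
                mirror[oid] = good          # restore from a healthy mirror
                repaired[name].append(oid)
    return repaired
```

    With four mirrors, a single lost or corrupted copy is outvoted three to one, so nothing has to come back from tape.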

  • It seems that this guy wants to act as a real scientist and exchange discoveries...

    A similar project has already been built for geneticists:

    When you want to find out whether a sequence you discovered in one genome (for example, a frog's) is present in other living creatures (like the well-studied bacterium E. coli),

    you post your sequence to a database (over the web) and it calculates the degree of similarity with sequences already discovered!

    Even better, it can tell you what this gene/sequence is "made for" in that organism...

    A rare case of free community exchange in the great world of research...

    It's quite funny to realize that people who are thought to be the best brains in the world act like egotistical little rats who want to preserve what they've won... This is why I quit "BIG" biology research for the more exciting/funny/free research in computing...

    I hope this guy will have the power and energy to see his project through to the end...

    ptitom

  • .. as you can throw enough storage space at the problem. Just having a giant stockpile of data isn't going to be of much use (except for archival purposes) unless we also have efficient access to the data onsite (we don't want to send terabytes over the network) and have the correct tools to allow different datasets to be compared and correlated. The possibility of doing large-scale data comparisons, or of comparing datasets across a wide range of wavelengths, is surely what is most interesting here and the major point of having an online store (as opposed to a data archive). I wonder what research tools are proposed?
  • He wants to create a new type of astronomer, the astronomer that is a data miner

    Give me a break! Does this guy know anything about the field of astronomy from a professional point of view?

    Most astronomers/astrophysicists don't spend the time looking through the telescopes themselves - the majority use data that someone else has already gathered. I agree that this would greatly increase their ability to find pertinent data; however, it would hardly bring about a new 'type' of astronomer, since the majority are already data miners.

  • Well, considering that Astronomy is mostly a Unix world, I don't foresee M$ controlling the data. Anyway, these outputs in raw form are some of the most mind-numbing series of numbers. There is little entertainment value in this data. This keeps it safe from the suits. Even SETI data is not really fun. It's the irrelevant graphics that make it entertaining.
  • I hope they will cooperate with the GriPhyN [slashdot.org] project to make the world's largest scientific distributed network.

    (This should be listed under my name instead of Anonymous Coward)

  • The thing that ticks me off about this, is that it's already being done. The Digitized Sky Survey [stsci.edu] is a survey of all parts of the sky from a couple of authoritative sources. The Medium Deep Survey [stsci.edu] is Hubble data, gathered in a sort of parasitic mode (roughly analogous to how Seti@Home gets their data - but IANAAstronomer - that's an orders-of-magnitude oversimplification). BOTH are available for access over the Web.

    Apart from having more observatories publish their data (most already do), having a central point to index it (not really here today, but if you want it you can generally find it - if it's not in the sky survey, it's not in the sky), and having M$ run things (please, no!), what does he hope to accomplish?

  • You might call Dr. Gray a researcher. He won the Turing Award in 1998 for his many contributions to the field of computer science.

    http://www.informatik.uni-trier.de/~ley/db/journals/cacm/turing.html [uni-trier.de]

    Sheesh.
  • Sorry, but this isn't news. Go check out the ADASS [adass.org] conference proceedings for the last few years; the most recent meeting was in Boston [harvard.edu] (check out the group photo [harvard.edu]; that's me in the back row under the "12" :-)

    You'll find talk about data pipelines, " the grid ", and more. Of special note is that the technologies behind the actual efforts under way right now to create the NVO et al., are overwhelmingly based on Open Source technology and Unix. The fact that someone in Microsoft tries to jump on the bandwagon with what will presumably turn out to be a closed, proprietary solution, isn't really news.

  • In effect, isn't this saying "we haven't found anything useful in all these terabytes, want a copy?"

    Not really. Most data centers don't do much analysis on the data, they just provide it to astronomers who do. The wider the data can be cast, the more science can be squeezed out of it.
    -- tdk

  • 'Data mining' approaches, sometimes known as 'gather tons of data and sift through it with statistics' are not science. They don't observe fundamental rules of how the Scientific Method works.

    So when Edwin Hubble plotted redshift vs distance of a bunch of galaxies and discovered that the universe is expanding, he wasn't doing science?
    -- tdk
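
    For anyone curious what "plotting redshift vs distance" boils down to, here is a minimal sketch: a least-squares fit of a straight line through the origin, v = H0 * d. The function name is hypothetical, and any numbers you feed it in a quick test are made up for illustration, not Hubble's measurements.

```python
from typing import Sequence


def fit_hubble_constant(distances_mpc: Sequence[float],
                        velocities_km_s: Sequence[float]) -> float:
    """Least-squares slope of v = H0 * d, with the line forced through the origin.

    Minimising sum((v - H0*d)^2) over H0 gives H0 = sum(d*v) / sum(d*d).
    The result is in km/s per Mpc.
    """
    if len(distances_mpc) != len(velocities_km_s) or not distances_mpc:
        raise ValueError("need two non-empty sequences of equal length")
    num = sum(d * v for d, v in zip(distances_mpc, velocities_km_s))
    den = sum(d * d for d in distances_mpc)
    if den == 0:
        raise ValueError("all distances are zero; slope is undefined")
    return num / den
```

    Nobody handed Hubble the hypothesis first; the trend shows up when you line the catalogue up and fit it.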

  • Putting aside everything that could go wrong with a project like this (patenting, infrastructure, etc.), what about using two or more images of the same region of space, taken at about the same time (or within 12 hours of each other, or whatever), to extrapolate finer detail than we could from the separate images? I understand this is how it's done with the arrays of telescopes at some sites, and maybe it could also be used here.
  • Already done, see for example the Multimission Archive at STScI (http://archive.stsci.edu/mast.html).

    Sorry Bill, try again.

    OverLord
  • if this Universe Database thingy starts spittin' out 42's left and right, I'll bloody well marry it. You don't get a more consistent partner than that...
  • by AntiPasto ( 168263 ) on Wednesday November 22, 2000 @06:26AM (#607206) Journal
    if this Universe Database thingy starts spittin' out 42's left and right, I'm headin' for the hills!

    ----

  • It might be just me, and maybe I'm paranoid due to too much Slashdotting, but this seems strange. I mean, all the Astronomers sharing their data with each other, working together on a mutual project. Next thing we know they'll be standing hand in hand singing "we all stand together!". Somehow the Utopian thought behind it makes my logical circuits sputter...
  • It's called 'SETI at Home' isn't it?
  • by karzan ( 132637 ) on Wednesday November 22, 2000 @06:32AM (#607209)
    The scientific community has always been one large, co-operative effort. This is only a technological enhancement of that. Granted, capitalism has contributed its fair share to science. But if science were based mainly in capitalism, we'd be in trouble--for one thing, how do you make money off astronomy?

    Scientists are already and have been for a long time working together, standing hand in hand. Maybe it seems Utopian from a selfish viewpoint but it's very natural to scientists.

  • Well, what I am interested in is that this would make it so that astronomers *don't* have to use a telescope. Remember, we only have one sky. For each image, there would be attributes for the time-date the image was taken, the celestial coordinates it was taken at, the magnification, the geographical coordinates it was taken at, and perhaps even the weather conditions.

    I don't see a reason why images that aren't in the visible spectrum can't be put into this database. Then you would need perhaps a spectrum range attribute.
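
    A minimal sketch of what such a record might look like, assuming the attributes listed above plus the spectrum range. Every field and function name here is hypothetical, and the search is a plain linear scan rather than anything a real archive would use.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ImageRecord:
    """One archived exposure; all field names are illustrative."""
    observed_at: datetime            # time-date the image was taken
    ra_deg: float                    # celestial coordinates: right ascension
    dec_deg: float                   # celestial coordinates: declination
    magnification: float
    site_lat_deg: float              # geographical coordinates of the telescope
    site_lon_deg: float
    band_nm: Tuple[float, float]     # spectrum range covered, in nanometres
    weather: Optional[str] = None    # free-text weather note, if recorded


def separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Angular distance between two sky positions (spherical law of cosines)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    cos_sep = (math.sin(d1) * math.sin(d2)
               + math.cos(d1) * math.cos(d2) * math.cos(r1 - r2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_sep))))


def images_near(records: Iterable[ImageRecord], ra_deg: float,
                dec_deg: float, radius_deg: float) -> List[ImageRecord]:
    """All exposures centred within radius_deg of a position, oldest first."""
    hits = [r for r in records
            if separation_deg(r.ra_deg, r.dec_deg, ra_deg, dec_deg) <= radius_deg]
    return sorted(hits, key=lambda r: r.observed_at)
```

    Sorting the hits by time is what would make the "video of the same part of the sky" idea a simple query.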

    The exciting thing in my opinion is what can be done with all this data. Imagine creating a starmap of the entire sky based on real observation; it may be zoomable at some points. Every time a telescope takes a picture of the sky, it gets put into this database. That could yield a huge amount of data in a relatively short time. I can very much see astronomers using this data instead of their own observations. Imagine a "video" of the same part of the sky in twenty years.

    This can be done from software if all the data is there. I know I would love this kind of thing to be publicly archivable. If I see something in the sky, I can then look on the internet to see if there were any other images of it.

    Sorry if my post is less than coherent, but this seems exciting to me.
  • Um, I don't think this was a white paper. I think it was an idea...perhaps a proposal to astronomers.

    I don't think it matters what OS they use or what database system they use, etc. etc. until they start implementing it.

    I think the astronomers would very much appreciate this use of technology. It is one of the purest uses of technology I have known.

    But I am interested in details as well though. So for those of you who specialize in this sort of stuff, how would you go about implementing this sort of system? Would GNU/Linux be able to handle it?
  • SETI@home handles the calculation of the data, not the collection of it...and all those teams most certainly aren't going to store and mirror all that data.
  • by smack_attack ( 171144 ) on Wednesday November 22, 2000 @06:33AM (#607213) Homepage
    404 - Universe Not Found

    Please contact the Universe Master at...
  • I don't read articles in Dr Dobbs anymore. It's a waste of time.
  • by iamsure ( 66666 ) on Wednesday November 22, 2000 @06:44AM (#607215) Homepage
    The article definitely gets the ol' geek hairs on the back of your neck standing up. Petabyte backups, tape recovery that takes 5 days..

    Lots of stuff that makes geek men howl.

    However, it leaves out a *TON*. Like, what technology are they going to use to DO data mining? What database will run this monster? Which OS will it run on?

    Further, what license/restrictions are there on the data once it gets published? Is it totally public knowledge, free of copyright?

    Fundamental questions of large scope and size, not easily ignored.

    However, the question *I* have is, why not do the data storage at online companies KNOWN for hosting data, instead of at observatories, which have little experience at that?
  • Although I'm not aware of any database of pure raw data, NASA at least have the Distributed Astronomy Library, described here [dlib.org], which is a repository of astronomical *information*. An example is here [harvard.edu]
  • by Ektanoor ( 9949 ) on Wednesday November 22, 2000 @06:47AM (#607217) Journal
    *FUD start* Such a thing reminds me of some M$ ideas on concentrating everything from all around the world in one bucket. Somehow this resulted in the .NET idea. So now we are up to the Universe... *FUD end*

    Well, anyway, the idea is not so bad at all. But I don't see how to realise it without making some radical changes in the system. First we have to deal with communication channels. For volumes like astronomical databases they are highly unreliable. We are not going to run petabytes over them, but surely there will be gigabytes going back and forth. Note: a raw Mars image from PDS sometimes weighs up to 20 megabytes. Processing such images sometimes leads to data volumes 10-30 times bigger. In some cases it is possible to apply JPEG to compress these images, but sometimes it is highly undesirable to do so. So we get something weighing 100-200 megs. On a 100Mb network, that will take a few minutes to pass from station to station. Now imagine a widespread, worldwide network working that way.
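
    A back-of-the-envelope sketch of that transfer time, assuming decimal megabytes, a link rated in megabits per second, and an `efficiency` factor for how much of the nominal rate you actually get. The function name and the efficiency values are illustrative assumptions, not measurements. At full line rate a 200-megabyte file needs about 16 seconds; it only stretches to minutes on a shared or congested link.

```python
def transfer_seconds(size_megabytes: float,
                     link_megabits_per_s: float,
                     efficiency: float = 1.0) -> float:
    """Time to move a file over a link, ignoring latency and protocol details.

    efficiency is the fraction of the nominal link rate actually achieved
    (1.0 = ideal, 0.1 = a badly shared or congested path).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    if link_megabits_per_s <= 0:
        raise ValueError("link rate must be positive")
    megabits = size_megabytes * 8          # 1 byte = 8 bits
    return megabits / (link_megabits_per_s * efficiency)


if __name__ == "__main__":
    # Sizes taken from the comment above; efficiencies are assumed.
    for size_mb in (20, 100, 200):
        for eff in (1.0, 0.5, 0.1):
            secs = transfer_seconds(size_mb, 100, eff)
            print(f"{size_mb:>4} MB at {eff:.0%} of 100 Mb/s: {secs:7.1f} s")
```

    Multiply by the number of stations that each want their own copy and the worry about worldwide channels becomes clearer.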

    On one side we have archives spread all over the world. On the other side, this gives rise to a community of astronomers also working all over. It will be a big challenge to achieve such a thing. And a big financial adventure. Maybe dumb burrocritters will think that data will be cheaper if it keeps rotting on a magnetic tape.
  • I guess this will prevent the "Giant Asteroid to Destroy Earth" story from being slashdotted.

    In effect, isn't this saying "we haven't found anything useful in all these terabytes, want a copy?"

    If the same approach was used with /., would it mean copying all the flamewars and troll posts? How much of a waste is that?

    I'll keep reading at -1, looking for meaning, and let you know what I find.

  • It is happening in other sciences. For example, my field "bioinformatics" deals with analyzing molecular biological data, much of which is in public databases such as GenBank. Once experimental molecular biologists could be expected to analyze all their data themselves because there just wasn't very much of it. That just isn't true anymore.
  • There is already too much information for one or two astronomers to keep by themselves.

    This should have been implemented a long time ago, because the amount of information we are pulling in right now is tremendous and it will only increase with the release of the more and more satellites we send up [nasa.gov]. We need this database for three very important reasons

    • Possible Collision w/Asteroid
    • Mapping New Planets
    • New Universe Discovery

    We are all concerned, due to recent movies, that we might get hit by an asteroid, which is a valid concern, so we need to carefully track the asteroids that we find, because we are currently searching only 10 percent of the sky. Secondly, with newer and more powerful telescopes we are mapping more and more planets outside our solar system every day; soon they will roll in by the dozens a day. And finally, we are discovering universes that are farther away and therefore younger than any we have previously discovered

    jbischof

  • Somehow the Utopian thought behind it makes my logical circuits sputter...

    I'm sure you are jesting, but anyhoo...

    There are people that are only motivated by money that can't seem to understand that not everyone is motivated by same. If everyone were motivated solely for financial windfall, would Linux exist at all?

    Outside of the "hacker" community, I believe that the academic and scientific type communities have contributed the most effort to Linux software in the first ten years (is it 10 years old yet? Maybe eight years), so it's not that much of a stretch. Scientific papers are about trying to share information in the hope of furthering knowledge.

    People wanting to get master's and doctorates were able to contribute some effort on their thesis papers.
  • I'm glad to see the collection of huge, free data sets in astronomy. I very much want to do this kind of thing in my own field with Indo-European linguistic data. The little bits I've got so far are at: http://www.ling.upenn.edu/~kurisuto/germanic/language_resources.html [upenn.edu]

    The biggest problem is, of course, data entry. A lot of the texts pose a challenge for OCR for a number of reasons, including the large number of special characters often used.

    Another problem is people who insist on copyrighting and refusing to freely share their collections of online documents in the older languages, which is a real shame, because it prevents me from creating all kinds of interesting derived works (e.g. web pages of Old English texts where you can click any word to get information about it). It basically means that all this work has to be repeated by anyone who wants to make those texts freely available-- never mind that we're talking about works over 1000 years old!

  • by Anonymous Coward
    You, my friend, are a fucking tool.

    If you would like every single scientist to make his own equipment, and perform every experiment from the ground up (going through all the previous experiments to validate the groundwork theory), then you are missing the point of the scientific method altogether.

    From the scientific community's standpoint, wouldn't it make more sense for everyone to agree on a certain apparatus to collect the observational data, and then let everyone analyze the data on their own terms? We only have a handful of particle accelerators, yet we have made serious progress in our scientific understanding by sharing their collective output data.

    How do you come up with a hypothesis about 'something' if you don't even have a clue what defines the 'something' in the first place?

    After the data is 'statistically' sifted through, we can then make up hypotheses as to how it appears to be the way it is, and then consolidate the theories. You can't make an experiment in astronomy! The fundamental basis of astronomy has always been a gathering of tons of data and sifting through it.

    Don't waste the energy required to type if you don't got a clue about what the hell it is you're trying to discount.

    -An Anonymous Coward Against the Unfounded Bashing of Astronomical Methods

  • Space escalator damnit!
  • DODS is for oceanographic data, but could easily be adapted to astronomy data.

    In a nutshell, you put a CGI script on your server that maps your database to a standard format (Adapter pattern), and a web or desktop client can perform queries against everyone who is running DODS on their server.

    http://www.unidata.ucar.edu/packages/dods/
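
    To make the Adapter idea concrete, here is a small sketch. It does not use DODS itself or any of its real interfaces; the two "sites", their storage layouts, and all class and function names are hypothetical, and everything runs in memory instead of behind a CGI script.

```python
from typing import Dict, Iterable, List, Protocol, Sequence, Tuple

Record = Dict[str, object]


class Adapter(Protocol):
    """Anything that can present its local data in the shared format."""
    def records(self) -> Iterable[Record]: ...


class SiteAAdapter:
    """Hypothetical site A keeps (name, ra_deg, dec_deg) tuples."""
    def __init__(self, rows: Sequence[Tuple[str, float, float]]) -> None:
        self._rows = rows

    def records(self) -> Iterable[Record]:
        for name, ra, dec in self._rows:
            yield {"name": name, "ra_deg": ra, "dec_deg": dec}


class SiteBAdapter:
    """Hypothetical site B keeps dicts with right ascension in hours."""
    def __init__(self, rows: Sequence[Dict[str, object]]) -> None:
        self._rows = rows

    def records(self) -> Iterable[Record]:
        for row in self._rows:
            yield {
                "name": row["id"],
                "ra_deg": float(row["ra_hours"]) * 15.0,  # 24 h = 360 deg
                "dec_deg": row["dec"],
            }


def query_all(adapters: Iterable[Adapter],
              dec_min: float, dec_max: float) -> List[Record]:
    """One client query run against every participating site."""
    hits: List[Record] = []
    for adapter in adapters:
        for rec in adapter.records():
            if dec_min <= float(rec["dec_deg"]) <= dec_max:
                hits.append(rec)
    return hits
```

    The client only ever sees the shared record format, so a new observatory joins by writing one adapter, not by converting its archive.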

    --Doug
  • I don't know if it was angular momentum that Kepler figured out from his data, but he did study Tycho's data.

    Tycho Brahe may not have been much of an astrophysicist, scientist, or whatnot, but he was a hell of an observer, ESPECIALLY when you consider the crappy tools he had -- an eyeball, a sextant, and an optical telescope.

    Scientists today still study his data, because there is so much of it, for such a long time, with such a high degree of accuracy. It's useful for all kinds of things; dating stars (or human events, like pyramid building :-) by using stellar precession, etc.

    --
  • The NBC sports guy? That Jim Gray? The guy who never smiles? Figures--he was done after the Pete Rose thing, I guess.
    --
  • I'm no scientist, but I don't think they should use a lossy file format for this kind of thing.

    Scientist: Hmm...what's this shady pixel on Mars here? Could it be...could it be life!

    Geek: Nahh...that's just the JPEG algorithm making up pixels it lost during compression.

  • Check out the Canadian Astronomy Data Centre [hia.nrc.ca]. It has archives of the HST [stsci.edu], CFHT [hawaii.edu], JCMT [hawaii.edu], DSS, CGPS [drao.nrc.ca], ESO [eso.org], LaPalma, AAT [aao.gov.au], ATNF, USNO [navy.mil] Guide stars, UKIRT [hawaii.edu], ... Once the Gemini [hia.nrc.ca] telescopes are operational, I assume that the CADC will also archive them.

    All these archives are searchable from the web site, and (if you've registered with them) available for download. Images from HST and CADC are restricted to only the primary researcher(s) for a period of time (I think it's a year).

  • by q000921 ( 235076 ) on Wednesday November 22, 2000 @12:42PM (#607230)
    The rational design for that kind of database is to put the image data into the file system and use the relational database for indexing and lookups. Most of the open source databases are perfectly up to the task of providing indexing for that kind of data. In fact, the amount of metadata in such applications is small compared to the kinds of data encountered in many commercial applications, so this is actually not even a particularly interesting benchmark for high-end database systems.

    Putting multimedia data into the file system is the implementation strategy many commercial databases (including some versions of DB2) take behind the scenes for storing multimedia objects, even if they hide it behind a database API. They can still provide all the database facilities (transactions, indexing, access control, etc.) on top of such an implementation.

    With that kind of architecture, you don't need a very powerful machine or high performance database to be able to serve image data at disk bandwidth or network bandwidth.
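
    A minimal sketch of that split, using SQLite purely as a stand-in for "a relational database": the pixels go to ordinary files and the database holds only the metadata and the path. The table layout, column names and function names are assumptions for illustration, not anything from the proposal.

```python
import sqlite3
from pathlib import Path
from typing import List


def build_index(conn: sqlite3.Connection) -> None:
    """Create the metadata table; image bytes never enter the database."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS images (
               id      INTEGER PRIMARY KEY,
               ra_deg  REAL NOT NULL,
               dec_deg REAL NOT NULL,
               band    TEXT NOT NULL,
               path    TEXT NOT NULL
           )"""
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_pos ON images (ra_deg, dec_deg)")


def store_image(conn: sqlite3.Connection, root: Path, name: str,
                data: bytes, ra_deg: float, dec_deg: float, band: str) -> Path:
    """Write the image to the file system and record where it went."""
    path = root / name
    path.write_bytes(data)
    conn.execute(
        "INSERT INTO images (ra_deg, dec_deg, band, path) VALUES (?, ?, ?, ?)",
        (ra_deg, dec_deg, band, str(path)),
    )
    conn.commit()
    return path


def find_paths(conn: sqlite3.Connection, ra_min: float, ra_max: float,
               dec_min: float, dec_max: float) -> List[Path]:
    """Look up a sky box in the index; the caller then reads the files directly."""
    rows = conn.execute(
        "SELECT path FROM images "
        "WHERE ra_deg BETWEEN ? AND ? AND dec_deg BETWEEN ? AND ? "
        "ORDER BY id",
        (ra_min, ra_max, dec_min, dec_max),
    ).fetchall()
    return [Path(row[0]) for row in rows]
```

    The database only answers "which files?", so the heavy bytes can then be streamed straight off disk by whatever serves them.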

  • Turing Award? How big was his dataset? I guess I should have been more clear.

    I meant to ask if he is a researcher that works with datasets larger than he can pull out of his ass?

    If he was, he'd figure out that a huge amount of data means something to just a couple people.

    He'd also figure out that researchers build their little kingdoms, and they are NOT going to want to contribute their data to the project.

    In short, the creator of this Astronomy database sounds like he doesn't understand the politics of the situation.
  • Wasn't it Kepler who looked over Brahe's work to work out his law of conservation of angular momentum?
  • If the data has similarities in all those channels, maybe specialized lossless compression for astronomic images can be developed. Compression results always get better once you have a modeler designed specifically for your class of data.
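
    A toy illustration of the "modeler" point, assuming 8-bit pixel rows and using neighbour-to-neighbour differences as the model, with zlib as the generic back end. This is a stand-in technique chosen for brevity, not a format anyone in the thread proposed; the function names are hypothetical.

```python
import zlib


def delta_encode(row: bytes) -> bytes:
    """Replace each pixel with its difference from the previous one (mod 256)."""
    out = bytearray()
    prev = 0
    for value in row:
        out.append((value - prev) % 256)
        prev = value
    return bytes(out)


def delta_decode(encoded: bytes) -> bytes:
    """Undo delta_encode by accumulating the differences (mod 256)."""
    out = bytearray()
    prev = 0
    for diff in encoded:
        prev = (prev + diff) % 256
        out.append(prev)
    return bytes(out)


def compress_row(row: bytes) -> bytes:
    """Model first (deltas), then hand the residuals to a generic compressor."""
    return zlib.compress(delta_encode(row), 9)


def decompress_row(blob: bytes) -> bytes:
    """Exact inverse of compress_row, so no pixel is ever invented or lost."""
    return delta_decode(zlib.decompress(blob))
```

    On smoothly varying data the residuals are small and repetitive, which a generic compressor tends to handle much better than the raw values; on noisy frames the gain may shrink or vanish, but the round trip stays exact either way.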
  • the academic and scientific type communities have contributed the most effort to Linux software in the first ten years (is it 10 years old yet? Maybe eight years)

    If you count from when emacs started being worked on in the mid 70s, the Linux software canon is about 25 years old.

    But the 1.0 kernel was released in mid 1994, so six years counting from then.
  • It definitely does not seem Utopian from a selfish viewpoint, since I would be jubilant at this kind of development. It is Utopian when I use my experiences with the human being in general as a reference...
  • he was a hell of an observer, ESPECIALLY when you consider the crappy tools he had -- an eyeball, a sextant, and an optical telescope.

    Brahe didn't even have that - the telescope wasn't invented until after his death. Making his observations even more amazing!

  • by zpengo ( 99887 ) on Wednesday November 22, 2000 @06:35AM (#607237) Homepage
    There's an article out in Slashdot that pans the Space Station, but then gets into some actually interesting matter, like the increasing ability to actually do data mining. Data mining has long been a staple of hard science fiction, but the benefits of being able to /really/ do it are immense - less pollution, really clean data. There's just that nasty get-the-material to the factory issue. But that's why we need a space elevator, right?
  • He'd also figure out that researchers build their little kingdoms, and they are NOT going to want to contribute their data to the project.

    Not true actually; the data - as in the raw observations - usually belongs to the observatory where it was collected. The researcher has first use of it, obviously, but eventually it all becomes public access. Space Telescope are a prime example of this, but most big observatories do something similar these days.

  • After all, astronomy is really just the gathering of lots and lots and lots of images and the analysis of those images. I'm surprised this didn't happen earlier actually. This allows for distributed analysis of all the images gathered at all the observatories, theoretically--imagine the power behind that. If other scientific fields could follow suit our progress would be accelerated greatly.
  • Skyview [nasa.gov] and NED [caltech.edu] are also very useful resources along similar lines.

    But apart from Microsoft's involvement (and the idea has been batted around the astro community for years, it's nothing new), assembling the whole lot in one place *is* a big step up from all those disparate collections which currently exist. It's like the move from BBS's to the Internet - the "barriers to entry" are so much lower that it becomes easy to use (and so will be used) rather than tedious (and so is used only by the cognoscenti).

  • Indeed..

    Just last night I was watching 'Nova'. The main attention of the show was given to a group of astronomers from Australia who were observing supernovae in distant galaxies to determine the speed the universe is expanding.

    Anyhoo, they made mention that another group (from Berkeley I believe) was working on the same project. BUT, it was not a partnership, but rather a competition. The Aussies were worried because they had limited telescope time, and the Berkeley people were going to be in there right after them.

    That doesn't sound quite as friendly as the professional attitude you referred to.




    Wes

  • by q000921 ( 235076 ) on Wednesday November 22, 2000 @08:37AM (#607242)
    I think this is mostly done for Gray and Microsoft to get publicity for their database. This is the continuation of TerraServer and other projects like that. Microsoft is trying to demonstrate "scalability" of their database and servers and to get it into the hardcore scientific server area.

    As usual, Microsoft is late to the party and comes with their own agenda. Microsoft products are oriented towards small business and desktop applications. That's what their evolution is driven by and that's what they are designed for. Whether this kind of data should be in a relational database is questionable to begin with. And it certainly doesn't need to be on an expensive, proprietary operating system and in a proprietary format.

    Scientists already have excellent open-source tools to build long-term, stable, large-scale data collections. They would be foolish to tie research projects that can span decades to the fortunes of a company in the middle of a battle for the US business computing market, merely to gain some trinkets and give that company a publicity boost.

  • There are a few reasons for hosting data at astronomical sites instead of on commercial servers.
    1. The data are most often accessed at the site where they are stored. There is a growing tendency to provide the data-mining tools at the archive site. It is vastly easier to provide, test and maintain these tools at a scientific site. Check out the "grid" concept in meta-computing.
    2. Many nations that do astronomy in a big way have dedicated networks for moving academic and scientific data (consider the .ac.uk domain in the UK). Storing the data somewhere off the academic network might reduce the delivery rates to scientific sites.
    3. Commercial storage costs extra money. The storage provider has to make some profit otherwise they'd go bust, whereas a scientific site can run happily at cost. Once you scale up to petabyte databases, commercial providers no longer have any scale advantage to make them cheaper.
  • What's wrong with dreaming of Utopia? Isn't fact stranger than fiction, and haven't we exceeded even utopian dreams in many instances?
  • so that in case some of your data is corrupt
    you can get it from a mirror rather than spend
    five and a half years restoring from tape.
  • <groucho>thats the worst spell of universe i've seen in a long time </groucho>
  • I attended the Virtual Observatories of the Future [caltech.edu] conference this past summer and would like to note that:

    • Jim Gray has been collaborating [adass.org] with the astronomical data community for some time.
    • The spatial-indexing schemes Jim helped develop for TerraServer will be key to performant queries for a Virtual Observatory.
    • Jim Gray was well-known in the database community as the guru of performance metrics [fatbrain.com], long before joining Microsoft.

    The take-home lesson from the Virtual Observatories conference was that the amount of data required to do science with a "virtual observatory" leads to interesting problems in computer science, problems which are only tractable when analyzed by collaborations between statisticians, computer science people, and the astronomers themselves.

    Finally, note that this year's historic increase in the National Science Foundation budget is largely due to the new Information Technology Research Initiative [nsf.gov]. The need for new methods of data management in the sciences is real.

  • This is nothing new. Astronomers already do this. In fact I just finished an assignment in my astro class which was 100% web data mining, and we have grad students here doing the same for their thesis.

    I also find it rather stupid to make a "network of observatories". Perhaps microsoft.COM forgets, the world wide web was invented by physicists and astronomers for that purpose. The WWW _is_ this database he wants to "create". Someone's been learning from Al Gore I think.

    And what's with MS trying to pretend they have a PARC by calling it BARC? *shaking head in shame*
  • I'm not the AC poster, but the part that seemed naive to me was "Scientists are already and have been for a long time working together, standing hand in hand. Maybe it seems Utopian from a selfish viewpoint but it's very natural to scientists."

    Scientists are sometimes co-operative, sometimes bitterly competitive. Sometimes they share their data, sometimes they guard it jealously. Sometimes they go to great lengths to sneak a look at each other's data.

    For an example, see The Double Helix by James Watson, where Watson and Crick win a Nobel prize, partly by gaining access to Rosalind Franklin's X-ray pictures of DNA.

  • You're painfully naive if you think the opportunists who've taken over academia in the last generation and a half are anything but self-serving parasites.

    Finally something I can work with, at least more informative than just accusing someone of being naive without doing the slightest thing to fix the problem. I had forgotten about this.
  • Right. Public access. Sure. I used to work for an agrochemical database compiler, and they used to have to sue researchers regularly to get the data. The researchers would promise to get the data to us, but it would get lost, or it would be late, or whatever.

    In business, and in science, the person who has the first crack at the data has the best crack at making a buck/making a discovery.

    Those cranky old bastards at universities weren't opposed to getting us data. They were just opposed to us getting the data in time to make money from putting it into a database.

    So, yet another situation where theory is different than practice.

    I still think that the person putting together this database is going to hit political resistance just as soon as his economy threatens the ability of astronomers to bring in grant dollars for their universities.
  • That's interesting, but irrelevant. The observatories don't let the astronomers take the data away and then beg them to give it back - they keep their own copies for archival purposes. So there is no problem here - the observatory just unilaterally releases the data to the public whenever it wants to. Also, astronomy (at its best) has much more of a pure science, data-sharing mentality than agrochemicals, which I presume is sullied by real-world concerns ;)

    I'm not saying this like "I think this is so", I'm telling you how it is. I used to be an astronomer and my old thesis supervisor occasionally tries to interest me in working on one of these virtual observatory projects.

  • Salsadot is the News for Mexicans site.
  • Great! Once this is up and running, I can use their info to build that scale model of the entire cosmos in my backyard. I was thinking of using N-gauge to match my train-set.
