Creating The UniServer 81
bmongar writes "Dr. Dobb's has an article about a project for a mirrored universal astronomy database. Jim Gray basically wants a network of observatories around the world to publish their data and mirror other observatories' data, creating a quadruple-redundant system with all the data available online. He wants to create a new type of astronomer: the astronomer as data miner." As the article also says, the guy behind this is also the guy behind the TerraServer.
You are Here... (Score:1)
Now, if they just used (Score:1)
Re:No real depth.. (Score:1)
Well, as the guy is part of the Microsoft Research Team, I'd guess Whistler Version 7 running Access 2050 :)
Re:Welcome to Utopia (Score:2)
It would help me if you pointed out what you thought was naive and why.
There is a whole gaggle of scientific work that at first seemed totally worthless commercially but eventually had commercial uses within 100 years or even much, much faster.
Nap-STAR? (Score:1)
Seriously though, if there really are a lot of amateur astronomers out there snapping digital pictures of comets, would there be any benefit to creating an automatically indexed peer-to-peer server scheme?
Jim Gray is a DBMS wizard (Score:1)
His book with Andreas Reuter, Transaction Processing: Concepts and Techniques [amazon.com], is terrific.
Re:WTF? Why? (Score:1)
Is the creator a researcher? (Score:2)
As an example, you can get a gazillion CD-ROMs with the Magellan data from Venus. But what good is that raw data? Not much. You'd probably want to look at the data on the Venus geologist's computer instead, because it's been analyzed, selected and generally picked over to produce something meaningful.
Re:Why doesnt this exist already (Score:2)
that are farther away and therefore younger than any we have previously discovered
I REALLY hope you meant to say galaxies instead of universes. I am a physics major with a healthy lean towards computational cosmology, and I would really, really hope that if we had discovered entire new universes, I would know about it. SDSS, however, has discovered quite a few new galaxies, so I will assume that is what you were referring to.
Re:This seems like the natural evolution of astron (Score:2)
A scientist starts out with a hypothesis.
This is not true. The father of modern science, Francis Bacon, believed that science should be done by collecting as much data as possible and seeing what conclusions the data support.
Hypothesis-driven research is actually in a sense cheating, because in such research the data gathered is biased: the researcher is not considering all the data which could bear upon the situation, but only those data which the researcher believes could support or refute a preconceived hypothesis. Nevertheless, hypothesis-driven research is the norm in science because until recently it was the only efficient way to do science.
But with new techniques in data mining, we can begin to recapture the promise of Baconian science.
Re:Welcome to Utopia (Score:1)
Re:Error... (Score:1)
...root@omniverse.god?
...yhwh@creation.org?
...voice@burningbush.com
???
Yes Galaxies (Score:1)
sounds like what I used to do (Score:1)
I used to love astronomy as a kid. In fact, I got a scholarship and went off to Drake [drake.edu] thinking I would major in it. However, I loved it more than it loved me, I think.
Anyway, while I was a kid, too poor to buy a telescope, I used to read astronomy books voraciously, and take notes on any stellar data I came across. Originally on ruled paper, I eventually transferred all of it to an AppleWorks db (on the Apple ][) in high school, and then into Excel (3? 4?) when I got to college. I used this to plot some very nice color H-R diagrams. This is the kind of project I really could have gotten excited about.
Re:Is the creator a researcher? (Score:2)
That would be true only if you were interested in doing the same type of calculations. If you wanted to do something different, you might need to calculate from data he had thrown out, or data he had aggregated in a way that ruined your calculations.
Re:Storing the data isn't a problem... (Score:1)
Check out the National Virtual Observatory [caltech.edu] (it really should be the International VO). This is not a M$ project; it's a new effort among astronomical data centers to do a lot of what you're asking about.
-- tdk
UniServer? (Score:1)
No wait, that's the UnaServer...
Garg
Re:WTF? Why? (Score:1)
Observational Astronomers are already data miners (Score:4)
Your average astronomer is already a major data miner. From the Hubble Deep Field to images taken in the back yard with a home-built CCD camera, much of modern observational astronomy is built entirely around being able to mine those images for correspondence, object attributes, and clustering in position, colour, or some other feature. Even a basic catalogue built from a single-wavelength plate will assign position, size, brightness, orientation, semi-major and semi-minor axis lengths, positional error, orientation error, brightness error, isophotal brightness, local background level and half a dozen other attributes to each object in the catalogue. There may be several thousand objects in a single frame. Making sense of this data set requires time, some idea of what you are searching for, and some luck.
All that said, you'd be missing a lot as an astronomer if all you looked at was optical images. Going to other images for the same area of sky, be it infra-red, radio, x-ray and so on, will give you a deeper insight into the likely environment of your object and also into any likely confusions due to multiple structures along the line of sight.
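Matching the same object across images at different wavelengths usually starts with a positional cross-match. A minimal sketch, assuming a hypothetical `CatalogueObject` record that holds only a few of the attributes listed above and a 2-arcsecond match radius (a placeholder; real radii depend on each survey's positional errors). It uses the haversine form of the angular separation and a brute-force search, where production tools would use a spatial index:

```python
import math
from dataclasses import dataclass


@dataclass
class CatalogueObject:
    # Hypothetical record holding a few of the per-object attributes
    # listed above; real catalogues carry many more columns.
    obj_id: str
    ra_deg: float   # right ascension, degrees
    dec_deg: float  # declination, degrees
    mag: float      # brightness in this survey's band


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula), in arcseconds."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h)))) * 3600.0


def cross_match(optical, infrared, radius_arcsec=2.0):
    """Map each optical object to its nearest infrared counterpart within
    radius_arcsec, or to None. Brute force: O(n * m) comparisons."""
    matches = {}
    for o in optical:
        best, best_sep = None, radius_arcsec
        for ir in infrared:
            sep = angular_separation_arcsec(o.ra_deg, o.dec_deg, ir.ra_deg, ir.dec_deg)
            if sep <= best_sep:
                best, best_sep = ir, sep
        matches[o.obj_id] = best.obj_id if best is not None else None
    return matches


# Hypothetical catalogue entries, for illustration only.
optical = [CatalogueObject("opt-1", 150.0, 2.0, 18.2),
           CatalogueObject("opt-2", 150.1, 2.0, 19.0)]
infrared = [CatalogueObject("ir-1", 150.0003, 2.0001, 16.5),
            CatalogueObject("ir-9", 151.0, 3.0, 15.0)]
print(cross_match(optical, infrared))
```

Returning None for unmatched objects keeps "no counterpart at this wavelength" visible, which is itself useful information when looking for line-of-sight confusion.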
So having a vast data repository is important, and astronomers have had the tools to query multiple surveys at multiple wavelengths for several years. So there is nothing new here from a data access point of view either. The only really new thing in this proposal is to collate all the data onto four super-mirrors and ensure that they remain in sync, so that if one system dies, it can be restored from the other mirrors without going back to tape backups.
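One way to check that mirrors are in sync, and to work out what a failed mirror needs back, is to compare content checksums. A sketch under assumptions the article doesn't state: every mirror should hold the same set of files, the copy held by the majority of mirrors is taken as correct, and the mirror and file names are hypothetical:

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def plan_repair(mirrors):
    """mirrors maps mirror name -> {filename: file bytes}.
    For each file, treat the digest held by most mirrors as the reference
    copy and list, per mirror, the (filename, source mirror) pairs it must
    fetch because its copy is missing or differs. Ties go to whichever
    digest Counter saw first; a real system would need a better policy."""
    all_files = set().union(*(files.keys() for files in mirrors.values()))
    plan = {name: [] for name in mirrors}
    for fname in sorted(all_files):
        digests = {name: digest(files[fname])
                   for name, files in mirrors.items() if fname in files}
        reference, _ = Counter(digests.values()).most_common(1)[0]
        source = min(name for name, d in digests.items() if d == reference)
        for name in mirrors:
            if digests.get(name) != reference:
                plan[name].append((fname, source))
    return plan


# Four hypothetical mirrors; mirror-d lost plate2.fits in a crash.
mirrors = {
    "mirror-a": {"plate1.fits": b"AAA", "plate2.fits": b"BBB"},
    "mirror-b": {"plate1.fits": b"AAA", "plate2.fits": b"BBB"},
    "mirror-c": {"plate1.fits": b"AAA", "plate2.fits": b"BBB"},
    "mirror-d": {"plate1.fits": b"AAA"},
}
print(plan_repair(mirrors))
```

Comparing digests rather than whole files means mirrors only need to exchange short manifests to find out what is out of sync.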
Cheers,
Toby Haynes
a real scientist ? (Score:1)
It seems this guy wants to act like a real scientist and share discoveries...
A similar project already exists for geneticists:
when you want to find out whether a sequence you discovered in one organism's genome (for example, a frog's) is present in other living creatures (like the well-studied bacterium E. coli),
you submit your sequence to a database (over the web), and it calculates the degree of similarity with sequences already discovered!
It can even tell you what this gene/sequence is "made for" in that organism...
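The service described above scores a new sequence against ones already stored. The post doesn't name the tool (NCBI's BLAST is a well-known example, and it relies on fast heuristics rather than exact alignment), so this is only an illustration of the underlying idea: the exact Smith-Waterman local-alignment score, with hypothetical scoring parameters and a toy database:

```python
def smith_waterman(a: str, b: str, match=2, mismatch=-1, gap=-1) -> int:
    """Best local-alignment score between sequences a and b
    (Smith-Waterman with a linear gap penalty, keeping one DP row)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below 0: that is what makes it *local*.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_by_similarity(query, database):
    """Score the query against every stored sequence, best match first."""
    scores = [(name, smith_waterman(query, seq)) for name, seq in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy database of previously submitted sequences (hypothetical).
database = {"seq1": "TTACGTACGTTT", "seq2": "GGGGGGGG"}
print(rank_by_similarity("ACGTACGT", database))
```

Exact alignment costs time proportional to the product of the two sequence lengths, which is why services searching whole genome databases use faster approximate methods.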
A rare case of free community exchange in the great world of research...
It's quite funny to realize that people who are thought to be the best brains in the world act like little selfish rats that want to keep what they have won... This is why I quit "BIG" biology research for the more exciting/fun/free research in computing...
I hope this guy has the power and energy to see his project through to the end...
ptitom
Storing the data isn't a problem... (Score:2)
Data Miner? (Score:2)
Give me a break! Does this guy know anything about the field of astronomy from a professional point of view?
Most astronomers/astrophysicists don't spend their time looking through telescopes themselves; the majority use data that someone else has already gathered. I agree that this would greatly increase their ability to find pertinent data; however, it would hardly bring about a new 'type' of astronomer, since the majority are already data miners.
Re:Great idea but . . . (Score:1)
GriPhyN (Score:1)
(This should be listed under my name instead of Anonymous Coward)
Already being done... (Score:2)
Apart from having more observatories publish their data (most already do), having a central point to index it (not really here today, but if you want it you can generally find it - if it's not in the sky survey, it's not in the sky), and having M$ run things (please, no!), what does he hope to accomplish?
Re:Is the creator a researcher? (Score:1)
http://www.informatik.uni-trier.de/~ley/db/journals/cacm/turing
Sheesh.
National/Worldwide Virtual Observatory, ADASS (Score:1)
You'll find talk about data pipelines, "the grid", and more. Of special note is that the technologies behind the efforts actually under way right now to create the NVO et al. are overwhelmingly based on Open Source technology and Unix. The fact that someone at Microsoft tries to jump on the bandwagon with what will presumably turn out to be a closed, proprietary solution isn't really news.
Re:I once xeroxed a blank sheet of paper... (Score:1)
Not really. Most data centers don't do much analysis on the data, they just provide it to astronomers who do. The wider the data can be cast, the more science can be squeezed out of it.
-- tdk
Re:'Data Mining' isn't science. (Score:1)
So when Edwin Hubble plotted redshift vs distance of a bunch of galaxies and discovered that the universe is expanding, he wasn't doing science?
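Hubble's result amounts to fitting the straight line v = H0 d through the origin. A minimal sketch of that least-squares fit, using the low-redshift approximation v ≈ cz; the distances and velocities below are synthetic, chosen only to exercise the code, not Hubble's measurements:

```python
def recession_velocity(z):
    """Low-redshift approximation v ~ c * z, in km/s (only valid for z << 1)."""
    c_km_s = 299_792.458
    return c_km_s * z


def fit_hubble_constant(distances_mpc, velocities_km_s):
    """Least-squares slope of v = H0 * d constrained through the origin:
    H0 = sum(v * d) / sum(d * d), in km/s per Mpc."""
    num = sum(v * d for d, v in zip(distances_mpc, velocities_km_s))
    den = sum(d * d for d in distances_mpc)
    return num / den


# Synthetic data, for illustration only.
distances = [10.0, 20.0, 40.0]           # Mpc
velocities = [700.0, 1400.0, 2800.0]     # km/s
print(fit_hubble_constant(distances, velocities))
```

Forcing the line through the origin encodes the physical assumption that a galaxy at zero distance has zero recession velocity; a free intercept would be the more cautious choice for noisy data.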
-- tdk
Extra Functionality of the Database (Score:1)
already done (Score:1)
Sorry Bill, try again.
OverLord
Re:Yeah that's fine but... (Score:1)
Yeah that's fine but... (Score:4)
----
Welcome to Utopia (Score:1)
Astronomer as data miner (Score:2)
Re:Welcome to Utopia (Score:3)
Scientists are already and have been for a long time working together, standing hand in hand. Maybe it seems Utopian from a selfish viewpoint but it's very natural to scientists.
Re:Observational Astronomers are already data mine (Score:2)
I don't see a reason why images that aren't in the visible spectrum can't be put into this database. Then you would perhaps need a spectral-range attribute.
The exciting thing, in my opinion, is what can be done with all this data. Imagine creating a star map of the entire sky based on real observations, perhaps zoomable in places. Every time a telescope takes a picture of the sky, it gets put into this database. That could yield a huge amount of data in a relatively short time. I can very much see astronomers using this data instead of their own observations. Imagine a "video" of the same part of the sky over twenty years.
This can be done in software if all the data is there. I know I would love this kind of thing to be publicly archived. If I see something in the sky, I can then look on the internet to see whether there were any other images of it.
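"Were there any other images of this spot?" is essentially a positional (cone) query, and the spectral-range attribute becomes a filter on it; sorting the hits by date gives the crude "video". A minimal sketch over an in-memory list, assuming a hypothetical `SkyImage` record (pointing centre, field radius, wavelength range, exposure time); a real archive would use a spatial index instead of a linear scan:

```python
import math
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SkyImage:
    # Hypothetical archive record, not any real archive's schema.
    image_id: str
    ra_deg: float        # field centre, right ascension (degrees)
    dec_deg: float       # field centre, declination (degrees)
    radius_deg: float    # field radius (degrees)
    band_min_nm: float   # shortest wavelength covered (nanometres)
    band_max_nm: float   # longest wavelength covered (nanometres)
    taken: datetime


def separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula), in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h))))


def images_covering(archive, ra, dec, wavelength_nm=None):
    """Images whose field contains (ra, dec), optionally only those covering
    a given wavelength, oldest first."""
    hits = [
        img for img in archive
        if separation_deg(ra, dec, img.ra_deg, img.dec_deg) <= img.radius_deg
        and (wavelength_nm is None
             or img.band_min_nm <= wavelength_nm <= img.band_max_nm)
    ]
    return sorted(hits, key=lambda img: img.taken)


# Hypothetical archive contents.
archive = [
    SkyImage("backyard-ccd", 83.8, -5.4, 0.5, 400, 700, datetime(1999, 12, 1)),
    SkyImage("ir-survey", 83.9, -5.3, 1.0, 1000, 2500, datetime(1998, 3, 15)),
    SkyImage("elsewhere", 10.0, 41.0, 1.0, 400, 700, datetime(2000, 1, 1)),
]
print([img.image_id for img in images_covering(archive, 83.82, -5.39)])
```

Passing `wavelength_nm=550` to the same query would keep only the visible-light frame.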
Sorry if my post is less than coherent, but this seems exciting to me.
Re:No real depth.. (Score:2)
I don't think it matters what OS they use or what database system they use, etc. etc. until they start implementing it.
I think the astronomers would very much appreciate this use of technology. It is one of the purest uses of technology I have known.
But I am interested in the details as well. So for those of you who specialize in this sort of stuff, how would you go about implementing this sort of system? Would GNU/Linux be able to handle it?
Re:Astronomer as data miner (Score:1)
Error... (Score:3)
Please contact the Universe Master at...
Dr Dobbs sucks (Score:1)
No real depth.. (Score:4)
Lots of stuff that makes geek men howl.
However, it leaves out a *TON*. Like, what technology are they going to use to DO data mining? What database will run this monster? Which OS will it run on?
Further, what license/restrictions are there on the data once it gets published? Is it totally public knowledge, free of copyright?
Fundamental questions of large scope and size, not easily ignored.
However, the question *I* have is: why not store the data with online companies KNOWN for hosting data, instead of at observatories, which have little experience with that?
Similar project (Score:2)
Good but difficult (Score:3)
Well, anyway, the idea is not bad at all. But I don't see how to realise it without making some radical changes to the system. First we have to deal with communication channels: for volumes like astronomical databases they are highly unreliable. We are not going to push petabytes over them, but surely there will be gigabytes going back and forth. Note that a raw Mars image from PDS sometimes weighs up to 20 megabytes. Processing such images sometimes produces data volumes 10-30 times bigger. In some cases it is possible to apply JPEG to compress these images, but sometimes that is highly undesirable. So we get something weighing 100-200 megabytes. On a 100 Mbit/s network, that will take a few minutes to pass from station to station. Now imagine a widespread, worldwide network working that way.
On one side we have archives spread all over the world. On the other side there is a community of astronomers also working all over. Achieving such a thing will be a big challenge, and a big financial adventure. Maybe dumb burrocritters will think that data will be cheaper if it keeps rotting on magnetic tape.
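The arithmetic behind that estimate, as a sketch: at 8 bits per byte, 200 MB over a 100 Mbit/s link takes 16 s at the full nominal rate, so "a few minutes" corresponds to an effective throughput of roughly a tenth of nominal or less, which is plausible on shared long-distance links. The `efficiency` factor below is an assumption, not a figure from the post:

```python
def transfer_seconds(size_megabytes: float, link_megabits_per_s: float,
                     efficiency: float = 1.0) -> float:
    """Time to move a file over a link. 1 byte = 8 bits; `efficiency` is
    the assumed fraction of the nominal rate actually achieved."""
    return size_megabytes * 8 / (link_megabits_per_s * efficiency)


# A raw image (20 MB) and the processed sizes quoted above (100-200 MB).
for size_mb in (20, 100, 200):
    full = transfer_seconds(size_mb, 100)
    shared = transfer_seconds(size_mb, 100, efficiency=0.1)
    print(f"{size_mb} MB: {full:.0f} s at full rate, {shared:.0f} s at 10% of it")
```

Multiply by thousands of images per night and the argument for keeping full copies at a few well-connected mirrors, instead of shipping data on every query, becomes clear.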
I once xeroxed a blank sheet of paper... (Score:1)
In effect, isn't this saying "we haven't found anything useful in all these terabytes, want a copy?"
If the same approach was used with /., would it mean copying all the flamewars and troll posts? How much of a waste is that?
I'll keep reading at -1, looking for meaning, and let you know what I find.
Re:This seems like the natural evolution of astron (Score:2)
Why doesnt this exist already (Score:2)
This should have been implemented a long time ago, because the amount of information we are pulling in right now is tremendous, and it will only increase as we send up more and more satellites [nasa.gov]. We need this database for three very important reasons.
First, we are all concerned, thanks to recent movies, that we might get hit by an asteroid. That is a valid concern, so we need to carefully track the asteroids we find, because we are currently searching only 10 percent of the sky. Second, with newer and more powerful telescopes we are mapping more and more planets outside our solar system every day; soon they will roll in by the dozens a day. And finally, we are discovering universes that are farther away and therefore younger than any we have previously discovered.
jbischof
Re:Welcome to Utopia (Score:2)
I'm sure you are jesting, but anyhoo...
There are people that are only motivated by money that can't seem to understand that not everyone is motivated by same. If everyone were motivated solely for financial windfall, would Linux exist at all?
Outside of the "hacker" community, I believe the academic and scientific communities have contributed the most effort to Linux software in its first ten years (is it 10 years old yet? Maybe eight), so it's not that much of a stretch. Scientific papers are about sharing information in the hope of furthering knowledge.
People wanting to get master's and doctorates were able to contribute some effort on their thesis papers.
Free collections of historical linguistic data (Score:2)
The biggest problem is, of course, data entry. A lot of the texts pose a challenge for OCR for a number of reasons, including the large number of special characters often used.
Another problem is people who insist on copyrighting and refusing to freely share their collections of online documents in the older languages, which is a real shame, because it prevents me from creating all kinds of interesting derived works (e.g. web pages of Old English texts where you can click any word to get information about it). It basically means that all this work has to be repeated by anyone who wants to make those texts freely available-- never mind that we're talking about works over 1000 years old!
Re:'Data Mining' isn't science. (Score:1)
If you would like every single scientist to build his own equipment and perform every experiment from the ground up (repeating all the previous experiments to validate the groundwork theory), then you are missing the point of the scientific method altogether.
From the scientific community's standpoint, wouldn't it make more sense for everyone to agree on a certain apparatus to collect the observational data, and then let everyone analyze the data on their own terms? We only have a handful of particle accelerators, yet we have made serious progress in our scientific understanding by sharing their collective output data.
How do you come up with a hypothesis about 'something' if you don't even have a clue what defines the 'something' in the first place?
After the data has been sifted through statistically, we can form hypotheses about why it appears the way it does, and then consolidate the theories. You can't run experiments in astronomy! The fundamental basis of astronomy has always been gathering tons of data and sifting through it.
Don't waste the energy required to type if you don't got a clue about what the hell it is you're trying to discount.
-An Anonymous Coward Against the Unfounded Bashing of Astronomical Methods
Re:Data Mining vs. Asteroid Mining (Score:1)
They need to check out DODS (Score:1)
In a nutshell, you put a CGI script on your server that maps your database to a standard format (the Adapter pattern), and a web or desktop client can then perform queries against every server that is running DODS.
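This is not the DODS API. It is a generic, in-process illustration of the Adapter pattern the comment mentions: each site wraps its existing storage so that clients see one common interface. All class names, column names and values are hypothetical; in DODS the adapter role is played by a server-side script answering over the network:

```python
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Hypothetical common interface every participating server exposes."""

    @abstractmethod
    def query(self, variable):
        """Return all stored values of a shared, agreed-upon variable."""


class LegacyObservatoryDB:
    """Stand-in for one site's existing, idiosyncratic storage."""

    def __init__(self):
        self._rows = [{"MAG_V": 12.1}, {"MAG_V": 13.4}]

    def fetch_column(self, column_name):
        return [row[column_name] for row in self._rows]


class ObservatoryAdapter(DataSource):
    """Adapter: translates the common interface onto the legacy API."""

    # Maps shared variable names onto this site's own column names.
    NAME_MAP = {"v_magnitude": "MAG_V"}

    def __init__(self, legacy):
        self._legacy = legacy

    def query(self, variable):
        return self._legacy.fetch_column(self.NAME_MAP[variable])


def query_everyone(sources, variable):
    """The client only knows DataSource, so every site is queried alike."""
    return {name: src.query(variable) for name, src in sources.items()}


sites = {"site-1": ObservatoryAdapter(LegacyObservatoryDB())}
print(query_everyone(sites, "v_magnitude"))
```

The design point is that each site writes one small adapter once, instead of every client learning every site's schema.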
http://www.unidata.ucar.edu/packages/dods/
--Doug
We still do this today (Score:2)
Tycho Brahe may not have been much of an astrophysicist, scientist, or whatnot, but he was a hell of an observer, ESPECIALLY when you consider the crappy tools he had -- an eyeball, a sextant, and an optical telescope.
Scientists today still study his data, because there is so much of it, for such a long time, with such a high degree of accuracy. It's useful for all kinds of things; dating stars (or human events, like pyramid building
--
Jim Gray? (Score:1)
--
Don't use JPEG (Score:2)
Scientist: Hmm...what's this shady pixel on mars here? Could it be...could it be life!
Geek: Nahh...that's just the JPEG algorithm making up pixels it lost during compression.
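The joke's point is that lossy compression changes pixel values while lossless compression does not. A toy illustration, with coarse quantization standing in for the lossy stage (it is not the JPEG algorithm, which quantizes DCT coefficients of pixel blocks) and the standard-library zlib codec as the lossless case; the pixel row is hypothetical:

```python
import zlib


def quantize(pixels, step):
    """Crude lossy step (a stand-in, not JPEG itself): snap every value
    to the nearest multiple of `step`, discarding finer detail."""
    return [step * round(p / step) for p in pixels]


# Hypothetical 16-bit pixel row with one faint, slightly brighter pixel.
row = [1000, 1002, 1001, 1009, 1000, 1001]

# Lossless: pack to bytes, compress, decompress, unpack.
raw = b"".join(p.to_bytes(2, "big") for p in row)
restored = zlib.decompress(zlib.compress(raw))
lossless_row = [int.from_bytes(restored[i:i + 2], "big")
                for i in range(0, len(restored), 2)]

lossy_row = quantize(row, 16)
print(lossless_row == row)  # the lossless round trip is exact
print(lossy_row)            # the faint pixel now matches its neighbours
```

Quantization both hides the faint excess and invents step-sized differences between pixels that were nearly equal, which is exactly the kind of artifact a scientist could mistake for signal.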
Existing DataCentre: CADC (Score:1)
All these archives are searchable from the web site, and (if you've registered with them) available for download. Images from HST and CADC are restricted to only the primary researcher(s) for a period of time (I think it's a year).
lots of multimedia data in open source databases (Score:3)
Putting multimedia data into the file system is the implementation strategy many commercial databases (including some versions of DB2) take behind the scenes for storing multimedia objects, even if they hide it behind a database API. They can still provide all the database facilities (transactions, indexing, access control, etc.) on top of such an implementation.
With that kind of architecture, you don't need a very powerful machine or high performance database to be able to serve image data at disk bandwidth or network bandwidth.
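A minimal sketch of that architecture using Python's standard sqlite3 module (chosen only for illustration; the comment refers to DB2 and open-source databases generally, not SQLite): image bytes live in ordinary files, and the database holds an indexed table of metadata. The table layout and file name are hypothetical:

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def store_image(conn, root: Path, name: str, data: bytes) -> Path:
    """Write the bulky image bytes to the file system, then record the
    metadata (path, size, checksum) in a database transaction. The file
    write itself sits outside the transaction; a production system would
    have to clean up orphaned files if the insert fails."""
    path = root / name
    path.write_bytes(data)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO images (name, path, size, sha256) VALUES (?, ?, ?, ?)",
            (name, str(path), len(data), hashlib.sha256(data).hexdigest()),
        )
    return path


def load_image(conn, name: str) -> bytes:
    """Look the path up via the indexed primary key, then read the file."""
    row = conn.execute("SELECT path FROM images WHERE name = ?", (name,)).fetchone()
    if row is None:
        raise KeyError(name)
    return Path(row[0]).read_bytes()


root = Path(tempfile.mkdtemp())
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (name TEXT PRIMARY KEY, path TEXT, "
             "size INTEGER, sha256 TEXT)")
store_image(conn, root, "frame-0001.raw", b"\x00\x01" * 1024)
print(len(load_image(conn, "frame-0001.raw")))
```

Serving an image then costs one small indexed lookup plus a plain file read, which is why the database itself doesn't need to be high-performance.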
Re:Is the creator a researcher? (Score:2)
I meant to ask if he is a researcher that works with datasets larger than he can pull out of his ass?
If he was, he'd figure out that a huge amount of data means something to just a couple people.
He'd also figure out that researchers build their little kingdoms, and they are NOT going to want to contribute their data to the project.
In short, the creator of this Astronomy database sounds like he doesn't understand the politics of the situation.
Astronomers as data miners a tradition (Score:2)
Re:Good but difficult (Score:2)
Re:Welcome to Utopia (Score:2)
If you count from when emacs started being worked on in the mid 70s, the Linux software canon is about 25 years old.
But the 1.0 kernel was released in mid 1994, so six years counting from then.
Re:Welcome to Utopia (Score:1)
Re:We still do this today (Score:1)
Brahe didn't even have that - the telescope wasn't invented until after his death. Making his observations even more amazing!
Data Mining vs. Asteroid Mining (Score:3)
Re:Is the creator a researcher? (Score:1)
Not true actually; the data - as in the raw observations - usually belongs to the observatory where it was collected. The researcher has first use of it, obviously, but eventually it all becomes public access. Space Telescope are a prime example of this, but most big observatories do something similar these days.
This seems like the natural evolution of astronomy (Score:1)
Re:Already being done... (Score:1)
But apart from Microsoft's involvement (and the idea has been batted around the astro community for years, it's nothing new), assembling the whole lot in one place *is* a big step up from all those disparate collections which currently exist. It's like the move from BBS's to the Internet - the "barriers to entry" are so much lower that it becomes easy to use (and so will be used) rather than tedious (and so is used only by the cognoscenti).
Re:Welcome to Utopia (Score:1)
Just last night I was watching 'Nova'. The main attention of the show was given to a group of astronomers from Australia who were observing supernovae in distant galaxies to determine the speed the universe is expanding.
Anyhoo, they mentioned that another group (from Berkeley, I believe) was working on the same project. BUT it was not a partnership; it was a competition. The Aussies were worried because they had limited telescope time, and the Berkeley people were going to be in there right after them.
That doesn't sound quite as friendly as the professional attitude you referred to.
Wes
Microsoft publicity (Score:3)
As usual, Microsoft is late to the party and comes with their own agenda. Microsoft products are oriented towards small business and desktop applications. That's what their evolution is driven by and that's what they are designed for. Whether this kind of data should be in a relational database is questionable to begin with. And it certainly doesn't need to be on an expensive, proprietary operating system and in a proprietary format.
Scientists already have excellent open-source tools to build long-term, stable, large-scale data collections. They would be foolish to tie research projects that can span decades to the fortunes of a company in the middle of a battle for the US business computing market, merely to gain some trinkets and give that company a publicity boost.
Re:No real depth.. (Score:1)
Re:Welcome to Utopia (Score:1)
that is what the mirroring is for (Score:1)
you can get it from a mirror rather than spend five and a half years restoring from tape.
uniserver (Score:1)
This is new, this is for real (Score:1)
I attended the Virtual Observatories of the Future [caltech.edu] conference this past summer and would like to note that:
The take-home lesson from the Virtual Observatories conference was that the amount of data required to do science with a "virtual observatory" leads to interesting problems in computer science, problems which are only tractable when analyzed by collaborations between statisticians, computer science people, and the astronomers themselves.
Finally, note that this year's historic increase in the National Science Foundation budget is largely due to the new Information Technology Research Initiative [nsf.gov]. The need for new methods of data management in the sciences is real.
Astronomers already do this - That's what WWW is! (Score:1)
I also find it rather stupid to make a "network of observatories". Perhaps microsoft.COM forgets, the world wide web was invented by physicists and astronomers for that purpose. The WWW _is_ this database he wants to "create". Someone's been learning from Al Gore I think.
And what's with MS trying to pretend they have a PARC by calling it BARC? *shaking head in shame*
Re:Welcome to Utopia (Score:2)
Scientists are sometimes co-operative, sometimes bitterly competitive. Sometimes they share their data, sometimes they guard it jealously. Sometimes they go to great lengths to sneak a look at each other's data.
For an example, see The Double Helix by James Watson, where Watson and Crick win a Nobel prize, partly by gaining access to Rosalind Franklin's X-ray pictures of DNA.
Re:Welcome to Utopia (Score:2)
Finally, something I can work with; at least it's more informative than just accusing someone of being naive without doing the littlest thing to fix the problem. I had forgotten about this.
Re:Is the creator a researcher? (Score:2)
In business, and in science, the person who has the first crack at the data has the best crack at making a buck/making a discovery.
Those cranky old bastards at universities weren't opposed to getting us data. They were just opposed to us getting the data in time to make money from putting it into a database.
So, yet another situation where theory is different than practice.
I still think that the person putting together this database is going to hit political resistance just as soon as his economy threatens the ability of astronomers to bring in grant dollars for their universities.
Re:Is the creator a researcher? (Score:1)
I'm not saying this like "I think this is so", I'm telling you how it is. I used to be an astronomer and my old thesis supervisor occasionally tries to interest me in working on one of these virtual observatory projects.
Re:Slsahdot (Score:1)
Model Universe (Score:1)