National Virtual Observatory
scubacuda writes "According to this Technology Review article, U.S. astronomers (compliments of a $10M grant from the National Science Foundation) are building a National Virtual Observatory to make terabytes of astronomical data accessible from a web browser. One interesting challenge is how the scientists are going to query so many *different* distributed databases (which they're leaving in their respective places to avoid clogging network bandwidth)."
virtually (Score:3, Funny)
"Dr. Quinn Medicine woman...Is there anything she can't do?"-Homer
Re:virtually (Score:1)
National Virtual Observatory? (Score:5, Funny)
Z39.50 (Score:1)
Z39.50 is also a lightweight protocol, and studies [deflink.dk] show that searching many databases in parallel is not a problem; it is usually the database servers that are the bottleneck.
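To illustrate why fanning one query out to many databases at once is cheap, with total wait time set by the slowest server rather than the sum of all of them, here is a minimal Python sketch. It does not speak Z39.50; `search_archive` and the `ARCHIVES` latencies are hypothetical stand-ins that simply sleep to model server load.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical archives: name -> simulated server response time in seconds.
ARCHIVES = {"radio": 0.3, "optical": 0.1, "xray": 0.2}

def search_archive(name, latency, query):
    """Stand-in for one remote database search; the sleep models server load."""
    time.sleep(latency)
    return [f"{name}:{query}:hit"]

def federated_search(query, archives=ARCHIVES):
    """Send the same query to every archive concurrently and merge the replies."""
    with ThreadPoolExecutor(max_workers=len(archives)) as pool:
        futures = {name: pool.submit(search_archive, name, latency, query)
                   for name, latency in archives.items()}
        return {name: fut.result() for name, fut in futures.items()}

if __name__ == "__main__":
    start = time.perf_counter()
    results = federated_search("DI Peg")
    elapsed = time.perf_counter() - start
    # Wall time tracks the slowest server (about 0.3 s), not the 0.6 s total.
    print(results)
    print(f"{elapsed:.2f} s")
```

Threads are enough here because the client spends its time waiting on the network, not computing; a real client would also want per-server timeouts so one slow archive cannot stall the merged result.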
Walk around, talk to people, smell the flowers (Score:4, Funny)
Thanks Ray for that endorsement of that (in)famous MS product.
This is offtopic, please mod it down.
Web Browser (Score:5, Funny)
So will the universe be viewable in the next point release, or is it several years away?
Is it possible to look at the universe with, say, lynx?
Or, if that is not possible, with JavaScript turned off?
Re:Web Browser (Score:4, Funny)
Lynx users have to look out their window at night to see the universe.
Re:Web Browser (Score:2)
Re:Web Browser (Score:5, Informative)
I know this was a joke, but that's actually a topic debated by webmasters at GSFC. In theory, all NASA web pages should be accessible to all browsers, readers for the blind, etc.
For images, this means descriptive image 'alt' tags. For links, it means including a link description. But what to do for data?
It's kinda subtle. The best answer is 'give data informative tags that can be domain-specific.' "Image 5b" is useless; saying "DI Peg data, X-ray wavelengths, reduced, FITS format" is good but tedious for whoever makes the page; giving a spec like 'ASCA dataset1, DI Peg, FITS, reduced' is something that could likely be generated automatically, and it fits the bill.
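The idea of generating the spec-style label automatically can be sketched in a few lines of Python. This is only an illustration: the metadata keys (`mission`, `dataset`, `target`, `format`, `level`) and the helpers `describe_dataset` and `preview_img` are hypothetical, not an actual NASA metadata schema or GSFC tool.

```python
from html import escape

def describe_dataset(meta):
    """Build a short, domain-specific description from dataset metadata.

    The keys used here are hypothetical, chosen for illustration only.
    """
    head = " ".join(str(meta[k]) for k in ("mission", "dataset") if meta.get(k))
    tail = [str(meta[k]) for k in ("target", "format", "level") if meta.get(k)]
    return ", ".join([head] + tail if head else tail)

def preview_img(src, meta):
    """An <img> tag whose alt text is generated rather than typed by hand."""
    return f'<img src="{escape(src)}" alt="{escape(describe_dataset(meta))}">'

record = {"mission": "ASCA", "dataset": "dataset1", "target": "DI Peg",
          "format": "FITS", "level": "reduced"}
print(describe_dataset(record))  # ASCA dataset1, DI Peg, FITS, reduced
print(preview_img("di_peg_preview.gif", record))
```

Missing fields are simply skipped, so the label degrades gracefully instead of producing something like "None, DI Peg".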
But the issue of folks using non-visual browsers is pretty real. Besides lynx and browsers for the blind, there are also data-hunting scripts and programs that need to figure out what is on a page, so it's a problem worth solving.
"Open Source" Knowledge (Score:3, Interesting)
Re:"Open Source" Knowledge (Score:1, Insightful)
Not likely. Only about 5% of the world's population have internet access, maybe a tad more.
Re:"Open Source" Knowledge (Score:3, Insightful)
>>The internet could prove to be the single factor which contributes greatest towards equality of educational opportunity for all around the world.
>Not likely. Only about 5% of the world's population have internet access, maybe a tad more.
Putting that into perspective, how many people need to do serious research on astronomy at that depth? It is a fairly abstract field that is well entrenched in academia.
Also put that 5% number into perspective for people who need to do serious research into astronomy: of them, how many have access (at least part time) to the Internet? It's probably up there near 100%.
While not a huge educational opportunity for everyone on the planet, we are looking at a serious contribution to the field.
--jaybonci
Re:Universe the Game (Score:3, Funny)
Yes, but it may affect your karma, depending on who you listen to.
The problem of data interfaces and the layman (Score:5, Interesting)
It seems that there is simply going to be a huge amount of data cross-referenced and collated. From the second page of the article, it seems to include pictorial data. I also hear XML being thrown around, which is a good start, but there's a lot that goes into that transition. Are they looking to set the layman bar at "your novice astronomer", "the third grade science report", or "grad student"? Short of the truly obscure level, who is this information really being targeted at?
While I don't want to trivialize their massive IT effort, it seems that a lot of this is going to come down to the end user of the data. Their sample study [caltech.edu] using this information isn't trivial stuff, and does seem to set the aforementioned bar somewhere at the undergrad-to-graduate level. Perhaps that is the nature of the data (I'm not that familiar with it). There's an XML schema, some request examples, and other framework stuff already in place for potential client writers to look at.
I'm glad to see XML being done the right way (by collaboration with its end users), and those pictures
Anyone closer to the project know of any simplification efforts?
--jaybonci
Re:The problem of data interfaces and the layman (Score:3, Informative)
Re:The problem of data interfaces and the layman (Score:5, Informative)
There are lots of databases that follow this philosophy already: the NASA Astrophysics Data System [harvard.edu], the Digitized Sky Survey [stsci.edu], not to speak of the larger arxiv.org [arxiv.org]. You can all grab whatever you like from there.
That being said, there are a number of amateur astronomers who are extremely dedicated and are willing to acquire the skills needed to use such a system, even if there is a tough learning curve. These can be considered "laymen", but they are actually very good at what they do. That's the kind of "laymen" you would expect to use it: not Joe Sixpack, but the people who are dedicated enough to learn how to use it.
Make Voice Controlled Virtual Astronomy Glasses (Score:1)
Re:Make Voice Controlled Virtual Astronomy Glasses (Score:2)
Yeah, there are really nice applications you can develop on the basis of all these data, but someone's gotta do it, and I doubt scientists will; there are too many challenging projects to work on. However, extending KStars is a good idea! :-)
Re:The problem of data interfaces and the layman (Score:4, Informative)
All you really need to know about FITS is: it is well specified, there are lots of tools for it, and it has an ASCII (human-readable) header describing the data, followed by specifically formatted binary data.
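To make the "ASCII header followed by binary data" description concrete, here is a deliberately simplified Python reader for a primary FITS header. It relies on standard FITS layout: 80-character header records ("cards") packed into 2880-byte blocks, keywords in columns 1 to 8, a "= " value indicator in columns 9 to 10, an END card closing the header, and a data size of |BITPIX|/8 times the product of the NAXISn values, padded to whole blocks. It is a sketch, not a substitute for a real library such as astropy.io.fits; the helper names and the synthetic header are illustrative.

```python
BLOCK = 2880  # FITS files are organised in 2880-byte blocks
CARD = 80     # each header record ("card") is 80 ASCII characters

def parse_header(raw):
    """Parse the keyword = value cards of a FITS header (simplified sketch)."""
    header = {}
    for i in range(0, len(raw), CARD):
        card = raw[i:i + CARD].decode("ascii")
        key = card[:8].strip()
        if key == "END":
            break
        if card[8:10] != "= ":
            continue  # COMMENT, HISTORY and blank cards carry no value
        # Naive: drops everything after '/', so a '/' inside a string breaks it.
        value = card[10:].split("/", 1)[0].strip()
        if value.startswith("'"):
            header[key] = value.strip("'").rstrip()
        elif value in ("T", "F"):
            header[key] = value == "T"
        else:
            try:
                header[key] = int(value)
            except ValueError:
                header[key] = float(value)
    return header

def data_size(header):
    """Bytes of binary data after the header, padded to whole blocks."""
    naxis = header.get("NAXIS", 0)
    if naxis == 0:
        return 0
    nbytes = abs(header["BITPIX"]) // 8
    for axis in range(1, naxis + 1):
        nbytes *= header[f"NAXIS{axis}"]
    return -(-nbytes // BLOCK) * BLOCK  # round up to a multiple of 2880

def make_card(text):
    return text.ljust(CARD).encode("ascii")

# A tiny synthetic header: a 100 x 100 image of 16-bit integers.
raw = b"".join([
    make_card("SIMPLE  =                    T / conforms to FITS standard"),
    make_card("BITPIX  =                   16 / bits per data value"),
    make_card("NAXIS   =                    2"),
    make_card("NAXIS1  =                  100"),
    make_card("NAXIS2  =                  100"),
    make_card("OBJECT  = 'DI Peg  '"),
    make_card("END"),
]).ljust(BLOCK, b" ")

hdr = parse_header(raw)
print(hdr)
print(data_size(hdr))  # 20000 bytes of pixels, padded up to 20160
```

Because the header is plain text, you can read it with `head -c 2880 file.fits` before writing any code at all, which is a large part of why the format has aged so well.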
Also, since most data archives are large, single location repositories (e.g. CHANDRA data), and many data archives are already combined with other sets (e.g. HEASARC.gsfc.nasa.gov), there's a relatively small number of sites providing data (relative to, say, the number of sourceforge projects).
The astronomy community has been providing its data via the web for years now, usually localized by wavelength (e.g. radio archive in 1 place, X-ray data in another). The Virtual Observatory is just a layer on top to simplify access.
And for NASA data, it always goes public 1 year after the observation, so this isn't a new concept, just a better way to get at the data.
Funny you should mention the terraserver (Score:2)
Re:The problem of data interfaces and the layman (Score:3, Insightful)
The first priority of the Virtual Observatory (VO) is making it easier for professional astronomers to combine data from different sources, but we're also committed to involving the amateur astronomy community and the general public, which will mean special portals and eventually special software tools. I would caution that the whole project is at a very early stage, but I'm optimistic that a few years from now you'll see some nifty tools that let you explore the universe from your web browser (I don't know about support for lynx, which one person asked about; personally I prefer wget...). Note that most astronomy analysis software is open source, and most of it is *only* available for Unix/Linux, so many /. readers will have a leg up on the world if they really want to do stuff with our data.
But you don't need fancy software to play with the pretty pictures we make.
There are already a lot of good tools around. Someone mentioned Tom McGlynn's Skyview, and he's part of the VO team (perhaps a better word would be Collective, since we are trying to assimilate everyone...). The VO will provide middleware to make it easier for those public tools to interoperate and get their hands on more data, so it'll be a real help to people writing those kinds of services (Skyview, NED, Aladin, etc.), more directly, I think, than to most end users, at least in the short term.
To address your specific question of format, the current idea seems to be XML descriptive wrappers paired with FITS binary data for most applications. But there are usually GIF/JPEG-type preview images around, and the image viewer SAO DS9 [harvard.edu] for FITS data has been ported to PCs and Macs and is pretty easy to use. In the meantime, you may want to check out NED Level 5 [caltech.edu] for an excellent overview site on extragalactic astronomy.
- Jonathan
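The "XML descriptive wrapper paired with FITS binary data" idea can be pictured with Python's standard `xml.etree.ElementTree`. The element and attribute names below (`dataset`, `target`, `waveband`, `data`, `href`) are hypothetical, chosen for illustration; they are not the VO's actual schema. The point is only that a client can read a small text description first and fetch the large binary file when it actually needs it.

```python
import xml.etree.ElementTree as ET

def wrap_fits(href, target, waveband, level, size_bytes):
    """Describe a FITS file in a small XML wrapper (hypothetical schema)."""
    root = ET.Element("dataset")
    ET.SubElement(root, "target").text = target
    ET.SubElement(root, "waveband").text = waveband
    ET.SubElement(root, "level").text = level
    ET.SubElement(root, "data", format="FITS", href=href, size=str(size_bytes))
    return ET.tostring(root, encoding="unicode")

def read_wrapper(xml_text):
    """What a client reads first, before deciding whether to fetch the FITS file."""
    root = ET.fromstring(xml_text)
    data = root.find("data")
    return {
        "target": root.findtext("target"),
        "waveband": root.findtext("waveband"),
        "format": data.get("format"),
        "href": data.get("href"),
        "size": int(data.get("size")),
    }

doc = wrap_fits("di_peg.fits", "DI Peg", "X-ray", "reduced", 20160)
print(doc)
print(read_wrapper(doc))
```

Keeping the pixels in FITS and only the description in XML avoids bloating large binary arrays into text, while still giving search tools something easy to parse.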
How much will this data get re-analyzed? (Score:2)
That leaves me wondering: other than satisfying curiosity, will people actually do anything useful with this data? Will this just include "images" or will there actually be a lot of spectrographic data and other measurements? What would they be looking for? What might they find?
Overall, I guess I just don't see yet that this is a good use of scarce research funds.
Re:How much will this data get re-analyzed? (Score:5, Informative)
1) The original proposal by the PI, e.g. 'looking for coronal emissions from DI Peg, an Algol-type system'. Sort of the pass/fail of the research world.
2) Survey. Someone decides to do a survey study among existing data, e.g. "Light curves from all Algol-type systems".
3) Unexpected. Someone finds a new thing to look for, sometimes due to better theoretical understanding. "Coronal sources should be iron-enhanced, so let's reanalyze DI Peg, specifically looking for iron lines."
4) Data-mining. Searching an archive for a given property. "Looking for all sources with X-ray emission above a given threshold... hey, DI Peg matched!"
5) Grad students. Doing their thesis on a topic, use archival data to support. "Dissertation on coronal systems, using data from DI Peg and others".
So data is often used beyond its initial acquisition!
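Cases 2 and 4 above are, at their simplest, selections over an archive catalog. A minimal Python sketch, assuming a hypothetical catalog with a source class and an X-ray flux column; the names and values are made up for illustration and are not real measurements.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Source:
    name: str
    source_class: str   # e.g. "Algol"; hypothetical catalog column
    xray_flux: float    # arbitrary units in this toy example

def mine(catalog, threshold, source_class: Optional[str] = None):
    """Sources above an X-ray flux threshold, brightest first.

    With source_class set this doubles as a simple survey selection.
    """
    hits = [s for s in catalog
            if s.xray_flux > threshold
            and (source_class is None or s.source_class == source_class)]
    return sorted(hits, key=lambda s: s.xray_flux, reverse=True)

# Toy catalog: made-up names and values, not real measurements.
catalog = [
    Source("src-1", "Algol", 5.0),
    Source("src-2", "RS CVn", 9.0),
    Source("src-3", "Algol", 1.0),
]
print([s.name for s in mine(catalog, 2.0)])           # ['src-2', 'src-1']
print([s.name for s in mine(catalog, 0.5, "Algol")])  # ['src-1', 'src-3']
```

In a real archive this filter would run server-side as a database query; the value of a virtual observatory is that the same selection could be sent to many archives at once.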
Re:How much will this data get re-analyzed? (Score:4, Interesting)
To elaborate on that, at my (old) institute [astro.uio.no] people are discouraged from embarking on a thesis that requires them to obtain original data; it is too risky.
To get observation time, you would have to write a really good proposal; most major observatories have at least three times as many applications as they have time for. If you're lucky enough to get time, it is maybe half a year into the future, and you're getting three nights to complete everything.
You spend that time preparing everything, just to come down to the observatory, and you're in the fog for three nights! Tough luck, you've spent all that time preparing, and you're now one year behind schedule...
I did three observation runs during my thesis work, two as Observing Astronomer (roughly, the person at the telescope who decides what to look at, when, and for how long; the PI is the one who decides what the project is about). My own thesis was purely theoretical, and I was happy about that, because we had a total of ten nights (it is rare to get so many nights; it was a worldwide collaboration), and we got one full night plus 3 hours on two other nights' worth of observation. It's extremely frustrating to sit there getting nothing because of humidity, I can tell you, and if that had been part of my thesis, I'd have been in deep trouble.
Re:How much will this data get re-analyzed? (Score:2, Interesting)
As another example, people still use the plate archives at Harvard. Many of these plates are over 100 years old. Astronomical data gets reused.
This is reminiscent of (Score:3, Informative)
Seems like $10 million might not be enough (Score:4, Interesting)
All in all, though, it seems like a good use for those tax dollars. The "Google" of astronomy research is an attractive idea, and I know we'll get some great new acronyms in the deal.
Re:Seems like $10 million might not be enough (Score:1, Insightful)
This is a good thing (Score:1)
And, appropriately enough, the text on their page is quite
The Army Understands [slashdot.org]
Re:This is a good thing (Score:1)
Paper is just a tree recycled.
Microsoft involvement? (Score:3, Informative)
http://research.microsoft.com/~Gray/JimGrayHome
Alan.
Solution (Score:2, Informative)
Some of you might disagree, but I've run into a scalable piece of software that will interrogate all their information sources regardless of storage format, index them, and still leave them all in their respective locations.
Autonomy Inc. [autonomy.com] has a product called DRE AXE, which is also XML compliant. It has a pretty simple API to work with, and I have even seen it work with Java, PHP, and Perl. The query engine is extremely fast and supports layman's terms; it handles both Boolean and natural-language queries. Check them out; I've been administering their products for about 2 to 3 years now.
Ok, Ok, I'm giving them a plug, but hey their product works well.
Cool, but some links... (Score:4, Informative)
Also, I have to mention Celestia [shatters.net], a great Space Simulator, similar to OpenUniverse.
In closing, let me say that I think people should take more of an interest in astronomy, as the understanding and exploration of space is one of the most important goals humans should have if they wish to survive longer than 500 million years or so.
So this is a flamebait? (Score:1)
Re:Cool, but some links... (Score:2)
Who cares about 500 million years from now? Leave it to a geek to stare off into the stars and think about a far-off distant future that will never come, while maintaining a complete political apathy or extreme naivete in the present.
Yeah, I want you to just stop and think for a little bit. You've just proven how unimportant your life is, and how, in the chance that humans survive 500 million years from now, your name will not have survived. Your ideals and beliefs and people like you will have, thankfully, perished long ago with countless other tomes of ignorance and self-righteousness. You insult me ("geek") because I happen to be passionate about something, a science, which you don't care about. While you think a century can be bloody, it'll be nothing compared to the global deaths caused by the serious and expected changes to this planet's geology if we aren't prepared to deal with them.
I believe that a future for humanity in 500 million years can and will exist if fewer people think as you do. I believe that there's a lot we can learn from our little galaxy, and that humans can have a near-infinite existence among the stars, living longer and happier lives than anyone here, you least of all, can currently conceive or even deserve. I'm sorry if my optimistic future isn't depressing enough to fit in with your very narrow view of our awe-inspiring universe. If that's the case, I suggest you find another planet to live on, because I don't want you spreading any more mental poison around on this one. Thanks.
Re:Cool, but some links... (Score:2)
Doug
Re:Cool, but some links... (Score:2)
One of the things I immediately noticed was how homing in on Sol and then going to the Earth will make it simple to teach her how the seasons work. The field of view offered here is invaluable for helping young minds grasp such somewhat abstract concepts.
Cheers!
Virtual astronomy (Score:4, Interesting)
P2P as an alternative (Score:5, Insightful)
The P2P idea is interesting in that it could apply to individually collected small data sets. Here's how observational astronomy has traditionally worked:
Astronomer writes a proposal to do some research using a specific telescope(s)
Proposal gets accepted after peer review
Astronomer travels to observatory to spend many of his own nights collecting data
Astronomer takes the time to reduce and analyze his own data
Astronomer writes a paper(s) saying, "Hey - look what I did!"
(Sometimes) astronomer writes a proposal for further funding based on the merits of this work
This procedure is inefficient in that you sometimes get multiple people who are not working together, doing the same project on different telescopes. If I collect a bunch of data in one part of the sky, try to use it but don't actually get around to finishing and publishing a paper, and then archive it locally, nobody in the world knows that the data exists. So now if someone else wants to do the same project, they go to the telescope and recollect the same data. In other words, there's no central log of who's done what when it comes to individual observing.
P2P could be useful to remedy this. The problem is that astronomers tend to be very proprietary about their data. Sometimes research and publishing can be very competitive, and you don't want to give the competition an edge when it could mean that they publish a paper on a particular topic before you and reap the rewards, or get funding when you don't. So I think that most astronomers would share their data openly in a P2P network only after they were completely finished using it, and some would never do so.
The difference with the data sets being accessed by the proposed Virtual Observatory is that the people who create those sets typically get their funding with a stipulation that the data be publicly accessible some time after the work is finished. They're not allowed to keep it proprietary even if they'd prefer to do so for competition reasons.
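The missing "central log of who's done what" described above could, in principle, share metadata even while the data themselves stay proprietary. A minimal Python sketch of such a log; `Observation`, `ObservationLog`, and all observer and telescope names are hypothetical, and nothing here reflects an existing system.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Observation:
    observer: str
    telescope: str
    target: str
    band: str
    night: date
    public: bool = False  # the data may stay proprietary; only this record is shared

class ObservationLog:
    """Hypothetical shared index of who observed what: metadata only, no pixels."""

    def __init__(self):
        self._entries = defaultdict(list)

    @staticmethod
    def _key(target, band):
        # Normalise so "M31 " and "m31" count as the same target.
        return (target.strip().lower(), band.strip().lower())

    def record(self, obs):
        self._entries[self._key(obs.target, obs.band)].append(obs)

    def prior(self, target, band):
        """Everything already logged for this target and band."""
        return list(self._entries.get(self._key(target, band), []))

log = ObservationLog()
log.record(Observation("observer-a", "telescope-1", "M31", "V", date(2002, 10, 1)))
for obs in log.prior("m31", "V"):
    print(f"Already observed by {obs.observer} on {obs.night}; public={obs.public}")
```

Sharing only the fact that an observation exists sidesteps some of the competition worry: a second astronomer learns whom to ask before spending telescope time, without getting the data for free.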
Re:P2P as an alternative (Score:2, Interesting)
No sense (Score:1)
This is just me, but wouldn't leaving the databases where they are clog network bandwidth, as opposed to, say, having them on one local LAN?
some details (Score:4, Informative)
An example of such a VO project is the Galaxy Morphology demo. We take catalogs of a cluster of galaxies from one source, identify those sources with emission from a separate catalog, fetch images of all of those galaxies, and send the images and brightness information to a grid computing service that calculates the morphology of the galaxies, sending the result to the user to visualize in a piece of VO-compliant software. The user does nothing but pick the cluster and then look at the results. That is much more than simply putting data on the web. And once this service is developed, it can simply be put into a web page for others to use and learn from.
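The step "identify those sources with emission from a separate catalog" is a positional cross-match. A minimal brute-force sketch in Python, assuming both catalogs are simple lists of (name, RA, Dec) in degrees and using the haversine formula for angular separation; the function names, match radius, and toy positions are illustrative, not the demo's actual code.

```python
import math

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h))))

def cross_match(galaxies, emitters, radius_arcsec=5.0):
    """Pair each galaxy with its nearest emitter within radius_arcsec.

    Both catalogs are lists of (name, ra_deg, dec_deg) tuples. This is a
    brute-force O(N*M) loop; large services would use a spatial index.
    """
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for g_name, g_ra, g_dec in galaxies:
        best = None
        for e_name, e_ra, e_dec in emitters:
            sep = angular_sep_deg(g_ra, g_dec, e_ra, e_dec)
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (e_name, sep)
        if best is not None:
            matches.append((g_name, best[0], best[1] * 3600.0))
    return matches

# Toy positions, not real catalog entries.
galaxies = [("gal-1", 150.0, 2.0), ("gal-2", 150.1, 2.1)]
emitters = [("em-1", 150.0 + 1 / 3600, 2.0), ("em-2", 151.0, 3.0)]
for g, e, sep in cross_match(galaxies, emitters):
    print(f"{g} <-> {e}: {sep:.2f} arcsec")
```

Note that a 1 arcsecond offset in RA at Dec +2 degrees is slightly less than 1 arcsecond on the sky, because RA circles shrink by cos(Dec) away from the equator; that is why the separation is computed on the sphere rather than by subtracting coordinates.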
Most of this involves creating simple to use yet potentially powerful interfaces to services. While we are not using true RPCs like SOAP yet, the idea is to create standard interfaces to things like image servers, catalog servers, and the like. With those services, we will extend beyond to data and service discovery. Standard data and metadata formats are also being developed, as are common datamodels, all with the intent that these will make data and service exchange simpler. This all leads to service registries, where many applications will go to discover data and services that could be used for a particular project.
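The combination of standard service interfaces and a registry can be sketched in Python: every image service accepts the same arguments, so a client asks the registry what exists and then calls whatever it finds the same way. Everything here (`ServiceRecord`, `Registry`, the capability vocabulary, the toy services) is hypothetical and only illustrates the design, not the VO's actual protocols.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class ServiceRecord:
    name: str
    capability: str              # "image" or "catalog" (hypothetical vocabulary)
    wavebands: List[str]
    endpoint: Callable           # stands in for a remote URL

class Registry:
    """Hypothetical service registry: clients discover services by what they do."""

    def __init__(self):
        self._services: List[ServiceRecord] = []

    def register(self, record):
        self._services.append(record)

    def discover(self, capability, waveband: Optional[str] = None):
        return [s for s in self._services
                if s.capability == capability
                and (waveband is None or waveband in s.wavebands)]

def toy_image_service(archive):
    """Every image service takes the same arguments: (ra_deg, dec_deg, size_deg)."""
    def fetch(ra, dec, size):
        return f"{archive}: {size} deg cutout at ({ra}, {dec})"
    return fetch

registry = Registry()
registry.register(ServiceRecord("xray-archive", "image", ["x-ray"], toy_image_service("xray-archive")))
registry.register(ServiceRecord("radio-archive", "image", ["radio"], toy_image_service("radio-archive")))
registry.register(ServiceRecord("optical-catalog", "catalog", ["optical"], lambda ra, dec, r: []))

# The client never hard-codes an archive: it asks the registry, then uses
# the shared interface.
for service in registry.discover("image", "x-ray"):
    print(service.name, "->", service.endpoint(150.0, 2.0, 0.1))
```

The payoff of agreeing on the interface first is that adding a new archive means one more registry entry, with no change to client code.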
Jim Gray is involved with the project. He led the Terraserver project at Microsoft Research. He found that, as he put it, images of the earth are worth money; those of the stars are not. Because of this, the research he was doing on distributed data with the Terraserver project kept running into snags where making money hindered access to the data. This is not true for astronomical data. Hence he is now looking up rather than down. A version of Terraserver for different parts of the VO is in the works.
There will be usage points for people all the way from my mother, who loves astronomical wallpaper, to the hard-core researcher, and all points in between. Public outreach is being built in at the ground level, so this is not just for astronomers. Many of these will be web-based interfaces to the VO, but others may be simple toolkits to make your own services. Some could be simple enough to use for basic science projects in school, some may be for science-fair-level projects, and some for people to develop educational web-based lesson plans.
Yes, 10 million dollars seems small. But it's a start. And we are not the only ones working on VO technologies. The Europeans have their own VO, as do Canada, Russia, India... The divisions are mostly political (each funding agency has its own VO title). The IVO has been established to act as a steering body to help us share efforts and make things interoperable from the start.
some questions (Score:2)
This looks really interesting and I'm looking forward to playing around with it. I was wondering how it compares with other similar-sounding astronomical survey projects that combine existing data such as the Sloan Digital Sky Survey [sdss.org]. Is it expected to replace the existing ones?
Re:some questions (Score:2)
[TMB]
Re:some questions (Score:2, Informative)
Making it accessible to lay people is important (Score:3, Interesting)
While the main benefits of the virtual observatory will be to researchers, the $10 million is only the start, and more money will be needed, and the way to get more money is to make it popular with voters.
There are two examples of indexing large databases for the masses that come to mind. One is Google, and the other is Amazon.
Google ranks items by how popular they are, based in large part on how many links point to the web page. Amazon gives you a list of books other customers bought when they bought the book you found in your search.
For astronomical data and images, something like those approaches could be quite entertaining. I could go to a popularity list to see which images and data everyone else was looking at (a million flies can't be wrong...). But then, like the Internet Movie Database, it would be fun to see the other images and data that were most often found in the same papers or web pages as this item. Somewhat like the Science Citation Index (or the Kevin Bacon game).
Users could also rate the images and data. Then we could have lists such as "people who liked this nebula also liked these HST photos". Images could be grouped by popular use -- "Images most often used as wallpaper", "Images most often used by science magazines", "Data most often used by newspapers", etc.
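Both ideas, a popularity list and "people who liked this also liked...", can be sketched with plain co-occurrence counts in Python. This is a toy illustration under the assumption that ratings are stored as a set of liked items per user; it is not how Google, Amazon, or the IMDb actually rank anything, and the ratings below are made up.

```python
from collections import Counter

def _top(counts, top_n):
    # Highest count first; ties broken alphabetically so output is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]

def most_popular(user_likes, top_n=3):
    """Items liked by the most users."""
    return _top(Counter(i for likes in user_likes.values() for i in likes), top_n)

def also_liked(user_likes, item, top_n=3):
    """'People who liked this also liked...' from plain co-occurrence counts."""
    counts = Counter()
    for likes in user_likes.values():
        if item in likes:
            counts.update(other for other in likes if other != item)
    return _top(counts, top_n)

# Toy ratings: user -> set of images that user liked.
likes = {
    "u1": {"Crab Nebula", "Orion Nebula", "M31"},
    "u2": {"Crab Nebula", "M31"},
    "u3": {"Orion Nebula", "Eagle Nebula"},
}
print(most_popular(likes))               # ['Crab Nebula', 'M31', 'Orion Nebula']
print(also_liked(likes, "Crab Nebula"))  # ['M31', 'Orion Nebula']
```

Swapping "users" for "papers" or "web pages" in the same code gives the citation-style grouping: items that keep appearing together in the same documents.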
Dilbert (Score:3, Funny)
Why web browser? (Score:1, Interesting)
Just because the web exists doesn't mean that it should be used for everything, even if it can, especially since this project isn't going to be accessible to the general public. Depending on the data being accessed, a small custom cross-platform client application would make much more sense; by not being a completely dumb client, it would probably also allow more efficient automation of searching and repetitive tasks.
I hope they considered what tasks the end-users will actually be doing with the data and are going to allow them the flexibility to be creative in their manipulation and searches.