Freeing and Forgetting Data With Science Commons
blackbearnh writes "Scientific data can be both hard to get and expensive, even if your tax dollars paid for it. And if you do pay the big bucks to a publisher for access to a scientific paper, there's no assurance that you'll be able to read it, unless you've spent your life learning to decipher such papers. That's the argument that John Wilbanks makes in a recent interview on O'Reilly Radar, describing the problems that have led to the creation of the Science Commons project, which he heads. According to Wilbanks, scientific data should be easy to access, in common formats that make it easy to exchange, and free for use in research. He also wants to see standard licensing models for scientific patents, rather than the individually negotiated deals used today, which make research based on an existing patent so financially risky."
Read on for the rest of blackbearnh's thoughts.
"Wilbanks also points of that as the volume of data grows from new projects like the LHC and the new high-resolution cameras that may generate petabytes a day, we'll need to get better at determining what data to keep and what to throw away. We have to figure out how to deal with preservation and federation because our libraries have been able to hold books for hundreds and hundreds and hundreds of years. But persistence on the web is trivial. Right? The assumption is well, if it's meaningful, it'll be in the Google cache or the internet archives. But from a memory perspective, what do we need to keep in science? What matters? Is it the raw data? Is it the processed data? Is it the software used to process the data? Is it the normalized data? Is it the software used to normalize the data? Is it the interpretation of the normalized data? Is it the software we use to interpret the normalization of the data? Is it the operating systems on which all of those ran? What about genome data?'"
Again with the IP (Score:1, Insightful)
Einstein said "If I have seen farther than most it is because I have stood on the shoulders of giants."
Where does that begin to apply in a society of lawyers, profiteers, and billion dollar industries based on exploiting shortsighted IP management?
Comment removed (Score:4, Informative)
Re: (Score:1, Informative)
Your faith in wikipedia is misplaced; it was both, actually.
Perhaps Sir I.N. was the first, so you do earn the proverbial "first quote"
Re:Again with the IP (Score:4, Insightful)
There is a rumor that Newton meant it as an insult to Hooke. Hooke had refined Descartes's wave theory, while Newton had backed the corpuscular theory. Also, Hooke was a short man.
Re: (Score:2)
Re: (Score:2)
Now I see how he discovered his law - hanging weights on his feet to try and get taller.
I don't know! (Score:2, Insightful)
Re: (Score:1)
What's most important to keep. (Score:3, Insightful)
What's most important to keep is quite simple and obvious really:
The results. The published papers, etc.
It's an important and distinctive feature of Science that results are reproducible.
Re: (Score:2, Insightful)
How can the results be reproducible if you don't keep the original data?
Re:What's most important to keep. (Score:5, Insightful)
The relevant results are supposed to be included in the paper, as well as the information necessary to reproduce the work. Most data doesn't fall into that category.
To make an analogy the computer geeks here can relate to: all you need to reproduce the output of a program is the source code and parameters. You don't need the executable, the program's debug log, the compiler's object files, etc.
The point is you want to reproduce the general result. You don't usually want to reproduce the exact same experiment with the exact same conditions. Supposedly you already know what happens then.
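As a toy illustration of that analogy (a sketch in Python; every name in it is invented, nothing from a real pipeline): if the analysis is deterministic, archiving the source and the recorded parameters is enough to regenerate any intermediate artifact.

    import json

    def analyze(samples, threshold):
        # The "experiment": keep readings above a threshold, then summarize.
        kept = [s for s in samples if s > threshold]
        return {"count": len(kept), "mean": sum(kept) / len(kept)}

    params = {"threshold": 0.5}
    samples = [0.2, 0.7, 0.9, 0.4, 0.6]

    # Publishing the parameters next to the result is what makes the run
    # repeatable: analyze() with the same inputs always gives the same answer.
    print(json.dumps({"params": params, "result": analyze(samples, **params)}))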
Re:What's most important to keep. (Score:5, Insightful)
Let's say the LHC publishes its analysis, and then throws away the data. What happens when five years later it's discovered that a flawed assumption was used in the analysis? Are we going to build another LHC any time soon, to verify the result?
For a billion-dollar experiment like the LHC, that dataset is the prize. The dataset is the whole reason the LHC was built. Physicists will be combing the data for rare events and odd occurrences, many years down the road.
Re: (Score:1, Offtopic)
Mod up this important position please.
Re: (Score:3, Insightful)
Let's stop right there. There are no general lessons to be had from the LHC. It's an exception, not the rule.
First: 99.9% of scientists are not working at LHC, or any other billion dollar, world-unique facility. They are working in ordinary labs, with ordinary equipment that's identical or similar to equipment in hundreds of other labs around the world.
Second: Primary data, actual measurement results, are already kept, as a rule.
Third: The vast majority of experiments are never ever reproduced to begin with. You're lucky enough to get cited, really. Most papers don't even get cited apart from by those who wrote them.
Fourth: Very little science is done by re-interpreting existing results. That only applies to the unique cases where the actual experiment can't be reproduced easily.
Re:What's most important to keep. (Score:5, Insightful)
There are two types of science. What you're referring to is called 'Little Science' (not to be derogatory), but it's the type of thing that a small lab can do, with a reasonable amount of funding. And then there's what we call "Big Science" like the LHC, Hubble Space Telescope, Arecibo Observatory, Large Synoptic Survey Telescope, etc.
I wish. Well, okay, it might be kept, but the question is by whom, and whether they've put it somewhere that people can analyze it.
I was at the AGU last year, and there was someone from a solar observatory that I wasn't familiar with. As I do work for the Virtual Solar Observatory, I asked them if we could put up a web service to connect their repository to our federated search. They told me there was no repository for the observatory -- the data walks out the door with whoever the observer was.
Then there's the issue of trying to tell from the published research exactly what the original data was. But then, I've been harping on the need for data citation for years now ... it's an issue that's starting to get noticed.
For the type of data that I deal with, none of it is technically reproducible, because it's observations, not experiments. And that's precisely why it's important to save the data.
In your field, maybe. But we have folks who try to design systems to predict when events are going to happen and need training data. Others do long-term statistical analysis with years or decades of data at a time. Still others find a strange feature that hadn't previously been identified as important (eg, coronal dimmings) and want to go back through all of the data to try to identify other occurrences.
Re: (Score:3, Informative)
Let's stop right there. There are no general lessons to be had from the LHC. It's an exception, not the rule.
First: 99.9% of scientists are not working at LHC, or any other billion dollar, world-unique facility.
They are working in ordinary labs, with ordinary equipment that's identical or similar to equipment in hundreds of other labs around the world.
I admit that I jumped on the LHC as an extreme example. But even in an "ordinary" lab these days, you'll find some specialized and complex equipment. This is true for the cutting edge of any field.
Second: Primary data, actual measurement results, are already kept, as a rule.
As oneiros27 notes, this is not guaranteed, either by design or circumstance.
Third: The vast majority of experiments are never ever reproduced to begin with. You're lucky enough to get cited, really. Most papers don't even get cited apart from by those who wrote them.
Not sure what kind of point you're trying to make here.
Fourth: Very little science is done by re-interpreting existing results. That only applies to the unique cases where the actual experiment can't be reproduced easily.
It's not necessarily a matter of re-interpreting existing results. You may be adding an old dataset to a new dataset, and finding new results in the combined set, or finding a glimmer
Re:What's most important to keep. (Score:5, Interesting)
With a large and expensive dataset that can be mined many ways, yes, it makes sense to keep the raw data. This is actually pretty similar to the raw datasets that various online providers have published over the years for researchers to datamine. (AOL and Netflix come to mind.) Those data sets are large and hard to reproduce, and lend themselves to multiple experiments.
But, there are other experiments where the experiment is larger than the data, and so keeping the raw data isn't quite so important as documenting the technique and conclusions. The Michelson-Morley interferometer experiments (to detect the 'ether'), the Millikan oil-drop experiment (which demonstrated quantized charges)... for both of these the experiment and technique were larger than the data, so the data collected doesn't matter so much.
Thus, there's no simple "one size fits all" answer.
When it comes to these ginormous data sets that were collected in the absence of any particular experiment or as the side effect of some experiment, their continued existence and maintenance is predicated on future parties constructing and executing experiments against the data. This is where your LHC comment fits.
Re: (Score:3, Insightful)
I agree that there is no simple answer, but I am uneasy with your "experiment is larger than the data" concept. Today we think of the Michelson-Morley and Millikan experiments as canonical and definitive investigations in Physics. But we do not often remember that each was preceded by a string of less-successful experiments, and followed by confirmations. It is the accumulation of a body of data that leads to the gradual acceptance of a physical concept.
See chart:
http://en.wikipedia.org/wiki/Michelson-Morley_e [wikipedia.org]
Re: (Score:2, Interesting)
5 Insightful?
Seriously, read the OP again.
"What's most important to keep is quite simple and obvious really: The results. The published papers, etc."
He never suggested you throw out the results. No-one is going to throw out the results. Why would anybody throw out the results? Whichever body owns the equipment is bound to keep the results indefinitely, any papers they publish will include the results data (and be kept by the publishers), and copies will end up in all manner of libraries and file servers, du
Re: (Score:3, Insightful)
You seem to be using "results" in a wider sense than "published papers". Yes, nobody is going to throw out papers. But the raw data from instruments? It is not clear whether those will be kept.
You say that the analysis and interpretations can be thrown out, but those portions are precisely what go into published papers. And for small-scale science, it makes little sense to throw away anything at all.
Re: (Score:3, Interesting)
That's right, I want an independent "someone else" to do that in order to make my original result more robust. If I were an academic I would rely on post-grads to take up that challenge; if they find a discrepancy, all the better, since you now have another question! To continue your software development analogy - you don't want the developer to be the ONLY tester.
Re: (Score:3, Interesting)
Re: (Score:2)
Maybe you haven't noticed, but quantum mechanics seems to indicate there is not always one outcome for one set of conditions. One outcome per set of conditions holds on the macro scale, but not necessarily at the subatomic level.
Re: (Score:2, Informative)
He was involved in the paper Jones et al (1990), which is where the situation begins.
After *17 YEARS* of requests, Jones FINALLY released some of the data used in Jones 1990 through demands under the terms of the U.K. Freedom of Information policy on publicly funded research.
Wang himself is free from FOI requests because Wang is an American and operates in America, where FOI requests regarding pub
Re: (Score:3, Informative)
How can the results be reproducible if you don't keep the original data?
As others noted, there are cases where raw data is king, and others where raw data is virtually useless. LHC raw data will be invaluable. Raw data from genetic sequencing is a waste of time to keep. Why store huge graphics files when the only thing we will ever want from them is the sequence of a few letters? One must be able to distinguish between these two possibilities (and more subtle, less black and white cases, too), and there is no one size fits all solution.
That said, you may be surprised how well r
Re: (Score:1)
What's most important to keep is quite simple and obvious really:
The results. The published papers, etc.
It's an important and distinctive feature of Science that results are reproducible.
At what cost? Would you suggest discarding the data sets of nuclear bomb detonations since they are easily reproduced? How about other data sets that may need to be reinterpreted because of errors in the original processing?
Re: (Score:3, Interesting)
Nobody said results are easily reproduced. But a-bomb tests are hardly representative of the vast majority of scientific results out there.
That's a scenario that only applies when the test is difficult to reproduce, and the results are limited by processing power rather than measureme
not results- grant dollars (Score:2, Insightful)
The results. The published papers, etc. It's an important and distinctive feature of Science that results are reproducible.
Having worked around academic groups that do medical research for three years now, I can tell you that is absolutely not what drives research.
Researchers will love to tell you about how it is the quest for knowledge and other pie-in-the-sky ideals, but when it comes down to it- it's mostly about making a living (or more than a living), and fame/prestige.
See, journals have what's
Re:not results- grant dollars (Score:5, Insightful)
What incentive does a massive industry have to cure cancer, when it would put them out of business? Tens of thousands of people have dedicated most of their adult lives, usually to studying specific mechanisms and biological functions so narrow that if cancer were cured tomorrow, they would be useless - their training and knowledge is so focused, so narrow - they cannot compete with the existing population of researchers in other biomedical fields. Journals which charge big bucks for subscriptions would also be useless. Billions of dollars of materials, equipment, supplies, chemicals - gone. "Centers", hospitals, colleges, universities which each rake in hundreds of millions of dollars in private, government, and non-profit sourced money would be useless.
That's an old argument and although it sounds reasonable it is completely unsound. An industry does not function as a single cohesive entity with wants and desires. It is composed of many different individuals with their own wants and desires.
I know enough academics to say for certain that if any one of those individuals could discover a cure that would put their entire employer out of business then they would leap at the chance. The fame that would follow would make another job easy enough to get, and the recognition is what they're really in it for anyway.
Re: (Score:1)
I'm a cancer researcher and I agree. Though I'm in it more for the good of society and because it is an engaging problem. I would jump at the chance to cure cancer even if it put my institution out of business and I didn't get the recognition. The reality (of this fantasy) is that most institutions and researchers could easily move on to other diseases/problems. We do it all the time.
In addition, there is BIG money to be made from a drug that cures cancer. Even the ones that cure a small percent of cancer c
Re: (Score:3, Informative)
eh (Score:1, Informative)
That's not true. Any tax-funded study requires more documentation and publication than a private one. Anyone who reads them knows this.
All studies worth anything are aimed at an audience proficient in the subject; they are not meant for general audiences. And since they are often proven wrong, you need repeatable results.
And the scientists goes mooo! (Score:2)
"And if you do pay the big bucks to a publisher for access to a scientific paper, there's no assurance that you'll be able to read it, unless you've spent your life learning to decipher them. "
I predict the dumbing down of science.
Re: (Score:3, Interesting)
Although likely, not necessarily...
I'd be happy with a wiki style, where the actual article can be as complex (for those in the know) as desired, but with a glossary of sorts.
There are geniuses of all sorts, someone might be completely lost trying to understand it linguistically, but may find a fault in it instantly visually, or audibly.
However that is somewhat redundant, as the original (as it is now) can be converted into that by people, but a mandate saying it must contain X, Y and Z, will open it up to more peopl
Re: (Score:3, Interesting)
Don't count on that being at all helpful.
Take the math articles on Wikipedia: I can read one about a topic I already understand and have no idea what the hell they're talking about in entire sections. It's 100% useless for learning new material in that field, even if it's not far beyond your current level of understanding. Good luck if you start on an article far down a branch o
Re: (Score:3, Insightful)
Why should science be more complex than necessary? For every String Theory (where complexity is unavoidable) there are plenty of fields, like economics, that just rely on weird jargon to fence out the interlopers.
Re: (Score:2)
Or scientists could just stop writing in the third-person passive, and start writing in a manner people outside the scientific community are used to. Though I think the summary refers more to trying to extract the data you do understand from complicated papers that talk a lot about things you neither understand nor care about.
What? Nobody has ever read... (Score:2)
However, in the case of the non-physical, I guess no one can "waste" or "steal" it, only copy and use it.
Re: (Score:1, Funny)
Nope, can't afford the fees.
Re: (Score:2)
I'm quite familiar with it, and I'm not seeing the connection.
Help?
Re: (Score:2)
Re: (Score:2)
I'm not sure what your point is but I don't see libraries turning to dust because nobody cares.
Libraries are closing because nobody cares.
Re: (Score:2)
Re: (Score:2)
They saved Salinas libraries [latimes.com], but look into the story: Since 2002, cuts in library funding have approached $100 million around the country, with more than 2,100 jobs eliminated and 31 libraries closed, according to the American Library Assn.
Re: (Score:1, Offtopic)
OT (?): Sig reply (Score:2)
What about when an AC says something smart?
One format to govern them all (Score:1)
On a more serious note, a common ground for data formats would be nice. You already have some generic formats, like HDF5 and others, but I must admit that right now it is a bit of a jungle in the astrophysics department, and it is not going to change anytime soon (unless someone makes an awesome generic, one-size-fits-all library
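For what it's worth, self-describing containers like HDF5 carry the metadata along with the numbers, which is half the battle for exchange. A minimal sketch in Python using the h5py library (the file, dataset, and attribute names are invented for illustration):

    import numpy as np
    import h5py

    # Write a dataset with its units and provenance attached to it.
    with h5py.File("observation.h5", "w") as f:
        flux = f.create_dataset("flux", data=np.random.rand(1024))
        flux.attrs["units"] = "W/m^2"                  # invented metadata
        flux.attrs["instrument"] = "demo-spectrograph"

    # Any HDF5-aware tool can now recover the numbers *and* their meaning.
    with h5py.File("observation.h5", "r") as f:
        print(f["flux"].attrs["units"], f["flux"][:5])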
Re: (Score:1)
The format is the least important issue (Score:2)
What is a lot harder is knowing how the data sets were measured and whether it is valid to combine them with data sets measured in other ways.
At least half the Global Warming bun-fight is about the validity of comparison between different data sets and the same goes for pretty much any non-trivial data sets.
What's the goal, really? (Score:5, Insightful)
I'm a working scientist (ok, PhD student), so I read journal articles pretty often. I can understand the rub in principle, but let's say that we come up with some way for all scientific data to be freely shared. So what? In almost all cases, the only people who actually benefit from access to particular data are a small handful of specialists. Could someone explain to me why this is a real problem and not just something that people with too much time on their hands (and who would never actually read, let alone understand, real research results) get worked up about?
It reminds me of the XKCD this morning...
Re: (Score:2, Interesting)
Re: (Score:2, Insightful)
And how would you read them if your institution did not foot the bill for subscriptions?
"In almost all cases, the only people who actually benefit from access to particular data are a small handful of specialists."
When you amalgamate "almost all cases" you end up with "almost all publications". The rest of your post smacks of elitism, trivializes scientific curiosity and completely ignores the social and scientifi
Re: (Score:2)
To be honest, if your institution does not foot the bill for subscriptions, try inter-library loans. That's easy. Most credible institutions in the US have subscriptions to the more mainstream journals, unless you're in a third-world country.
The problem with scientific publication is that you need to be terse. Papers are limited to 8-12 pages. If you were required to spend pages on background knowledge for the uninitiated, you'd produce a 1000-page book instead. Moreover, the reviewers will think that you s
Re: (Score:2)
I don't think anyone in TFA is seriously suggesting that hand holding noobs be a requirement for publication and this is probably where the confusion sets in. I also understand that you may want to keep your own data close to your chest until you have extracted a paper out of it (ie: publish or perish).
"To be honest, if your institution does not foot the bill for subscription, try inter-library loans...[snip]...The problem with scientific publication is that you need to be ters
Re: (Score:2)
Einstein managed to get away with three elegant pages and zero references
Science has evolved much since 1905. Even with his zero references, he's still implicitly citing the results of Lorentz. By today's standards, going without citations like that would be unacceptable.
Let me ask you this: can you honestly ask a high school student or a freshman to understand even that paper without grasping the concept of differential equations (DEs)? They can't. Sure, you can understand the motivation and introduction of that paper, just li
Re: (Score:2)
Documentation procedures have evolved (precisely what TFA is banging on about), but the philosophy and methodology of science are pretty much the same, no?
"Let me ask you this: Can you honestly ask a high school student or a freshman to understand
I could, but as you say they may have difficulty understanding. More puzzling is why you are asking me - I'm 50 and I am talking about myself and other educated laymen (particularly those in the less developed countries), w
Re: (Score:2)
The original post made a point that "In almost all cases, the only people who actually benefit from access to particular data are a small handful of specialists." I completely agree with him. The public mostly has no use for such data unless they know how to process it and all the rationale behind it (which implies that they must know all the underlying scientific process). I agree with that as well. However, you stressed the communication issue for the uninitiated - which I think is misleading. And t
Re: (Score:2)
Re: (Score:2)
To be honest, if your institution does not foot the bill for subscriptions, try inter-library loans. That's easy. Most credible institutions in the US have subscriptions to the more mainstream journals, unless you're in a third-world country.
Anything that complicates the retrieval of knowledge ends up reducing access to that knowledge. Why should someone have to put up with a manual process when we have this thing called the internet? The internet is designed to facilitate access to knowledge, so it is the tool of choice.
Re: (Score:2)
Anything that complicates the retrieval of knowledge ends up reducing access to that knowledge. Why should someone have to put up with a manual process when we have this thing called the internet? The internet is designed to facilitate access to knowledge, so it is the tool of choice.
Yes, and there are open-access journals already. Guess what? The scientists (i.e. the paper authors) are required to pay much more for open access. Heck, they're required to pay for non-open journals as well. Don't believe
Re: (Score:2)
So your average high school student can understand almost any science paper, if you just wait for him to get a degree, PhD and ten years postdoctoral experience in the relevant field?
Re:What's the goal, really? (Score:4, Insightful)
Typical comments from someone in the first world.
First, just on the side, I know lots of people who got PhD's but did not really stay in research and academia. They still want to read papers, though, as they still maintain an interest.
But the main benefit of opening up journal papers is for the rest of the world to benefit. Yes, if you have a very narrow perspective, you could just dismiss that as charity. If you're open-minded, you'll realize that shutting most of the world out of scientific output means much less science globally, and much less benefit to you as a result.
Imagine if all researchers in Japan published papers only in Japanese, and the journals had a copyright condition that prevented the content from ever being translated to another language, and you'll see what I mean. Whereas current journals require a lot of money for access, these ones also have a price: Just learn Japanese. It's not exactly promoting science.
Then again, of course, journals do need a base amount of money to operate. It's just that companies like Elsevier charge so much more than is needed to make a profit.
Re: (Score:1)
Re: (Score:2)
Sucks to live in the developing world and be told that if you want to publish your results it's $1000 a paper.
Re: (Score:2)
Find me an open access journal that does not lower the price for people in the developing world.
Re: (Score:2)
You're right that every single modern scientific publication has a very small intended readership, yet the argument for opening up everythin
Re: (Score:3, Interesting)
I'm a poor medical student, but a medical student with--quite frequently--interdisciplinary ideas. I can't tell you the number of times I have been interested in
Re: (Score:2)
Some fields require access to the data more than others. In the case I'm talking about, you should take a look at the MIAME (Minimal Information About a Microarray Experiment) checklist [mged.org] publish
Re:What's the goal, really? (Score:4, Informative)
Trickle-down. Dissemination of knowledge.
You don't know it yet (not meant as a jibe but it is something that clicks in after your PhD) but your primary function as a scientist is not to make discoveries. It is spreading knowledge. Sometimes that dissemination will occur in a narrow pool, through journal papers between specialists in that narrow pool of talent.
This is not the primary goal of science, although it can seem like it when you are slogging away at learning your first specialisation well enough to get your doctorate. Occasionally a wave from that little pool will splash over the side - maybe someone will write a literature review that is read by a specialist in another field. A new idea will be found - after all sometimes we know the result before we know the context that it will be applied to.
The pools get bigger as you move further downstream. Journal articles pass into conference publications, then into workshops. Less detail, but carried through a wider audience. Then after a time, when the surface seems to have become still, textbooks are written and the knowledge is passed on to another generation. We tend to stick around and help them find the experience to use it as well. This is why all PhD students have an advisor to point out the best swimming areas.
That was the long detailed answer to your question. The simple version is that you don't know who your target audience is yet. And limiting it to people in institutions that pay enormous access fees every year is not science. As a data-point: a lot of European institutes don't bother with IEEE fees. They run to about £50k/year, which simply isn't worth it. As a consequence, results published in IEEE venues are cited less in Europe. So even amongst the elite, access walls have an effect.
Re: (Score:2)
Could someone explain to me why this is a real problem
I'm a physicist who runs a business that amongst other things does data analysis in the life sciences, mostly genomics [predictivepatterns.com]. In this area data collection is relatively expensive (hundreds or thousands of dollars per sample) and disease states are relatively generic--follicular lymphoma is pretty much the same regardless of whether you are in Kansas or Karachi.
I recently invented a new algorithm for combing gene expression data for patterns of expression tha
Re: (Score:2)
I'm a working scientist (ok, PhD student), so I read journal articles pretty often. I can understand the rub in principle, but let's say that we come up with some way for all scientific data to be freely shared. So what? In almost all cases, the only people who actually benefit from access to particular data are a small handful of specialists. Could someone explain to me why this is a real problem and not just something that people with too much time on their hands (and who would never actually read, let alone understand, real research results) get worked up about?
Replace "scientific data" with "satellite imagery".
There's nothing to gain by letting anyone look at it? Only highly trained experts can decipher it?
People have found hidden forests, ancient ruins, and a few meteor impacts. You don't know what there is to find in the data until you let people look.
Re: (Score:1)
Actually, the few times I know of that a good data set was put up on the web, it generated a lot of research and progress. I'm thinking of Pat Brown putting up some of the first data on gene expression arrays. Probably hundreds of people worked on that data - everything from statistical methods, to reverse engineering the gene network. It was great. This is probably most valuable when the data is from a new type of experiment that is likely to be widely used.
I hope to do something similar but there is a big
The value of Data (Score:1)
Re: (Score:1)
Re: (Score:1)
Do you HONESTLY think that publications such as Nature and Science have teams of people sifting over supplied data?
Look into what these publications require of the researchers sometime. They do not require the data. They instead require that the data be made available upon reque
Re: (Score:1)
Is storage an issue? (Score:2, Interesting)
Re: (Score:2, Interesting)
"Not as staggering as it was five years ago" only means it is less staggering than it was five years ago - not that it still isn't staggering. Especially when you consider that a petabyte a day comes to about 365 petabytes (over a third of an exabyte) a year.
Re: (Score:1)
Re: (Score:2)
Data storage is something we've gotten very good at and we've made it very cheap. A Petabyte a day is not as staggering as it was even five years ago.
It still has to be paid for. It still has to be actually stored. It still has to be backed up. It still has to be kept in formats that we can actually read. It still has to have knowledge about what it all means maintained. In short, it still has to be curated, kept in an online museum collection if you will. And this all costs, both in money and effort by knowledgeable people.
The problem doesn't stop with copying the data to a disk array.
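To put rough numbers on that (a back-of-envelope sketch in Python; the prices are assumptions pulled out of the air, not quotes), consider just the disks for the petabyte-a-day stream mentioned in the summary:

    # Hypothetical figures: $10/TB for raw disk, two copies kept.
    PB_PER_DAY = 1.0
    TB_PER_PB = 1000
    USD_PER_TB = 10   # assumed hardware price, nothing more
    COPIES = 2        # a single backup copy

    tb_per_day = PB_PER_DAY * TB_PER_PB * COPIES
    print(f"${tb_per_day * USD_PER_TB:,.0f}/day in disks")         # $20,000/day
    print(f"${tb_per_day * USD_PER_TB * 365:,.0f}/year in disks")  # $7,300,000/year
    # ...before power, failed-drive replacement, format migration, and the
    # salaries of the people who know what the bits mean.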
Hello? Is there a fucking editor in the house ... (Score:1)
I'm sorry, but that makes no sense. 'Points of'???? Come on.
Science is hard - news at 11 (Score:3, Insightful)
I know that this is a real shock to you humanities majors, but science is hard. And yes, for the record, I do have degrees in both [physics and philosophy, or will as of this May — and the physics was by far the harder of the two].
Here's another shocker. If you think the papers are hard to read, you should see the amount of work that went into processing the data until it's ready to be written up in an academic journal. Ol' Tom Edison wasn't joking when he said it's "1% inspiration and 99% perspiration." If you think seeing the raw data is going to magically make everything clear, well, I'm sorry, the real world just doesn't work that way. Finally, if you think professional scientists are going to trust random data of unknown provenance they downloaded off the web, well, I'm sorry, but that isn't going to happen either. I spend enough time fixing my own problems; I certainly don't have time to waste fixing other people's data for them.
-JS
Re: (Score:1, Insightful)
I fully agree.
Furthermore, I've read the entire, long interview and get the feeling this is a person looking for a problem. Yes, taxpayer-funded research should be freely available. Yes, we could all benefit from more freely available data. But he builds up a massive and poorly defined manifesto with very little meat around a few good points.
I'd love to have access to various data sets that I know exist, because others have published their results and described the data collection. But they likely invested
Re: (Score:2)
Bravo.
There ARE big shared datasets, when it makes sense, from trustworthy sources. They tend to cost a lot to assemble, make available, and maintain. I'm starting a post doc at a new lab and they showed me one they're working on: the price tag was $40 million.
We also have a mechanism by which anybody can read scientific papers, for free, if they choose to put in a little effort. They're called libraries.
Yes, the journal publishers probably need to cut their prices now that nobody actually wants the prin
Euclids... (Score:2)
Is it a goal of science, or of religion, that the universe be made in such a way that it is easy to explain to humans?
Re: (Score:2, Informative)
Cumbersome... (Score:2)
I can't count the number of times I've seen attempts to 'standardize' data, or even just notation, in a given field. It all works very well for the data up to that point, but then the field expands or changes, or new assumptions become important, and the whole thing becomes either unwieldy or obsolete. This is one reason why every field, it seems, has its own standards in its literature.
Speaking of the literature, most of these proposals
Well Science excluding Maths and the Hard Sciences (Score:2)
Excluding experimental data, those fields don't really have the problem that this guy is talking about. Perhaps someone should give him/her a lesson in the Scientific Method. Then maybe his/her words would reflect some rigour. Well, that and a link to the arXiv (http://arxiv.org/).
Why is this so? Because these communities are so small that just about everyone knows, or knows of, everyone else('s work). Of course, that's a slight hyperbole. BUT, /just/ a *slight* one.
This sort of project only really a
arXiv!!!!! (Score:1)
By the way, it's quite funny to see all these guys telling somebody how to do his job better, especially when they have absolutely no idea what they're talking about.
Some nice sentences from the article:
-"It's taken me some time to learn how to read them"... what!!??
-"Because you're trying to present what happened in the lab one day as some fundamental truth", hahaaha, that one is good.
-"So what we need to do is both think about the way that we write those papers, and t
like, ummm (Score:2)
Re: (Score:2)
Yes. Hey, maybe we should all write our scientific papers that way!
Scientific data is neither free nor cheap... (Score:3, Insightful)
The problem with such raw data, e.g. from a radio telescope, is that you need all of it; you can't really cut any of it out before it's even processed.
This is a lot less of an issue today, with research networks all hooked into multi-gigabit pipes. But there are still very large datasets researchers are attempting to work with that are simply not cheap to handle.
I think this is a great idea, and it's nice being able to share data, but for the really sexy big research going on these days I don't see it being much of a point-click-download service!
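As a rough illustration of why the pipes matter (a sketch in Python; the link speed is an assumed figure, and real transfers rarely sustain line rate), moving a single petabyte takes days even on a fast research network:

    # One petabyte over an assumed 10 Gbit/s link, at full line rate.
    PETABYTE_BITS = 1e15 * 8
    LINK_BITS_PER_S = 10e9

    seconds = PETABYTE_BITS / LINK_BITS_PER_S
    print(f"~{seconds / 86400:.1f} days")  # about 9.3 days at best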
Re: (Score:2)
Grant applications in my field typically have at least one line item for "storage." It's not cheap.
Scientific papers are not "compressed" (Score:1)
From the article, regarding scientific literature: "Because you're trying to present what happened in the lab one day as some fundamental truth. And the reality is much more ambiguous. It's much more vague. But this is an artifact of the pre-network world. There was no other way to communicate this kind of knowledge other than to compress it."
A statement like this suggests that the speaker is either unfamiliar with the way scientific data is actually turned into papers, or inappropriately optimistic about the
Raw data is necessary for validation... (Score:1)
What matters? Is it the raw data? Is it the processed data? Is it the software used to process the data?
The original data is of paramount importance, software for processing and analysis not so much... Science requires the ability to independently redo experiments and analyze data... getting the same result IS the method of verification that makes the "Scientific Method" valid. Getting the same result using different tools for analysis is even better... Mann's "Hockey Stick" graph is one of the failures o