Why the Cloud Cannot Obscure the Scientific Method 137
aproposofwhat noted Ars Technica's rebuttal to yesterday's story about "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete." The response is titled "Why the cloud cannot obscure the Scientific Method," and is a good follow-up to the discussion.
datasource != process (Score:5, Insightful)
Because a datasource isn't a process?
missing link (Score:4, Insightful)
I like the fact that the web and search/aggregate engines may combine vast amounts of data in ways we now cannot imagine - it expands the field for new scientific research enormously. Replace science? No.
It's a good rebuttal (Score:5, Insightful)
I'd say that the models are the science. They're how you explain your data. They provide evidence that the experiments make sense, and they guide you by making predictions you can test.
Moreover, SIMPLIFIED MODELS are good science. Understanding which details can be omitted without impacting the predictive ability of your model shows you know which effects are important and which aren't.
All models are wrong, but .... (Score:4, Insightful)
All models are wrong, but some are useful.
We still need scientific methods to develop useful models and to understand and refine the existing ones. When Newton defined his mechanics, that was the state of the art in his era; now we have progressed to quantum mechanics, which might itself be refined tomorrow.
But mere observation of some phenomena is not sufficient to postulate the behaviour in a changed condition. A scientific model and its rigorous application are required for this. Correlations drawn from the cloud cannot substitute for it.
gopla
Marketing is not a Science (Score:5, Insightful)
Mr. Anderson was not prescient in any way; he was just giving his perspective. We must be careful about even considering his proposition as a valid reality worth pursuing. Not for the sake of true scientists, but from a social perspective, or it will truly be the end of science. There are some in power who are already attempting to make this happen.
That said, I almost consider responding to yesterday's article as falling for the argument. But, since it hit the
Duh! (Score:5, Insightful)
I agree, but... (Score:4, Insightful)
What you say is true, Hoplite3. The big issue I see is how people define "model". My guess is that quite a few unfortunately define it as "I got 3 asterisks in the significance test", whether the "model" (say, linear regression) makes sense or not.
I forget where I read it, but I've been studying linear regression, and there was a fascinating example where, if they had used linear regression techniques on the early "drop the cannonball and time its fall" data, they would have come up with a nice, highly significant linear regression for gravity.
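A minimal sketch of that cannonball example, in Python. This is not the source the poster half-remembers: the drop times, the noise-free distances, and the `linear_fit` helper are all assumptions made up for illustration. It fits a straight line to distances generated from d = g*t^2/2 and reports how good the fit looks by R^2, even though the straight-line model is the wrong one.

```python
G = 9.81  # m/s^2, standard gravity (assumed; air resistance ignored)


def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, 1.0 - ss_res / syy


# Hypothetical drop times (seconds) and the distances the true law gives.
times = [1.0 + 0.1 * i for i in range(11)]       # 1.0 s .. 2.0 s
distances = [0.5 * G * t ** 2 for t in times]    # d = g*t^2/2

a, b, r2 = linear_fit(times, distances)
residuals = [d - (a + b * t) for t, d in zip(times, distances)]

print(f"fit: d = {a:.2f} + {b:.2f} * t, R^2 = {r2:.4f}")
print("residual signs:", "".join("+" if r > 0 else "-" for r in residuals))
```

The R^2 comes out very high, yet the residuals are positive at both ends and negative in the middle, which is the signature of a missing quadratic term. A significance test on the slope alone would not flag that.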
Then there is the whole issue of explanation versus prediction. Something can be predictive while providing no explanation, and perhaps that's where the petabyte idea is going: who cares about explanation if prediction is accurate enough? (Not my philosophy, BTW.)
Re:Don't blame the author's incompetence (Score:4, Insightful)
Truly what yesterday's article was saying is that causation or correlation is meaningless if you have a mimic of the real world in the form of a collection of data. You don't need a model that is accurate or valid or anything. You just need to run the data in the exact replica of reality. This is the simulacrum. The first problem is that data does not just run itself. At the least it needs an algorithm to be processed to a result. That's the model; without it, it's just useless data, as was already mentioned in yesterday's comments. But second, the problem with even ATTEMPTING such an idea is that you lead yourself into a situation where you "predict" the future and then operate to become that future, thus destroying the creative nature of humanity and becoming the self-fulfilling prophecy of machine code!
Keep in mind I speak mostly of social sciences that try to pattern human behavior. For hard sciences, etc., all you have done is created a simulation of reality, but it tells you nothing about the reality. It merely mimics it. There is no insight in creating a map the size of the United States; at best it is a work of art.
knowledge != understanding (Score:4, Insightful)
I have a problem with the Google generation. Sure, they can parrot facts and find things in an instant, as can any slashdotter I'm sure, but knowing something is not the same thing as understanding something.
A coworker asked me yesterday "how do you call a C++ class member function from C [or Java]?" The question is an example of pure ignorance.
If they "understood" computer science, as a profession, this would be a trivial question, like how do I or can I declare a C function in C++. The second question is what Google can help you with, while having to ask the first question means you are screwed and need to ask someone who understands what you do not. Not understanding what you do for a living is a problem.
How programs get linked, how environments function, virtual machines vs pure binaries, etc. These are important parts of computer science, just as much as algorithms and structures. You have to have a WORKING knowledge of things, i.e. an understanding.
Google's ease of discovery eliminates a lot of the understanding learned from research. Now we can get the information we want, easily, without actually understanding it. IMHO this is a very dangerous thing.
Re:I agree, but... (Score:5, Insightful)
Thank you. Sure, there's a ton of data out there, but how was it collected? What statistical methods were used to analyze the data? How did you select the data set you're analyzing? Nothing I understand about science really applies to data mining a so-called "cloud". Prediction without explanation is just observation. Observation in and of itself is not science. You might have data, but is it the right data?
I see all this petabyte stuff as interesting and even as a valuable adjunct to real science, but a basic requirement of science is reproducibility and you can't reproduce the data collection.
Re:Rise of Engineering over Science? (Score:3, Insightful)
I have a theory that some of the best engineers are scientists, and some of the best scientists are engineers.
Scientists often need to build crazy stuff to figure things out, and engineers often need to figure things out to build crazy stuff. Because they are each results-oriented, they don't get hung up on the things that someone in the field would.
science-open , clouds-? (Score:3, Insightful)
Science and openness go together.
Without openness, we are all reinventing private wheels, whose plans we destroy when there is no profit.
If you work in software, consider for a moment how scientific your work is, considering the work of other companies doing similar work.
This Clouds thing is the "billion monkeys/humans typing on keyboards" model.
Yes, it really can work (with humans).
But, as with science, the chaos development model only works with openness.
Of course, organized science along with a little chaotic development would work even better.
There are forces in our society that do not like any open model. The Microsofts, the MPAA, the RIAA. These types of organizations thrive on closed models. More copyright controls, more DRM, longer copyright and patent terms.
These forces would prefer to own, control and close science and clouds of data. They are unaware of the inevitable impact of such actions.
In a free capitalist society, we are naturally driven by contrary forces.
A desire to hide discoveries, to maximize profits, even at the expense of innovation.
A desire to share discoveries, to contribute to society and for credit.
While it is possible to profit when ideas are shared,
it is more difficult to contribute to society by hiding information indefinitely.
Re:Correlation is not causation (Score:3, Insightful)
In science, the phrase usually used is "correlation does not imply a specific causation." It does, of course, imply some correlation and most of modern science is noticing correlations and testing for causation.
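A rough sketch of "noticing a correlation and testing for causation", in Python. Everything here is an illustrative assumption: the hidden common cause `z`, the noise levels, the sample size, and the `pearson` helper. Two variables that never influence each other correlate strongly because both depend on `z`; when `x` is set independently, as an experimenter would do, the correlation with `y` disappears.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Observational data: z is a hidden common cause of both x and y.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]
r_observed = pearson(x, y)

# "Experiment": x is now set by us, independently of z; y still follows z.
x_set = [rng.gauss(0, 1) for _ in range(n)]
y_after = [zi + rng.gauss(0, 0.5) for zi in z]
r_intervened = pearson(x_set, y_after)

print(f"observed correlation:   {r_observed:.2f}")
print(f"after setting x freely: {r_intervened:.2f}")
```

The observed correlation is real, and it does point to some cause (here `z`), just not to the specific one a reader of the raw data might guess. Only the controlled step separates the two.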
Re:All models are wrong, but .... (Score:5, Insightful)
All models are wrong, to some degree. A better way to put it is all models are imprecise, but some are precise enough to be useful. 'Wrong' is a very flexible word and can easily lead to a misunderstanding in this context.
Too much information can be a bad thing too (Score:1, Insightful)
Another point missed here is that background noise can obscure real results. Much of the data cloud is utter garbage. Picking out the useful information is often a complicated and difficult process; in some cases it's easier to just go and do the measurement yourself. I've heard the "a few days in the library can save you weeks at the bench" about as often as the reverse. I think they're both true.
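One way to see how garbage in a large data set can pass for a result, sketched in Python under made-up assumptions (20 samples, 1000 unrelated noise columns, a `pearson` helper written for this example). Every column is pure random noise with no relation to the target, yet the best-looking one still correlates with it noticeably.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)  # fixed seed so the run is repeatable
n_samples = 20           # few measurements per variable
n_columns = 1000         # many unrelated variables in the "cloud"

target = [rng.gauss(0, 1) for _ in range(n_samples)]

# Correlate the target with each noise column and keep the strongest match.
best = 0.0
for _ in range(n_columns):
    noise = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = max(best, abs(pearson(target, noise)))

print(f"strongest correlation found in pure noise: {best:.2f}")
```

With enough columns to search through, a strong-looking match turns up by chance alone, which is why a fresh measurement at the bench can be cheaper than sifting.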
-sk
Re:FYI (Score:1, Insightful)
The worst part would be that we can 'leave out mass' in the E=mc^2 formula because the total amount of mass in the universe is so tiny (it isn't?).
So let's assume mass = 0 for all things (even though that makes no sense at all, she thinks it does). That means E = 0*c^2 = 0 -> there is no energy. Since she claims homeopathy works by changing your energy, AND that people have no energy (because they have no mass (?!) and Einstein's formula E=mc^2 applies), homeopathy cannot work.
Yes, but... (Score:1, Insightful)
Chris' article was nonsense, and the Ars article shows very well, at least, that Chris has drawn some inappropriate conclusions regarding "the Cloud" by citing contradictions in the very article that was posted on Wired. However, I found another article (link below), written apparently by a Physics Ph.D. student, that goes into a little more depth regarding the nature of Chris' misunderstanding. He raises the question: is what Chris is referring to actually "knowledge"?
http://thatsprettylame.blogspot.com/2008/06/end-of-reason-why-data-deluge-will-not.html
Wired + Ars Technica owned by same company (Score:1, Insightful)
They are both owned by Conde Nast. It's sorta funny seeing them duking it out. I believe Ars Technica has a better team of journalists than current-day Wired. Wired is pretty much run by graphic designers now...
Actually, He seems to support a weak version... (Score:2, Insightful)
Using big words to explain something simple (Score:2, Insightful)
From a junior high school site about the scientific method:
"Six steps of the S. M.
State the problem: Why is that doing that? Or Why is this not working?
Gather information: Research problem and get background info
Form a hypothesis: a possible explanation for the problem using what you know and what you observe.
Test the hypothesis: Make observations, build a model and relate to real-life or experiment.
Experiment: testing the effects of one thing on another using controlled conditions.
Variable: a quantity that can have more than a single value. (Dependent vs independent)
Constant: a factor that does not change when other variables change.
Control: the standard by which the test results can be compared
Analyze data: recording data and organizing it into tables and graphs.
Draw conclusions: based on your analysis of your data, you decide whether or not your hypothesis is supported."
This "cloud" is just a buzzword for massive amounts of data collected for no good reason other than to collect it, i.e. before you form a hypothesis. Using this junior high model, a hypothesis is created from observation (seeing a correlation in the data), then you go back to the data or collect more data to prove or disprove that hypothesis.
Massive amounts of data and algorithms that sift through it are TOOLS in the box for performing the scientific method. They don't replace it.
I think his argument would be better if he stated that these tools, in certain cases, allow you to reasonably prove and create a hypothesis in a single step.
It was an easy job, really. (Score:2, Insightful)