Google Begat the End of the Scientific Method?

TheSauce writes "In a fairly concise one-pager at Wired, editor Chris Anderson posits that all of our current (or now previous) models for collecting data are dead. The content is compelling. It notes that we've entered the Age of the Petabyte — where one can collect immense amounts of data that are paradigm agnostic. It goes on to add a comment from the head of Google's R&D that we need an update to George Box's maxim: 'All models are wrong, and increasingly you can succeed without them.' Have we reached a time where all of our tool-sets are now made moot by vast clouds of information and strictly applied maths?"
This discussion has been archived. No new comments can be posted.


  • Bullshit (Score:1, Interesting)

    by Anonymous Coward on Wednesday June 25, 2008 @11:53AM (#23935907)

    This might have been true if all of your data were within the same order of magnitude. But consider things like hyperfine structure. A petabyte is pretty large, but it is nothing compared to the number of samples needed to randomly cover the electromagnetic spectrum finely enough to detect hyperfine levels. When fields like physics deal with subjects spanning over 40 orders of magnitude, random sampling isn't going to displace intelligent sampling.
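
    The sampling argument can be sketched numerically. Everything here (the line position, width, and frequency range) is invented for illustration: a narrow feature in a huge search space is essentially invisible to blind random sampling, but trivial for a theory-guided scan.

```python
import random

# Hypothetical narrow spectral line somewhere in a range spanning
# 12 orders of magnitude (1 Hz to 1 THz); all numbers are made up.
line_center = 1.4e9   # a line near 1.4 GHz
line_width = 1.0e3    # 1 kHz wide

random.seed(42)

# Blind uniform sampling: the chance of one sample landing inside the
# line is width / range ~ 1e-9, so even a million samples will almost
# certainly all miss it.
hits = sum(
    1 for _ in range(1_000_000)
    if abs(random.uniform(1.0, 1.0e12) - line_center) < line_width / 2
)
print("blind-sampling hits:", hits)

# "Intelligent" sampling: a fine scan around the theoretically
# predicted center finds the line immediately.
scan = [line_center - 5e3 + i * 100.0 for i in range(100)]
hits_scan = sum(1 for f in scan if abs(f - line_center) < line_width / 2)
print("targeted-scan hits:", hits_scan)
```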

  • by peter303 ( 12292 ) on Wednesday June 25, 2008 @11:57AM (#23935973)
    There are still several computing problems from earlier, smaller eras that haven't been solved by the "more" paradigm. One example is realistic synthetic voice. The bandwidth is megabytes, achieved by MP3 players some years ago. However, voice is the last part of the "real world" we have to capture instead of synthesize to make fully computer-generated feature movies or video games. This keeps the need for having some "flesh" actors around, at least for a few more years :-)

    Then there was Slashdot's retrospective on Artificial Intelligence a few days ago. Many of the interesting advances were made in the kilobyte and megabyte eras. It seems the gigabyte and terabyte eras have barely made a dent in progress.
  • Re:Ahem (Score:5, Interesting)

    by eln ( 21727 ) on Wednesday June 25, 2008 @12:00PM (#23936037)

    It's simple really: the article seems to be saying that we have access to such a ludicrously large amount of data that trying to draw any real meaning from it is pointless. So, we employ a "shotgun" approach to reading the data, and voila, we get data that at least appears to be interesting.

    Of course, since we have no particular purpose in mind when we do this, and no particular method other than "random", we end up with mostly useless data (in the example given, we have a bunch of random gene sequences that must belong to previously unknown species, but we know nothing about those species other than that we found some random DNA that probably belongs to them, and have no particularly good way of finding out more).

    The article seems to be saying that since we have so much data, we can now draw correlations between different pieces of data and call it science. No reason is given why this is useful other than that we have so much of it, and Google is somehow involved. Apparently when you have enough data, "correlation does not equal causation" is no longer true. Again, no coherent reason is given for this stance.

    I think the article makes the same mistake a lot of ill-informed people who get excited by big numbers make: it seems to believe that data is in and of itself an end goal, when really vast amounts of data are useless unless they can help us as humans answer questions that we want answered. Yes, knowing that there are lots of species of organisms in the air that we didn't know about before is sort of interesting, I guess, but it doesn't really tell us anything useful.

    Above all, the article proves that you can be almost entirely incoherent and still get your article published in Wired if it says something about how Google is changing the world.
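
    The "shotgun" complaint is easy to demonstrate: comb through enough purely random series and a striking correlation always turns up. This sketch (with sizes chosen arbitrarily) finds an impressive-looking pairwise correlation in pure noise, where by construction nothing causes anything.

```python
import random
import statistics

random.seed(0)
n_vars, n_obs = 200, 20

# 200 series of pure Gaussian noise: meaningless by construction.
data = [[random.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]

def corr(x, y):
    # plain Pearson correlation coefficient
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

# Searching ~20,000 pairs guarantees some pair looks highly correlated.
best = max(
    abs(corr(data[i], data[j]))
    for i in range(n_vars)
    for j in range(i + 1, n_vars)
)
print(f"strongest correlation found in pure noise: {best:.2f}")
```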

  • by ColdWetDog ( 752185 ) * on Wednesday June 25, 2008 @12:08PM (#23936163) Homepage

    The article is utter nonsense. But it's such a rambling mess it's hard to know where to start picking it apart.

    I suppose you could start where he, again, tries to present the argument that correlation really is "good enough" - causation be damned. What he is blathering on about is that you can infer lots of things via statistical analysis - even complex things. That's certainly true. Where he fails (and it's an EPIC fail) is his assertion that this method is a general phenomenon, suitable for everyday use.

    The other major failure of TFA is that I can't find a car analogy anywhere.

  • Re:Ahem (Score:5, Interesting)

    by nine-times ( 778537 ) <nine.times@gmail.com> on Wednesday June 25, 2008 @12:19PM (#23936355) Homepage

    Yeah, I don't know what "paradigm agnostic" means specifically, but I think it's a mistake to think that "data is data".

    Not all data is created equally. You have to ask how it was collected, according to what rules, and with what purpose. I can collect all sorts of data by stupid means, and have it be unsuitable for proving anything. It's even possible that I could collect a bunch of data in an appropriate way, accounting for the variables which matter for my particular experiment, and have that data be inappropriate for other uses.

    Of course, if what's intended by "paradigm agnostic" is that we no longer pay attention to those things, then I hope we're not becoming paradigm agnostic. I'm just bringing this up because I think some people think numbers don't lie, and that when you analyze data, either your conclusions will be infallible or your analysis is flawed. On the contrary, data can not only be bad, but it can be inappropriate.

  • Re:Ahem (Score:4, Interesting)

    by MightyMartian ( 840721 ) on Wednesday June 25, 2008 @12:20PM (#23936361) Journal

    It's an idiotic notion. We've had vast amounts of data for well over a century now, more than we can hope to fully measure and catalog in a lifetime. Everything from fossils to space probe readings to seismic measurements fills up data archives, in some cases literally warehouses full of data tapes, artifacts and paper. The way you deal with this sort of thing never changes. Provided the data is stored in a reasonable fashion, if you have a theory, you can go back and look at the old measurements, artifacts, bones, whatever, and test your theory against the data. The only difference is that rather than going out and making the observations yourself, you're using someone else's (or those of some computer that just transmitted its data).

  • Re:Say what now? (Score:1, Interesting)

    by Anonymous Coward on Wednesday June 25, 2008 @12:25PM (#23936483)

    For a long time we've known that causality is a broken paradigm. Correlation is all there really is. Your "causal" laws of physics are just an expression of very very high correlation. People like to talk about "mechanisms" but the mechanism is defined in terms of other imponderables (such as "forces", whatever they may be). It's all just to make things look like how we want them to look. Causation is make-believe. Useful make-believe, but it doesn't generalize, while correlation also extends down into complex systems where "cause and effect" are impossible to observe.

    As an example, we know perfectly well that if you smoke you are *a lot* more likely to get lung cancer than if you don't smoke. But there is no evidence whatsoever that smoking causes lung cancer. The problem is not that we can't prove that smoking causes lung cancer, but that our concept of causation does not apply to systems as complex as the human body.

    So in the words of the original article, the Scientific Method in that sense has been dead for at least 100 years.

  • Foundation Series (Score:3, Interesting)

    by __aagmrb7289 ( 652113 ) on Wednesday June 25, 2008 @12:27PM (#23936527) Journal
    Just finished rereading the Foundation series for the one millionth time. Anyone remember some of the signs of the decay of the first empire? The idea that these "scientists" were no longer experimenting, no longer looking for new ways to do things - just spending their time looking at old books and old experiments and trying to squeeze a "new" thought or two out of them? That a sociologist would study a society through books written about it? An archeologist would explore the ruins of a world by reading descriptions written by someone centuries before?

    Anyway, catching the parallels here? The "search engine" is a great tool for gathering existing data - but our current tools help us:

    1. Analyze that data
    2. Gather more data

    Can you honestly say that those aren't important anymore? The summary seems pretty crazy to me.
  • by maxume ( 22995 ) on Wednesday June 25, 2008 @12:27PM (#23936531)

    There are third party apps to add similar functionality to XP. Launchy is the one I use:

    http://www.launchy.net/#download [launchy.net]

    I think they are all clones of some Mac app though.

  • Re:Ahem (Score:5, Interesting)

    by Randle_Revar ( 229304 ) * <kelly.clowers@gmail.com> on Wednesday June 25, 2008 @01:14PM (#23937273) Homepage Journal

    Just undoing a slip of the mouse moderation.
    That's one disadvantage of the current mod system - no chance to fix mistakes

  • by javilon ( 99157 ) on Wednesday June 25, 2008 @01:14PM (#23937277) Homepage

    Well, I think the point they make is that with these kinds of mathematical tools running against these huge sets of data, you get models out that you couldn't have thought of yourself. This is real AI. In the last few days we had entries here on Slashdot about how AI is not advancing, but this kind of thing is very advanced AI, and it is new.

    I'll explain myself. The biggest job that a brain does (let's not consider a human brain, so we don't get into the consciousness/mind type of conversation) is to find statistical correlations in the input data and extract models from those correlations that can be used to predict the future. This is exactly what these tools are doing.

    Before these tools, by looking at the data you would go: hmmm, this is interesting, let's check it out. That is, you would come up with a model and try to find out if it predicts the data. Then we started to use computers to check our models, and from what this WTFey article says, it is now the computer coming up with the model, starting from raw data.
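
    A toy version of "the computer coming up with the model": given raw observations from a hidden rule, an automatic search over candidate model forms picks whichever fits best. The hidden rule and the candidate set are, of course, made up for illustration.

```python
# Hidden rule (which the "scientist" never sees) and the candidate
# models are all invented for this sketch.
xs = [float(i) for i in range(10)]
ys = [3.0 * x * x + 1.0 for x in xs]

candidates = {
    "linear":    lambda x, a, b: a * x + b,
    "quadratic": lambda x, a, b: a * x * x + b,
}

def fit_error(f):
    # crude grid search over two parameters, standing in for real regression
    best = float("inf")
    for a10 in range(51):
        for b10 in range(51):
            a, b = a10 / 10.0, b10 / 10.0
            err = sum((f(x, a, b) - y) ** 2 for x, y in zip(xs, ys))
            best = min(best, err)
    return best

# The model is "discovered" purely by how well it predicts the data.
winner = min(candidates, key=lambda name: fit_error(candidates[name]))
print("model selected from the data alone:", winner)
```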

  • by LrdDimwit ( 1133419 ) on Wednesday June 25, 2008 @01:23PM (#23937393)
    You can't get good data unless you control the makeup of your data population. Even if you applied this technique to all the data in the cloud, it wouldn't mean the "end of the scientific method", it would be scientifically studying the cloud.

    So no. Even if everything he wrote is all true, you still apply science to study things, just in a different way. The internet doesn't make science obsolete any more than it made economics obsolete, and saying otherwise is as much hubris now as it was then.
  • by fictionpuss ( 1136565 ) on Wednesday June 25, 2008 @01:39PM (#23937697)

    Learning how to look for correlation in huge uncontrolled data sets will require a new paradigm... or it will ultimately be useless and perhaps even unsuccessful.
    The ability to find statistically significant correlations (i.e. not Mary-in-Toast) in huge datasets is a prerequisite.

    But that goes for any visualisation technique - look to Edward Tufte or Stephen Few for detailed examples of how even the simple xy-graph can be abused.
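
    The "statistically significant, not Mary-in-Toast" point largely comes down to correcting for how many things you looked at. In this minimal simulation (test count and threshold chosen arbitrarily), naive p < 0.05 testing over thousands of null hypotheses yields a flood of false positives, while a Bonferroni-style corrected threshold suppresses them.

```python
import random

random.seed(1)
n_tests, alpha = 20_000, 0.05

# Null-only "experiments": every p-value is just uniform noise.
pvals = [random.random() for _ in range(n_tests)]

# Naive thresholding lets through roughly n_tests * alpha false hits;
# dividing the threshold by the number of tests (Bonferroni) does not.
naive_hits = sum(p < alpha for p in pvals)
bonferroni_hits = sum(p < alpha / n_tests for p in pvals)
print("naive 'significant' findings:", naive_hits)
print("after Bonferroni correction:", bonferroni_hits)
```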

  • Re:Ahem (Score:3, Interesting)

    by sm62704 ( 957197 ) on Wednesday June 25, 2008 @01:42PM (#23937747) Journal

    all of our current (or now previous) models for collecting data are dead.

    I guess I have to R this FA. ALL the models for data collection? No more controlled double-blind studies?

    It notes that we've entered the Age of the Petabyte -- where one can collect immense amounts of data that are paradigm agnostic.

    Science has always at least tried to be paradigm agnostic. It can't always succeed of course, but I don't see how... Ok, I guess I'd better RTFA.

    OK, I'm back. The article is horseshit. It is a whole bunch of words that add up to essentially what the summary said, only in a really long winded fashion.

    "No theory needed, now we have models". How do you make the model without theory?

    Have we reached a time where all of our tool-sets are now made moot by vast clouds of information and strictly applied maths?"

    No. In the first place, no data that comes from the internet can be taken at face value (and this Wired article is a good example of how the internet is full of crappy data). Secondly, I hate inaccurate yuppiespeak, like talking about "clouds of information." It's stupid. Information doesn't gather in clouds; it's gathered in big heaps of paper and on hard drives and optical disks. The only clouds are the clouds of crack smoke surrounding the heads of the people who say things like "clouds of information".

    We still use the same tools to analyse data. We just have more data to analyse. The scientific method itself is nowhere near dead.

    Oh, and the parent is not offtopic - It hit the nail on the head. I guess a Wired editor had mod points.

  • Re:Say what now? (Score:3, Interesting)

    by DamnStupidElf ( 649844 ) <Fingolfin@linuxmail.org> on Wednesday June 25, 2008 @01:49PM (#23937833)
    There are two reasons you're wrong. One is entropy, and the other is one-way functions.

    Entropy forces causality to appear in physical systems. A boiled egg is highly correlated with a heated raw egg, but I challenge you to explain away the causation from one state to the other.

    One-way functions are quite similar, and probably a result of the same physical properties of matter. When a key is used to encrypt data, there is a high correlation between the original data, the key, and the encrypted data, but causation clearly flows from encrypting the data with the key to the encrypted state, and not from the encrypted state back to a derived key and the original data. It may be just a limitation of human (and machine) abilities, but it nevertheless presents very strong evidence for the practical existence of causation.
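
    The forward/backward asymmetry can be shown with a toy brute force. This sketch uses a hash rather than a real cipher, and a deliberately tiny three-letter "secret," so it is only an analogy: the forward direction is one call, the backward direction is exhaustive guessing.

```python
import hashlib

# Toy one-way asymmetry: forward is one call; backward is brute force.
# The three-letter lowercase "secret" is tiny so the search finishes.
secret = b"xyz"
digest = hashlib.sha256(secret).hexdigest()   # forward direction: instant

def brute_force(target):
    # backward direction: try every 3-letter lowercase string
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    trials = 0
    for c1 in alphabet:
        for c2 in alphabet:
            for c3 in alphabet:
                trials += 1
                guess = (c1 + c2 + c3).encode()
                if hashlib.sha256(guess).hexdigest() == target:
                    return guess, trials
    return None, trials

found, trials = brute_force(digest)
print(found, "recovered after", trials, "guesses")
```

    Each extra character multiplies the search by the alphabet size, which is why realistic key lengths make the reverse direction practically impossible.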
  • by Hoplite3 ( 671379 ) on Wednesday June 25, 2008 @02:03PM (#23938043)

    I must admit, as an applied mathematician who makes models of physical things for a living, this sort of research threatens to steal my bread and butter. It may be self-centered, but I think modeling is, besides experiment, half of science.

    Simplified models are so valuable to our understanding because they tell us what information we can remove, which parts of a problem are important and which parts may be ignored. They allow us to not just make predictions, but they guide future experimentalists as to what sorts of changes will impact the system and which won't.

    To be fair, it's more of a cycle: experiments generate data, models are constructed to explain the data. These models make predictions (and hopefully useful simplifications) that can be checked by further experiments to validate them. At the end of the process, we've produced a clearer picture of how a system works. Enough information maybe for someone building something slightly different to not have to test the aspects covered by the model.

    I view these data-mining techniques like the scientific computing techniques of the last 30 years or so, only the inverse. Sci Comp nerds wanted to do away with experiments. They thought they could numerically simulate (relatively) exact models (like Navier-Stokes for fluid motion rather than one of its more tractable, understandable simplifications) and use the generated data instead of experimental data. The trouble was that no one will believe that the crazy new phenomenon discovered by your program is real until they see it in the lab, until they construct a simplified model that has the same behavior -- i.e. the same science as before.

    The new data-mining idea is the same, but for the modeling end of things. "No models, please," they say. They'll just data-mine the experimental results and "discover" whatever the model missed. Except people will want to do experiments to verify the discovery. They'll want to build models so they can know they're doing the right experiments, and so on.

    At the end, I think Sci Comp and data-mining are fantastic new tools that have a lot to offer science, but I don't think either eliminates the need for old fashioned modeling.
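
    The experiment/model/validation cycle described above can be sketched as fit-then-validate. The hidden linear law and noise level here are invented for illustration: a model fitted on one batch of "experimental" data is then checked against an independent second batch.

```python
import random

random.seed(7)

def experiment(n):
    # nature's hidden law: y = 2x + 5, plus measurement noise (made up)
    xs = [random.uniform(0, 10) for _ in range(n)]
    return [(x, 2.0 * x + 5.0 + random.gauss(0, 0.1)) for x in xs]

def fit_line(data):
    # ordinary least squares for slope and intercept
    n = len(data)
    sx = sum(x for x, _ in data)
    sy = sum(y for _, y in data)
    sxx = sum(x * x for x, _ in data)
    sxy = sum(x * y for x, y in data)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return slope, (sy - slope * sx) / n

slope, intercept = fit_line(experiment(50))   # model from the first run
new_data = experiment(50)                     # independent second run
rmse = (sum((slope * x + intercept - y) ** 2 for x, y in new_data)
        / len(new_data)) ** 0.5
print(f"fitted y = {slope:.2f}x + {intercept:.2f}, held-out RMSE = {rmse:.3f}")
```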

  • Comment removed (Score:3, Interesting)

    by account_deleted ( 4530225 ) on Wednesday June 25, 2008 @06:20PM (#23941865)
    Comment removed based on user account deletion
