
Busting the MythBusters' Yawn Experiment

markmcb writes "Most everyone knows and loves the MythBusters, two guys who attempt to set the story straight on things people just take for granted. Well, maybe everyone except Brandon Hansen, who has offered them a taste of their own medicine as he busts the MythBusters' improper use of statistics in their experiment to determine whether yawning is contagious. While the article maintains that the contagion of yawns is still a possibility, Hansen is clearly giving the MythBusters no credit for proving such a claim, 'not with a correlation coefficient of .045835.'"
  • by EvanED ( 569694 ) <{evaned} {at} {gmail.com}> on Monday April 23, 2007 @09:37PM (#18848573)
    The example I like to use, though apparently they revisited this one (I "can't" afford cable unfortunately), is the one where they were trying to figure out whether the aerodynamic drag of running your car with the windows down was greater than the engine load of running the A/C.

    But to test this, they used SUVs (if you are concerned about fuel efficiency, are you driving one?) going at about 40 mph (air drag I think increases by the square of the speed at those speeds, so highway speeds could significantly change the results), and, most stupidly, running the A/C cold enough that Jamie was commenting that he was glad that he was wearing a fairly heavy jacket and (IIRC) a scarf!

    Yeah, real useful result that test was.
  • Not quite, OmniNerd (Score:5, Informative)

    by Miang ( 1040408 ) on Monday April 23, 2007 @09:38PM (#18848587) Journal
    TFA's conclusion is correct but their methods are wrong. For this kind of data, correlations aren't the appropriate test; they should have used a chi-square test. Using TFA's assumptions -- total sample size of 50, 4 yawners out of 16 not seeded, 10 yawners out of 34 seeded -- the chi-square value is .10, which falls well short of the 3.84 critical value needed for significance. Not that it matters anyway, but it's pretty funny to read an article debunking statistics that employs inappropriate statistics itself...
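
    For anyone who wants to check the numbers, here is a quick sketch of that calculation in Python with scipy (the counts are the ones assumed above, and the Yates continuity correction has to be switched off to reproduce the hand-calculated statistic):

        # Pearson chi-square on the 2x2 yawn table (a sketch, not from TFA).
        from scipy.stats import chi2_contingency

        table = [[10, 24],   # seeded:     10 yawned, 24 did not
                 [ 4, 12]]   # not seeded:  4 yawned, 12 did not

        chi2, p, dof, expected = chi2_contingency(table, correction=False)
        print(chi2, p)       # ~0.105 and p ~0.75, nowhere near the 3.84 cutoff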
  • by Ichoran ( 106539 ) on Monday April 23, 2007 @09:49PM (#18848681)
    Not only was MythBusters embarrassingly statistics-free, but the "busting" was done using a wholly inappropriate statistical technique. Hansen used a correlation-based test, which assumes that the data follows a Normal distribution (which a bunch of 1s and 0s do not).

    There is a very well-known test, the chi-square test, that deals with exactly this case. (Given the small sample sizes, the Fisher exact test may give better results.) Someone should point Hansen to the Wikipedia page on the topic.

    For example, if there are 16 non-primed people, with 4 yawning and 12 not (for 25%), and there are 34 primed people, with 10 yawning and 24 not (for 29%), the chi square test gives a p value of 0.74.

    The values Hansen supposes are significant (4,12 and 12,24) are not: p = 0.29.

    You have to go all the way to 4,12 and 17,19 (i.e. 47% on a sample of 36) to get significance.

    MythBusters was wrong to conclude that their results were significant, but Hansen was equally wrong to conclude that he had shown that MythBusters was wrong.
  • by ingo23 ( 848315 ) on Monday April 23, 2007 @09:58PM (#18848733)
    Actually, the article shows only a basic understanding of statistics. Correlation is indeed a measure of the relationship between two variables, but it's only part of the picture. Yes, a correlation of 0.04 is far from an obvious dependency, but that's not the point.

    The MythBusters' numbers may mean that someone is 20% more likely to yawn if seeded. Now, what's important is to evaluate the margin of error for that statement given the sample size.

    Where the article is definitely wrong is in claiming that the sample size does not change anything. The sample size essentially governs the probability of error: the larger the sample, the more confident you can be that the statement "someone is 20% more likely to yawn if seeded" is true. At their sample size, however, it is quite possible that the margin of error is comparable to that 20% difference, which would invalidate the experiment.

    The detailed calculations for a sufficient sample size are left as an exercise for the reader.
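
    (Hint for the impatient, as a sketch only: assume the usual normal approximation for two proportions and take 25% vs. 29% as the rates in question.)

        # Rough sample size needed to detect a 25% vs 29% yawn rate at
        # alpha = 0.05 (two-sided) with 80% power. Illustration only.
        from math import ceil

        p1, p2 = 0.25, 0.29            # unseeded vs. seeded yawn rates
        z_alpha, z_beta = 1.96, 0.84   # standard normal quantiles

        n_per_group = ((z_alpha + z_beta) ** 2
                       * (p1 * (1 - p1) + p2 * (1 - p2))
                       / (p1 - p2) ** 2)
        print(ceil(n_per_group))       # roughly 1900 people per group, not 50 total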

  • by gumbi west ( 610122 ) on Monday April 23, 2007 @10:02PM (#18848773) Journal
    You were actually right that it's Fisher's exact test that you want; it's similar to doing a complete permutation test, which is exact. Because this is a 2x2 table, there's no reason not to use the exact test. The actual result has a p-value of 1.0 in a two-tailed test (whoops!), and even 4,12 and 17,19 has a p-value of 0.22 in the two-tailed test. Indeed, it would have to go all the way to 4,12 and 21,15 to be significant at the 5 percent level in a two-tailed test. The two-tailed test is the right one because you had better believe that they would have made a big stink if it had come out the other way!

    But all this aside, I'm not sure I like the experiment. Why bore people? Why have so many in the room? The 4,12 number is way too high; I'd say they were better off looking at narrow time slices and natural yawns (i.e., do yawns happen at random, or do they set off avalanches?). Then there is only one group and you're just testing the Poisson-process assumption of uncorrelatedness.
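
    If you want to reproduce the exact-test number, here is a quick sketch with scipy on the same 10/34 vs. 4/16 counts assumed upthread:

        # Fisher's exact test on the observed 2x2 table (illustration only).
        from scipy.stats import fisher_exact

        table = [[10, 24],   # seeded:     10 yawned, 24 did not
                 [ 4, 12]]   # not seeded:  4 yawned, 12 did not

        odds_ratio, p = fisher_exact(table, alternative='two-sided')
        print(p)             # 1.0: the observed split is the most probable table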

  • Re:Science (Score:5, Informative)

    by Excors ( 807434 ) on Monday April 23, 2007 @10:04PM (#18848781)

    Unfortunately I can't find the name of a program that aired in the UK about 6 months ago. It took a team of 4 people to a deserted island, and each week they each had a task to complete; they were only allowed to use what was on the island and what was given to them each week (as well as a tool set, because, well, no tools = screwed). They had to do things like make fireworks, record a song and various other "minor" things, which required them to render down various materials to obtain the chemicals they needed to complete each task. What they did and what it resulted in was very clearly labeled, with the real science behind it explained.

    Would that be Rough Science [open2.net]? In particular, it sounds like the second series [wikipedia.org]. I've seen a couple of the series over the past few years, and I believe it did a pretty good job of being a science show – the interest comes from watching people who actually know what they're doing, designing and building ingenious solutions (admittedly with very convenient tools and materials available) to problems that aren't inherently interesting (like making toothpaste or measuring the speed of a glacier), rather than relying on 'interesting' problems that are large/dangerous/explosive and lacking focus on the solution process.

  • by Frank Battaglia ( 787673 ) on Monday April 23, 2007 @10:40PM (#18849105)
    YMMV (hah!), but in my last car, the AC was either "on" or "not on," and temperature was controlled by mixing the cooled air with heat from the engine to a varying degree (the standard heater). You may have a little dial on the interface that you think is adjusting "how much cold," but the reality and the energy consumption may be functionally quite different. In other words, whether the AC is set to full cold or something milder does not always have an appreciable effect on gas consumption. Just something to think about.
  • by wesmills ( 18791 ) on Monday April 23, 2007 @11:04PM (#18849321) Homepage
    They have stated both on the show and in other interviews that a lot more testing goes on than just what we see on the show. For the "showcase" experiment on each show (the one that opens and closes the program), the producers have taken to placing video of most or all of the tests on their Discovery website: http://www.discovery.com/mythbusters [discovery.com]
  • Whoa there... (Score:3, Informative)

    by Winawer ( 935589 ) on Monday April 23, 2007 @11:13PM (#18849393)
    Well, a chi-squared test would have worked too, but so would Phi correlation (a correlation between two dichotomous / binary variables), which can be computed exactly the same as ... Pearson correlation, which TFA used. In fact, if you take the chi-square value you worked out to a few more decimal places: 0.10504 (from R), divide by 50 (=N, the sample size), and then take the square root, you get 0.046, which is the phi (and hence Pearson) correlation coefficient for the TFA's data. I can't tell if OmniNerd knew this or if s/he got lucky, but there's nothing wrong with the test employed.

    So, TFA's conclusion was correct, and so - whether intentionally or not - was the method. It's just not nearly as common as the chi-squared test for a 2x2 table.
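
    A quick way to convince yourself of that equivalence (a sketch, rebuilding the 50 subjects from the counts assumed above as 0/1 indicator variables):

        # Phi coefficient == Pearson r on binary data.
        import numpy as np

        seeded = np.array([1] * 34 + [0] * 16)                # 1 = shown a yawn
        yawned = np.array([1] * 10 + [0] * 24 + [1] * 4 + [0] * 12)

        print(np.corrcoef(seeded, yawned)[0, 1])   # ~0.0458, TFA's coefficient
        print(np.sqrt(0.10504 / 50))               # same number via sqrt(chi2 / N)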
  • by tlhIngan ( 30335 ) <[ten.frow] [ta] [todhsals]> on Tuesday April 24, 2007 @12:27AM (#18850123)

    It always seems to me that their conclusions are specious. I can't think of any specific episodes right now, but they oversimplify the data, build elaborate setups that are prone to error, and use inadequate controls.


    Or, more likely, test a subset and claim it applies to the entire set.

    E.g., the Cell phones on a plane [wikipedia.org] episode, where they claim a cellphone will not interfere with the avionics of a jet. Unfortunately, that is only true for the cellphones they tested and the jet they tested; it doesn't apply to the general case (sure, they did find 800MHz phones interfered, but modern phones don't...). The IEEE did a more elaborate series of experiments [ieee.org] and found some very surprising results, ranging from loss of GPS satellite lock to instrumentation drift. Or heck, I've even heard interference from cellphones while flying (the characteristic buzzing was easily heard over the radio).
  • by Bronster ( 13157 ) <slashdot@brong.net> on Tuesday April 24, 2007 @12:43AM (#18850277) Homepage
    No, it's 2. When in doubt round even.

    The mean of 1.0 and 4.0, on the other hand, is 2.5. That assumes you know your '1' and '4' above to that level of accuracy; otherwise your 4 in particular could be anywhere from 3.0...1 to 4.9...9, because it might only be accurate to half a digit.
  • by jandrese ( 485 ) <kensama@vt.edu> on Tuesday April 24, 2007 @01:42AM (#18850673) Homepage Journal
    Did you watch the same one I did? The conclusion I got from it was "both the Hydrogen and the Paint had something to do with it." They were debunking the myth that the Hydrogen had little or nothing to do with the fire and it was the paint that did the Hindenburg in.

    That's one thing I've seen a lot online. People watch maybe 80% of the show and then go online and say "they made a gross procedural error!" when in fact they're testing something subtly different or not doing what those people think. From what I've seen they usually set up something that tests their myth, although occasionally they do screw up. They also tend to word their problems such that they can be solved without overly elaborate test procedures.

    An example of a badly botched experiment was the light bulb experiment, in particular the part they tacked on at the end about how much wear and tear turning light bulbs on and off causes compared to leaving them on. Since they both failed to create a control group and failed to set up any way to document when each bulb burned out, the test was a complete loss. At least they pretty much admitted it afterward (when they came back 2 months later and only the LED-based bulb was still working). The other half of the test, where they compared the energy use of the various kinds of light bulbs, was pretty good though, if a little basic.
  • by martin-boundary ( 547041 ) on Tuesday April 24, 2007 @02:20AM (#18850975)
    Who marked this informative?

    The number of significant figures in an answer depends on how the function propagates errors. It's INCORRECT in general to think that if the inputs are given with two significant digits (say), then the output is only good for two significant digits.

    The CORRECT way is to perform error analysis [wikipedia.org] on the function being computed. If the function is linear, then the error magnitude is essentially multiplied by a constant. If that constant is close to 1 (and only then) will the output accuracy be close to the input accuracy.

    In general, a function being computed is nonlinear, and the resulting number of significant digits can be either more or less than for the input. Examples are chaotic systems (high accuracy in input -> low accuracy in output) or stable attractive systems (low accuracy in input -> high accuracy in output).
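
    A tiny illustration of the point (a sketch only: first-order error analysis, df ~ f'(x)*dx, with a made-up 1% input error):

        # The same 1% input error becomes a different output error
        # depending on the function it passes through.
        import math

        x, dx = 100.0, 1.0   # value known to about 1%

        for name, f, dfdx in [("x**2",    lambda v: v * v,   lambda v: 2 * v),
                              ("sqrt(x)", math.sqrt,         lambda v: 0.5 / math.sqrt(v))]:
            rel_out = abs(dfdx(x)) * dx / abs(f(x))
            print(name, "->", f"{rel_out:.1%}", "relative error in the output")
        # x**2 doubles the relative error (2.0%); sqrt(x) halves it (0.5%).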

  • by WaZiX ( 766733 ) on Tuesday April 24, 2007 @03:39AM (#18851485)
    I dunno, the fact that he's willing to state the correlation coefficient so precisely makes me leery of his own statistical expertise.

    The fact that they use a standard deviation to test a hypothesis, you know, instead of Hypothesis Testing [wikipedia.org], makes me certain that he doesn't know jack about statistics.

    You do _NOT_ use descriptive statistics to draw conclusions from a sample!!!

    I can't believe how wrong this analysis is... What you're supposed to test is whether, when seeded with a yawn, you're more susceptible than when not seeded, and that is a whole other set of calculations...
  • by TheThiefMaster ( 992038 ) on Tuesday April 24, 2007 @03:41AM (#18851503)
    They DID revisit this one, and they did say that it changes based on speed. For the car they used, having the windows open was better below 50 mph, and the AC was better above 50 mph. So as a broad statement: AC is better when you're going fast, and having the windows open is better when you're going slow.

    Wiki has the details: http://en.wikipedia.org/wiki/MythBusters_(season_3)#AC_vs._Windows_Down [wikipedia.org]

"Gravitation cannot be held responsible for people falling in love." -- Albert Einstein

Working...