
Big Talk About Small Samples

Bennett Haselton writes: My last article garnered some objections from readers saying that the sample sizes were too small to draw meaningful conclusions. (36 out of 47 survey-takers, or 77%, said that a picture of a black woman breast-feeding was inappropriate; while in a different group, 38 out of 54 survey-takers, or 70%, said that a picture of a white woman breast-feeding was inappropriate in the same context.) My conclusion was that, even on the basis of a relatively small sample, the evidence was strongly against a "huge" gap in the rates at which the surveyed population would consider the two pictures to be inappropriate. I stand by that, but it's worth presenting the math to support that conclusion, because I think the surveys are valuable tools when you understand what you can and cannot demonstrate with a small sample. (Basically, a small sample can present only weak evidence as to what the population average is, but you can confidently demonstrate what it is not.) Keep reading to see what Bennett has to say.

The smallest sample I've ever used to make an argument was when I submitted some legal briefs, each no longer than five pages, in the anti-spam cases that I'd been filing in Washington State small claims court. Since I suspected the judges were not taking the cases seriously, I filed the briefs with the third and fourth pages stuck together in the center, by a tiny thread of paper joining the back of the third page to the front of the fourth page. (If someone were to turn the pages and actually read the brief, the thread would break.) I did something similar in six different cases, and when the motions were all rejected, I went to the courthouse to look at the paper motions still in the file. In three out of six cases, the judge had rejected the motion without reading it first.

Now, the point was not to make an accurate estimate of the actual proportion of small claims court judges who would reject a brief in an anti-spam case without reading it. There's no basis for saying that the proportion of such judges is close to 50%. But we can still probably reject any contention that the proportion of such judges is very low. If only 10% of judges were rejecting motions without reading them, there would be only about a 1.6% chance of taking a random sample of six rejected motions and finding that in three or more cases the judge did not read the motion. Even if 20% of judges were doing so, an event with probability p=0.20 would occur in three or more out of six trials only about 9.9% of the time. (If an event has probability p, the exact probability of that event occurring three or more times in six trials is given by 20*(p^3)*((1-p)^3) + 15*(p^4)*((1-p)^2) + 6*(p^5)*((1-p)^1) + 1*(p^6)*((1-p)^0).) So we can say that the proportion of such judges is quite probably more than 20%. I did this repeatedly because even after I had "caught" the first judge, I wanted to head off any objection that this was just an isolated case of rare behavior.
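The parenthetical formula is the upper tail of a binomial distribution with n=6 trials. A minimal Python sketch of it (the helper names are illustrative, not from the original) prints both the probability of exactly three unread motions out of six and the probability of three or more, so the two quantities can be compared side by side; the argument above turns on the "three or more" figure.

```python
from math import comb

def prob_exactly(k: int, n: int, p: float) -> float:
    """Probability of exactly k occurrences in n independent trials (binomial pmf)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability of k or more occurrences: the pmf terms summed from k up to n."""
    return sum(prob_exactly(i, n, p) for i in range(k, n + 1))

if __name__ == "__main__":
    for p in (0.10, 0.20):
        print(f"p = {p:.2f}: exactly 3 of 6 = {prob_exactly(3, 6, p):.4f}, "
              f"3 or more of 6 = {prob_at_least(3, 6, p):.4f}")
```

The `range(k, n + 1)` sum corresponds term by term to the four products in the formula (k = 3, 4, 5, 6).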

And, as always, it's important not to generalize too much about the behavior whose probability we're estimating. I don't think that 20% or more of judges, even in small claims court, are deciding most types of cases without reading or listening to the arguments. My impression was that most judges view small claims court as a place to redress injustices, and that they see anti-spam and anti-telemarketer plaintiffs as just trying to "make money" at it, so they take those suits less seriously. I disagreed with this stance because (1) anti-spam plaintiffs usually really have been harmed and are not just "whining about one email" on which they are trying to "cash in" (I still get so much spam that it interferes somewhat with the operation of my server and with my ability to get through my daily email); and (2) the law is, after all, intended as a deterrent, with disproportionate damages meant to discourage spammers from spamming in the first place. Still, the charitable reading of the results is that judges are merely biased against anti-spam plaintiffs -- at least they probably don't treat all cases as casually as they treat anti-spam suits!

Back to the issue of small samples. My previous article was prompted by an editorial about the online response that had been elicited by two different photos -- one showing a black woman breastfeeding, and a nearly identical photo showing a white woman breastfeeding. The author asserted that the photos had received vastly different responses, which she attributed to racism. I presented a survey to a sample of users recruited from Amazon's Mechanical Turk, randomly showed each survey-taker one of the two photos, and asked:

Our academic department has asked everyone to submit a "fun" photo of themselves, so that our photos can be displayed together on the department home page. One of our employees submitted a photo that has caused some internal debate about whether the photo is inappropriate. I wanted to do a poll to get the opinion of a random sample of Internet users of different backgrounds.

Do you think this is an appropriate picture to be used in a photo collection on our academic department home page?

Out of 47 respondents who saw the black woman's photo, 36 of them (77%) said it was inappropriate. Out of 54 respondents who saw the white woman's photo, 38 of them (70%) said it was inappropriate.

As before, these samples are too small to say precisely what the relevant proportions in the background populations are, but we can probably reject certain statements about the populations -- for example, that the percentage of users offended by the black woman's photo is 20 percentage points higher than the percentage of users offended by the white woman's photo. This is where the counterintuitive part comes in. Suppose that in the background population, 81% of respondents would find the black woman's photo offensive, but only 61% would be offended by the white woman's photo. What are the odds of getting 77% or less "yes, that's offensive" responses from a sample of 47 users shown the black woman's photo, and getting 70% or more "yes, that's offensive" responses from a sample of 54 users shown the white woman's photo? It doesn't sound unlikely at all, because the sample percentages are quite close to the assumed population percentages -- but you can verify, either with statistical calculations or with a quickly written computer program, that the odds are only about 2.7%.

Two main factors contribute to this counterintuitive result. First, even with a sample size of a few dozen, the frequency of a trait in the sample tends to stay close to its frequency in the background population (if 80% of your population has some trait, and you take a sample of size 50, there's roughly a 98% chance that the number in your sample with that trait will be between 34 and 46). Second, to find the odds of seeing both of these deviations at the same time (deviating from an assumed 81% in the background population down to 77% in the first sample, and deviating from an assumed 61% in the background population up to 70% in the second sample), you have to multiply the probabilities of the two events, since the two groups were sampled independently. The probability of the first deviation is about 27%, the probability of the second is about 10%, and so the probability of both occurring is about 2.7%.
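The exact binomial figures behind these two paragraphs can be reproduced with a short Python sketch. It assumes, as the text does, that each respondent independently answers "inappropriate" at the hypothesized population rate (81% for one photo, 61% for the other) and that the two groups are independent of each other; the variable and function names are illustrative. It also computes the side check about a sample of 50 drawn from a population in which 80% have some trait.

```python
from math import comb

def binom_pmf(i: int, n: int, p: float) -> float:
    return comb(n, i) * p**i * (1 - p) ** (n - i)

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# Black woman's photo: 36 of 47 (about 77%) said "inappropriate".
# Hypothesized population rate 81%; probability of 36 or fewer:
p_low = binom_cdf(36, 47, 0.81)

# White woman's photo: 38 of 54 (about 70%) said "inappropriate".
# Hypothesized population rate 61%; probability of 38 or more:
p_high = 1.0 - binom_cdf(37, 54, 0.61)

# Independent groups, so the joint probability is the product.
p_both = p_low * p_high

# Side check: sample of 50 from a population where 80% have a trait;
# probability that between 34 and 46 (inclusive) of the sample have it.
p_band = binom_cdf(46, 50, 0.80) - binom_cdf(33, 50, 0.80)

if __name__ == "__main__":
    print(f"P(<=36 of 47 | p=0.81)   = {p_low:.4f}")
    print(f"P(>=38 of 54 | p=0.61)   = {p_high:.4f}")
    print(f"joint probability        = {p_both:.4f}")
    print(f"P(34..46 of 50 | p=0.80) = {p_band:.4f}")
```

Note the thresholds: 36/47 is the largest count at or below 77%, and 38/54 is the smallest count at or above 70%, which is why the sketch uses 36 and 38.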

The reason I calculated the odds of getting 77% or less "offended" responses for the black woman's photo while also getting 70% or more "offended" responses for the white woman's photo is that, in calculating the "unlikeliness" of a statistical result, it's customary to calculate the odds of getting "this result or a more extreme one". For example, suppose you want to know if a company's hiring process is gender-balanced (assuming a 50/50 gender split in the population), and you notice that in a random sample of 100 recent hires, 61 were men. You wouldn't ask "What are the odds of there being exactly 61 men in this sample?", because the odds of getting any one particular number are small. You'd ask, "What are the odds of getting this result or a more extreme one -- i.e. the odds of getting 61 or more men out of a random sample of 100, if the population were truly gender-balanced?" As a binomial calculator will show, the odds are only about 1.8%.
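A small Python sketch makes the distinction concrete for the hypothetical hiring example, assuming each of the 100 hires is independently a man with probability 0.5. It computes both the probability of exactly 61 men and the probability of 61 or more; the exact-61 figure is the smaller of the two, which is why it says little on its own, since every individual count is fairly unlikely in a sample of 100.

```python
from math import comb

def upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): 'this result or a more extreme one'."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

exactly_61 = comb(100, 61) * 0.5**100   # probability of exactly 61 men
at_least_61 = upper_tail(61, 100)       # probability of 61 or more men

if __name__ == "__main__":
    print(f"P(exactly 61)  = {exactly_61:.4f}")
    print(f"P(61 or more)  = {at_least_61:.4f}")
```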

Similarly, in the case of the two populations being measured, the author of the original editorial hypothesized that there was some significant gap between the percentages of the population that were offended by the two photos, which I arbitrarily assumed to be 20 percentage points. Under that assumption, showing the two pictures to two different groups and having them be offended at similar rates is the unexpected, "extreme" result, and the closer the rates are to each other, the more extreme the result is. That's why I calculated "77% or less" for the first group vs. "70% or more" for the second group.

And out of the pairs of numbers that I tested which were separated by 20 percentage points, 81% and 61% were the numbers which made the given result the least unlikely. 80/60 and 79/59 give odds of about 2.5% and 2.4%; 82/62 and 83/63 give odds of 2.4% and 2.2%.
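A comparison like this can be reproduced with a Python sketch that computes the exact joint probability for every 20-point pair from 75/55 through 90/70 and reports the pair that maximizes it. The range and the function names are illustrative assumptions. Exact tail sums can differ slightly from figures obtained by simulation or approximation, so the printed ranking is the thing to check.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def joint_probability(p_black: float, p_white: float) -> float:
    # P(at most 36 of 47 say "inappropriate") * P(at least 38 of 54 say so)
    low = binom_cdf(36, 47, p_black)
    high = 1.0 - binom_cdf(37, 54, p_white)
    return low * high

def sweep(gap_points: int = 20, start: int = 75, stop: int = 90) -> dict:
    # Loop over integer percentages to avoid floating-point drift.
    results = {}
    for pct in range(start, stop + 1):
        results[(pct, pct - gap_points)] = joint_probability(
            pct / 100, (pct - gap_points) / 100
        )
    return results

if __name__ == "__main__":
    results = sweep()
    for (b, w), prob in results.items():
        print(f"{b}/{w}: {prob:.4f}")
    print("least unlikely pair:", max(results, key=results.get))
```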

You can do the statistical calculations directly, but in case you won't believe it unless you see the results unfold with your own eyes, you can run the Perl script I wrote, which iterates through a million trials of the experiment, counting the number of times that the unexpected result occurs.
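The Perl script itself isn't reproduced here. As a stand-in, the same Monte Carlo idea can be sketched in Python: simulate each respondent as an independent draw at the assumed rates (81% and 61%), repeat the whole two-group survey many times, and count how often the first group comes in at 36 or fewer out of 47 while the second comes in at 38 or more out of 54. The function name, seed, and default trial count are assumptions, not details of the original script.

```python
import random

def simulate(trials: int = 50_000, p_black: float = 0.81,
             p_white: float = 0.61, seed=None) -> float:
    """Estimate P(<=36 of 47 and >=38 of 54) by repeated random sampling."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each respondent independently says "inappropriate" with the assumed probability.
        black = sum(rng.random() < p_black for _ in range(47))
        white = sum(rng.random() < p_white for _ in range(54))
        if black <= 36 and white >= 38:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    # A million trials works too, but is slow in pure Python.
    print(simulate(seed=2014))
```

With a fixed seed the estimate is reproducible, and as the trial count grows it should settle close to the exact binomial product.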

Why did I assume a 20-point gap? That was the most subjective leap that I made. Looking through the original editorial, I figured that on the basis of inflammatory statements like

"Only one woman was called 'adorable' by the media and portrayed with girlish innocence, and it wasn't the black one. It never is."

and

"The contrast in headlines is so stark, it deserves to be examined" [I assume here she meant the contrast in responses]

the author meant to imply a difference in people's attitudes that was at least that large. But the results suggest that it isn't.

For all of this effort, of course, I could have just expanded the original experiment to a sample of several hundred and mollified some people's concerns. But I wanted to argue for what you can show, even with small samples, because I would like to try (and would like others to try) similar experiments in the future, and do not think people should be discouraged if they can't afford to pay a thousand Amazon Mechanical Turk workers to take their survey. I paid my 100 respondents $0.25 each; naturally, one experiment I'd like to do soon is to figure out what's the lowest I can get away with paying them.

Comments:
  • by Anonymous Coward on Monday November 17, 2014 @11:44AM (#48402747)

    Slashdot is trying to move their user base from news for nerds and geeks to news for normals.

    Seriously, I've noticed the Register getting more active as people move over there.

    We, geeks, view this entire article as a bunch of shenanigans that waste our time. Please stop spitting in my face.

    Give me an article about Intel's latest and greatest chipset plans or how AMD screwed the pooch or about how one can modify a BlackBerry to run Android applications. Those things are Useful.
    Infotainment designed to incite does not nor should enter my world, it makes my world more stressful and wastes my time.

    • by i kan reed ( 749298 ) on Monday November 17, 2014 @11:51AM (#48402839) Homepage Journal

      I'm glad you think explaining, mathematically, a statistical forecasting process is for "normals".

      Whereas us "geeks" are only interested in short little blurbs about software patches, right?

      Now, I absolutely understand everyone who is concerned about a single contributor dominating the submission queue, possibly hurting the richness of available information, but your complaint seems so petty. Actual critical reasoning about previous information that was questioned is the good kind.

      • Considering this is really really basic statistics I'd say anybody we want on Slashdot is already very familiar with these things. If we wish to introduce post quality standards I suggest we give it an IEEE Spectrum approach: low quantity of quality articles with decent journalism about technical subjects. Not to say they never publish bullshit, but in general they seem to get it right. Then again, I think I'll just move over there.

        What we should really do is shut down the psychology and sociology departm
    • Re: (Score:3, Insightful)

      by Anonymous Coward

      Slashdot is trying to move their user base from news for nerds and geeks to news for normals.

      No normal person is going to be the least bit interested in Bennett Haselton's inane ramblings.
      I have no idea why Slashdot is posting this garbage, but attracting "normal" readers certainly is not why.

    • Infotainment designed to incite does not nor should enter my world, it makes my world more stressful and wastes my time.

      Proof that the terrorist have won.

      (asking why that is proof, is proof that the terrorist have won.)

  • by WillAffleckUW ( 858324 ) on Monday November 17, 2014 @11:46AM (#48402769) Homepage Journal

    In a recent poll conducted randomly via the Internet among people who are girl gamers, we found that 99 percent think breastfeeding images are none of your business.

    Equally sound on a statistical basis, and just as randomized, with a t value of 42.

  • by QuietLagoon ( 813062 ) on Monday November 17, 2014 @11:46AM (#48402777)
    Why? Really, why?
    He already wasted ten minutes of my life with his last episode of keyboard effluent, why should I waste my time with him anymore?

  • by Anonymous Coward on Monday November 17, 2014 @11:47AM (#48402789)

    I guess it's kinda cool that you took over what used to be a major tech-news website and turned it into your personal blog.

    • by Matheus ( 586080 )

      So... he took what used to be Commander Taco's personal blog and turned it into his own personal blog... A slight degradation in the quality of the blogging but I'm not sure what my standard deviation or margin of error is on that slide. :-)

  • by sconeu ( 64226 ) on Monday November 17, 2014 @11:49AM (#48402821) Homepage Journal

    That way the rest of us don't have to hear about his bullshit.

  • It sounds like Haselton is missing the point, which is why people object to seeing women breastfeeding to begin with.
  • by DickBreath ( 207180 ) on Monday November 17, 2014 @11:51AM (#48402835) Homepage
    If you like to use sample sizes that are too small, then I would like to interest you in another useful technique.

    Correlation is causation.

    For example:

    The tides cause the moon. The correlation proves it.

    Similarly, murder rates are higher in the summer, and ice cream sales are higher in summer months. Therefore ice cream causes murder.

    I hope that was helpful.
    • If tylervigen.com [tylervigen.com] has taught me anything it's that you are 100% correct.

      For example [tylervigen.com], did you know?

      Motorcycle riders killed in collision with stationary object correlates with Corporate Political Action Committees (US)

      Obviously PACs are bad for motorcyclists!

    • now i know how the onion does it... they had an exclusive deal with bennett... they basically transcribed his view of the world.

      voila, humor... unless you're the one that needs to deal with the crazy fucker.

  • tldr (Score:5, Insightful)

    by phantomfive ( 622387 ) on Monday November 17, 2014 @11:52AM (#48402851) Journal
    There's a really good book [amazon.com] that talks about brevity and how to communicate your ideas more concisely with fewer words. I suggest Bennett read it.
    • Re: (Score:3, Insightful)

      by Flavianoep ( 1404029 )
      I second you, but think he needs a grammar book, too. I could spot so many mistakes I couldn't keep reading, and English is not even my first language!
    • Strunk and White: The Elements of Style.
      • Absolutely not. Strunk and White's little book has probably done more to destroy knowledge of actual English grammar than any other book. The authors demonstrate again and again that they are not only completely ignorant of many concepts they are talking about, but they violate their own principles as much as they conform to them. (For a review by an actual expert in grammar, see here [chronicle.com].)
    • There's a really good book [amazon.com] that talks about brevity and how to communicate your ideas more concisely with fewer words. I suggest Bennett read it.

      A book on brevity is almost 300 pages long.

      No doubt brought to you by the author of the Procrastinators Tomb, volumes I - IV.

  • by gstoddart ( 321705 ) on Monday November 17, 2014 @11:54AM (#48402869) Homepage

    I'm sorry, but this is getting absurd.

    If Slashdot is going to be Bennett "ain't I smug and pointless" Haselton's personal blog ....

    Give us a STORY EXCLUSION for this clown.

    I do not see value in Bennett and his shit, and I don't care.

    But apparently at least samzenpus and timothy will post any of the shit this idiot writes.

    Seriously, just fucking make it stop. Nobody here gives a shit about Bennett Haselton. So give us a fucking way to stop reading his crap.

  • by TheRaven64 ( 641858 ) on Monday November 17, 2014 @11:54AM (#48402873) Journal
    It might be different if Bennett were a frequent poster and were actively engaged in discussions, but he's not. He's just some guy who once heard that brevity is the soul of wit and went off to write ten thousand words explaining what it meant.
  • Not even wrong. (Score:5, Interesting)

    by mbone ( 558574 ) on Monday November 17, 2014 @11:54AM (#48402881)

    Poisson statistics. I have to wonder if Mr. Haselton has ever heard of the term.

    If by some weird alignment of forces I were to become a Judge, and Mr. Haselton presented this to me in a brief, I would try and have him disbarred for abuse of statistical process. I know that the actual legal profession is soft about such abuses, but by God they wouldn't be in my courtroom.

    • I suspect he is trying to justify the terrible voting intention polls that have occurred recently in Brazil, where they interview 0.00125% of the population to know the voting intentions of the whole country, and that's IF there were any interviews.
    • Hear, hear.
      Instead of getting some advice from someone who understands stats, he just vomits out a crappy "justification" as to why his bullshit....erm...is not bullshit.

      C'mon Haselton - start here, it's all free.

      https://en.wikipedia.org/wiki/... [wikipedia.org]

      https://en.wikipedia.org/wiki/... [wikipedia.org]

  • by waspleg ( 316038 )

    I don't read his long articles, generally speaking, but he has been an advocate against censorship and I respect that much.

    No one makes anyone read the articles, and without even checking, I'd guess you can configure /. not to even show them.

    The Haselton hate reminds me of the Jon Katz days, which is kind of amusing ;)

    • by gstoddart ( 321705 ) on Monday November 17, 2014 @12:03PM (#48403001) Homepage

      I'd guess you can configure /. not to even show them

      No, you can't, and that's the problem.

      I can't click something and be done with this clown. Because multiple Slashdot editors post his crap.

      Short of stringing up some editors, or a lot of really loud angry posts, we do not have any easy means to say "do not wish to see this crap".

      Which means you can guarantee every one of his posts will get this kind of response.

      If they would give us a check box to say "do not wish to see any shit from Bennett Haselton", that would be preferable. Instead we're all forced to read his opinion on everything.

      Hey Bennett, what's your opinion of getting kicked in the nuts? Have you done extensive testing to tell us it hurts?

      • How about this - we can block posts by particular editors. Next time one posts a Benshit post, we add them to the list. If enough of them are posting his crap, and enough of us are blocking it, they'll pretty quickly see pageviews go down.

    • by jfengel ( 409917 )

      It may be that it's JUST him. No other contributors get that kind of preferred place, not even people who participate in the community. It's kind of galling to see his name pop up every couple of weeks, and everybody instantly knows that the comments are going to be primarily about just how bad the contribution is, simultaneously wordy and wrong.

      Perhaps if Slashdot spread it around a bit more, it might aggravate less. Instead, it's one of a mere two dozen or so stories posted per day. Few of them will be re

  • by Anonymous Coward

    What is this bizarre Slashdot alternate universe, where uninteresting shitposting becomes the headline article? This troll couldn't even reply to his own stupid post, had to make a new one to explain himself.

  • by DumbSwede ( 521261 ) <slashdotbin@hotmail.com> on Monday November 17, 2014 @11:59AM (#48402961) Homepage Journal

    Make it stop, dear God make it stop!

  • Your sample size is the least of my problems.

    FOR FUCK'S SAKE SLASHDOT, MAKE IT STOP!

  • Know your audience. Armchair social science on Slashdot? /facepalm
  • Boycott Bennett! (Score:5, Insightful)

    by sootman ( 158191 ) on Monday November 17, 2014 @12:06PM (#48403041) Homepage Journal

    Slashdot by now has OBVIOUSLY seen how much we don't like this guy. The fact that they keep posting him means they're just trolling us, or going for pageviews, or both. Or maybe Bennett has some kind of deal with the site, or has something on one of the editors. Whatever. I don't care. From now on, NO ONE post any comments on one of his stories. Not even to say how much you hate his stories. This will be my last comment on one of his stories. Hope this takes!

  • Still no, sorry (Score:5, Insightful)

    by Anonymous Coward on Monday November 17, 2014 @12:07PM (#48403055)

    But we can still probably reject any contention that the proportion of such judges is very low. If only 10% of judges were rejecting motions without reading them, then there is only about a 1.4% chance of taking a random sample of six rejected motions and finding that in three or more cases, the judge did not read the motion.

    But you DIDN'T HAVE A RANDOM SAMPLE. In particular, you had a sample from Washington State small claims court. So you can ONLY draw conclusions about Washington State small claims court. You have no idea what happens in New York, or in England. But that's only one example of how non-random your sample was. The problem is, ANY small sample is going to have non-random attributes, because it's a small sample. You can roll a die three times and the results will appear highly non-random - no instances at all of some values - you have to roll it a hundred times to get a good distribution, even though the die is random. If you start with a non-random die - like your "sampling only from one court" or your "using Mechanical Turk" - your small sample size gives you results that are simply MEANINGLESS.

    Go and study stats and stop posting this drivel on Slashdot where people might believe it.

  • Confidence levels (Score:5, Insightful)

    by Okian Warrior ( 537106 ) on Monday November 17, 2014 @12:09PM (#48403071) Homepage Journal

    38 out of 54 survey-takers, or 70%

    Bennett, try this experiment.

    Make a program that flips 54 coins and notes the number of heads and the number of tails at each round. Then run this program for one million rounds.

    When you're done, note the number of rounds the random generator saw 38 or more heads and frame this as a proportion; ie - "the random generator reached this level X% of the time".

    Then compare your results with the random generator. If your results are unlikely to come from the random generator, then perhaps you have something.

    Now, "unlikely" is an arbitrary measure with no compelling foundation (it's the wrong measure to determine the significance of a result(*)), but in scientific circles we use a "rule of thumb": results are considered significant when they are less likely than 95% of the random results.

    Even at this level, we expect 1-in-20 studies to be due to random chance, but then follow-on studies should confirm or deny the findings (and 1-in-20x20 of *those* will be due to random chance as well).

    If the results might lead to potentially catastrophic decisions we might use a higher level of significance; for example, 99% confidence when deciding whether a drug is safe. Physics uses an insanely high [physics.org] level of confidence.

    Try that and get back to us - we await your next post with bated breath.

    (*) The correct measure is the number of bits saved by compressing the original data by factoring out the result (glossing over some details).

  • by Anonymous Coward

    Bennett, what the hell are you doing? This is a straightforward difference between proportions test. In R it's simply
    > prop.test(c(36,38),c(47,54))
    which gives a 95% confidence interval for the difference as (-0.13 to 0.25), meaning loosely that we wouldn't be terribly surprised if the white women were thought to be inappropriate at 13 percentage points higher or if the black women were thought to be inappropriate at 25 percentage points higher.

  • by sideslash ( 1865434 ) on Monday November 17, 2014 @12:15PM (#48403121)
    Hassle ton of Slashdot readers, get flamed in the comments. Seems to be a pattern, huh editors?
  • I mean, a picture of a black woman or a white woman breast-feeding her baby wouldn't interest me.

    A picture of a black woman or a white woman simultaneously breast-feeding both George W. Bush and Bill Clinton would interest me. That would be a hoot and a half.

    Quotes from the romp . . .

    "I did NOT suckle on that woman!"

    "Who said anything about breast milk costing $4 a gallon?"

    I've tried to offend both major political parties in the US with this post. I could try to also offend the Libertarians, Green

  • my gripe about the first story wasn't the small sample size. It's the source of the sample. The scope of the question seemed framed in terms of US society. Who knows where you are sampling on mechanical turk?
  • Even a large sample can be bad if the sample doesn't represent the population as a whole. Sample size only goes so far when you have a skewed sample based on the demographics of the population as a whole.
  • by pla ( 258480 ) on Monday November 17, 2014 @12:44PM (#48403453) Journal
    Slashdot is not your blog. Go away.
  • by luckymutt ( 996573 ) on Monday November 17, 2014 @12:57PM (#48403637)
    You want to see a more meaningful sample size? Look at the number of comments in Bennett's "submissions" that are complaining about this waste of time. Compare that to the number that actually gives a shit.
    It was bad enough that the first story the other day had NOTHING whatsoever to do with news for nerds, nor was it well written, nor was it well conducted.
    But /. now needs to post a whiny follow-up piece???

    Few people care about this Miley Cyrus' opinion on things that do matter, and fewer still care about his opinion on all the crap that doesn't matter.
    Breastfeeding pictures? Burning Man parking? Burning Man Ice distribution? How come 5th Amendment?
    Fuck this clown.
  • ... because every time he posts, it's the great American novel, about the page count of War and Peace, and I read one once.

    ONCE!

    Sure, his stuff is fucked up, but it's become an iconic meme and I love to see it appear.

    Scanning through the comments is a popcorn and Dr. Pepper moment and humor abounds and I am greatly amused.

    To Bennett Haselton: I want you to have my babies and stuff.

    • by swb ( 14022 )

      What surprises me is that he keeps coming back for more and Slashdot's editors keep letting him.

      I can't help but think he'd be an interesting guy to meet at least once. I'd rather talk to someone with ideas and opinions I don't like than someone without any of either.

  • Dude, it isn't that you had a small sample size, it's that you're extrapolating from independent contexts. There's a large difference between what people will say on posts IN facebook among friends (i.e. a semblance of privacy) versus what some warm body clicks for a quarter on mechanical turk (no semblance of privacy and perhaps an intent to please the questioner and adjust to the bias of the question).

    Your "fun picture" at a "department party" or whatever (I RTFirstA but not this justification drivel)
  • For lowering the editorial quality of the /. website.
  • Even though this course has "public health" in the title, it is really quite generic. The methods it uses, very(!) well explained by the very likable John McGready (Johns Hopkins University), are exactly the ones relevant to what is being discussed here.

    Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

    A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.

    https://www.coursera.org/cours [coursera.org]

  • I'm not sure why /. is posting the ramblings of a non-researcher, non-statistician as though he knew what he was talking about.

    Has it gotten to this? Really?

    Bennett took 1,800+ words to describe what a normal researcher would take under 100 to say. This is what happens when someone thinks they know what they're talking about, and need to rationalize the heck out of it in order to make sense.

  • I'd like to know what percentage of people at Burning Man are offended by various women breastfeeding, and also how we can optimize the queues so that everyone can see it happen without waiting too long.

  • The sample was a set of legal briefs, but the conclusions were about judges. Small samples may work, but you can't sample population A to make an inference about unrelated population B.

    By analogy, the fact that my ice cream truck only sells half as much ice cream as I expect doesn't tell me that there aren't many kids in the neighborhood. Maybe my prices are crazy. Maybe my only flavor is chocolate-chutney ripple. Maybe the scary clown on the top of my truck frightens children away. From looking at my in
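The difference-of-proportions test suggested in one of the comments above (R's `prop.test(c(36,38),c(47,54))`) can be approximated in Python with a hand-rolled interval. This is a sketch under the assumption that prop.test's default two-sample interval is a normal-approximation (Wald) interval widened by a continuity correction of 0.5*(1/n1 + 1/n2), capped at the observed difference; the function name `two_prop_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_ci(x1: int, n1: int, x2: int, n2: int,
                level: float = 0.95, continuity: bool = True):
    """Normal-approximation CI for p1 - p2, optionally with a continuity correction."""
    p1, p2 = x1 / n1, x2 / n2
    delta = p1 - p2
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    inv_n = 1 / n1 + 1 / n2
    # Correction of 0.5 * (1/n1 + 1/n2), never larger than |delta| itself.
    correction = min(0.5 * inv_n, abs(delta)) if continuity else 0.0
    width = z * se + correction
    return max(delta - width, -1.0), min(delta + width, 1.0)

if __name__ == "__main__":
    lo, hi = two_prop_ci(36, 47, 38, 54)
    print(f"95% CI for the difference: ({lo:.3f}, {hi:.3f})")
```

For the survey counts this should land close to the (-0.13, 0.25) interval the commenter reports.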