Social Science Journal 'Bans' Use of p-values
sandbagger writes: Editors of Basic and Applied Social Psychology announced in a February editorial that researchers who submit studies for publication would not be allowed to use common statistical methods, including p-values. While p-values are routinely misused in scientific literature, many researchers who understand its proper role are upset about the ban. Biostatistician Steven Goodman said, "This might be a case in which the cure is worse than the disease. The goal should be the intelligent use of statistics. If the journal is going to take away a tool, however misused, they need to substitute it with something more meaningful."
Misuse = reviewers don't do their job (Score:5, Insightful)
Re: (Score:3)
Re: (Score:3)
Re: (Score:2)
Frankly, I've always been a bit confused by the p value. It just seems more straightforward to provide your 95% confidence interval limits.
Re: (Score:2)
Your 95% confidence interval (roughly*) indicates an interval containing 95% of the probability. The p-value indicates how much probability lies within a cutoff region. What most people do with a 95% CI is look to see if it overlaps the null value (zero, or the mean of the other group, for example). The p-value gives the same information, except quantitatively.
* yes, Bayesians, technically the 95% credible interval, from a Bayesian analysis, contains the area of 95% probability. The confidence interval,
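If you want to see the correspondence concretely, here's a minimal sketch (Python with numpy/scipy; the data are invented for illustration) showing that a two-sided one-sample t-test gives p < 0.05 exactly when the 95% CI excludes the null value:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=0.4, scale=1.0, size=30)  # invented sample

n = len(data)
mean = data.mean()
sem = data.std(ddof=1) / np.sqrt(n)

# Two-sided one-sample t-test against a null mean of 0
t_stat, p_value = stats.ttest_1samp(data, popmean=0.0)

# 95% confidence interval from the same t distribution
t_crit = stats.t.ppf(0.975, df=n - 1)
ci = (mean - t_crit * sem, mean + t_crit * sem)

print(f"p = {p_value:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
# The CI excludes 0 exactly when p < 0.05 -- same information,
# but the CI also shows you the size of the effect.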
Three puzzles (Score:5, Interesting)
It is the job of the reviewer to check that the statistic was used in the proper context: not to check the result, but the methodology. It sounds like social science journals either have bad reviewers or are weak at methodology.
That's a good sentiment, but it won't work in practice. Here's an example:
Suppose a researcher is running rats in a maze. He measures many things, including the direction that first-run rats turn in their first choice.
He rummages around in the data and finds that more rats (by a lot) turn left on their first attempt. It's highly unlikely that this number of rats would turn left on their first choice based on chance (an easy calculation), so this seems like an interesting effect.
He writes his paper and submits for publication: "Rats prefer to turn left", P<0.05, the effect is real, and all is good.
There's no realistic way that a reviewer can spot the flaw in this paper.
Actually, let's pose this as a puzzle to the readers. Can *you* spot the flaw in the methodology? And if so, can you describe it in a way that makes it obvious to other readers?
(Note that this is a flaw in statistical reasoning, not methodology. It's not because of latent scent trails in the maze or anything else about the setup.)
====
Add to this the number of misunderstandings that people have about the statistical process, and it becomes clear that reviewers can't be the only line of defense.
Where does the 0.05 number come from? It comes from Fisher himself, of course - any textbook will tell you that. If P<0.05, then the results are significant and worthy of publication.
Except that Fisher didn't *say* that - he said something vaguely similar and it was misinterpreted by many people. Can you describe the difference between what he said and what the textbooks claim he said?
====
You have a null hypothesis and some data with a very low probability. Let's say it's P<0.01. This is such a good P-value that we can reject the null hypothesis and accept the alternative explanation.
P<0.01 is the probability of the data, given the (null) hypothesis. Thus we assume that the probability of the hypothesis is low, given the data.
Can you point out the flaw in this reasoning? Can you do it in a way that other readers will immediately see the problem?
There is a further calculation/formula that will fix the flawed reasoning and allow you to make a correct inference. It's very well-known, the formula has a name, and probably everyone reading this has at least heard of the name. Can you describe how to fix the inference in a way that will make it obvious to the reader?
Re: (Score:2)
Here's my take on this issue: Just because something is prone to be misused and misinterpreted doesn't mean it should be banned. In fact, some of the replacement approaches use the very same logic just with a different mathematical calculation process. However, it does illustrate the need for researchers to clearly communicate their results in ways that are less likely to be misused or misinterpreted. This wouldn't exclude the use of p-valu
Re: (Score:2)
As always, there is an xkcd comic that answers your question in a nice and easy-to-understand fashion.
I leave it to you to find the relevant link ;p
Re: (Score:2)
He writes his paper and submits for publication: "Rats prefer to turn left", P<0.05, the effect is real, and all is good.
There's no realistic way that a reviewer can spot the flaw in this paper.
Actually, let's pose this as a puzzle to the readers. Can *you* spot the flaw in the methodology? And if so, can you describe it in a way that makes it obvious to other readers?
I guess I don't see it. While P<0.05 isn't all that compelling, it does seem like prima facie evidence that the rats used in the sample prefer to turn left at that intersection for some reason. There's no hypothesis as to why, thus no way to generalize and no testable prediction of how often rats turn left in different circumstances, but it's still an interesting measurement.
You have a null hypothesis and some data with a very low probability. Let's say it's P<0.01. This is such a good P-value that we can reject the null hypothesis and accept the alternative explanation. ...
Can you point out the flaw in this reasoning?
You have evidence that the null hypothesis is flawed, but none that the alternative hypothesis is the correct explanation?
The scie
Re: (Score:3)
The answer is simple. He's taken dozens, if not hundreds of measurements. The odds are in favor of one of the measurements turning up a correlation by chance. The odds against this particular measurement being by chance are 19 to 1--but he's selected it out of the group. The chances that one of *any* of his measurements would show such a correlation by chance are quite high, and he's just selected out the one that got that correlation.
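The effect is easy to demonstrate by simulation. A minimal sketch (Python with numpy/scipy; the numbers 50 rats, 20 measurements, and 10,000 studies are invented for illustration): give a simulated researcher 20 independent measurements that are pure noise and see how often at least one comes out "significant".

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_rats, n_measurements, n_studies = 50, 20, 10_000

hits = 0
for _ in range(n_studies):
    # 20 independent continuous measurements on rats with no real
    # effects: each row is pure noise around zero
    data = rng.normal(size=(n_measurements, n_rats))
    pvals = stats.ttest_1samp(data, popmean=0.0, axis=1).pvalue
    if pvals.min() < 0.05:
        hits += 1

print(f"Studies with >= 1 'significant' finding: {hits / n_studies:.1%}")
# Expect roughly 1 - 0.95**20, i.e. about 64%, even though every
# single effect is pure chance.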
Re: (Score:2)
Re: (Score:2)
He writes his paper and submits for publication: "Rats prefer to turn left", P<0.05, the effect is real, and all is good.
There's no realistic way that a reviewer can spot the flaw in this paper.
Actually, let's pose this as a puzzle to the readers. Can *you* spot the flaw in the methodology? And if so, can you describe it in a way that makes it obvious to other readers?
I guess I don't see it. While P<0.05 isn't all that compelling, it does seem like prima facie evidence that the rats used in the sample prefer to turn left at that intersection for some reason. There's no hypothesis as to why, thus no way to generalize and no testable prediction of how often rats turn left in different circumstances, but it's still an interesting measurement.
Another poster got this correct: with dozens of measurements, the chance that at least one of them will be unusual by chance alone is very high.
A proper study states the hypothesis *before* taking the data specifically to avoid this. If you have an anomaly in the data, you must state the hypothesis and do another study to make certain.
You have a null hypothesis and some data with a very low probability. Let's say it's P<0.01. This is such a good P-value that we can reject the null hypothesis and accept the alternative explanation. ...
Can you point out the flaw in this reasoning?
You have evidence that the null hypothesis is flawed, but none that the alternative hypothesis is the correct explanation?
The scientific method centers on making testable predictions that differ from the null hypothesis, then finding new data to see if the new hypothesis made correct predictions, or was falsified. Statistical methods can only support the new hypothesis once you have new data to evaluate.
The flaw is called the "fallacy of the reversed conditional" [wikipedia.org].
The researcher has "probability of data, given hypothesis" and assumes this implies "probability of hypothesis, given d
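To put numbers on the reversed conditional, a toy calculation (Python, standard library only; all the rates are invented purely for illustration):

# P(data | hypothesis) and P(hypothesis | data) can differ wildly.
# Toy setup: only 1 in 1000 tested hypotheses is actually true,
# tests have 80% power, and we use the p < 0.01 cutoff.

p_true = 0.001   # prior: fraction of tested hypotheses that are real
power = 0.80     # P(significant | effect is real)
alpha = 0.01     # P(significant | effect is NOT real), the p cutoff

# Bayes' theorem: P(real | significant)
p_sig = power * p_true + alpha * (1 - p_true)
p_real_given_sig = power * p_true / p_sig

print(f"P(data this extreme | null true)      = {alpha:.3f}")
print(f"P(hypothesis real | significant data) = {p_real_given_sig:.3f}")
# ~0.074: even at p < 0.01, most 'discoveries' in this regime are
# false, because the prior matters. The p-value alone cannot tell
# you the probability that the hypothesis is true.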
Re: (Score:2)
I assume you're getting at multiple comparisons because you said "he measures many things."
You're right, the researcher should correct his p-value for the multiple comparisons. Unfortunately, alternatives to p-values ALSO give misleading results if not corrected and, in general, are more difficult to correct quantitatively.
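To make that concrete, here's a minimal sketch of the standard Bonferroni correction (Python with numpy; the p-values are invented for illustration):

import numpy as np

# Hypothetical p-values from 20 measurements in one rat study
pvals = np.array([0.002, 0.04, 0.11, 0.23, 0.31, 0.47, 0.52, 0.58,
                  0.61, 0.64, 0.70, 0.72, 0.75, 0.80, 0.83, 0.86,
                  0.90, 0.93, 0.96, 0.99])
m, alpha = len(pvals), 0.05

# Bonferroni: to keep the chance of ANY false positive at 5%,
# each individual test must clear alpha / m
print("Nominally significant (p < 0.05):   ", pvals[pvals < alpha])
print("Survive Bonferroni (p < 0.05 / 20): ", pvals[pvals < alpha / m])
# The p = 0.04 'rats turn left' result is significant on its own,
# but it does not survive correction for the 20 things measured.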
Re: (Score:2)
Re: (Score:2)
p-values are inherently bad statistics. You can't fix them with 'good methodology.' Can they be used properly in some situations? Maybe, if the author knows enough statistics to know when or when not to use them. But the people who use p-values are likely not to have that level of knowledge.
p-values are like the PHP of statistics.
> "This might be a case in which the cure is worse than the disease. The goal should be the intelligent use of statistics. If the journal is going to take away a tool, however m
Re: (Score:2)
People put far too much faith in "science" and, mostly, in scientists. We are just people, like everyone else. We have the same failings, and just because we never left university does not make us special.
Past APA president Kimble turns over in his grave (Score:2, Insightful)
At least one president of the American Psychological Association published a statistics book intelligent enough that it used to be required in university statistics intro classes: http://books.google.com/books/about/How_to_use_and_misuse_statistics.html [google.com]
Not that he would have disagreed with the comment about social psychologists...
Re: (Score:2, Interesting)
used to be required in university statistics intro classes: http://books.google.com/books/about/How_to_use_and_misuse_statistics.html [google.com]
I suspect that book is still foundational in most university advertising/marketing programs.
Re: (Score:2)
I suspect that book is still foundational in most university advertising/marketing programs.
I think historically, a more influential book has been Darrell Huff's "How To Lie With Statistics", the second book in this list [google.com].
It was originally written in 1954. And while less rigorous, it is an entertaining read and probably gets its point across to a much wider audience.
I know for a fact that Huff's book is still used as a text in college statistics courses... but probably only the lower-level classes.
A Bayesian Conspiracy (Score:5, Funny)
It's a war, I tell you, a war on frequentists! I'm 95% certain!
Re: (Score:2)
Interesting questions (Score:2)
Re: (Score:2)
Basic and Applied Social Psychology (Score:2)
"Look at this experimental evidence and tell me what you see?"
My Paper (Score:5, Interesting)
Ok, let me enlighten the readers a bit. The reviewers tend to be typical researchers within the field. The typical social researcher does not have a very strong math background. A lot of them are into qualitative research, and the quantitative work tends to stop at ANOVA. I have multiple master's degrees in business and social science and worked on a Ph.D. in social science (being vague here for a reason). However, I have a dual bachelor's in comp sci and math. I know statistical analysis very well. My master's thesis for my MBA was an in-depth analysis of survey responses: 30 pages of body and really good graphs. My research professor, an econometrics professor, and I submitted it to a second-tier journal associated with the field I specialized in...
... 6 pages got published. 6?!? They took out the vast majority of the math. Why? "Our readers are really bad at math," said the editor. If you knew the field... you would be scared shitless. The reviewers suggested we take out the math because it confused them. This is why they want the p-value out... it is misunderstood and abused. The reviewers have NO idea if it is being used correctly.
Re: (Score:2)
I have multiple master's degrees in business and social science and worked on a Ph.D. in social science (being vague here for a reason).
And what reason is that? You're not even close to identifiable from this information, you know...
Re: (Score:3)
Re: (Score:2)
Because that is what I got from that: they don't understand the material they are approving or rejecting, and so they serve no useful purpose.
Re: (Score:2)
I think he is saying the field is shit and should be disregarded.
p-values are routinely misused ... (Score:5, Funny)
Plural or singular - pick one (Score:2)
While p-values are routinely misused in scientific literature, many researchers who understand its proper role are upset about the ban.
Do they also know whether "p-values" is plural or singular?
Graphing the data would help a lot of the time (Score:5, Insightful)
I don't think you even need to be pushing people to do Bayesian stats. You just need to force them to graph their data properly. In *a lot* of biological and social science sub-fields it's standard practice to show your raw data only in the form of a table and the results of stats tests only in the form of a table. They aren't used to looking at graphs and raw data. You can hide a lot of terrible stuff that way, like weird outliers. Things would likely improve immediately in these fields if they banned tables and forced researchers to produce box plots (ideally with overlaid jittered raw data), histograms, overlaid 95% confidence intervals corresponding to their stats tests, etc, etc.
Having seen some of these people work, it's clear that many of them never make these plots in the first place. All they do is look at lists of numbers in summary tables. They have no clue in the first place what their data really look like, and no good knowledge of how to properly analyse data and make graphs. Before they even teach stats to undergrads they should be making them learn to plot data and read graphs. It's obvious most of them can't even do that.
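For what it's worth, a minimal sketch of the kind of plot I mean (Python with matplotlib/numpy; the two groups are invented data):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
groups = {"control": rng.normal(10, 2, 40),
          "treated": rng.normal(11, 2, 40)}

fig, ax = plt.subplots()
ax.boxplot(list(groups.values()), labels=list(groups.keys()))

# Overlay jittered raw data so outliers and skew are visible,
# instead of being hidden behind a summary table
for i, vals in enumerate(groups.values(), start=1):
    x = rng.normal(i, 0.04, size=len(vals))  # horizontal jitter
    ax.scatter(x, vals, alpha=0.4, s=12)

ax.set_ylabel("measurement")
plt.show()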
Re: (Score:2)
They have no clue in the first place what their data really look like, and no good knowledge of how to properly analyse data and make graphs. Before they even teach stats to undergrads they should be making them learn to plot data and read graphs. It's obvious most of them can't even do that.
That........
Explains why some people struggle horrifically in statistics, and others can sleep through class and still get an A.
Re: (Score:2)
Graphs can lie just as easily as statistics themselves.
Re: (Score:2)
Re: (Score:2)
Perfectly understandable move... (Score:5, Informative)
...and this isn't even the first journal to do this. It's probably happening now because an entire book has just come out walking people through how universally abused p-values are as statistical measures.
http://www.statisticsdonewrong... [statisticsdonewrong.com]
The book is nice in that it does give one replacements that are more robust and less likely to be meaningless, although nothing can substitute for having a clue about data dredging etc.
rgb
Creative thinking (Score:2)
p-value research is misleading almost always (Score:5, Interesting)
I studied and tutored experimental design and this use of inferential statistics. I even came up with a formula needing about 1/5 the calculator keystrokes when learning to calculate the test statistic manually: take the standard deviation and mean for each group, then calculate the standard deviation of these means (how different the groups are), divide by the mean of these standard deviations (how wide the groups of data are), and multiply by the square root of n (the sample size of each group). But that's off the point.

We had 5 papers in our class for psychology majors (I almost graduated in that instead of engineering) that discussed why controlled experiments (using the p-value) should not be published. In each case my knee-jerk reaction was that the authors didn't like math or didn't understand it and just wanted to 'suppose' answers. But each article attacked the math abuse, and they were written by proficient academics at universities who did this sort of research. I came around too.

The math is established for random environments, but the scientists control every bit of the environment, not to get better results but to detect things so tiny that they really don't matter. The math lets them misuse the word 'significant' as though there were a strong connection between cause and effect. Yet every environmental restriction (same living arrangements, same diets, same genetic strain of rats, etc.) undermines the result. It's the difference between internal validity (finding the effect inside the experiment) and external validity (it applying in real life). You can also find things that are weaker (by the square root of n) by using larger groups.

A study can be set up so as to likely find 'something' tiny and get the research prestige, but another study can be set up with different controls that turn out an opposite result. And none of it applies to real life the way reading the results of an entire population living normal lives would. You have to study and think quite a while, as I did (even walking the streets around Berkeley to find books on the subject up to 40 years prior), to see that a "99 percent significance level" means not a strong effect but more likely one so tiny, maybe a part in a million, that you'd never see it in real life.
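For anyone curious, here is that keystroke-saver as code (Python with numpy/scipy; the three groups are invented). As described, it comes out close to the square root of the one-way ANOVA F statistic when the group SDs are similar:

import numpy as np
from scipy import stats

# Three invented groups of equal size n
groups = [np.array([4.1, 5.0, 5.5, 4.8, 5.2]),
          np.array([5.9, 6.3, 5.4, 6.1, 6.6]),
          np.array([5.1, 4.7, 5.6, 5.3, 5.0])]
n = len(groups[0])

means = np.array([g.mean() for g in groups])
sds = np.array([g.std(ddof=1) for g in groups])

# sd of the group means (how different the groups are), divided by
# the mean of the group sds (how wide the groups are), times sqrt(n)
shortcut = means.std(ddof=1) / sds.mean() * np.sqrt(n)

f_stat, p = stats.f_oneway(*groups)
print(f"shortcut = {shortcut:.3f}, sqrt(F) = {np.sqrt(f_stat):.3f}, p = {p:.4f}")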
Re: (Score:2)
So does this prove that Tom Cruise, John Travolta (Score:2)
and the other crazies were right all along, that psychiatry is not a real science?
Or does it just prove that the general understanding of math and statistics (except among mathematicians) is in free fall, and that a few years from now, college graduates won't even be able to recite the multiplication table up to 10?
Even more obligatory (Score:5, Interesting)
https://xkcd.com/882/ [xkcd.com]
Re: (Score:2)
I hate to admit it, but I don't think I truly started to understand p-values until reading that comic when it was released. I actually started using it as a discussion point in study groups.
Re: (Score:2)
A useful exercise (if you can use basic statistics software) that illustrates this is to generate a bunch (say, 10 or 20) of series of random numbers and then compute the matrix of correlations (or t-values, if you prefer) between all of them. You'll find that roughly 5% of the correlations are "significant" at the p<.05 level, even though the series are really random and independent. It's a trivial result and just what you'd expect by chance, but it does drive the point home that you can't rely on p-values alone if you're testing multiple hypotheses.
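Something like this, for anyone who wants to try it (Python with numpy/scipy; 20 invented series of 100 random numbers each):

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_series, length = 20, 100
series = rng.normal(size=(n_series, length))  # independent pure noise

sig = total = 0
for i in range(n_series):
    for j in range(i + 1, n_series):
        r, p = stats.pearsonr(series[i], series[j])
        total += 1
        sig += p < 0.05

print(f"{sig} of {total} pairwise correlations 'significant' at p < 0.05")
# 190 pairs, so expect about 190 * 0.05, i.e. 9 or 10 spurious hits.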
Re: (Score:2)
It's a trivial result and just what you'd expect by chance, but it does drive the point home that you can't rely on p-values alone if you're testing multiple hypotheses.
On the other hand, TFA is proposing to replace this with Bayesian probabilities, which are likely even less understood, even more abused, and it could open the door to subjectivism.
Re: (Score:2)
Actually, no. TFA doesn't like Bayesian techniques either. They want to use purely descriptive statistics.
So basically, they're replacing something that a lot of people misinterpret with something else that essentially cannot be interpreted properly due to lack of information.
Re: (Score:2)
I did not mean to suggest it was the only alternative offered. But it was one, and I didn't see enough discussion of its shortcomings for my taste.
Re: (Score:2)
You know that in a lot of statistical testing the null hypothesis is the output of a theory, right? Just because you didn't ever advance beyond the most basic t-test doesn't mean nobody else did.
Re:What's the problem? (Score:5, Funny)
This is social science. Mathematics and statistics aren't even relevant.
Correlation between low intelligence and uninformed statements of this nature is p<0.01.
Re: (Score:2)
You misspelled "English".
Re: (Score:3)
Also "grammar."
Re:What's the problem? (Score:5, Insightful)
This is social science. Mathematics and statistics aren't even relevant.
Yes they are. Get quantitative data, use quantitative methods.
Just because most social 'scientists' are not experts at statistical inference, it doesn't mean it can't be done correctly.
p-values are just a probability of something. Do your experiment well and the 'something' makes sense.
Re:What's the problem? (Score:5, Insightful)
I agree with you. Yet there's no need for the quotes around social 'scientists.' Psychologists, sociologists, etc. employ the same experimental designs and mathematical techniques as doctors or others performing drug efficacy or medical outcome experiments, for example.
P-Value: It's intervention versus control group. Standard, basic scientific experimental design and statistical analysis stuff.
It's an uninformed and naive view to think that people looking at the behavior of humans at the level of social organization are somehow intellectually or scientifically less able than those examining them at the biological level.
Re: (Score:2)
I used 'scientists' in quotes in the same sense I'd put computer 'scientists' in quotes. My degree is in computer science, but I dispute that it's a science in the conventional sense.
I find debugging hardware is closer to science. You can't really see inside the chip, but you can develop hypotheses about what is wrong and come up with tests that will refute (or not) the hypotheses. Iterate until you think you probably know the truth.
Doing things well in social sciences is hard. The field (human subjects, IRB
Re: (Score:2)
Hey there. Again, we're generally on the same page here, and I agreed with your comment; my counterpoint was directed not so much at you as at the general idea of the folks here with a dismissive view of what social science means. BTW, interesting comment about computer science. ;-)
I realize now we may not even have been using the same definitions. I was thinking more like psychology (let's say a stress coping training study, for example) versus biology (let's say a cancer treatment study). So, it'
Re: (Score:2)
I lived through my wife's PhD in education. I helped with the statistics. It was mind curdling stuff. But her thesis had rigor. S-Plus, Excel and everything else doesn't have MANOVAs. R does. We used R.
Re: (Score:2)
I lived through my wife's PhD in education. I helped with the statistics. It was mind curdling stuff. But her thesis had rigor. S-Plus, Excel and everything else doesn't have MANOVAs. R does. We used R.
Right. So you know first hand how ignorant it is to say math and statistics have nothing to do with social science. :-)
Re: (Score:2)
I lived through my wife's PhD in education. I helped with the statistics. It was mind curdling stuff. But her thesis had rigor. S-Plus, Excel and everything else doesn't have MANOVAs. R does. We used R.
Right. So you know first hand how ignorant it is to say math and statistics have nothing to do with social science. :-)
It wasn't me who said that.
Re: (Score:2)
Right. So you know first hand how ignorant it is to say math and statistics have nothing to do with social science. :-)
It wasn't me who said that.
I know. I wasn't trying to imply you did. =)
Re: (Score:2)
Your false assumption is that doctors, chemists and physicists get things right with any greater frequency. It's not that social scientists are misusing statistics but that a large number of scientists in most disciplines simply do a poor job of quantifying things. It's a little more obvious when it happens in social science, but accurate measurement is hard or often impossible, so bad proxy measures are a pervasive feature of most scientific disciplines. That's one of many reasons why most "experts" usually get [amazon.com]
Re: (Score:2)
Your false assumption is that doctors, chemists and physicists get things right with any greater frequency.
Did you mean to reply to me? It's a bit surreal to see you seemingly support what I wrote but tell me about a false assumption I made. In case you were speaking to me, I would like to point out that I made no such assumption. I argued that social scientists were as rigorous as any others but made no claims about either group's infallibility in absolute terms.
Re: (Score:2)
That sounds like an excellent reason to use scare quotes around "scientists". When only 25% of published biomedical results can be reproduced [footnote1.com], that field needs to do work to justify the claim to be science as well.
Re: (Score:2, Funny)
Not a big fan of college, eh?
Re: (Score:3)
Unfortunately academia was taken over by quacks a long time ago..
Re: (Score:2)
Psychiatry is a medical profession where the practitioners go to medical school and then train in the profession as per other medical professions. Psychology is not medicine. Psychologists study human emotions, thought, mental illnesses and disorders (overlapping Psychiatrists) but cannot prescribe unless they also train as a doctor or a psych nurse. Psychologists do more counseling and group dynamics. Psychiatrists are more focused on drug treatments, but often work in tandem with Psychologists.
Psychologica
Re: (Score:2)
Whereas scientists principally use deduction
To all autodidacts: Imagine if YOU were to make a statement this absurd, without even a hint of self doubt. Worse, what if this is the kind of thing you actually believe as a result of your online "learning" adventures?
This is why a formal education is important. On your own, you could very well end up like the AC above -- so deeply misinformed that there's little hope for recovery.
Re: (Score:2)
p-values are not probabilities. What people would like them to be is the probability that one hypothesis is correct compared to another. But that is not what a p-value gives you, and because people ignore that gap and misinterpret them, p-values have become such a problem; that's why they are being banned. Many experiments with acceptable p-values (p<0.05) are not reproducible.
Actually, the inventor of p-values never intended them as a definitive test, only as a way to flag what is perhaps worth further investigation.
p-values tell you, i
Re: (Score:2)
P-values certainly are probabilities. You just argued they aren't probabilities, but they are probabilities of this other thing. You contradicted yourself. I was specifically vague when I called it 'something' because it changes with the type of test and there are many to choose from and I didn't want to write a whole book. That book has already been written by smarter people than I.
Re: What's the problem? (Score:5, Informative)
Yes they can, in some cases. There was a very well-controlled study where two sets of anonymous letters of application were sent to various positions at a large number of companies from a large number of applicants. The letters included similar random credentials from random institutions, random cosmetic variations of the same cover letter, and so on, to avoid tipping the hand of the researchers. The only difference between the two groups of letters was that one was given names sampled uniformly from common African-American names, and the other was given names sampled uniformly from everyone else. The names were assigned in a blind way, literally a random form insertion, to avoid introducing bias.
I'm sure you can guess where this is going. The response and offer rate for the applications with black-sounding names was significantly lower, both statistically and practically. It's rather hard to explain that away, though I'm sure someone here will try without having even read the study.
Re: (Score:2)
Dammit, I meant the letters were written anonymously and then labeled with names later. I guess "pseudonymous" would have been a better word. Oh well.
Re: What's the problem? (Score:5, Informative)
There was a very well-controlled study where two sets of anonymous letters of application ...
This study was conducted by Marianne Bertrand and Sendhil Mullainathan, and is described in Steven Levitt's book Freakonomics [wikipedia.org], which is a fantastic book for anyone interested in the application of statistics to social science. Here [nber.org] is the original paper.
Re: What's the problem? (Score:4)
yes, i am.
true randomization allows you to control for everything (intuitively: since it's randomized, there is no way for you to introduce bias), at the cost of increased variance. however, you can make up for increased variance by increasing the sample size, which is what they did here. i forget the exact numbers, but they sent out hundreds of letters.
far from what you assert, randomization is fundamental to experimental control, and randomness is quite easily generated in a controlled manner. here's a general hint for you and everyone else: don't say things like "randomness cannot be controlled because then it wouldn't be 'true' randomness". it just makes you seem like an idiot.
Re: (Score:2)
"Randomness is not compatible with experimental control."
You have no clue about how to set up an experiment.
Re: (Score:2)
Re: (Score:2)
Are you sure that the term "well-controlled study" applies, given how you repeatedly used the term "random" when describing this experiment?
Randomness is not compatible with experimental control. Additionally, randomness itself cannot be controlled, because doing so would prevent it from being true randomness.
Quick! Someone is wrong on the internet.
Re: (Score:2)
"Racism", "sexism", "patriarchy" and related topics of study within the social "sciences" inherently can't be quantitatively analyzed in any meaningful way.
You sound as silly now as the people who used to think atoms were the *inherent* limit of divisibility and exploration. Then electrons...
In science, as in politics, innovation tends to come from the death of the old stalwarts rather than their enlightenment.
Even Einstein became an obstructionist to quantum mechanics in his later years.
Re: (Score:2)
Even Einstein became an obstructionist to quantum mechanics in his later years.
"God does not play dice with the universe." ;-)
Re: (Score:2)
Except that those terms are subjective and, really, based on emotions.. Atoms are not.
Re: (Score:2)
Except that those terms [racism, sexism, ...] are subjective and, really, based on emotions.. Atoms are not.
That the mind uses stereotypes to classify and categorize information is neither "subjective" nor an "emotion." This is what researchers actually study. From that, inferences can perhaps be generalized about the phenomena behind the subjective terms the previous poster used.
Re: (Score:2)
SJWs do not apply it equally however..
Re: (Score:2)
Citations please.
Re:What's the problem? (Score:5, Insightful)
Actually, p-values are about CORRELATION. Maybe *you* aren't well-positioned to be denigrating others as not statistical experts.
I may be responding to a troll here, but, no, the GP is correct. P-values are about probability. They're often used in the context of evaluating a correlation, but they needn't be. Specifically, p-values specify the probability that the observed statistical result (which may be a correlation) could arise from random selection of a particularly unrepresentative sample. Good sampling techniques can't eliminate the possibility that your random sample just happens to be non-representative, and the p-value quantifies that possibility. A p-value of 0.05 means that, if there were no real effect, there would be a 5% chance of seeing a result at least this extreme through sampling luck alone.
The problem with p-values is that they only describe one way that the experiment could have gone wrong, but people interpret them to mean overall confidence, or, even worse, significance of the result, when they really only describe confidence that the sample wasn't skewed by bad luck in random sampling. The sample could have been biased because the sampling methodology wasn't good. The result could have been meaningless because it finds an effect which is real but negligibly small. It could be meaningless because the experiment was badly constructed and didn't measure what it thought it was measuring. There could be lots and lots of other problems.
There's nothing inherently wrong with p values, but people tend to believe they mean far more than they do.
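One of those failure modes is easy to demonstrate (a sketch in Python with numpy/scipy; the effect size and sample size are invented): a real but negligibly small effect becomes arbitrarily "significant" if you just collect enough data.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# A real but tiny effect: the groups differ by 1% of a standard deviation
a = rng.normal(0.00, 1.0, size=2_000_000)
b = rng.normal(0.01, 1.0, size=2_000_000)

t, p = stats.ttest_ind(a, b)
d = (b.mean() - a.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)

print(f"p = {p:.2e}  (highly 'significant')")
print(f"Cohen's d = {d:.4f}  (negligible in any practical sense)")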
Re: (Score:2)
There really aren't any good ways to measure those other effects. If you knew how your experiment was biased, you'd try and fix it.
Criticisms of p-values usually fall into two groups. Some people believe that p-values are bad because some people interpret them as the false positive rate. Personally, I think that's a problem with some people, and not p-values. The other criticism, which is particularly prevalent in social sciences, epidemiology and some of the squishier medical-type areas, is that if you
Re: (Score:2)
There really aren't any good ways to measure those other effects. If you knew how your experiment was biased, you'd try and fix it.
Randomized sampling goes a long way, but only if you have a large enough population. This is one of the problems of social sciences. A randomized 10% subsample from 100 subjects ain't gonna cut it. A randomized subsample from 10,000,000 people isn't going to get funded.
Re: (Score:2)
There really aren't any good ways to measure those other effects. If you knew how your experiment was biased, you'd try and fix it.
Randomized sampling goes a long way, but only if you have a large enough population. This is one of the problems of social sciences. A randomized 10% subsample from 100 subjects ain't gonna cut it. A randomized subsample from 10,000,000 people isn't going to get funded.
Why wouldn't a randomized subsample from 10M people get funded? The required sample size doesn't grow as the population does.
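A quick check (Python, standard library only; a hypothetical survey estimating a 50/50 proportion from n = 1,000): even with the finite population correction, the margin of error barely moves between a population of ten thousand and one of ten million.

import math

def margin_of_error(n, N, p=0.5, z=1.96):
    """95% margin of error for a proportion, with finite population correction."""
    se = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((N - n) / (N - 1))
    return z * se * fpc

for N in (10_000, 100_000, 10_000_000):
    print(f"N = {N:>10,}: +/- {margin_of_error(1_000, N):.4f}")
# n = 1,000 gives roughly +/- 3% regardless of whether the population
# is ten thousand or ten million.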
Re: (Score:2)
Because identifying the 10 million and sampling the 1 million will be expensive. Worse, that many people in the class may not exist. If your class is 'residents of Boring, Oregon', there may simply be too few of them to randomize away the confounders and drive the p-value down.
Top tip. If you want to find something in the data, it helps if it sticks out above the noise floor like a sore thumb. If you're having to push the noise floor down with sample size to make something visible, the odds you got somethin
Re: (Score:3)
Actually, it is increasing utilization of improved statistical methods leading to the phase-out of earlier, cruder methods. It's standard advancement of the scientific method and applies to all experimental design analysis regardless of field.
That this journal is throwing out the baby with the bath water and abdicating its responsibility to review quality of content in favor of blanket rules is another matter.
Re: (Score:2)
Well, I need to read TFA, but I am going to assume they are providing alternatives? I would hope nothing without any stat test would pass review.
Re: (Score:3)
Well, I need to read TFA, but I am going to assume they are providing alternatives? I would hope nothing without any stat test would pass review.
Facepalm. Oh goodness, are people reading this headline to think they are removing p-values in favor of just accepting speculation with no statistical analysis!?!
YESSSS, they are forcing submitters to UPGRADE their statistical analysis to employ more robust mathematics.
Re: (Score:2)
Oh goodness, are people reading this headline to think they are removing p-values in favor of just accepting speculation with no statistical analysis!?!
This is a social science journal. Statistics are obviously a tool of the Patriarchy and should be shunned. (This mockery has become a meme now - you can buy "logic is a tool of the Patriarchy" t-shirts for goodness sake.)
Re: (Score:2)
But their replacement is even more subject to bias than p-values.
At least with p-values I don't have to delve into a dozen things that are not in the paper to see the error. With their proposal, I have to investigate at least a dozen factors that are not mentioned in the paper to determine where and why the errors are present.
IOW, their proposed replacement makes lying with statistics so much easier that you can now say that lies and statistics are synonyms.
Re: (Score:2, Troll)
It's the opposite really. You can publish any fucking thing by mining for a low p-value (through multiple comparisons, outright biased sampling techniques, etc., etc.) and then turning your brain off.
Of course, just getting rid of the p-value outright won't solve this, but at the very least, the problem isn't what you're saying it is. Blind math fetishism isn't solving anything.
Re: (Score:2)
It's the opposite really. You can publish any fucking thing by mining for a low p-value (through multiple comparisons, outright biased sampling techniques, etc., etc.) and then turning your brain off.
Of course, just getting rid of the p-value outright won't solve this, but at the very least, the problem isn't what you're saying it is. Blind math fetishism isn't solving anything.
Yeah, but then when nobody can replicate your findings, you become that lab that publishes crap all the time. Reviewers start asking for more confirmatory evidence, grant reviewers already ding you before they've even read your application, etc. Sure you can abuse the system for a while, but eventually it catches up to you.
Re:Blind math fetishism??? (Score:4, Informative)
speak for yourself. i've never tried using a Springer book as a nipple weight, though; i'll give it a try sometime. thanks.
Re: (Score:2)
Why should the math be towing the groupthink? Can't the groupthink move on its own?
Or did you mean "toe the groupthink" as in "toe the line". No, that expression isn't about pulling barges, it's about standing in the right place in a formation....
Re: (Score:2)
eh, you are right about the origin of the phrase, but really it works either way. in this case, i like the imagery of a small yoked vehicle having to pull a lot of dead weight; it's an apt description of math/stats as it relates to the social sciences right now.
Re: (Score:2)
The math works fine; the problem is choosing the appropriate method. My hunch is that the biggest mistake in the use of stats in the social sciences is failing to correct p-values for multiple comparisons [wikipedia.org]. That is, if your hypothesis is limited to predicting an association between two variables, then p-values are just fine. But if you sent out a questionnaire with 20 questions on it and compute all 190 pairwise correlations between them, you'll get around 9 or 10 "significant" (p<0.05) but meaningless as
Re: (Score:2)
yes, you are correct: social psychology done rigorously becomes economics. as for the rest, however...
Re: (Score:2)
i am a statistician and i've worked closely with a sociologist (one of the few who uses math correctly, if a bit pedantically). you are correct, it is not intrinsically impossible to do sociology correctly. however, the mathematical literacy standards for the field are woefully lacking even in the ivy league.
this song by Tom Lehrer [youtube.com] holds true today, just replace "sigma and chi-square" by "social network analysis".
Re: (Score:2, Insightful)
, in the real world they are simply a way of inserting subjective intuition into numerical judgements.
That is actually one of the selling points. You're going to insert subjective intuition into your judgements and methods regardless of which approach you use. With proper use of Bayesian methods you can at least state your assumptions explicitly, even if you don't do much about them.
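A small illustration of what "stating your assumptions explicitly" looks like in practice (Python with scipy; a beta-binomial sketch with invented data): the prior sits right there in the code for anyone to argue with.

from scipy import stats

# Invented data: 34 of 50 rats turned left
left, n = 34, 50

# The subjective assumption, stated out loud: a Beta(1, 1) prior,
# i.e. 'before the data, any left-turn rate is equally plausible'
prior_a, prior_b = 1, 1

# Conjugate update: Beta(prior_a + successes, prior_b + failures)
posterior = stats.beta(prior_a + left, prior_b + (n - left))

print(f"Posterior mean: {posterior.mean():.3f}")
print(f"95% credible interval: {posterior.interval(0.95)}")
print(f"P(rate > 0.5 | data): {1 - posterior.cdf(0.5):.3f}")
# Disagree with the prior? Change prior_a and prior_b and rerun; the
# disagreement is explicit instead of hidden inside the method.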