Dozens of Recent Clinical Trials May Contain Wrong or Falsified Data, Claims Study (theguardian.com)
John Carlisle, a consultant anesthetist at Torbay Hospital, used statistical tools to conduct a review of thousands of papers published in leading medical journals. While the vast majority of the clinical trials he reviewed were accurate, 90 of the 5,067 published trials had underlying patterns that were unlikely to appear by chance in a credible dataset. The Guardian reports: The tool works by comparing the baseline data, such as the height, sex, weight and blood pressure of trial participants, to known distributions of these variables in a random sample of the population. If the baseline data differs significantly from expectation, this could be a sign of errors or data tampering on the part of the researcher, since fabricated datasets are unlikely to have the right pattern of random variation. In the case of the Japanese scientist Yoshitaka Fujii, the detection of such anomalies triggered an investigation that concluded more than 100 of his papers had been entirely fabricated. The latest study identified 90 trials that had skewed baseline statistics, 43 of which had measurements with about a one in a quadrillion probability of occurring by chance. The review includes a full list of the trials in question, allowing Carlisle's methods to be checked but also potentially exposing the authors to criticism. Previous large-scale studies of erroneous results have avoided singling out authors. Relevant journal editors were informed last month, and the editors of the six anesthesiology journals named in the study said they plan to approach the authors of the trials in question, and raised the prospect of triggering in-depth investigations in cases that could not be explained.
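For the curious, here is a minimal Python sketch of the general idea behind such baseline checks. It is not Carlisle's exact procedure, and the summary statistics below are invented for illustration: reconstruct a p-value for each reported baseline variable from the published means, SDs and group sizes, then ask whether the collection of p-values looks uniform, as it should under honest randomization.

    from scipy import stats

    # Hypothetical published baseline rows: (mean_A, sd_A, n_A, mean_B, sd_B, n_B)
    baseline_rows = [
        (172.1, 9.3, 50, 171.8, 9.1, 50),    # height, cm
        (78.4, 12.2, 50, 79.0, 11.8, 50),    # weight, kg
        (128.0, 14.5, 50, 127.5, 15.1, 50),  # systolic BP, mmHg
    ]

    p_values = []
    for m1, s1, n1, m2, s2, n2 in baseline_rows:
        # Two-sample t-test reconstructed from summary statistics alone.
        _, p = stats.ttest_ind_from_stats(m1, s1, n1, m2, s2, n2, equal_var=True)
        p_values.append(p)

    # Under genuine randomization, baseline p-values should be roughly
    # Uniform(0, 1). A strong deviation (e.g. too many p-values near 1,
    # meaning the groups are suspiciously similar) is what flags a trial
    # for closer scrutiny. With only three variables this has little power;
    # the real review pools far more data.
    ks_stat, ks_p = stats.kstest(p_values, "uniform")
    print(f"KS test against uniform: statistic={ks_stat:.3f}, p={ks_p:.3f}")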
Study May Contain Wrong or Falsified Data, (Score:1)
claims another study
Authors of the first study were promptly sacked
Re: Study May Contain Wrong or Falsified Data, (Score:2)
I have been out of academia since the very early 1990s. My publications, those that went on to peer review and journal publication, are not nearly as numerous as those of the guy listed. No, he's a whole order of magnitude more prolific than I.
Which makes me kinda giggle. How the hell does he even have that many publications?!?
Re: (Score:2)
How the hell does he even have that many publications?!?
Probably doing a study on the effects of amphetamines.
Marcia Angell & Skepticism on Mainstream Scien (Score:4, Informative)
http://pdfernhout.net/to-james... [pdfernhout.net] "The problems I've discussed are not limited to psychiatry, although they reach their most florid form there. Similar conflicts of interest and biases exist in virtually every field of medicine, particularly those that rely heavily on drugs or devices. It is simply no longer possible to believe much of the clinical research that is published, or to rely on the judgment of trusted physicians or authoritative medical guidelines. I take no pleasure in this conclusion, which I reached slowly and reluctantly over my two decades as an editor of The New England Journal of Medicine. (Marcia Angell)"
Re: (Score:1)
Re:Is anyone surprised? (Score:4, Insightful)
Thanks for that! (Score:5, Insightful)
Thanks for that! Now I can use that tool to generate data for my upcoming fabricated studies.
Re: (Score:2)
Well, JBS Haldane showed this technique for exposing fraud in 1939, so it's not revealing anything smart fraudsters wouldn't already know. A lot of the anomalies (though not necessarily all) are probably down to carelessness rather than fraud.
Re: Thanks for that! (Score:2)
LOL If you're gonna put that much effort into it, you might just as well do the damned study.
Re: (Score:2)
"90 of the 5,067" (Score:5, Insightful)
That's... less than 2%. Naturally, we want it to be 0%, but 1.8% is nothing to generate scare headlines over.
Re:"90 of the 5,067" (Score:4, Funny)
You stole the words right from my mouth: 90/5067? That's significant at the p < 0.02 level!
Re:"90 of the 5,067" (Score:5, Insightful)
That's... less than 2%. Naturally, we want it to be 0%, but 1.8% is nothing to generate scare headlines over.
They only caught the dumb ones. It would have been easy to generate fake data that fits a known distribution. For instance, in Python, just use numpy.random.normal instead of numpy.random.uniform.
The 2% is just the floor. The actual fraud and/or incompetence rate is likely higher.
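To make the parent's point concrete, here is a small Python sketch. The height distribution parameters are made up for the example; the point is only that sampling from the right-shaped distribution is trivial:

    import numpy as np

    rng = np.random.default_rng(42)

    # Naive fake: uniform heights between 150 and 195 cm. The flat shape
    # and missing tails are exactly the kind of anomaly a baseline check
    # will catch.
    naive_fake = rng.uniform(150, 195, size=200)

    # Slightly smarter fake: normally distributed heights. This matches
    # the population's shape and would not be flagged by a shape test alone.
    smarter_fake = rng.normal(loc=172, scale=9, size=200)

    print(f"uniform fake SD: {naive_fake.std():.1f} cm")
    print(f"normal fake SD:  {smarter_fake.std():.1f} cm")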
Re: (Score:3)
One other thing to consider.
I am a funeral director. I see deaths firsthand from medical mistakes and malpractice. In fact, I only see that "2%".
You know that old joke "an undertaker is somebody who cleans up the doctor's mistakes"? Well, there's more truth to it than you might want to know.
So, for all you people who say "it's only 2%", you come sit with me when I deal with a family who has had a death because they were part of that "2%". You look them in the eyes and say, "hey, science is still good
Re: (Score:2)
The good frauds can't be caught. The ba
Re: (Score:3)
>You can't fake the expected results if you use a RNG
Yes you can.
>Also, RNGs don't generate Normal results.
They most certainly can. Look up the Box-Muller method or the Ziggurat algorithm.
>They generate random results.
Only if the designer knew what he or she was doing. Usually they don't.
>Most real-world "random" events are not "random", but are "normal".
That depends on how you measure it. Poisson distributions are an example of something you can force by choosing your measurement method.
>So a com
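Since the Box-Muller transform keeps coming up, here is a plain Python sketch of it: two independent Uniform(0,1) draws become two independent standard normal draws.

    import math
    import random

    def box_muller():
        """Return two independent standard normal samples."""
        # 1 - random() lies in (0, 1], which keeps log() away from zero.
        u1 = 1.0 - random.random()
        u2 = random.random()
        r = math.sqrt(-2.0 * math.log(u1))
        return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)

    samples = [z for _ in range(5000) for z in box_muller()]
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    print(f"mean ~ {mean:.3f}, variance ~ {var:.3f}")  # should be near 0 and 1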
Re: (Score:2)
Most real-world "random" events are not "random", but are "normal".
And when you are doing a clinical trial on some potential new drug, your test population is never "random" or "normal". It is selected for the disease or condition that you are trying to fix. It would not be unusual for 100% of the test population to have a BMI of 50 when you're testing a new diet drug for significantly overweight people, for example. The fact that a very large proportion of those same people will also have high blood pressure and high cholesterol is not unusual; it is to be expected. And oh, my
Re: (Score:2)
Re: (Score:1)
Clickbait aside, I support investigating these 90 cases of suspected fraud. The fact that the suspicious studies are all related to medicine does not make this less urgent.
In an ideal world all studies would be repeated by multiple independent teams for confirmation, but in reality the remaining 4,977 will probably be given low priority.
On a sidenote, /. seems not to like the HTML code for a non-breaking space.
Re:"90 of the 5,067" (Score:5, Informative)
Seems like an artifact of randomness - the Prosecutor's Fallacy [wikipedia.org].
Yes, some will be genuine falsifications. But some WILL be genuine results.
You write a paper on a list of 1000 tosses of a coin, noting each result. The chance for the coin to land on edge in one toss is around 1 in 100,000.
Then your paper is reviewed along with 100,000 others. If you have the coin land on edge more than once in your dataset, it's flagged as a falsified dataset.
Roughly 10 of the 100,000 papers tested will be wrongly flagged as falsified; they are false positives.
------
Statistical results are subject to the same randomness as the single tests contributing to them. The scale of the randomness is reduced by a factor related to the number of tests, but it still exists. Take enough correctly obtained statistical results, and you WILL find outliers.
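For what it's worth, the arithmetic for this thought experiment can be checked in a few lines of Python. The exact false-positive count depends on the flagging rule you assume; this sketch uses the numbers quoted above and treats "more than one edge landing" as the flag:

    from scipy.stats import binom

    tosses_per_paper = 1_000
    p_edge = 1e-5          # chance of landing on edge per toss
    papers = 100_000       # papers reviewed

    # Probability that a single honest paper reports two or more edge landings.
    p_flagged = 1.0 - binom.cdf(1, tosses_per_paper, p_edge)

    print(f"P(honest paper flagged) = {p_flagged:.2e}")
    # Expected false positives across all papers: a handful, in the same
    # ballpark as the parent's rough figure.
    print(f"expected false positives = {p_flagged * papers:.1f}")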
Re: (Score:1)
This sort of effect is always a concern. But, per TFS, they found 43 trials with "measurements that had about a one in a quadrillion probability of occurring by chance". Unless there were a quadrillion trials (unlikely), the Prosecutor's Fallacy isn't relevant here.
Re: (Score:2)
Some of the ones that fail the 1 in 10,000 test are quite possibly an effect of randomness, although 82 out of 5015 is a much higher failure rate than would be expected. And 43 of those 5015 having a probability of less than 1 in 10^15 really isn't a plausible random artefact.
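A back-of-the-envelope version of that point in Python, using the 1-in-10,000 and 1-in-10^15 thresholds and the roughly 5,000 trials mentioned above (and assuming the per-trial tests are independent):

    # Expected number of honest trials flagged purely by chance at each
    # p-value threshold.
    n_trials = 5_015

    for threshold in (1e-4, 1e-15):
        expected = n_trials * threshold
        print(f"threshold {threshold:g}: ~{expected:.2g} honest trials flagged by chance")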
Re: "90 of the 5,067" (Score:2)
Huh... Thanks. I'd never read/seen that fallacy. In my defense, I was on the debate team at the collegiate level, but this fallacy was coined more recently than my experiences in said team. Yeah, I'm that old...
Anyhow, I am trying to wrap my head around it.
The odds of winning without cheating are 1:1000.
The odds of winning with cheating are 1:100.
They won, ergo the most probable reason for their winning was they cheated. Which, while true, doesn't actually mean that they cheated - it just means someone doesn
Re: (Score:2)
Yes. It's a good heuristic for pointing out suspects for more thorough tests (which may be too expensive to conduct on the whole body of data), but it isn't proof by itself.
Re: (Score:3)
If you do a binomial calculation with
Re: (Score:2)
Seems like an artifact of randomness - the Prosecutor's Fallacy [wikipedia.org].
Yes, some will be genuine falsifications. But some WILL be genuine results.
You write a paper on a list of 1000 tosses of a coin, noting each result. The chance for the coin to land on edge in one toss is around 1 in 100,000.
Then your paper is reviewed along with 100,000 others. If you have the coin land on edge more than once in your dataset, it's flagged as a falsified dataset.
Roughly 10 of the 100,000 papers tested will be wrongly flagged as falsified; they are false positives.
You're assuming the authors of the study weren't very good at stats.
If their standard for false data was 2/1000 coins landing on edge then yes, they got false positives.
If their standard was 100/1000 coins landing on their edge then I'm pretty sure those data sets were wrong.
Re: (Score:2)
Re: (Score:3)
See, or... statistically estimate?
Re: (Score:2)
Outlier data (Score:4, Interesting)
So 90 of the 5,067 had outlier-like data, and the conclusions are being drawn about those outliers.
I knew that publish or perish was ruining science, but this is actually the most heartening news I've heard about its credibility.
I've learned that less than 2% of these studies used extreme, crazy data. My faith in science is restored!
Re: (Score:2)
Re:Only in Clinical studies ..... (Score:4, Interesting)
Luckily we can trust those implicitly, especially the model based ones.
Trust is not required. Full source code and input data is available for your inspection and verification.
Re: (Score:2)
I don't know which scientists you listen to, but the last time I heard from them, they refused to share exactly that because of "reasons".
I bet that "last time you heard" was a while ago? https://www.newscientist.com/a... [newscientist.com]
Re: Only in Clinical studies ..... (Score:4, Insightful)
Hmm...
I am not a climate scientist. I am a retired scientist. What did I do? I modeled traffic. As strange as it might sound, there is a lot of similarity between the two. I will try to give some history, as it may help this make more sense. Sorry for the lack of brevity.
In my case, I helped bring traffic modeling to the age of computers. In this process, it was learned that you could improve the model results, significantly, by increasing the amount of data available. Even seemingly trivial things, such as signage fonts, can impact throughput. The frequency of lane markings, their reflectivity, and their coloration all have an impact as well.
To try to put this in perspective, I was working with data sets a full terabyte in size before the turn of the century. We did distributed computing before it even really had a name.
Why is that important?
Well, traffic is a bit like climate. It is a chaotic system. To be clear, a chaotic system is not a system that is random. It appears random but, with more data, you can tease out patterns and make deterministic predictions based on a variety of variables, with some levels of consistency and success.
I am not suggesting, for the record, that the climate science models are 100% accurate. In fact, they have confidence ratings. That goes underreported, but they will tell you how confident they are in the results.
Anyhow, that's beside the point. I just want to make it clear and avoid confusion.
What is important is that you have to massage the data. You have to make corrections to the data. You have to remove outliers.
See, we'd collect data and then run it against the models. We'd compare the model output with what was really happening. Sometimes, the results are pretty close. This means you can have greater confidence in the results. Sometimes, it isn't even remotely close.
At that point, you usually start by poking at the model itself. However, you will also poke at the data. You will throw some of that data right into the trash. You will normalize the numbers, and adjust the impact factor. You will also probably swear, like a lot. You will invent whole new languages, just to swear in them.
Either way, you will massage that data until you get the results that most closely match reality. You take existing data and run your models to see how well they match reality. When you get it to the point where you're confident, you use those methods to make predictions about the future, given new variables. This will have varied confidence levels, and pinpoint accuracy isn't expected by anyone versed in the science.
The truth is, you can model all you want but some drunk guy is still going to drive, in reverse, the wrong direction down a one way street. So, you only have so much confidence in the predictions.
The whole point is, you have to massage the data. If you don't, you get horrible results that don't match reality. The expected outcome isn't certainties. The expected outcome is predictions for which you can assign a confidence level.
I suspect part of the problem is poor communication and bad journalism. I've taken some time to examine the models, methods, and reasons. I am not a climate scientist, but I have taken a reasonable amount of time to study it in a scholastic manner. You can download their data AND their models, for free, and run them yourself. You can massage that data any way you want, too. You can apply all the adjustments you want and run the models yourself - for just the cost of hardware you already own and electricity.
Anyhow, I hope this clears a few things up. Correcting and massaging data is pretty normal. It's pretty much required, if you want meaningful results. I am pretty sure the uncorrected data sets are also available. You can get so many data sets, for free. They'll even give you the models. Hell, they'll even give you the source code for the models.
I do want to make it clear, the goal isn't a perfect pre
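As a toy illustration of the kind of "massaging" described above, here is a short Python sketch: drop gross outliers from invented sensor data, then score an (equally invented) model run against what remains. None of this is the poster's actual tooling; it just shows the shape of the workflow.

    import numpy as np

    rng = np.random.default_rng(0)
    observed = rng.normal(1800, 120, size=500)   # vehicles/hour at a sensor
    observed[::97] = 12_000                      # a few bogus sensor spikes

    # Simple interquartile-range filter: a common, if blunt, outlier rule.
    q1, q3 = np.percentile(observed, [25, 75])
    iqr = q3 - q1
    keep = (observed > q1 - 1.5 * iqr) & (observed < q3 + 1.5 * iqr)
    cleaned = observed[keep]

    # Stand-in for a model run; in practice this would come from the
    # traffic (or climate) model being validated.
    model_output = rng.normal(1780, 110, size=cleaned.size)

    rmse = np.sqrt(np.mean((model_output - cleaned) ** 2))
    print(f"kept {cleaned.size}/{observed.size} points, RMSE vs model = {rmse:.0f} veh/h")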
Re: (Score:3)
Re: (Score:2)
See, you'll never get 100% certainty. You can model, massage, and model again - until you can match reality as closely as you can - but you can never account for outliers such as the drunk guy mentioned in my original post.
I'm not sure that it should weaken anything - it just means you may have been led to believe that the confidence levels are higher than they are. (They're pretty high, by the way.) This in no way implies certainty regarding exact timing. The models, and their results, have different confi
Re: (Score:3)
I am not a climate scientist - I feel this needs to be made clear. I have made a quasi-scholastic study of climate science, largely to see for myself what the fuss was about. I've read a whole lot of papers, a whole lot of research, and watched a whole lot of talks. Oddly, I never watched the Gore movie. I prefer to listen to the scientists.
Which leads me to...
The theory behind it is pretty sound. It can be reduced to some pretty simple physics and math. If you put more energy into a system, it's going to t
Re: (Score:3)
It is fortunate that fraud (or incompetence) like this never occurs in other areas. For example think of the implications of this happening in Climate Science papers and studies. Luckily we can trust those implicitly, especially the model based ones.
Because when confronted with the evidence that 90 of 5,067 studies in one field (likely) contain fabricated data, the obvious implication is that an entire field is fabricated?
Your conspiracy theory seems to be missing a few steps.
Unbiased followups needed (Score:2)
A study should be repeated by an org that has no skin in the game regarding the results. They should be paid to run the test and receive the same compensation regardless of outcome. A random lottery should decide the head managers/researchers for any given repeat.
Opposite of the headline would be more newsworthy (Score:2)
Re: (Score:3)
That's not fraud. Most of those studies are primary biology or animal studies, non-blinded. They tend to have sample sizes of around 10, and use sketchy stats. It's not particularly surprising they can't be replicated.
The stats should be improved, and they need to be more cautious in their conclusions (as does anyone reading them), but the scientific literature is supposed to be more about "hey guys, look at this, what do you think?" and less about "this is the truth!".
Bonferroni... (Score:2)
...But did he use a Bonferroni correction to compute his p-values, since this is a classic data dredge? Sure, his method will turn up true positives (and did, for at least one known offender), but what remains to be seen is the false positive rate and the lawsuit rate, since skewed distributions can have many causes, some of which are benign, and this is pretty serious defamation of character if one casts aspersions without secondary supporting evidence of malpractice.
In other words, are his "positives" re
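For reference, a Bonferroni adjustment for a screen of this size is a one-liner; a minimal Python sketch using the trial count quoted in the summary:

    # Family-wise error rate we are willing to tolerate across the whole screen.
    family_alpha = 0.05
    n_comparisons = 5_067      # trials screened

    bonferroni_threshold = family_alpha / n_comparisons
    print(f"per-trial threshold: {bonferroni_threshold:.2e}")   # about 1e-05

    # An anomaly reported at p ~ 1e-15 survives the correction comfortably;
    # one at p ~ 1e-3 would not.
    for p in (1e-3, 1e-6, 1e-15):
        verdict = "still significant" if p < bonferroni_threshold else "not significant"
        print(f"p = {p:g}: {verdict} after Bonferroni")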
This won't be a problem for much longer (Score:2)
Scott Gottlieb, the current head of the FDA, wants to end drug trials. "The free market will put the bad actors out of business."
Unblinded (Score:2)
Most of those studies are probably not blinded. I wouldn't be surprised if all of the flagged ones are not. There doesn't have to be any fraud at all. If you know what the groups are, your brain will introduce its own bias, without you even knowing about it.
Isn't that within the statistical expectation? (Score:2)
The usual threshold of statistical certainty used for publishing scientific results is 95% (sometimes 98%). That is, a result becomes noteworthy enough to publish if there's a 5% or lower chance of it happening simply due to random chance.
90 studies out of 5,067 is 1.8%. Which is below the 5% you'd expect from a 95% threshold, and even the 2% you'd expect with a 98