Epidemiologist Says Most Published Research Findings Are False – Psychiatrist Addresses Credibility Crisis

Sat, 8 Oct 2005

An article published in PLoS Medicine (an open-access, peer-reviewed medical journal) by an epidemiologist who holds academic positions on both sides of the Atlantic, at the University of Ioannina (Greece) and Tufts University (Massachusetts), knocks the socks off much of what passes the peer review process of medical journals.

Dr. John Ioannidis examines what he says are the key factors that have resulted in mostly false published medical research reports: “There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims.”

Dr. Ioannidis notes that the probability that a research finding is indeed true depends on “the prior probability of it being true (before doing the study), the statistical power of the study, and the level of statistical significance.”  He notes the high rate of nonreplication–a sure sign of the findings being false.

“What is less well appreciated is that bias and the extent of repeated independent testing by different teams of investigators around the globe may further distort this picture and may lead to even smaller probabilities of the research findings being indeed true….Several independent teams may be addressing the same sets of research questions. As research efforts are globalized, it is practically the rule that several research teams, often dozens of them, may probe the same or similar questions. Unfortunately, in some areas, the prevailing mentality until now has been to focus on isolated discoveries by single teams and interpret research experiments in isolation. An increasing number of questions have at least one study claiming a research finding, and this receives unilateral attention. The probability that at least one study, among several done on the same question, claims a statistically significant research finding is easy to estimate.”
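As a rough illustration of how easy that estimate is, here is a minimal sketch. It assumes n fully independent studies, no bias, the same significance level alpha in every study, and the same Type II error rate beta in every study; the function names are hypothetical and do not come from the article.

```python
def prob_any_significant_if_null(n, alpha=0.05):
    """Chance that at least one of n independent studies is 'significant'
    when no true relationship exists (assumes no bias)."""
    return 1 - (1 - alpha) ** n


def prob_any_significant_if_true(n, beta):
    """Chance that at least one of n independent studies is 'significant'
    when the relationship is real; each study has power 1 - beta."""
    return 1 - beta ** n


# Assumed illustration: ten teams, alpha = 0.05, 60% power (beta = 0.4).
for n in (1, 5, 10):
    print(n,
          round(prob_any_significant_if_null(n), 3),
          round(prob_any_significant_if_true(n, beta=0.4), 3))
```

Under these assumptions, the more teams that probe a question with no real effect behind it, the more likely it is that at least one of them reports a "finding."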

Dr. Ioannidis defines bias as “the combination of various design, data, analysis, and presentation factors that tend to produce research findings when they should not be produced. Let u be the proportion of probed analyses that would not have been ‘research findings,’ but nevertheless end up presented and reported as such, because of bias.”

He differentiates bias from chance variability “that causes some findings to be false by chance even though the study design, data, analysis, and presentation are perfect.”

“Bias can entail manipulation in the analysis or reporting of findings. Selective or distorted reporting is a typical form of such bias.”

Dr. Ioannidis’ observation that “claimed research findings may often be simply accurate measures of the prevailing bias,” is especially applicable to psychiatry.

The focus of a forthcoming presentation by Dr. David Healy, “Psychopharmacology in Turmoil: A Scientific or Ethical Crisis?” is bias, faulty data analysis, and manipulated reports of clinical trial findings as manifested in the psychiatric literature. In his lecture at Columbia University Medical Center (October 20), Dr. Healy will demonstrate how current clinical practice guidelines that purport to be “evidence-based” are not based on scientifically valid evidence at all.

For those who are unaware of Dr. Healy’s contribution to the credibility crisis in psychiatry: his analysis of previously undisclosed company data from SSRI drug trials contradicted the published reports about these trials. His findings of a drug-induced suicide risk challenged the mindset and prescribing practices of the psychiatric establishment in the UK, Canada, Australia, and the US. Once he brought the undisclosed hazards to public notice, the debate about the efficacy and safety of SSRIs–and the validity of the process by which they were tested–reached a crescendo.

Families whose children became suicidal after being prescribed an SSRI–some becoming casualties of drug-induced suicide–came to Washington from coast to coast to testify before two FDA advisory committee hearings (in February and September 2004). Their compelling testimonies, coupled with independent analyses replicating Dr. Healy’s, resulted in mandatory Black Box warnings about the drugs posing a twofold risk of suicidal behavior for adolescents.

For those who are unable to attend, as well as those who want to familiarize themselves with the issues as they stood in 2003, a video of Dr. Healy’s Grand Rounds presentation at the UCLA Neuropsychiatric Institute, “How Pharmaceutical Companies Mold our Perceptions of Mental Illness” (October 28, 2003), is on view at: http://www.mentalhealth.ucla.edu/opce/gr0304.html

“Psychopharmacology in Turmoil: A Scientific or Ethical Crisis?”
Date: Thursday, October 20
Time: 12:30 – 2:00
Location: Columbia Presbyterian Hospital – 622 West 168th Street
10th floor – Irving Conference Room

Directions from Penn Station: take the #1 train (red line) uptown or the A train (blue line) uptown to 168th Street.
#1 train: at the 168th Street station, take the elevator up to 168th and Broadway.
A train: just go up the stairs to 168th and Broadway.
From the trains: walk west on 168th Street 1/2 block to the circular driveway of Presbyterian Hospital, 622 West 168th Street.
Get a pass from the guard. Take the elevator behind the guard desk to the 10th floor.
Go through the doors to the Irving Conference Room.

After the lecture there will be about 40 minutes for Q & A.

Contact: Vera Hassner Sharav
212-595-8974

http://medicine.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pmed.0020124

PLoS Med 2(8): e124
Why Most Published Research Findings Are False August 30, 2005 John P. A. Ioannidis

Excerpt

Summary

There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.

Published research findings are sometimes refuted by subsequent evidence, with ensuing confusion and disappointment. Refutation and controversy is seen across the range of research designs, from clinical trials and traditional epidemiological studies [1-3] to the most modern molecular research [4,5]. There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims [6-8]. However, this should not be surprising. It can be proven that most claimed research findings are false. Here I will examine the key factors that influence this problem and some corollaries thereof.

Modeling the Framework for False Positive Findings

Several methodologists have pointed out [9-11] that the high rate of nonreplication (lack of confirmation) of research discoveries is a consequence of the convenient, yet ill-founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p-value less than 0.05. Research is not most appropriately represented and summarized by p-values, but, unfortunately, there is a widespread notion that medical research articles should be interpreted based only on p-values. Research findings are defined here as any relationship reaching formal statistical significance, e.g., effective interventions, informative predictors, risk factors, or associations. “Negative” research is also very useful. “Negative” is actually a misnomer, and the misinterpretation is widespread. However, here we will target relationships that investigators claim exist, rather than null findings.

Bias

First, let us define bias as the combination of various design, data, analysis, and presentation factors that tend to produce research findings when they should not be produced. Let u be the proportion of probed analyses that would not have been “research findings,” but nevertheless end up presented and reported as such, because of bias. Bias should not be confused with chance variability that causes some findings to be false by chance even though the study design, data, analysis, and presentation are perfect. Bias can entail manipulation in the analysis or reporting of findings. Selective or distorted reporting is a typical form of such bias. We may assume that u does not depend on whether a true relationship exists or not. This is not an unreasonable assumption, since typically it is impossible to know which relationships are indeed true. In the presence of bias (Table 2), one gets PPV = ([1 − β]R + uβR)/(R + α − βR + u − uα + uβR), and PPV decreases with increasing u, unless 1 − β ≤ α, i.e., 1 − β ≤ 0.05 for most situations. Thus, with increasing bias, the chances that a research finding is true diminish considerably. This is shown for different levels of power and for different pre-study odds in Figure 1.
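The bias formula above can be evaluated directly. The following is a minimal sketch, not code from the article: the function name and the example setting (pre-study odds of 1:1, 80% power, α = 0.05) are assumptions chosen only to illustrate how PPV falls as u grows.

```python
def ppv_with_bias(R, alpha, beta, u):
    """Post-study probability (PPV) that a claimed finding is true.

    R     : pre-study odds that a probed relationship is true
    alpha : Type I error rate
    beta  : Type II error rate (power = 1 - beta)
    u     : proportion of analyses reported as findings only because of bias
    """
    true_claims = (1 - beta) * R + u * beta * R
    all_claims = R + alpha - beta * R + u - u * alpha + u * beta * R
    return true_claims / all_claims


# Assumed illustrative setting (not from the article's Box 1).
bias_levels = [0.0, 0.1, 0.3, 0.5, 0.8]
ppv_by_bias = [ppv_with_bias(R=1.0, alpha=0.05, beta=0.2, u=u)
               for u in bias_levels]

for u, ppv in zip(bias_levels, ppv_by_bias):
    print(f"u = {u:.1f}  PPV = {ppv:.3f}")
```

With u = 0 the expression reduces to (1 − β)R/(R − βR + α), the bias-free case, and each increase in u lowers the PPV as long as power exceeds α.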

Box 1. An Example: Science at Low Pre-Study Odds

Let us assume that a team of investigators performs a whole genome association study to test whether any of 100,000 gene polymorphisms are associated with susceptibility to schizophrenia. Based on what we know about the extent of heritability of the disease, it is reasonable to expect that probably around ten gene polymorphisms among those tested would be truly associated with schizophrenia, with relatively similar odds ratios around 1.3 for the ten or so polymorphisms and with a fairly similar power to identify any of them. Then R = 10/100,000 = 10⁻⁴, and the pre-study probability for any polymorphism to be associated with schizophrenia is also R/(R + 1) = 10⁻⁴. Let us also suppose that the study has 60% power to find an association with an odds ratio of 1.3 at α = 0.05. Then it can be estimated that if a statistically significant association is found with the p-value barely crossing the 0.05 threshold, the post-study probability that this is true increases about 12-fold compared with the pre-study probability, but it is still only 12 × 10⁻⁴.

Now let us suppose that the investigators manipulate their design, analyses, and reporting so as to make more relationships cross the p = 0.05 threshold even though this would not have been crossed with a perfectly adhered to design and analysis and with perfect comprehensive reporting of the results, strictly according to the original study plan. Such manipulation could be done, for example, with serendipitous inclusion or exclusion of certain patients or controls, post hoc subgroup analyses, investigation of genetic contrasts that were not originally specified, changes in the disease or control definitions, and various combinations of selective or distorted reporting of the results. Commercially available “data mining” packages actually are proud of their ability to yield statistically significant results through data dredging. In the presence of bias with u = 0.10, the post-study probability that a research finding is true is only 4.4 × 10⁻⁴. Furthermore, even in the absence of any bias, when ten independent research teams perform similar experiments around the world, if one of them finds a formally statistically significant association, the probability that the research finding is true is only 1.5 × 10⁻⁴, hardly any higher than the probability we had before any of this extensive research was undertaken!
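The single-study figures in Box 1 can be checked with a few lines of arithmetic. This is a minimal sketch assuming the bias formula quoted in the Bias section, with u = 0 for the bias-free case; the function and variable names are hypothetical. The ten-team figure is not recomputed here because the excerpt does not give the formula for multiple teams.

```python
def ppv_with_bias(R, alpha, beta, u=0.0):
    """PPV = ([1 - beta]R + u*beta*R) / (R + alpha - beta*R + u - u*alpha + u*beta*R)."""
    true_claims = (1 - beta) * R + u * beta * R
    all_claims = R + alpha - beta * R + u - u * alpha + u * beta * R
    return true_claims / all_claims


# Box 1 inputs: ten true associations among 100,000 polymorphisms,
# 60% power (beta = 0.4), alpha = 0.05.
R = 10 / 100_000
alpha = 0.05
beta = 1 - 0.60

pre_study_probability = R / (R + 1)
ppv_no_bias = ppv_with_bias(R, alpha, beta)          # u = 0
ppv_biased = ppv_with_bias(R, alpha, beta, u=0.10)   # u = 0.10
fold_increase = ppv_no_bias / pre_study_probability

print(f"pre-study probability: {pre_study_probability:.2e}")
print(f"PPV without bias:      {ppv_no_bias:.2e}")
print(f"fold increase:         {fold_increase:.1f}")
print(f"PPV with u = 0.10:     {ppv_biased:.2e}")
```

Rounded, these agree with the box: roughly a 12-fold increase to about 12 × 10⁻⁴ without bias, and about 4.4 × 10⁻⁴ with u = 0.10.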

See entire article at: http://medicine.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pmed.0020124

John P. A. Ioannidis is in the Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece, and Institute for Clinical Research and Health Policy Studies, Department of Medicine, Tufts-New England Medical Center, Tufts University School of Medicine, Boston, Massachusetts, United States of America.

Copyright: © 2005 John P. A. Ioannidis. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.