Merle-Marie Pittelkow

Psychometrics and Statistics
Faculty of Behavioral and Social Sciences
University of Groningen

Academic webpage Merle-Marie Pittelkow


Back to Bayesics: Solving the Reproducibility Crisis in Biomedicine

The overarching aim of this project is to examine how adopting a Bayesian framework could foster evidence-based medicine. More precisely, I will explore how the use of Bayes factors (henceforth BFs) might aid the process of determining the efficacy of pharmacological treatments (i.e., drugs).


For a new drug to get marketing approval in the US, it has to be registered with and evaluated by the Food and Drug Administration (FDA). The evaluation of a new medication is complex and entails various considerations. A multidisciplinary team of physicians, statisticians, chemists, pharmacologists, and other experts at the FDA’s Center for Drug Evaluation and Research (CDER) reviews evidence for the drug’s efficacy and safety provided by a pharmaceutical company, commonly in the form of randomized controlled trials (RCTs; U.S. Food and Drug Administration, 2019). Although many aspects of a drug’s profile must be considered in the drug approval process, the statistical evaluation of the drug’s efficacy plays a central role in the FDA’s decision process. The FDA’s (1998) Guidance for Industry on Providing Clinical Evidence of Effectiveness for Human Drugs and Biological Products states that “sound evidence for effectiveness is a crucial component of the Agency’s benefit-risk assessment of a new product or use” (p. 1). According to the current guidelines, substantial evidence for efficacy is provided by “at least two adequate and well-controlled studies, each convincing on their own” (Food and Drug Administration, 1998, p. 3). In more concrete terms, efficacy is currently demonstrated by two independent RCTs, each with a p-value lower than 0.05. The p-value quantifies the probability of obtaining the observed data or more extreme data under the assumption that the null hypothesis, commonly no difference between the placebo and drug group, is true. In rare cases the agency might also consider “data from one adequate and well-controlled clinical investigation and confirmatory evidence” (Food and Drug Administration, 1998, p. 3) as sufficient to establish efficacy.
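The decision rule described above can be sketched in a few lines of Python. This is only a minimal illustration under simplifying assumptions that are not part of the FDA guidance: the outcome is treated as normally distributed with a known common standard deviation (so a z-test applies), the direction of the effect is ignored, and the function names and the z values in the usage example are hypothetical.

```python
import math


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for a standard-normal test statistic z."""
    # P(|Z| >= |z|) for Z ~ N(0, 1); erfc(x / sqrt(2)) equals 2 * (1 - Phi(x)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def z_statistic(mean_drug: float, mean_placebo: float,
                sd: float, n_per_group: int) -> float:
    """z statistic for a difference in means (assumes a known common SD)."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return (mean_drug - mean_placebo) / standard_error


def meets_two_trial_rule(p_values, alpha: float = 0.05) -> bool:
    """True if at least two trials have p < alpha.

    Simplification: the direction of the effect is not checked.
    """
    return sum(p < alpha for p in p_values) >= 2


# Hypothetical z statistics from three trials of one drug.
p_values = [two_sided_p_value(z) for z in (2.5, 0.4, 2.1)]
print([round(p, 3) for p in p_values], meets_two_trial_rule(p_values))
```

Note that the rule only counts significant trials; it takes no account of how many non-significant trials were run alongside them.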

For over half a century, Null Hypothesis Significance Testing (NHST) and the reliance on p-values have been criticized heavily (for an overview, see Kline, 2013; van Ravenzwaaij & Ioannidis, 2017; Wagenmakers, 2007). In the case of the FDA’s decision process regarding drug endorsement, the current reliance on p-values may lead to suboptimal decisions in some cases. Simulation studies by van Ravenzwaaij and Ioannidis (2017, 2019) demonstrated that relying on p-values lower than 0.05 can yield inconsistent strength of evidence across circumstances. In their first simulation, the authors showed that, for a non-trivial proportion of cases, the criterion of two studies with p<.05 led to endorsement even though the statistical evidence, as classified by BFs, actually favored the null hypothesis (van Ravenzwaaij & Ioannidis, 2017). This was especially true in cases where many clinical trials (≥5) were conducted, of which exactly two were statistically significant.

The Bayesian framework offers a practical alternative to NHST (Wagenmakers, 2007). In contrast to p-values, BFs allow researchers to combine evidence and to quantify evidence in favor of either hypothesis (Gronau, Ly, & Wagenmakers, 2019; Jeffreys, 1961; Rouder, Speckman, Sun, Morey, & Iverson, 2009; van Ravenzwaaij, Monden, Tendeiro, & Ioannidis, 2019). In the Bayesian framework, the predictive evidence of two competing hypotheses is compared (for an elaborate discussion of Bayesian statistics we refer the interested reader to Etz, Gronau, Dablander, Edelsbrunner, & Baribault, 2018), and the resulting ratio is referred to as the BF.
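Written out, the ratio takes the standard form below. The symbols are notation introduced here for illustration: D denotes the observed data, H_0 the null hypothesis, and H_1 the alternative hypothesis.

```latex
\[
  \mathrm{BF}_{10}
    = \frac{p(D \mid H_1)}{p(D \mid H_0)},
  \qquad
  \underbrace{\frac{p(H_1 \mid D)}{p(H_0 \mid D)}}_{\text{posterior odds}}
    = \mathrm{BF}_{10} \times
  \underbrace{\frac{p(H_1)}{p(H_0)}}_{\text{prior odds}} .
\]
```

A value of BF_10 above 1 indicates that the data are better predicted by H_1, a value below 1 indicates that they are better predicted by H_0, and BF_01 = 1 / BF_10 expresses the same evidence from the perspective of the null hypothesis.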

Compared to p-values, BFs provide a more nuanced inference that depends on the presumed underlying population effect. In a second simulation, van Ravenzwaaij and Ioannidis (2019) compared true and false positive rates for p-values, frequentist meta-analytic confidence intervals, and BFs, considering cases similar to those in their 2017 study. The results suggested that BFs outperformed both p-values and frequentist meta-analytic confidence intervals, meaning that the true positive rate for a given false positive rate was higher. In other words, BFs more accurately captured whether an effect was present or not. BFs may therefore be considered a good alternative to the traditional “two positive clinical trials” rule and could improve the FDA endorsement process.
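How a p-value near 0.05 and a BF can point in different directions may be illustrated with a deliberately simple model. The sketch below is not the procedure used in the cited simulations. It assumes a normally distributed effect estimate with a known standard error, a point null hypothesis (true effect of zero), and a zero-centered normal prior on the effect under the alternative. The function name and the default prior standard deviation of 1 are hypothetical choices made for illustration.

```python
import math


def bf10_normal(estimate: float, standard_error: float,
                prior_sd: float = 1.0) -> float:
    """Bayes factor BF10 for a normal estimate under a normal-normal model.

    H0: true effect = 0, so estimate ~ N(0, se^2).
    H1: true effect ~ N(0, prior_sd^2), so marginally
        estimate ~ N(0, se^2 + prior_sd^2).
    BF10 is the ratio of these two marginal densities at the estimate.
    """
    variance_h0 = standard_error ** 2
    variance_h1 = standard_error ** 2 + prior_sd ** 2
    # Work on the log scale for numerical stability.
    log_bf10 = (0.5 * math.log(variance_h0 / variance_h1)
                + 0.5 * estimate ** 2 * (1.0 / variance_h0 - 1.0 / variance_h1))
    return math.exp(log_bf10)


# Two hypothetical trials with the same z statistic (1.96, two-sided p of
# about 0.05) but very different precision.
z = 1.96
for standard_error in (0.5, 0.01):
    estimate = z * standard_error
    print(standard_error, round(bf10_normal(estimate, standard_error), 3))
```

Under these assumptions, the imprecise trial yields a BF10 slightly above 1 (weak evidence for an effect), whereas the very precise trial yields a BF10 well below 1 (evidence for the null), even though both trials have the same p-value. The result depends on the chosen prior, which is why the prior standard deviation is exposed as a parameter.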

Not only simulations but also re-analyses of efficacy trials for medication endorsed by the FDA (e.g., Monden et al., 2016, 2018) suggest that BFs would provide a good alternative to p-values in the process of evaluating drug efficacy. In my dissertation I aim to build on this work and illustrate the advantages of adopting the Bayesian framework over the traditional NHST framework in the FDA endorsement process. To this end, I will extract data for several groups of medication (e.g., psychoactive drugs, medication approved for the treatment of cancer, and medication for physical conditions, such as lipid regulators, narcotic analgesics, beta blockers, and ACE inhibitors) from the FDA database, which is freely available. Using BFs and Bayesian meta-analyses, I will examine the following questions:

  • How often are new medications endorsed based on a subset of significant trial data (quantified with p<.05), where the majority of clinical trials are non-significant (quantified with p>.05)?
  • What is the typical strength of evidence quantified by BFs in favor of efficacy for existing medication, taking both significant and non-significant trials into account?
  • How often would we reach a similar or different decision than the FDA when considering BFs?

These are important questions when considering how to quantify strength of evidence in favor of a medicine’s efficacy.

Prof. dr. R.R. Meijer, dr. D. van Ravenzwaaij, dr. Y.A. de Vries

Financed by
VIDI fellowship grant dr. Don van Ravenzwaaij


1 September 2019 – 31 August 2022