\[ \definecolor{firebrick}{RGB}{178,34,34} \newcommand{\red}[1]{{\color{firebrick}{#1}}} \] \[ \definecolor{mygray}{RGB}{128,128,128} \newcommand{\mygray}[1]{{\color{mygray}{#1}}} \] \[ \newcommand{\set}[1]{\{#1\}} \] \[ \newcommand{\tuple}[1]{\langle#1\rangle} \] \[\newcommand{\States}{{T}}\] \[\newcommand{\state}{{t}}\] \[\newcommand{\pow}[1]{{\mathcal{P}(#1)}}\]

relevant concepts covered today

part 1

 

  • (discrete) probability distributions

 

  • \(p\)-value

 

  • statistical significance

part 2

 

  • Bayes' rule

 

  • \(\alpha\)- and \(\beta\)-errors

 

  • Ioannidis (2005): "Why Most Published Research Findings Are False"

probability: discrete

definition

 

a discrete probability distribution over a finite set of mutually exclusive world states \(\States\) is a function \(P \colon \States \rightarrow [0;1]\) such that \(\sum_{\state \in \States} P(\state) = 1\).

 

for finite \(\States\), \(P(\state)\) is \(\state\)'s probability mass

example

okay, winter is coming; but who will sit the Iron Throne next spring?

## Targaryen Lannister Baratheon   Greyjoy     Stark 
##     0.375     0.188     0.062     0.125     0.250

 

notation

 

if \(f \colon \States \rightarrow \mathbb{R}^{\ge0}\), then

\[ P(\state) \propto f(\state) \]

is shorthand notation for

\[ P(\state) = \frac{f(\state)}{ \sum_{\state' \in \States} f(\state')} \]
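
in R, this normalization is just division by the sum; a minimal sketch, with hypothetical weights chosen so that they reproduce the Iron Throne example above:

f <- c(Targaryen = 6, Lannister = 3, Baratheon = 1, Greyjoy = 2, Stark = 4)
round(f / sum(f), 3)  # normalize f; should reproduce the distribution shown above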

 

binomial

the binomial distribution gives the probability of observing \(k\) successes in \(n\) coin flips with a bias of \(\theta\):

\[ B(k ; n,\theta) = \binom{n}{k} \theta^{k} \, (1-\theta)^{n-k} \]
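
in R, dbinom(k, n, theta) computes exactly this probability mass; a quick sketch checking it against the formula, with example values \(k = 7\), \(n = 24\), \(\theta = 0.5\) (the same numbers reappear below):

dbinom(7, 24, 0.5)                          # B(k = 7; n = 24, theta = 0.5)
choose(24, 7) * 0.5^7 * (1 - 0.5)^(24 - 7)  # same value, directly from the formula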

binomial

probability of observing at least one success in \(n\) coin flips with bias \(\theta\):

\[ \begin{align*} B(k > 0; n,\theta) & = 1 - B(k = 0; n,\theta) \\ & = 1 - \binom{n}{0} \theta^{0} \, (1-\theta)^{n-0} \\ & = 1 - (1-\theta)^{n} \\ \end{align*} \]

this will be useful later
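
a quick R check of the closed form, with arbitrary example values for \(n\) and \(\theta\):

theta <- 0.1
n <- 10
1 - dbinom(0, n, theta)  # P(at least one success), via the complement
1 - (1 - theta)^n        # same value from the closed form above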

\(p\)-values

quiz

 

  1. \(p\)-values quantify subjective confidence that the null hypothesis is true/false
  2. \(p\)-values quantify objective evidence for/against the null hypothesis
  3. \(p\)-values quantify how surprising the data is under the null hypothesis
  4. \(p\)-values quantify probability that the null hypothesis is true/false
  5. non-significant \(p\)-values mean that the null hypothesis is true
  6. significant \(p\)-values mean that the null hypothesis is false
  7. significant \(p\)-values mean that the null hypothesis is unlikely

which of these are true?

widespread confusion

many statistical concepts are misunderstood by the majority of researchers

  • Oakes (1986): \(p\)-values are misinterpreted all the time
  • Haller & Krauss (2002): also true for senior researchers and statistics teachers
  • Hoekstra et al. (2014): the same holds true for confidence intervals

[figure from Haller & Krauss (2002)]

\(p\)-values

the \(p\)-value is the probability of observing, under infinite hypothetical repetitions of the same experiment, a value of the test statistic at least as extreme as that of the observed data, given that the null hypothesis is true

[figure: repeated-sampling diagram, after Wagenmakers]

requirements

 

  • null hypothesis \(H_0\)
    • e.g., the coin is fair
    • e.g., the mean and variance of measure \(X\) are the same in two groups

 

  • sampling distribution \(P(x \mid H_0)\)
    • how likely is observation \(x\) under \(H_0\)?
    • NB: this requires fixing the space \(X\) of possible \(x\) (in order to normalize)

 

  • test statistic \(t(x)\)
    • any single-valued function \(t(x)\) that characterizes \(x\) in a relevant or helpful way
      • convention: \(t(x_1) > t(x_2)\) iff \(x_1\) is more extreme than \(x_2\)
    • for an exact test we take the likelihood of \(x\) under \(H_0\): \(t(x) = P(x \mid H_0)\)

definition

 

in the general case, the \(p\)-value of observation \(x\) under null hypothesis \(H_0\), with sample space \(X\), sampling distribution \(P(\cdot \mid H_0) \in \Delta(X)\) and test statistic \(t \colon X \rightarrow \mathbb{R}\) is:

\[ p(x ; H_0, X, P(\cdot \mid H_0), t) = \int_{\left\{ \tilde{x} \in X \ \mid \ t(\tilde{x}) \ge t(x) \right\}} P(\tilde{x} \mid H_0) \ \text{d}\tilde{x}\]

intuitive slogan: probability of at least as extreme outcomes

 

for an exact test we get:

\[ p(x ; H_0, X, P(\cdot \mid H_0)) = \int_{\left\{ \tilde{x} \in X \ \mid \ P(\tilde{x} \mid H_0) \le P(x \mid H_0) \right\}} P(\tilde{x} \mid H_0) \ \text{d}\tilde{x}\]

intuitive slogan: probability of at least as unlikely outcomes

notation: \(\Delta(X)\) – set of all probability measures over \(X\)

example

fair coin?

  • data: we flip \(n=24\) times and observe \(k = 7\) successes
  • null hypothesis: \(\theta = 0.5\)
  • sampling distribution: binomial distribution

\[ B(k ; n = 24, \theta = 0.5) = \binom{n}{k} \theta^{k} \, (1-\theta)^{n-k} \]

hypothesis test in R

binom.test(7,24)
## 
##  Exact binomial test
## 
## data:  7 and 24
## number of successes = 7, number of trials = 24, p-value = 0.06391
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.1261521 0.5109478
## sample estimates:
## probability of success 
##              0.2916667

 

binom.test(7,24)$p.value
## [1] 0.06391466
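
the same number can be computed by hand from the definition of the exact test: sum the probability mass of all outcomes that are at most as likely under \(H_0\) as the observed \(k = 7\); a minimal sketch (binom.test performs the same comparison, up to a tiny numerical tolerance, so edge cases can differ):

lik <- dbinom(0:24, 24, 0.5)         # P(k | H0) for every possible outcome k
sum(lik[lik <= dbinom(7, 24, 0.5)])  # should match binom.test(7, 24)$p.value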

significance

 

fix a significance level, e.g., \(0.05\)

 

we say that a test result is significant iff the \(p\)-value is below the pre-determined significance level

 

we reject the null hypothesis in case of significant test results

 

the significance level thereby determines the \(\alpha\)-error of falsely rejecting the null hypothesis

  • aka: type-I error / incorrect rejection / false positive
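
a small simulation sketch of this point: when \(H_0\) is true, the long-run rate of (falsely) significant results stays at or below the significance level; for the discrete exact test with \(n = 24\) it falls strictly below, since the rejection region is \(k \le 6\) or \(k \ge 18\), which has probability \(\approx 0.023\) under \(H_0\):

set.seed(1)  # seed chosen arbitrarily, for reproducibility
p_vals <- replicate(10000, binom.test(rbinom(1, 24, 0.5), 24)$p.value)
mean(p_vals < 0.05)  # empirical alpha-error rate; should land near 0.023, below 0.05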