
road map for today

 

  • review \(p\)-values & confidence intervals

 

  • digest \(p\)-problems identified by Wagenmakers (2007)

 

  • brief introduction to Rmarkdown & reproducible research

\(p\)-values

\(p\)-values

the \(p\)-value is the probability of observing, under infinite hypothetical repetitions of the same experiment, a value of the test statistic at least as extreme as that of the observed data, given that the null hypothesis is true

wagenmakers diagram

requirements

 

  • null hypothesis \(H_0\)
    • e.g., the coin is fair
    • e.g., the mean and variance of measure \(X\) are the same in two groups

 

  • sampling distribution \(P(x \mid H_0)\)
    • how likely is observation \(x\) under \(H_0\)?
    • NB: this requires fixing the space \(X\) of possible \(x\) (in order to normalize)

 

  • test statistic \(t(x)\)
    • any single-valued function \(t(x)\) that characterizes \(x\) in a relevant or helpful way
      • convention: \(t(x_1) > t(x_2)\) iff \(x_1\) is more extreme than \(x_2\)
    • for an exact test we take the likelihood of \(x\) under \(H_0\): \(t(x) = P(x \mid H_0)\), where lower likelihood counts as more extreme

definition

 

in the general case, the \(p\)-value of observation \(x\) under null hypothesis \(H_0\), with sample space \(X\), sampling distribution \(P(\cdot \mid H_0) \in \Delta(X)\) and test statistic \(t \colon X \rightarrow \mathbb{R}\) is:

\[ p(x ; H_0, X, P(\cdot \mid H_0), t) = \int_{\left\{ \tilde{x} \in X \ \mid \ t(\tilde{x}) \ge t(x) \right\}} P(\tilde{x} \mid H_0) \ \text{d}\tilde{x}\]

intuitive slogan: probability of at least as extreme outcomes

 

for an exact test we get:

\[ p(x ; H_0, X, P(\cdot \mid H_0)) = \int_{\left\{ \tilde{x} \in X \ \mid \ P(\tilde{x} \mid H_0) \le P(x \mid H_0) \right\}} P(\tilde{x} \mid H_0) \ \text{d}\tilde{x}\]

intuitive slogan: probability of at least as unlikely outcomes

notation: \(\Delta(X)\) – set of all probability measures over \(X\)

example

fair coin?

  • data: we flip \(n=24\) times and observe \(k = 7\) successes
  • null hypothesis: \(\theta = 0.5\)
  • sampling distribution: binomial distribution

\[ B(k ; n = 24, \theta = 0.5) = \binom{n}{k} \theta^{k} \, (1-\theta)^{n-k} \]
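
To see this definition at work before calling a built-in test, here is a minimal sketch (not from the slides; variable names are ad hoc) that computes the exact-test \(p\)-value by hand, summing the probability of every outcome at least as unlikely under \(H_0\) as the observed \(k = 7\):

# exact test "by hand": add up P(k' | H_0) for every k' at least as unlikely as k = 7
k.obs = 7
lh = dbinom(0:24, size = 24, prob = 0.5)  # likelihood of every possible k under H_0
sum(lh[lh <= dbinom(k.obs, 24, 0.5)])     # should match binom.test(7, 24)$p.value up to numerical tolerance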

hypothesis test in R

binom.test(7,24)
## 
##  Exact binomial test
## 
## data:  7 and 24
## number of successes = 7, number of trials = 24, p-value = 0.06391
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.1261521 0.5109478
## sample estimates:
## probability of success 
##              0.2916667

 

binom.test(7,24)$p.value
## [1] 0.06391466

Monte Carlo simulation

use a large number of random samples to approximate the solution to a difficult problem

library(tidyverse)  # provides map_int, ggplot, tibble and the pipe used below
# repeat 24 flips of a fair coin 20,000 times
n.samples = 20000
x.reps = map_int(1:n.samples, function(i) sum(sample(x = 0:1, size = 24, replace = T, prob = c(0.5, 0.5))))
ggplot(data.frame(k = x.reps), aes(x = k)) + geom_histogram(binwidth = 1)

MC simulated \(p\)-value

x.reps.prob = dbinom(x.reps, 24, 0.5) ## binomial likelihood of each sampled k under H_0
sum(x.reps.prob <= dbinom(7, 24, 0.5)) / n.samples ## proportion of samples at least as unlikely as k = 7
## [1] 0.0618
p.value.sequence = cumsum(x.reps.prob <= dbinom(7, 24, 0.5)) / 1:n.samples
tibble(iteration = 1:n.samples, p.value = p.value.sequence) %>% 
  ggplot(aes(x = iteration, y = p.value)) + geom_line()

significance

 

fix a significance level, e.g.: \(0.05\)

 

we say that a test result is significant iff the \(p\)-value is below the pre-determined significance level

 

we reject the null hypothesis in case of significant test results

 

the significance level thereby determines the \(\alpha\)-error of falsely rejecting the null hypothesis (see the simulation sketch below)

  • aka: type-I error / incorrect rejection / false positive
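
As a quick check on this (a simulation sketch, not part of the slides, assuming that \(H_0\) is in fact true with \(\theta = 0.5\)):

# simulate the long-run rate of false rejections (type-I errors) at alpha = 0.05 when H_0 is true
n.sims = 10000
p.vals = replicate(n.sims, binom.test(rbinom(1, size = 24, prob = 0.5), 24)$p.value)
mean(p.vals < 0.05)  # stays at or below 0.05; the discrete exact test is conservative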

confidence intervals

confidence interval

 

let \(H_0^{ \theta = z}\) be the null hypothesis that assumes that parameter \(\theta = z\)

fix sampling distribution \(P(\cdot \mid H_0^{ \theta = z})\) and test statistic \(t\) as before

the level \((1 - \alpha)\) confidence interval for outcome \(x\) is the biggest interval \([a, b]\) such that:

\[ p(x ; H_0^{\theta = z}) > \alpha \ \ \ \text{, for all } z \in [a,b]\]

intuitive slogan: range of values that we would not reject
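
A rough way to see this at work (a sketch, not from the slides; the grid resolution is an arbitrary choice) is to invert the exact test for the coin example and collect all values \(z\) that we would not reject at \(\alpha = 0.05\):

# approximate the 95% confidence interval for theta by test inversion on a grid
theta.grid = seq(0.001, 0.999, by = 0.001)
not.rejected = sapply(theta.grid, function(z) binom.test(7, 24, p = z)$p.value > 0.05)
range(theta.grid[not.rejected])

This need not coincide exactly with the Clopper-Pearson interval that binom.test reports (which inverts two one-sided tests), but it illustrates the "range of values we would not reject" reading.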

great visualization

what do we learn from a CI?

 

  1. range of values we would not reject (at the given significance level)
    yes
  2. range of values that would not make the data surprising (at the given level)
    yes
  3. range of values that are most likely given the data
    no
  4. range of values that it is rational to believe in / bet on
    no
  5. that the true value lies in this interval for sure
    no
  6. that the true value is likely in this interval
    well (see the coverage sketch below)
  7. that, if we repeat the experiment, the outcome will likely lie in this interval
    no
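
To unpack items 6 and 7, here is a simulation sketch (not from the slides; the true value \(\theta = 0.5\) is assumed purely for illustration) of how often the 95% CI reported by binom.test covers the true value across repeated experiments:

# long-run coverage of the 95% CI over repeated experiments with true theta = 0.5
n.sims = 10000
covered = replicate(n.sims, {
  k = rbinom(1, size = 24, prob = 0.5)
  ci = binom.test(k, 24)$conf.int
  ci[1] <= 0.5 & 0.5 <= ci[2]
})
mean(covered)  # around 0.95 or a bit above (the exact CI is conservative)

coverage is a property of the procedure, not of any single computed interval; and the outcome of a single new experiment need not fall inside a given interval with 95% probability, which is why item 7 gets a "no"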

\(p\)-problems

3 problems

 

  • \(p\) depends on unobserved data

 

  • \(p\) depends on subjective intentions

 

  • \(p\) does not quantify evidence

Wagenmakers (2007)

stop at \(n = 24\)

fair coin?

  • data: we decide to flip \(n=24\) times and observe \(k = 7\) successes
  • null hypothesis: \(\theta = 0.5\)
  • sampling distribution: binomial distribution

\[ B(k ; n = 24, \theta = 0.5) = \binom{n}{k} \theta^{k} \, (1-\theta)^{n-k} \]

stop at \(k = 7\)

fair coin?

  • data: we decide to flip until we have \(k=7\) successes and end up needing \(n=24\) flips
  • null hypothesis: \(\theta = 0.5\)
  • sampling distribution: negative binomial distribution

\[ NB(n ; k = 7, \theta = 0.5) = \frac{k}{n} \binom{n}{k} \theta^{k} \, (1-\theta)^{n - k}\]
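
To make the contrast concrete, here is a sketch (not from the slides) applying the generic exact-test definition from above under both sampling distributions; truncating the negative binomial at a large maximal \(n\) is a practical assumption with negligible effect:

# p-value for k = 7 successes in n = 24 flips under two stopping rules
# (1) stop at n = 24: binomial sampling distribution over k = 0, ..., 24
lh.binom = dbinom(0:24, size = 24, prob = 0.5)
p.binom  = sum(lh.binom[lh.binom <= dbinom(7, 24, 0.5)])
# (2) stop at k = 7: negative binomial sampling distribution over n = 7, 8, ...
#     dnbinom counts failures, so n flips correspond to n - 7 failures before the 7th success
lh.negbinom = dnbinom(0:(5000 - 7), size = 7, prob = 0.5)
p.negbinom  = sum(lh.negbinom[lh.negbinom <= dnbinom(24 - 7, size = 7, prob = 0.5)])
c(p.binom, p.negbinom)  # same data, different p-values, depending on the stopping rule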

what is the "same experiment" in a different possible world?

 

what does it mean to repeat an experiment?

 

tons of gruesome scenarios:

  • exclusion criteria for participants settled after seeing the data
  • inability to use data when the way of obtaining it is unknown
    • who can you trust?
    • what's the sampling distribution for large-scale surveys, linguistic corpora…?
  • sampling protocol dependent on external circumstance (funding, motivation, …)
  • ex post unverifiable researcher reports: was this really what they intended to do?

the method is not to blame for the abuse?

Rmarkdown

why Rmarkdown

 

  • prepare, analyze & plot data right inside your document

  • hand over all of your work in one single, easily executable chunk
    • support reproducible and open research
  • export to a variety of different formats

Rmarkdown formats

flow of information

 

Rmarkdown info flow

 

Rmarkdown formats

markdown

headers & sections

# header 1
## header 2
### header 3

emphasis, highlighting etc.

*italics* or _italics_
**bold** or __bold__
~~strikeout~~

links

[link](https://www.google.com)

inline code & code blocks

`function(x) return(x - 1)`
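
a fenced code block (an example added here, not on the original slide) uses triple backticks, with an optional language tag for syntax highlighting:

```r
f <- function(x) {
  return(x - 1)
}
```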

cheat sheet

Rmarkdown

extension of markdown to dynamically integrate R output
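
for instance, a minimal code chunk in an .Rmd file looks like this (the chunk label coin-test is an arbitrary choice); its result is inserted into the rendered document:

```{r coin-test}
binom.test(7, 24)$p.value
```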

multiple output formats:

  • HTML pages, HTML slides (here), …
  • PDF, LaTeX, Word, …

cheat sheet and a quick tour

supports LaTeX

inline equations with $\theta$

equation blocks with

$$ \begin{align*} E &= mc^2 \\
&= \text{a really smart formula}
\end{align*} $$

 

caveat

LaTeX-style formulas will be rendered differently depending on the output method:

  • PDF-LaTeX gives you genuine LaTeX with (almost) all of its capabilities
  • HTML output uses MathJax to emulate LaTeX-like behavior
    • only LaTeX-packages & functionality emulated in JS will be available

Rmarkdown in your homework

 

do it all in one file BDA+CM_HW1_YOURLASTNAME.Rmd

use a header that generates an HTML file, like this:

---
title: "My flawless first homework set"
date: 2017-05-8
output: html_document
---

have all code and plots show up at the appropriate places in between your text answers, which explain the code and its output

send the *.Rmd and the *.HTML

avoid using extra material not included in the *.Rmd

fini

outlook

 

Tuesday

  • introduction to a Bayesian approach to statistical inference

 

Friday

  • introduction to MCMC methods

to prevent boredom

 

obligatory

  • prepare Kruschke chapters 5 & 6

  • start on your first homework set
    • ask questions on Tuesday in class!