\[ \definecolor{firebrick}{RGB}{178,34,34} \newcommand{\red}[1]{{\color{firebrick}{#1}}} \] \[ \definecolor{mygray}{RGB}{178,34,34} \newcommand{\mygray}[1]{{\color{mygray}{#1}}} \] \[ \newcommand{\set}[1]{\{#1\}} \] \[ \newcommand{\tuple}[1]{\langle#1\rangle} \] \[\newcommand{\States}{{T}}\] \[\newcommand{\state}{{t}}\] \[\newcommand{\pow}[1]{{\mathcal{P}(#1)}}\]

introduction

3 pillars of BDA (recap)

parameter estimation:

\[\underbrace{P(\theta \, | \, D)}_{posterior} \propto \underbrace{P(\theta)}_{prior} \ \underbrace{P(D \, | \, \theta)}_{likelihood}\]

 

model comparison

\[\underbrace{\frac{P(M_1 \mid D)}{P(M_2 \mid D)}}_{\text{posterior odds}} = \underbrace{\frac{P(D \mid M_1)}{P(D \mid M_2)}}_{\text{Bayes factor}} \ \underbrace{\frac{P(M_1)}{P(M_2)}}_{\text{prior odds}}\]

 

prediction

prior predictive

\[ P(D) = \int P(\theta) \ P(D \mid \theta) \ \text{d}\theta \]

 

posterior predictive

\[ P(D \mid D') = \int P(\theta \mid D') \ P(D \mid \theta) \ \text{d}\theta \]

why model predictions?

 

  1. model-based beliefs about what is likely to happen
    • practical decision making
  2. model comparison
    • information criteria
    • Bayes factor
  3. model criticism
    • is the model any good?
    • given belief in the model, should we be shocked by this new observation?
      [ \(p\)-values ]

overview

|                | estimation                                    | comparison                                    | criticism                       |
|----------------|-----------------------------------------------|-----------------------------------------------|---------------------------------|
| goal           | which \(\theta\), given \(M\) & \(D\)?        | which better: \(M_0\) or \(M_1\)?             | \(M\) good model of \(D\)?      |
| method         | Bayes rule                                    | Bayes factor                                  | \(p\)-value                     |
| no. of models  | 1                                             | 2                                             | 1                               |
| \(H_0\)        | subset of \(\theta\)                          | \(P(\theta \mid M_0), P(D \mid \theta, M_0)\) | \(P(\theta), P(D \mid \theta)\) |
| \(H_1\)        |                                               | \(P(\theta \mid M_1), P(D \mid \theta, M_1)\) |                                 |
| prerequisites  | \(P(\theta), \alpha \times P(D \mid \theta)\) |                                               | test statistic                  |
| pros           | lean, easy                                    | intuitive, plausible, Ockham's razor          | absolute                        |
| cons           | vagueness in ROPE                             | prior dependence, computational load          | sample space?                   |

road map for today

 

NHST

  • inference, HDIs & ROPEs
  • nested model comparison
  • \(p\)-values

model criticism

  • posterior predictive checks
  • prior/posterior \(p\)-values

testing a null

up next

 

compare 3 methods for testing a null hypothesis:

  1. \(p\)-values
  2. parameter inference with ROPEs
  3. nested model comparison

 

running example:

\(k=7\), \(N=24\) \(\rightarrow\) \(\theta = 0.5?\)

boring

NHST by \(p\)-value

  • data: we flip \(n=24\) times and observe \(k = 7\) successes
  • null hypothesis: \(\theta = 0.5\)
  • sampling distribution: binomial distribution

\[ B(k ; n = 24, \theta = 0.5) = \binom{n}{k} \theta^{k} \, (1-\theta)^{n-k} \]

 

\[p(d_{\text{obs}}) = P \left (D \in \{d \mid P(d \mid H_0) \le P(d_{\text{obs}} \mid H_0) \} \right )\]

binom.test(7,24)$p.value
## [1] 0.06391466
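
a quick sanity check (not from the original slides): evaluate the definition above directly by summing \(P(d \mid H_0)\) over all outcomes \(d\) that are no more likely than the observed \(k = 7\); this reproduces the binom.test result

k_obs = 7
N     = 24
lik   = dbinom(0:N, N, 0.5)              # P(d | H0) for every possible outcome d
sum(lik[lik <= dbinom(k_obs, N, 0.5)])   # ~ 0.0639, same as binom.test(7, 24)$p.value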

posterior inference

  • observed: \(k = 7\) out of \(N = 24\) flips came up heads
  • goal: estimate \(P(\theta \mid D)\) & determine posterior 95% HDI
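
a minimal sketch of this computation, assuming a flat Beta(1,1) prior so that the posterior is Beta(\(k+1\), \(N-k+1\)); the HDI is found as the shortest 95% credible interval (grid search over the lower-tail probability)

k = 7
N = 24
a = k + 1                                  # posterior is Beta(8, 18)
b = N - k + 1
lo    = seq(0, 0.05, length.out = 10000)   # candidate lower-tail probabilities
width = qbeta(lo + 0.95, a, b) - qbeta(lo, a, b)
i     = which.min(width)                   # shortest interval = HDI (posterior is unimodal)
c(qbeta(lo[i], a, b), qbeta(lo[i] + 0.95, a, b))
## roughly [0.14, 0.48]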

ROPEs and credible values

 

regions of practical equivalence

  • small regions \([\theta - \epsilon, \theta + \epsilon]\) around each \(\theta\)
    • values (practically) indistinguishable from \(\theta\)

 

credible values

  • value \(\theta\) is rejectable if its ROPE lies entirely outside of the posterior HDI
  • value \(\theta\) is believable if its ROPE lies entirely within the posterior HDI

 

NHST by ROPE for our example

\(\theta = 0.5\) is rejectable for all ROPEs with ca. \(\epsilon \le 0.02\): the 95% HDI reaches up to roughly \(0.48\), so the ROPE \([0.5-\epsilon, 0.5+\epsilon]\) lies entirely outside the HDI only for \(\epsilon\) up to about \(0.02\)

Bayes factors for NHST

  • \(M_0\): \(\theta = 0.5\) & \(k \sim \text{Binomial}(0.5, N)\)
  • \(M_1\): \(\theta \sim \text{Beta}(1,1)\) & \(k \sim \text{Binomial}(\theta, N)\)

 

straightforward

\[ \begin{align*} \text{BF}(M_0 > M_1) & = \frac{P(D \mid M_0)}{P(D \mid M_1)} \\ & = \frac{\binom{N}{k} 0.5^{k} \, (1-0.5)^{N - k}}{\int_0^1 \binom{N}{k} \theta^{k} \, (1-\theta)^{N - k} \text{ d}\theta} \\ & \approx 0.516 \end{align*} \]

Savage-Dickey
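
the Bayes factor above can be double-checked numerically (a sketch, not from the slides): under the Beta(1,1) prior the denominator integral equals \(1/(N+1)\), and the Savage-Dickey identity gives BF(\(M_0 > M_1\)) as the posterior density at \(\theta = 0.5\), since the prior density there is 1

k = 7
N = 24
dbinom(k, N, 0.5) / (1 / (N + 1))   # marginal likelihoods directly
dbeta(0.5, k + 1, N - k + 1)        # Savage-Dickey: posterior density at 0.5 / prior density (= 1)
## both ~ 0.516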

summary

 

| method        | result                                  | interpretation                                 |
|---------------|-----------------------------------------|------------------------------------------------|
| \(p\)-value   | \(p \approx 0.064\)                     | do not reject \(H_0\)                          |
| HDI+ROPE      | \(\text{HDI} \approx [0.14;0.48]\)      | do not adopt \(H_0\) (depends on \(\epsilon\)) |
| Bayes factor  | \(\text{BF}(M_0 > M_1) \approx 0.516\)  | mini-evidence in favor of \(H_1\)              |

comparison

Jeffreys-Lindley paradox

"paradox": two established methods give contradictory results

 

k = 49581
N = 98451

\(p\)-value NHST

binom.test(k, N)$p.value
## [1] 0.02364686

reject \(H_0\)

Savage-Dickey BF

dbeta(0.5, k+1, N - k + 1)
## [1] 19.21139

strong evidence in favor of \(H_0\)

simulation

  • let the true bias be \(\theta = 0.5\)
  • generate all possible outcomes \(k\) keeping \(N\) fixed
    • \(N \in \{ 10, 100, 1000, 10000, 100000 \}\)
    • true frequency of \(k\) is \(\text{Binomial}(k \mid N, \theta = 0.5)\)
  • look at the frequency of test results, coded thus:

   

|         | estimation                                                          | comparison                 | criticism       |
|---------|---------------------------------------------------------------------|----------------------------|-----------------|
| \(M_0\) | \([0.5-\epsilon, 0.5+\epsilon] \sqsubseteq\) 95% HDI or v.v.        | BF(\(M_0\)>\(M_1\)) > 6    | \(p > 0.05\)    |
| \(M_1\) | \([0.5-\epsilon, 0.5+\epsilon] \, \cap \,\) 95% HDI \(= \emptyset\) | BF(\(M_1\)>\(M_0\)) > 6    | \(p \le 0.05\)  |
| ??      | otherwise                                                           | otherwise                  | never           |

(more on this here)
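
a minimal sketch of this simulation for the criticism (\(p\)-value) and comparison (Bayes factor) columns, assuming a Beta(1,1) prior, the Savage-Dickey BF and the thresholds from the table above; the HDI+ROPE column is omitted for brevity

alpha_error_rates = function(N) {
  k      = 0:N
  weight = dbinom(k, N, 0.5)                                # true frequency of each outcome
  p_val  = sapply(k, function(x) binom.test(x, N)$p.value)
  BF01   = dbeta(0.5, k + 1, N - k + 1)                     # Savage-Dickey BF(M0 > M1)
  c(p_value = sum(weight[p_val <= 0.05]),                   # how often p <= 0.05 although H0 is true
    BF      = sum(weight[(1 / BF01) > 6]))                  # how often BF(M1 > M0) > 6 although H0 is true
}
alpha_error_rates(1000)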

results (proportion of \(\alpha\)-errors)

BF selects \(H_0\) correctly with prob. 0.986 for \(N = 10000\), and with 0.996 for \(N = 100000\).

[cf. Lindley's solution to the 'paradox': adjust \(p\) depending on \(N\); similarly for the ROPE's \(\epsilon\)]

interlude

on the importance of tracking positive evidence

model criticism

motivation

  • parameter estimation: what \(\theta\) to believe in?
  • model comparison: which model is better than another?
  • model criticism: is a given model plausible (enough)?

   

posterior predictive checks

graphically compare simulated observations with actual observation

Bayesian predictive \(p\)-values

measure surprise level of data under a model

[think: \(p\)-value for a non-trivial, serious model with potential uncertainty about parameters]

posterior predictive checks

exponential forgetting model

y = c(.94, .77, .40, .26, .24, .16)
t = c(  1,   3,   6,   9,  12,  18)
obs = y*100
## PSEUDO-CODE!
priors {
  a ~ dunif(0,1.5)
  b ~ dunif(0,1.5)
}
likelihood {
  p[i] = min(max( a*exp(-t[i]*b), 0.0001), 0.9999)
  obs[i] ~ dbinom(p[i], 100)    # data is given, so this (implicitly) conditions on data
}
generated predictions{
  # sample one imaginary outcome for current parameter values
  obsRep[i] = sample_from_binomial(p[i], 100) 
}
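
one way to turn the pseudo-code above into a runnable model, assuming JAGS is used via the rjags package; the model block mirrors the pseudo-code, while the surrounding R calls (data list, sampling set-up) are assumptions, not part of the original slides

library(rjags)
model_string = "
model {
  a ~ dunif(0, 1.5)
  b ~ dunif(0, 1.5)
  for (i in 1:n) {
    p[i]      <- min(max(a * exp(-t[i] * b), 0.0001), 0.9999)
    obs[i]    ~ dbin(p[i], 100)    # conditions on the observed counts
    obsRep[i] ~ dbin(p[i], 100)    # one posterior predictive replicate per iteration
  }
}"
y   = c(.94, .77, .40, .26, .24, .16)
t   = c(  1,   3,   6,   9,  12,  18)
dat = list(obs = y * 100, t = t, n = length(t))
jm      = jags.model(textConnection(model_string), data = dat, n.chains = 2)
samples = coda.samples(jm, variable.names = c("a", "b", "obsRep"), n.iter = 5000)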

PPC: exponential model

  • black dots: data
  • blue dots: mean of replicated fake data
  • blue bars: 95% HDIs of replicated fake data

PPC: power model

  • black dots: data
  • blue dots: mean of replicated fake data
  • blue bars: 95% HDIs of replicated fake data

Bayesian predictive model criticism

\(p\)-value for \(H_0\)

\[p(d_{\text{obs}}) = P \left (D \in \{d \mid P(d \mid H_0) \le P(d_{\text{obs}} \mid H_0) \} \right )\]

generalization to arbitrary model

\[p(d_{\text{obs}}) = P \left (D \in \{d \mid P(d \mid M) \le P(d_{\text{obs}} \mid M) \} \right )\]

  • with \(M = \langle P(\theta), P(D \mid \theta) \rangle\) this is the prior predictive \(p\)-value

  • with \(M = \langle P(\theta \mid d_{\text{obs}}), P(D \mid \theta) \rangle\) this is the posterior predictive \(p\)-value

example

obs = c(1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0)
k = sum(obs) # 7
N = length(obs) #20

\(p\)-value NHST:

  • do not reject NH \(\theta = 0.5\)
binom.test(k, N, 0.5)$p.value
## [1] 0.263176

Bayesian posterior predictive \(p\)-value

  • test statistic: number of switches between 1 and 0
  • \(t(d^*) = 3\)
  • PP-\(p\)-value \(\approx 0.028\) (see the simulation sketch below)
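
a minimal simulation sketch of this posterior predictive \(p\)-value, assuming a Beta(1,1) prior on \(\theta\) and taking "at most as many switches as observed" as the surprising direction

n_switches = function(x) sum(diff(x) != 0)     # test statistic: switches between 1 and 0
theta_post = rbeta(10000, k + 1, N - k + 1)    # posterior samples, with k = 7, N = 20 from above
T_rep = sapply(theta_post, function(th) n_switches(rbinom(N, 1, th)))
mean(T_rep <= n_switches(obs))                 # Pr(T(rep) <= T(obs)) ~ 0.03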

pppvalue

Gelman et al. 2014, p.147–8

summary

summary

 

|                | estimation                                    | comparison                                    | criticism                       |
|----------------|-----------------------------------------------|-----------------------------------------------|---------------------------------|
| goal           | which \(\theta\), given \(M\) & \(D\)?        | which better: \(M_0\) or \(M_1\)?             | \(M\) good model of \(D\)?      |
| method         | Bayes rule                                    | Bayes factor                                  | \(p\)-value                     |
| no. of models  | 1                                             | 2                                             | 1                               |
| \(H_0\)        | subset of \(\theta\)                          | \(P(\theta \mid M_0), P(D \mid \theta, M_0)\) | \(P(\theta), P(D \mid \theta)\) |
| \(H_1\)        |                                               | \(P(\theta \mid M_1), P(D \mid \theta, M_1)\) |                                 |
| prerequisites  | \(P(\theta), \alpha \times P(D \mid \theta)\) |                                               | test statistic                  |
| pros           | lean, easy                                    | intuitive, plausible, Ockham's razor          | absolute                        |
| cons           | vagueness in ROPE                             | prior dependence, computational load          | sample space?                   |

outlook

 

Friday

  • bootcampling GCM (Lee & Wagenmakers ch. 17)

 

Tuesday

  • introduction to Stan