\[ \definecolor{firebrick}{RGB}{178,34,34} \newcommand{\red}[1]{{\color{firebrick}{#1}}} \] \[ \definecolor{mygray}{RGB}{178,34,34} \newcommand{\mygray}[1]{{\color{mygray}{#1}}} \] \[ \newcommand{\set}[1]{\{#1\}} \] \[ \newcommand{\tuple}[1]{\langle#1\rangle} \] \[\newcommand{\States}{{T}}\] \[\newcommand{\state}{{t}}\] \[\newcommand{\pow}[1]{{\mathcal{P}(#1)}}\]

introduction

3 pillars of BDA (recap)

parameter estimation:

\[\underbrace{P(\theta \, | \, D)}_{posterior} \propto \underbrace{P(\theta)}_{prior} \ \underbrace{P(D \, | \, \theta)}_{likelihood}\]

 

model comparison

\[\underbrace{\frac{P(M_1 \mid D)}{P(M_2 \mid D)}}_{\text{posterior odds}} = \underbrace{\frac{P(D \mid M_1)}{P(D \mid M_2)}}_{\text{Bayes factor}} \ \underbrace{\frac{P(M_1)}{P(M_2)}}_{\text{prior odds}}\]

 

prediction

prior predictive

\[ P(D) = \int P(\theta) \ P(D \mid \theta) \ \text{d}\theta \]

 

posterior predictive

\[ P(D \mid D') = \int P(\theta \mid D') \ P(D \mid \theta) \ \text{d}\theta \]

why model predictions?

 

  1. model-based beliefs about what is likely to happen
    • practical decision making
  2. model comparison
    • information criteria
    • Bayes factor
  3. model criticism
    • is the model any good?
    • given belief in the model, should we be shocked by this new observation?
      [ \(p\)-values ]

overview

|                | estimation                                    | comparison                                    | criticism                       |
|----------------|-----------------------------------------------|-----------------------------------------------|---------------------------------|
| goal           | which \(\theta\), given \(M\) & \(D\)?        | which better: \(M_0\) or \(M_1\)?             | \(M\) good model of \(D\)?      |
| method         | Bayes rule                                    | Bayes factor                                  | \(p\)-value                     |
| no. of models  | 1                                             | 2                                             | 1                               |
| \(H_0\)        | subset of \(\theta\)                          | \(P(\theta \mid M_0), P(D \mid \theta, M_0)\) | \(P(\theta), P(D \mid \theta)\) |
| \(H_1\)        |                                               | \(P(\theta \mid M_1), P(D \mid \theta, M_1)\) |                                 |
| prerequisites  | \(P(\theta), \alpha \times P(D \mid \theta)\) |                                               | test statistic                  |
| pros           | lean, easy                                    | intuitive, plausible, Ockham's razor          | absolute                        |
| cons           | vagueness in ROPE                             | prior dependence, computational load          | sample space?                   |

road map for today

 

NHST

  • inference, HDIs & ROPEs
  • nested model comparison
  • \(p\)-values

model criticism

  • posterior predictive checks
  • prior/posterior \(p\)-values

testing a null

up next

 

compare 3 methods for testing a null hypothesis:

  1. \(p\)-values
  2. parameter inference with ROPEs
  3. nested model comparison

 

running example:

\(k=7\), \(N=24\) \(\rightarrow\) \(\theta = 0.5?\)

boring

NHST by \(p\)-value

  • data: we flip \(n=24\) times and observe \(k = 7\) successes
  • null hypothesis: \(\theta = 0.5\)
  • sampling distribution: binomial distribution

\[ B(k ; n = 24, \theta = 0.5) = \binom{n}{k} \theta^{k} \, (1-\theta)^{n-k} \]

 

\[p(d_{\text{obs}}) = P \left (D \in \{d \mid P(d \mid H_0) \le P(d_{\text{obs}} \mid H_0) \} \right )\]

binom.test(7,24)$p.value
## [1] 0.06391466
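
a quick sanity check (not from the original slides): evaluate the definition above directly by summing \(P(d \mid H_0)\) over all outcomes \(d\) that are no more likely than the observed \(k = 7\); this reproduces the binom.test result

k_obs = 7
N     = 24
lik   = dbinom(0:N, N, 0.5)              # P(d | H0) for every possible outcome d
sum(lik[lik <= dbinom(k_obs, N, 0.5)])   # ~ 0.0639, same as binom.test(7, 24)$p.value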

posterior inference

  • observed: \(k = 7\) out of \(N = 24\) flips came up heads
  • goal: estimate \(P(\theta \mid D)\) & determine posterior 95% HDI
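
a minimal sketch of this computation, assuming a flat Beta(1,1) prior so that the posterior is Beta(\(k+1\), \(N-k+1\)); the HDI is found as the shortest 95% credible interval (grid search over the lower-tail probability)

k = 7
N = 24
a = k + 1                                  # posterior is Beta(8, 18)
b = N - k + 1
lo    = seq(0, 0.05, length.out = 10000)   # candidate lower-tail probabilities
width = qbeta(lo + 0.95, a, b) - qbeta(lo, a, b)
i     = which.min(width)                   # shortest interval = HDI (posterior is unimodal)
c(qbeta(lo[i], a, b), qbeta(lo[i] + 0.95, a, b))
## roughly [0.14, 0.48]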

ROPEs and credible values

 

regions of practical equivalence

  • small regions \([\theta - \epsilon, \theta + \epsilon]\) around each \(\theta\)
    • values (practically) indistinguishable from \(\theta\)

 

credible values

  • value \(\theta\) is rejectable if its ROPE lies entirely outside of the posterior HDI
  • value \(\theta\) is believable if its ROPE lies entirely within the posterior HDI

 

NHST by ROPE for our example

\(\theta = 0.5\) is rejectable for all ROPEs with ca. \(\epsilon \le 0.02\): the 95% HDI reaches up to roughly \(0.48\), so the ROPE \([0.5-\epsilon, 0.5+\epsilon]\) lies entirely outside the HDI only for \(\epsilon\) up to about \(0.02\)

Bayes factors for NHST

  • \(M_0\): \(\theta = 0.5\) & \(k \sim \text{Binomial}(0.5, N)\)
  • \(M_1\): \(\theta \sim \text{Beta}(1,1)\) & \(k \sim \text{Binomial}(\theta, N)\)

 

straightforward

\[ \begin{align*} \text{BF}(M_0 > M_1) & = \frac{P(D \mid M_0)}{P(D \mid M_1)} \\ & = \frac{\binom{N}{k} 0.5^{k} \, (1-0.5)^{N - k}}{\int_0^1 \binom{N}{k} \theta^{k} \, (1-\theta)^{N - k} \text{ d}\theta} \\ & \approx 0.516 \end{align*} \]

Savage-Dickey
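
the Bayes factor above can be double-checked numerically (a sketch, not from the slides): under the Beta(1,1) prior the denominator integral equals \(1/(N+1)\), and the Savage-Dickey identity gives BF(\(M_0 > M_1\)) as the posterior density at \(\theta = 0.5\), since the prior density there is 1

k = 7
N = 24
dbinom(k, N, 0.5) / (1 / (N + 1))   # marginal likelihoods directly
dbeta(0.5, k + 1, N - k + 1)        # Savage-Dickey: posterior density at 0.5 / prior density (= 1)
## both ~ 0.516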

summary

 

| method        | result                                  | interpretation                                 |
|---------------|-----------------------------------------|------------------------------------------------|
| \(p\)-value   | \(p \approx 0.064\)                     | do not reject \(H_0\)                          |
| HDI+ROPE      | \(\text{HDI} \approx [0.14;0.48]\)      | do not adopt \(H_0\) (depends on \(\epsilon\)) |
| Bayes factor  | \(\text{BF}(M_0 > M_1) \approx 0.516\)  | mini-evidence in favor of \(H_1\)              |

comparison

Jeffreys-Lindley paradox

"paradox": two established methods give contradictory results

 

k = 49581
N = 98451

\(p\)-value NHST

binom.test(k, N)$p.value
## [1] 0.02364686

reject \(H_0\)

Savage-Dickey BF

dbeta(0.5, k+1, N - k + 1)
## [1] 19.21139

strong evidence in favor of \(H_0\)

simulation

  • let the true bias be \(\theta = 0.5\)
  • generate all possible outcomes \(k\) keeping \(N\) fixed
    • \(N \in \{ 10, 100, 1000, 10000, 100000 \}\)
    • true frequency of \(k\) is \(\text{Binomial}(k \mid N, \theta = 0.5)\)
  • look at the frequency of test results, coded thus:

   

|         | estimation                                                          | comparison                 | criticism       |
|---------|---------------------------------------------------------------------|----------------------------|-----------------|
| \(M_0\) | \([0.5-\epsilon, 0.5+\epsilon] \sqsubseteq\) 95% HDI or v.v.        | BF(\(M_0\)>\(M_1\)) > 6    | \(p > 0.05\)    |
| \(M_1\) | \([0.5-\epsilon, 0.5+\epsilon] \, \cap \,\) 95% HDI \(= \emptyset\) | BF(\(M_1\)>\(M_0\)) > 6    | \(p \le 0.05\)  |
| ??      | otherwise                                                           | otherwise                  | never           |

(more on this here)
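
a minimal sketch of this simulation for the criticism (\(p\)-value) and comparison (Bayes factor) columns, assuming a Beta(1,1) prior, the Savage-Dickey BF and the thresholds from the table above; the HDI+ROPE column is omitted for brevity

alpha_error_rates = function(N) {
  k      = 0:N
  weight = dbinom(k, N, 0.5)                                # true frequency of each outcome
  p_val  = sapply(k, function(x) binom.test(x, N)$p.value)
  BF01   = dbeta(0.5, k + 1, N - k + 1)                     # Savage-Dickey BF(M0 > M1)
  c(p_value = sum(weight[p_val <= 0.05]),                   # how often p <= 0.05 although H0 is true
    BF      = sum(weight[(1 / BF01) > 6]))                  # how often BF(M1 > M0) > 6 although H0 is true
}
alpha_error_rates(1000)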

results (proportion of \(\alpha\)-errors)

BF selects \(H_0\) correctly with prob. 0.986 for \(N = 10000\), and with 0.996 for \(N = 100000\).

[cf. Lindley's solution to the 'paradox': adjust \(p\) depending on \(N\); similarly for the ROPE's \(\epsilon\)]

interlude

on the importance of tracking positive evidence

model criticism

motivation

  • parameter estimation: what \(\theta\) to believe in?
  • model comparison: which model is better than another?
  • model criticism: is a given model plausible (enough)?

   

posterior predictive checks

graphically compare simulated observations with actual observation

Bayesian predictive \(p\)-values

measure surprise level of data under a model

[think: \(p\)-value for a non-trivial, serious model with potential uncertainty about parameters]

posterior predictive checks

exponential forgetting model

y = c(.94, .77, .40, .26, .24, .16)
t = c(  1,   3,   6,   9,  12,  18)
obs = y*100
## PSEUDO-CODE!
priors {
  a ~ dunif(0,1.5)
  b ~ dunif(0,1.5)
}
likelihood {
  p[i] = min(max( a*exp(-t[i]*b), 0.0001), 0.9999)
  obs[i] ~ dbinom(p[i], 100)    # data is given, so this (implicitly) conditions on data
}
generated predictions{
  # sample one imaginary outcome for current parameter values
  obsRep[i] = sample_from_binomial(p[i], 100) 
}
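
one way to turn the pseudo-code above into a runnable model, assuming JAGS is used via the rjags package; the model block mirrors the pseudo-code, while the surrounding R calls (data list, sampling set-up) are assumptions, not part of the original slides

library(rjags)
model_string = "
model {
  a ~ dunif(0, 1.5)
  b ~ dunif(0, 1.5)
  for (i in 1:n) {
    p[i]      <- min(max(a * exp(-t[i] * b), 0.0001), 0.9999)
    obs[i]    ~ dbin(p[i], 100)    # conditions on the observed counts
    obsRep[i] ~ dbin(p[i], 100)    # one posterior predictive replicate per iteration
  }
}"
y   = c(.94, .77, .40, .26, .24, .16)
t   = c(  1,   3,   6,   9,  12,  18)
dat = list(obs = y * 100, t = t, n = length(t))
jm      = jags.model(textConnection(model_string), data = dat, n.chains = 2)
samples = coda.samples(jm, variable.names = c("a", "b", "obsRep"), n.iter = 5000)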

PPC: exponential model

  • black dots: data
  • blue dots: mean of replicated fake data
  • blue bars: 95% HDIs of replicated fake data

PPC: power model

  • black dots: data
  • blue dots: mean of replicated fake data
  • blue bars: 95% HDIs of replicated fake data

Bayesian predictive model criticism

\(p\)-value for \(H_0\)

\[p(d_{\text{obs}}) = P \left (D \in \{d \mid P(d \mid H_0) \le P(d_{\text{obs}} \mid H_0) \} \right )\]

generalization to arbitrary model

\[p(d_{\text{obs}}) = P \left (D \in \{d \mid P(d \mid M) \le P(d_{\text{obs}} \mid M) \} \right )\]

  • with \(M = \langle P(\theta), P(D \mid \theta) \rangle\) this is the prior predictive \(p\)-value

  • with \(M = \langle P(\theta \mid d_{\text{obs}}), P(D \mid \theta) \rangle\) this is the posterior predictive \(p\)-value

example

obs = c(1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0)
k = sum(obs) # 7
N = length(obs) #20

\(p\)-value NHST:

  • do not reject NH \(\theta = 0.5\)
binom.test(k, N, 0.5)$p.value
## [1] 0.263176

Bayesian posterior predictive \(p\)-value

  • test statistic: number of switches between 1 and 0
  • \(t(d^*) = 3\)
  • PP-\(p\)-value \(\approx 0.028\) (see the simulation sketch below)
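
a minimal simulation sketch of this posterior predictive \(p\)-value, assuming a Beta(1,1) prior on \(\theta\) and taking "at most as many switches as observed" as the surprising direction

n_switches = function(x) sum(diff(x) != 0)     # test statistic: switches between 1 and 0
theta_post = rbeta(10000, k + 1, N - k + 1)    # posterior samples, with k = 7, N = 20 from above
T_rep = sapply(theta_post, function(th) n_switches(rbinom(N, 1, th)))
mean(T_rep <= n_switches(obs))                 # Pr(T(rep) <= T(obs)) ~ 0.03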

pppvalue

Gelman et al. 2014, p.147–8

summary

summary

 

|                | estimation                                    | comparison                                    | criticism                       |
|----------------|-----------------------------------------------|-----------------------------------------------|---------------------------------|
| goal           | which \(\theta\), given \(M\) & \(D\)?        | which better: \(M_0\) or \(M_1\)?             | \(M\) good model of \(D\)?      |
| method         | Bayes rule                                    | Bayes factor                                  | \(p\)-value                     |
| no. of models  | 1                                             | 2                                             | 1                               |
| \(H_0\)        | subset of \(\theta\)                          | \(P(\theta \mid M_0), P(D \mid \theta, M_0)\) | \(P(\theta), P(D \mid \theta)\) |
| \(H_1\)        |                                               | \(P(\theta \mid M_1), P(D \mid \theta, M_1)\) |                                 |
| prerequisites  | \(P(\theta), \alpha \times P(D \mid \theta)\) |                                               | test statistic                  |
| pros           | lean, easy                                    | intuitive, plausible, Ockham's razor          | absolute                        |
| cons           | vagueness in ROPE                             | prior dependence, computational load          | sample space?                   |

outlook

 

Friday

  • bootcampling GCM (Lee & Wagenmakers ch. 17)

 

Tuesday

  • introduction to Stan