$\definecolor{firebrick}{RGB}{178,34,34} \newcommand{\red}[1]{{\color{firebrick}{#1}}}$ $\definecolor{mygray}{RGB}{178,34,34} \newcommand{\mygray}[1]{{\color{mygray}{#1}}}$ $\newcommand{\set}[1]{\{#1\}}$ $\newcommand{\tuple}[1]{\langle#1\rangle}$ $\newcommand{\States}{{T}}$ $\newcommand{\state}{{t}}$ $\newcommand{\pow}[1]{{\mathcal{P}(#1)}}$

## 3 pillars of BDA (recap)

parameter estimation:

$\underbrace{P(\theta \, | \, D)}_{posterior} \propto \underbrace{P(\theta)}_{prior} \ \underbrace{P(D \, | \, \theta)}_{likelihood}$

model comparison

$\underbrace{\frac{P(M_1 \mid D)}{P(M_2 \mid D)}}_{\text{posterior odds}} = \underbrace{\frac{P(D \mid M_1)}{P(D \mid M_2)}}_{\text{Bayes factor}} \ \underbrace{\frac{P(M_1)}{P(M_2)}}_{\text{prior odds}}$

prediction

prior predictive

$P(D) = \int P(\theta) \ P(D \mid \theta) \ \text{d}\theta$

posterior predictive

$P(D \mid D') = \int P(\theta \mid D') \ P(D \mid \theta) \ \text{d}\theta$
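Both integrals can be approximated by sampling: draw $$\theta$$ from the prior (or posterior), then draw data given that $$\theta$$. The following is a minimal sketch, assuming a binomial likelihood, a flat $$\text{Beta}(1,1)$$ prior and previous data $$D'$$ of 7 successes in 24 trials; these choices and all variable names are illustrative, not part of the formulas above.

```r
set.seed(123)
N     <- 24      # size of the data set to be predicted (assumption)
n_sim <- 1e5     # number of Monte Carlo samples

# prior predictive: theta ~ Beta(1,1), then k ~ Binomial(N, theta)
theta_prior <- rbeta(n_sim, 1, 1)
k_prior     <- rbinom(n_sim, N, theta_prior)

# posterior predictive: theta ~ P(theta | D') = Beta(1 + 7, 1 + 24 - 7)
theta_post <- rbeta(n_sim, 1 + 7, 1 + 24 - 7)
k_post     <- rbinom(n_sim, N, theta_post)

# relative frequencies of k = 0, ..., N approximate the two integrals
prior_pred <- tabulate(k_prior + 1, nbins = N + 1) / n_sim
post_pred  <- tabulate(k_post  + 1, nbins = N + 1) / n_sim

round(rbind(k = 0:N, prior_pred, post_pred), 3)
```

Under the flat prior every outcome $$k$$ is equally likely a priori, whereas the posterior predictive concentrates around the previously observed rate.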

## why model predictions?

1. model-based beliefs about what is likely to happen
   • practical decision making
2. model comparison
   • information criteria
   • Bayes factor
3. model criticism
   • is the model any good?
   • given belief in the model, should we be shocked by this new observation?
     [ $$p$$-values ]

## overview

| | estimation | comparison | criticism |
|---|---|---|---|
| goal | which $$\theta$$, given $$M$$ & $$D$$? | which better: $$M_0$$ or $$M_1$$? | $$M$$ good model of $$D$$? |
| method | Bayes rule | Bayes factor | $$p$$-value |
| no. of models | 1 | 2 | 1 |
| $$H_0$$ | subset of $$\theta$$ | $$P(\theta \mid M_0), P(D \mid \theta, M_0)$$ | $$P(\theta), P(D \mid \theta)$$ |
| $$H_1$$ | | $$P(\theta \mid M_1), P(D \mid \theta, M_1)$$ | |
| prerequisites | $$P(\theta), \alpha \times P(D \mid \theta)$$ | | test statistic |
| pros | lean, easy | intuitive, plausible, Ockham's razor | absolute |
| cons | vagueness in ROPE | prior dependence, computational load | sample space? |

NHST

• inference, HDIs & ROPEs
• nested model comparison
• $$p$$-values

model criticism

• posterior predictive checks
• prior/posterior $$p$$-values

## up next

compare 3 methods for testing a null hypothesis:

1. $$p$$-values
2. parameter inference with ROPEs
3. nested model comparison

running example:

$$k=7$$, $$N=24$$ $$\rightarrow$$ $$\theta = 0.5?$$

## NHST by $$p$$-value

• data: we flip $$n=24$$ times and observe $$k = 7$$ successes
• null hypothesis: $$\theta = 0.5$$
• sampling distribution: binomial distribution

$B(k ; n = 24, \theta = 0.5) = \binom{n}{k} \theta^{k} \, (1-\theta)^{n-k}$

$p(d_{\text{obs}}) = P \left (D \in \{d \mid P(d \mid H_0) \le P(d_{\text{obs}} \mid H_0) \} \right )$
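The definition can be evaluated directly by summing the probability of every outcome that is no more likely than the observed one. This is a small sketch for the running example; the variable names are illustrative, and the relative tolerance `1e-7` is an assumption chosen to guard against floating-point ties between symmetric outcomes.

```r
n      <- 24     # number of flips
k_obs  <- 7      # observed successes
theta0 <- 0.5    # null hypothesis

d     <- dbinom(0:n, n, theta0)     # P(d | H_0) for every possible outcome
d_obs <- dbinom(k_obs, n, theta0)   # P(d_obs | H_0)

# total probability of all outcomes at most as likely as the observed one
p_value <- sum(d[d <= d_obs * (1 + 1e-7)])
p_value
```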

binom.test(7, 24)$p.value
## [1] 0.06391466

## posterior inference

• observed: $$k = 7$$ out of $$N = 24$$ flips came up heads
• goal: estimate $$P(\theta \mid D)$$ & determine posterior 95% HDI

## ROPEs and credible values

regions of practical equivalence

• small regions $$[\theta - \epsilon, \theta + \epsilon]$$ around each $$\theta$$
• values (practically) indistinguishable from $$\theta$$

credible values

• value $$\theta$$ is rejectable if its ROPE lies entirely outside of posterior HDI
• value $$\theta$$ is believable if its ROPE lies entirely within posterior HDI

NHST by ROPE

• for our example, $$\theta = 0.5$$ is rejectable for all ROPEs with ca. $$\epsilon \le 0.02$$

## Bayes factors for NHST

• $$M_0$$: $$\theta = 0.5$$ & $$k \sim \text{Binomial}(0.5, N)$$
• $$M_1$$: $$\theta \sim \text{Beta}(1,1)$$ & $$k \sim \text{Binomial}(\theta, N)$$

straightforward

\begin{align*}
\text{BF}(M_0 > M_1) & = \frac{P(D \mid M_0)}{P(D \mid M_1)} \\
& = \frac{\binom{N}{k} 0.5^{k} \, (1-0.5)^{N - k}}{\int_0^1 \binom{N}{k} \theta^{k} \, (1-\theta)^{N - k} \text{ d}\theta} \\
& \approx 0.516
\end{align*}

Savage-Dickey

## summary

| method | result | interpretation |
|---|---|---|
| $$p$$-value | $$p \approx 0.064$$ | do not reject $$H_0$$ |
| HDI+ROPE | $$\text{HDI} \approx [0.14;0.48]$$ | do not adopt $$H_0$$ (depends on $$\epsilon$$) |
| Bayes Factor | $$\text{BF}(M_0 > M_1) \approx 0.516$$ | minimal evidence in favor of $$H_1$$ |

## comparison

## Jeffreys-Lindley paradox

"paradox": two established methods give contradictory results

k = 49581
N = 98451

$$p$$-value NHST

binom.test(k, N)$p.value
## [1] 0.02364686

reject $$H_0$$

Savage-Dickey BF

dbeta(0.5, k+1, N - k + 1)
## [1] 19.21139

strong evidence in favor of $$H_0$$
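Why the posterior density at $$\theta = 0.5$$ can serve as a Bayes factor: under the Savage-Dickey method, the Bayes factor in favor of the nested model is the posterior density at the point value divided by the prior density there, and the latter is 1 for a flat prior. The sketch below checks this against the ratio of marginal likelihoods, assuming $$M_1$$ uses a $$\text{Beta}(1,1)$$ prior so that $$P(D \mid M_1) = 1/(N+1)$$; variable names are illustrative.

```r
k <- 49581
N <- 98451

# frequentist side: two-sided exact p-value for H_0: theta = 0.5
p_value <- binom.test(k, N, p = 0.5)$p.value

# Savage-Dickey: posterior density at 0.5 over prior density at 0.5
bf_sd <- dbeta(0.5, k + 1, N - k + 1) / dbeta(0.5, 1, 1)

# direct route: P(D | M_0) / P(D | M_1), with P(D | M_1) = 1 / (N + 1)
bf_ml <- dbinom(k, N, 0.5) / (1 / (N + 1))

c(p_value = p_value, bf_savage_dickey = bf_sd, bf_marginal = bf_ml)

# same computation for the small example with k = 7, N = 24
bf_small <- dbinom(7, 24, 0.5) * (24 + 1)
bf_small
```

Both routes give the same Bayes factor, so the conflict with the $$p$$-value is not an artifact of the Savage-Dickey shortcut.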

## simulation

• let the true bias be $$\theta = 0.5$$
• generate all possible outcomes $$k$$ keeping $$N$$ fixed
• $$N \in \{ 10, 100, 1000, 10000, 100000 \}$$
• true frequency of $$k$$ is $$\text{Binomial}(k \mid N, \theta = 0.5)$$
• look at the frequency of test results, coded thus:

| | estimation | comparison | criticism |
|---|---|---|---|
| $$M_0$$ | $$[.5-\epsilon, 0.5+\epsilon] \sqsubseteq$$ 95% HDI or v.v. | BF($$M_0$$>$$M_1$$) > 6 | $$p$$ > 0.05 |
| $$M_1$$ | $$[.5-\epsilon, 0.5+\epsilon] \, \cap \,$$ 95% HDI $$=\emptyset$$ | BF($$M_1$$>$$M_0$$) > 6 | $$p$$ <= 0.05 |
| ?? | otherwise | otherwise | never |
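Because all outcomes $$k$$ can be enumerated, the frequency of each test result can be computed exactly by weighting every $$k$$ with its binomial probability. The sketch below does this for the comparison and criticism columns only (the HDI-based coding is left out for brevity). It assumes a $$\text{Beta}(1,1)$$ prior for $$M_1$$ and computes the Bayes factor via Savage-Dickey; the function and column names are hypothetical, and the output is not claimed to reproduce the figures reported on the slides.

```r
# exact frequencies of test outcomes when the true bias is theta = 0.5
classify_outcomes <- function(N, bf_threshold = 6, alpha = 0.05) {
  k      <- 0:N
  weight <- dbinom(k, N, 0.5)   # true frequency of each outcome k

  # two-sided exact p-value under H_0; Binomial(N, 0.5) is symmetric, so the
  # outcomes no more likely than k are those at least as far from N/2
  p_val <- pmin(1, 2 * pbinom(pmin(k, N - k), N, 0.5))

  # Bayes factor in favor of M_0 (posterior density at 0.5, flat prior)
  bf01 <- dbeta(0.5, k + 1, N - k + 1)

  data.frame(
    N          = N,
    p_M1       = sum(weight[p_val <= alpha]),            # alpha-error, p-value
    bf_M0      = sum(weight[bf01 > bf_threshold]),       # BF selects M_0
    bf_M1      = sum(weight[bf01 < 1 / bf_threshold]),   # alpha-error, BF
    bf_unclear = sum(weight[bf01 >= 1 / bf_threshold & bf01 <= bf_threshold])
  )
}

res <- do.call(rbind, lapply(c(10, 100, 1000, 10000, 100000), classify_outcomes))
print(res)
```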


## results (proportion of $$\alpha$$-errors)

BF selects $$H_0$$ correctly with prob. 0.986 for $$N = 10000$$, and with 0.996 for $$N = 100000$$.

[cf. Lindley's solution to the 'paradox': adjust the significance threshold for $$p$$ depending on $$N$$; similarly for the ROPE's $$\epsilon$$]

## motivation

• parameter estimation: what $$\theta$$ to believe in?
• model comparison: which model is better than another?
• model criticism: is a given model plausible (enough)?

posterior predictive checks

graphically compare simulated observations with actual observation

Bayesian predictive $$p$$-values

measure surprise level of data under a model

[think: $$p$$-value for a non-trivial, serious model with potential uncertainty about parameters]

## posterior predictive checks

exponential forgetting model

y = c(.94, .77, .40, .26, .24, .16)
t = c(  1,   3,   6,   9,  12,  18)
obs = y*100
## PSEUDO-CODE!
priors {
a ~ dunif(0,1.5)
b ~ dunif(0,1.5)
}
likelihood {
p[i] = min(max( a*exp(-t[i]*b), 0.0001), 0.9999)
obs[i] ~ dbinom(p[i], 100)    # data is given, so this (implicitly) conditions on data
}
generated predictions{
# sample one imaginary outcome for current parameter values
obsRep[i] = sample_from_binomial(p[i], 100)
}
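The pseudo-code above can be turned into something runnable without an MCMC sampler by approximating the posterior on a grid. The following is a sketch under assumptions: the grid resolution, the number of posterior samples and all helper names are arbitrary choices, and the intervals are equal-tailed 95% quantile intervals rather than HDIs.

```r
set.seed(1)
y   <- c(.94, .77, .40, .26, .24, .16)
t   <- c(  1,   3,   6,   9,  12,  18)
obs <- round(y * 100)   # round: dbinom needs integer counts

# grid over the uniform prior's support (0, 1.5) for both parameters
grid <- expand.grid(a = seq(0.005, 1.495, length.out = 300),
                    b = seq(0.005, 1.495, length.out = 300))

# success probability of the exponential model, clipped as in the pseudo-code
success_prob <- function(a, b, t) pmin(pmax(a * exp(-t * b), 0.0001), 0.9999)

# log-likelihood per grid point; the flat prior only adds a constant
log_lik  <- sapply(seq_along(t), function(i)
  dbinom(obs[i], 100, success_prob(grid$a, grid$b, t[i]), log = TRUE))
log_post <- rowSums(log_lik)
post     <- exp(log_post - max(log_post))
post     <- post / sum(post)

# sample parameter pairs from the posterior, then one fake data set for each
idx     <- sample(nrow(grid), 5000, replace = TRUE, prob = post)
obs_rep <- sapply(seq_along(t), function(i)
  rbinom(length(idx), 100, success_prob(grid$a[idx], grid$b[idx], t[i])))

ppc <- data.frame(t = t, obs = obs,
                  mean_rep = colMeans(obs_rep),
                  lower    = apply(obs_rep, 2, quantile, probs = 0.025),
                  upper    = apply(obs_rep, 2, quantile, probs = 0.975))
print(ppc)
```

Comparing the `obs` column with `lower` and `upper` is the tabular counterpart of a graphical posterior predictive check.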

## PPC: exponential model

• black dots: data
• blue dots: mean of replicated fake data
• blue bars: 95% HDIs of replicated fake data

## PPC: power model

• black dots: data
• blue dots: mean of replicated fake data
• blue bars: 95% HDIs of replicated fake data

## Bayesian predictive model criticism

$$p$$-value for $$H_0$$

$p(d_{\text{obs}}) = P \left (D \in \{d \mid P(d \mid H_0) \le P(d_{\text{obs}} \mid H_0) \} \right )$

generalization to arbitrary model

$p(d_{\text{obs}}) = P \left (D \in \{d \mid P(d \mid M) \le P(d_{\text{obs}} \mid M) \} \right )$

• with $$M = \langle P(\theta), P(D \mid \theta) \rangle$$ ::: prior predictive $$p$$-value

• with $$M = \langle P(\theta \mid d_{\text{obs}}), P(D \mid \theta) \rangle$$ ::: posterior predictive $$p$$-value

## example

obs = c(1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0)
k = sum(obs) # 7
N = length(obs) #20

$$p$$-value NHST:

• do not reject NH $$\theta = 0.5$$
binom.test(k, N, 0.5)$p.value
## [1] 0.263176

Bayesian posterior predictive $$p$$-value

• test statistic: no. switches 1 <-> 0
• $$t(d^*)$$ = 3
• PP-$$p$$-value $$\approx 0.028$$

Gelman et al. 2014, p.147–8
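A posterior predictive $$p$$-value of this kind can be approximated by simulation: sample $$\theta$$ from the posterior, generate a replicated sequence of independent flips, and record how often the replicated test statistic is at least as extreme as the observed one. The sketch below assumes a flat $$\text{Beta}(1,1)$$ prior and counts replicates with no more switches than observed; the helper `n_switches` and the number of samples are illustrative choices, and the Monte Carlo estimate varies with the seed.

```r
set.seed(2014)
obs <- c(1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0)
k   <- sum(obs)
N   <- length(obs)

# test statistic: number of switches between 0 and 1 in a sequence
n_switches <- function(x) sum(diff(x) != 0)
t_obs <- n_switches(obs)

# posterior samples of theta under a flat prior: Beta(k + 1, N - k + 1)
theta <- rbeta(10000, k + 1, N - k + 1)

# one replicated sequence of N independent flips per posterior sample
t_rep <- sapply(theta, function(th) n_switches(rbinom(N, 1, th)))

# proportion of replicates with at most as many switches as observed
pp_p_value <- mean(t_rep <= t_obs)
pp_p_value
```

A small value indicates that independent flips rarely produce so few switches, even though the overall success rate gives no reason to reject $$\theta = 0.5$$.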

## summary

| | estimation | comparison | criticism |
|---|---|---|---|
| goal | which $$\theta$$, given $$M$$ & $$D$$? | which better: $$M_0$$ or $$M_1$$? | $$M$$ good model of $$D$$? |
| method | Bayes rule | Bayes factor | $$p$$-value |
| no. of models | 1 | 2 | 1 |
| $$H_0$$ | subset of $$\theta$$ | $$P(\theta \mid M_0), P(D \mid \theta, M_0)$$ | $$P(\theta), P(D \mid \theta)$$ |
| $$H_1$$ | | $$P(\theta \mid M_1), P(D \mid \theta, M_1)$$ | |
| prerequisites | $$P(\theta), \alpha \times P(D \mid \theta)$$ | | test statistic |
| pros | lean, easy | intuitive, plausible, Ockham's razor | absolute |
| cons | vagueness in ROPE | prior dependence, computational load | sample space? |

## outlook

Friday

• bootcampling GCM (Lee & Wagenmakers ch. 17)

Tuesday

• introduction to Stan