\[ \definecolor{firebrick}{RGB}{178,34,34} \newcommand{\red}[1]{{\color{firebrick}{#1}}} \] \[ \definecolor{mygray}{RGB}{128,128,128} \newcommand{\mygray}[1]{{\color{mygray}{#1}}} \] \[ \newcommand{\set}[1]{\{#1\}} \] \[ \newcommand{\tuple}[1]{\langle#1\rangle} \] \[\newcommand{\States}{{T}}\] \[\newcommand{\state}{{t}}\] \[\newcommand{\pow}[1]{{\mathcal{P}(#1)}}\]
parameter estimation:
\[\underbrace{P(\theta \mid D)}_{\text{posterior}} \propto \underbrace{P(\theta)}_{\text{prior}} \ \underbrace{P(D \mid \theta)}_{\text{likelihood}}\]
model comparison
\[\underbrace{\frac{P(M_1 \mid D)}{P(M_2 \mid D)}}_{\text{posterior odds}} = \underbrace{\frac{P(D \mid M_1)}{P(D \mid M_2)}}_{\text{Bayes factor}} \ \underbrace{\frac{P(M_1)}{P(M_2)}}_{\text{prior odds}}\]
prediction
prior predictive
\[ P(D) = \int P(\theta) \ P(D \mid \theta) \ \text{d}\theta \]
posterior predictive
\[ P(D \mid D') = \int P(\theta \mid D') \ P(D \mid \theta) \ \text{d}\theta \]
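For a binomial likelihood with a Beta prior, both integrals have a closed form (the beta-binomial distribution). A minimal Python sketch, using the running example below (the slides' own code is R; function names here are illustrative, not from any package):

```python
from math import comb, lgamma, exp

def log_beta(a, b):
    # log of the Beta function B(a, b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def prior_predictive(k, n, a=1, b=1):
    # P(D) = C(n,k) * B(k+a, n-k+b) / B(a, b)  (beta-binomial)
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))

def posterior_predictive(k, n, k_prev, n_prev, a=1, b=1):
    # P(D | D'): same formula, with the prior updated on earlier data D'
    return prior_predictive(k, n, a + k_prev, b + n_prev - k_prev)

# with a uniform prior, P(k | n) = 1/(n+1) for every k
print(prior_predictive(7, 24))  # 0.04 = 1/25
```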
| | estimation | comparison | criticism |
|---|---|---|---|
goal | which \(\theta\), given \(M\) & \(D\)? | which better: \(M_0\) or \(M_1\)? | \(M\) good model of \(D\)? |
method | Bayes rule | Bayes factor | \(p\)-value |
no. of models | 1 | 2 | 1 |
\(H_0\) | subset of \(\theta\) | \(P(\theta \mid M_0), P(D \mid \theta, M_0)\) | \(P(\theta), P(D \mid \theta)\) |
\(H_1\) | — | \(P(\theta \mid M_1), P(D \mid \theta, M_1)\) | — |
prerequisites | \(P(\theta), \alpha \times P(D \mid \theta)\) | — | test statistic |
pros | lean, easy | intuitive, plausible, Ockham's razor | absolute |
cons | vagueness in ROPE | prior dependence, computational load | sample space? |
NHST
model criticism
compare 3 methods for testing a null hypothesis:
running example:
observed \(k=7\) successes in \(N=24\) trials \(\rightarrow\) is \(\theta = 0.5\)?
\[ B(k ; n = 24, \theta = 0.5) = \binom{n}{k} \theta^{k} \, (1-\theta)^{n-k} \]
\[p(d_{\text{obs}}) = P \left (D \in \{d \mid P(d \mid H_0) \le P(d_{\text{obs}} \mid H_0) \} \right )\]
binom.test(7,24)$p.value
## [1] 0.06391466
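The likelihood-based definition of the \(p\)-value above can be computed directly; a Python sketch (the slides use R) that mirrors what `binom.test` returns in this symmetric case:

```python
from math import comb

def binom_pmf(k, n, theta=0.5):
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

def p_value(k_obs, n, theta=0.5):
    # sum P(d) over all outcomes d at most as likely as the observed one
    p_obs = binom_pmf(k_obs, n, theta)
    return sum(binom_pmf(d, n, theta)
               for d in range(n + 1)
               if binom_pmf(d, n, theta) <= p_obs)

print(p_value(7, 24))  # ~0.0639, matching binom.test(7, 24)
```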
regions of practical equivalence
credible values
NHST by ROPE for our example
\(\theta = 0.5\) is rejectable for all ROPEs with ca. \(\epsilon \le 0.02\)
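The HDI behind this decision rule can be approximated by sampling from the posterior, which for \(k=7\), \(N=24\) under a uniform prior is Beta(8, 18). A Python sketch with an illustrative helper (`hdi_from_samples` is not a library function):

```python
import random

def hdi_from_samples(samples, mass=0.95):
    # narrowest interval containing `mass` of the sorted samples
    s = sorted(samples)
    m = int(mass * len(s))
    widths = [(s[i + m] - s[i], i) for i in range(len(s) - m)]
    _, i = min(widths)
    return s[i], s[i + m]

random.seed(42)
# posterior for k=7, N=24 under a uniform prior: Beta(8, 18)
post = [random.betavariate(8, 18) for _ in range(100_000)]
lo, hi = hdi_from_samples(post)
print(round(lo, 2), round(hi, 2))  # roughly [0.14, 0.48]
```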
straightforward
\[ \begin{align*} \text{BF}(M_0 > M_1) & = \frac{P(D \mid M_0)}{P(D \mid M_1)} \\ & = \frac{\binom{N}{k} 0.5^{k} \, (1-0.5)^{N - k}}{\int_0^1 \binom{N}{k} \theta^{k} \, (1-\theta)^{N - k} \text{ d}\theta} \\ & \approx 0.516 \end{align*} \]
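The marginal likelihood in the denominator reduces to \(1/(N+1)\) under the flat prior, so the Bayes factor can be checked numerically; a Python sketch:

```python
from math import comb

k, N = 7, 24
m0 = comb(N, k) * 0.5**k * (1 - 0.5)**(N - k)  # P(D | M0), point null at 0.5
m1 = 1 / (N + 1)  # integral of C(N,k) θ^k (1-θ)^(N-k) dθ under a flat prior
print(round(m0 / m1, 3))  # 0.516
```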
Savage-Dickey
method | result | interpretation |
---|---|---|
\(p\)-value | \(p \approx 0.064\) | do not reject \(H_0\) |
HDI+ROPE | \(\text{HDI} \approx [0.14;0.48]\) | do not adopt \(H_0\) (depends on \(\epsilon\)) |
Bayes Factor | \(\text{BF}(M_0 > M_1) \approx 0.516\) | weak evidence in favor of \(H_1\) |
"paradox": two established methods give contradictory results
k = 49581
N = 98451
\(p\)-value NHST
binom.test(k, N)$p.value
## [1] 0.02364686
reject \(H_0\)
Savage-Dickey BF
dbeta(0.5, k+1, N - k + 1)
## [1] 19.21139
strong evidence in favor of \(H_0\)
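The Savage-Dickey ratio here is the posterior density at \(\theta = 0.5\) divided by the prior density (which is 1 under the uniform prior). A Python sketch of R's `dbeta`, computed on the log scale for numerical stability:

```python
from math import lgamma, log, exp

def dbeta(x, a, b):
    # Beta(a, b) density at x, via log-gamma for stability at large a, b
    log_d = (lgamma(a + b) - lgamma(a) - lgamma(b)
             + (a - 1) * log(x) + (b - 1) * log(1 - x))
    return exp(log_d)

k, N = 49581, 98451
# Savage-Dickey: BF(M0 > M1) = posterior density at 0.5 / prior density (= 1)
print(dbeta(0.5, k + 1, N - k + 1))  # ~19.21
```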
| | estimation | comparison | criticism |
|---|---|---|---|
\(M_0\) | \([0.5-\epsilon, 0.5+\epsilon] \subseteq\) 95% HDI (or v.v.) | \(\text{BF}(M_0 > M_1) > 6\) | \(p > 0.05\) |
\(M_1\) | \([0.5-\epsilon, 0.5+\epsilon] \, \cap \,\) 95% HDI \(= \emptyset\) | \(\text{BF}(M_1 > M_0) > 6\) | \(p \le 0.05\) |
?? | otherwise | otherwise | never |
BF selects \(H_0\) correctly with prob. 0.986 for \(N = 10000\), and with 0.996 for \(N = 100000\).
[cf. Lindley's solution to the 'paradox': adjust \(p\) depending on \(N\); similarly for ROPE's \(\epsilon\)]
posterior predictive checks
graphically compare simulated observations with actual observation
Bayesian predictive \(p\)-values
measure surprise level of data under a model
[think: \(p\)-value for a non-trivial, serious model with potential uncertainty about parameters]
exponential forgetting model
y = c(.94, .77, .40, .26, .24, .16)
t = c(1, 3, 6, 9, 12, 18)
obs = y*100
## PSEUDO-CODE!
priors {
  a ~ dunif(0, 1.5)
  b ~ dunif(0, 1.5)
}
likelihood {
  p[i] = min(max(a*exp(-t[i]*b), 0.0001), 0.9999)
  obs[i] ~ dbinom(p[i], 100) # data is given, so this (implicitly) conditions on data
}
generated predictions {
  # sample one imaginary outcome for current parameter values
  obsRep[i] = sample_from_binomial(p[i], 100)
}
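Setting aside the conditioning on `obs` (which would require MCMC), the generative part of this pseudo-code can be sketched in Python; `forgetting_prior_predictive` is an illustrative name, not part of any package:

```python
import random
from math import exp

def forgetting_prior_predictive(t, n_trials=100, seed=0):
    # one prior-predictive draw from the exponential forgetting model
    rng = random.Random(seed)
    a = rng.uniform(0, 1.5)   # prior: a ~ Uniform(0, 1.5)
    b = rng.uniform(0, 1.5)   # prior: b ~ Uniform(0, 1.5)
    # recall probability, clipped away from 0 and 1 as in the pseudo-code
    p = [min(max(a * exp(-b * ti), 0.0001), 0.9999) for ti in t]
    # one binomial draw (n_trials attempts) per retention interval
    obs_rep = [sum(rng.random() < pi for _ in range(n_trials)) for pi in p]
    return a, b, obs_rep

t = [1, 3, 6, 9, 12, 18]
a, b, rep = forgetting_prior_predictive(t)
print(rep)
```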
\(p\)-value for \(H_0\)
\[p(d_{\text{obs}}) = P \left (D \in \{d \mid P(d \mid H_0) \le P(d_{\text{obs}} \mid H_0) \} \right )\]
generalization to arbitrary model
\[p(d_{\text{obs}}) = P \left (D \in \{d \mid P(d \mid M) \le P(d_{\text{obs}} \mid M) \} \right )\]
with \(M = \langle P(\theta), P(D \mid \theta) \rangle\) ::: prior predictive \(p\)-value
with \(M = \langle P(\theta \mid d_{\text{obs}}), P(D \mid \theta) \rangle\) ::: posterior predictive \(p\)-value
obs = c(1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0)
k = sum(obs)    # 7
N = length(obs) # 20
\(p\)-value NHST:
binom.test(k, N, 0.5)$p.value
## [1] 0.263176
Bayesian posterior predictive \(p\)-value
Gelman et al. 2014, p.147–8
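The test statistic in Gelman et al.'s example is the number of switches between 0 and 1 (here \(T(d_{\text{obs}}) = 3\)). A Python simulation sketch of the posterior predictive \(p\)-value, restating the data from above:

```python
import random

obs = [1,1,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0]
k, N = sum(obs), len(obs)

def n_switches(seq):
    # test statistic: number of times consecutive outcomes differ
    return sum(a != b for a, b in zip(seq, seq[1:]))

random.seed(1)
T_obs = n_switches(obs)  # 3
extreme = 0
n_sim = 20_000
for _ in range(n_sim):
    theta = random.betavariate(k + 1, N - k + 1)  # posterior draw (uniform prior)
    rep = [random.random() < theta for _ in range(N)]
    extreme += n_switches(rep) <= T_obs
print(extreme / n_sim)  # small: the iid model under-predicts such long runs
```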
| | estimation | comparison | criticism |
|---|---|---|---|
goal | which \(\theta\), given \(M\) & \(D\)? | which better: \(M_0\) or \(M_1\)? | \(M\) good model of \(D\)? |
method | Bayes rule | Bayes factor | \(p\)-value |
no. of models | 1 | 2 | 1 |
\(H_0\) | subset of \(\theta\) | \(P(\theta \mid M_0), P(D \mid \theta, M_0)\) | \(P(\theta), P(D \mid \theta)\) |
\(H_1\) | — | \(P(\theta \mid M_1), P(D \mid \theta, M_1)\) | — |
prerequisites | \(P(\theta), \alpha \times P(D \mid \theta)\) | — | test statistic |
pros | lean, easy | intuitive, plausible, Ockham's razor | absolute |
cons | vagueness in ROPE | prior dependence, computational load | sample space? |