Model Comparison: Part 2

Michael Franke

\[ \definecolor{firebrick}{RGB}{178,34,34} \newcommand{\red}[1]{{\color{firebrick}{#1}}} \] \[ \definecolor{mygray}{RGB}{178,34,34} \newcommand{\mygray}[1]{{\color{mygray}{#1}}} \] \[ \newcommand{\set}[1]{\{#1\}} \] \[ \newcommand{\tuple}[1]{\langle#1\rangle} \] \[\newcommand{\States}{{T}}\] \[\newcommand{\state}{{t}}\] \[\newcommand{\pow}[1]{{\mathcal{P}(#1)}}\]

     

recap

Bayes factors (recap)

  • take two models:
    • \(P(\theta_1 \mid M_1)\) and \(P(D \mid \theta_1, M_1)\)
    • \(P(\theta_2 \mid M_2)\) and \(P(D \mid \theta_2, M_2)\)
  • ideally, we’d want to know the absolute probability of \(M_i\) given the data
    • but then we’d need to know the set of all models (for normalization)
  • alternatively, we take odds of models given the data:

\[\underbrace{\frac{P(M_1 \mid D)}{P(M_2 \mid D)}}_{\text{posterior odds}} = \underbrace{\frac{P(D \mid M_1)}{P(D \mid M_2)}}_{\text{Bayes factor}} \ \underbrace{\frac{P(M_1)}{P(M_2)}}_{\text{prior odds}}\]

The Bayes factor is the factor by which our prior odds are changed by the data.
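
For illustration (hypothetical numbers): if the prior odds are \(1\) (both models equally plausible a priori) and the data yield a Bayes factor of \(\text{BF}(M_1 > M_2) = 5\), the posterior odds are \(5 \cdot 1 = 5\), i.e., after seeing the data \(M_1\) is five times more probable than \(M_2\).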

marginal likelihood (recap)

Bayes factor in favor of model \(M_1\)

\[\text{BF}(M_1 > M_2) = \frac{P(D \mid M_1)}{P(D \mid M_2)}\]

marginal likelihood of data under model \(M_i\)

\[P(D \mid M_i) = \int P(\theta_i \mid M_i) \ P(D \mid \theta_i, M_i) \text{ d}\theta_i\]

  • we marginalize out parameters \(\theta_i\)
  • this is a function of the prior and the likelihood

comparison against MLE-based methods (recap)

standard approach

  • Akaike information criterion
  • likelihood function \(P(D\mid \theta)\)
  • maximum likelihood estimate \[\hat{\theta} \in \arg\max_\theta P(D\mid \theta)\]
  • ex post: model uses data
  • counts number of free parameters
  • relatively easy to compute

Bayes

  • Bayes factors
  • likelihood function \(P(D\mid \theta)\) & prior \(P(\theta)\)
  • marginalized (prior) likelihood \[\int P(D \mid \theta) \ P(\theta) \text{d}\theta\]
  • ex ante: model doesn’t use data
  • implicitly weighs in effectiveness of parameters
  • relatively hard to compute
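
A minimal sketch of both quantities side by side, using the coin-flip data introduced below (\(k = 7\) of \(N = 24\)); the AIC is \(2p - 2\log \hat{L}\) with \(p = 1\) free parameter, and the marginal likelihood averages the likelihood over a flat \(\text{Beta}(1,1)\) prior:

```r
# MLE-based AIC vs. Bayesian marginal likelihood for simple binomial data
k <- 7; N <- 24

# standard approach: maximize the likelihood, then penalize by the number of parameters
theta_hat <- k / N                                       # MLE of theta
aic <- 2 * 1 - 2 * dbinom(k, N, theta_hat, log = TRUE)   # one free parameter

# Bayesian approach: average the likelihood over the Beta(1,1) prior (ex ante, no data peeking)
marg_lik <- integrate(function(th) dbinom(k, N, th) * dbeta(th, 1, 1), 0, 1)$value

c(AIC = aic, marginal_likelihood = marg_lik)             # marg_lik equals 1/(N+1) = 0.04
```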

how to calculate Bayes factors

  1. get each model’s marginal likelihood
    • grid approximation
      (last time)
    • by Monte Carlo sampling
      (last time)
    • clever math
      (today)
    • bridge sampling
      (today)
  2. get Bayes factor directly
    • Savage-Dickey method
      (today)
    • transdimensional MCMC
      (not here)

     

Savage-Dickey method

example: model comparison for coin flip data

  • observed: \(k = 7\) out of \(N = 24\) coin flips were successes
  • goal: compare a null-model \(M_0\) with an alternative model \(M_1\)
    • \(M_0\) has \(\theta = 0.5\) and \(k \sim \text{Binomial}(0.5, N)\)
    • \(M_1\) has \(\theta \sim \text{Beta}(1,1)\) and \(k \sim \text{Binomial}(\theta, N)\)

   

exact calculation

  • data: \(k = 7\) successes in \(N = 24\) trials

\[ \begin{align*} \text{BF}(M_0 > M_1) & = \frac{P(D \mid M_0)}{P(D \mid M_1)} \\ & = \frac{\text{Binomial}(k,N,0.5)}{\int_0^1 \text{Beta}(\theta, 1, 1) \ \text{Binomial}(k,N, \theta) \text{ d}\theta} \\ & = \frac{\binom{N}{k} 0.5^{k} \, (1-0.5)^{N - k}}{\int_0^1 \binom{N}{k} \theta^{k} \, (1-\theta)^{N - k} \text{ d}\theta} \\ & = \frac{\binom{N}{k} 0.5^{N}}{\binom{N}{k} \int_0^1 \theta^{k} \, (1-\theta)^{N - k} \text{ d}\theta} \\ & = \frac{0.5^{N}}{\text{B}(k+1, N-k+1)} \approx 0.516 \end{align*} \]
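
The same arithmetic can be checked with base R (a small sketch using the numbers from the example):

```r
# exact Bayes factor for the coin-flip example: k = 7 successes in N = 24 flips
k <- 7; N <- 24
ml_M0 <- dbinom(k, N, 0.5)                       # marginal likelihood of the null model
ml_M1 <- choose(N, k) * beta(k + 1, N - k + 1)   # analytic integral; equals 1 / (N + 1)
ml_M0 / ml_M1                                    # BF(M0 > M1) is approximately 0.516
```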

properly nested models

  • suppose that there are \(n\) continuous parameters of interest \(\theta = \langle \theta_1, \dots, \theta_n \rangle\)
  • \(M_1\) is a model defined by \(P(\theta \mid M_1)\) & \(P(D \mid \theta, M_1)\)
  • \(M_0\) is properly nested under \(M_1\) if:
    • \(M_0\) assigns fixed values to parameters \(\theta_i = x_i, \dots, \theta_n = x_n\)
    • \(\lim_{\theta_i \rightarrow x_i, \dots, \theta_n \rightarrow x_n} P(\theta_1, \dots, \theta_{i-1} \mid \theta_i, \dots, \theta_n, M_1) = P(\theta_1, \dots, \theta_{i-1} \mid M_0)\)
    • \(P(D \mid \theta_1, \dots, \theta_{i-1}, M_0) = P(D \mid \theta_1, \dots, \theta_{i-1}, \theta_i = x_i, \dots, \theta_n = x_n, M_1)\)

Savage-Dickey method

let \(M_0\) be properly nested under \(M_1\) s.t. \(M_0\) fixes \(\theta_i = x_i, \dots, \theta_n = x_n\)

\[ \begin{align*} \text{BF}(M_0 > M_1) & = \frac{P(D \mid M_0)}{P(D \mid M_1)} \\ & = \frac{P(\theta_i = x_i, \dots, \theta_n = x_n \mid D, M_1)}{P(\theta_i = x_i, \dots, \theta_n = x_n \mid M_1)} \end{align*} \]

proof

  • \(M_0\) has parameters \(\theta = \tuple{\phi, \psi}\) with \(\phi = \phi_0\)
  • \(M_1\) has parameters \(\theta = \tuple{\phi, \psi}\) with \(\phi\) free to vary
  • crucial assumption: \(\lim_{\phi \rightarrow \phi_0} P(\psi \mid \phi, M_1) = P(\psi \mid M_0)\)
  • rewrite marginal likelihood under \(M_0\):

\[ \begin{align*} P(D \mid M_0) & = \int P(D \mid \psi, M_0) P(\psi \mid M_0) \ \text{d}\psi \\ & = \int P(D \mid \psi, \phi = \phi_0, M_1) P(\psi \mid \phi = \phi_0, M_1) \ \text{d}\psi\\ & = P(D \mid \phi = \phi_0, M_1) \ \ \ \ \ \ \text{(by Bayes rule)} \\ & = \frac{P(\phi = \phi_0 \mid D, M_1) P(D \mid M_1)}{P(\phi = \phi_0 \mid M_1)} \end{align*} \]
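
For the coin-flip example this can be verified directly, since the posterior under \(M_1\) is the conjugate \(\text{Beta}(k+1, N-k+1)\) distribution (a small sketch in base R):

```r
# Savage-Dickey: BF(M0 > M1) is the posterior/prior density ratio at theta = 0.5 under M1
k <- 7; N <- 24
posterior_at_null <- dbeta(0.5, k + 1, N - k + 1)  # conjugate posterior Beta(8, 18)
prior_at_null     <- dbeta(0.5, 1, 1)              # flat Beta(1, 1) prior; density is 1
posterior_at_null / prior_at_null                  # about 0.516, matching the exact calculation
```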

     

Savage-Dickey method applied to the Generalized Context Model

Generalized Context Model ::: Stimuli

[figure: GCM stimuli]

see Lee & Wagenmakers (2015) Chapter 17

Generalized Context Model ::: Model

[figure: GCM model]

Generalized Context Model ::: Stan code

GCM ::: Posterior inference & BF

## [1] "Approximate BF: 4.747"
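
The GCM code itself is not reproduced here. As a hedged sketch of the general recipe (not the code behind the number above), a Savage-Dickey Bayes factor can be approximated from MCMC output by estimating the posterior density at the null value; variable names below are hypothetical, and the sanity check reuses the coin-flip example where the answer is known:

```r
# approximate a Savage-Dickey BF from posterior samples via kernel density estimation
sd_bf <- function(samples, null_value, prior_density_at_null) {
  dens <- density(samples)                               # KDE of the posterior samples
  post_at_null <- approx(dens$x, dens$y, xout = null_value)$y
  post_at_null / prior_density_at_null
}

# sanity check with samples from the known Beta(8, 18) posterior of the coin-flip example
set.seed(2024)
sd_bf(rbeta(1e5, 8, 18), 0.5, dbeta(0.5, 1, 1))          # close to the exact 0.516
```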

     

sampling-based approximations of marginal likelihoods

reformulating marginal likelihood

naive Monte Carlo

\[ P(D) = \int P_{\text{prior}}(\theta) \ P(D \mid \theta) \ \text{d} \theta = \mathbb{E}_{P_{\text{prior}}(\theta)} \left [ P(D \mid \theta) \right ]\]

importance sampling

\[ P(D) = \mathbb{E}_{g_{IS}(\theta)} \left [ \frac{P_{\text{prior}}(\theta) \ P(D \mid \theta)}{g_{IS}(\theta)} \right ]\]

generalized harmonic mean

\[ P(D) = \left [ \mathbb{E}_{P_{\text{posterior}}(\theta \mid D)} \left [ \frac{g_{HM}(\theta)}{P_{\text{prior}}(\theta) \ P(D \mid \theta)} \right ] \right ]^{-1} \]

bridge

\[ P(D) = \frac{\mathbb{E}_{g_{\text{proposal}}(\theta)} \left [ P(D \mid \theta) \ P_{\text{prior}}(\theta) \ h_{\text{bridge}}(\theta) \right ] } {\mathbb{E}_{P_{\text{posterior}}(\theta \mid D)} \left [ h_{\text{bridge}}(\theta) \ g_{\text{proposal}}(\theta) \right ]}\]
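
A minimal sketch of the first two estimators for the coin-flip example, where the true marginal likelihood is \(1/(N+1) = 0.04\); the importance distribution \(\text{Beta}(8,18)\) is an ad-hoc choice for illustration:

```r
# naive Monte Carlo vs. importance sampling estimates of P(D) for the coin-flip data
k <- 7; N <- 24
set.seed(1)
n_samples <- 1e5

# naive Monte Carlo: average the likelihood over draws from the prior
theta_prior <- rbeta(n_samples, 1, 1)
naive_mc <- mean(dbinom(k, N, theta_prior))

# importance sampling: draw from g_IS, weight by prior * likelihood / g_IS
theta_is <- rbeta(n_samples, 8, 18)            # proposal chosen to resemble the posterior
is_est <- mean(dbeta(theta_is, 1, 1) * dbinom(k, N, theta_is) / dbeta(theta_is, 8, 18))

c(naive = naive_mc, importance = is_est)       # both close to 1/25 = 0.04
```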

generalized harmonic mean sampler

since \(g_{HM}(\theta)\) is a probability distribution, we have \(\int g_{HM}(\theta) \text{d}\theta = 1\); therefore:

\[ \begin{align*} \frac{1}{P(D)} & = \frac{P(\theta \mid D)}{P(D \mid \theta) P(\theta)}\\ & = \frac{P(\theta \mid D)}{P(D \mid \theta) P(\theta)} \int g_{HM}(\theta) \text{d}\theta \\ & = \int \frac{g_{HM}(\theta) P(\theta \mid D)}{P(D \mid \theta) P(\theta)} \text{d}\theta \\ & \approx \frac{1}{n} \sum^{n}_{\theta_i \sim P(\theta \mid D)} \frac{g_{HM}(\theta_i)}{P(D \mid \theta_i) P(\theta_i)} \end{align*} \]

choose a \(g_{HM}(\theta)\) that resembles the posterior
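
A minimal sketch for the coin-flip example, taking \(g_{HM}\) to be \(\text{Beta}(8,18)\), which resembles (here: equals) the posterior; the true value is again \(0.04\):

```r
# generalized harmonic mean estimate of P(D), using samples from the (known) posterior
k <- 7; N <- 24
set.seed(1)
theta_post <- rbeta(1e5, k + 1, N - k + 1)     # posterior samples (exact for this example)
g_hm <- function(th) dbeta(th, 8, 18)          # g_HM chosen to resemble the posterior
w <- g_hm(theta_post) / (dbinom(k, N, theta_post) * dbeta(theta_post, 1, 1))
1 / mean(w)                                    # about 1/25 = 0.04
```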

for more info see the bridge sampling tutorial

     

bridge sampling

bridge sampling

## [1] "Approximate BF in favor of complex model: 4.394"
## [1] "Approximate error percentage: 0.00718%"
## [2] "Approximate error percentage: 0.0064%"
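
Numbers like these are typically obtained with the bridgesampling R package; a hedged sketch, assuming two fitted Stan models fit_M0 and fit_M1 (hypothetical object names, not the code actually run here):

```r
# bridge sampling via the `bridgesampling` package
library(bridgesampling)
ml_M0 <- bridge_sampler(fit_M0, silent = TRUE)   # (log) marginal likelihood of M0
ml_M1 <- bridge_sampler(fit_M1, silent = TRUE)   # (log) marginal likelihood of M1
bf(ml_M1, ml_M0)                                 # Bayes factor in favor of the complex model
error_measures(ml_M1)$percentage                 # approximate percentage error of the estimate
```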

outlook & preparatory reading

 

Thursday

  • predictions & hypothesis testing

  • read Kruschke Chapter 12