Markov Chain Monte Carlo methods

Michael Franke

\[ \definecolor{firebrick}{RGB}{178,34,34} \newcommand{\red}[1]{{\color{firebrick}{#1}}} \] \[ \definecolor{mygray}{RGB}{178,34,34} \newcommand{\mygray}[1]{{\color{mygray}{#1}}} \] \[ \newcommand{\set}[1]{\{#1\}} \] \[ \newcommand{\tuple}[1]{\langle#1\rangle} \] \[\newcommand{\States}{{T}}\] \[\newcommand{\state}{{t}}\] \[\newcommand{\pow}[1]{{\mathcal{P}(#1)}}\]

key notions

Monte Carlo methods
Markov Chain Monte Carlo methods
- Metropolis Hastings
- Gibbs
convergence / representativeness
- trace plots
- $\hat{R}$
efficiency
- autocorrelation
- effective sample size

advances in computability have enabled BDA

recap

Bayes rule for data analysis:

\[\underbrace{P(\theta \, | \, D)}_{posterior} \propto \underbrace{P(\theta)}_{prior} \times \underbrace{P(D \, | \, \theta)}_{likelihood}\]

normalizing constant:

\[ \int P(\theta') \times P(D \mid \theta') \, \text{d}\theta' = P(D) \]

easy to solve only if:

$\theta$ is a single discrete variable with reasonably sized domain
$P(\theta)$ is conjugate prior for the likelihood function $P(D \mid \theta)$
we are very lucky

Monte Carlo simulation

what’s the probability of a deadlock in solitaire?

solitaire

Birth of MC methods at Los Alamos

Ulam

Stanislaw Ulam

Metropolis

Nicholas Metropolis

Problem

assume that $x \sim X$ and that $X$ is “unwieldy”
we’d like to know some property $F(X)$ of $X$ that can be expressed as an expectation (where $f(x)$ could be any useful transformation of single values $x$):

\[F(X) = \int P(X = x) \ f(x) \ \text{d}x\]

examples of $F(X)$ are the mean, median, variance, 95% HDI, or any cumulative probability, such as $\int_{0.8}^\infty P(X = x) \ \text{d}x$

dummy

solution: Monte Carlo sampling

draw samples $S = x_1, \dots x_n \sim X$ and compute:

\[F(S) = \frac{1}{N} \sum_{i = 1}^N f(x_i)\]

if the samples are “good samples” (and if $f$ is not crazy), this approximates the original expectation well:

\[F(S) \sim \mathcal{N}(\mathbb{E}_X, \frac{\text{Var}(f)}{N})\]

example 1: normal distribution

# get density values and samples
xVec = seq(-4, 4, length.out = 5000)     # vector of x-coordinates
myPlot = ggplot(tibble(x = rnorm(5000, mean = 0, sd = 1)), aes(x = x)) +  
  geom_density() + 
  geom_line(aes(x = xVec, y = dnorm(xVec, mean = 0, sd = 1)), color = "firebrick") +
  xlab("x") + ylab("(estimated) probability")
myPlot

example 2: simulating $p$-values

k_obs = 7
n_obs = 24
x_reps = 500000

lhs = map_dbl(1:x_reps, function(i) {
  k_hyp = rbinom(1, size = n_obs, prob = 0.5)
  dbinom(k_hyp, size = n_obs, prob = 0.5)
})

lh_obs = dbinom(k_obs, size = n_obs, prob = 0.5)

mean(lhs <= lh_obs) %>% show()

## [1] 0.064272

tibble(iteration = 1:x_reps, 
       p_value = cumsum(lhs <= lh_obs) / 1:x_reps) %>% 
  ggplot(aes(x = iteration, y = p_value)) + geom_line()

Markov chains

Markov chain

intuition

a sequence of elements, $x_1, \dots, x_n$ such that every $x_{i+1}$ depends only on its predecessor $x_i$ (think: probabilistic FSA)

probabilistic automaton

Markov property

\[ P(x_{n+1} \mid x_1, \dots, x_n) = P(x_{n+1} \mid x_n) \]