- Markov Chain Monte Carlo methods
- Metropolis Hastings
- Gibbs
- convergence / representativeness
- trace plots
- \(\hat{R}\)
- efficiency
- autocorrelation
- effective sample size
\[
\definecolor{firebrick}{RGB}{178,34,34}
\newcommand{\red}[1]{{\color{firebrick}{#1}}}
\definecolor{mygray}{RGB}{128,128,128}
\newcommand{\mygray}[1]{{\color{mygray}{#1}}}
\newcommand{\set}[1]{\{#1\}}
\newcommand{\tuple}[1]{\langle#1\rangle}
\newcommand{\States}{{T}}
\newcommand{\state}{{t}}
\newcommand{\pow}[1]{{\mathcal{P}(#1)}}
\]
Bayes rule for data analysis:
\[\underbrace{P(\theta \, | \, D)}_{posterior} \propto \underbrace{P(\theta)}_{prior} \times \underbrace{P(D \, | \, \theta)}_{likelihood}\]
normalizing constant:
\[ \int P(\theta') \times P(D \mid \theta') \, \text{d}\theta' = P(D) \]
easy to solve only if:
- \(\theta\) ranges over a small discrete set or is low-dimensional (the integral becomes a manageable sum), or
- the prior is conjugate to the likelihood (the posterior has a known closed form)
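For instance, with a single parameter the integral can be approximated on a grid; a minimal sketch with made-up data (k = 7 successes in N = 24 trials, flat Beta(1,1) prior; not from the course materials):

# grid approximation of the normalizing constant P(D)
theta = seq(0.0005, 0.9995, length.out = 1000)       # grid over the parameter
prior = dbeta(theta, shape1 = 1, shape2 = 1)          # flat prior
likelihood = dbinom(7, size = 24, prob = theta)       # binomial likelihood
# Riemann-sum approximation of the integral over theta
P_D = sum(prior * likelihood) * (theta[2] - theta[1])
P_D # closed-form answer for a flat prior: 1 / (N + 1) = 0.04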
library(tidyverse)

# get density values and samples
xVec = seq(-4, 4, length.out = 5000) # vector of x-coordinates
myPlot = ggplot(tibble(x = rnorm(5000, mean = 0, sd = 1)), aes(x = x)) +
  geom_density() +                                    # density estimated from samples
  geom_line(aes(x = xVec, y = dnorm(xVec, mean = 0, sd = 1)),
            color = "firebrick") +                    # true density for comparison
  xlab("x") +
  ylab("(estimated) probability")
myPlot
if \(P(\cdot \mid \vec{\alpha}) \in \Delta(X)\) is a distribution over \(X\) (possibly high-dimensional) with fixed parameters \(\vec{\alpha}\), then write \(x \sim P(\vec{\alpha})\)
examples
problem
solution: Monte Carlo simulation
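A minimal illustration (my own sketch, not from the course materials): instead of solving \(\int x^2 \, \phi(x) \, \text{d}x\) analytically, average over samples:

# Monte Carlo estimate of E[X^2] for X ~ Normal(0, 1)
samples = rnorm(100000, mean = 0, sd = 1)
mean(samples^2) # analytic answer: 1; precision grows with the number of samples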
intuition
a sequence of elements \(x_1, \dots, x_n\) such that every \(x_{i+1}\) depends only on its predecessor \(x_i\) (think: probabilistic FSA)
Markov property
\[ P(x_{n+1} \mid x_1, \dots, x_n) = P(x_{n+1} \mid x_n) \]
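As an illustration (a made-up two-state example): a probabilistic FSA whose next state depends only on the current one:

# transition matrix: row = current state, column = next state
P = matrix(c(0.9, 0.1,
             0.2, 0.8), nrow = 2, byrow = TRUE)
x = numeric(1000)
x[1] = 1 # arbitrary starting state
for (i in 1:999) {
  x[i + 1] = sample(1:2, size = 1, prob = P[x[i], ]) # depends only on x[i]
}
table(x) / 1000 # long-run proportions approach the stationary distribution (2/3, 1/3)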
get a sequence of samples \(x_1, \dots, x_n\) s.t. the samples are (approximately) distributed according to the target distribution (here: the posterior \(P(\theta \mid D)\)), without ever computing the normalizing constant
consequences of Markov property
\[ P(x_1) = \int P(\tuple{x_1, x_2, x_3}) \ \text{d} \tuple{x_2, x_3}\]
clever software helps determine when/how to do Gibbs and how to tune MH's proposal function (next class)
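For concreteness, here is a minimal Metropolis-Hastings sketch (my own, not the course's implementation; the MH function used below may well differ). It assumes f returns the unnormalized log-posterior for a vector c(mu, sigma) (and -Inf for impossible values, e.g. sigma <= 0); the proposal standard deviation is exactly the tuning knob just mentioned:

require('coda')

MH = function(f, iterations = 10000, chains = 2, burnIn = 1000) {
  chainList = lapply(1:chains, function(chain) {
    current = c(mu = rnorm(1), sigma = runif(1, 0.5, 2)) # random start
    samples = matrix(NA, nrow = iterations, ncol = 2,
                     dimnames = list(NULL, c("mu", "sigma")))
    for (i in 1:iterations) {
      proposal = current + rnorm(2, mean = 0, sd = 0.1)  # proposal sd = tuning knob
      # accept with probability min(1, P(proposal) / P(current));
      # f is a log-density, so compare on the log scale
      if (log(runif(1)) < f(proposal) - f(current)) {
        current = proposal
      }
      samples[i, ] = current
    }
    mcmc(samples[(burnIn + 1):iterations, ]) # discard burn-in
  })
  do.call(mcmc.list, chainList) # bundle chains as a coda MCMC-list
}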
convergence/representativeness
efficiency
require('coda')
require('ggmcmc')
coda and ggmcmc basically do the same thing, but they differ in, say, aesthetics: ggmcmc lives in the tidyverse
out = MH(f, iterations = 60000, chains = 2, burnIn = 10000) # self-made MH
head(out[[1]]) # returns MCMC-list from package 'coda'
## Markov Chain Monte Carlo (MCMC) output:
## Start = 1
## End = 7
## Thinning interval = 1
##          variable
## iteration       mu     sigma
##         1 0.149087 0.9350134
##         2 0.149087 0.9350134
##         3 0.149087 0.9350134
##         4 0.149087 0.9350134
##         5 0.149087 0.9350134
##         6 0.149087 0.9350134
##         7 0.149087 0.9350134
out2 = ggs(out) # cast the same information into a format that 'ggmcmc' uses
out2 # this is now a tibble
## # A tibble: 200,000 × 4
##    Iteration Chain Parameter    value
##        <int> <int>    <fctr>    <dbl>
##  1         1     1        mu 0.149087
##  2         2     1        mu 0.149087
##  3         3     1        mu 0.149087
##  4         4     1        mu 0.149087
##  5         5     1        mu 0.149087
##  6         6     1        mu 0.149087
##  7         7     1        mu 0.149087
##  8         8     1        mu 0.149087
##  9         9     1        mu 0.149087
## 10        10     1        mu 0.149087
## # ... with 199,990 more rows
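Because out2 is an ordinary tibble, standard tidyverse verbs apply directly; for example (a sketch, assuming dplyr is loaded):

out2 %>%
  group_by(Parameter) %>%
  summarise(mean  = mean(value),
            lower = quantile(value, 0.025), # bounds of a central
            upper = quantile(value, 0.975)) # 95% credible interval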
plot(out)
ggs_traceplot(out2)
trace plots from multiple chains should look like:
a bunch of hairy caterpillars madly in love with each other
burn-in
remove initial chunk of sample sequence to discard effects of (random) starting position
thinning
keep only every \(i\)-th sample to reduce autocorrelation
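Both operations can also be done after the fact with coda's window() (a sketch; the numbers are arbitrary):

# keep iterations from 5001 onward, and of those only every 10th
outSlim = window(out, start = 5001, thin = 10)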
\(\hat{R}\) statistic (potential scale reduction factor): compares between-chain to within-chain variance; values close to 1 (rule of thumb: \(\hat{R} < 1.1\)) indicate convergence
gelman.diag(out)
## Potential scale reduction factors:
## 
##       Point est. Upper C.I.
## mu             1       1.01
## sigma          1       1.00
## 
## Multivariate psrf
## 
## 1
ggs_Rhat(out2)
autocorr.plot(out)
ggs_autocorrelation(out2)
effective sample size: the number of independent samples that would carry the same information as the \(N\) autocorrelated ones
\[\text{ESS} = \frac{N}{ 1 + 2 \sum_{k=1}^{\infty} ACF(k)}\]
effectiveSize(out)
##       mu    sigma 
## 2395.912 2138.160
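For intuition, a hand-rolled version of the formula above (my own sketch: it truncates the ACF sum at the first negative autocorrelation, whereas coda's effectiveSize uses a spectral estimate, so the numbers will differ somewhat):

ESS_by_hand = function(x, max.lag = 200) {
  rho = acf(x, lag.max = max.lag, plot = FALSE)$acf[-1] # ACF(1), ACF(2), ...
  first.neg = which(rho < 0)[1]
  if (!is.na(first.neg)) rho = rho[seq_len(first.neg - 1)] # truncate the sum
  length(x) / (1 + 2 * sum(rho))
}
ESS_by_hand(as.matrix(out[[1]])[, "mu"]) # compare against effectiveSize(out)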
[reading schedule: obligatory and optional readings for Tuesday and Friday]