\[ \definecolor{firebrick}{RGB}{178,34,34} \newcommand{\red}[1]{{\color{firebrick}{#1}}} \] \[ \definecolor{mygray}{RGB}{178,178,178} \newcommand{\mygray}[1]{{\color{mygray}{#1}}} \] \[ \newcommand{\set}[1]{\{#1\}} \] \[ \newcommand{\tuple}[1]{\langle#1\rangle} \] \[\newcommand{\States}{{T}}\] \[\newcommand{\state}{{t}}\] \[\newcommand{\pow}[1]{{\mathcal{P}(#1)}}\]
Monte Carlo methods
Bayes rule for data analysis:
\[\underbrace{P(\theta \, | \, D)}_{posterior} \propto \underbrace{P(\theta)}_{prior} \times \underbrace{P(D \, | \, \theta)}_{likelihood}\]
normalizing constant:
\[ \int P(\theta') \times P(D \mid \theta') \, \text{d}\theta' = P(D) \]
easy to solve only in special cases, e.g., with a conjugate prior or a low-dimensional \(\theta\)
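One tractable case, sketched in R purely for illustration (the counts \(k\) and \(n\) are made up): a binomial likelihood with a conjugate Beta(1, 1) prior, where \(P(D)\) can also be obtained by numerical integration and checked against the analytic conjugate posterior.

```r
# a case where the normalizing constant is tractable:
# binomial likelihood with a conjugate Beta(1, 1) prior
k <- 7; n <- 24                     # observed: k successes in n trials (made up)
unnorm_post <- function(theta) dbeta(theta, 1, 1) * dbinom(k, n, theta)
p_D <- integrate(unnorm_post, lower = 0, upper = 1)$value   # normalizing constant P(D)
# conjugacy says the posterior is Beta(1 + k, 1 + n - k); check at one point
unnorm_post(0.3) / p_D              # numerically normalized posterior density
dbeta(0.3, 1 + k, 1 + n - k)        # analytic posterior density: same value
```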
Stanislaw Ulam
Nicholas Metropolis
\[F(X) = \int P(X = x) \ f(x) \ \text{d}x\]
\[F(S) = \frac{1}{N} \sum_{i = 1}^N f(x_i)\]
\[F(S) \sim \mathcal{N}\left(\mathbb{E}_X[f], \frac{\text{Var}(f)}{N}\right)\]
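A quick sketch of this estimator in R, with an illustrative choice \(f(x) = x^2\) and \(X \sim \mathcal{N}(0,1)\), so the true value \(F(X) = \text{Var}(X) = 1\):

```r
# Monte Carlo estimate of F(X) = E[f(X)] for f(x) = x^2 with X ~ N(0, 1);
# the true value is Var(X) = 1
set.seed(1)                         # for reproducibility
f <- function(x) x^2
mc_estimate <- function(N) mean(f(rnorm(N)))
mc_estimate(100)                    # rough estimate
mc_estimate(100000)                 # close to 1: error shrinks like sqrt(Var(f) / N)
```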
# get density values and samples
library(tidyverse)  # for ggplot2 and tibble
xVec = seq(-4, 4, length.out = 5000) # vector of x-coordinates
myPlot = ggplot(tibble(x = rnorm(5000, mean = 0, sd = 1)), aes(x = x)) +
  geom_density() +  # kernel density estimate from the samples
  geom_line(
    data = tibble(x = xVec, y = dnorm(xVec, mean = 0, sd = 1)),
    aes(x = x, y = y),
    color = "firebrick"
  ) +  # true density for comparison
  xlab("x") + ylab("(estimated) probability")
myPlot
intuition
a sequence of elements \(x_1, \dots, x_n\) such that every \(x_{i+1}\) depends only on its predecessor \(x_i\) (think: probabilistic FSA)
Markov property
\[ P(x_{n+1} \mid x_1, \dots, x_n) = P(x_{n+1} \mid x_n) \]
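Such a chain can be simulated directly; a sketch with a made-up \(2 \times 2\) transition matrix, whose stationary distribution works out to \((0.8, 0.2)\):

```r
# a two-state Markov chain (probabilistic FSA); transition matrix is made up
P_trans <- matrix(c(0.9, 0.1,
                    0.4, 0.6),
                  nrow = 2, byrow = TRUE)
set.seed(1)
n <- 100000
x <- integer(n)
x[1] <- 1
for (i in 2:n) {
  x[i] <- sample(1:2, size = 1, prob = P_trans[x[i - 1], ])  # x_i depends only on x_{i-1}
}
table(x) / n   # empirical distribution; the stationary distribution is (0.8, 0.2)
```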
get a sequence of samples \(x_1, \dots, x_n\) s.t. their distribution approximates the target distribution \(P\) (i.e., \(P\) is the chain's stationary distribution)
consequences of Markov property
let \(f(x) = \alpha P(x)\) (e.g., unnormalized posterior)
motto: always accept proposals that go up; accept proposals that go down with probability \(\frac{f(x^{i+1})}{f(x^{i})}\)
the ratio \(\frac{f(x^{i+1})}{f(x^{i})}\) means that normalizing constants cancel: \(f\) only needs to be proportional to \(P\)
\(P_\text{trans}(x^i \rightarrow x^{i+1})\) defines transition matrix -> Markov chain analysis!
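A minimal sketch of this recipe (function name, proposal, and tuning values are illustrative, not the implementation used below): a symmetric Gaussian proposal targeting an unnormalized standard normal density.

```r
# minimal Metropolis-Hastings sketch: symmetric Gaussian proposal,
# f an unnormalized target density (names and defaults are illustrative)
MH_sketch <- function(f, n_samples = 20000, x0 = 0, proposal_sd = 1) {
  samples <- numeric(n_samples)
  x <- x0
  for (i in seq_len(n_samples)) {
    x_new <- rnorm(1, mean = x, sd = proposal_sd)   # propose a move
    if (runif(1) < f(x_new) / f(x)) x <- x_new      # always up; down w.p. f(x_new)/f(x)
    samples[i] <- x
  }
  samples
}
set.seed(1)
# target: standard normal with the constant 1/sqrt(2*pi) deliberately omitted
out_mh <- MH_sketch(function(x) exp(-x^2 / 2))
mean(out_mh)   # approximately 0
sd(out_mh)     # approximately 1
```

Note that the acceptance step only ever evaluates the ratio `f(x_new) / f(x)`, which is why the omitted normalizing constant does no harm.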
MH is a very general and versatile algorithm, but it can be inefficient, e.g., in high-dimensional parameter spaces where many proposals are rejected
in such cases, the Gibbs sampler may be helpful
basic idea: split the multidimensional problem into a series of simpler problems of lower dimensionality
assume \(X=(X_1,X_2,X_3)\) and \(p(x)=P(x_1,x_2,x_3)\)
start with \(x^0=(x_1^0,x_2^0,x_3^0)\)
\[ P(x_1) = \int P(\tuple{x_1, x_2, x_3}) \ \text{d} x_2 \, \text{d} x_3\]
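The scheme can be sketched for a two-dimensional case where the conditionals are known in closed form: a bivariate standard normal with correlation \(\rho\), for which \(X_1 \mid X_2 = x_2 \sim \mathcal{N}(\rho x_2, 1 - \rho^2)\) (and symmetrically). All names and values here are illustrative.

```r
# Gibbs sampler sketch for a bivariate standard normal with correlation rho;
# each step samples one coordinate from its conditional given the other
gibbs_sketch <- function(n_samples = 20000, rho = 0.8) {
  x1 <- numeric(n_samples)
  x2 <- numeric(n_samples)
  cur1 <- 0; cur2 <- 0                                  # starting point x^0
  sd_cond <- sqrt(1 - rho^2)
  for (i in seq_len(n_samples)) {
    cur1 <- rnorm(1, mean = rho * cur2, sd = sd_cond)   # sample x1 | x2
    cur2 <- rnorm(1, mean = rho * cur1, sd = sd_cond)   # sample x2 | x1
    x1[i] <- cur1
    x2[i] <- cur2
  }
  data.frame(x1, x2)
}
set.seed(1)
g <- gibbs_sketch()
cor(g$x1, g$x2)   # approximately 0.8
```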
Gibbs sampling can be more efficient than MH when the conditional distributions are easy to sample from
convergence/representativeness
efficiency
compare how similar several independent sample chains are
look at the temporal development of single chains
the R packages coda and ggmcmc basically do the same thing, but they differ in, say, aesthetics
ggmcmc lives in the tidyverse
# self-made MH
# returns MCMC-list from package 'coda'
out = MH(f, iterations = 60000,
         chains = 2, burnIn = 10000)
head(out[[1]])
## Markov Chain Monte Carlo (MCMC) output:
## Start = 1
## End = 7
## Thinning interval = 1
## variable
## iteration mu sigma
## 1 -0.005970447 1.051339
## 2 -0.005970447 1.051339
## 3 -0.005970447 1.051339
## 4 -0.005970447 1.051339
## 5 -0.005970447 1.051339
## 6 -0.005970447 1.051339
## 7 -0.005970447 1.051339
# cast the same information into 'ggmcmc' format
out2 = ggmcmc::ggs(out)
# same samples but now in a tibble
out2
## # A tibble: 200,000 x 4
## Iteration Chain Parameter value
## <int> <int> <fct> <dbl>
## 1 1 1 mu -0.00597
## 2 2 1 mu -0.00597
## 3 3 1 mu -0.00597
## 4 4 1 mu -0.00597
## 5 5 1 mu -0.00597
## 6 6 1 mu -0.00597
## 7 7 1 mu -0.00597
## 8 8 1 mu -0.00597
## 9 9 1 mu -0.00597
## 10 10 1 mu -0.00597
## # ... with 199,990 more rows
trace plots from multiple chains should look like:
a bunch of hairy caterpillars madly in love with each other
burn-in (warm-up)
remove initial chunk of sample sequence to discard effects of (random) starting position
thinning
only consider every \(i\)-th sample to reduce autocorrelation
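Both operations are simple index manipulations; a sketch on a stand-in chain (i.i.d. draws, purely for illustration):

```r
# burn-in removal and thinning by hand on a stand-in "chain"
set.seed(1)
chain <- rnorm(5000)
no_burnin <- chain[-(1:1000)]                            # drop first 1000 samples as burn-in
thinned <- no_burnin[seq(1, length(no_burnin), by = 5)]  # keep only every 5th sample
length(thinned)
```

The coda package offers the same via `window()` with its `start` and `thin` arguments.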
\(\hat{R}\) statistic (potential scale reduction factor): compares variance between chains to variance within chains; values close to 1 indicate convergence
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## mu 10.73 66.0
## sigma 3.16 17.8
##
## Multivariate psrf
##
## 11.5
\[\text{ESS} = \frac{N}{ 1 + 2 \sum_{k=1}^{\infty} ACF(k)}\]
## mu sigma
## 1361.64 1184.35
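The ESS formula above can be computed by hand on a toy autocorrelated chain (an AR(1) process standing in for MCMC output; truncating the sum at the first nonpositive autocorrelation is a common heuristic, not the only option):

```r
# effective sample size following ESS = N / (1 + 2 * sum_k ACF(k)),
# with the sum truncated at the first nonpositive empirical autocorrelation
ess_sketch <- function(x, max_lag = 100) {
  rho <- acf(x, lag.max = max_lag, plot = FALSE)$acf[-1]  # ACF(1), ACF(2), ...
  first_bad <- which(rho <= 0)[1]                         # truncation point
  if (!is.na(first_bad)) rho <- rho[seq_len(first_bad - 1)]
  length(x) / (1 + 2 * sum(rho))
}
set.seed(1)
# AR(1) chain with phi = 0.5; theoretical ESS is N * (1 - phi) / (1 + phi) = N / 3
N <- 30000; phi <- 0.5
x <- numeric(N); x[1] <- rnorm(1)
for (i in 2:N) x[i] <- phi * x[i - 1] + rnorm(1)
ess_sketch(x)   # roughly N / 3 = 10000
```

For real output, coda's `effectiveSize()` (used for the numbers above) does this job.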