Bayesian data analysis & cognitive modeling

\[ \definecolor{firebrick}{RGB}{178,34,34} \newcommand{\red}[1]{{\color{firebrick}{#1}}} \] \[ \definecolor{mygray}{RGB}{178,34,34} \newcommand{\mygray}[1]{{\color{mygray}{#1}}} \] \[ \newcommand{\set}[1]{\{#1\}} \] \[ \newcommand{\tuple}[1]{\langle#1\rangle} \] \[\newcommand{\States}{{T}}\] \[\newcommand{\state}{{t}}\] \[\newcommand{\pow}[1]{{\mathcal{P}(#1)}}\]

roadmap

JAGS
- background
- model specification syntax
- workflow
- tips & tricks

recap

Bayes rule for data analysis:

\[\underbrace{P(\theta \, | \, D)}_{posterior} \propto \underbrace{P(\theta)}_{prior} \times \underbrace{P(D \, | \, \theta)}_{likelihood}\]

normalizing constant:

\[ \int P(\theta') \times P(D \mid \theta') \, \text{d}\theta' = P(D) \]

easy to solve only if:

\(\theta\) is a single discrete variable with reasonably sized domain
\(P(\theta)\) is conjugate prior for the likelihood function \(P(D \mid \theta)\)
we are very lucky

recap

Markov Chain Monte Carlo

get sequence of samples \(x_1, \dots, x_n\) s.t.

sequence has the Markov property (\(x_{i+1}\) depends only on \(x_i\)), and
the stationary distribution of the chain is \(P\).

MCMC algorithms

Metropolis Hastings
- versatile but often inefficient
- depends on proposal distribution
Gibbs sampling
- fast and efficient, but not universally applicable
- depends on availability of conditional posterior

recap

assessing quality of sample chains

convergence / representativeness
- trace plots
- \(\hat{R}\)
efficiency
- autocorrelation
- effective sample size

JAGS

history

BUGS project (1989 - present ???)

Bayesian inference Using Gibbs Sampling
developed by UK-based biostatisticians

WinBUGS (1997 - 2007)

Windows based; with GUI
component pascal

OpenBUGS (2005 - present)

Windows & Linux; MacOS through Wine
component pascal

JAGS (2007 - present)

Windows, Linux, MacOS
no GUI
written in C++

JAGS

Just Another Gibbs Sampler

declarative language
- model as a directed acyclic graph
- nodes are variables, edges are deterministic or probabilistic dependencies
automatically selects adequate sampler
call from & process output in R via packages:
- R2Jags, rjags …

coded by Martyn Plummer

example

require('rjags') # also loads `coda` package
modelString = "
model{
  theta ~ dbeta(1,1)
  k ~ dbinom(theta, N)
}"
# prepare for JAGS
dataList = list(k = 7, N = 24)
# set up and run model
jagsModel = jags.model(file = textConnection(modelString), 
                       data = dataList,
                       n.chains = 2)
update(jagsModel, n.iter = 5000)
codaSamples = coda.samples(jagsModel, 
                           variable.names = c("theta"),
                           n.iter = 5000)

example

ms = ggmcmc::ggs(codaSamples)
ms %>% group_by(Parameter) %>% 
   summarise(mean = mean(value),
             HDIlow  = coda::HPDinterval(as.mcmc(value))[1], 
             HDIhigh = coda::HPDinterval(as.mcmc(value))[2])

## # A tibble: 1 × 4
##   Parameter      mean    HDIlow   HDIhigh
##      <fctr>     <dbl>     <dbl>     <dbl>
## 1     theta 0.3087857 0.1436357 0.4823556

example

  tracePlot = ggmcmc::ggs_traceplot(ms)  
  densPlot = ggmcmc::ggs_density(ms) + 
    stat_function(fun = function(x) dbeta(x, 8,18), color = "black")

  gridExtra::grid.arrange(tracePlot, densPlot, ncol = 2)

Using JAGS

about JAGS

current version 4
- faster, allows for use of =
- added samplers and distributions (not yet all documented!?)
syntax is a mix of BUGS and R
- careful with parameterization of probability distributions !!!
declarative model specification
- directed acyclic graphs / Bayes nets
  - nodes are variables, edges are dependencies
- order invariant
- no variable reassignment, no control flow statements etc.
  - exception: ifelse, step … commands for boolean switching
  - for loops for vector construction
\(\exists\) command-line interface (not used in this course)

structure of a JAGS model description

# declare size of variables if needed
var ... ; 
# do some data massaging (usually done in R)
data{
  ...
}
# specify model
model{
  ...
}

you could write this into a separtate file called "myModel.jags.R"
- this is untidy file naming but gives you default R syntax highlighting

(nonsense) example

var myData[5,100], myMu[5], myTau[5]; 
data{
  N = sum(counts) # counts is input to JAGS from R
  indVector = c(3,2,4) # works like in R
  specialConditions = counts[indVector] # like R; new in JAGS 4
}
model{
  for (i in 1:dim(myData)[1]){
    for (j in 1:dim(myData)[2]){
      myData[i,j] ~ dnorm(myMu[i], myTau[i])
    }
  }
  for (i in 1:5){
    tmp[i] ~ dbeta(1,1)
    myMu[i] = 100 + tmp[i] - 0.5
    myTau[i] ~ dunif(0, 1000)
  }
}

caveats

semicolon after declaration of variable dimensions
declare dimensions to avoid confusing the JAGS compiler
JAGS error messages are occassionally underinformative
all model specification lines are probabilistic ("~") or deterministic ("=" or "<-")
- "=" only works for JAGS 4+
all probabilistic dependencies must be samples from a distribution known to JAGS!
- this is therefore all ruled out:

  x ~ dbeta(1,1) - 0.5

  myCostumDistribution = ... # something fancy
  x ~ myCustomDistribution(0,1)

another example

Your turn: what's this model good for?

obs is a vector of 100 observations (real numbers)

model{
  mu ~ dunif(-2,2)
  var ~ dunif(0, 100)
  tau = 1/var
  for (i in 1:100){
    obs[i] ~ dnorm(mu,tau)
  }
}

NB: JAGS uses precision \(\tau = 1/\sigma^2\), not standard deviation \(\sigma\) in dnorm

model specifications, formally

model{
  mu ~ dunif(-2,2)
  var ~ dunif(0, 100)
  tau = 1/var
  for (i in 1:100){
    obs[i] ~ dnorm(mu,tau)
  }
}

\[\mu \sim \text{Unif}(-2,2)\] \[\sigma^2 \sim \text{Unif}(0,100)\] \[obs_i \sim \text{Norm}(\mu, \sigma)\]

running a JAGS script

homebrew MH samples

set.seed(1789)

fakeData = rnorm(200, mean = 0, sd = 1)

f = function(mu, sigma){
  if (sigma <=0){
    return(0)
  }
  priorMu = dunif(mu, min = -4, max = 4)
  priorSigma = dunif(sigma, min = 0, max = 4)
  likelihood =  prod(dnorm(fakeData, mean = mu, sd = sigma))
  return(priorMu * priorSigma * likelihood)
}

samplesMH = MH(f, 
               iterations = 60000,
               chains = 2,
               burnIn = 10000) # outputs mcmc.list from `coda` package

back to the future

modelString = "
model{
  mu ~ dunif(-4,4)
  sigma ~ dunif(0,4)
  tau = 1/sigma^2
  for (i in 1:length(obs)){
    obs[i] ~ dnorm(mu,tau)
  }
}"
jagsModel = jags.model(file = textConnection(modelString), 
                       data = list(obs = fakeData),
                       n.chains = 2)
update(jagsModel, n.iter = 5000)
samplesJAGS = coda.samples(jagsModel, 
                           variable.names = c("mu", "sigma"),
                           n.iter = 5000)

compare sample outputs

grid.arrange(ggs_density(ggs(samplesMH)) ,       ggs_density(ggs(samplesJAGS)) )

compare sample outputs

grid.arrange(ggs_traceplot(ggs(samplesMH)) + theme(plot.background=element_blank()),
      ggs_traceplot(ggs(samplesJAGS)) + theme(plot.background=element_blank()))

compare sample outputs

# function from Kruschke, defined in 'DBDA2E-utilities.R' 
DbdaAcfPlot(samplesMH) + theme(plot.background=element_blank())

## NULL

compare sample outputs

# function from Kruschke, defined in 'DBDA2E-utilities.R'
DbdaAcfPlot(samplesJAGS) + theme(plot.background=element_blank())

## NULL

tips and tricks

debugging

develop step by step & monitor each new intermediate variable

modelString = "
model{
  mu ~ dnorm(0,1)
}"
jagsModel = jags.model(file = textConnection(modelString), 
                       data = list(obs = fakeData),
                       n.chains = 2, n.adapt = 10)
update(jagsModel, n.iter = 10)
codaSamples = coda.samples(jagsModel, variable.names = c("mu"), n.iter = 5000)
ggs_density(ggs(codaSamples))

prior and posterior predictive

model{
  thetaPost ~ beta(1,1)
  thetaPrior ~ beta(1,1)
  # generate data from prior distribution
  priorPredictive ~ dbin(thetaPrior,n) 
  # here `thetaPost` is conditioned on observed data!!
  kObs ~ dbin(thetaPost, n) 
  # generate data from posterior
  posteriorPredictive ~ dbin(thetaPost, n) 
}

prior predictive

\[ P(D) = \int P(\theta) \ P(D \mid \theta) \ \text{d}\theta \]

posterior predictive

\[ P(D \mid D') = \int P(\theta \mid D') \ P(D \mid \theta) \ \text{d}\theta \]

conditioning

boolean operations x <= y, x != y, x || y as usual (see manual)
functions ifelse, equals and step for boolean conditioning

model{
  flag ~ dbern(0.5)
  parameter1 = ifelse(flag, 0, -100)
  parameter2 = ifelse(flag, 1, 100)
}

a problem

What is this model trying to achieve? And, why does it not work?

model{
  flag ~ dbern(0.5)
  SOMEPDF = ifelse(flag, dnorm, dunif)
  parameter1 = ifelse(flag, 0, -100)
  parameter2 = ifelse(flag, 1, 100)
  for(i in 1:length(obs)){
    obs[i] ~ SOMEPDF(parameter1, parameter2)
  }
}

solution

data{
  for (i in 1:length(obs)){
    ones[i] = 1 # create a vector of ones
  }
}
model{
  flag ~ dbern(0.5)
  parameter1 = ifelse(flag, 0, -100)
  parameter2 = ifelse(flag, 1, 100)
  for(i in 1:length(obs)){
    theta[i] = ifelse(flag, 
                      dnorm(obs[i],parameter1, parameter2), 
                      dunif(obs[i],parameter1, parameter2))
    ones[i] ~ dbern(theta[i])
  }
}

fini

outlook

Friday

1st practice session

Tuesday

more complex (interesting) models

next class ::: we need you prepared!!

totally obligatory

have installed on your laptop, newest versions of:
- R and JAGS (optional and recommended: RStudio)
- packages: R2jags, rjags and runjags
obtain access to chapters 3 & 4 of Lee & Wagenmakers (university library!)
- download the example code for JAGS
- make sure that you can execute the file Code/ParameterEstimation/Binomial/Rate_1_jags.R
  - hint: you may need to plan a rendevous with line 10 of this file