
recap

Bayes rule for data analysis:

\[\underbrace{P(\theta \, | \, D)}_{posterior} \propto \underbrace{P(\theta)}_{prior} \times \underbrace{P(D \, | \, \theta)}_{likelihood}\]
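As a quick reminder of what this proportionality means in practice, here is a minimal grid-approximation sketch in R (not from the original slides; it uses the k = 7 successes in N = 24 tosses of the running example below):

# grid approximation of the posterior over a coin's bias theta,
# with a flat Beta(1,1) prior and k = 7 successes in N = 24 tosses
theta = seq(0, 1, length.out = 1001)
prior = dbeta(theta, 1, 1)                        # prior P(theta)
likelihood = dbinom(7, size = 24, prob = theta)   # likelihood P(D | theta)
posterior = prior * likelihood                    # unnormalized posterior
posterior = posterior / sum(posterior)            # normalize over the grid
# this closely matches the analytic posterior Beta(8,18) used later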

Markov Chain Monte Carlo

  • sequence of representative samples from \(X \sim P\)

JAGS

  • declarative language to specify data-generating model
    • model = prior + likelihood function
    • samples of whatever variable we want:
      • conditioning on data \(\rightarrow\) additional constraints on log-score

plan for today

  • fun with JAGS models
    • hierarchical models / latent mixture models
    • think: "data-generating process"
    • ideas & notation
  • priors in hierarchical models
  • shrinkage
  • examples, examples, examples

first example

example (recap)

require('rjags')
modelString = "
model{
  theta ~ dbeta(1,1)
  k ~ dbinom(theta, N)
}"
# prepare for JAGS
dataList = list(k = 7, N = 24)
# set up and run model
jagsModel = jags.model(file = textConnection(modelString), 
                       data = dataList,
                       n.chains = 2)
update(jagsModel, n.iter = 5000)
codaSamples = coda.samples(jagsModel, 
                           variable.names = c("theta"),
                           n.iter = 5000)

example (recap)

require('ggmcmc')   # provides ggs() and ggs_density()
ms = ggs(codaSamples)
ggs_density(ms) + 
  # analytic posterior for k = 7, N = 24 under a Beta(1,1) prior: Beta(8,18)
  stat_function(fun = function(x) dbeta(x, 8, 18), color = "black") 

simple scenario

  • rational Bayesian agent
  • originally believes \(\theta \sim Beta(1,1)\) (any bias equally likely)
  • observes \(k = 7\) successes in \(N = 24\) tosses

 

model{
  theta ~ dbeta(1,1)
  k ~ dbinom(theta, N)
}

slightly more complex scenario

  • rational Bayesian agent
  • originally believes \(\theta \sim Beta(16,16)\)
    • e.g., has observed 15 successes and 15 failures
    • has updated these beliefs via Bayes rule
  • observes another \(k = 7\) successes in \(N = 24\) tosses
model{
  theta ~ dbeta(16,16)
  k ~ dbinom(theta, N)
}

complex scenario

  • boundedly-rational, Bayesian agent with imprecise memory
  • has observed 15 successes and 15 failures but doesn't clearly remember
  • observes another \(k = 7\) successes in \(N = 24\) tosses

 

model{
  a ~ dpois(15)   # imprecise memory of the 15 observed successes
  b ~ dpois(15)   # imprecise memory of the 15 observed failures
  theta ~ dbeta(a+1,b+1)
  k ~ dbinom(theta, N)
}

hierarchical models

graph notation

model{
  theta ~ dbeta(1,1)
  k ~ dbinom(theta, N)
}

 

 

(figure: graphical model for the model above)

 

 

model{
  a ~ dpois(15)
  b ~ dpois(15)
  theta ~ dbeta(a+1,b+1)
  k ~ dbinom(theta, N)
}

(figure: graphical model for the model above)

hierarchical models

  • latent variables can depend on other latent variables
  • likelihood can be a (direct) function of only some latent variables


\[ \begin{align*} P(\theta, a, b \, | \, D) & \propto P(\theta, a, b) \ P(D \, | \, \theta, a, b)\\ & = P(a) \ P(b) \ P(\theta \mid a, b) \ P(D \, | \, \theta) \end{align*} \]


(figure: graphical model)

deterministic dependencies

(figure: graphical model with a deterministic dependency)

\[ \begin{align*} P(\theta, s \, | \, D) & \propto P(\theta, s) \ P(D \, | \, \theta, s)\\ & = P(s) \ \underbrace{P(\theta \mid s)}_{\text{degenerate!}} \ P(D \, | \, \theta) \end{align*} \]

model{
  s ~ dbern(0.5)
  theta = ifelse(s == 1, 0.75, 0.25)
  k ~ dbinom(theta, N)
}

graph conventions

  • type of variable:
    • continuous: circle
    • discrete: square
  • information status:
    • observed: grey shade
    • latent: white
  • dependency:
    • stochastic: single line
    • deterministic: double line

Lee & Wagenmakers (2013)

the sky is the limit!

from a previous lecture

(figure: graphical model from a previous lecture)

from Lee & Wagenmakers

(figure: graphical model from Lee & Wagenmakers)

priors in hierarchical models

priors in hierarchical models

  • priors are complex objects
  • should make sense at every level in the hierarchy:
    • check dependent priors of intermediate latent variables (see the sketch after the example model below)!
  • can have interesting effects
    • e.g., shrinkage (later in this class)

example

model{
  b ~ dgamma(1,1)
  a ~ dgamma(1,1)
  theta ~ dbeta(a,b)
  k ~ dbinom(theta, N)
}
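To "check dependent priors of intermediate latent variables" for this model, we can forward-simulate from the hyper-priors and inspect the implied prior on theta. A minimal sketch (helper code not in the original slides):

# forward-simulate the prior of the model above:
# a, b ~ Gamma(1,1); theta ~ Beta(a, b)
nSamples = 10000
a = rgamma(nSamples, shape = 1, rate = 1)
b = rgamma(nSamples, shape = 1, rate = 1)
theta = rbeta(nSamples, a, b)
# inspect the implied (marginal) prior on theta;
# if it does not reflect our actual beliefs about theta,
# the hyper-priors should be reconsidered
hist(theta, breaks = 50)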


(figure: graphical model for the model above)

example: (hyper-)priors

  ggplot(data.frame(x = c(0.0001,10)), aes(x)) +
         stat_function(fun = function(x) dgamma(x,1,1))

example: posteriors

example: priors

reparameterization

  • \(\text{Beta}(a,b)\) has mode \(\omega\) and concentration \(\kappa\): \[ \begin{align*} \omega & = \frac{a-1}{a+b-2} & \kappa &= a + b \end{align*} \]
  • reformulate: \[ \begin{align*} a & = \omega(\kappa - 2) + 1 & b & = (1-\omega)(\kappa-2)+1 \end{align*} \]
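These conversions are easy to spell out as small R helpers (hypothetical function names, for illustration only):

# mode/concentration parameterization of a Beta distribution (valid for a, b > 1)
beta_ab_to_omega_kappa = function(a, b) {
  list(omega = (a - 1) / (a + b - 2), kappa = a + b)
}
beta_omega_kappa_to_ab = function(omega, kappa) {
  list(a = omega * (kappa - 2) + 1, b = (1 - omega) * (kappa - 2) + 1)
}
# sanity check: Beta(8,18) has mode (8-1)/(8+18-2) = 7/24 and concentration 26
beta_ab_to_omega_kappa(8, 18)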

reparameterized model

model{
  omega ~ dbeta(1,1)
  kappaMin2 ~ dgamma(0.01,0.01)
  theta ~ dbeta(omega*kappaMin2+1, 
                (1-omega)*kappaMin2+1)
  k ~ dbinom(theta, N)
  kappa = kappaMin2+2
}


(figure: graphical model for the reparameterized model)

reparameterized model: priors

reparameterized model: posteriors

latent mixture models

latent mixtures

  • hierarchical models with latent parameters:
    • individual-level differences
    • item-level differences

think: mixed effects regression

therapeutic touches

(figure: therapeutic touch experiment)


  • TT practitioners claim to sense energy fields
  • they claim to sense which of their hands is near another person's body

(Kruschke, chapter 9.2.4)

TT data

library(dplyr)   # for %>%, group_by, summarize
TTdata = readr::read_csv('../data/02_therapeutic_touchers.csv') %>% 
         group_by(s) %>% 
         summarize(k = sum(y), N = length(y))
# k - number of successes per subject
# N - number of trials per subject
head(TTdata)
## # A tibble: 6 × 3
##       s     k     N
##   <chr> <int> <int>
## 1   S01     1    10
## 2   S02     2    10
## 3   S03     3    10
## 4   S04     3    10
## 5   S05     3    10
## 6   S06     3    10

TTdata

ggplot(TTdata, aes(x  = s, y = k/N)) + geom_bar(stat='identity')

latent mixture model

model{
  omega ~ dbeta(1,1)              # group-level mode
  kappaMin2 ~ dgamma(0.01,0.01)   # group-level concentration - 2
  kappa = kappaMin2 + 2
  for (i in 1:28){                # one theta per subject (28 subjects)
    theta[i] ~ dbeta(omega*kappaMin2+1, 
                     (1-omega)*kappaMin2+1)
    k[i] ~ dbinom(theta[i], N[i])
  }
}
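The R code for running this model is not shown on the slides; here is a minimal sketch, assuming the model block above is stored in modelString and TTdata from the previous chunk is available:

# pass subject-level counts to JAGS and sample group- and subject-level parameters
dataList = list(k = TTdata$k, N = TTdata$N)
jagsModel = jags.model(file = textConnection(modelString),
                       data = dataList,
                       n.chains = 2)
update(jagsModel, n.iter = 5000)           # burn-in
codaSamples = coda.samples(jagsModel,
                           variable.names = c("omega", "kappa", "theta"),
                           n.iter = 5000)
ms = ggs(codaSamples)                       # used by the plots below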

(figure: graphical model for the model above)

posterior traceplot

posterior density

ggs_density(ms, family = "omega|kappa")

posterior on subject-level guessing bias

msTheta = filter(ms, ! Parameter %in% c("omega", "kappa"))
msTheta$subject = with(msTheta,
                       factor(regmatches(Parameter, regexpr("[0-9]+", Parameter)), 
                              levels = 1:28))  
ggplot(msTheta, aes(x = subject, y = value)) + geom_boxplot() + ylab("bias level (posterior)")

shrinkage

what's going on?

shrinkage vs. population-level generalization

  • careful: "shrinkage" is a misleading term!
    • shrinkage of variance in low-level parameters (compared to non-hierarchical estimation; see the sketch after this list)
    • variance can also increase, depending on the particular model
  • hierarchical modeling "binds together" low-level parameters
    • this is an additional assumption (can be good or bad)
  • hierarchical "binding" reduces "freedom" of free variables
    • can guard us against overfitting
  • hierarchical group-level binding gives us population-level estimates
    • generalize to new observations
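A rough sketch of the first point (assuming TTdata and msTheta from the earlier chunks are available): compare independent per-subject estimates with the hierarchical ones.

# non-hierarchical: independent Beta(1 + k, 1 + N - k) posterior per subject
indepMeans = with(TTdata, (k + 1) / (N + 2))   # posterior means under flat priors
# hierarchical: posterior means of theta[i] from the latent mixture model
hierMeans = msTheta %>%
  group_by(subject) %>%
  summarize(m = mean(value))
# hierarchical estimates are typically pulled toward the group-level mode,
# so they spread out less across subjects than the independent estimates
sd(indepMeans)
sd(hierMeans$m)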

double mixture classifier

two country quiz

Candidates from Thailand and Moldavia take part in a quiz. Questions are about the history of each country. Here is the data on who got which question right.

 

show(CountryQuizData)
##    QA QB QC QD QE QF QG QH
## P1  1  0  0  1  1  0  0  1
## P2  1  0  0  1  1  0  0  1
## P3  0  1  1  0  0  1  0  0
## P4  0  1  1  0  0  1  1  0
## P5  1  0  0  1  1  0  0  1
## P6  0  0  0  1  1  0  0  1
## P7  0  1  0  0  0  1  1  0
## P8  0  1  1  1  0  1  1  0

 

We want to infer:

  1. which questions are (likely) about the history of the same country, and
  2. which candidates are (likely) from the same country.

 

Lee & Wagenmakers (2013, Chapter 6.4)

double mixture model

(figure: graphical model for the double mixture model)

double mixture model

model{
  alpha ~ dunif(0,1)    # chance of a correct answer if person's origin matches the question
  beta ~ dunif(0,alpha) # chance of a correct answer if it does not
  # nx: number of persons, nz: number of questions (supplied as data)
  for (i in 1:nx){
    x[i] ~ dbern(0.5)   # latent country of person i
  }
  for (j in 1:nz){
    z[j] ~ dbern(0.5)   # latent country of question j
  }   
  for (i in 1:nx){
    for (j in 1:nz){
      theta[i,j] = ifelse(x[i] == z[j], alpha, beta)
      k[i,j] ~ dbern(theta[i,j])
    }
  }   
}

NB: simple for JAGS but a hard nut for Stan 😞 (Stan cannot sample discrete parameters like x and z directly; they would have to be marginalized out)
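The driver code is again not shown; a minimal sketch, assuming the model block above is stored in modelString and CountryQuizData is available as above:

# pass the 0/1 answer matrix plus its dimensions (nx persons, nz questions) to JAGS
dataList = list(k  = as.matrix(CountryQuizData),
                nx = nrow(CountryQuizData),
                nz = ncol(CountryQuizData))
jagsModel = jags.model(file = textConnection(modelString),
                       data = dataList,
                       n.chains = 2)
update(jagsModel, n.iter = 5000)          # burn-in
codaSamples = coda.samples(jagsModel,
                           variable.names = c("alpha", "beta", "x", "z"),
                           n.iter = 5000)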

samples from prior

posteriors

posterior over types

## # A tibble: 10 × 2
##    Parameter       mean
##       <fctr>      <dbl>
## 1      alpha 0.87918124
## 2       beta 0.05905099
## 3       x[1] 1.00000000
## 4       x[2] 1.00000000
## 5       x[3] 0.00000000
## 6       x[4] 0.00000000
## 7       x[5] 1.00000000
## 8       x[6] 1.00000000
## 9       x[7] 0.00000000
## 10      x[8] 0.00000000
## # A tibble: 8 × 2
##   Parameter  mean
##      <fctr> <dbl>
## 1      z[1]     1
## 2      z[2]     0
## 3      z[3]     0
## 4      z[4]     1
## 5      z[5]     1
## 6      z[6]     0
## 7      z[7]     0
## 8      z[8]     1

fini

summary

  • hierarchical models: chains of dependencies between variables
  • information from data percolates "upward":
    • stronger impact on nearby nodes
    • less impact on higher nodes (convergence checks!)
  • high-level nodes can unify estimates for lower-level variables
    • sometimes called "shrinkage"
    • better conceived as: group-level prior homogeneity assumption
  • complex data generating models:
    • prior + likelihood
    • our assumptions about how the data could have been generated

fini

outlook

 

Friday

  • boot camp on complex models in JAGS

 

Tuesday

  • model comparison

to prevent boredom

 

obligatory

  • skim Lee & Wagenmakers chapters 5 & 6
    • decide what you want to explore in class
  • finish homework 3