
recap

Bayes rule for data analysis:

\[\underbrace{P(\theta \, | \, D)}_{posterior} \propto \underbrace{P(\theta)}_{prior} \times \underbrace{P(D \, | \, \theta)}_{likelihood}\]
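As a quick reminder of what this proportionality means in practice, here is a minimal grid-approximation sketch in R (not from the original slides; it uses the k = 7 successes in N = 24 tosses of the running example below):

# grid approximation of the posterior over a coin's bias theta,
# with a flat Beta(1,1) prior and k = 7 successes in N = 24 tosses
theta = seq(0, 1, length.out = 1001)
prior = dbeta(theta, 1, 1)                        # prior P(theta)
likelihood = dbinom(7, size = 24, prob = theta)   # likelihood P(D | theta)
posterior = prior * likelihood                    # unnormalized posterior
posterior = posterior / sum(posterior)            # normalize over the grid
# this closely matches the analytic posterior Beta(8,18) used later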

Markov Chain Monte Carlo

  • sequence of representative samples from \(X \sim P\)

JAGS

  • declarative language to specify data-generating model
    • model = prior + likelihood function
    • samples of whatever variable we want:
      • conditioning on data \(\rightarrow\) additional constraints on log-score

plan for today

  • fun with JAGS models
    • hierarchical models / latent mixture models
    • think: "data-generating process"
    • ideas & notation
  • priors in hierarchical models
  • shrinkage
  • examples, examples, examples

first example

example (recap)

require('rjags')
modelString = "
model{
  theta ~ dbeta(1,1)
  k ~ dbinom(theta, N)
}"
# prepare for JAGS
dataList = list(k = 7, N = 24)
# set up and run model
jagsModel = jags.model(file = textConnection(modelString), 
                       data = dataList,
                       n.chains = 2)
update(jagsModel, n.iter = 5000)
codaSamples = coda.samples(jagsModel, 
                           variable.names = c("theta"),
                           n.iter = 5000)

example (recap)

require('ggmcmc')   # provides ggs() and ggs_density()
ms = ggs(codaSamples)
ggs_density(ms) + 
  # analytic posterior for k = 7, N = 24 under a Beta(1,1) prior: Beta(8,18)
  stat_function(fun = function(x) dbeta(x, 8, 18), color = "black") 

simple scenario

  • rational Bayesian agent
  • originally believes \(\theta \sim Beta(1,1)\) (any bias equally likely)
  • observes \(k = 7\) successes in \(N = 24\) tosses

 

model{
  theta ~ dbeta(1,1)
  k ~ dbinom(theta, N)
}

slightly more complex scenario

  • rational Bayesian agent
  • originally believes \(\theta \sim Beta(16,16)\)
    • e.g., has observed 15 successes and 15 failures
    • has updated these beliefs via Bayes rule
  • observes another \(k = 7\) successes in \(N = 24\) tosses
model{
  theta ~ dbeta(16,16)
  k ~ dbinom(theta, N)
}

complex scenario

  • boundedly-rational, Bayesian agent with imprecise memory
  • has observed 15 successes and 15 failures but doesn't clearly remember
  • observes another \(k = 7\) successes in \(N = 24\) tosses

 

model{
  a ~ dpois(15)   # imprecise memory of the 15 observed successes
  b ~ dpois(15)   # imprecise memory of the 15 observed failures
  theta ~ dbeta(a+1,b+1)
  k ~ dbinom(theta, N)
}

hierarchical models

graph notation

model{
  theta ~ dbeta(1,1)
  k ~ dbinom(theta, N)
}

 

 

(figure: graphical model for the model above)

 

 

model{
  a ~ dpois(15)
  b ~ dpois(15)
  theta ~ dbeta(a+1,b+1)
  k ~ dbinom(theta, N)
}

(figure: graphical model for the model above)

hierarchical models

  • latent variables can depend on other latent variables
  • likelihood can be a (direct) function of only some latent variables


\[ \begin{align*} P(\theta, a, b \, | \, D) & \propto P(\theta, a, b) \ P(D \, | \, \theta, a, b)\\ & = P(a) \ P(b) \ P(\theta \mid a, b) \ P(D \, | \, \theta) \end{align*} \]


(figure: graphical model)

deterministic dependencies

(figure: graphical model with a deterministic dependency)

\[ \begin{align*} P(\theta, s \, | \, D) & \propto P(\theta, s) \ P(D \, | \, \theta, s)\\ & = P(s) \ \underbrace{P(\theta \mid s)}_{\text{degenerate!}} \ P(D \, | \, \theta) \end{align*} \]

model{
  s ~ dbern(0.5)
  theta = ifelse(s == 1, 0.75, 0.25)
  k ~ dbinom(theta, N)
}

graph conventions

  • type of variable:
    • continuous: circle
    • discrete: square
  • information status:
    • observed: grey shade
    • latent: white
  • dependency:
    • stochastic: single line
    • deterministic: double line

Lee & Wagenmakers (2013)

the sky is the limit!

from a previous lecture

(figure: graphical model from a previous lecture)

from Lee & Wagenmakers

(figure: graphical model from Lee & Wagenmakers)

priors in hierarchical models

priors in hierarchical models

  • priors are complex objects
  • should make sense at every level in the hierarchy:
    • check dependent priors of intermediate latent variables (see the sketch after the example model below)!
  • can have interesting effects
    • e.g., shrinkage (later in this class)

example

model{
  b ~ dgamma(1,1)
  a ~ dgamma(1,1)
  theta ~ dbeta(a,b)
  k ~ dbinom(theta, N)
}
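To "check dependent priors of intermediate latent variables" for this model, we can forward-simulate from the hyper-priors and inspect the implied prior on theta. A minimal sketch (helper code not in the original slides):

# forward-simulate the prior of the model above:
# a, b ~ Gamma(1,1); theta ~ Beta(a, b)
nSamples = 10000
a = rgamma(nSamples, shape = 1, rate = 1)
b = rgamma(nSamples, shape = 1, rate = 1)
theta = rbeta(nSamples, a, b)
# inspect the implied (marginal) prior on theta;
# if it does not reflect our actual beliefs about theta,
# the hyper-priors should be reconsidered
hist(theta, breaks = 50)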


(figure: graphical model for the model above)

example: (hyper-)priors

  ggplot(data.frame(x = c(0.0001,10)), aes(x)) +
         stat_function(fun = function(x) dgamma(x,1,1))

example: posteriors

example: priors

reparameterization

  • \(\text{Beta}(a,b)\) has mode \(\omega\) and concentration \(\kappa\): \[ \begin{align*} \omega & = \frac{a-1}{a+b-2} & \kappa &= a + b \end{align*} \]
  • reformulate: \[ \begin{align*} a & = \omega(\kappa - 2) + 1 & b & = (1-\omega)(\kappa-2)+1 \end{align*} \]
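These conversions are easy to spell out as small R helpers (hypothetical function names, for illustration only):

# mode/concentration parameterization of a Beta distribution (valid for a, b > 1)
beta_ab_to_omega_kappa = function(a, b) {
  list(omega = (a - 1) / (a + b - 2), kappa = a + b)
}
beta_omega_kappa_to_ab = function(omega, kappa) {
  list(a = omega * (kappa - 2) + 1, b = (1 - omega) * (kappa - 2) + 1)
}
# sanity check: Beta(8,18) has mode (8-1)/(8+18-2) = 7/24 and concentration 26
beta_ab_to_omega_kappa(8, 18)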

reparameterized model

model{
  omega ~ dbeta(1,1)
  kappaMin2 ~ dgamma(0.01,0.01)
  theta ~ dbeta(omega*kappaMin2+1, 
                (1-omega)*kappaMin2+1)
  k ~ dbinom(theta, N)
  kappa = kappaMin2+2
}


(figure: graphical model for the reparameterized model)

reparameterized model: priors

reparameterized model: posteriors

latent mixture models

latent mixtures

  • hierarchical models with latent parameters:
    • individual-level differences
    • item-level differences

think: mixed effects regression

therapeutic touches

(figure: therapeutic touch experiment)


  • TT practitioners claim to sense energy fields
  • they claim to sense which of their hands is near another person's body

(Kruschke, chapter 9.2.4)

TT data

library(dplyr)   # for %>%, group_by, summarize
TTdata = readr::read_csv('../data/02_therapeutic_touchers.csv') %>% 
         group_by(s) %>% 
         summarize(k = sum(y), N = length(y))
# k - number of successes per subject
# N - number of trials per subject
head(TTdata)
## # A tibble: 6 × 3
##       s     k     N
##   <chr> <int> <int>
## 1   S01     1    10
## 2   S02     2    10
## 3   S03     3    10
## 4   S04     3    10
## 5   S05     3    10
## 6   S06     3    10

TTdata

ggplot(TTdata, aes(x  = s, y = k/N)) + geom_bar(stat='identity')

latent mixture model

model{
  omega ~ dbeta(1,1)              # group-level mode
  kappaMin2 ~ dgamma(0.01,0.01)   # group-level concentration - 2
  kappa = kappaMin2 + 2
  for (i in 1:28){                # one theta per subject (28 subjects)
    theta[i] ~ dbeta(omega*kappaMin2+1, 
                     (1-omega)*kappaMin2+1)
    k[i] ~ dbinom(theta[i], N[i])
  }
}
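The R code for running this model is not shown on the slides; here is a minimal sketch, assuming the model block above is stored in modelString and TTdata from the previous chunk is available:

# pass subject-level counts to JAGS and sample group- and subject-level parameters
dataList = list(k = TTdata$k, N = TTdata$N)
jagsModel = jags.model(file = textConnection(modelString),
                       data = dataList,
                       n.chains = 2)
update(jagsModel, n.iter = 5000)           # burn-in
codaSamples = coda.samples(jagsModel,
                           variable.names = c("omega", "kappa", "theta"),
                           n.iter = 5000)
ms = ggs(codaSamples)                       # used by the plots below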

(figure: graphical model for the model above)

posterior traceplot

posterior density

ggs_density(ms, family = "omega|kappa")

posterior on subject-level guessing bias

msTheta = filter(ms, ! Parameter %in% c("omega", "kappa"))
msTheta$subject = with(msTheta,
                       factor(regmatches(Parameter, regexpr("[0-9]+", Parameter)), 
                              levels = 1:28))  
ggplot(msTheta, aes(x = subject, y = value)) + geom_boxplot() + ylab("bias level (posterior)")

shrinkage

what's going on?

shrinkage vs. population-level generalization

  • careful: "shrinkage" is a misleading term!
    • shrinkage of variance in low-level parameters (compared to non-hierarchical estimation; see the sketch after this list)
    • variance can also increase, depending on the particular model
  • hierarchical modeling "binds together" low-level parameters
    • this is an additional assumption (can be good or bad)
  • hierarchical "binding" reduces "freedom" of free variables
    • can guard us against overfitting
  • hierarchical group-level binding gives us population-level estimates
    • generalize to new observations
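A rough sketch of the first point (assuming TTdata and msTheta from the earlier chunks are available): compare independent per-subject estimates with the hierarchical ones.

# non-hierarchical: independent Beta(1 + k, 1 + N - k) posterior per subject
indepMeans = with(TTdata, (k + 1) / (N + 2))   # posterior means under flat priors
# hierarchical: posterior means of theta[i] from the latent mixture model
hierMeans = msTheta %>%
  group_by(subject) %>%
  summarize(m = mean(value))
# hierarchical estimates are typically pulled toward the group-level mode,
# so they spread out less across subjects than the independent estimates
sd(indepMeans)
sd(hierMeans$m)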

double mixture classifier

two country quiz

Candidates from Thailand and Moldavia take part in a quiz. Questions are about the history of each country. Here is the data on who got which question right.

 

show(CountryQuizData)
##    QA QB QC QD QE QF QG QH
## P1  1  0  0  1  1  0  0  1
## P2  1  0  0  1  1  0  0  1
## P3  0  1  1  0  0  1  0  0
## P4  0  1  1  0  0  1  1  0
## P5  1  0  0  1  1  0  0  1
## P6  0  0  0  1  1  0  0  1
## P7  0  1  0  0  0  1  1  0
## P8  0  1  1  1  0  1  1  0

 

We want to infer:

  1. which questions are (likely) about the history of the same country, and
  2. which candidates are (likely) from the same country.

 

Lee & Wagenmakers (2013, Chapter 6.4)

double mixture model

(figure: graphical model for the double mixture model)

double mixture model

model{
  alpha ~ dunif(0,1)    # chance of a correct answer if person's origin matches the question
  beta ~ dunif(0,alpha) # chance of a correct answer if it does not
  # nx: number of persons, nz: number of questions (supplied as data)
  for (i in 1:nx){
    x[i] ~ dbern(0.5)   # latent country of person i
  }
  for (j in 1:nz){
    z[j] ~ dbern(0.5)   # latent country of question j
  }   
  for (i in 1:nx){
    for (j in 1:nz){
      theta[i,j] = ifelse(x[i] == z[j], alpha, beta)
      k[i,j] ~ dbern(theta[i,j])
    }
  }   
}

NB: simple for JAGS but a hard nut for Stan 😞 (Stan cannot sample discrete parameters like x and z directly; they would have to be marginalized out)
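The driver code is again not shown; a minimal sketch, assuming the model block above is stored in modelString and CountryQuizData is available as above:

# pass the 0/1 answer matrix plus its dimensions (nx persons, nz questions) to JAGS
dataList = list(k  = as.matrix(CountryQuizData),
                nx = nrow(CountryQuizData),
                nz = ncol(CountryQuizData))
jagsModel = jags.model(file = textConnection(modelString),
                       data = dataList,
                       n.chains = 2)
update(jagsModel, n.iter = 5000)          # burn-in
codaSamples = coda.samples(jagsModel,
                           variable.names = c("alpha", "beta", "x", "z"),
                           n.iter = 5000)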

samples from prior

posteriors

posterior over types

## # A tibble: 10 × 2
##    Parameter       mean
##       <fctr>      <dbl>
## 1      alpha 0.87918124
## 2       beta 0.05905099
## 3       x[1] 1.00000000
## 4       x[2] 1.00000000
## 5       x[3] 0.00000000
## 6       x[4] 0.00000000
## 7       x[5] 1.00000000
## 8       x[6] 1.00000000
## 9       x[7] 0.00000000
## 10      x[8] 0.00000000
## # A tibble: 8 × 2
##   Parameter  mean
##      <fctr> <dbl>
## 1      z[1]     1
## 2      z[2]     0
## 3      z[3]     0
## 4      z[4]     1
## 5      z[5]     1
## 6      z[6]     0
## 7      z[7]     0
## 8      z[8]     1

fini

summary

  • hierarchical models: chains of dependencies between variables
  • information from data percolates "upward":
    • stronger impact on nearby nodes
    • less impact on higher nodes (convergence checks!)
  • high-level nodes can unify estimates for lower-level variables
    • sometimes called "shrinkage"
    • better conceived as: group-level prior homogeneity assumption
  • complex data generating models:
    • prior + likelihood
    • our assumptions about how the data could have been generated

fini

outlook

 

Friday

  • boot camp on complex models in JAGS

 

Tuesday

  • model comparison

to prevent boredom

 

obligatory

  • skim Lee & Wagenmakers chapters 5 & 6
    • decide what you want to explore in class
  • finish homework 3