\[ \definecolor{firebrick}{RGB}{178,34,34} \newcommand{\red}[1]{{\color{firebrick}{#1}}} \] \[ \definecolor{mygray}{RGB}{128,128,128} \newcommand{\mygray}[1]{{\color{mygray}{#1}}} \] \[ \newcommand{\set}[1]{\{#1\}} \] \[ \newcommand{\tuple}[1]{\langle#1\rangle} \] \[\newcommand{\States}{{T}}\] \[\newcommand{\state}{{t}}\] \[\newcommand{\pow}[1]{{\mathcal{P}(#1)}}\]

roadmap for today

  • announcements
  • mid-term survey results
  • Bayes factor computation
    • Monte Carlo simulation
      • naive
      • importance sampling
    • transdimensional MCMC

announcements

talk today by Jonathan Love (Tasmania) on jamovi

title

"jamovi: an R powered, open, user-friendly, reproducible, community-driven alternative to SPSS"

time & venue

AFK 4.503 at the Institute of Psychology, Fri, 23.06., 17:00 - 18:00

from the abstract

jamovi is a free and open statistical spreadsheet which is easy to use and designed to be familiar to users of SPSS. Analyses in jamovi are implemented in R, and jamovi is able to provide the underlying R syntax for each analysis. Additionally, jamovi is modular and provides the 'jamovi library', a collection of analyses developed by third parties. Analyses in jamovi are reproducible, allowing anyone to return to an earlier analysis and inspect what options were used. jamovi is rich and accessible like RShiny, familiar like SPSS, reproducible like Rmarkdown, and, like R, open and inviting to methodologists and developers. This talk begins by exploring the motivations and recent developments in statistical software, and how these have led to the jamovi vision. The talk then goes on to introduce jamovi, demonstrate its features, and explore its broader goals.

talk on Monday by Judith Degen (Stanford)

title

"Mentioning atypical properties of objects is communicatively efficient"

time & venue

Wilhelmstr. 19 room 1.13, Mon, 26.06., 10:00 - 12:00

abstract

What governs how much information speakers include in referring expressions? Atypical properties of objects are more likely to be included in referring expressions than typical ones (Sedivy 2003; Westerbeek et al 2015). For example, speakers are more likely to call a blue banana a “blue banana” and a yellow banana a "banana". A unified account of this phenomenon is lacking. We ask: when should a rational speaker mention an object’s color? Reference production is modeled within the Rational Speech Act framework (Frank & Goodman 2012). Utterances (e.g., “banana”, “blue”, and “blue banana”) are taken to have a graded semantics: rather than assuming all bananas are equally good instances of “banana”, we empirically elicited object-utterance typicality values for all possible utterances. Pragmatic speakers select utterances proportionally to the probability that a literal listener using a graded semantics will select the intended referent. We evaluate the proposed model on a dataset of freely produced referring expressions collected in an interactive reference game experiment via the web. We conclude that the systematicity with which speakers redundantly mention color implicates a system geared towards communicative efficiency rather than towards wasteful overinformativeness, and discuss potential extensions of this approach to other production phenomena, such as optional instrument mention.

survey results

survey results

(figure: mid-term survey results)

memory models

recall data & models

  • recall rates y for 100 trials after time t in seconds
y = c(.94, .77, .40, .26, .24, .16)  # observed recall rates
t = c(  1,   3,   6,   9,  12,  18)  # retention interval in seconds
obs = y*100                          # number of correct recalls out of 100 trials
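
a quick look at the data (a minimal sketch, assuming ggplot2 is loaded as elsewhere in these slides):

# observed recall rates against retention interval
ggplot(data.frame(t = t, y = y), aes(x = t, y = y)) +
  geom_point() + xlab("time in seconds") + ylab("recall rate")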

exponential model

\[P(t \ ; \ a, b) = a \exp (-bt)\]

\[\text{where } a,b>0 \]

power model

\[P(t \ ; \ c, d) = ct^{-d}\]

\[\text{where } c,d>0 \]
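
to get a feel for the two curves, we can plot both models' predictions side by side (a minimal sketch; the parameter values are illustrative, chosen close to the posterior means estimated later in these slides):

# predicted recall curves for illustrative parameter values
tGrid = seq(1, 18, by = 0.1)
preds = data.frame(t = rep(tGrid, 2),
                   p = c(1.06*exp(-0.13*tGrid),  # exponential: a = 1.06, b = 0.13
                         0.95*tGrid^(-0.5)),     # power: c = 0.95, d = 0.5
                   model = rep(c("exponential", "power"), each = length(tGrid)))
ggplot(preds, aes(x = t, y = p, color = model)) + geom_line()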

model specification (priors & likelihood function)

exponential

priorExp = function(a, b){
  dunif(a, 0, 1.5) * dunif(b, 0, 1.5)
}
lhExp = function(a, b){
  p = a*exp(-b*t)        # predicted recall rates
  p[p <= 0.0] = 1.0e-5   # clamp predictions into the open unit interval
  p[p >= 1.0] = 1-1.0e-5
  prod(dbinom(x = obs, prob = p, size = 100))  # binomial likelihood of all six observations
}

power

priorPow = function(c, d){
  dunif(c, 0, 1.5) * dunif(d, 0, 1.5)
}
lhPow = function(c, d){
  p = c*t^(-d)
  p[p <= 0.0] = 1.0e-5
  p[p >= 1.0] = 1-1.0e-5
  prod(dbinom(x = obs, prob = p, size = 100))
}

BF by grid approximation

lhExp = Vectorize(lhExp)  # vectorize so the likelihood functions accept vector input
lhPow = Vectorize(lhPow)
start.time <- Sys.time()
stepsize = 0.01
evidence = expand.grid(x = seq(0.005, 1.495, by = stepsize),
                       y = seq(0.005, 1.495, by = stepsize)) %>% 
  mutate(lhExp = lhExp(x,y), priExp = 1 / length(x),  # uniform priors!
         lhPow = lhPow(x,y), priPow = 1 / length(x))
end.time <- Sys.time()
BF_grid = with(evidence, sum(priExp*lhExp)/ sum(priPow*lhPow))
time.taken_grid <- end.time - start.time
paste0("BF in favor of exponential model: ", round(BF_grid,2))
## [1] "BF in favor of exponential model: 1221.39"

overwhelming evidence in favor of the exponential model

naive Monte Carlo simulation

recap: why simulate?

generally:

\[\int f(\theta) \ P(\theta) \ \text{d}\theta \approx \frac{1}{n} \sum^{n}_{\theta_i \sim P(\theta)} f(\theta_i)\]

in particular:

\[P(D) = \int P(D \mid \theta) \ P(\theta) \ \text{d}\theta \approx \frac{1}{n} \sum^{n}_{\theta_i \sim P(\theta)} P(D \mid \theta_i)\]
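
a quick sanity check of this recipe on an integral with a known value: for \(f(\theta) = \theta^2\) and \(\theta \sim \text{Uniform}(0,1)\), the integral is exactly \(1/3\):

# naive Monte Carlo estimate of a known integral
theta = runif(100000)  # samples from P(theta)
mean(theta^2)          # should be close to 1/3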

example: naive sampling

nSamples = 200000
start.time <- Sys.time()
a = runif(nSamples, 0, 1.5)
b = runif(nSamples, 0, 1.5)
lhExpVec = lhExp(a,b)
lhPowVec = lhPow(a,b)
BF_naive = sum(lhExpVec)/ sum(lhPowVec)
end.time <- Sys.time()
time.taken_naive <- end.time - start.time
paste0("BF in favor of exponential model: ", round(BF_naive,2))
## [1] "BF in favor of exponential model: 1212.23"

time course of estimate

BFVec = sapply(seq(10000,nSamples, by = 500), 
     function(i) sum(lhExpVec[1:i]) / sum(lhPowVec[1:i]))
ggplot(data.frame(i = seq(10000,nSamples, by = 500), BF = BFVec), aes(x=i, y=BF)) +
  geom_line() + geom_hline(yintercept = 1221, color = "firebrick") + xlab("number of samples")

importance sampling

importance sampling

take an arbitrary function \(h(\theta)\) such that \(\int h(\theta) \text{d}\theta = 1\)

\[ \begin{align*} \frac{1}{P(D)} & = \frac{P(\theta \mid D)}{P(D \mid \theta) P(\theta)}\\ & = \frac{P(\theta \mid D)}{P(D \mid \theta) P(\theta)} \int h(\theta) \text{d}\theta \\ & = \int \frac{h(\theta) P(\theta \mid D)}{P(D \mid \theta) P(\theta)} \text{d}\theta \\ & \approx \frac{1}{n} \sum^{n}_{\theta_i \sim P(\theta \mid D)} \frac{h(\theta_i)}{P(D \mid \theta_i) P(\theta_i)} \end{align*} \]

choose an \(h(\theta)\) that resembles the posterior
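
the identity is easy to verify in a conjugate toy case where \(P(D)\) is known in closed form (a minimal sketch with made-up data, \(k = 7\) successes in \(N = 24\) trials under a flat prior); note that with \(h(\theta)\) set to the exact posterior \(\text{Beta}(8,18)\), every summand equals \(1/P(D)\) exactly:

# check the identity on a conjugate example with known evidence
k = 7; N = 24
evidence_true = choose(N, k) * beta(k + 1, N - k + 1)  # marginal likelihood under flat prior
theta = rbeta(10000, k + 1, N - k + 1)                 # samples from the exact posterior
quotients = dbeta(theta, k + 1, N - k + 1) /           # h(theta): here the exact posterior
  (dbinom(k, N, theta) * dunif(theta))                 # likelihood times prior
c(estimate = 1 / mean(quotients), truth = evidence_true)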

posterior over parameters for exponential model

modelString = "
model{
  a ~ dunif(0,1.5)
  b ~ dunif(0,1.5)
  for (i in 1: 6){
    pT[i] = a*exp(-t[i]*b)
    p[i] = min(max( pT[i], 0.00001), 0.99999)
    obs[i] ~ dbinom(p[i], 100)
  }
}"
# prepare for JAGS
dataList = list(obs = obs, t = t)
# set up and run model
jagsModel = jags.model(file = textConnection(modelString), 
                       data = dataList,
                       n.chains = 2)
update(jagsModel, n.iter = 25000)
codaSamples = coda.samples(jagsModel, 
                           variable.names = c("a", "b"),
                           n.iter = 5000)
ms = ggs(codaSamples)  # tidy data frame of samples (ggs from the ggmcmc package)

parametric approximation for posteriors

getGammaApprox = function(samples){
  # match a gamma distribution to the samples: shape & rate are chosen such
  # that the gamma has mode m and standard deviation s
  # (with the sample mean standing in for m)
  s = sd(samples)
  m = mean(samples)
  ra = ( m + sqrt( m^2 + 4*s^2 ) ) / ( 2 * s^2 )
  sh = 1 + m * ra
  return(c(shape = sh, rate = ra))
}
paramA = getGammaApprox(filter(ms, Parameter == "a")$value)
paramB = getGammaApprox(filter(ms, Parameter == "b")$value)
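
a quick visual check of the parametric approximation (a minimal sketch): overlay the fitted gamma density on the posterior samples for \(a\):

# fitted gamma density (red) on top of the posterior samples for a
ggplot(data.frame(a = filter(ms, Parameter == "a")$value), aes(x = a)) +
  geom_density() +
  stat_function(fun = dgamma, color = "firebrick",
                args = list(shape = paramA[["shape"]], rate = paramA[["rate"]]))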

parameter estimation for power model

summary of estimated parameters

##   parameter     shape      rate
## 1         a 1066.4871 1004.2332
## 2         b  194.7138 1486.6266
## 3         c 2333.6721 2464.4321
## 4         d  232.5604  466.2629
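
for interpretation: a gamma with shape \(k\) and rate \(r\) has mean \(k/r\) and standard deviation \(\sqrt{k}/r\), so these fits imply, e.g., \(a \approx 1.06\) and \(d \approx 0.50\):

# implied means and standard deviations of the fitted gamma approximations
shape = c(a = 1066.4871, b = 194.7138, c = 2333.6721, d = 232.5604)
rate  = c(a = 1004.2332, b = 1486.6266, c = 2464.4321, d = 466.2629)
round(rbind(mean = shape/rate, sd = sqrt(shape)/rate), 3)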

convenience function for parameters and samples

get_posterior_approximation = function(modelIndex = 0, nSamples = 50000) {
  # modelIndex 0: exponential model; 1: power model (a, b then stand in for c, d)
  modelString = "model{
  a ~ dunif(0,1.5)
  b ~ dunif(0,1.5)
  for (i in 1: 6){
  pT[i] = ifelse(m == 0, a*exp(-t[i]*b), a*t[i]^(-b))
  p[i] = min(max( pT[i], 0.0001), 0.9999)
  obs[i] ~ dbinom(p[i], 100)
  }}"
  dataList = list(obs = obs, t = t, m = modelIndex) # includes flag for model
  jagsModel = jags.model(file = textConnection(modelString), 
                         data = dataList,
                         n.chains = 2)
  update(jagsModel, n.iter = 25000)
  codaSamples = coda.samples(jagsModel, 
                                 variable.names = c("a", "b"),
                                 n.iter = nSamples)
  ms = ggs(codaSamples)
  a = getGammaApprox(filter(ms, Parameter == "a")$value)
  b = getGammaApprox(filter(ms, Parameter == "b")$value)
  return(list(a_shape = a[["shape"]], a_rate = a[["rate"]],
              b_shape = b[["shape"]], b_rate = b[["rate"]],
              a_samples = filter(ms, Parameter == "a")$value,
              b_samples = filter(ms, Parameter == "b")$value)  )
}

implement this

nSamples = 50000
start.time <- Sys.time()
post_exp = get_posterior_approximation(0, nSamples)
post_pow = get_posterior_approximation(1, nSamples)
estimate_evidence = function(post, i, lhFun = "lhExp") {
  # quotients h(theta) / P(D | theta); the uniform prior density is a
  # constant, identical for both models, and so cancels in the Bayes factor
  quotients = dgamma(post$a_samples[1:i], shape = post$a_shape, rate = post$a_rate) * 
    dgamma(post$b_samples[1:i], shape = post$b_shape, rate = post$b_rate) /
    do.call(lhFun, list(a = post$a_samples[1:i], b = post$b_samples[1:i]) )
  1/mean(quotients)
}
BF_importance = estimate_evidence(post_exp, nSamples, "lhExp") / 
                    estimate_evidence(post_pow, nSamples, "lhPow")
end.time <- Sys.time()
time.taken_importance <- end.time - start.time

\[ \text{recall:} \qquad \frac{1}{P(D)} \approx \frac{1}{n} \sum^{n}_{\theta_i \sim P(\theta \mid D)} \frac{h(\theta_i)}{P(D \mid \theta_i) P(\theta_i)} \]

paste0("BF in favor of Exponential Model (using importance sampling): ", 
            round(BF_importance,2))
## [1] "BF in favor of Exponential Model (using importance sampling): 1234.6"

temporal development

(figure: development of the BF estimate as the number of samples increases)

transdimensional MCMC

idea

make model index a parameter \(M \in \set{0,1}\)

(figure: probMod)

e.g., the posterior probability of \(M = 1\):

\[ P(M = 1 \mid D ) \propto P(D \mid M = 1) P(M = 1)\]
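
normalizing over both models links posterior model probabilities to the Bayes factor; e.g., with equal prior odds and the Bayes factor from the grid approximation above:

# posterior probability of the exponential model under equal prior odds
BF = 1221.39
BF / (BF + 1)  # approx. 0.9992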

JAGS model

model{
  m ~ dbern(0.5)
  a ~ dunif(0,1.5)
  b ~ dunif(0,1.5)
  c ~ dunif(0,1.5)
  d ~ dunif(0,1.5)
  for (i in 1: 6){
    pT[i] = ifelse(m == 0, a*exp(-t[i]*b), c*t[i]^(-d))
    p[i] = min(max( pT[i], 0.0001), 0.9999)
    obs[i] ~ dbinom(p[i], 100)
  }
}

probMod

posteriors

(figure: posterior samples of the model index \(m\))

problem:

while the MCMC chain "spends time" in one model, the other model's parameters are free to meander through low-likelihood regions, which makes the transdimensional jump very unlikely

trickery

tweaking prior model odds & using pseudo-priors

model{
  mT ~ dbern(0.9995)  # prior odds heavily favor the power model,
  m = mT + 1          # compensating for its much lower evidence
  # genuine priors, used while the respective model is active
  a[1] ~ dunif(0,1.5)
  b[1] ~ dunif(0,1.5)
  c[2] ~ dunif(0,1.5)
  d[2] ~ dunif(0,1.5)
  # pseudo-priors (gamma approximations of each model's posterior),
  # used while the respective model is inactive
  a[2] ~ dgamma(1145, 1078)
  b[2] ~ dgamma(203, 1555)
  c[1] ~ dgamma(2246, 2373)
  d[1] ~ dgamma(242, 485)
  for (i in 1: 6){
    pExp[i] = a[m]*exp(-t[i]*b[m])
    pPow[i] = c[m]*t[i]^(-d[m]) 
    pT[i] = ifelse(m == 1, pExp[i], pPow[i])
    p[i] = min(max( pT[i], 0.00001), 0.99999)
    obs[i] ~ dbinom(p[i], 100)
  }  
}
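
the sampling step is not shown on the slide; a minimal sketch of how the samples in ms below could be obtained, assuming the pseudo-prior model above is stored in modelString:

# run the pseudo-prior model and collect samples of the model index m
start.time = Sys.time()
jagsModel = jags.model(file = textConnection(modelString),
                       data = list(obs = obs, t = t),
                       n.chains = 2)
update(jagsModel, n.iter = 25000)
codaSamples = coda.samples(jagsModel, variable.names = "m", n.iter = 50000)
ms = ggs(codaSamples)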

posteriors

filter(ms, Parameter == "m") -> tmc
posterior = prop.table(table(tmc$value))  # posterior probabilities of m = 1, 2
prior = c(0.0005, 0.9995)                 # prior probabilities of m = 1, 2
# Bayes factor = posterior odds divided by prior odds
BF_transdim = posterior[1]/posterior[2] * prior[2]/prior[1]
end.time = Sys.time()
time.taken_transdim <- end.time - start.time
show(paste0("BF in favor of Exponential Model (using transdimensional MCMC): ", round(BF_transdim,2)))
## [1] "BF in favor of Exponential Model (using transdimensional MCMC): 1348.74"

summary

summary

 

method                  computation time   BF estimate
grid approximation                  0.29       1221.39
naive Monte Carlo                   2.61       1212.23
importance sampling                 6.76       1234.60
transdimensional MCMC              23.08       1348.74

outlook

 

Tuesday

  • parameter inference, model comparison & model criticism

 

Friday

  • bootcamp on cognitive models