\[ \definecolor{firebrick}{RGB}{178,34,34} \newcommand{\red}[1]{{\color{firebrick}{#1}}} \] \[ \definecolor{mygray}{RGB}{128,128,128} \newcommand{\mygray}[1]{{\color{mygray}{#1}}} \] \[ \newcommand{\set}[1]{\{#1\}} \] \[ \newcommand{\tuple}[1]{\langle#1\rangle} \] \[\newcommand{\States}{{T}}\] \[\newcommand{\state}{{t}}\] \[\newcommand{\pow}[1]{{\mathcal{P}(#1)}}\]

overview

 

  • generalized linear model (GLM)
    • types of variables
      • metric, nominal, ordinal, count
  • linear model
    • ordinary least squares regression
    • maximum likelihood regression
    • Bayesian approaches
  • generalization to simplest \(t\)-test scenario

generalized linear model

probabilistic models

 

standard notion

model = likelihood function \(P(D \mid \theta)\)

Bayesian

model = likelihood \(P(D \mid \theta)\) + prior \(P(\theta)\)

approaches to modeling

 

(figure: flavors of modeling)

generalized linear model

 

terminology

  • \(y\) predicted variable, data, observation, …
  • \(X\) predictor variables for \(y\), explanatory variables, …

 

blueprint of a GLM

\[ \begin{align*} \eta & = \text{linear\_combination}(X) \\ \mu & = \text{link\_fun}( \ \eta, \theta_{\text{LF}} \ ) \\ y & \sim \text{lh\_fun}( \ \mu, \ \theta_{\text{LH}} \ ) \end{align*} \]

(figure: GLM scheme)
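for concreteness, a minimal R sketch of the three blueprint steps for the special case of linear regression (identity link, normal likelihood); all values are invented for illustration:

# linear regression as an instance of the GLM blueprint
# (identity link function, normal likelihood function); example values only
beta_0 = -30; beta_1 = 2.5                # illustrative coefficients
x = c(16.5, 20.5, 26.3)                   # predictor values
eta = beta_0 + beta_1 * x                 # linear combination of X
mu = identity(eta)                        # link function: identity
y = rnorm(length(mu), mean = mu, sd = 5)  # likelihood function: normal noise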

types of variables

 

type      examples
metric    speed of a car, reading time, average time spent cooking p.d., …
binary    coin flip, truth-value judgement, experience with R, …
nominal   gender, political party, favorite philosopher, …
ordinal   level of education, rating scale judgement, …
count     number of cars passing under a bridge in 1h, …
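in R, these variable types map naturally onto different data representations; a small sketch with invented values:

# illustrative mapping of variable types onto R data structures
metric  = c(11.2, 13.4, 40.7)                 # numeric
binary  = c(TRUE, FALSE, TRUE)                # logical
nominal = factor(c("Kant", "Hume", "Kant"))   # unordered factor
ordinal = factor(c("BA", "MA", "PhD"),
                 levels = c("BA", "MA", "PhD"),
                 ordered = TRUE)              # ordered factor
count   = c(0L, 3L, 12L)                      # integer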

common link & likelihood functions
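in R, such pairings are packaged as family objects for glm; a few standard combinations:

# common likelihood (family) & link function pairings in R
gaussian(link = "identity")   # metric y: linear regression
binomial(link = "logit")      # binary y: logistic regression
poisson(link = "log")         # count y: Poisson regression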

linear regression

murder rate data set

 

library(tidyverse)   # provides %>%, rename, select

murder_data = readr::read_csv('../data/06_murder_rates.csv') %>% 
  rename(murder_rate = annual_murder_rate_per_million_inhabitants,
         low_income = percentage_low_income, 
         unemployment = percentage_unemployment) %>% 
  select(murder_rate, low_income, unemployment, population)
murder_data %>% head
## # A tibble: 6 x 4
##   murder_rate low_income unemployment population
##         <dbl>      <dbl>        <dbl>      <int>
## 1        11.2       16.5          6.2     587000
## 2        13.4       20.5          6.4     643000
## 3        40.7       26.3          9.3     635000
## 4         5.3       16.5          5.3     692000
## 5        24.8       19.2          7.3    1248000
## 6        12.7       16.5          5.9     643000

visualize data

GGally::ggpairs(murder_data, 
                title = "Murder rate data")

zooming in

murder_sub = select(murder_data, murder_rate, low_income) 
rate_income_plot = ggplot(murder_sub, 
    aes(x = low_income, y = murder_rate)) + geom_point()
rate_income_plot

linear regression

 

given

  • predicted variable \(y\): murder rates
    metric
  • predictor variable \(x\): percentage low income
    metric

 

question

which linear function (in terms of intercept \(\beta_0\) and slope \(\beta_1\)) approximates the data best?

\[ y_{\text{pred}} = \beta_0 + \beta_1 x \]

ordinary linear regression

 

"geometric" approach

find \(\beta_0\) and \(\beta_1\) that minimize the squared error between predicted \(y_\text{pred}\) and observed \(y\)

 

mse = function(y, x, beta_0, beta_1) {
  yPred = beta_0 + x * beta_1   # predicted values
  mean((y - yPred)^2)           # mean squared error
}
fit_mse = optim(par = c(0, 1), 
  fn = function(par) with(murder_data,
    mse(murder_rate,low_income, 
        par[1], par[2])))
fit_mse$par
## [1] -29.905188   2.559588
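as a sanity check on the numerical optimization, ordinary least squares also has a closed-form solution via the normal equations; a sketch using murder_data from above:

# closed-form least-squares solution: beta = (X'X)^{-1} X'y
X = cbind(1, murder_data$low_income)   # design matrix with intercept column
y = murder_data$murder_rate
solve(t(X) %*% X, t(X) %*% y)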

linear regression: MLE

maximum likelihood approach

find \(\beta_0\), \(\beta_1\), and \(\sigma\) that maximize the likelihood of the observed \(y\):

\[ \begin{align*} y_{\text{pred}} & = \beta_0 + \beta_1 x & \ \ \ \ \ \ \ \ \ \ \ \ \ \ y & \sim \mathcal{N}(\mu = y_{\text{pred}}, \sigma) \end{align*} \]

 

nll = function(y, x, beta_0, beta_1, sd) {
  if (sd <= 0) {return( Inf )}  # rule out invalid standard deviations
  yPred = beta_0 + x * beta_1   # predicted values
  nll = -dnorm(y, mean = yPred, sd = sd, log = TRUE)
  sum(nll)                      # negative log-likelihood (to be minimized)
}
fit_lh = optim(par = c(0, 1, 1), 
  fn = function(par) with(murder_data, 
    nll(murder_rate, low_income,
        par[1], par[2], par[3])))
fit_lh$par
## [1] -29.901445   2.559601   5.234126
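under a normal likelihood the MLE of the coefficients coincides with least squares, and the MLE of \(\sigma\) is the root mean squared residual (with no \(n-1\) correction); a quick check reusing the mse function from above:

# MLE of sigma = sqrt of mean squared residual at the fitted coefficients
with(murder_data,
     sqrt(mse(murder_rate, low_income, fit_lh$par[1], fit_lh$par[2])))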

compare homebrew to the "real thing"

fit_lm = lm(formula = murder_rate ~ low_income, data = murder_data)
fit_glm = glm(formula = murder_rate ~ low_income, data = murder_data)

tibble(parameter = c("beta_0", "beta_1"),
       manual_mse = fit_mse$par,
       manual_lh = fit_lh$par[1:2],
       lm = fit_lm %>% coefficients,
       glm = fit_glm %>% coefficients) %>% show
## # A tibble: 2 x 5
##   parameter manual_mse  manual_lh        lm       glm
##       <chr>      <dbl>      <dbl>     <dbl>     <dbl>
## 1    beta_0 -29.905188 -29.901445 -29.90116 -29.90116
## 2    beta_1   2.559588   2.559601   2.55939   2.55939

linear regression: a Bayesian approach

Bayes: likelihood + prior

inspect posterior distribution over \(\beta_0\), \(\beta_1\) and \(\sigma_{\text{err}}\) given the data \(y\) and the model:

\[ \begin{align*} y_{\text{pred}} & = \beta_0 + \beta_1 x & \ \ \ \ \ \ \ \ \ \ \ \ \ \ y & \sim \mathcal{N}(\mu = y_{\text{pred}}, \sigma_{\text{err}}) \\ \beta_i & \sim \mathcal{N}(0, \sigma_{\beta}) & \ \ \ \ \ \ \ \ \ \ \ \ \ \ \frac{1}{\sigma_{\text{err}}^2} & \sim \text{Gamma}(0.1,0.1) \end{align*} \]

model{
  # JAGS parameterizes the normal by precision tau, not standard deviation
  sigma_e = 1/sqrt(tau_e)
  tau_e ~ dgamma(0.1, 0.1)
  b0 ~ dnorm(0, 1/10000000)   # vague prior on intercept
  b1 ~ dnorm(0, 1/10000000)   # vague prior on slope
  for (i in 1:k){
    yPred[i] = b0 + b1 * x[i]
    y[i] ~ dnorm(yPred[i], tau_e)
  }
}
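a sketch of how this model could be run with the rjags package (the file name 'lin_reg.jags' and the sampler settings are assumptions):

library(rjags)
jags_data = list(y = murder_data$murder_rate,
                 x = murder_data$low_income,
                 k = nrow(murder_data))
jags_model = jags.model("lin_reg.jags",   # assumed file holding the model block
                        data = jags_data, n.chains = 2)
update(jags_model, 5000)                  # burn-in
codaSamples = coda.samples(jags_model,
                           variable.names = c("b0", "b1"),
                           n.iter = 10000)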

linear regression: a Bayesian approach

## # A tibble: 2 x 4
##   Parameter     HDI_lo       mean     HDI_hi
##      <fctr>      <dbl>      <dbl>      <dbl>
## 1        b0 -45.434242 -29.124843 -12.157119
## 2        b1   1.711819   2.520637   3.377466

inference with regression models

inference with regression models

 

parameter estimation

  • is coefficient \(\beta_i\) significantly/credibly different from, e.g., 0?

 

model comparison

  • compare model with \(\beta_i\) fixed to model with \(\beta_i\) free

 

prediction/model criticism

  • predict future data from the model (a minimal sketch below)
  • could the model plausibly have generated the observed data?
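a minimal sketch of point predictions from the fitted glm for new predictor values (the low_income values are invented):

# predict murder rates for hypothetical low-income percentages
predict(fit_glm, newdata = data.frame(low_income = c(15, 20, 25)))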

significance of point-estimates of coefficients

fit_glm = glm(formula = murder_rate ~ low_income, data = murder_data)
fit_glm %>% summary
## 
## Call:
## glm(formula = murder_rate ~ low_income, data = murder_data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -9.1663  -2.5613  -0.9552   2.8887  12.3475  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -29.901      7.789  -3.839   0.0012 ** 
## low_income     2.559      0.390   6.562 3.64e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 30.38125)
## 
##     Null deviance: 1855.20  on 19  degrees of freedom
## Residual deviance:  546.86  on 18  degrees of freedom
## AIC: 128.93
## 
## Number of Fisher Scoring iterations: 2
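the frequentist counterpart to the credible intervals on the next slide: confidence intervals for the coefficients (for glm fits, confint profiles the likelihood):

confint(fit_glm)   # 95% confidence intervals for the coefficients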

credible intervals for posteriors over coefficients

 

ggmcmc::ggs(codaSamples) %>% 
  group_by(Parameter) %>% 
  summarize(HDI_lo = coda::HPDinterval(coda::as.mcmc(value))[1],
            mean = mean(value),
            HDI_hi = coda::HPDinterval(coda::as.mcmc(value))[2]) %>% 
  show
## # A tibble: 2 x 4
##   Parameter     HDI_lo       mean     HDI_hi
##      <fctr>      <dbl>      <dbl>      <dbl>
## 1        b0 -45.434242 -29.124843 -12.157119
## 2        b1   1.711819   2.520637   3.377466

model comparison

frequentist

fit_glm_simple = glm(formula = murder_rate ~ 1, data = murder_data)
AIC(fit_glm_simple, fit_glm)
##                df      AIC
## fit_glm_simple  2 151.3579
## fit_glm         3 128.9268
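the same nested comparison as a classical test: anova on the two glm fits with an F test (appropriate for the gaussian family):

anova(fit_glm_simple, fit_glm, test = "F")   # nested model comparison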

Bayesian

# package does not 'like' tibbles
BayesFactor::regressionBF(formula = murder_rate ~ low_income, 
            data = as.data.frame(murder_sub))
## Bayes factor analysis
## --------------
## [1] low_income : 3429.181 ±0.01%
## 
## Against denominator:
##   Intercept only 
## ---
## Bayes factor type: BFlinearModel, JZS

\(t\)-test scenario

IQ data

iq_data = readr::read_csv('../data/07_Kruschke_TwoGroupIQ.csv')
summary(iq_data)
##      Score           Group          
##  Min.   : 50.00   Length:120        
##  1st Qu.: 92.75   Class :character  
##  Median :102.00   Mode  :character  
##  Mean   :104.13                     
##  3rd Qu.:112.00                     
##  Max.   :208.00

possible research questions?

  1. is the average IQ score higher than 100 in the treatment group?
  2. is the average IQ score higher in the treatment group than in the control group? (next time)

from Kruschke (2015, Chapter 16)
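for comparison, the classical answer to question 1 is a one-sample t-test; a sketch, assuming the treatment group is coded as "Smart Drug" in the Group column (the label is an assumption about the data file):

# one-sample t-test: is the treatment group mean above 100?
# the group label "Smart Drug" is an assumption about the coding
treatment_scores = iq_data$Score[iq_data$Group == "Smart Drug"]
t.test(treatment_scores, mu = 100, alternative = "greater")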

Case 1: IQ-score higher than 100 in treatment group?

Bayesian GLM:

inspect posterior distribution over \(\beta_0\) and \(\sigma_{\text{err}}\) given the data \(y\) and the model:

\[ \begin{align*} y_{\text{pred}} & = \beta_0 & \ \ \ \ \ \ \ \ \ \ \ \ \ \ y & \sim \mathcal{N}(\mu = y_{\text{pred}}, \sigma_{\text{err}}) \\ \beta_0 & \sim \mathcal{N}(100, \sigma_{\beta}) & \ \ \ \ \ \ \ \ \ \ \ \ \ \ \frac{1}{\sigma_{\text{err}}^2} & \sim \text{Gamma}(0.1,0.1) \end{align*} \]

model{
  sigma_e = 1/sqrt(tau_e)
  tau_e ~ dgamma(0.1, 0.1)
  b0 ~ dnorm(100, 1/10000000)   # vague prior centered at 100
  for (i in 1:k){
    yPred[i] = b0               # intercept-only model
    y[i] ~ dnorm(yPred[i], tau_e)
  }
}

posterior inference: results

## # A tibble: 1 x 4
##   Parameter   HDI_lo     mean   HDI_hi
##      <fctr>    <dbl>    <dbl>    <dbl>
## 1        b0 101.5248 107.8332 114.2719
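with posterior samples at hand, question 1 can be addressed directly via the posterior probability that \(\beta_0\) exceeds 100 (a sketch, assuming the samples for b0 are in codaSamples as before):

# posterior probability that the treatment-group mean exceeds 100
b0_samples = as.matrix(codaSamples)[, "b0"]
mean(b0_samples > 100)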

summary

generalized linear model

 

terminology

  • \(y\) predicted variable, data, observation, …
  • \(X\) predictor variables for \(y\), explanatory variables, …

 

blueprint of a GLM

\[ \begin{align*} \eta & = \text{linear\_combination}(X) \\ \mu & = \text{link\_fun}( \ \eta, \theta_{\text{LF}} \ ) \\ y & \sim \text{lh\_fun}( \ \mu, \ \theta_{\text{LH}} \ ) \end{align*} \]

(figure: GLM scheme)

outlook

 

Friday

  • robust regression
  • GLM with nominal predictors
  • GLM with nominal & ordinal predicted variables

 

Tuesday

  • mixed effects models
  • model comparison by cross-validation