Hierarchical GLMs

Michael Franke

Case study: processing relative clauses

 

  • in most languages, subject relative clauses
    are easier to process than object relative clauses

  • but Chinese seems to be an exception

 

 

subject relative clause

The senator who interrogated the journalist …

object relative clause

The senator who the journalist interrogated …


data: self-paced reading times

37 subjects read 15 sentences each, presented either with an SRC or with an ORC, in a self-paced reading task

 

# A tibble: 15 × 4
    subj  item so       rt
   <dbl> <dbl> <chr> <dbl>
 1     1    13 1      1561
 2     1     6 -1      959
 3     1     5 1       582
 4     1     9 1       294
 5     1    14 -1      438
 6     1     4 -1      286
 7     1     8 -1      438
 8     1    10 -1      278
 9     1     2 -1      542
10     1    11 1       494
11     1     7 1       270
12     1     3 1       406
13     1    16 -1      374
14     1    15 1       286
15     1     1 1       246

\(\Leftarrow\) contrast coding of the categorical predictor so (-1 = SRC, 1 = ORC)

data from Gibson & Wu (2013)

inspect data

# A tibble: 2 × 2
  so    mean_log_rt
  <chr>       <dbl>
1 -1           6.10
2 1            6.02
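The summary above can be reproduced directly from the raw data; a minimal dplyr sketch, assuming the data live in a tibble called rt_data with columns so and rt (the name rt_data matches the brms output further below):

library(dplyr)

# mean log-reading time per level of the contrast-coded predictor `so`
rt_data |>
  group_by(so) |>
  summarize(mean_log_rt = mean(log(rt)))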

fixed effects model

  • predict log-reading times as affected by treatment so

  • assume improper priors for parameters

 

\[
\begin{align*}
\log(\mathtt{rt}_i) & \sim \mathcal{N}(\eta_i, \sigma_{err}) &
\eta_{i} & = \beta_0 + \beta_1 \mathtt{so}_i \\
\sigma_{err} & \sim \mathcal{U}(0, \infty) &
\beta_0, \beta_1 & \sim \mathcal{U}(-\infty, \infty)
\end{align*}
\]
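In brms this model corresponds to a plain regression formula; a minimal sketch, reusing the data set name rt_data from the output below (brms' defaults give flat, improper priors on the coefficients; its default prior on sigma is weakly informative rather than strictly uniform, so the match to the specification above is approximate):

library(brms)

# fixed effects only: log-reading time as a function of `so`
fit_FE <- brm(
  formula = log(rt) ~ so,
  data    = rt_data
)
summary(fit_FE)  # produces the output shown below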

fixed effects model: results

 

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: log(rt) ~ so 
   Data: rt_data (Number of observations: 547) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     6.10      0.04     6.03     6.17 1.00     4102     3064
so1          -0.08      0.05    -0.17     0.02 1.00     4434     2854

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.60      0.02     0.57     0.64 1.00     3996     2748

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).


varying intercepts model

  • predict log-reading times as affected by treatment so

  • assume improper priors for parameters

  • assume that different subjects and items could be “slower” or “faster” throughout

 

\[
\begin{align*}
\log(\mathtt{rt}_i) & \sim \mathcal{N}(\eta_i, \sigma_{err}) &
\eta_{i} & = \beta_0 + \underbrace{u_{0,\mathtt{subj}_i} + w_{0,\mathtt{item}_i}}_{\text{varying intercepts}} + \beta_1 \mathtt{so}_i \\
u_{0,\mathtt{subj}_i} & \sim \mathcal{N}(0, \sigma_{u_0}) &
w_{0,\mathtt{item}_i} & \sim \mathcal{N}(0, \sigma_{w_0}) \\
\sigma_{err}, \sigma_{u_0}, \sigma_{w_0} & \sim \mathcal{U}(0, \infty) &
\beta_0, \beta_1 & \sim \mathcal{U}(-\infty, \infty)
\end{align*}
\]
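A sketch of the corresponding brms call; the terms (1 | subj) and (1 | item) add varying intercepts per subject and per item:

# varying intercepts for subjects and items
fit_VarInt <- brm(
  formula = log(rt) ~ so + (1 | subj) + (1 | item),
  data    = rt_data
)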

varying intercepts model: results

 

interc.+slopes model

  • predict log-reading times as affected by treatment so

  • assume improper priors for parameters

  • assume that different subjects and items could be “slower” or “faster” throughout

  • assume that different subjects and items react more or less strongly to the so manipulation

 

\[
\begin{align*}
\log(\mathtt{rt}_i) & \sim \mathcal{N}(\eta_i, \sigma_{err})\\
\eta_{i} & = \beta_0 + \underbrace{u_{0,\mathtt{subj}_i} + w_{0,\mathtt{item}_i}}_{\text{varying intercepts}} + (\beta_1 + \underbrace{u_{1,\mathtt{subj}_i} + w_{1,\mathtt{item}_i}}_{\text{varying slopes}}) \, \mathtt{so}_i
\end{align*}
\]
\[
\begin{align*}
u_{0,\mathtt{subj}_i} & \sim \mathcal{N}(0, \sigma_{u_0}) & w_{0,\mathtt{item}_i} & \sim \mathcal{N}(0, \sigma_{w_0}) \\
u_{1,\mathtt{subj}_i} & \sim \mathcal{N}(0, \sigma_{u_1}) & w_{1,\mathtt{item}_i} & \sim \mathcal{N}(0, \sigma_{w_1}) \\
\sigma_{err}, \sigma_{u_{0|1}}, \sigma_{w_{0|1}} & \sim \mathcal{U}(0, \infty) & \beta_0, \beta_1 & \sim \mathcal{U}(-\infty, \infty)
\end{align*}
\]
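A sketch of the corresponding brms call; the double-bar syntax || adds varying intercepts and slopes while suppressing the correlation parameters, matching the model above:

# varying intercepts and slopes, without correlations (`||`)
fit_VarIntSlo <- brm(
  formula = log(rt) ~ so + (1 + so || subj) + (1 + so || item),
  data    = rt_data
)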

interc.+slopes model: results

 

interc.+slopes model w/ correlation

  • predict log-reading times as affected by treatment so

  • assume improper priors for parameters

  • assume that different subjects and items could be “slower” or “faster” throughout

  • assume that different subjects and items react more or less strongly to the so manipulation

  • assume that random intercepts and slopes might be correlated

\[
\begin{align*}
\log(\mathtt{rt}_i) & \sim \mathcal{N}(\eta_i, \sigma_{err}) \\
\eta_{i} & = \beta_0 + u_{0,\mathtt{subj}_i} + w_{0,\mathtt{item}_i} + \left(\beta_1 + u_{1,\mathtt{subj}_i} + w_{1,\mathtt{item}_i}\right) \mathtt{so}_i \\
\begin{pmatrix} u_{0,\mathtt{subj}_i} \\ u_{1,\mathtt{subj}_i} \end{pmatrix} & \sim \mathcal{N}\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \Sigma_{u}\right) \\
\Sigma_{u} & = \begin{pmatrix} \sigma_{u_0}^2 & \rho_u \sigma_{u_0} \sigma_{u_1} \\ \rho_u \sigma_{u_0} \sigma_{u_1} & \sigma_{u_1}^2 \end{pmatrix} \quad \text{same for } \mathtt{item} \\
\beta_0, \beta_1 & \sim \mathcal{U}(-\infty, \infty) \qquad \rho_u, \rho_w \sim \mathcal{U}(-1, 1) \qquad \sigma_{err}, \sigma_{u_{0|1}}, \sigma_{w_{0|1}} \sim \mathcal{U}(0, \infty)
\end{align*}
\]
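A sketch of the corresponding brms call; with the single-bar syntax |, brms also estimates the correlation between varying intercepts and slopes:

# varying intercepts and slopes, with correlations (`|`)
fit_MaxRE <- brm(
  formula = log(rt) ~ so + (1 + so | subj) + (1 + so | item),
  data    = rt_data
)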

interc.+slopes model w/ corr.: results

 

How to choose RE-structure

  • two approaches:
    1. “keep it maximal”
       • include the maximal RE structure that “makes sense”
       • what makes sense can depend on a priori conceptual considerations
       • the data might not be sufficient to estimate some RE coefficients
    2. “let the data decide”
       • fit models with varying RE structures and compare them (see the sketch below)
  • the former is more careful / prudent in a scientific context (learning about the world from the model and the data); the latter may be more adequate in an engineering context (predicting well enough with efficient models)
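A sketch of what “letting the data decide” could look like, comparing the four models by approximate leave-one-out cross-validation (the fit names are the hypothetical ones introduced in the sketches above):

# compare predictive performance of the four RE structures via LOO-CV
loo_compare(
  loo(fit_FE),
  loo(fit_VarInt),
  loo(fit_VarIntSlo),
  loo(fit_MaxRE)
)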