02c: Categorical predictors (exercises)

Bayesian regression: theory & practice

Author

Michael Franke & Timo Roettger

Preamble

Here is code to load (and if necessary, install) required packages, and to set some global options (for plotting and efficient fitting of Bayesian models).

Toggle code

# install packages from CRAN (unless installed)
pckgs_needed <- c(
  "tidyverse",
  "brms",
  "rstan",
  "rstanarm",
  "remotes",
  "tidybayes",
  "bridgesampling",
  "shinystan",
  "mgcv"
)
pckgs_installed <- installed.packages()[,"Package"]
pckgs_2_install <- pckgs_needed[!(pckgs_needed %in% pckgs_installed)]
if(length(pckgs_2_install)) {
  install.packages(pckgs_2_install)
} 

# install additional packages from GitHub (unless installed)
if (! "aida" %in% pckgs_installed) {
  remotes::install_github("michael-franke/aida-package")
}
if (! "faintr" %in% pckgs_installed) {
  remotes::install_github("michael-franke/faintr")
}
if (! "cspplot" %in% pckgs_installed) {
  remotes::install_github("CogSciPrag/cspplot")
}

# load the required packages
x <- lapply(pckgs_needed, library, character.only = TRUE)
library(aida)
library(faintr)
library(cspplot)

# these options help Stan run faster
options(mc.cores = parallel::detectCores())

# use the CSP-theme for plotting
theme_set(theme_csp())

# global color scheme from CSP
project_colors = cspplot::list_colors() |> pull(hex)
# names(project_colors) <- cspplot::list_colors() |> pull(name)

# setting theme colors globally
scale_colour_discrete <- function(...) {
  scale_colour_manual(..., values = project_colors)
}
scale_fill_discrete <- function(...) {
   scale_fill_manual(..., values = project_colors)
}

Toggle code

dolphin <- aida::data_MT

Exercise 1: Regression w/ multiple categorical predictors

We want to regress log(RT) against the full combination of categorical factors group, condition, and prototype_label.

log(RT) ~ group * condition * prototype_label

The research hypotheses we would like to investigate are:

Typical trials are faster than atypical ones.
CoM trials are slower than the other kinds of trials (straight and curved) together, and respectively.
‘straight’ trials are faster than ‘curved’ trials.
Click trials are slower than touch trials.

But for this to work (without at least mildly informative priors), we would need to have a sufficient amount of observations in each cell. So, let’s check:

Toggle code

dolphin |>
  mutate(group = as_factor(group),
         condition = as_factor(condition),
         prototype_label = as_factor(prototype_label)) |>
  count(group, condition, prototype_label, .drop = FALSE) |>
  arrange(n)

# A tibble: 20 × 4
   group condition prototype_label     n
   <fct> <fct>     <fct>           <int>
 1 touch Atypical  dCoM2               0
 2 touch Typical   dCoM2               0
 3 touch Atypical  dCoM                9
 4 click Atypical  dCoM2              11
 5 click Typical   dCoM2              11
 6 touch Typical   dCoM               14
 7 touch Atypical  cCoM               21
 8 touch Typical   cCoM               31
 9 click Atypical  cCoM               32
10 click Atypical  curved             36
11 touch Atypical  curved             37
12 click Typical   cCoM               48
13 click Atypical  dCoM               50
14 click Typical   dCoM               52
15 touch Typical   curved             72
16 click Typical   curved             84
17 click Atypical  straight          189
18 touch Atypical  straight          263
19 click Typical   straight          494
20 touch Typical   straight          598

So, there are cells for which we have no observations at all. For simplicity, we therefore just lump all “change of mind”-type trajectories into one category:

Toggle code

dolphin_prepped <-
  dolphin |>
  mutate(
    prototype_label = case_when(
     prototype_label %in% c('curved', 'straight') ~ prototype_label,
     TRUE ~ 'CoM'
    ),
    prototype_label = factor(prototype_label,
                             levels = c('straight', 'curved', 'CoM')))

dolphin_prepped |>
  select(RT, prototype_label)

# A tibble: 2,052 × 2
      RT prototype_label
   <dbl> <fct>          
 1   950 straight       
 2  1251 straight       
 3   930 curved         
 4   690 curved         
 5   951 CoM            
 6  1079 CoM            
 7  1050 CoM            
 8   830 straight       
 9   700 straight       
10   810 straight       
# ℹ 2,042 more rows

Here is a plot of the data to be analyzed:

Toggle code

dolphin_prepped |>
  ggplot(aes(x = log(RT), fill = condition)) +
  geom_density(alpha = 0.4) +
  facet_grid(group ~ prototype_label)

Exercise 1a

Use brm() to run a linear regression model for the data set dolphin_prepped and the formula:

log(RT) ~ group * condition * prototype_label

Set the prior for all population-level slope coefficients to a reasonable, weakly-informative but unbiased prior.

Solution

Toggle code

fit <- brm(
  formula = log(RT) ~ group * condition * prototype_label,
  prior   = prior(student_t(1, 0, 3), class = "b"),
  data    = dolphin_prepped
  )

Exercise 1b

Plot the posteriors for population-level slope coefficients using the tidybayes package in order to:

determine which combination of factor levels is the default cell
check which coefficients have 95% CIs that do not include zero
try to use this latter information to address any of our research hypotheses (stated above)

Solution

Toggle code

tidybayes::summarise_draws(fit)

# A tibble: 15 × 10
   variable       mean   median      sd     mad       q5      q95  rhat ess_bulk
   <chr>         <num>    <num>   <num>   <num>    <num>    <num> <num>    <num>
 1 b_Interce…  7.63e+0  7.63e+0 0.0337  0.0337   7.58e+0  7.69e+0  1.00    1953.
 2 b_groupto… -2.39e-1 -2.39e-1 0.0442  0.0429  -3.12e-1 -1.67e-1  1.00    1772.
 3 b_conditi… -2.46e-1 -2.46e-1 0.0390  0.0395  -3.10e-1 -1.81e-1  1.00    1776.
 4 b_prototy… -5.14e-2 -5.07e-2 0.0823  0.0833  -1.88e-1  8.79e-2  1.00    1829.
 5 b_prototy…  1.77e-1  1.78e-1 0.0574  0.0584   8.11e-2  2.68e-1  1.00    1981.
 6 b_groupto…  1.25e-2  1.32e-2 0.0521  0.0513  -7.50e-2  9.85e-2  1.00    1660.
 7 b_groupto…  1.34e-1  1.33e-1 0.115   0.115   -4.94e-2  3.21e-1  1.00    1847.
 8 b_groupto…  3.34e-3  1.79e-3 0.106   0.107   -1.69e-1  1.76e-1  1.00    2288.
 9 b_conditi…  3.16e-2  3.16e-2 0.0980  0.0993  -1.29e-1  1.95e-1  1.00    1852.
10 b_conditi…  4.50e-2  4.53e-2 0.0743  0.0764  -7.51e-2  1.71e-1  1.00    1970.
11 b_groupto… -1.37e-1 -1.37e-1 0.137   0.136   -3.59e-1  9.15e-2  1.00    1932.
12 b_groupto…  5.35e-2  5.12e-2 0.137   0.137   -1.68e-1  2.78e-1  1.00    2117.
13 sigma       4.60e-1  4.60e-1 0.00716 0.00703  4.49e-1  4.72e-1  1.00    4666.
14 lprior     -2.79e+1 -2.79e+1 0.0129  0.0101  -2.79e+1 -2.79e+1  1.00    1574.
15 lp__       -1.35e+3 -1.35e+3 2.55    2.47    -1.35e+3 -1.34e+3  1.00    1564.
# ℹ 1 more variable: ess_tail <num>

Toggle code

tidybayes::gather_draws(fit, `b_.*`, regex = TRUE) |>
  filter(.variable != "b_Intercept") |>
  ggplot(aes(y = .variable, x = .value)) +
  tidybayes::stat_halfeye() +
  labs(x = "", y = "") +
  geom_segment(x = 0, xend = 0, y = Inf, yend = -Inf,
               lty = "dashed")

The default cell is for click-atypical-straight. The coeffiencents with 95% CIs that do not include zero are: grouptouch, conditionTypical, prototype_labelCoM. None of these give us direct information about our research hypotheses.

Exercise 1c

Use the faintr package to get information relevant for the current research hypotheses. Interpret each result with respect to what we may conclude from it.

Solution

Toggle code

# 1. Typical trials are faster than atypical ones.
# -> There is overwhelming evidence that this is true
#    (given the data and the model).

faintr::compare_groups(
  fit,
  lower  = condition == 'Typical',
  higher = condition == 'Atypical'
)

Outcome of comparing groups: 
 * higher:  condition == "Atypical" 
 * lower:   condition == "Typical" 
Mean 'higher - lower':  0.2284 
95% HDI:  [ 0.17 ; 0.2894 ]
P('higher - lower' > 0):  1 
Posterior odds:  Inf

Toggle code

# 2. CoM trials are slower than the other kinds of trials
#    (straight and curved) together, and respectively.
# -> There is overwhelming evidence that this is true
#    (given the data and the model).

faintr::compare_groups(
  fit,
  lower  = prototype_label != 'CoM',
  higher = prototype_label == 'CoM'
)

Outcome of comparing groups: 
 * higher:  prototype_label == "CoM" 
 * lower:   prototype_label != "CoM" 
Mean 'higher - lower':  0.2158 
95% HDI:  [ 0.1424 ; 0.2821 ]
P('higher - lower' > 0):  1 
Posterior odds:  Inf

Toggle code

faintr::compare_groups(
  fit,
  lower  = prototype_label == 'straight',
  higher = prototype_label == 'CoM'
)

Outcome of comparing groups: 
 * higher:  prototype_label == "CoM" 
 * lower:   prototype_label == "straight" 
Mean 'higher - lower':  0.2143 
95% HDI:  [ 0.1521 ; 0.2815 ]
P('higher - lower' > 0):  1 
Posterior odds:  Inf

Toggle code

# 3. 'straight' trials are faster than 'curved' trials.
# -> There is no evidence for this hypothesis
#    (given the data and the model).

faintr::compare_groups(
  fit,
  lower  = prototype_label == 'straight',
  higher = prototype_label == 'curved'
)

Outcome of comparing groups: 
 * higher:  prototype_label == "curved" 
 * lower:   prototype_label == "straight" 
Mean 'higher - lower':  -0.003049 
95% HDI:  [ -0.06976 ; 0.06543 ]
P('higher - lower' > 0):  0.4655 
Posterior odds:  0.8709

Toggle code

# 4. Click trials are slower than touch trials.
# -> There is overwhelming evidence that this is true
#    (given the data and the model).

faintr::compare_groups(
  fit,
  lower  = group == 'touch',
  higher = group == 'click'
)

Outcome of comparing groups: 
 * higher:  group == "click" 
 * lower:   group == "touch" 
Mean 'higher - lower':  0.2014 
95% HDI:  [ 0.1385 ; 0.2603 ]
P('higher - lower' > 0):  1 
Posterior odds:  Inf

Exercise 2: Regression w/ metric & categorical predictors

Exercise 2a

Create a new data frame that contains only the mean values of the RT, and MAD for each animal (exemplar) and for correct and incorrect responses. Print out the head of the new data frame.

Solution

Toggle code

# aggregate
dolphin_agg <- dolphin |> 
  group_by(exemplar, correct) |> 
  dplyr::summarize(MAD = mean(MAD, na.rm = TRUE),
                   RT = mean(RT, na.rm = TRUE))
  
# let's have a look
head(dolphin_agg)

# A tibble: 6 × 4
# Groups:   exemplar [3]
  exemplar  correct   MAD    RT
  <chr>       <dbl> <dbl> <dbl>
1 alligator       0 271.  4246.
2 alligator       1  87.4 1717.
3 bat             0 176.  3334.
4 bat             1 182.  2252.
5 butterfly       0  15.0 1636.
6 butterfly       1 145.  1761.

Exercise 2b

Run a linear regression using brms. MAD is the dependent variable (i.e. the measure) and both RT and correct are independent variables (MAD ~ RT + correct). (Hint: the coefficients might be really small, so make sure the output is printed with enough numbers after the comma.)

Try to understand the coefficient table. There is one coefficient for RT and one coefficient for correct which gives you the change in MAD from incorrect to correct responses.

Solution

Toggle code

# specify the model 
model2 = brm(
  # model formula
  MAD ~ RT + correct, 
  # data
  data = dolphin_agg
  )

print(summary(model2), digits = 5)

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: MAD ~ RT + correct 
   Data: dolphin_agg (Number of observations: 36) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
           Estimate Est.Error  l-95% CI u-95% CI    Rhat Bulk_ESS Tail_ESS
Intercept  21.43821  36.00766 -47.86667 92.78947 0.99987     4281     3191
RT          0.06231   0.01541   0.03179  0.09370 1.00072     4111     3277
correct   -15.08456  26.27864 -68.47390 36.01476 1.00147     4043     2838

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI    Rhat Bulk_ESS Tail_ESS
sigma 77.51173   9.58571 61.48184 98.87038 1.00061     3640     3036

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Exercise 2c

Plot a scatter plot of MAD ~ RT and color code it for correct responses. (Hint: Make sure that correct is treated as a factor and not a numeric vector). Draw two predicted lines into the scatterplot. One for correct responses (“lightblue”) and one for incorrect responses (“orange”).

Solution

Toggle code

dolphin_agg$correct <- as.factor(as.character(dolphin_agg$correct))

# extract model parameters:
model_intercept <- summary(model2)$fixed[1,1]
model_RT <- summary(model2)$fixed[2,1]
model_correct <- summary(model2)$fixed[3,1]

# plot
ggplot(data = dolphin_agg, 
       aes(x = RT, 
           y = MAD,
           color = correct)) + 
  geom_abline(intercept = model_intercept, slope = model_RT, color = "orange", size  = 2) +
  geom_abline(intercept = model_intercept + model_correct , slope = model_RT, color = "lightblue",size  = 2) +
  geom_point(size = 3, alpha = 0.3)

Exercise 2d

Extract the posteriors for the coefficients of both RT and correct from the model output (use the spread_draws() function), calculate their means and a 67% Credible Interval. Print out the head of the aggregated dataframe.

Solution

Toggle code

# get posteriors for the relevant coefficients
posteriors2 <- model2 |>
  spread_draws(b_RT, b_correct) |>
  select(b_RT, b_correct) |> 
  gather(key = "parameter", value = "posterior")

# aggregate
posteriors2_agg <- posteriors2 |> 
  group_by(parameter) |> 
  summarise(mean_posterior = mean(posterior),
            `67lowerCrI` = HDInterval::hdi(posterior, credMass = 0.67)[1],
            `67higherCrI` = HDInterval::hdi(posterior, credMass = 0.67)[2]
            )

# print out
posteriors2_agg

# A tibble: 2 × 4
  parameter mean_posterior `67lowerCrI` `67higherCrI`
  <chr>              <dbl>        <dbl>         <dbl>
1 b_RT              0.0623       0.0473        0.0762
2 b_correct       -15.1        -39.0          11.0

Exercise 2e

Plot the scatterplot from 2c and plot 50 sample tuples for the regression lines for correct and incorrect responses.

Solution

Toggle code

# sample 50 random numbers from 4000 samples
random_50 <- sample(1:4000, 50, replace = FALSE)
  
# wrangle data frame
posteriors3 <- model2 |>
  spread_draws(b_Intercept, b_RT, b_correct) |>
  select(b_Intercept, b_RT, b_correct) |> 
  # filter by the row numbers in random_50
  slice(random_50)
  
# plot
ggplot(data = dolphin_agg, 
       aes(x = RT, 
           y = MAD, 
           color = correct)) + 
  geom_abline(data = posteriors3,
              aes(intercept = b_Intercept, slope = b_RT), 
              color = "orange", size  = 0.1) +
  geom_abline(data = posteriors3,
              aes(intercept = b_Intercept + b_correct, slope = b_RT), 
              color = "lightblue", size  = 0.1) +
  geom_point(size = 3, alpha = 0.3)

Exercise 2f

Given our model and our data, calculate the evidence ratio of correct responses exhibiting larger MADs than incorrect responses. How would you interpret the result?

Solution

Toggle code

hypothesis(model2, 'correct > 0')

Hypothesis Tests for class b:
     Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (correct) > 0   -15.08     26.28   -58.72    27.46       0.39      0.28     
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.

Exercise 3: Metric and categorical variables, and their interaction

Here is an aggregated data set dolphin_agg for you.

Toggle code

# aggregate
dolphin_agg <- dolphin %>% 
  group_by(group, exemplar) %>% 
  dplyr::summarize(MAD = median(MAD, na.rm = TRUE),
                   RT = median(RT, na.rm = TRUE)) %>% 
  mutate(log_RT = log(RT))

Exercise 3a

Standardize (“z-transform”) log_RT such that the mean is at zero and 1 unit corresponds to the standard deviation. Name it log_RT_s. (Hint: use function scale().)

Solution

Toggle code

dolphin_agg$log_RT_s <- scale(dolphin_agg$log_RT, scale = TRUE)

Exercise 3b

Run a linear model with brms that predicts MAD based on log_RT_s, group, and their two-way interaction.

Solution

Toggle code

model1 = brm(
  MAD ~ log_RT_s * group, 
  data = dolphin_agg
  )

Exercise 3c

Plot MAD (y) against log_RT_s (x) in a scatter plot and color-code for group. Plot the regression lines for the click and the touch group into the plot and don’t forget to take possible interactions into account.

Solution

Toggle code

# extract posterior means for model coefficients
Intercept = summary(model1)$fixed[1,1]
log_RT = summary(model1)$fixed[2,1]
group = summary(model1)$fixed[3,1]
interaction = summary(model1)$fixed[4,1]

# plot
ggplot(data = dolphin_agg, 
       aes(x = log_RT_s, 
           y = MAD, 
           color = group)) + 
  geom_point(size = 3, alpha = 0.3) +
  geom_vline(xintercept = 0, lty = "dashed") +
  geom_abline(intercept = Intercept, slope = log_RT, 
              color = project_colors[1], size = 2) +
  geom_abline(intercept = Intercept + group, slope = log_RT + interaction, 
              color = project_colors[2], size = 2)

Exercise 3d

Specify very skeptic priors for all three coefficients. Use a normal distribution with mean = 0, and sd = 10. Rerun the model with those priors.

Solution

Toggle code

# specify priors
priors_model2 <- c(
  set_prior("normal(0,10)", class = "b", coef = "log_RT_s"),
  set_prior("normal(0,10)", class = "b", coef = "grouptouch"),
  set_prior("normal(0,10)", class = "b", coef = "log_RT_s:grouptouch")
)

# model
model2 = brm(
  MAD ~ log_RT_s * group, 
  data = dolphin_agg,
  prior = priors_model2
  )

Exercise 3e

Compare the model output of model1 to model2. What are the differences and what are the reasons for these differences?

Solution

Toggle code

# We can compare the model predictions by looking at the coefficients / plotting them:
summary(model1)

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: MAD ~ log_RT_s * group 
   Data: dolphin_agg (Number of observations: 38) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept              66.56      7.64    51.96    82.15 1.00     2796     2323
log_RT_s               16.59      7.64     1.38    31.49 1.00     2262     2240
grouptouch            -35.46     10.95   -57.49   -14.08 1.00     3493     2792
log_RT_s:grouptouch   -11.24     11.30   -33.22    11.26 1.00     2513     2414

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma    27.65      3.38    21.90    34.97 1.00     3162     2551

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Toggle code

summary(model2)

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: MAD ~ log_RT_s * group 
   Data: dolphin_agg (Number of observations: 38) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept              60.33      6.18    48.20    72.36 1.00     4377     2544
log_RT_s               13.71      5.25     3.46    24.08 1.00     3590     2974
grouptouch            -17.79      7.38   -32.08    -2.88 1.00     4320     2518
log_RT_s:grouptouch    -0.96      7.23   -14.97    13.30 1.00     4071     3225

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma    28.53      3.51    22.64    36.08 1.00     3855     2858

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Toggle code

# extract posterior means for model coefficients
Intercept = summary(model2)$fixed[1,1]
log_RT = summary(model2)$fixed[2,1]
group = summary(model2)$fixed[3,1]
interaction = summary(model2)$fixed[4,1]

# plot
ggplot(data = dolphin_agg, 
       aes(x = log_RT_s, 
           y = MAD, 
           color = group)) + 
  geom_point(size = 3, alpha = 0.3) +
  geom_vline(xintercept = 0, lty = "dashed") +
  geom_abline(intercept = Intercept, slope = log_RT, 
              color = project_colors[1], size = 2) +
  geom_abline(intercept = Intercept + group, slope = log_RT + interaction, 
              color = project_colors[1], size = 2)

The magnitude of the coefficients is much smaller in model2, with the interaction term being close to zero. As a result, the lines in the plot are closer together and run in parallel. The reason for this change lies in the priors. We defined the priors of model2 rather narrowly, down-weighing data points larger or smaller than zero. This is a case of the prior dominating the posterior.