14.2 Single multi-level predictor

badge-mental-chronometry

The 0/1 coding scheme above works fine for a single categorical predictor value with two levels. It is possible to use linear regression also for categorical predictors with more than two levels. Only, in that case, there are quite a few more reasonable contrast coding schemes, i.e., ways to choose numbers to encode the levels of the predictor.

The mental chronometry data has a single categorical predictor, called block, with three levels, called “reaction”, “goNoGo” and “discrimination”. We are interested in regressing reaction times, stored in variable RT, against block. Our main question of interest is whether these inequalities are supported by the data:

\[ \text{RT in 'reaction'} < \text{RT in 'goNoGo'} < \text{RT in 'discrimination'} \] So we are interested in the \(\delta\)s, so to speak, between ‘reaction’ and ‘goNoGo’ and between ‘discrimination’ and ‘goNoGo’.

Let’s consider only the data relevant for our current purposes:

# select the relevant columns
data_MC_excerpt <- aida::data_MC_cleaned %>% 
  select(RT, block)

# show the first couple of lines
data_MC_excerpt %>% 
  head(5)
## # A tibble: 5 × 2
##      RT block   
##   <dbl> <ord>   
## 1   311 reaction
## 2   269 reaction
## 3   317 reaction
## 4   325 reaction
## 5   240 reaction

Here are the means of the reaction times for different block levels:

data_MC_excerpt %>% 
  group_by(block) %>% 
  summarize(mean_RT = mean(RT))
## # A tibble: 3 × 2
##   block          mean_RT
##   <ord>            <dbl>
## 1 reaction          300.
## 2 goNoGo            427.
## 3 discrimination    488.

And here is a plot of the distribution of measurements in each block:

To fit this model with brms, we need a simple function call with the formula RT ~ block that precisely describes what we are interested in, namely explaining reaction times as a function of the experimental condition:

fit_brms_mc <- brm(
  formula = RT ~ block,
  data = data_MC_excerpt
)

To inspect the posterior fits of this model, we can extract the relevant summary statistics as before:

summary(fit_brms_mc)$fixed[,c("l-95% CI", "Estimate", "u-95% CI")]
##            l-95% CI  Estimate  u-95% CI
## Intercept 400.98876 404.90896 408.87119
## block.L   127.21126 132.80967 138.33805
## block.Q   -34.71398 -27.28713 -19.86181

Notice that there is an intercept term, as before. This corresponds to the mean reaction time of the reference level, which is again set based on alphanumeric ordering, so corresponding to “discrimination”. There are two slope coefficients, one for the difference between the reference level and “goNoGo” and another for the difference between the reference level and the “reaction” condition.

These intercepts are estimated to be credibly negative, suggesting that the “discrimination” condition indeed had the highest mean reaction times. This answers one half of the comparisons we are interested in:

\[ \text{RT in 'reaction'} < \text{RT in 'goNoGo'} < \text{RT in 'discrimination'} \] Unfortunately, it is not directly possible to read off information about the second comparison we care about, namely the comparison between “reaction” and “goNoGo”. And here is where we see the point of contrast coding pop up for the first time. We would like to encode predictor levels ideally in such a way that we can read off (test) the hypotheses we care about directly. In other words, if possible, we would like to have parameters in our model in the form of slope coefficients, which directly encode the \(\delta\)s, so to speak, that we want to test.65

In the case at hand, all we need to do is change the reference level. If the reference level is the “middle category” (as per our ordered hypothesis), the two slopes will express the contrasts we care about. To change the reference level, we only need to make block a factor and order its levels manually, like so:

data_MC_excerpt <- data_MC_excerpt %>% 
  mutate(block_reordered = factor(block, levels = c("goNoGo", "reaction", "discrimination")))

We then run another Bayesian regression model, regressing RT against block_reordered.

fit_brms_mc_reordered <- brm(
  formula = RT ~ block_reordered,
  data = data_MC_excerpt
)

And inspect the summary of the posterior samples for the relevant coefficients:

summary(fit_brms_mc_reordered)$fixed[,c("l-95% CI", "Estimate", "u-95% CI")]
##                    l-95% CI  Estimate  u-95% CI
## Intercept         401.16332 404.89229 408.64559
## block_reordered.L  35.94682  42.79215  49.73385
## block_reordered.Q 122.74315 128.61676 134.46136

Now the “Intercept” corresponds to the new reference level “goNoGo”. And the two slope coefficients give the differences to the other two levels.

Which numeric encoding leads to this result? In formulaic terms, we have three coefficients \(\beta_0, \dots, \beta_2\). The predicted mean value for observation \(i\) is \(\xi_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}\). We assign numeric value \(1\) for predictor \(x_1\) when the observation is from the “reaction” block. We assign numeric value \(1\) for predictor \(x_2\) when the observation is from the “discrimination” block. Schematically, what we now have is:

## # A tibble: 3 × 4
##   block            x_0   x_1   x_2
##   <chr>          <dbl> <dbl> <dbl>
## 1 goNoGo             1     0     0
## 2 reaction           1     1     0
## 3 discrimination     1     0     1

As we may have expected, the 95% inter-quantile range for both slope coefficients (which, given the amount of data we have, is almost surely almost identical to the 95% HDI) does not include 0 by a very wide margin. We could therefore conclude that, based on a Bayesian approach to hypothesis testing in terms of posterior estimation, the reaction times of conditions are credibly different.

The coding of levels in terms of a reference level is called treatment coding, or also dummy coding. The video included at the beginning of this chapter discusses further contrast coding schemes, and also shows in more detail how a coding scheme translates into “directly testable” hypotheses.

Exercise 14.2

Suppose that there are three groups, A, B, and C as levels of your predictor. You want the regression intercept to be the mean of group A. You want the first slope to be the difference between the means of group B and group A. And, you want the second slope to be the difference between the mean of C and B. How do you numerically encode these contrasts in terms of numeric predictor values?

Schematically, like this:

## # A tibble: 3 × 4
##   group   x_0   x_1   x_2
##   <chr> <dbl> <dbl> <dbl>
## 1 A         1     0     0
## 2 B         1     1     0
## 3 C         1     1     1

As group A is a reference category, \(\beta_0\) expresses the mean reaction time of group A. The mean reaction time of group B is \(\beta_0 + \beta_1\), so we need \((x_{i1} =1 , x_{i2} = 0)\) for any \(i\) which is of group B. In the text above, the mean reaction time of group C is given by \(\beta_0 + \beta_2\). However, the value we need now is given by \(\beta_0 + \beta_1 + \beta_2\), so \((x_{i1} =1 , x_{i2} = 1)\).


  1. To be precise, it is possible to also test derived random variables from the posterior samples. So, it is not necessary to encode the contrasts of interests directly. But, most often, in Bayesian analyses it will make sense to put priors on exactly these \(\delta\)s (e.g., skeptical priors biased against a hypothesis to be tested) and for that purpose, it is (almost) practically necessary to have the relevant contrasts expressed as slope coefficients in the model.↩︎