Here is code to load (and if necessary, install) required packages, and to set some global options (for plotting and efficient fitting of Bayesian models).
Code
# install packages from CRAN (unless installed)pckgs_needed <-c("tidyverse","brms","remotes","tidybayes")pckgs_installed <-installed.packages()[,"Package"]pckgs_2_install <- pckgs_needed[!(pckgs_needed %in% pckgs_installed)]if(length(pckgs_2_install)) {install.packages(pckgs_2_install)} # install additional packages from GitHub (unless installed)if (!"aida"%in% pckgs_installed) { remotes::install_github("michael-franke/aida-package")}if (!"faintr"%in% pckgs_installed) { remotes::install_github("michael-franke/faintr")}if (!"cspplot"%in% pckgs_installed) { remotes::install_github("CogSciPrag/cspplot")}# load the required packagesx <-lapply(pckgs_needed, library, character.only =TRUE)library(aida)library(faintr)library(cspplot)# these options help Stan run fasteroptions(mc.cores = parallel::detectCores())# use the CSP-theme for plottingtheme_set(theme_csp())# global color scheme from CSPproject_colors = cspplot::list_colors()[c(1,3,4,5,2,6:14),"hex", drop =TRUE]# setting theme colors globallyscale_colour_discrete <-function(...) {scale_colour_manual(..., values = project_colors)}scale_fill_discrete <-function(...) {scale_fill_manual(..., values = project_colors)}
Generalized (non-)linear mixed effect models are a powerful statistical tool that gains increasing popularity for data analysis in cognitive science and many other disciplines. This tutorial will provide an overview of different categorical variable coding schemes used in mixed effect models. We will look at two example data sets from factorial-design experiments with categorical predictors and a continuous dependent variable which we will analyze using a Bayesian approach.
Introduction
Whereas models involving continuous predictors like height, weight or reaction time might be interpreted rather intuitively, models involving (very common!) categorical variables like gender or experimental conditions (did the participant read text1 or text2?) are often confusing. However, it is important to understand how to deal with such categorical variables mathematically because different decisions we make about the mathematical procedures influence the interpretation of the regression results.
This hands-on tutorial should provide an overview of different categorical variable coding schemes and walk you through applying and interpreting the output of a regression model fitted to variables coded in those different schemes. This tutorial presupposes a basic familiarity with R, linear regression and Bayesian data analysis with the brms R package (Buerkner, 2016). However, all the details of Bayesian analysis need not be understood. Avoiding too many mathematical details, the focus of this tutorial is rather conceptual - to convey in intuitive terms how categorical variables are translated to mathematical terms and how differences in this ‘translation’ influence the interpretation of a regression model. These conceptual ideas are also not bound to Bayesian analysis and are directly applicable in frequentist modeling tools (e.g. lmer package in R).
Most importantly, we should remember: any statistics of the data are just a tool for us to be able to draw conclusions about our research questions. The numbers that R (or any other application) computes don’t tell us anything - it’s our understanding of those tools and the interpretation of the numbers that provides answers to scientific questions!
First, we will shortly recap how linear regression models look like. Then, we will roughly familiarize ourselves with the data sets. Then, step by step we will look at different coding schemes: how they are applied in R and what a fitted regression model looks like.
Estimated reading time: 1.5h
Categorical Variables Coding Schemes
Why do we need to code categorical variables like gender or experimental conditions? ‘We don’t have to code continuous variables like reaction time!’, you might say. The reason is that categorical variables (also called factors) are usually represented in a data set as strings, e.g. the factor gender could contain the levels (i.e. distinct categories) female and male. But we cannot represent different strings in a regression model. So we need to numerically indicate that a predictor has different levels - by (implicitly) recoding the levels of a variable to numbers, e.g. a 1 for each occurrence of female and -1 for each occurrence of male in our data set. How we choose these numbers is the focus of this tutorial.
In common terms, by recoding the levels numerically we indicate the contrasts between the levels. Generally, we need N-1 numeric variables to represent the contrasts of a categorical variable with N levels.
To get a better understanding of what this means, we will dive right into an example.
Preparing the tools
We will use the following R packages in this tutorial. If you are missing some of them, you can install them by running e.g. install.packages('tidyverse'). NB: we will fit the models using brm - this might take some time to compile!
The first part of this tutorial is based on a data set from an experiment by Winter and Grawunder (2012) You can get the dataset by running:
politeness_df <- faintr::politeness# get a look at the data sethead(politeness_df)
# A tibble: 6 × 5
subject gender sentence context pitch
<chr> <chr> <chr> <chr> <dbl>
1 F1 F S1 pol 213.
2 F1 F S1 inf 204.
3 F1 F S2 pol 285.
4 F1 F S2 inf 260.
5 F1 F S3 pol 204.
6 F1 F S3 inf 287.
The data contains records of the voice pitch of speakers in different social contexts (polite and informal). They investigated whether the mean voice pitch differs across the factorgender of the speakers (F and M) and across the factorcontexts - resulting in four different condition combinations (gender X context). Such a design is called factorial design and the single combinations are called design cells.
Explore Data visually
Before we dive into any statistical analyses of our dataset it is helpful to get a rough idea of what the data looks like. For example, we can start by exploring the dataset visually.
Furthermore, we can compute some basic statistics - e.g. the mean of the different design cells, before we turn to more complex linear models. We can also compute the overall mean pitch across all the conditions - the grand mean. These values will be helpful for a sanity check when interpreting the linear models later on.
# A tibble: 4 × 3
# Groups: context [2]
context gender mean
<chr> <chr> <dbl>
1 inf F 261.
2 inf M 144.
3 pol F 233.
4 pol M 133.
mean(tibble_means$mean)
[1] 192.8605
Regression Models
While this tutorial presupposes some basic understanding of regression models, let us recap what exactly we compute when fitting a regression model. As usual for factorial designs, the regression model works under the assumption that observations in each cell are sampled from a normal distribution with a certain mean - and each design cell has an own mean. Research questions about data from factorial designs are usually about the estimated means of these design cells, e.g.: is this design cell’s mean different from zero, or from that design cell’s mean?
Let us stick to our example data. We want to estimate the mean pitch (\(y\)) depending on gender (\(x_1\)), context (\(x_2\)) and the interaction of the two factors (\(x_1x_2\)). So what we estimate is the regression function described by
\(\beta_0\) is the intercept and \(\beta_1, \beta_2\) and \(\beta_3\) are the slopes that are estimated when fitting the model; \(x_1\) and \(x_2\) represent the categorical variables and are exactly what is ‘coded’ when applying different variable coding schemes. For simplicity we will omit the error term \(\epsilon\) below. Will will come back to this regression function when interpreting the regression output given respective coding schemes.
Coding Categorical Variables
In our example, gender and context are categorical variables, so they need to be coded numerically. When categorical variables are coded for regression modeling, they are internally set to factors. Let’s do this manually here for explicitness.
politeness_df <- politeness_df %>%mutate(gender =factor(gender), # you could rename or reorder the levels herecontext =factor(context))# checkhead(politeness_df)
# A tibble: 6 × 5
subject gender sentence context pitch
<chr> <fct> <chr> <fct> <dbl>
1 F1 F S1 pol 213.
2 F1 F S1 inf 204.
3 F1 F S2 pol 285.
4 F1 F S2 inf 260.
5 F1 F S3 pol 204.
6 F1 F S3 inf 287.
Dummy (Treatment) Coding
Dummy coding, or treatment coding, is the default coding scheme used by R. Understanding the name ‘treatment coding’ helps understanding what this coding scheme does: imagine a medical experiment with a single control group (who obtain a placebo) and different experimental groups each of which gets a different treatment (e.g., different drugs), and where we want to compare each treatment group to the single, pivotal control group. Consequently, dummy coded variables are estimated by comparing all levels of the variable to a reference level. The intercept of a linear model containing dummy-coded variables is the mean of the reference level.
Our variables only have two levels, so the effect of gender could be estimated by treating female as the reference level and estimating the effect of being male compared to the reference level – so basically estimating the difference in pitch it takes to “get from female to male”. Similarly, we can estimate the effect of context: the informal context can be treated as the reference level and the effect of politeness can be estimated against it. By default, the first level of a factor is treated as the reference level (for unordered factors that is the first string in alphanumeric order) - but principally, there is no difference as to which level should be used as the reference level. It makes sense to choose the level which is in some sense the ‘control’ in your experimental design.
Because R uses dummy-coding by default, we can look at the default numerical coding right away. The function contrasts() displays the contrast matrix for the respective variable:
contrasts(politeness_df$gender)
M
F 0
M 1
contrasts(politeness_df$context)
pol
inf 0
pol 1
But if we wish to explicitly assign a dummy (treatment) coding to a variable, we may do so by a built-in R function:
contrasts(politeness_df$gender) <-contr.treatment(2) # insert the number of levels here# checkcontrasts(politeness_df$gender)
2
F 0
M 1
So both variables \(x_1\) and \(x_2\) can take either the value 0 or 1 (because we dummy-code both categorical variables; see below for more). We already defined the referenc levels of the single variables, now we can define the overall reference level of our model (by combining the two individual reference levels) – it is the mean pitch of female speakers in informal contexts.
Having set all the basics, we can now turn to computing a linear model of the mean pitch as predicted by the factors gender and context:
# here, we only use fixed effectslm.dummy.FE <-brm( pitch ~ gender * context,data = politeness_df,cores =4,iter =1000)lm.dummy.FE.coefs <-fixef(lm.dummy.FE)[,1] %>%as.numeric() # get the estimated coefficientssummary(lm.dummy.FE)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: pitch ~ gender * context
Data: politeness_df (Number of observations: 83)
Draws: 4 chains, each with iter = 1000; warmup = 500; thin = 1;
total post-warmup draws = 2000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 260.60 8.26 243.90 275.93 1.00 1106 1406
gender2 -115.76 11.62 -138.42 -92.27 1.00 822 1084
contextpol -27.05 11.77 -49.81 -2.97 1.00 1041 1188
gender2:contextpol 15.39 16.54 -18.71 47.66 1.00 832 843
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 36.18 2.89 31.12 42.18 1.00 2375 1461
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Now how do we interpret the estimated coefficients?
Let us recall the regression equation that is hidden behind this output: \[y = \beta_0 + \beta_1*x_1 + \beta_2*x_2 + \beta_3*x_1x_2\]
In order to help us interpret the output, R assigns string names to the estimated coefficients using the names we used in the generic formula. The (Intercept) corresponds to \(\beta_0\), genderM corresponds to \(\beta_1\), contextpol corresponds to \(\beta_2\) and genderM:contextpol (the interaction term) to \(\beta_3\).
Further, let us recall the numerical coding of our variables: for \(x_1\) (gender) a 0 means female, a 1 means male; for \(x_2\) (context) a 0 means informal, a 1 means polite. So the computed values are the estimates for conditions differing from the respective reference conditions - i.e. when the respective \(x\) is a 1.
To get an estimate of a certain design cell (\(y_i\)) - let’s start with the mean pitch of female speakers (0 for \(x_1\)) in informal contexts (0 for \(x_2\)) - we just insert the corresponding numeric values for the corresponding \(x\) and the estimated value for the corresponding \(\beta\). Thus we get:
Hence, the mean pitch of female speakers in informal context corresponds to the intercept. As a sanity check, we can recall that for dummy coded variables the model intercept is just the mean of the reference cell (in our case, female speakers in informal contexts!).
Let’s now calculate the mean pitch of male speakers (1 for \(x_1\)) in informal contexts (0 for \(x_2\)):
Another common coding scheme is the simple coding (also called contrast coding). Simple coded variables are also compared to a reference level (just like dummy-coded ones). However, the intercept of a simple coded model is the grand mean – the mean of all cells (i.e. the mean of female-informal & female-polite & male-informal & male-polite cells).
Generally, this kind of coding can be created by subtracting \(1/k\) from the dummy coding contrast matrix, where \(k\) is the number of levels a variable has (in our case, both have two). Hence, the reference level will always only have negative values in the contrast matrix. The general rule is that the contrasts within a column have to add up to 0. R does not provide a built-in function for simple coding, but we can easily create the respective matrix ourselves by subtracting \(1/k\) (i.e. 1/2) from the dummy coding matrix:
# manual creation of contrastscontr.matrix <-matrix( rep(0.5, 2))dummy.matrix <-contr.treatment(2)contr.coding <- dummy.matrix - contr.matrix# we should duplicate the values to not overwrite previous contrastspoliteness_df <- politeness_df %>%mutate(context_contr = context,gender_contr = gender)contrasts(politeness_df$context_contr) <- contr.codingcontrasts(politeness_df$gender_contr) <- contr.coding
Hence now the gender is coded as -0.5 for female and 0.5 for male; context is coded as -0.5 for informal and 0.5 for polite.
Let’s again look at our regression model:
lm.contr.FE <-brm( pitch ~ gender_contr * context_contr,data = politeness_df,cores =4,iter =1000)lm.contr.FE.coefs <-fixef(lm.contr.FE)[,1] %>%as.numeric() # get vector of estimated coefficientssummary(lm.contr.FE)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: pitch ~ gender_contr * context_contr
Data: politeness_df (Number of observations: 83)
Draws: 4 chains, each with iter = 1000; warmup = 500; thin = 1;
total post-warmup draws = 2000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept 192.83 4.04 184.91 200.43 1.00 2433
gender_contr2 -108.34 7.95 -123.08 -92.79 1.00 2346
context_contr2 -19.73 7.65 -33.97 -4.72 1.00 3264
gender_contr2:context_contr2 15.85 15.77 -15.16 47.94 1.00 2310
Tail_ESS
Intercept 1285
gender_contr2 1535
context_contr2 1653
gender_contr2:context_contr2 1666
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 36.21 3.00 30.92 42.56 1.00 2337 1577
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
In order to compute the mean pitch of a specific cell, we proceed just as with dummy-coded variables and insert the respective estimates and values for \(x\). Let us start with female speakers (\(x_1\) is -0.5) in informal contexts (\(x_2\) is -0.5):
We get the same result as before (as we should - the estimates should not depend on a coding scheme but only on the data). As a sanity check, we can again look at the intercept – it matches the grand mean we computed in the beginning of this tutorial – as it should.
Your turn! Compute the pitch means for the other three conditions.
# your code here
Deviation (Sum) Coding
Deviation coding (also called sum coding) is the most popular coding scheme and is often considered the best choice to get a clear picture of presence (or absence) of an effect and a clear random effects interpretation.
It is slightly different from the previous schemes. It compares the mean of the predicted variable for a specific condition to the grand mean. So the estimates do not tell you the difference between the reference level and another level anymore. The intercept of linear models with sum coded variables is the grand mean.
R has a built-in function for creating sum coded variables:
# again create a new variablepoliteness_df %>%mutate(context_dev = context,gender_dev = gender) -> politeness_dfcontrasts(politeness_df$context_dev) <-contr.sum(2) # insert number of levelscontrasts(politeness_df$gender_dev) <-contr.sum(2)
Now the gender is coded as s 1 for female and -1 for male; context is coded as 1 for informal and -1 for polite.
Below we fit a model with the sum-coded variables:
Since the intercept is now the grand mean and not a specific reference level, let us think about the interpretation of the single estimates. The estimate of e.g. the context effect now denotes the value by which the mean pitch in informal (estimate * 1, remember our coding!) or polite contexts (estimate * -1) differs from the grand mean. So if we wish to calculate the mean pitch in polite contexts (across genders), we would do:
This means that the single estimates are in some sense ‘independent’ of each other (in contrast to e.g. dummy-coded variables where the estimates are bound to the reference levels of two variables) and give us insight if a specific factor is credibly different from 0. Similarly, if we wish to calculate the mean pitch of male speakers, we would calculate:
yM = lm.dev.FE.coefs[1] + lm.dev.FE.coefs[3] * (-1)yM
[1] 138.7232
Helmert Coding
In this coding scheme, a level of a variable is compared to its subsequent levels. In our dataset, e.g. for gender the level female is compared to the subsequent level male.
Generally, to create such a coding, in order to compare the first level to the subsequent levels you would assign \((k-1)/k\) to the first level and \(-1/k\) to all subsequent levels where \(k\) is the total number of levels. To compare the second level to the subsequent levels you would assign 0 to the first level, \((i-1)/i\) to the second level and \(-i/1\) to all subsequent levels where \(i = k-1\) and so on. The difference of this coding scheme to previous ones is more clear for variales with >2 levels (see below). The intercept of a linear model corresponds to the grand mean.
R does not have a built-in function for standard Helmert coding, so we do it manually:
Family: gaussian
Links: mu = identity; sigma = identity
Formula: pitch ~ context_helm * gender_helm
Data: politeness_df (Number of observations: 83)
Draws: 4 chains, each with iter = 1000; warmup = 500; thin = 1;
total post-warmup draws = 2000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept 193.01 3.95 185.03 200.73 1.00 2453
context_helm1 19.30 8.03 3.82 35.68 1.00 2830
gender_helm1 108.45 7.84 93.06 123.73 1.00 2252
context_helm1:gender_helm1 16.34 16.76 -16.66 49.85 1.00 2303
Tail_ESS
Intercept 1683
context_helm1 1691
gender_helm1 1572
context_helm1:gender_helm1 1501
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 36.26 2.99 30.84 42.68 1.00 2262 1430
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The contrast estimate for the first level and the remaining levels is calculated by taking the mean of the dependent variable for the first level and subtracting the mean of the dependent variable for the remaining levels (in our case, just the mean of the second level). In other words, if we look at the context coefficient it denotes the difference between the mean of the polite and informal context means.
Mixing Coding Schemes
If you have several categorical predictor variables, it is also possible (and often useful!) to use different coding schemes for the different variables. It might, for example, make sense to use dummy coding for a variable which has a control and a treatment condition, and to use e.g. simple coding for a variable which has two ‘equal’ levels.
For example, we could use dummy-coding for context and simple-coding for gender.
When you mix coding schemes or define your own schemes there might be no pre-defined answers to questions as to what the sigle coefficients or the intercept mean. But knowing how the different schemes work, you can easily find this out!
Let us explore how the interpretation of the model changes if we mix coding schemes:
Family: gaussian
Links: mu = identity; sigma = identity
Formula: pitch ~ context * gender_contr
Data: politeness_df (Number of observations: 83)
Draws: 4 chains, each with iter = 1000; warmup = 500; thin = 1;
total post-warmup draws = 2000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept 202.51 5.48 192.32 213.72 1.00 2463
contextpol -19.42 7.96 -34.89 -3.92 1.00 1949
gender_contr2 -116.24 11.12 -138.49 -95.16 1.00 1970
contextpol:gender_contr2 16.23 16.07 -14.57 47.91 1.00 1831
Tail_ESS
Intercept 1436
contextpol 1295
gender_contr2 1679
contextpol:gender_contr2 1619
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 36.18 2.96 30.95 42.45 1.00 2394 1532
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Generally, the interpretation is just the combination of what we have learned about the individual coding schemes. Recall that the intercept of a dummy-coded model is the mean of the reference level – since we dummy-coded context, the refernce level would be informal context. But it is not the intercept yet! We have the second predictor in our model – the simple coded gender. In simple coded models the intercept is the mean across the levels of the variable. Now the intercept of our model with the two different predictors is the mean pitch in informal contexts - across genders.
Following this logic, the context estimate denotes the difference between the informal and polite contexts - still across genders. The gender estimate denoted the difference between the mean pitch and female speakers if multiplied by the value -0.5 (recall our coding above); and the mean pitch and male speakers if multiplied by the value 0.5.
Variables with three levels
In order to see the differences yielded by the different coding schemes more clearly, it is usefut to look at an example datas et containing predictors with three levels. But don’t worry, all the concepts we have learned so far work just the same way - independently of the number of levels a predictor has.
The data set latinsquare is provided in the package languageR which you need to install in order to access it.
# alternative dataset with a three-level variablerequire(languageR)latinsquare_df <-as_tibble(latinsquare)head(latinsquare_df)
# A tibble: 6 × 6
Group Subject Word RT SOA List
<fct> <fct> <fct> <int> <fct> <fct>
1 G1 S1 W1 532 short L1
2 G1 S2 W1 542 short L1
3 G1 S3 W1 615 short L1
4 G1 S4 W1 547 short L1
5 G2 S5 W9 553 short L3
6 G2 S6 W9 464 short L3
In this example data set, the reaction time (RT) of subjects in a lexical decision experiment was simulated. The predictors are, among others, the stimulus-onset asynchrony (SOA) (with the levels long, medium, short) and different lists of prime-target pairs of words (L1, L2, L3) that are presented to the participants. Without going into the details of the experimental paradign, let’s imagine we would like to predict the participants’ mean reaction time given the list type, the SOA type and their interaction.
Again, it is helpful to start by trying to get a rough idea of what the data looks like.
# A tibble: 9 × 3
# Groups: List [3]
List SOA mean
<fct> <fct> <dbl>
1 L1 long 530.
2 L1 medium 537.
3 L1 short 544.
4 L2 long 543.
5 L2 medium 556.
6 L2 short 534.
7 L3 long 529.
8 L3 medium 515.
9 L3 short 522.
mean(tibble_baseStats$mean)
[1] 534.5139
Now we can turn to modelling!
Dummy (Treatment) Coding
Generally, the coding schemes work in the very same way independently of the number of levels. The only difference to our old data set is that now we need two numeric variables coding the contrasts between the levels of one variable. If we look at the default (dummy) coding of e.g. the variable List we see a contrast matrix with two columns, each denoting the comparisons between two levels:
contrasts(latinsquare$SOA)
medium short
long 0 0
medium 1 0
short 0 1
contrasts(latinsquare_df$List)
L2 L3
L1 0 0
L2 1 0
L3 0 1
So now the recoding of the categorical variable takes two numeric variables: e.g. \(x_{1_2}\) and \(x_{1_3}\), where both can take the values 0 or 1; the single levels are denoted by the combination of the two numeric variables. Again there is a reference level - List 1 - described by \(x_{1_2}\) and \(x_{1_3}\) being 0. \(x_{1_2}\) being 1 describes the difference between the reference level and List 2; \(x_{1_3}\) being 1 describes the difference between the reference level and List 3. Correspondingly, there is and individual \(\beta\) for each numeric variable estimated in the regression model. The coding of the SOA factor works just the same way. The interactions between specific levels are described by combining the respective numeric variables \(x\). So the model we are fitting is described by:
Family: gaussian
Links: mu = identity; sigma = identity
Formula: RT ~ List * SOA
Data: latinsquare_df (Number of observations: 144)
Draws: 4 chains, each with iter = 1000; warmup = 500; thin = 1;
total post-warmup draws = 2000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 529.61 11.36 507.89 552.76 1.01 789 1202
ListL2 13.14 16.44 -19.40 44.81 1.01 874 953
ListL3 -0.40 16.35 -33.13 30.09 1.01 903 1185
SOAmedium 6.87 16.41 -25.60 37.40 1.01 818 1075
SOAshort 13.64 16.40 -18.45 44.95 1.01 850 1101
ListL2:SOAmedium 6.65 23.22 -37.61 53.57 1.01 924 784
ListL3:SOAmedium -20.96 22.80 -66.02 21.79 1.01 872 1184
ListL2:SOAshort -21.61 23.23 -68.84 20.75 1.01 840 1214
ListL3:SOAshort -20.46 22.89 -64.46 24.99 1.00 1017 1421
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 46.52 2.84 41.40 52.54 1.00 1876 1404
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Since both predictors are dummy-coded the intercept represents the reference level - the mean RT for List 1 and a long SOA. Following the same procedure as for the two-level variables you could calculate the estimated mean RTs for specific conditions.
Simple (Contrast) Coding
Simple coding only slightly differs from dummy-coding – the intercept of the model is the grand mean, not the mean RT of the reference level. Otherwise, the coefficients still denote the difference between the reference level and other specific levels.
Family: gaussian
Links: mu = identity; sigma = identity
Formula: RT ~ List_contr * SOA_contr
Data: latinsquare_df (Number of observations: 144)
Draws: 4 chains, each with iter = 1000; warmup = 500; thin = 1;
total post-warmup draws = 2000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept 534.46 3.72 527.04 541.65 1.01 2125
List_contr2 7.81 9.36 -10.87 26.23 1.00 1743
List_contr3 -14.75 9.22 -33.54 2.55 1.00 1683
SOA_contr2 2.01 9.39 -16.54 20.58 1.00 1754
SOA_contr3 -0.63 9.58 -18.85 18.39 1.00 1809
List_contr2:SOA_contr2 5.51 22.32 -37.79 49.93 1.00 1364
List_contr3:SOA_contr2 -21.79 23.57 -68.16 23.81 1.00 1630
List_contr2:SOA_contr3 -22.88 22.85 -68.18 22.88 1.00 1580
List_contr3:SOA_contr3 -21.87 23.85 -68.44 24.95 1.00 1486
Tail_ESS
Intercept 1225
List_contr2 1629
List_contr3 1289
SOA_contr2 1366
SOA_contr3 1688
List_contr2:SOA_contr2 1459
List_contr3:SOA_contr2 1598
List_contr2:SOA_contr3 1527
List_contr3:SOA_contr3 1371
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 46.42 2.97 41.19 52.35 1.00 2278 1362
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Deviation (Sum) Coding
With increasing number of levels within the factors the complexity and messiness of interpreting the differences between levels against each other increases considerably. Hence it might make a lot of sense to use the deviation coding scheme which provides estimates of effects comapred to the grand mean. We again can use the R built-in function to assign deviation coding to our three-level factors:
latinsquare_df %>%mutate(List_dev = List,SOA_dev = SOA) -> latinsquare_dfcontrasts(latinsquare_df$List_dev) <-contr.sum(3) # insert number of levelscontrasts(latinsquare_df$SOA_dev) <-contr.sum(3)
For e.g. SOA our numeric variables now denote the effect of long SOA compared to the grand mean when \(x_{2_2}\) is a 1 and \(x_{2_3}\) is a 0; they denote the effect of medium SOA compared to the grand mean when \(x_{2_2}\) is a 0 and \(x_{2_3}\) is a 1; the effect of short SOA is never compared to the grand mean since it is always assigned a -1.
Family: gaussian
Links: mu = identity; sigma = identity
Formula: RT ~ List_helm * SOA_helm
Data: latinsquare_df (Number of observations: 144)
Draws: 4 chains, each with iter = 1000; warmup = 500; thin = 1;
total post-warmup draws = 2000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept 534.52 3.75 527.02 541.90 1.00 3724
List_helm1 3.50 7.85 -12.25 18.85 1.00 4213
List_helm2 22.61 9.62 3.19 41.61 1.00 3414
SOA_helm1 -0.66 8.14 -16.34 15.47 1.00 4152
SOA_helm2 2.78 9.27 -15.41 20.26 1.00 4254
List_helm1:SOA_helm1 -14.19 17.40 -48.50 19.01 1.00 4911
List_helm2:SOA_helm1 -12.63 20.07 -51.14 26.18 1.00 4658
List_helm1:SOA_helm2 -14.35 20.56 -55.10 24.37 1.00 3622
List_helm2:SOA_helm2 28.63 23.36 -17.57 74.72 1.00 4086
Tail_ESS
Intercept 1609
List_helm1 1593
List_helm2 1336
SOA_helm1 1632
SOA_helm2 1308
List_helm1:SOA_helm1 1480
List_helm2:SOA_helm1 1617
List_helm1:SOA_helm2 1466
List_helm2:SOA_helm2 1456
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 46.50 2.82 41.36 52.28 1.01 3328 1396
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The main effect estimates denote the differences between the mean of List1 and the mean of (List2 + List3); and between the mean of List2 and the mean of List3. Respectively, they denote the differences between the mean of SOA long and the mean of (medium + short); and between the mean of SOA medium and the mean of short. The intercept is the grand mean.
Reverse Helmert Coding
The reverse Helmert coding scheme (also called difference coding) is quite similar to the Helmert coding, but compares the mean of a level to its previous levels. Since we basically reverse the coding we used in the previous scheme, we also ‘reverse’ the contrast matrix to create such a coding.
Family: gaussian
Links: mu = identity; sigma = identity
Formula: RT ~ List_rhelm * SOA_rhelm
Data: latinsquare_df (Number of observations: 144)
Draws: 4 chains, each with iter = 1000; warmup = 500; thin = 1;
total post-warmup draws = 2000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept 534.67 3.77 527.30 541.88 1.00 3705
List_rhelm1 8.03 9.15 -10.11 25.75 1.00 4267
List_rhelm2 -18.54 8.14 -34.37 -1.97 1.00 3516
SOA_rhelm1 2.10 9.42 -16.31 20.16 1.00 4381
SOA_rhelm2 -1.78 7.72 -17.62 13.25 1.00 3787
List_rhelm1:SOA_rhelm1 6.07 23.22 -39.41 51.21 1.01 3217
List_rhelm2:SOA_rhelm1 -23.75 20.15 -64.52 15.70 1.00 3619
List_rhelm1:SOA_rhelm2 -25.73 20.31 -64.91 11.85 1.01 3683
List_rhelm2:SOA_rhelm2 2.53 17.56 -31.29 36.84 1.00 3847
Tail_ESS
Intercept 1476
List_rhelm1 1542
List_rhelm2 1332
SOA_rhelm1 1575
SOA_rhelm2 1651
List_rhelm1:SOA_rhelm1 1394
List_rhelm2:SOA_rhelm1 1344
List_rhelm1:SOA_rhelm2 1502
List_rhelm2:SOA_rhelm2 1597
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 46.50 2.85 41.29 52.60 1.01 3326 1376
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
In our example, the first estimate denotes the difference between the mean of List2 and the mean of List1; the second - the difference between the mean of List3 and the mean of (List1 + List2). The main effects of SOA can be interpreted similarly.
Mixed Schemes: Dummy and Deviation Coding
Just like with two-level factors, we might wish to use different coding schemes for different predictors. It might make sense to use dummy coding for a variable which has a control and two different treatment conditions, and to use deviation coding for a variable which has ‘equal’ levels.
For example, we could use dummy-coding for List and deviation-coding for SOA.
Family: gaussian
Links: mu = identity; sigma = identity
Formula: RT ~ List * SOA_dev
Data: latinsquare_df (Number of observations: 144)
Draws: 4 chains, each with iter = 1000; warmup = 500; thin = 1;
total post-warmup draws = 2000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 536.89 6.69 523.92 549.97 1.00 2364 1659
ListL2 8.06 9.20 -10.53 25.72 1.00 2178 1350
ListL3 -14.52 9.50 -33.73 4.04 1.00 2621 1655
SOA_dev1 -7.01 9.67 -26.33 12.12 1.00 1129 1160
SOA_dev2 -0.24 9.60 -19.11 18.17 1.00 1035 1114
ListL2:SOA_dev1 5.30 13.54 -21.12 30.72 1.00 1214 1291
ListL3:SOA_dev1 13.81 13.89 -13.61 41.10 1.00 1343 1402
ListL2:SOA_dev2 12.14 13.64 -14.74 38.96 1.00 1081 1388
ListL3:SOA_dev2 -7.00 13.42 -33.44 19.08 1.00 1316 1391
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 46.44 2.89 40.98 52.46 1.00 2364 1476
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Towards the end of this tutorial, the main take-away is this: when you look at the estimates of (any) model, you could ask yourself a couple of questions like these to make sure you understand what was calculated:
What does the intercept represent?
What do the single estimates mean?
What do they tell me about my hypotheses?
Of course, you will also encounter experimental designs which use a two-level and a three-level categorical predictors – but the conceptual basics regarding how to choose the contrasts and how to interpret linear models are the same.
Conclusion
This tutorial provides an outline of most common coding schemes, but there are also others out there. However, having a basic understanding of the procedures described above, you are perfectly prepared to deal with a broad spectrum of experimental data in psychology and other disciplines. There are heuristics as to which coding schemes to choose; but most importantly, you should choose statistical tools that are the best fit for your specific research question. It is often helpful to write up the results of a pilot model to see if the interpretation of the results makes clear statements about your hypothesis.
References
P. Buerckner, 2016. “brms: An R package for Bayesian multilevel models using Stan”. In Journal of Statistical Software 80.1, pp. 1–28.
J. G. W. Raaijmakers et al., 1999. How to Deal with “The Language-as-Fixed-Effect Fallacy: Common Misconceptions and Alternative Solutions. In Journal of Memory and Language
B. Winter, S. Grauwunder, 2012. “The Phonetic Profile of Korean Formality”. In Journal of Phonetics 40, pp. 808–815.