Comparing groups of factor levels with the faintr package

Motivation

The faintr package provides convenience function for the evaluation of a model fit, obtained with the brms package, for a Bayesian regression model for data from a factorial design. If the original model fit used (default) dummy coding of factors, the faintr package allow extraction of many more meaningful comparisons. For example, it is possible to directly compare the difference between cells which are not comparable by dummy coding, and it is also possible to compare means in sets of cells, so as to recover the outcomes of deviance coding.

Installation

Install the faintr package with devtools from GitHub:

devtools::install_github('michael-franke/bayes_mixed_regression_tutorial/faintr', 
                         build_vignettes = TRUE)
library(faintr)

Example

Consider a data set on pitch frequency in the speech of female and male speakers in polite and informal contexts.

library(tidyverse)
politedata = read_csv('https://raw.githubusercontent.com/michael-franke/bayes_mixed_regression_tutorial/master/code/politeness_data.csv') 
head(politedata)
## # A tibble: 6 x 5
##   subject gender sentence context pitch
##   <chr>   <chr>  <chr>    <chr>   <dbl>
## 1 F1      F      S1       pol      213.
## 2 F1      F      S1       inf      204.
## 3 F1      F      S2       pol      285.
## 4 F1      F      S2       inf      260.
## 5 F1      F      S3       pol      204.
## 6 F1      F      S3       inf      287.

The cell means of this data set are:

politedata %>% group_by(gender, context) %>% summarize(mean_pitch = mean(pitch))
## # A tibble: 4 x 3
## # Groups:   gender [2]
##   gender context mean_pitch
##   <chr>  <chr>        <dbl>
## 1 F      inf           261.
## 2 F      pol           233.
## 3 M      inf           144.
## 4 M      pol           133.

A Bayesian regression model for a factorial design with by-subject and by-item random intercepts can be obtained with the brms package as follows:

library(brms)
m_dummy = brm(pitch ~ gender * context + (1 | subject + sentence), politedata)

The brm function uses dummy coding per default. Look at the estimated coefficients:

fixef(m_dummy)
##                      Estimate Est.Error        Q2.5     Q97.5
## Intercept           261.02934 22.093681  217.330291 303.64922
## genderM            -116.70584 29.203414 -177.641515 -57.72146
## contextpol          -27.16302  7.971296  -42.355886 -11.50569
## genderM:contextpol   15.27829 11.352729   -7.025012  37.81053

The reference cell is where gender:F and context:inf, so female speakers in informal contexts. The estimated mean for the cell with data from male speakers in informal contexts is retrievable by adding the estimated coefficient genderM in the output above from the estimated Intercept.

The faintr package provides convenience functions to compare different (groups of) cells to each other, based on a model fit like the above. Although the fit of the regression model uses a particular reference cell for dummy-coding, other contrasts of relevance can be retrieved from the posterior samples. For example, if we want to compare two cell diagonally, say, male speakers in informal contexts against female speakers in polite contexts, we can do this:

compare_groups(
  model = m_dummy, 
  higher = list(gender = "F", context = "pol"),
  lower = list(gender = "M", context = "inf")
)
## Outcome of comparing groups:
##  * higher:  gender:F context:pol 
##  * lower:   gender:M context:inf 
## Mean 'higher - lower':  89.54 
## 95% CI: [ 30.12 ; 149.4 ]
## P('higher - lower' > 0):  0.9962

We can also compare the effect of gender female against the grand mean, to retrieve the information normally obtained by deviance coding:

compare_groups(
  model = m_dummy, 
  higher = list(gender = "F"),
  lower = list()
)
## Outcome of comparing groups:
##  * higher:  gender:F 
##  * lower:   grand mean 
## Mean 'higher - lower':  54.53 
## 95% CI: [ 25.56 ; 84.37 ]
## P('higher - lower' > 0):  0.998

To explore all pairwise comparisons between design cells, try:

extract_posterior_cell_means(m_dummy)$all_cells_compared
##    gender_high context_high        cell_name_high gender_low context_low
## 1            M          inf gender:M__context:inf          F         inf
## 2            F          pol gender:F__context:pol          F         inf
## 3            M          pol gender:M__context:pol          F         inf
## 4            F          inf gender:F__context:inf          M         inf
## 5            F          pol gender:F__context:pol          M         inf
## 6            M          pol gender:M__context:pol          M         inf
## 7            F          inf gender:F__context:inf          F         pol
## 8            M          inf gender:M__context:inf          F         pol
## 9            M          pol gender:M__context:pol          F         pol
## 10           F          inf gender:F__context:inf          M         pol
## 11           M          inf gender:M__context:inf          M         pol
## 12           F          pol gender:F__context:pol          M         pol
##            cell_name_low posterior
## 1  gender:F__context:inf   0.00100
## 2  gender:F__context:inf   0.00050
## 3  gender:F__context:inf   0.00075
## 4  gender:M__context:inf   0.99900
## 5  gender:M__context:inf   0.99625
## 6  gender:M__context:inf   0.06675
## 7  gender:F__context:pol   0.99950
## 8  gender:F__context:pol   0.00375
## 9  gender:F__context:pol   0.00275
## 10 gender:M__context:pol   0.99925
## 11 gender:M__context:pol   0.93325
## 12 gender:M__context:pol   0.99725

We can also extract the estimated means of each cell:

extract_posterior_cell_means(m_dummy)$cell_summary
## # A tibble: 4 x 4
##   cell                  `lower 95% CI`  mean `upper 95% CI`
##   <chr>                          <dbl> <dbl>          <dbl>
## 1 gender:F__context:inf          219.   261.           305.
## 2 gender:F__context:pol          192.   234.           279.
## 3 gender:M__context:inf           98.2  144.           191.
## 4 gender:M__context:pol           87.3  132.           180.