Required R packages


As always, load the required packages and change the seed to your last name. Remember to set eval = TRUE.

library(TeachingDemos)
library(tidyverse)
library(rstan)
library(ggmcmc)
library(brms)
library(loo)
library(plotrix)

# set the number of cores to use to the total number available (minimally 4)
options(mc.cores = max(parallel::detectCores(), 4))
# save a compiled version of the Stan model file
rstan_options(auto_write = TRUE)

lastname <- "YOURLASTNAME"

char2seed(lastname)

lastname <- "bayes"

char2seed(lastname)

For this homework, you will analyze data from a speech production experiment in which male and female speakers produced utterances in a number of scenarios. Scenarios differed in attitude: either informal or polite. For example, a scenario could be a request for salt: an informal request could be “Can I get the salt?” and a polite request would be “Would you mind passing the salt, please?”.

The data give the mean voice pitch frequency of each utterance. The research question is whether pitch frequency differs between informal and polite scenarios for female and male speakers. Concretely, we are interested in three hypotheses:

  1. Do female speakers have a lower average frequency in polite scenarios than in informal scenarios?
  2. Do male speakers have a lower average frequency in polite scenarios than in informal scenarios?
  3. Do male speakers have a lower average frequency in informal scenarios than female speakers in polite scenarios?

1. Read and plot the data

a. Load and clean the data (1 point)

Remember to set eval = TRUE.

#read the data file 
politeness_data <- ..FILL ME.. %>% 
  
  # remove an entry with a missing data point for frequency
  filter(..FILL ME..) %>%  
  
  # specify gender, subject, scenario and attitude as factors
  mutate(..FILL ME..)

#show the tibble
politeness_data
politeness_data <- read_csv('data/politeness_data.csv') %>% 
  # remove an entry with a missing data point
  filter(!is.na(frequency)) %>%  
  # specify factors
  mutate(scenario = as.factor(scenario),
         attitude = as.factor(attitude),
         subject = as.factor(subject),
         gender = as.factor(gender))
## Parsed with column specification:
## cols(
##   subject = col_character(),
##   gender = col_character(),
##   scenario = col_double(),
##   attitude = col_character(),
##   frequency = col_double()
## )
politeness_data
## # A tibble: 83 x 5
##    subject gender scenario attitude frequency
##    <fct>   <fct>  <fct>    <fct>        <dbl>
##  1 F1      F      1        pol           213.
##  2 F1      F      1        inf           204.
##  3 F1      F      2        pol           285.
##  4 F1      F      2        inf           260.
##  5 F1      F      3        pol           204.
##  6 F1      F      3        inf           287.
##  7 F1      F      4        pol           251.
##  8 F1      F      4        inf           277.
##  9 F1      F      5        pol           232.
## 10 F1      F      5        inf           252.
## # … with 73 more rows

b. Create a summary of the data (1 point)

Create a summary of mean frequency with standard error by gender and attitude. Remember to set eval = TRUE.

politeness_summary <- politeness_data %>% 
  
  # group by gender and attitude
  group_by(..FILL ME..) %>% 
  
  # create a summary of mean freq and standard error
  summarize(mean_frequency = ..FILL ME..,
            standard_error = plotrix::std.error(frequency))
politeness_summary <- politeness_data %>% 
  group_by(gender, attitude) %>% 
  summarize(mean_frequency = mean(frequency),
            standard_error = plotrix::std.error(frequency))
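
For reference, plotrix::std.error() computes the standard error of the mean as sd(x)/sqrt(n). Since the missing frequency value was filtered out above, an equivalent summary without plotrix would be the following sketch:

politeness_data %>% 
  group_by(gender, attitude) %>% 
  summarize(mean_frequency = mean(frequency),
            # standard error of the mean: sd / sqrt(sample size)
            standard_error = sd(frequency) / sqrt(n()))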

c. Plot the summary (1 point)

Make a bar plot with gender on the x-axis, mean frequency on the y-axis and vary fill colour by attitude. Include error bars for standard error.

Remember to set eval = TRUE.

politeness_summary %>% 
  ggplot(aes(x = ..FILL ME..,
             y = ..FILL ME..,
             fill = ..FILL ME..)) +
  
  # bars for mean frequency
  geom_bar(stat = "identity", position = "dodge") +
  
  # error bars for standard error
  geom_errorbar(aes(..FILL ME..), position = "dodge")
politeness_summary %>% 
  ggplot(aes(x = gender,
             y = mean_frequency,
             fill = attitude)) + 
  geom_bar(stat = "identity", position = "dodge") +
  geom_errorbar(aes(ymin = mean_frequency - standard_error,
                    ymax = mean_frequency + standard_error),
                position = "dodge")
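
An optional cosmetic refinement (not required by the assignment): with position = "dodge", the error bars are drawn as wide as the bars themselves. Using position_dodge() with an explicit width keeps them centered over the dodged bars while allowing narrower caps; the width values below are just conventional choices.

politeness_summary %>% 
  ggplot(aes(x = gender,
             y = mean_frequency,
             fill = attitude)) + 
  # geom_col() is shorthand for geom_bar(stat = "identity")
  geom_col(position = "dodge") +
  geom_errorbar(aes(ymin = mean_frequency - standard_error,
                    ymax = mean_frequency + standard_error),
                width = 0.2,                             # narrower caps
                position = position_dodge(width = 0.9))  # match bar width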

2. Fit some regression models

a. Specify regression models (3 points)

Specify the formula argument for three regression models.

The first one, model_FE, only has fixed effects. It tries to explain frequency in terms of gender, attitude and their interaction.

The second one, model_intercept_only, is like model_FE but also adds random intercepts for both scenario and subject.

Finally, model_max_RE is like model_FE but also specifies the following random effects structure: by-scenario random intercepts, and random slopes for gender, attitude and their interaction, as well as by-subject random intercepts and random slopes for attitude.

Remember to set eval = TRUE.

model_FE <- brm(formula = ..FILL ME.., 
               data = politeness_data)

model_intercept_only <- brm(formula = ..FILL ME.., 
                          data = politeness_data)

model_max_RE <- brm(formula = ..FILL ME.., 
                  data = politeness_data)
model_FE <- brm(formula = frequency ~ gender * attitude, 
                data = politeness_data)
## Compiling the C++ model
## Start sampling
model_intercept_only <- brm(formula = frequency ~ gender * attitude +
                              (1 | scenario + subject), 
                            data = politeness_data)
## Compiling the C++ model
## Start sampling
model_max_RE <- brm(formula = frequency ~ gender * attitude +
                      (1 + gender * attitude | scenario) +
                      (1 + attitude | subject), 
                    data = politeness_data)
## Compiling the C++ model
## Start sampling
## Warning: There were 10 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: Examine the pairs() plot to diagnose sampling problems
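
As the warning suggests, divergent transitions can often be eliminated by tightening the sampler's step-size adaptation. A minimal sketch, in case your own fit shows them too (control is a standard brm() argument; 0.99 is just one common choice, and model_max_RE_refit is a hypothetical name so that the fit above is not overwritten):

model_max_RE_refit <- brm(formula = frequency ~ gender * attitude +
                            (1 + gender * attitude | scenario) +
                            (1 + attitude | subject), 
                          data = politeness_data,
                          # raise adapt_delta above its default of 0.8
                          control = list(adapt_delta = 0.99))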

b. Extracting comparisons (3 points)

The following function attempts to calculate the probability of each hypothesis being true, given the model.

  1. Fill in the correct formula for the “male-informal” cell in the design matrix.

  2. Specify the correct names and comparisons for hypotheses 2 and 3.

  3. Run the function for each model and interpret the output.

Remember to set eval = TRUE.

extract_comparisons <- function(model) {
  
  # get posterior samples
  post_samples <- posterior_samples(model)
  
  # mnemonic names for reconstructed predictor values for all cells in the design matrix
  
  # female-informal
  F_inf <- post_samples$b_Intercept

  # female-polite
  F_pol <- post_samples$b_Intercept + 
    post_samples$b_attitudepol
  
  # male-informal
  M_inf <- ..FILL ME..
  
  # male-polite
  M_pol <- post_samples$b_Intercept + 
    post_samples$b_genderM + 
    post_samples$b_attitudepol + 
    post_samples$`b_genderM:attitudepol`
  
  # create a tibble for recording hypotheses probabilities
  tibble(
    # names of hypotheses
    hypothesis = c(
      # h1: 
      "Female-polite < Female-informal",
      # h2:
      "Male-polite < Male-informal",
      # h3:
      "..FILL ME.."
      ),
         
    # probability of hypotheses being true
    probability = c(
      # h1:
      mean(F_pol < F_inf),
      # h2:
      mean(..FILL ME..),
      # h3
      mean(..FILL ME..)
      )
    )
}

# run the function on each model
# YOUR CODE HERE
extract_comparisons <- function(model) {
  
  # get posterior samples
  post_samples <- posterior_samples(model)
  
  # mnemonic names for reconstructed predictor values for all cells in the design matrix
  # female informal
  F_inf <- post_samples$b_Intercept
  # female polite
  F_pol <- post_samples$b_Intercept + 
    post_samples$b_attitudepol
  # male informal
  M_inf <- post_samples$b_Intercept + 
    post_samples$b_genderM
  # male polite
  M_pol <- post_samples$b_Intercept + 
    post_samples$b_genderM + 
    post_samples$b_attitudepol + 
    post_samples$`b_genderM:attitudepol`
  
  # create a tibble for recording hypotheses
  tibble(hypothesis = c("Female-polite < Female-informal", 
                        "Male-polite < Male-informal",
                        "Male-informal < Female-polite"),
         
         probability = c(mean(F_pol < F_inf),
                         mean(M_pol < M_inf),
                         mean(M_inf < F_pol)
         ))
}

# run the function on each model
extract_comparisons(model_FE)
## # A tibble: 3 x 2
##   hypothesis                      probability
##   <chr>                                 <dbl>
## 1 Female-polite < Female-informal       0.992
## 2 Male-polite < Male-informal           0.846
## 3 Male-informal < Female-polite         1
extract_comparisons(model_intercept_only)
## # A tibble: 3 x 2
##   hypothesis                      probability
##   <chr>                                 <dbl>
## 1 Female-polite < Female-informal       1    
## 2 Male-polite < Male-informal           0.931
## 3 Male-informal < Female-polite         0.992
extract_comparisons(model_max_RE)
## # A tibble: 3 x 2
##   hypothesis                      probability
##   <chr>                                 <dbl>
## 1 Female-polite < Female-informal       0.977
## 2 Male-polite < Male-informal           0.814
## 3 Male-informal < Female-polite         0.985

Given the output, what would you say about the probability of each hypothesis being true? Which, if any, are likely to be true under all models? Which, if any, depend on the model considered?

ANSWER:

SOLUTION:

Under all models, the posterior probability that hypotheses 1 and 3 are true is very high (greater than .95). Hypothesis 2 receives less posterior credence under every model.
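
As a cross-check (not required here), brms provides hypothesis() for directional hypotheses; its Post.Prob column should match the probabilities computed above up to sampling noise. A minimal sketch for hypotheses 1 and 3 on the fixed-effects model (coefficient names drop the b_ prefix; hypothesis 2 involves the interaction coefficient and is omitted from this sketch):

# h1: F_pol < F_inf reduces to attitudepol < 0
# h3: M_inf < F_pol reduces to genderM - attitudepol < 0
hypothesis(model_FE, c("attitudepol < 0",
                       "genderM - attitudepol < 0"))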

c. Inspecting random effects (1 point)

Look at the estimates for the coefficients in the random effects structure of the model with the maximal RE structure (get them from summary()).

# YOUR CODE HERE
summary(model_max_RE)
## Warning: There were 10 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help.
## See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: frequency ~ gender * attitude + (1 + gender * attitude | scenario) + (1 + attitude | subject) 
##    Data: politeness_data (Number of observations: 83) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~scenario (Number of levels: 7) 
##                                      Estimate Est.Error l-95% CI u-95% CI
## sd(Intercept)                           21.98     10.67     7.21    48.65
## sd(genderM)                             10.55      8.53     0.40    31.50
## sd(attitudepol)                         14.99     11.35     0.60    42.62
## sd(genderM:attitudepol)                 16.85     13.00     0.78    49.68
## cor(Intercept,genderM)                  -0.23      0.44    -0.90     0.71
## cor(Intercept,attitudepol)               0.02      0.42    -0.76     0.80
## cor(genderM,attitudepol)                -0.06      0.44    -0.83     0.77
## cor(Intercept,genderM:attitudepol)      -0.09      0.43    -0.83     0.72
## cor(genderM,genderM:attitudepol)        -0.05      0.44    -0.82     0.78
## cor(attitudepol,genderM:attitudepol)    -0.14      0.45    -0.88     0.73
##                                      Eff.Sample Rhat
## sd(Intercept)                              1633 1.00
## sd(genderM)                                2156 1.00
## sd(attitudepol)                            1434 1.00
## sd(genderM:attitudepol)                    1882 1.00
## cor(Intercept,genderM)                     3746 1.00
## cor(Intercept,attitudepol)                 3927 1.00
## cor(genderM,attitudepol)                   2497 1.00
## cor(Intercept,genderM:attitudepol)         4114 1.00
## cor(genderM,genderM:attitudepol)           3061 1.00
## cor(attitudepol,genderM:attitudepol)       3060 1.00
## 
## ~subject (Number of levels: 6) 
##                            Estimate Est.Error l-95% CI u-95% CI Eff.Sample
## sd(Intercept)                 36.49     17.96    14.99    85.94       1476
## sd(attitudepol)                9.41      9.34     0.32    34.78       1716
## cor(Intercept,attitudepol)     0.03      0.57    -0.94     0.94       4221
##                            Rhat
## sd(Intercept)              1.00
## sd(attitudepol)            1.00
## cor(Intercept,attitudepol) 1.00
## 
## Population-Level Effects: 
##                     Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept             259.64     25.24   207.02   308.67       1168 1.00
## genderM              -113.74     33.87  -177.99   -40.64       1115 1.00
## attitudepol           -27.30     12.92   -52.64    -1.34       2171 1.00
## genderM:attitudepol    15.84     17.58   -19.26    51.38       2153 1.00
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma    25.01      2.35    20.88    30.08       2713 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Which of these random effects coefficients are very, very credibly estimated to be different from zero?

ANSWER:

SOLUTION:

None except the by-subject and by-scenario intercepts: for all the other standard deviations, the lower bound of the 95% CI is very close to zero.
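
The same conclusion can be read off programmatically from the posterior draws of the group-level standard deviations (parameter names follow the ~scenario and ~subject blocks in the summary above):

# posterior draws of all group-level standard deviations
sd_samples <- posterior_samples(model_max_RE, pars = "^sd_")

# 95% credible interval for each SD; only the two intercept SDs
# have lower bounds well away from zero
t(apply(sd_samples, 2, quantile, probs = c(0.025, 0.975)))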

3. Model comparison

a. Leave-one-out cross-validation (1 point)

Compare the three models with leave-one-out cross-validation.

Remember to set eval = TRUE.

loo(..FILL ME..,
    reloo = T)
loo(model_FE, model_intercept_only, model_max_RE,
    reloo = T)
## No problematic observations found. Returning the original 'loo' object.
## No problematic observations found. Returning the original 'loo' object.
## 1 problematic observation(s) found.
## The model will be refit 1 times.
## 
## Fitting model 1 out of 1 (leaving out observation 31)
## Start sampling
##                                      LOOIC    SE
## model_FE                            834.91 11.75
## model_intercept_only                788.76 17.06
## model_max_RE                        796.28 17.90
## model_FE - model_intercept_only      46.15 11.24
## model_FE - model_max_RE              38.63 12.24
## model_intercept_only - model_max_RE  -7.52  3.77

b. Interpretation (1 point)

Which model is best?

Keep in mind that:

  • lower LOO-IC scores are better
  • the SE is the standard error of the LOO-IC estimate
  • we judge a difference between two LOO-IC scores to be meaningful if its standard error does not exceed the absolute value of the difference itself (see the sketch after this list)
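
A minimal sketch of that check with the loo package's loo_compare() (available in loo >= 2.0; it works on the elpd scale, where LOO-IC = -2 * elpd, so the criterion is the same):

loo_FE  <- loo(model_FE)
loo_IO  <- loo(model_intercept_only)
loo_max <- loo(model_max_RE)

comp <- loo_compare(loo_FE, loo_IO, loo_max)
comp

# a difference is meaningful when |elpd_diff| exceeds its standard error
abs(comp[, "elpd_diff"]) > comp[, "se_diff"]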

ANSWER:

SOLUTION:

As all the pairwise differences exceed their standard errors in absolute value, they are all meaningful. The intercept-only model is the best, as it has the lowest LOO-IC.


End of assignment