Required R packages


As always, load the required packages and change the seed to your last name. Remember to set eval = TRUE.

library(TeachingDemos)
library(tidyverse)
library(rstan)
library(ggmcmc)
library(brms)
library(loo)
library(plotrix)

# use as many cores as are available (but at least 4)
options(mc.cores = max(parallel::detectCores(), 4))
# save a compiled version of the Stan model file
rstan_options(auto_write = TRUE)

lastname <- "YOURLASTNAME"

char2seed(lastname)

For this homework, you will analyze data from a speech production experiment in which male and female speakers produced utterances in a number of scenarios. Scenarios differed in attitude: either informal or polite. For example, a scenario could be a request for salt. An informal request could be “Can I get the salt?” and a polite request would be “Would you mind passing the salt, please?”.

The data give the mean voice pitch frequency of each utterance. The research question is whether pitch frequency differs between informal and polite scenarios for either female or male speakers. Concretely, we are interested in three hypotheses:

  1. Do female speakers have a lower average frequency in polite scenarios than in informal scenarios?
  2. Do male speakers have a lower average frequency in polite scenarios than in informal scenarios?
  3. Do male speakers have a lower average frequency in informal scenarios than female speakers in polite scenarios?

1. Read and plot the data

a. Load and clean the data (1 point)

Remember to set eval = TRUE.

# read the data file
politeness_data <- ..FILL ME.. %>% 
  
  # remove an entry with a missing data point for frequency
  filter(..FILL ME..) %>%  
  
  # specify gender, subject, scenario and attitude as factors
  mutate(..FILL ME..)

# show the tibble
politeness_data

b. Create a summary of the data (1 point)

Create a summary of mean frequency with standard error by gender and attitude. Remember to set eval = TRUE.

politeness_summary <- politeness_data %>% 
  
  # group by gender and attitude
  group_by(..FILL ME..) %>% 
  
  # create a summary of mean freq and standard error
  summarize(mean_frequency = ..FILL ME..,
            standard_error = plotrix::std.error(frequency))
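
For reference, the same group_by() / summarize() pattern applied to a made-up toy tibble (the columns group and value are hypothetical and not part of the politeness data):

toy_data <- tibble(group = rep(c("a", "b"), each = 3),
                   value = c(1, 2, 3, 4, 5, 6))

toy_data %>% 
  group_by(group) %>% 
  summarize(mean_value     = mean(value),
            standard_error = plotrix::std.error(value))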

c. Plot the summary (1 point)

Make a bar plot with gender on the x-axis, mean frequency on the y-axis and vary fill colour by attitude. Include error bars for standard error.

Remember to set eval = TRUE.

politeness_summary %>% 
  ggplot(aes(x = ..FILL ME..,
             y = ..FILL ME..,
             fill = ..FILL ME..)) +
  
  # bars for mean frequency
  geom_bar(stat = "identity", position = "dodge") +
  
  # error bars for standard error
  geom_errorbar(aes(..FILL ME..), position = "dodge")
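
Again purely for reference, a dodged bar chart with error bars built from a made-up toy summary (all column names hypothetical):

toy_summary <- tibble(group      = c("a", "a", "b", "b"),
                      condition  = c("x", "y", "x", "y"),
                      mean_value = c(2.0, 2.5, 4.0, 3.0),
                      se         = c(0.2, 0.3, 0.25, 0.2))

toy_summary %>% 
  ggplot(aes(x = group, y = mean_value, fill = condition)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_errorbar(aes(ymin = mean_value - se, ymax = mean_value + se),
                position = "dodge")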

2. Fit some regression models

a. Specify regression models (3 points)

Specify the formula argument for three regression models.

The first one, model_FE, only has fixed effects. It tries to explain frequency in terms of gender, attitude and their interaction.

The second one, model_intercept_only, is like model_FE but also adds random intercepts for both scenario and subject.

Finally, model_max_RE is like model_FE but also specifies the following random effects structure: by-scenario random intercepts, and random slopes for gender, attitude and their interaction, as well as by-subject random intercepts and random slopes for attitude.
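
As a syntax reminder, brms uses lme4-style formula notation for random effects. The sketch below uses hypothetical variables y, x1, x2 and a grouping factor g, not the variables of this data set:

# hypothetical variables; these are not the formulas you need to write
bf(y ~ x1 * x2)                   # fixed effects only, including the interaction
bf(y ~ x1 * x2 + (1 | g))         # ... plus a by-g random intercept
bf(y ~ x1 * x2 + (1 + x2 | g))    # ... plus a by-g random slope for x2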

Remember to set eval = TRUE.

model_FE <- brm(formula = ..FILL ME.., 
               data = politeness_data)

model_intercept_only <- brm(formula = ..FILL ME.., 
                          data = politeness_data)

model_max_RE <- brm(formula = ..FILL ME.., 
                  data = politeness_data)

b. Extracting comparisons (3 points)

The following function attempts to calculate the probability of each hypothesis being true, given the model.

  1. Fill in the correct formula for the “male-informal” cell in the design matrix (a generic sketch of the treatment-coding arithmetic follows this list).

  2. Specify the correct names and comparisons for hypotheses 2 and 3.

  3. Run the function for each model and interpret the output.
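
For orientation, here is the general arithmetic of reconstructing cell means from treatment-coded (dummy-coded) coefficients, sketched with hypothetical factors A (reference level a1) and B (reference level b1) and made-up coefficient values; these are not the coefficients of the fitted models:

# hypothetical 2x2 design, treatment coding; made-up values for illustration only
b_Intercept <- 200   # reference cell (a1, b1)
b_Aa2       <- -60   # effect of switching A from a1 to a2 (at b1)
b_Bb2       <- -20   # effect of switching B from b1 to b2 (at a1)
b_Aa2.Bb2   <- 5     # interaction adjustment for the (a2, b2) cell

cell_a1_b1 <- b_Intercept                               # reference cell
cell_a2_b1 <- b_Intercept + b_Aa2                       # only A switched
cell_a1_b2 <- b_Intercept + b_Bb2                       # only B switched
cell_a2_b2 <- b_Intercept + b_Aa2 + b_Bb2 + b_Aa2.Bb2   # both switched, plus interaction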

Remember to set eval = TRUE.

extract_comparisons <- function(model) {
  
  # get posterior samples
  post_samples <- posterior_samples(model)
  
  # mnemonic names for reconstructed predictor values for all cells in the design matrix
  
  # female-informal
  F_inf <- post_samples$b_Intercept

  # female-polite
  F_pol <- post_samples$b_Intercept + 
    post_samples$b_attitudepol
  
  # male-informal
  M_inf <- ..FILL ME..
  
  # male-polite
  M_pol <- post_samples$b_Intercept + 
    post_samples$b_genderM + 
    post_samples$b_attitudepol + 
    post_samples$`b_genderM:attitudepol`
  
  # create a tibble for recording hypotheses probabilities
  tibble(
    # names of hypotheses
    hypothesis = c(
      # h1: 
      "Female-polite < Female-informal",
      # h2:
      "Male-polite < Male-informal",
      # h3:
      "..FILL ME.."
      ),
         
    # probability of hypotheses being true
    probability = c(
      # h1:
      mean(F_pol < F_inf),
      # h2:
      mean(..FILL ME..),
      # h3
      mean(..FILL ME..)
      )
    )
}

# run the function on each model
# YOUR CODE HERE

Given the output, what would you say about the probability of each hypothesis being true? Which, if any, are likely to be true under all models? Which, if any, depend on the model considered?

ANSWER:

(your answer here)

c. Inspecting random effects (1 point)

Look at the estimates of the random-effects coefficients of the model with the maximal RE structure (get them from summary()).

# YOUR CODE HERE

Which of these random effects coefficients are very, very credibly estimated to be different from zero?

ANSWER:

(your answer here)

3. Model comparison

a. Leave-one-out cross-validation (1 point)

Compare the three models with leave-one-out cross-validation.

Remember to set eval = TRUE.

loo(..FILL ME..,
    reloo = TRUE)

b. Interpretation (1 point)

Which model is best?

Keep in mind that:

  • lower LOO-IC scores are better
  • the SE is the standard error of the LOO-IC estimate
  • we judge a difference between LOO-IC scores to be meaningful only if the SE of the difference does not exceed the absolute value of the difference itself (a toy illustration with made-up numbers follows this list)
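
A minimal sketch of this decision rule, using made-up numbers that are not output from any of the models above:

# made-up numbers for illustration only (not model output)
looic_diff <- -23.4   # difference in LOO-IC between two models
se_diff    <- 11.2    # standard error of that difference

# meaningful only if the SE of the difference does not exceed its absolute value
abs(looic_diff) > se_diff   # TRUE here, so this difference would count as meaningful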

ANSWER:

(your answer here)


End of assignment