rmarkdown
working in order to ‘knit’ the HTML output.ctrl/cmd
+ shift
+ K
in RStudio) to produce a HTML file.rmarkdown
(comes with RStudio)tidyverse
(or ggplot2
, dplyr
, purrr
, tibble
)TeachingDemos
rstan
ggmcmc
brms
loo
plotrix
As always, load the required packages and change the seed to your last name. Remember to set eval = TRUE
.
library(TeachingDemos)
library(tidyverse)
library(rstan)
library(ggmcmc)
library(brms)
library(loo)
library(plotrix)
# set cores to use to the total number of cores (minimally 4)
options(mc.cores = max(parallel::detectCores(), 4))
# save a compiled version of the Stan model file
rstan_options(auto_write = TRUE)
lastname <- "YOURLASTNAME"
char2seed(lastname)
For this homework, you will analyze data from a speech production experiment in which male and female speakers produced utterances in a number of scenarios. Scenarios differed in attitude: either informal or polite. For example, a scenario could be something like a request for salt. An informal request could be “Can I get the salt?” and a formal request would be “Would you mind passing the salt, please?”.
The data gives the mean voice pitch frequency of the utterances. This research is interested in finding whether there are differences in pitch frequency between informal and polite scenarios for either female or male speaker. Concretely, we are interested in three hypotheses:
Remember to set eval = TRUE
.
#read the data file
politeness_data <- ..FILL ME..%>%
# remove an entry with a missing data point for frequency
filter(..FILL ME..) %>%
# specify gender, subject, scenario and attitude as factors
mutate(..FILL ME ..)
#show the tibble
politeness_data
Create a summary of mean frequency with standard error by gender and attitude. Remember to set eval = TRUE
.
politeness_summary <- politeness_data %>%
# group by gender and attitude
group_by(..FILL ME..) %>%
# create a summary of mean freq and standard error
summarize(mean_frequency = ..FILL ME..,
standard_error = plotrix::std.error(frequency))
Make a bar plot with gender on the x-axis, mean frequency on the y-axis and vary fill colour by attitude. Include error bars for standard error.
Remember to set eval = TRUE
.
politeness_summary %>%
ggplot(aes(x = ..FILL ME..,
y = ..FILL ME..,
fill = ..FILL ME..)) +
# bars for mean frequency
geom_bar(stat = "identity", position = "dodge") +
# error bars for standard error
geom_errorbar(aes(..FILL ME..), position = "dodge" )
Specify the formula
argument for three regression models.
The first one, model_FE
, only has fixed effects. It tries to explain frequency
in terms of gender
, attitude
and their interaction.
The second one, model_intercept_only
, is like model_FE
but also adds random intercepts for both scenario
and subject
.
Finally, model_max_RE
is like model_FE
but also specifies the following random effects structure: by-scenario random intercepts, and random slopes for gender
, attitude
and their interaction, as well as by-subject random intercepts and random slopes for attitude
.
Remember to set eval = TRUE
.
model_FE <- brm(formula = ..FILL ME..,
data = politeness_data)
model_intercept_only <- brm(formula = ..FILL ME..,
data = politeness_data)
model_max_RE <- brm(formula = ..FILL ME..,
data = politeness_data)
The following function attempts to calculate the probability of each hypothesis being true, given the model.
Fill in the correct formula for the “male-informal” cell in the design matrix.
Specify the correct names and comparisons for hypothesis 2 and 3
Run the function for each model and interpret the output
Remember to set eval = TRUE
.
extract_comparisons <- function(model) {
# get posterior samples
post_samples <- posterior_samples(model)
# mnemonic names for reconstructed predictor values for all cells in the design matrix
# female-informal
F_inf <- post_samples$b_Intercept
# female-polite
F_pol <- post_samples$b_Intercept +
post_samples$b_attitudepol
# male-informal
M_inf <- ..FILL ME..
# male-polite
M_pol <- post_samples$b_Intercept +
post_samples$b_genderM +
post_samples$b_attitudepol +
post_samples$`b_genderM:attitudepol`
# create a tibble for recording hypotheses probabilities
tibble(
# names of hypotheses
hypothesis = c(
# h1:
"Female-polite < Female-informal",
# h2:
"Male-polite < Male-informal",
# h3:
"..FILL ME.."
),
# probability of hypotheses being true
probability = c(
# h1:
mean(F_pol < F_inf),
# h2:
mean(..FILL ME..),
# h3
mean(..FILL ME..)
)
)
}
# run the function on each model
# YOUR CODE HERE
Given the output, what would you say about the probability of each hypothesis being true? Which, if any, are likely to be true under all models? Which, if any, depend on the model considered?
ANSWER:
(your answer here)
Look at the estimates for the coefficients in the random effects structure of the model with the maximal RE structure (get them from summary()
).
# YOUR CODE HERE
Which of these random effects coefficients are very, very credibly estimated to be different from zero?
ANSWER:
(your answer here)
Compare the three models with leave-one-out cross-validation.
Remember to set eval = TRUE
.
loo(..FILL ME..,
reloo = T)
Which model is best?
Keep in mind that:
ANSWER:
(your answer here)
End of assignment