Instructions

Grading scheme

Suggested readings

Required R packages

As always, load the required packages and change the seed to your last name.

library(TeachingDemos)
library(ggm)
library(maps)
library(tidyverse)
library(rstan)
library(ggmcmc)

# set cores to use to the total number of cores (minimally 4)
options(mc.cores = max(parallel::detectCores(), 4)) 
# save a compiled version of the Stan model file
rstan_options(auto_write = TRUE)

lastname <- "YOURLASTNAME"

char2seed(lastname)

2. Inferring a difference between means

Previously during this course, we have looked at a particular instantiation of a \(t\)-test as tools to decide whether to retain or abandon a null hypothesis that the means of two sets of samples are equal. Let’s now use Bayesian parameter inference to address the same kind of question.

Here is some data:

set.seed(2018)
N1 <- 20
N2 <- 20
dataList <- list(y1 = rnorm(N1, mean = 100, sd = 20), 
                 y2 = rnorm(N2, mean = 107, sd = 20),
                 N1 = N1,
                 N2 = N2)

Here is a (clumsy!) Stan model for this data. Notice that we estimate the means of both vectors/groups independently and use the generated quantities block to obtain samples to approximate our posterior beliefs about the difference between these means.


data {
  int N1;
  int N2;
  real y1[N1] ;
  real y2[N2] ;
}
parameters {
  real mu1 ;
  real mu2 ;
  real sd1 ;
  real sd2 ;
} 
model {
  mu1 ~ normal(100, 100); // diffuse but roughly accurate priors
  mu2 ~ normal(100, 100); 
  sd1 ~ gamma(1, 100); // sds are assumed to be low a priori
  sd2 ~ gamma(1, 100); 
  y1 ~ normal(mu1,sd1);
  y2 ~ normal(mu2,sd2);
}
generated quantities {
  real delta;
  delta = mu1 - mu2;
}

a. Run this model by completing the following code and produce a summary of the fit.

model_string <-  "

paste the model code here

"

fit <- stan(model_code = model_string,
           iter = 10000,
           data = dataList)

b. Based on which information in this summary would you conclude that it is plausible to assume that the samples are representative of the target (posterior) distribution?

ANSWER:

(your answer here)

c. Produce a plot of the posterior density of all model parameters.

# your code here

d. Comment on each of the parameters whether the inferred values are reasonable given the data (of which we know the distributions that generated them) and the model.

(your answer here)

e. Calculate a 95% HDI for the posterior of delta. (Hint: we first extract the samples from the stan_fit object into a tibble, and then use a function from the packages HDInterval, which you may have to install.)

samples <- ggmcmc::ggs(stan_fit)
HDInterval::hdi( ... something with samples and delta ... )

f. Interpret the result from the HDI computation. If the model is true, should we conclude from the data that there is likely a difference in means?

(your answer here)

g. Here is the output of the corresponding \(t\)-test in a frequentist paradigm.

with(dataList, t.test(y1,y2, paired = FALSE, var.equal = FALSE))
## 
##  Welch Two Sample t-test
## 
## data:  y1 and y2
## t = -1.5525, df = 37.525, p-value = 0.1289
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -22.492760   2.971995
## sample estimates:
## mean of x mean of y 
##  97.17022 106.93060

How would you interpret this result? Would you reject the null hypothesis (that the means are equal)?

(your answer here)


End of assignment