rmarkdown
working in order to ‘knit’ the HTML output.ctrl/cmd
+ shift
+ K
in RStudio) to produce a HTML file.rmarkdown
(comes with RStudio)tidyverse
(or ggplot2
, dplyr
, purrr
, tibble
)TeachingDemos
ggm
maps
rstan
ggmcmc
___As always, load the required packages and change the seed to your last name.
library(TeachingDemos)
library(ggm)
library(maps)
library(tidyverse)
library(rstan)
library(ggmcmc)
# set cores to use to the total number of cores (minimally 4)
options(mc.cores = max(parallel::detectCores(), 4))
# save a compiled version of the Stan model file
rstan_options(auto_write = TRUE)
lastname <- "YOURLASTNAME"
char2seed(lastname)
library(TeachingDemos)
library(ggm)
library(maps)
library(tidyverse)
library(rstan)
library(ggmcmc)
# set cores to use to the total number of cores (minimally 4)
options(mc.cores = max(parallel::detectCores(), 4))
# save a compiled version of the Stan model file
rstan_options(auto_write = TRUE)
lastname <- "bayes"
char2seed(lastname)
Previously during this course, we have looked at a particular instantiation of a \(t\)-test as tools to decide whether to retain or abandon a null hypothesis that the means of two sets of samples are equal. Let’s now use Bayesian parameter inference to address the same kind of question.
Here is some data:
set.seed(2018)
N1 <- 20
N2 <- 20
dataList <- list(y1 = rnorm(N1, mean = 100, sd = 20),
y2 = rnorm(N2, mean = 107, sd = 20),
N1 = N1,
N2 = N2)
Here is a (clumsy!) Stan model for this data. Notice that we estimate the means of both vectors/groups independently and use the generated quantities
block to obtain samples to approximate our posterior beliefs about the difference between these means.
data {
int N1;
int N2;
real y1[N1] ;
real y2[N2] ;
}
parameters {
real mu1 ;
real mu2 ;
real sd1 ;
real sd2 ;
}
model {
mu1 ~ normal(100, 100); // diffuse but roughly accurate priors
mu2 ~ normal(100, 100);
sd1 ~ gamma(1, 100); // sds are assumed to be low a priori
sd2 ~ gamma(1, 100);
y1 ~ normal(mu1,sd1);
y2 ~ normal(mu2,sd2);
}
generated quantities {
real delta;
delta = mu1 - mu2;
}
model_string <- "
paste the model code here
"
fit <- stan(model_code = model_string,
iter = 10000,
data = dataList)
model_string <- "
data {
int N1;
int N2;
real y1[N1] ;
real y2[N2] ;
}
parameters {
real mu1 ;
real mu2 ;
real sd1 ;
real sd2 ;
}
model {
mu1 ~ normal(100, 100); // diffuse but roughly accurate priors
mu2 ~ normal(100, 100);
sd1 ~ gamma(1, 100); // sds are assumed to be low a priori
sd2 ~ gamma(1, 100);
y1 ~ normal(mu1,sd1);
y2 ~ normal(mu2,sd2);
}
generated quantities {
real delta;
delta = mu1 - mu2;
}
"
stan_fit <- stan(model_code = model_string,
iter = 10000,
data = dataList)
## recompiling to avoid crashing R session
summary(stan_fit)
## Length Class Mode
## 1 stanfit S4
ANSWER:
SOLUTION: We look at the R-hat values and see that they are all under 1.1. This is an indicator that the chains converged, so we can believe that the samples are representative of the posterior distribution based on the model.
# your code here
ggplot(ggs(stan_fit), aes(x = value)) + geom_density() + facet_wrap(.~Parameter, scales = "free")
plot(stan_fit, plotfun = "dens")
SOLUTION: The inferred means are close to the ‘true means’, consequently so is delta. The standard deviations are estimated to be perceptible lower because of the rather strong prior assumption that they are small.
delta
. (Hint: we first extract the samples from the stan_fit
object into a tibble, and then use a function from the packages HDInterval
, which you may have to install.)samples <- ggmcmc::ggs(stan_fit)
HDInterval::hdi( ... something with samples and delta ... )
samples <- ggmcmc::ggs(stan_fit)
HDInterval::hdi(filter(samples, Parameter == "delta") %>% select(value))
## value
## lower -12.387110
## upper -7.164576
## attr(,"credMass")
## [1] 0.95
SOLUTION: The value 0 lies way outside the 95% HDI of the posterior of delta, so we are safe to conclude that, given data and model, it is not very plausible that the means are identical.
with(dataList, t.test(y1,y2, paired = FALSE, var.equal = FALSE))
##
## Welch Two Sample t-test
##
## data: y1 and y2
## t = -1.5525, df = 37.525, p-value = 0.1289
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -22.492760 2.971995
## sample estimates:
## mean of x mean of y
## 97.17022 106.93060
How would you interpret this result? Would you reject the null hypothesis (that the means are equal)?
SOLUTION: No, in this case we have no grounds to reject the null hypothesis.
End of assignment