Instructions

Grading scheme

Suggested readings

Required R packages

As always, load the required packages and change the seed to your last name.

library(TeachingDemos)
library(ggm)
library(maps)
library(tidyverse)
library(rstan)
library(ggmcmc)

# set cores to use to the total number of cores (minimally 4)
options(mc.cores = max(parallel::detectCores(), 4)) 
# save a compiled version of the Stan model file
rstan_options(auto_write = TRUE)

lastname <- "YOURLASTNAME"

char2seed(lastname)

3. Playing with samples from the prior predictive distribution

We say in class that we can (ab-)use Stan to obtain samples from the prior predictive distribution. Here is a wild script which completely sidesteps the model block (the variable theta_dummy is only included because otherwise Stan would complain about an empty model block). The code generates (feedforward) samples in the generated quantities block. The code is trying but failing to do something sensible, in fact something that we have already computed during the course.

data {
  int<lower=0> N ;
  int<lower=0, upper = N> k ;
}
parameters {
  real<lower=0,upper=1> theta_dummy;
} 
model {
  theta_dummy ~ beta(1,1);
}
generated quantities {
  real<lower=0,upper=1> theta_prior;
  int<lower=0> prior_pred_k;
  real<upper=1> llh_rep;
  real<upper=1> llh_hypo;
  int<lower=0,upper=1> at_least_as_extreme;
  theta_prior = 0.5;
  prior_pred_k = binomial_rng(N, theta_prior);
  llh_rep = binomial_lpmf(prior_pred_k | N, theta_prior);
  llh_hypo = binomial_lpmf(k | N, theta_prior);
  at_least_as_extreme = llh_rep >= llh_hypo ? 1 : 0;
}

a. What is this code trying to compute?

ANSWER:

(your answer here)

b. There is a typo somewhere in here. It’s just one symbol that needs to be fixed. Fix it in the code box below.

data {
  int<lower=0> N ;
  int<lower=0, upper = N> k ;
}
parameters {
  real<lower=0,upper=1> theta_dummy;
} 
model {
  theta_dummy ~ beta(1,1);
}
generated quantities {
  real<lower=0,upper=1> theta_prior;
  int<lower=0> prior_pred_k;
  real<upper=1> llh_rep;
  real<upper=1> llh_hypo;
  int<lower=0,upper=1> at_least_as_extreme;
  theta_prior = 0.5;
  prior_pred_k = binomial_rng(N, theta_prior);
  llh_rep = binomial_lpmf(prior_pred_k | N, theta_prior);
  llh_hypo = binomial_lpmf(k | N, theta_prior);
  at_least_as_extreme = llh_rep >= llh_hypo ? 1 : 0;
}

c. Use the following R code to run this Stan model (copy paste the code in as a string). Is the output what you would have expected?

model_string = '''

paste the code here after fixing the typo

'''

dataList <- tibble(N = 24, k = 7)
stan_fit <- stan(model_code = model_string, 
                data = dataList,
                iter = 50000)

# get posterior probability of flag `at_least_as_extreme` to be 1
ggmcmc::ggs(stan_fit) %>% filter(Parameter %in% c("at_least_as_extreme")) %>% 
  group_by(Parameter) %>% summarise(value_of_interest = mean(value)) %>% show()

ANSWER:

(your answer here)

d. The previous Stan code computed an interesting value for a model that fixed the parameter theta at level 0.5. Change the Stan model code to compute the same kind of output value, but for a model which assumes that theta is sampled from a uniform distribution over the unit interval. (Hint: you only need to modify one line very slightly (the resulting code is not pretty or the best code to communicate the intend, but it suffices).)

model_string <- '''

paste the code here after fixing the typo and adapting 
it to a Bayesian model with maximal uncertainty about theta

'''

dataList <- tibble(N = 24, k = 7)
stan_fit <- stan(model_code = model_string, 
                data = dataList,
                iter = 50000)

# get posterior probability of flag `at_least_as_extreme` to be 1
ggmcmc::ggs(stan_fit) %>% filter(Parameter %in% c("at_least_as_extreme")) %>% 
  group_by(Parameter) %>% summarise(value_of_interest = mean(value)) %>% show()

e. How would you interpret/describe the number you have just computed?

ANSWER:

(your answer here)


End of assignment