Instructions
Include the following setup chunk with the chunk option echo = F (so as not to have it show up in your output):
knitr::opts_chunk$set(
warning = FALSE, # suppress warnings per default
message = FALSE, # suppress messages per default
cache = TRUE # caches results of computation unless the code changes
)
Load the packages tidyverse and greta.
library(tidyverse)
library(greta)
Fix a model \(M\) with a single parameter \(\theta\) and an uninformative prior over some finite interval \([a;b]\), so that \(\theta\) is uniformly distributed with \(\theta \sim \mathrm{Uniform}(a,b)\). Show that the frequentist MLE is identical to the Bayesian MAP.
Possible answer:
Maximum Likelihood Estimation (MLE): We consider only the likelihood.
\[\theta_{MLE}=\arg\max_{\theta} P(X|\theta)=\arg\max_{\theta}\sum_i\log P(x_i|\theta)\]
Maximum A Posteriori (MAP): In the Bayesian context we consider the likelihood and a prior belief.
\[\theta_{MAP}=\arg\max_{\theta} P(X|\theta) P(\theta)=\arg\max_{\theta}\sum_i\log P(x_i|\theta)+\log P(\theta)\]
The only difference between the two equations is the weighting of the likelihood by the prior in \(\theta_{MAP}\).
Using the uniform prior as a noninformative prior, i.e., \(\theta \sim \mathrm{Uniform}(a,b)\), means that all possible values of \(\theta\) are weighted equally by some constant. Since an additive constant does not change the location of the maximum, it cancels out of the maximization:
\[\begin{align} \theta_{MAP}&=\arg\max_{\theta}\sum_i\log P(x_i|\theta)+\log P(\theta)\\ &= \arg\max_{\theta}\sum_i\log P(x_i|\theta)+ const.\\ &= \arg\max_{\theta}\sum_i\log P(x_i|\theta)\\ &= \theta_{MLE}. \end{align}\]
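To make this concrete, here is a minimal numerical sketch (not part of the required answer; the numbers are purely illustrative): for a binomial likelihood with a Uniform(0,1) prior, maximizing the log-likelihood and maximizing the log-posterior yield the same estimate.
# minimal numerical sketch (illustrative numbers): with a uniform prior,
# the log-posterior is the log-likelihood plus a constant, so both are
# maximized at the same value of theta
k <- 7; N <- 24
log_lik  <- function(theta) dbinom(k, N, theta, log = TRUE)
log_post <- function(theta) log_lik(theta) + dunif(theta, min = 0, max = 1, log = TRUE)
optimize(log_lik,  interval = c(0, 1), maximum = TRUE)$maximum
optimize(log_post, interval = c(0, 1), maximum = TRUE)$maximum
# both calls return (approximately) k / N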
greta (40 points)
In this exercise we will test whether there is plausibly a negative effect of (the log of) total_volume_sold on average_price in the avocado data set. The model used is shown in the figure below, although we will add specific priors for a Bayesian analysis.
knitr::include_graphics("visuals/linear-regression-model.png")
We will use MLEs and Bayesian estimation. To achieve the latter, we will implement the model in greta and look at the posterior estimates of the slope parameter \(\beta_1\), comparing it against the null value of \(\beta_1=0\) (representing no linear relationship).
Load the data into a variable avocado_data
and preprocess it, using the code from the course notes (e.g., in Section 5.2.1).
avocado_data <- read_csv(url('https://raw.githubusercontent.com/michael-franke/intro-data-analysis/master/data_sets/avocado.csv')) %>%
# remove currently irrelevant columns
select( -X1 , - contains("Bags"), - year, - region) %>%
# rename variables of interest for convenience
rename(
total_volume_sold = `Total Volume`,
average_price = `AveragePrice`,
small = '4046',
medium = '4225',
large = '4770'
)
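A quick look at the preprocessed data (optional, purely a sanity check) confirms that the renamed columns total_volume_sold and average_price are available:
# optional sanity check of the preprocessed data
glimpse(avocado_data)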
Make a scatterplot with the logarithm of total_volume_sold on the \(x\)-axis and average_price on the \(y\)-axis. Add a linear regression line (without any grouping), using the function geom_smooth. Your output could look roughly like this:
avocado_data %>%
ggplot(
aes(
x = log(total_volume_sold),
y = average_price
)
) +
geom_point(alpha = 0.3, color = "gray") +
geom_smooth(method = "lm", color = "firebrick")
Find the best-fitting parameter values of the model, i.e., the best values for \(\beta_0\) (= intercept), \(\beta_1\) (= slope) and \(\sigma\) (= standard deviation), for the model’s likelihood function. Use the optim function for this purpose. The course notes have very relevant code in Section 9.5.1, but you need to make some adjustments to that code: here, we want to regress average_price on the logarithm of total_volume_sold. Finish your calculation with a clear statement or code output that identifies the estimated value for each parameter. A (wrong!) example of what your answer could look like is given below.
# function for the negative log-likelihood of the given
# data and fixed parameter values
nll = function(y, x, beta_0, beta_1, sd) {
# a non-positive sigma is logically impossible
if (sd <= 0) {return( Inf )}
# predicted values
yPred = beta_0 + x * beta_1
# negative log-likelihood of each data point
nll = -dnorm(y, mean=yPred, sd=sd, log = T)
# sum over all observations
sum(nll)
}
fit_lh = optim(
# initial parameter values
par = c(1.5, 0, 0.5),
# function to optimize
fn = function(par) {
with(avocado_data,
nll(average_price, log(total_volume_sold), # add log here
par[1], par[2], par[3])
)
}
)
fit_lh$par
## [1] 2.5650516 -0.1024267 0.3270426
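As an optional cross-check (not required by the exercise), the intercept and slope can also be obtained with R’s built-in lm function; note that the residual standard error reported by lm uses an \(n-2\) denominator, so it will differ very slightly from the MLE of \(\sigma\).
# optional cross-check against lm's least-squares estimates
fit_lm <- lm(average_price ~ log(total_volume_sold), data = avocado_data)
coef(fit_lm)  # should closely match fit_lh$par[1:2]
sigma(fit_lm) # close to, but not identical to, the MLE of sigma (n-2 vs. n denominator)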
Use the plot from above, where you added a regression line with geom_smooth, but now add a second regression line (in a different color) as the final layer of the plot. Use geom_abline, which takes as argument an aesthetic with aes(slope = ..., intercept = ...), and supply the MLEs for slope and intercept. (Hint: After doing this, you should, as was promised earlier in the course, understand exactly what geom_smooth is doing under its default method lm (= linear model).)
avocado_data %>%
ggplot(
aes(
x = log(total_volume_sold),
y = average_price
)
) +
geom_point(alpha = 0.3, color = "gray") +
geom_smooth(method = "lm", color = "firebrick") +
geom_abline(aes(slope = -0.1024267, intercept = 2.5650516), color = "darkorange")
greta (6 points)
Complete the code below to obtain an implementation in greta.
# select data to use
price <- as_data(avocado_data$average_price)
log_sold <- as_data(avocado_data$total_volume_sold %>% log)
# latent variables and priors
intercept <- student(df= 1, mu = 0, sigma = 10)
slope <- student(df= 1, mu = 0, sigma = 10)
sigma <- normal(0 , 5, truncation = c(0, Inf))
# derived latent variable (linear model)
mean <- intercept + slope * log_sold
# likelihood
distribution(price) <- normal(mean, sigma)
# finalize model, register which parameters to monitor
m <- model(intercept, slope, sigma)
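Optionally (assuming the DiagrammeR package is installed), you can visualize the model graph to double-check the model structure before sampling:
# optional: visualize the greta model graph (requires DiagrammeR)
plot(m)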
Draw 2000 samples from each of 4 chains, after a warmup of 1000, using greta::mcmc.
draws <- greta::mcmc(m, n_samples = 2000, chains = 4, warmup = 1000)
draws_ex2 <- draws
saveRDS(draws_ex2, 'draws_ex2.rds')
# draws <- readRDS('draws_ex2.rds')
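Before summarizing the samples it is good practice to check convergence. The draws returned by greta::mcmc are a coda-compatible mcmc.list, so standard diagnostics apply (a sketch, assuming the coda package is installed):
# optional convergence checks on the posterior draws
coda::gelman.diag(draws)   # R-hat values should be close to 1
coda::effectiveSize(draws) # effective sample sizes per parameter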
Get a tidy version of your samples using the function ggmcmc::ggs
and produce the Bayesian point- and interval-based estimates (95% credible intervals a.k.a. HDIs) for all parameters. Interpret the results with respect to our research question of whether the log amount sold has a linear effect on the average price.
tidy_draws = ggmcmc::ggs(draws)
Bayes_estimates <- tidy_draws %>%
group_by(Parameter) %>%
summarise(mean = mean(value),
'|95%' = HDInterval::hdi(value)[1],
'95|%' = HDInterval::hdi(value)[2])
Bayes_estimates
## # A tibble: 3 x 4
## Parameter mean `|95%` `95|%`
## <fct> <dbl> <dbl> <dbl>
## 1 intercept 2.56 2.54 2.59
## 2 sigma 0.327 0.324 0.330
## 3 slope -0.102 -0.104 -0.100
# we see that the entire 95% HDI for the slope lies below 0; we conclude that, given model and data, a negative linear effect is plausible
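A complementary summary (an optional sketch) is the posterior probability that the slope is negative, computed directly from the tidy samples:
# optional: posterior probability that the slope is negative
tidy_draws %>%
filter(Parameter == "slope") %>%
summarise(prob_negative = mean(value < 0))
# given the HDI above, this probability is essentially 1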
greta (6 points)
Rerun the greta model code above, but use improper priors, i.e., do not specify any prior for the parameters; just declare that these are variables, like so:
# latent variables and improper priors
intercept <- variable()
slope <- variable()
sigma <- variable(lower = 0)
Define the rest of the model (and the data) as before. Instead of calling greta::mcmc, call the function greta::opt for model m. Inspect and explain the output. What are the values returned from this function? Have you seen them before?
# select data to use
price <- as_data(avocado_data$average_price)
log_sold <- as_data(avocado_data$total_volume_sold %>% log)
# latent variables and priors
intercept <- variable()
slope <- variable()
sigma <- variable(lower = 0)
# derived latent variable (linear model)
mean <- intercept + slope * log_sold
# likelihood
distribution(price) <- normal(mean, sigma)
# finalize model, register which parameters to monitor
m <- model(intercept, slope, sigma)
# optimize
MLEs_greta <- greta::opt(m)
MLEs_greta
## $par
## $par$intercept
## [1] 2.565098
##
## $par$slope
## [1] -0.1024293
##
## $par$sigma
## [1] 0.3270453
##
##
## $value
## [1] 5497.596
##
## $iterations
## [1] 17
##
## $convergence
## [1] 0
opt_ex2 <- MLEs_greta
saveRDS(opt_ex2, 'opt_ex2.rds')
# MLEs_greta <- readRDS('opt_ex2.rds')
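These are, up to numerical precision, the very same values we obtained with optim earlier: with flat improper priors the posterior mode coincides with the MLE, and $value is the objective at the optimum, i.e., the negative log joint density, which here equals the negative log-likelihood computed by our nll function. A short comparison (sketch):
# sketch: compare greta's optimization result against the earlier optim() fit
unlist(MLEs_greta$par)  # intercept, slope and sigma from greta::opt
fit_lh$par              # MLEs found with optim above
fit_lh$value            # negative log-likelihood at the optimum (compare to MLEs_greta$value)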
Let’s find out whether the King of France data gives any reason to believe that sentences like “The King of France is bald.” (Condition 0) tend to be judged differently from sentences of the form “France has a King, and he is bald.” (Condition 1). Remember that this question is theoretically interesting because the first sentence presupposes that there is a King of France, while the second asserts this information explicitly. Does falsely presupposing cause different truth-value judgements than falsely asserting?
Load the cleaned KoF data and extract the relevant counts: the number \(k_0\) of “true” responses out of \(N_0\) total observations for Condition 0, and the same for Condition 1.
data_KoF_cleaned <- read_csv(url('https://raw.githubusercontent.com/michael-franke/intro-data-analysis/master/data_sets/king-of-france_data_cleaned.csv'))
count_data <- data_KoF_cleaned %>%
filter(condition %in% c("Condition 0", "Condition 1")) %>%
dplyr::count(condition, response)
k0 <- as_data(7)
N0 <- as_data(75)
k1 <- as_data(3)
N1 <- as_data(79)
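The counts above were read off the count_data table. They could also be extracted programmatically, as in the following sketch (which assumes that the response column is read as a logical TRUE/FALSE; if it is coded differently, adapt the comparison accordingly):
# sketch: extract k and N per condition programmatically
# (assumes `response` is a logical TRUE/FALSE column)
data_KoF_cleaned %>%
filter(condition %in% c("Condition 0", "Condition 1")) %>%
group_by(condition) %>%
summarise(k = sum(response), N = n())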
greta model to compare latent biases (10 points)
Adapt the T-Test model given in the course notes to apply to this case. Instead of two means, represent two biases \(\theta_0\) and \(\theta_1\). Use the version of the T-Test model which does not define a prior on the difference. Rather, use the following priors: \(\theta_0, \theta_1 \sim \text{Beta}(1,1)\). Calculate \(\delta = \theta_0 - \theta_1\) inside the greta model.
# priors
theta_0 <- beta(1,1)
theta_1 <- beta(1,1)
# derived parameters
delta <- theta_0 - theta_1
# likelihood
distribution(k0) <- binomial(N0, theta_0)
distribution(k1) <- binomial(N1, theta_1)
# model
m <- model(delta)
Take 2000 samples after a warmup of 1000, from 4 chains. Look at Bayesian 95% credible intervals for the parameter \(\delta\) and interpret the result with respect to the research question of interest (see above).
# sample
draws <- greta::mcmc(m, warmup = 1000, n_samples = 2000, chains = 4)
draws_ex3 <- draws
saveRDS(draws_ex3, 'draws_ex3.rds')
# draws <- readRDS('draws_ex3.rds')
# tidy up
tidy_draws = ggmcmc::ggs(draws)
# summarize
Bayes_estimates <- tidy_draws %>%
group_by(Parameter) %>%
summarise(
'|95%' = HDInterval::hdi(value)[1],
mean = mean(value),
'95|%' = HDInterval::hdi(value)[2]
)
Bayes_estimates
## # A tibble: 1 x 4
## Parameter `|95%` mean `95|%`
## <fct> <dbl> <dbl> <dbl>
## 1 delta -0.0225 0.0541 0.143
# delta is not credibly different from zero, so we conclude that, given model and data, it is not plausible to assume that answer rates are different between conditions