allows us to specify this model inside an R script using syntax that looks like ordinary R functions, even though under the hood it is not.
greta remembers which operation to perform and infers the size and shape of the result, which is treated as a placeholder whose value is calculated later.
This way, we can construct models using greta arrays that represent unknown variables.
# create greta arrays: zeros() ones() greta_array()
# create matrix z
(z <- ones(3, 3))
## greta array (data)
##
## [,1] [,2] [,3]
## [1,] 1 1 1
## [2,] 1 1 1
## [3,] 1 1 1
# manipulate z
(z2 <- z * 6)
## greta array (operation)
##
## [,1] [,2] [,3]
## [1,] ? ? ?
## [2,] ? ? ?
## [3,] ? ? ?
# explicit declaration: convert other objects to greta arrays
x <- as_data(iris$Petal.Length)
We can let greta arrays follow a probability distribution that we define. This is important, for example, in Bayesian modeling, where we need to define priors over parameters.
greta offers various probability distributions that can be placed on random variables. Each of these distribution functions takes the distribution's parameters as arguments and returns a variable greta array that follows the specified distribution, e.g.:
normal(mean, sd, dim = NULL, truncation = c(-Inf, Inf))
student(df, mu, sigma, dim = NULL, truncation = c(-Inf, Inf))
p <- normal(0, 1) #mean, sd
It is also possible to do frequentist inference in greta using variable(), which creates greta arrays that are not associated with a probability distribution.
(int <- variable())
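variable() also accepts lower and upper bounds, which is useful for constrained parameters; a brief sketch (the object names here are arbitrary):
# a variable constrained to be positive, e.g. a standard deviation
sigma_free <- variable(lower = 0)
# a variable constrained to the unit interval, e.g. a probability
p_free <- variable(lower = 0, upper = 1)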
Before we can make inferences, we need to combine everything into a complete model. We use model().
m <- model(intercept, slope, sigma)
greta uses Markov chain Monte Carlo (MCMC) methods for sampling and parameter estimation in greta models. In greta, the default sampler for mcmc() is the Hamiltonian Monte Carlo (HMC) algorithm.
MCMC methods randomly sample inside a probabilistic space to approximate the posterior distribution.
The HMC algorithm starts by picking a random parameter value to consider. It then keeps simulating candidate values and assesses whether they approximate the best parameter values more closely, by computing how likely each value is to explain the data. HMC uses the gradients of the joint density (provided by the model) to explore the set of parameters efficiently.
If a randomly generated parameter value is better than the last one, it is added to the chain of parameter values with a probability determined by how much better it is.
This information is stored in a Markov chain as a sequence of probabilistic states.
After convergence, MCMC sampling yields a set of points which are samples from the desired posterior distribution!
Any statistic calculated on the set of samples generated by MCMC approximates the corresponding statistic of the true posterior distribution.
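To make the accept/reject logic above concrete, here is a toy random-walk Metropolis sampler written in plain R. It is not greta code and not HMC (which additionally uses gradients); the standard normal target and the proposal width of 0.5 are arbitrary choices for illustration.
set.seed(42)
log_target <- function(theta) dnorm(theta, mean = 0, sd = 1, log = TRUE) # toy target density
n_iter <- 5000
chain <- numeric(n_iter)
current <- 0                                 # arbitrary starting value
for (i in seq_len(n_iter)) {
  proposal <- current + rnorm(1, sd = 0.5)   # propose a nearby candidate value
  # accept with probability min(1, target(proposal) / target(current))
  if (log(runif(1)) < log_target(proposal) - log_target(current)) {
    current <- proposal
  }
  chain[i] <- current                        # store the current state of the Markov chain
}
mean(chain)  # approximately 0, the mean of the target distribution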
draws <- mcmc(model, sampler = hmc(),  # hmc() is the default sampler
              n_samples = 1000,        # samples to draw per chain
              warmup = 1000,           # warmup iterations per chain
              chains = 4,
              initial_values = init)   # initial_values is optional
# 4 chains x 1000 samples = 4000 samples in total
There are other ways to do statistical inference on greta models (stashed_samples(), extra_samples(), initials(), opt()), but we will stick to mcmc().
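For instance, opt() searches for the single set of parameter values that maximizes the model's joint density (a MAP estimate when priors are defined). A minimal sketch, assuming m is a greta model as above; the exact fields of the returned list may vary across greta versions:
o <- opt(m)   # numerical optimization of the joint density
o$par         # point estimates for the model parameters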
Lastly, you can always look up greta functions using help()!
Linear regression is a model for predicting a numerical quantity; in the case below, it maps one input to a numerical output. The quantity being predicted (the one the equation solves for) is called the dependent variable.
The simple linear regression model is written as: y = β0 + β1 · x + ε
The example below covers a hypothetical Bayesian model to estimate the parameters of a linear regression between two of the variables in the built-in "iris" dataset.
Example below extracted from greta guide.
# data gets declared as greta objects
x <- as_data(iris$Petal.Length) # IV
y <- as_data(iris$Sepal.Length) # DV
# create variables int, coef & sd and state prior distributions
int <- normal(0, 1) #mean, sd
coef <- normal(0, 3) #mean, sd
sd <- student(3, 0, 1, truncation = c(0, Inf)) #df, mu, sigma
# calculate the predicted sepal lengths: intercept + coefficient(slope) * data(IV)
mean <- int + coef * x
# we assume y is normally distributed around the linear predictor (mean)
# the distribution() function sets up the likelihood function for the model;
# it takes as argument the observed variable we are modeling, hence the DV
distribution(y) <- normal(mean, sd)
# We can pass greta arrays as arguments to model() to flag them as the parameters we’re
# interested in. When sampling from the model with mcmc() those will be the greta arrays for
# which samples will be returned.
m <- model(int, coef, sd)
# visual representation
plot(m)
# greta model gives us joint density for a candidate set of values
# perform inference with mcmc
draws <- mcmc(m, n_samples = 1000)
summary(draws)
##
## Iterations = 1:1000
## Thinning interval = 1
## Number of chains = 4
## Sample size per chain = 1000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## int 4.2808 0.08017 0.0012676 0.0015631
## coef 0.4143 0.01934 0.0003058 0.0003821
## sd 0.4106 0.02366 0.0003741 0.0004482
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## int 4.1233 4.2264 4.2798 4.3371 4.4363
## coef 0.3777 0.4011 0.4139 0.4269 0.4530
## sd 0.3658 0.3945 0.4097 0.4258 0.4592
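Because draws behaves like a coda mcmc.list, standard convergence diagnostics can be applied to it; for example (assuming the coda package is installed):
# potential scale reduction factor (R-hat); values close to 1 indicate that
# the four chains have converged to the same distribution
coda::gelman.diag(draws)
# trace and density plots per parameter
plot(draws)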
# plot the data with the regression line (uses ggplot2 and the %>% pipe)
iris %>%
  ggplot(
    aes(
      x = Petal.Length,
      y = Sepal.Length)) +
  # add (jittered) data points
  geom_jitter(alpha = 0.7, color = "darkblue") +
  # add regression line
  geom_smooth(method = "lm", color = "darkorange")
Try to find the best-fitting parameter values of the model above on your own!
We may also use optim() to retrieve the MLE for the three parameters of a simple linear regression, i.e., find the values of β0 (= intercept), β1 (= slope) and σ (= standard deviation) that maximize the likelihood function.
optim() is useful for finding the minimum(!) of a function. If a maximum is needed, as in MLE, we can simply multiply the function by −1 and then search for the minimum with optim(). In this case, the log-likelihood is multiplied by −1, giving the negative log-likelihood (nll), which we minimize.
Keep in mind that we can also use optim() on the posterior distribution in the case of Bayesian probabilistic modeling.
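Applied to the posterior, optim() yields a maximum a posteriori (MAP) estimate. A toy sketch (not from the web-book) for a single mean parameter mu, with made-up data, the data sd fixed at 1 for simplicity, and an assumed Normal(0, 10) prior on mu:
y_toy <- c(1.2, 0.8, 1.5, 0.9)
neg_log_post <- function(mu) {
  -sum(dnorm(y_toy, mean = mu, sd = 1, log = TRUE)) -  # negative log-likelihood
    dnorm(mu, mean = 0, sd = 10, log = TRUE)           # minus the log-prior
}
# one-dimensional optimization; Brent's method needs finite bounds
optim(par = 0, fn = neg_log_post, method = "Brent", lower = -10, upper = 10)$par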
Code below copied from web-book 9.5.1., using iris data.
# function for the negative log-likelihood of the given
# data and fixed parameter values
nll = function(y, x, beta_0, beta_1, sd) { # beta_0 = intercept, beta_1 = slope; y = DV = Sepal.Length, x = IV = Petal.Length
  # a negative sigma is logically impossible
  if (sd <= 0) {return( Inf )}
  # calculate predicted values
  yPred = beta_0 + beta_1 * x # intercept + slope * independent data
  # negative log-likelihood of each data point
  nll = -dnorm(y, mean = yPred, sd = sd, log = T)
  # sum over all observations
  sum(nll)
}
# Here we use optim to minimize our negative log-likelihood function for the three parameters
fit_lh = optim(
  # par = initial values for the parameters to be optimized over
  par = c(1.5, 0, 0.5), # initial guesses for β0 (= intercept), β1 (= slope), σ (= standard deviation), where the search starts
  # fn = the function to be minimized (or maximized)
  # here: minimizing the nll is equivalent to maximizing the log-likelihood,
  # which in turn maximizes the likelihood itself
  fn = function(par) {
    with(iris,
         nll(Sepal.Length, Petal.Length, # y, x
             par[1], par[2], par[3])     # β0 (= intercept), β1 (= slope), σ (= standard deviation)
    )
  }
)
fit_lh$par
## [1] 4.3062376 0.4090193 0.4044848
What do you think happens in every line of the code?
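As a sanity check: for a linear model with Gaussian noise, the maximum-likelihood estimates of intercept and slope coincide with ordinary least squares, so they should closely match lm(); the σ estimate will be slightly smaller than lm's residual standard error because the MLE divides by n rather than n − 2.
fit_ols <- lm(Sepal.Length ~ Petal.Length, data = iris)
coef(fit_ols)                     # compare with fit_lh$par[1:2]
sqrt(mean(residuals(fit_ols)^2))  # ML-style sd estimate; compare with fit_lh$par[3]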
Code below copied from current HW07, using iris data.
# select data to use
Sepal.Length <- as_data(FILL_ME)
log_Petal.Length <- as_data(FILL_ME)
# latent variables and priors
intercept <- student(df= 1, mu = 0, sigma = 10)
slope <- student(df= 1, mu = 0, sigma = 10)
sigma <- normal(0 , 5, truncation = c(0, Inf))
# derived latent variable (linear model)
mean <- FILL_ME
# likelihood
FILL_ME <- normal(mean, sigma)
# finalize model, register which parameters to monitor
m <- model(intercept, slope, sigma)
# Draw samples from the above model
draws <- mcmc(m, n_samples = 2000, chains = 4, warmup = 1000)
summary(draws)
Code below copied from web-book section 9.6.2., using the coin example.
k <- as_data(67)  # k = number of observed heads (observed data / DV)
# we want to infer theta, the probability of getting heads on a single coin toss
N <- as_data(127) # N = total number of coin tosses (IV)
# coin bias & prior (here: uninformative)
# for the Beta distribution, the uninformative prior alpha = beta = 1 is uniform on [0, 1]
theta <- beta(1, 1) # alpha & beta values
# likelihood of k given theta
distribution(k) <- binomial(N, theta) # binomial(size, prob, dim = NULL)
# tell greta which parameters the MCMC sampler should return information about (here:theta).
m <- model(theta)
# take 4 chains of 1000 samples
draws <- mcmc(
  model = m,
  n_samples = 1000,
  warmup = 1000,
  chains = 4)
# The resulting draws object is an mcmc.list; we can transform it into whichever representation we prefer
# cast results (type 'mcmc.list') into tidy tibble
tidy_draws = ggmcmc::ggs(draws)
summary(tidy_draws)
## Iteration Chain Parameter value
## Min. : 1.0 Min. :1.00 theta:4000 Min. :0.3774
## 1st Qu.: 250.8 1st Qu.:1.75 1st Qu.:0.4991
## Median : 500.5 Median :2.50 Median :0.5295
## Mean : 500.5 Mean :2.50 Mean :0.5285
## 3rd Qu.: 750.2 3rd Qu.:3.25 3rd Qu.:0.5574
## Max. :1000.0 Max. :4.00 Max. :0.6748
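Since tidy_draws is an ordinary tibble, the posterior samples can also be visualized directly with ggplot2; a brief sketch (colors and labels are arbitrary):
library(ggplot2)
# posterior density of theta, pooled over the four chains
ggplot(tidy_draws, aes(x = value)) +
  geom_density(fill = "darkorange", alpha = 0.5) +
  labs(x = "theta", y = "posterior density")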