GRETA

GRETA SYNTAX

Greta records which operation to perform and the expected size and shape of the result. The result acts as a placeholder whose values are only calculated later.

This way, we can construct models using greta arrays that represent unknown variables.

# create greta arrays: zeros() ones() greta_array()

# create matrix z
(z <- ones(3, 3))
## greta array (data)
## 
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    1    1    1
## [3,]    1    1    1
# manipulate z
(z2 <- z * 6)
## greta array (operation)
## 
##      [,1] [,2] [,3]
## [1,]  ?    ?    ?  
## [2,]  ?    ?    ?  
## [3,]  ?    ?    ?
# explicit declaration: convert other objects to greta arrays
x <- as_data(iris$Petal.Length)

We can let greta arrays follow a probability distribution that we define. This is important, e.g., for Bayesian modeling, where you need to define priors over parameters.

Greta offers various probability distributions that you can use for random variables. Each of these distribution functions takes the distribution’s parameters as arguments. They return a variable greta array that follows the specified distribution, e.g.:

normal(mean, sd, dim = NULL, truncation = c(-Inf, Inf))

student(df, mu, sigma, dim = NULL, truncation = c(-Inf, Inf))

p  <- normal(0, 1) #mean, sd
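
The dim argument shown in the signatures above lets a single call create a whole array of independent variables sharing the same prior. A small sketch (the names coefs and weights are only for illustration):

coefs   <- normal(0, 3, dim = 5)       # a 5 x 1 column vector of variables
weights <- normal(0, 1, dim = c(2, 3)) # a 2 x 3 matrix of variables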

It’s also possible to do frequentist inference in greta, using variable(), which creates greta arrays that are not associated with a probability distribution.

(int <- variable())
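
variable() also accepts lower and upper bounds, so a parameter can be constrained without giving it a prior. A minimal sketch (the name sigma_hat is only illustrative):

sigma_hat <- variable(lower = 0) # a strictly positive scalar variable, e.g. a standard deviation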

Before we can make inferences, we need to combine everything into a complete model. We use model().

m <- model(intercept, slope, sigma)

greta uses Markov chain Monte Carlo (MCMC) methods for sampling & parameter estimation in greta models.

In greta, the default sampler for mcmc() is the Hamiltonian Monte Carlo (HMC) algorithm.

MCMC methods randomly sample inside a probabilistic space to approximate the posterior distribution.

The HMC algorithm picks a random parameter value to consider. It keeps simulating candidate values and assesses whether they approximate the best parameter values more closely, by computing how likely each value is to explain the data. HMC uses the gradients of the joint density (provided by the model) to explore the parameter space efficiently.

If a randomly generated parameter value is better than the last one, it is added to the chain of parameter values with a certain probability determined by how much better it is.

This information is stored in a Markov chain as a sequence of probabilistic states.

After convergence, MCMC sampling yields a set of points which are samples from the desired posterior distribution!

Any statistic calculated on the set of samples generated by MCMC sampling approximates that statistic computed on the true posterior distribution.

draws <- mcmc(model, sampler = hmc(), n_samples = 1000, # n_samples = samples to draw per chain
  warmup = 1000, chains = 4, # warmup = 1000 warmup iterations per chain
  initial_values = init) # initial_values = optional starting values

# results in 4000 samples
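
Because the draws approximate the posterior, plain summary statistics computed on them approximate the corresponding posterior quantities. A minimal sketch, assuming draws is the mcmc.list returned above and the coda package (which greta uses for these objects) is available:

draws_mat <- as.matrix(draws)  # stack all chains into one matrix, one column per parameter
colMeans(draws_mat)            # approximate posterior means
apply(draws_mat, 2, quantile, probs = c(0.025, 0.975)) # approximate 95% credible intervals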

There are other ways to do statistical inference on greta models (stashed_samples(), extra_samples(), initials(), opt()), but we will stick to mcmc().
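
For instance, opt() searches for the single set of parameter values that maximises the model’s joint density (a MAP estimate when priors are involved) instead of sampling. A minimal sketch, assuming m is a model created with model():

o <- opt(m) # optimisation instead of sampling
o$par       # parameter values at the optimum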

Lastly, you can always look up greta functions, using help()!

BAYESIAN MODEL FOR UNIVARIATE LINEAR REGRESSION

Linear regression is a model for predicting a numerical quantity; in the case below, it maps one input to a numerical output. The variable that is being predicted (the one the equation solves for) is called the dependent variable.

The simple linear regression model is represented like this: y = β0 + β1*x + ε
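
To make the generative reading of this formula concrete, here is a small simulation sketch (all numbers are made up for illustration):

set.seed(123)
x_sim <- runif(50, 0, 5)                     # predictor values
y_sim <- 2 + 0.5 * x_sim + rnorm(50, 0, 0.3) # y = beta0 + beta1 * x + normal noise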

The example below covers a hypothetical Bayesian model to estimate the parameters of a linear regression between two of the variables in the “iris” dataset (inbuilt).

Example below extracted from greta guide.

# data gets declared as greta objects
x <- as_data(iris$Petal.Length) # IV
y <- as_data(iris$Sepal.Length) # DV

# create variables int, coef & sd and state prior distributions
int  <- normal(0, 1) #mean, sd
coef <- normal(0, 3) #mean, sd
sd   <- student(3, 0, 1, truncation = c(0, Inf)) #df, mu, sigma

# calculate the predicted sepal lengths: intercept + coefficient(slope) * data(IV)
mean <- int + coef * x

# we assume y is normally distributed, with the linear predictor above as its mean

#  the distribution() function sets up the likelihood for the model
#  distribution() takes as its argument the observed variable we are modeling, hence the DV
distribution(y) <- normal(mean, sd) 

# We can pass greta arrays as arguments to model() to flag them as the parameters we’re
# interested in. When sampling from the model with mcmc() those will be the greta arrays for
# which samples will be returned.
m <- model(int, coef, sd)

# visual representation
plot(m)
# greta model gives us joint density for a candidate set of values 
# perform inference with mcmc

draws <- mcmc(m, n_samples = 1000)
summary(draws)
## 
## Iterations = 1:1000
## Thinning interval = 1 
## Number of chains = 4 
## Sample size per chain = 1000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##        Mean      SD  Naive SE Time-series SE
## int  4.2808 0.08017 0.0012676      0.0015631
## coef 0.4143 0.01934 0.0003058      0.0003821
## sd   0.4106 0.02366 0.0003741      0.0004482
## 
## 2. Quantiles for each variable:
## 
##        2.5%    25%    50%    75%  97.5%
## int  4.1233 4.2264 4.2798 4.3371 4.4363
## coef 0.3777 0.4011 0.4139 0.4269 0.4530
## sd   0.3658 0.3945 0.4097 0.4258 0.4592
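
As a quick sanity check (not part of the original example), the posterior means for int and coef can be compared with the ordinary least-squares fit from lm(); with these fairly weak priors the point estimates should be close:

coef(lm(Sepal.Length ~ Petal.Length, data = iris))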

MLE MODEL FOR UNIVARIATE LINEAR REGRESSION

# plot the data with a regression line

iris %>% 
  ggplot(
    aes(
      x = Petal.Length, 
      y = Sepal.Length)) +
  
  # add jittered data points and a fitted regression line
  geom_jitter(alpha = 0.7, color = "darkblue")  + 
  geom_smooth(method = "lm", color = "darkorange")

Try to find the best-fitting parameter values of the model above by eye!

USING OPTIM()

We may also use optim() to retrieve an MLE for the three parameters of a simple linear regression, i.e. to find the values of β0 (= intercept), β1 (= slope) and σ (= standard deviation) that maximise the likelihood function.

optim() is useful for finding the minimum(!) of a function. If a maximum is needed, as in MLE, we can simply multiply the function by −1 and search for its minimum with optim(). In this case, we minimise the negative log-likelihood (nll), i.e. the log-likelihood multiplied by −1.

Keep in mind that we can also use optim() on the posterior distribution in the case of Bayesian probabilistic modeling (a brief sketch of this follows after the MLE example below).

Code below copied from web-book 9.5.1., using iris data.

# function for the negative log-likelihood of the given
# data and fixed parameter values
nll = function(y, x, beta_0, beta_1, sd) { # y = DV = Sepal.Length, x = IV = Petal.Length, then intercept, slope, sd
  
  # negative sigma is logically impossible
  if (sd <= 0) {return( Inf )}
  
  # calculate predicted values 
  yPred = beta_0 + beta_1 * x #intercept + slope * independent data
  
  # negative log-likelihood of each data point 
  nll = -dnorm(y, mean = yPred, sd = sd, log = TRUE)
  
  # sum over all observations
  sum(nll)
}

# Here we use optim to minimize our negative log-likelihood function over the three parameters
fit_lh = optim(
  
  # par = initial values for the parameters to be optimized over
  par = c(1.5, 0, 0.5),# initial guesses for β0 (= intercept), β1 (= slope), σ (= standard deviation), where the function starts its search
  
  # fn = the function to be minimized (optim minimizes by default)
  # here: minimizing the nll is equivalent to maximizing the log-likelihood,
  # and hence the likelihood itself
  fn = function(par) {
    with(iris, 
         nll(Sepal.Length, Petal.Length, #y,x
             par[1], par[2], par[3]) # β0 (= intercept), β1 (= slope), σ (= standard deviation)

    )
  }
)

fit_lh$par
## [1] 4.3062376 0.4090193 0.4044848
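
As noted above, optim() can also be applied to the (negative log) posterior to get a MAP estimate. A minimal sketch, reusing nll() from above together with the priors of the earlier Bayesian model (nlp is a made-up helper name; the truncation constant of the half-Student-t prior is dropped because it does not shift the optimum):

# negative log-posterior = negative log-likelihood + negative log-priors
nlp = function(y, x, beta_0, beta_1, sd) {
  
  if (sd <= 0) {return( Inf )}
  
  nll(y, x, beta_0, beta_1, sd) -
    dnorm(beta_0, mean = 0, sd = 1, log = TRUE) - # prior on the intercept
    dnorm(beta_1, mean = 0, sd = 3, log = TRUE) - # prior on the slope
    dt(sd, df = 3, log = TRUE)                    # Student-t prior on sd (truncation constant omitted)
}

fit_map = optim(
  par = c(1.5, 0, 0.5),
  fn = function(par) {
    with(iris, nlp(Sepal.Length, Petal.Length, par[1], par[2], par[3]))
  }
)

fit_map$par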

SYNTHESIZE EVERYTHING YOU KNOW: ANOTHER BAYESIAN MODEL FOR LINEAR REGRESSION IN GRETA

What do you think happens in every line of the code?

Code below copied from current HW07, using iris data.

# select data to use
Sepal.Length      <- as_data(FILL_ME)
log_Petal.Length  <- as_data(FILL_ME)

# latent variables and priors
intercept <- student(df= 1, mu = 0, sigma = 10)
slope     <- student(df= 1, mu = 0, sigma = 10)
sigma     <- normal(0 , 5, truncation = c(0, Inf))

# derived latent variable (linear model)
mean <- FILL_ME

# likelihood 
FILL_ME <- normal(mean, sigma)

# finalize model, register which parameters to monitor
m <- model(intercept, slope, sigma)

# Draw samples from the above model
draws <- mcmc(m, n_samples = 2000, chains = 4, warmup = 1000)
summary(draws)

BINOMIAL BAYESIAN MODEL FOR COIN FLIP IN GRETA

Code below copied from web-book section 9.6.2, using the coin flip example.

k <- as_data(67) # k = number of observed heads (successes) = observed data / DV
# we are trying to infer the probability of getting a head on every single coin toss

N <- as_data(127) # N = total number of coin tosses / IV

# coin bias & prior (here: uninformative) 
# for beta distribution uninformative prior are symmetrical, e.g.: alpha=beta=1

theta   <- beta(1,1) #alpha & beta values

# likelihood of k given theta
distribution(k) <- binomial(N, theta) # binomial(size, prob, dim = NULL)

# tell greta which parameters the MCMC sampler should return information about (here:theta).
m <- model(theta)

# take 4 chains of 1000 samples
draws <- mcmc(
  model = m, 
  n_samples = 1000,
  warmup = 1000,
  chains = 4)

# The resulting draws object is an mcmc.list. We can simply transform it into a representation we prefer:
# cast results (of type 'mcmc.list') into a tidy tibble
tidy_draws = ggmcmc::ggs(draws)
summary(tidy_draws)
##    Iteration          Chain      Parameter        value       
##  Min.   :   1.0   Min.   :1.00   theta:4000   Min.   :0.3774  
##  1st Qu.: 250.8   1st Qu.:1.75                1st Qu.:0.4991  
##  Median : 500.5   Median :2.50                Median :0.5295  
##  Mean   : 500.5   Mean   :2.50                Mean   :0.5285  
##  3rd Qu.: 750.2   3rd Qu.:3.25                3rd Qu.:0.5574  
##  Max.   :1000.0   Max.   :4.00                Max.   :0.6748
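
With a Beta(1, 1) prior and a binomial likelihood, the posterior is also available in closed form: Beta(1 + k, 1 + N - k) = Beta(68, 61). A small sketch comparing the MCMC samples against it, assuming tidy_draws from above and ggplot2 loaded (as for the regression plot earlier):

ggplot(tidy_draws, aes(x = value)) +
  # density of the MCMC samples
  geom_density(color = "darkblue") +
  # analytic posterior for comparison
  stat_function(fun = dbeta, args = list(shape1 = 68, shape2 = 61), color = "darkorange")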