\[ \definecolor{firebrick}{RGB}{178,34,34} \newcommand{\red}[1]{{\color{firebrick}{#1}}} \] \[ \definecolor{mygray}{RGB}{128,128,128} \newcommand{\mygray}[1]{{\color{mygray}{#1}}} \] \[ \newcommand{\set}[1]{\{#1\}} \] \[ \newcommand{\tuple}[1]{\langle#1\rangle} \] \[\newcommand{\States}{{T}}\] \[\newcommand{\state}{{t}}\] \[\newcommand{\pow}[1]{{\mathcal{P}(#1)}}\]

recap

generalized linear model

 

terminology

  • \(y\) predicted variable, data, observation, …
  • \(X\) predictor variables for \(y\), explanatory variables, …

 

blueprint of a GLM

\[ \begin{align*} \eta & = \text{linear_combination}(X) \\ \mu & = \text{link_fun}( \ \eta, \theta_{\text{LF}} \ ) \\ y & \sim \text{lh_fun}( \ \mu, \ \theta_{\text{LH}} \ ) \end{align*} \]

(figure: GLM scheme)
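As a concrete (purely illustrative) instance of this blueprint, take Poisson regression: the blueprint's link_fun is the exponential (the inverse of the canonical log link) and lh_fun is the Poisson distribution. A minimal R sketch with simulated data; all numbers below are invented for illustration:

# illustrative only: Poisson regression as an instance of the GLM blueprint
set.seed(123)
n    <- 100
X    <- cbind(1, rnorm(n))            # design matrix with an intercept column
beta <- c(0.5, 1.2)                   # hypothetical coefficients
eta  <- as.vector(X %*% beta)         # linear combination of predictors
mu   <- exp(eta)                      # link_fun: exp maps eta to the Poisson mean
y    <- rpois(n, lambda = mu)         # lh_fun: y ~ Poisson(mu)
# the same model fitted with R's built-in glm()
glm(y ~ X[, 2], family = poisson(link = "log"))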

common link & likelihood functions

linear regression: a Bayesian approach

Bayes: likelihood + prior

inspect the posterior distribution over \(\beta_0\), \(\beta_1\), and \(\sigma_{\text{err}}\) given the data \(y\) and the model:

\[ \begin{align*} y_{\text{pred}} & = \beta_0 + \beta_1 x & \ \ \ \ \ \ \ \ \ \ \ \ \ \ y & \sim \mathcal{N}(\mu = y_{\text{pred}}, \sigma_{err}) \\ \beta_i & \sim \mathcal{U}(-\infty, \infty) & \ \ \ \ \ \ \ \ \ \ \ \ \ \ \sigma_{err} & \sim \mathcal{U}(0, \infty) \end{align*} \]

data { 
  int<lower=1> N;  // total number of observations 
  vector[N] murder_rate;  // dependent variable 
  vector[N] low_income;  // independent variable
} 
parameters { 
  real Intercept;  real beta;  
  real<lower=0> sigma;  
} 
model { 
  // linear predictor = expected value (identity link function)
  vector[N] mu = Intercept + low_income * beta;
  // likelihood 
  target += normal_lpdf(murder_rate | mu, sigma);
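  // no explicit prior statements: Intercept and beta get improper flat priors;
  // sigma gets a flat prior on [0, infinity) via its <lower=0> constraint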
} 
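One way to run this program from R, assuming it is saved as murder_lm.stan (a hypothetical file name) and murder_data is loaded, is via rstan; a minimal sketch:

# minimal sketch: fit the Stan model above with rstan (file name is hypothetical)
library(rstan)
stan_data <- list(
  N           = nrow(murder_data),
  murder_rate = murder_data$murder_rate,
  low_income  = murder_data$low_income
)
fit <- stan(file = "murder_lm.stan", data = stan_data,
            chains = 4, iter = 2000)
# posterior summaries for the three parameters of interest
print(fit, pars = c("Intercept", "beta", "sigma"))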

overview


 

  • linear models with several metric predictors
    • multiplicative interactions
    • correlated predictors
  • robust regression
  • one categorical predictor
  • GLMs with other types of predicted variables
    • binary outcomes: logistic regression
    • nominal outcomes: multi-logit regression
    • ordinal outcomes: ordinal (logit/probit) regression
  • case study: logistic regression with multiple categorical predictors

LM with multiple predictors

linear model

data

  • \(y\): \(n \times 1\) vector of predicted variables (metric)

  • \(X\): \(n \times k\) matrix of predictor variables (metric)

parameters

  • \(\beta\): \(k \times 1\) vector of coefficients
  • \(\sigma_{err}\): standard deviation of Gaussian noise

model

\[ \begin{align*} \eta_i & = X_i \cdot \beta & \ \ \ \ \ \ \ \ \ \ \ \ \ \ y_i & \sim \mathcal{N}(\mu = \eta_i, \sigma_{err}) \\ \end{align*} \]
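A minimal simulation sketch of this model in R; the dimensions, coefficients, and noise level below are invented for illustration:

# illustrative simulation of y_i ~ Normal(X_i . beta, sigma_err)
set.seed(42)
n <- 50; k <- 3
X         <- matrix(rnorm(n * k), nrow = n, ncol = k)   # n x k predictor matrix
beta      <- c(2, -1, 0.5)                              # k x 1 coefficient vector
sigma_err <- 1
eta <- as.vector(X %*% beta)             # linear predictor eta_i = X_i . beta
y   <- rnorm(n, mean = eta, sd = sigma_err)
# a least-squares fit recovers beta approximately
coef(lm(y ~ X - 1))                      # "- 1": no intercept, matching the formula above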

example

murder_data %>% head
## # A tibble: 6 x 4
##   murder_rate low_income unemployment population
##         <dbl>      <dbl>        <dbl>      <dbl>
## 1        11.2       16.5          6.2     587000
## 2        13.4       20.5          6.4     643000
## 3        40.7       26.3          9.3     635000
## 4         5.3       16.5          5.3     692000
## 5        24.8       19.2          7.3    1248000
## 6        12.7       16.5          5.9     643000
  • predicted variable \(y\): murder_data[,1]
  • predictor matrix \(X\): murder_data[,2:4]
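To make the mapping concrete, a sketch of extracting \(y\) and \(X\) and running an ordinary least-squares reference fit (column names are taken from the tibble shown above):

# extract predicted variable and predictor matrix
y <- murder_data$murder_rate                      # n x 1 predicted variable
X <- as.matrix(murder_data[, 2:4])                # n x 3 predictor matrix
# least-squares reference fit with all three metric predictors
coef(lm(murder_rate ~ low_income + unemployment + population,
        data = murder_data))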

predictor value for two predictor variables