\[ \definecolor{firebrick}{RGB}{178,34,34} \newcommand{\red}[1]{{\color{firebrick}{#1}}} \] \[ \definecolor{mygray}{RGB}{128,128,128} \newcommand{\mygray}[1]{{\color{mygray}{#1}}} \] \[ \newcommand{\set}[1]{\{#1\}} \] \[ \newcommand{\tuple}[1]{\langle#1\rangle} \] \[\newcommand{\States}{{T}}\] \[\newcommand{\state}{{t}}\] \[\newcommand{\pow}[1]{{\mathcal{P}(#1)}}\]
terminology
blueprint of a GLM
\[ \begin{align*} \eta & = \text{linear_combination}(X) \\ \mu & = \text{link_fun}( \ \eta, \theta_{\text{LF}} \ ) \\ y & \sim \text{lh_fun}( \ \mu, \ \theta_{\text{LH}} \ ) \end{align*} \]
type of \(y\) | (inverse) link function | likelihood function |
---|---|---|
metric | \(\mu = \eta\) | \(y \sim \text{Normal}(\mu, \sigma)\) |
binary | \(\mu = \text{logistic}(\eta, \theta, \gamma) = (1 + \exp(-\gamma (\eta - \theta)))^{-1}\) | \(y \sim \text{Binomial}(\mu)\) |
nominal | \(\mu_k = \text{soft-max}(\eta_k, \lambda) \propto \exp(\lambda \eta_k)\) | \(y \sim \text{Multinomial}({\mu})\) |
ordinal | \(\mu_k = \text{threshold-Phi}(\eta_k, \sigma, {\delta})\) | \(y \sim \text{Multinomial}({\mu})\) |
count | \(\mu = \exp(\eta)\) | \(y \sim \text{Poisson}(\mu)\) |
Bayes: likelihood + prior
inspect posterior distribution over \(\beta_0\), \(\beta_1\) and \(\sigma_{\text{err}}\) given the data \(y\) and the model:
\[ \begin{align*} y_{\text{pred}} & = \beta_0 + \beta_1 x & \ \ \ \ \ \ \ \ \ \ \ \ \ \ y & \sim \mathcal{N}(\mu = y_{\text{pred}}, \sigma_{err}) \\ \beta_i & \sim \mathcal{N}(0, \sigma_{\beta}) & \ \ \ \ \ \ \ \ \ \ \ \ \ \ \frac{1}{\sigma_{err}^2} & \sim \text{Gamma}(0.1,0.1) \end{align*} \]
model{
  sigma_e = 1/sqrt(tau_e)
  tau_e ~ dgamma(0.1,0.1)
  b0 ~ dnorm(0, 1/10000000)
  b1 ~ dnorm(0, 1/10000000)
  for (i in 1:k){
    yPred[i] = b0 + b1 * x[i]
    y[i] ~ dnorm(yPred[i], tau_e)
  }
}
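A minimal sketch of how a JAGS model like this could be run from R with the rjags package; the data list (vectors x and y), chain settings, and iteration counts are illustrative assumptions, not part of the model above.

# illustrative sketch: compile and sample from the JAGS model above with rjags
library(rjags)
model_string = "
model{
  sigma_e = 1/sqrt(tau_e)
  tau_e ~ dgamma(0.1,0.1)
  b0 ~ dnorm(0, 1/10000000)
  b1 ~ dnorm(0, 1/10000000)
  for (i in 1:k){
    yPred[i] = b0 + b1 * x[i]
    y[i] ~ dnorm(yPred[i], tau_e)
  }
}"
jags_model = jags.model(textConnection(model_string),
                        data = list(x = x, y = y, k = length(y)),  # assumed data vectors
                        n.chains = 2)
update(jags_model, 1000)                                # burn-in
samples = coda.samples(jags_model,
                       variable.names = c("b0", "b1", "sigma_e"),
                       n.iter = 5000)
summary(samples)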
data
\(y\): \(n \times 1\) vector of observed values of the predicted variable (metric)
\(X\): \(n \times k\) matrix of predictor values (metric)
parameters
\(\beta\): \(k \times 1\) vector of coefficients
\(\sigma_{err}\): standard deviation of the (normal) error
model
\[ \begin{align*} \eta_i & = X_i \cdot \beta & \ \ \ \ \ \ \ \ \ \ \ \ \ \ y_i & \sim \mathcal{N}(\mu = \eta_i, \sigma_{err}) \\ \end{align*} \]
example
murder_data %>% head
## # A tibble: 6 x 4
##   murder_rate low_income unemployment population
##         <dbl>      <dbl>        <dbl>      <int>
## 1        11.2       16.5          6.2     587000
## 2        13.4       20.5          6.4     643000
## 3        40.7       26.3          9.3     635000
## 4         5.3       16.5          5.3     692000
## 5        24.8       19.2          7.3    1248000
## 6        12.7       16.5          5.9     643000
murder_data[,1]
murder_data[,2:3]
intercept
\(X_{i,1} = 1\) for all \(i\)
interaction terms
\(X_{i,l} = X_{i,j} \cdot X_{i,k}\) for all \(i\) with \(j \neq k\)
transformed variables
\(X_{i,k} = F(X_{i,j})\) (e.g., \(F(x) = \log(x)\) or \(F(x) = x^2\))
crazy terms\(^*\)
\(X_{i,l} = \exp((X_{i,j} - X_{i,k})^2 + \sqrt{X_{i,m}})\)
\(^*\) at which point it hardly makes sense to speak of a "linear model" any more, even if it is formally still subsumable (see the design-matrix sketch below)
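As a sketch (not part of the original example), such design-matrix columns can be built directly in R with model.matrix; the particular terms chosen here just reuse the murder-data predictors for illustration.

# illustrative sketch: design matrix with intercept, interaction, and a transformed predictor
X = model.matrix(
  ~ low_income * unemployment + log(population),   # '*' adds both main effects and their interaction
  data = murder_data)
head(X)   # first column is the constant intercept term (all 1s)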
GGally::ggpairs(murder_data, title = "Murder rate data")
point estimates
glm(murder_rate ~ low_income + population, data = murder_data) %>% broom::tidy()
##          term      estimate    std.error statistic      p.value
## 1 (Intercept) -3.121522e+01 8.300600e+00 -3.760598 0.0015586420
## 2  low_income  2.595519e+00 4.032796e-01  6.436029 0.0000061507
## 3  population  4.198061e-07 7.674560e-07  0.547010 0.5914808122
Bayes
BF_fit_1 = BayesFactor::lmBF(
  murder_rate ~ low_income + population,
  data = murder_data %>% as.data.frame)
post_samples_1 = BayesFactor::posterior(
  model = BF_fit_1,
  iterations = 5000)
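BayesFactor::posterior returns coda-style MCMC samples, so standard summaries apply; a minimal sketch of how the samples could be inspected (not part of the original code):

# illustrative sketch: summarize the posterior samples
summary(post_samples_1)            # posterior means and quantiles for each parameter
coda::HPDinterval(post_samples_1)  # 95% highest-density intervals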
point estimates
glm(murder_rate ~ low_income + unemployment, data = murder_data) %>% broom::tidy()
##           term   estimate std.error statistic      p.value
## 1  (Intercept) -34.072533 6.7265451 -5.065384 9.558945e-05
## 2   low_income   1.223931 0.5682047  2.154031 4.587777e-02
## 3 unemployment   4.398936 1.5261685  2.882340 1.034228e-02
Bayes
BF_fit_2 = BayesFactor::lmBF(
  murder_rate ~ low_income + unemployment,
  data = murder_data %>% as.data.frame)
post_samples_2 = BayesFactor::posterior(
  model = BF_fit_2,
  iterations = 5000)
uncorrelated | correlated |
---|---|
murder_rate ~ low_income + population | murder_rate ~ low_income + unemployment |
BF_group_fit = BayesFactor::regressionBF(
  murder_rate ~ low_income + unemployment + population,
  data = murder_data %>% as.data.frame)
plot(BF_group_fit)
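regressionBF returns one Bayes factor per candidate model (each against the intercept-only model); individual candidates can be compared by dividing their Bayes factors. A minimal sketch (the indices are assumptions about which models one wants to compare):

# illustrative sketch: compare candidate regression models
BF_group_fit                        # all candidate models vs. the intercept-only model
BF_group_fit[1] / BF_group_fit[2]   # Bayes factor of candidate 1 over candidate 2
head(BF_group_fit)                  # the best-supported candidates (sorted by Bayes factor)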
BF_group_fit = BayesFactor::regressionBF(
  murder_rate ~ low_income + inv_income,
  data = murder_data %>%
    mutate(inv_income = -low_income) %>%
    as.data.frame)
plot(BF_group_fit)
BF_complex = BayesFactor::regressionBF(
  murder_rate ~ low_income + inv_income,
  rscaleCont = 5,   # set a wider prior for all slopes
  data = murder_data %>%
    mutate(inv_income = -low_income) %>%
    as.data.frame)
plot(BF_complex)
general problem
correlated predictors distort parameter inference in linear models
(same for point-estimation and Bayes)
(plots: coefficient estimates under correlated predictors, point estimates vs. Bayes)
linear model (so far)
standard likelihood function: normal distribution
\[ \begin{align*} \eta_i & = X_i \cdot \beta & \ \ \ \ \ \ \ \ \ \ \ \ \ \ y_i & \sim \mathcal{N}(\mu = \eta_i, \sigma_{err}) \\ \end{align*} \]
\(\Rightarrow\) encodes an assumption about noise
robust regression
uses Student's \(t\) distribution instead
\[ \begin{align*} \eta_i & = X_i \cdot \beta & \ \ \ \ \ \ \ \ \ \ \ \ \ \ y_i & \sim \text{Student-}t(\mu = \eta_i, \sigma_{err}, \nu) \\ \end{align*} \]
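In JAGS this amounts to replacing dnorm by dt in the likelihood; a rough sketch (the exponential prior on the degrees-of-freedom parameter nu is an assumption for illustration, not prescribed above):

model{
  sigma_e = 1/sqrt(tau_e)
  tau_e ~ dgamma(0.1,0.1)
  b0 ~ dnorm(0, 1/10000000)
  b1 ~ dnorm(0, 1/10000000)
  nuMinusOne ~ dexp(1/29)            # assumed prior on the degrees of freedom
  nu = nuMinusOne + 1
  for (i in 1:k){
    yPred[i] = b0 + b1 * x[i]
    y[i] ~ dt(yPred[i], tau_e, nu)   # Student's t likelihood instead of a normal
  }
}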
systematic and independent outlier group
\[ \begin{align*} \eta_i & = X_i \cdot \beta & \ \ \ \ \ \ \ \ \ \ \ \ \ \ y_i & \sim (1-\epsilon) \mathcal{N}(\mu = \eta_i, \sigma_{err}) + \epsilon \mathcal{N}(\mu = \mu_{\text{outlier}}, \sigma_{err}) \\ \end{align*} \]
scale-dependent noise
\[ \begin{align*} \eta_i & = X_i \cdot \beta & \ \ \ \ \ \ \ \ \ \ \ \ \ \ y_i & \sim \mathcal{N}(\mu = \eta_i, \sigma = a\eta_i + b) \\ \end{align*} \]
caveat: these are just gestures towards possibilities; not necessarily fully functional!
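To make one of these gestures slightly more concrete, a rough JAGS sketch of the scale-dependent noise idea (the priors on a and b are pure assumptions; as per the caveat above, untested):

model{
  a ~ dunif(0, 10)                   # assumed priors, for illustration only
  b ~ dunif(0, 100)
  b0 ~ dnorm(0, 1/10000000)
  b1 ~ dnorm(0, 1/10000000)
  for (i in 1:k){
    yPred[i] = b0 + b1 * x[i]
    sigma[i] = a * yPred[i] + b      # standard deviation grows with the predicted value
    y[i] ~ dnorm(yPred[i], 1/pow(sigma[i], 2))
  }
}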
iq_data = readr::read_csv('../data/07_Kruschke_TwoGroupIQ.csv')
summary(iq_data)
##      Score           Group
##  Min.   : 50.00   Length:120
##  1st Qu.: 92.75   Class :character
##  Median :102.00   Mode  :character
##  Mean   :104.13
##  3rd Qu.:112.00
##  Max.   :208.00
possible research questions?
from Kruschke (2015, Chapter 16)
Bayesian GLM:
the only coefficient is the intercept \(\beta_0\), with prior knowledge encoded in the priors:
\[ \begin{align*} y_{\text{pred}} & = \beta_0 & \ \ \ \ \ \ \ \ \ \ \ \ \ \ y & \sim \mathcal{N}(\mu = y_{\text{pred}}, \sigma_{err}) \\ \beta_0 & \sim \mathcal{N}(100, 15) & \ \ \ \ \ \ \ \ \ \ \ \ \ \ \frac{1}{\sigma_{err}^2} & \sim \text{Gamma}(0.1,0.1) \end{align*} \]
model{
  sigma_e = 1/sqrt(tau_e)
  tau_e ~ dgamma(0.1,0.1)
  b0 ~ dnorm(100, 1/15^2)
  for (i in 1:k){
    yPred[i] = b0
    y[i] ~ dnorm(yPred[i], tau_e)
  }
}
## # A tibble: 1 x 4
##   Parameter   HDI_lo     mean   HDI_hi
##      <fctr>    <dbl>    <dbl>    <dbl>
## 1        b0 96.11298 102.8397 109.3491
Bayesian GLM:
x = ifelse(Group == "Control", 0, 1)
(dummy coding)
\[ \begin{align*} y_{\text{pred}} & = \beta_0 + \beta_1 x & \ \ \ \ \ \ \ \ \ \ \ \ \ \ y & \sim \mathcal{N}(\mu = y_{\text{pred}}, \sigma_{err}) \\ \beta_0 & \sim \mathcal{N}(100, 15) & \ \ \ \ \ \ \ \ \ \ \ \ \ \ \frac{1}{\sigma_{err}^2} & \sim \text{Gamma}(0.1,0.1) \\ \beta_1 & \sim \mathcal{N}(0, 30) \end{align*} \]
model{
  sigma_e = 1/sqrt(tau_e)
  tau_e ~ dgamma(0.1,0.1)
  b0 ~ dnorm(100, 1/15^2)
  b1 ~ dnorm(0, 1/30^2)
  for (i in 1:k){
    yPred[i] = b0 + b1 * x[i]
    y[i] ~ dnorm(yPred[i], tau_e)
  }
}
## # A tibble: 2 x 4
##   Parameter     HDI_lo       mean    HDI_hi
##      <fctr>      <dbl>      <dbl>     <dbl>
## 1        b0 94.4408927 100.098591 105.73580
## 2        b1 -0.3854978   7.678485  15.38381
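Given posterior samples from this model, the group difference can be summarized directly from the marginal posterior of \(\beta_1\); a minimal sketch, assuming the samples are collected in a matrix samples with a column "b1" (hypothetical object name):

# illustrative sketch: summarize the posterior of the group difference b1
mean(samples[, "b1"])                               # posterior mean of the difference
mean(samples[, "b1"] > 0)                           # posterior probability of a positive effect
quantile(samples[, "b1"], probs = c(0.025, 0.975))  # central 95% credible interval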
terminology
blueprint of a GLM
\[ \begin{align*} \eta & = \text{linear_combination}(X) \\ \mu & = \text{link_fun}( \ \eta, \theta_{\text{LF}} \ ) \\ y & \sim \text{lh_fun}( \ \mu, \ \theta_{\text{LH}} \ ) \end{align*} \]
type of \(y\) | (inverse) link function | likelihood function |
---|---|---|
metric | \(\mu = \eta\) | \(y \sim \text{Normal}(\mu, \sigma)\) |
binary | \(\mu = \text{logistic}(\eta, \theta, \gamma) = (1 + \exp(-\gamma (\eta - \theta)))^{-1}\) | \(y \sim \text{Binomial}(\mu)\) |
nominal | \(\mu_k = \text{soft-max}(\eta_k, \lambda) \propto \exp(\lambda \eta_k)\) | \(y \sim \text{Multinomial}({\mu})\) |
ordinal | \(\mu_k = \text{threshold-Phi}(\eta_k, \sigma, {\delta})\) | \(y \sim \text{Multinomial}({\mu})\) |
count | \(\mu = \exp(\eta)\) | \(y \sim \text{Poisson}(\mu)\) |
\[ \begin{align*} \eta & = X_i \cdot \beta \\ \mu &= \text{logistic}(\eta, \theta = 0, \gamma = 1) = (1 + \exp(-\eta))^{-1} \\ y & \sim \text{Bernoulli}(\mu) \end{align*} \]
\[\text{logistic}(\eta, \theta, \gamma) = \frac{1}{(1 + \exp(-\gamma (\eta - \theta)))}\]
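The roles of threshold and gain are easiest to see by coding the function directly; this R sketch just implements the formula above and overlays two illustrative parameter settings.

# the (inverse) link function for binary data, as defined above
logistic = function(eta, theta = 0, gamma = 1) {
  1 / (1 + exp(-gamma * (eta - theta)))
}
# shifting the threshold moves the curve; increasing the gain steepens it
curve(logistic(x, theta = 0, gamma = 1), from = -5, to = 5, ylab = "mu")
curve(logistic(x, theta = 1, gamma = 4), add = TRUE, lty = 2)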
(plots: logistic curves for different values of the threshold \(\theta\) and the gain \(\gamma\))
\[ \begin{align*} \eta_j & = X_i \cdot \beta_j \\ \mu_j & \propto \exp(\eta_j) \\ y & \sim \text{Categorical}(\mu) \end{align*} \]
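Analogously, the soft-max link is a one-liner in R; this sketch just spells out the formula above with the "gain" \(\lambda\) made explicit.

# soft-max: maps a vector of linear predictors to category probabilities
softmax = function(eta, lambda = 1) {
  exp(lambda * eta) / sum(exp(lambda * eta))
}
softmax(c(1, 0, -1))   # e.g., probabilities for three categories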
\[ \begin{align*} \eta & = X_i \cdot \beta \\ \mu_j & = \Phi(\theta_{j} - \eta) - \Phi(\theta_{j-1} - \eta) \\ y & \sim \text{Categorical}(\mu) \end{align*} \]
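A small R sketch of the threshold-Phi link: category probabilities are differences of the cumulative normal evaluated at consecutive thresholds (the threshold values below are assumptions for illustration).

# ordered-probit ('threshold-Phi') link: probabilities for J ordered categories
threshold_phi = function(eta, theta) {   # theta: increasing thresholds
  cuts = c(-Inf, theta, Inf)             # pad with -Inf / Inf for the outer categories
  diff(pnorm(cuts - eta))                # Phi(theta_j - eta) - Phi(theta_{j-1} - eta)
}
threshold_phi(eta = 0.5, theta = c(-1, 0, 1))   # probabilities for 4 ordered categories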
terminology
blueprint of a GLM
\[ \begin{align*} \eta & = \text{linear_combination}(X) \\ \mu & = \text{link_fun}( \ \eta, \theta_{\text{LF}} \ ) \\ y & \sim \text{lh_fun}( \ \mu, \ \theta_{\text{LH}} \ ) \end{align*} \]
Tuesday
Friday