\[ \definecolor{firebrick}{RGB}{178,34,34} \newcommand{\red}[1]{{\color{firebrick}{#1}}} \] \[ \definecolor{mygray}{RGB}{178,34,34} \newcommand{\mygray}[1]{{\color{mygray}{#1}}} \] \[ \newcommand{\set}[1]{\{#1\}} \] \[ \newcommand{\tuple}[1]{\langle#1\rangle} \] \[\newcommand{\States}{{T}}\] \[\newcommand{\state}{{t}}\] \[\newcommand{\pow}[1]{{\mathcal{P}(#1)}}\]
terminology
blueprint of a GLM
\[ \begin{align*} \eta & = \text{linear_combination}(X) \\ \mu & = \text{link_fun}( \ \eta, \theta_{\text{LF}} \ ) \\ y & \sim \text{lh_fun}( \ \mu, \ \theta_{\text{LH}} \ ) \end{align*} \]
type of \(y\) | (inverse) link function | likelihood function |
---|---|---|
metric | \(\mu = \eta\) | \(y \sim \text{Normal}(\mu, \sigma)\) |
binary | \(\mu = \text{logistic}(\eta, \theta, \gamma) = (1 + \exp(-\gamma (\eta - \theta)))^{-1}\) | \(y \sim \text{Binomial}(\mu)\) |
nominal | \(\mu_k = \text{soft-max}(\eta_k, \lambda) \propto \exp(\lambda \eta_k)\) | \(y \sim \text{Multinomial}({\mu})\) |
ordinal | \(\mu_k = \text{threshold-Phi}(\eta_k, \sigma, {\delta})\) | \(y \sim \text{Multinomial}({\mu})\) |
count | \(\mu = \exp(\eta)\) | \(y \sim \text{Poisson}(\mu)\) |
Bayes: likelihood + prior
inspect posterior distribution over \(\beta_0\), \(\beta_1\) and \(\sigma_{\text{err}}\) given the data \(y\) and the model:
\[ \begin{align*} y_{\text{pred}} & = \beta_0 + \beta_1 x & \ \ \ \ \ \ \ \ \ \ \ \ \ \ y & \sim \mathcal{N}(\mu = y_{\text{pred}}, \sigma_{err}) \\ \beta_i & \sim \mathcal{U}(-\infty, \infty) & \ \ \ \ \ \ \ \ \ \ \ \ \ \ \sigma_{err} & \sim \mathcal{U}(-\infty, \infty) \end{align*} \]
data { int<lower=1> N; // total number of observations vector[N] murder_rate; // dependent variable vector[N] low_income; // independent variable } parameters { real Intercept; real beta; real<lower=0> sigma; } model { // predictor value = expected value vector[N] mu = Intercept + low_income * beta; // likelihood target += normal_lpdf(murder_rate | mu, sigma); }
data
\(y\): \(n \times 1\) vector of predicted variables (metric)
\(X\): \(n \times k\) matrix of predictor variables (metric)
parameters
model
\[ \begin{align*} \eta & = X_i \cdot \beta & \ \ \ \ \ \ \ \ \ \ \ \ \ \ y_i & \sim \mathcal{N}(\mu = \eta_i, \sigma_{err}) \\ \end{align*} \]
example
murder_data %>% head
## # A tibble: 6 x 4 ## murder_rate low_income unemployment population ## <dbl> <dbl> <dbl> <dbl> ## 1 11.2 16.5 6.2 587000 ## 2 13.4 20.5 6.4 643000 ## 3 40.7 26.3 9.3 635000 ## 4 5.3 16.5 5.3 692000 ## 5 24.8 19.2 7.3 1248000 ## 6 12.7 16.5 5.9 643000
murder_data[,1]
murder_data[,2:4]
intercept
\(X_{i,1} = 1\) for all \(i\)
multiplicative interaction
\(X_{i,l} = X_{i,j} \cdot X_{i,k}\) for all \(i\) with \(j \neq k\)
transformed variables
\(X_{i,k} = F(X_{i,j})\) (e.g., \(F(x) = \log(x)\) or \(F(x) = x^2\))
crazy terms\(^*\)
\(X_{i,l} = \exp((X_{i,j} - X_{i,k})^2 + \sqrt{X_{i,m}})\)
\(^*\) at which point it gets pointless to speak of a "linear model" still, even if formally subsumable
GGally::ggpairs(murder_data, title = "Murder rate data")
point estimates
glm(murder_rate ~ low_income + population, data = murder_data) %>% broom::tidy()
## # A tibble: 3 x 5 ## term estimate std.error statistic p.value ## <chr> <dbl> <dbl> <dbl> <dbl> ## 1 (Intercept) -31.2 8.30 -3.76 0.00156 ## 2 low_income 2.60 0.403 6.44 0.00000615 ## 3 population 0.000000420 0.000000767 0.547 0.591
Bayes
BF_fit_1 = BayesFactor::lmBF( murder_rate ~ low_income + population, data = murder_data %>% as.data.frame) post_samples_1 = BayesFactor::posterior( model = BF_fit_1, iterations = 5000)
point estimates
glm(murder_rate ~ low_income + unemployment, data = murder_data) %>% broom::tidy()
## # A tibble: 3 x 5 ## term estimate std.error statistic p.value ## <chr> <dbl> <dbl> <dbl> <dbl> ## 1 (Intercept) -34.1 6.73 -5.07 0.0000956 ## 2 low_income 1.22 0.568 2.15 0.0459 ## 3 unemployment 4.40 1.53 2.88 0.0103
Bayes
BF_fit_2 = BayesFactor::lmBF( murder_rate ~ low_income + unemployment, data = murder_data %>% as.data.frame) post_samples_2 = BayesFactor::posterior( model = BF_fit_2, iterations = 5000)
uncorrelated
correlated
murder_rate ~ low_income + population
murder_rate ~ low_income + unemployment
BF_group_fit = BayesFactor::regressionBF( murder_rate ~ low_income + unemployment + population, data = murder_data %>% as.data.frame) plot(BF_group_fit)
BF_group_fit = BayesFactor::regressionBF( murder_rate ~ low_income + inv_income, data = murder_data %>% mutate(inv_income = -low_income) %>% as.data.frame) plot(BF_group_fit)
murder_data = murder_data %>% mutate(inv_income = -low_income)
glm(murder_rate ~ low_income + inv_income, data = murder_data) %>% broom::tidy()
## # A tibble: 2 x 5 ## term estimate std.error statistic p.value ## <chr> <dbl> <dbl> <dbl> <dbl> ## 1 (Intercept) -29.9 7.79 -3.84 0.00120 ## 2 low_income 2.56 0.390 6.56 0.00000364
m = brm(murder_rate ~ low_income + inv_income, data = murder_data) broom::tidy(m)
## term estimate std.error lower upper ## 1 b_Intercept -30.10417 8.442471 -43.810329 -16.042556 ## 2 b_low_income 222.92690 3737.368597 -6061.069129 5405.508617 ## 3 b_inv_income 220.36852 3737.365239 -6063.784900 5403.503213 ## 4 sigma 5.88133 1.367383 4.107823 8.354835 ## 5 lp__ -67.65668 1.338055 -70.020305 -66.075633
posterior_samples(m) %>% as.tibble() -> n qplot(x = n$b_low_income, y = n$b_inv_income) + xlab("low income") + ylab("inverse low-income")
general problem
correlated predictors distort parameter inference in linear models
(same for point-estimation and Bayes)
point estimates
Bayes
linear model (so far)
standard likelihood function: normal distribution
\[ \begin{align*} \eta & = X_i \cdot \beta & \ \ \ \ \ \ \ \ \ \ \ \ \ \ y_i & \sim \mathcal{N}(\mu = \eta_i, \sigma_{err}) \\ \end{align*} \]
\(\Rightarrow\) encodes an assumption about noise
robust regression
uses Student's \(t\) distribution instead
\[ \begin{align*} \eta & = X_i \cdot \beta & \ \ \ \ \ \ \ \ \ \ \ \ \ \ y_i & \sim \mathcal{t}(\mu = \eta_i, \sigma_{err}, \nu) \\ \end{align*} \]
systematic and independent outlier group
\[ \begin{align*} \eta & = X_i \cdot \beta & \ \ \ \ \ \ \ \ \ \ \ \ \ \ y_i & \sim (1-\epsilon) \mathcal{N}(\mu = \eta_i, \sigma_{err}) + \epsilon \mathcal{N}(\mu = \mu_{\text{outlier}}, \sigma_{err}) \\ \end{align*} \]
scale-dependent noise
\[ \begin{align*} \eta & = X_i \cdot \beta & \ \ \ \ \ \ \ \ \ \ \ \ \ \ y_i & \sim \mathcal{N}(\mu = \eta_i, \sigma = a\eta_i + b) \\ \end{align*} \]
caveat: these are just gestures towards possibilities; not necessarily fully functional!
iq_data = readr::read_csv('data/03_Kruschke_TwoGroupIQ.csv') summary(iq_data)
## Score Group ## Min. : 50.00 Length:120 ## 1st Qu.: 92.75 Class :character ## Median :102.00 Mode :character ## Mean :104.13 ## 3rd Qu.:112.00 ## Max. :208.00
possible research questions?
from Kruschke (2015, Chapter 16)
Bayesian GLM:
inspect posterior distribution over \(\beta_0\) and \(\sigma_{\text{err}}\) given the data \(y\) and the model:
\[ \begin{align*} y_{\text{pred}} & = \beta_0 & \ \ \ \ \ \ \ \ \ \ \ \ \ \ y & \sim \mathcal{N}(\mu = y_{\text{pred}}, \sigma_{err}) \\ \beta_0 & \sim \mathcal{N}(100, \frac{1}{10000}) & \ \ \ \ \ \ \ \ \ \ \ \ \ \ \sigma_{err} & \sim \text{N}(0,10000)I[0,] \end{align*} \]
## # A tibble: 1 x 4 ## Parameter HDI_lo mean HDI_hi ## <fct> <dbl> <dbl> <dbl> ## 1 b0 101. 108. 114.
brm_fit_ttest = brm(iq_score_treatment ~ 1, data = tibble(iq_score_treatment = iq_score_treatment)) brm_fit_ttest
## Family: gaussian ## Links: mu = identity; sigma = identity ## Formula: iq_score_treatment ~ 1 ## Data: tibble(iq_score_treatment = iq_score_treatment) (Number of observations: 63) ## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; ## total post-warmup samples = 4000 ## ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat ## Intercept 107.85 3.23 101.47 114.25 2824 1.00 ## ## Family Specific Parameters: ## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat ## sigma 25.63 2.34 21.48 30.65 2867 1.00 ## ## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample ## is a crude measure of effective sample size, and Rhat is the potential ## scale reduction factor on split chains (at convergence, Rhat = 1).
Bayesian GLM:
x = ifelse(Group == "Control", 0, 1)
(dummy coding)\[ \begin{align*} y_{\text{pred}} & = \beta_0 + \beta_1 x & \ \ \ \ \ \ \ \ \ \ \ \ \ \ y & \sim \mathcal{N}(\mu = y_{\text{pred}}, \sigma_{err}) \\ \beta_0 & \sim \mathcal{N}(100, 15) & \ \ \ \ \ \ \ \ \ \ \ \ \ \ \sigma_{err} & \sim \mathcal{N}(0,100)T[0;\infty] \\ \beta_1 & \sim \mathcal{N}(0, 30) \end{align*} \]
prior1 = c(set_prior("normal(0,30)", class = "b", coef= "GroupSmartDrug"), set_prior("normal(0, 100)", class = "sigma"), set_prior("normal(100,15)", class = "Intercept")) brm_fit = brm( formula = Score ~ Group, data = iq_data, prior = prior1 )
## term estimate std.error lower upper ## 1 b_Intercept 100.100380 2.946309 95.262982 105.05456 ## 2 b_GroupSmartDrug 7.618633 4.097243 1.007568 14.36550 ## 3 sigma 22.427686 1.489774 20.101028 24.95942 ## 4 lp__ -552.498173 1.241103 -554.954430 -551.15690
terminology
blueprint of a GLM
\[ \begin{align*} \eta & = \text{linear_combination}(X) \\ \mu & = \text{link_fun}( \ \eta, \theta_{\text{LF}} \ ) \\ y & \sim \text{lh_fun}( \ \mu, \ \theta_{\text{LH}} \ ) \end{align*} \]
type of \(y\) | (inverse) link function | likelihood function |
---|---|---|
metric | \(\mu = \eta\) | \(y \sim \text{Normal}(\mu, \sigma)\) |
binary | \(\mu = \text{logistic}(\eta, \theta, \gamma) = (1 + \exp(-\gamma (\eta - \theta)))^{-1}\) | \(y \sim \text{Binomial}(\mu)\) |
nominal | \(\mu_k = \text{soft-max}(\eta_k, \lambda) \propto \exp(\lambda \eta_k)\) | \(y \sim \text{Multinomial}({\mu})\) |
ordinal | \(\mu_k = \text{threshold-Phi}(\eta_k, \sigma, {\delta})\) | \(y \sim \text{Multinomial}({\mu})\) |
count | \(\mu = \exp(\eta)\) | \(y \sim \text{Poisson}(\mu)\) |
\[ \begin{align*} \eta & = X_i \cdot \beta \\ \mu &= \text{logistic}(\eta, \theta = 0, \gamma = 1) = (1 + \exp(-\eta))^{-1} \\ y & \sim \text{Bernoulli}(\mu) \end{align*} \]
\[\text{logistic}(\eta, \theta, \gamma) = \frac{1}{(1 + \exp(-\gamma (\eta - \theta)))}\]
dummy
dummy
threshold \(\theta\)
gain \(\gamma\)
\[ \begin{align*} \eta_j & = X_i \cdot \beta_j \\ \mu_j & \propto \exp(\eta_j) \\ y & \sim \text{Categorical}(\mu) \end{align*} \]
\[ \begin{align*} \eta & = X_i \cdot \beta \\ \mu_j & = \Phi(\theta_{j} - \eta) - \Phi(\theta_{j+1} - \eta) \\ y & \sim \text{Categorical}(\mu) \end{align*} \]
terminology
blueprint of a GLM
\[ \begin{align*} \eta & = \text{linear_combination}(X) \\ \mu & = \text{link_fun}( \ \eta, \theta_{\text{LF}} \ ) \\ y & \sim \text{lh_fun}( \ \mu, \ \theta_{\text{LH}} \ ) \end{align*} \]
Tuesday
after that