\[ \definecolor{firebrick}{RGB}{178,34,34} \newcommand{\red}[1]{{\color{firebrick}{#1}}} \] \[ \definecolor{mygray}{RGB}{178,34,34} \newcommand{\mygray}[1]{{\color{mygray}{#1}}} \] \[ \newcommand{\set}[1]{\{#1\}} \] \[ \newcommand{\tuple}[1]{\langle#1\rangle} \] \[\newcommand{\States}{{T}}\] \[\newcommand{\state}{{t}}\] \[\newcommand{\pow}[1]{{\mathcal{P}(#1)}}\]

overview

 

  • generalized linear model (GLM)
    • types of variables
      • metric, nominal, ordinal, count
  • linear model
    • ordinary least squares regression
    • maximum likelihood regression
    • Bayesian approaches
  • generalization to simplest \(t\)-test scenario

generalized linear model

probabilistic models

 

standard notion

model = likelihood function \(P(D \mid \theta)\)

Bayesian

model = likelihood \(P(D \mid \theta)\) + prior \(P(\theta)\)
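The two notions can be made concrete in a few lines of R. This is a minimal sketch, not part of the original slides: the coin-flip data (`k`, `n`) and the flat Beta(1, 1) prior are assumptions chosen only for illustration.

```r
# hypothetical data D: k successes in n coin flips with unknown bias theta
k <- 7
n <- 24

# standard notion: the model is the likelihood function P(D | theta),
# here a binomial probability viewed as a function of theta
likelihood <- function(theta) dbinom(k, size = n, prob = theta)

# Bayesian notion: the model additionally contains a prior P(theta);
# a flat Beta(1, 1) density is assumed here purely for illustration
prior <- function(theta) dbeta(theta, shape1 = 1, shape2 = 1)

# evaluate both on a grid of parameter values
theta_grid <- seq(0, 1, length.out = 101)
unnormalized <- likelihood(theta_grid) * prior(theta_grid)

# normalize over the grid so that the values sum to one
posterior <- unnormalized / sum(unnormalized)

# grid value with the highest posterior mass; with a flat prior it lies
# next to the maximum likelihood estimate k / n
theta_grid[which.max(posterior)]
```

With a flat prior the product of likelihood and prior peaks where the likelihood does, so the two notions only come apart once the prior carries information.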

approaches to modeling

 

[Figure: flavors of modeling]

Generalized Context Model ::: Model

CGM

generalized linear model

 

terminology

  • \(y\) predicted variable, data, observation, …
  • \(X\) predictor variables for \(y\), explanatory variables, …

 

blueprint of a GLM

\[ \begin{align*} \eta & = \text{linear_combination}(X) \\ \mu & = \text{link_fun}( \ \eta, \theta_{\text{LF}} \ ) \\ y & \sim \text{lh_fun}( \ \mu, \ \theta_{\text{LH}} \ ) \end{align*} \]
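The three steps of the blueprint can be sketched as a small simulation in R. Everything concrete here is an assumption for illustration: the predictor matrix `X`, the coefficients `beta`, the identity function as `link_fun`, and a normal distribution as `lh_fun` (which together give ordinary linear regression).

```r
set.seed(1)

# hypothetical predictors X: an intercept column and one metric predictor
n <- 100
X <- cbind(intercept = 1, x = runif(n, min = 0, max = 10))

# hypothetical regression coefficients
beta <- c(2, 0.5)

# step 1: eta = linear_combination(X)
eta <- as.vector(X %*% beta)

# step 2: mu = link_fun(eta, theta_LF); the identity needs no extra
# parameters, so theta_LF is empty in this case
link_fun <- function(eta) eta
mu <- link_fun(eta)

# step 3: y ~ lh_fun(mu, theta_LH); a normal distribution whose standard
# deviation sigma plays the role of theta_LH
sigma <- 1
y <- rnorm(n, mean = mu, sd = sigma)

# a different choice for a binary y: logistic function and Bernoulli
# likelihood; eta is shifted so that not all probabilities are close to 1
mu_binary <- plogis(eta - 4.5)
y_binary <- rbinom(n, size = 1, prob = mu_binary)
```

Only steps 2 and 3 change between the metric and the binary case; the linear combination of predictors stays the same.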

[Figure: GLM scheme]

types of variables

 

type    | examples
------- | --------
metric  | speed of a car, reading time, average time spent cooking per day, …
binary  | coin flip, truth-value judgement, experience with R, …
nominal | gender, political party, favorite philosopher, …
ordinal | level of education, rating scale judgement, …
count   | number of cars passing under a bridge in 1h, …
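One way these variable types might be represented in R is sketched below. The variable names and values are made up for illustration, and the mapping to R data types is a common convention rather than the only option.

```r
# metric: real-valued measurements, stored as doubles
reading_time <- c(312.5, 280.1, 401.9)

# binary: two possible values, stored as logicals
experience_with_R <- c(TRUE, FALSE, TRUE)

# nominal: unordered categories, stored as a factor
political_party <- factor(c("A", "B", "A"))

# ordinal: ordered categories, stored as an ordered factor
education <- factor(c("school", "bachelor", "master"),
                    levels = c("school", "bachelor", "master"),
                    ordered = TRUE)

# count: non-negative whole numbers, stored as integers
cars_per_hour <- c(12L, 0L, 7L)

# comparisons are meaningful for ordinal but not for nominal variables
education[1] < education[3]
```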

common link & likelihood function

linear regression

murder rate data set

 

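library(tidyverse)  # provides %>%, rename() and select()

# load the data, shorten the column names, and keep the columns of interest
murder_data = readr::read_csv('data/02_murder_rates.csv') %>%
  rename(murder_rate = annual_murder_rate_per_million_inhabitants,
         low_income = percentage_low_income,
         unemployment = percentage_unemployment) %>%
  select(murder_rate, low_income, unemployment, population)
# show the first six rows
murder_data %>% head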
## # A tibble: 6 x 4
##   murder_rate low_income unemployment population
##         <dbl>      <dbl>        <dbl>      <int>
## 1       11.2        16.5         6.20     587000
## 2       13.4        20.5         6.40     643000
## 3       40.7        26.3         9.30     635000
## 4        5.30       16.5         5.30     692000
## 5       24.8        19.2         7.30    1248000
## 6       12.7        16.5         5.90     643000

visualize data

GGally::ggpairs(murder_data, 
                title = "Murder rate data")