$\definecolor{firebrick}{RGB}{178,34,34} \newcommand{\red}[1]{{\color{firebrick}{#1}}}$ $\definecolor{mygray}{RGB}{178,34,34} \newcommand{\mygray}[1]{{\color{mygray}{#1}}}$ $\newcommand{\set}[1]{\{#1\}}$ $\newcommand{\tuple}[1]{\langle#1\rangle}$ $\newcommand{\States}{{T}}$ $\newcommand{\state}{{t}}$ $\newcommand{\pow}[1]{{\mathcal{P}(#1)}}$

## overview


• generalized linear model (GLM)
• types of variables
  • metric, nominal, ordinal, count
• linear model
• ordinary least squares regression
• maximum likelihood regression
• Bayesian approaches
• generalization to the simplest $$t$$-test scenario

## probabilistic models


standard notion

model = likelihood function $$P(D \mid \theta)$$

Bayesian

model = likelihood $$P(D \mid \theta)$$ + prior $$P(\theta)$$
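A minimal sketch of the difference between the two notions. It assumes a hypothetical coin-flip experiment (7 heads in 10 flips, numbers made up for illustration) with a binomial likelihood and a uniform Beta(1, 1) prior. The standard model stops at the likelihood. The Bayesian model multiplies it by the prior to get an (unnormalized) posterior:

```r
# hypothetical data, for illustration only
k <- 7   # number of heads
N <- 10  # number of flips

# likelihood P(D | theta): binomial probability of observing k heads in N flips
likelihood <- function(theta) dbinom(k, size = N, prob = theta)

# prior P(theta): Beta(1, 1), i.e. uniform on [0, 1] (an assumed choice)
prior <- function(theta) dbeta(theta, shape1 = 1, shape2 = 1)

# standard notion: work with the likelihood alone, e.g. find its maximum
theta_mle <- optimize(likelihood, interval = c(0, 1), maximum = TRUE)$maximum

# Bayesian notion: posterior is proportional to likelihood * prior (grid approximation)
theta_grid   <- seq(0, 1, length.out = 1001)
unnormalized <- likelihood(theta_grid) * prior(theta_grid)
posterior    <- unnormalized / sum(unnormalized)  # normalize over the grid
theta_post_mean <- sum(theta_grid * posterior)
```

With a uniform prior the posterior here is Beta(8, 4), so `theta_post_mean` should land near 8/12, while `theta_mle` sits at 0.7.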

approaches to modeling


## generalized linear model


terminology

• $$y$$ predicted variable, data, observation, …
• $$X$$ predictor variables for $$y$$, explanatory variables, …


blueprint of a GLM

\begin{align*} \eta & = \text{linear_combination}(X) \\ \mu & = \text{link_fun}( \ \eta, \theta_{\text{LF}} \ ) \\ y & \sim \text{lh_fun}( \ \mu, \ \theta_{\text{LH}} \ ) \end{align*}
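A sketch of the blueprint in its simplest instance, assuming an identity link (so there is no $$\theta_{\text{LF}}$$) and a normal likelihood with $$\theta_{\text{LH}} = \sigma$$. This amounts to ordinary linear regression. The parameter values are hypothetical and only chosen to simulate data, which `glm()` then fits back:

```r
set.seed(1)  # reproducible simulation

# hypothetical parameter values, for illustration only
beta_0 <- 2
beta_1 <- 0.5
sigma  <- 1                         # theta_LH: sd of the normal likelihood

x <- runif(100, min = 0, max = 10)  # a single metric predictor
X <- cbind(1, x)                    # design matrix with an intercept column

eta <- as.vector(X %*% c(beta_0, beta_1))        # eta = linear_combination(X)
mu  <- eta                                       # link_fun: identity
y   <- rnorm(length(mu), mean = mu, sd = sigma)  # y ~ Normal(mu, sigma)

# fit the same model; gaussian() uses the identity link by default
fit <- glm(y ~ x, family = gaussian())
coef(fit)
```

Exchanging the identity link and normal likelihood for, e.g., a logistic link and a Bernoulli likelihood (`family = binomial()`) gives another member of the GLM family without changing the first line of the blueprint.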

## types of variables


| type | examples |
|---|---|
| metric | speed of a car, reading time, average time spent cooking per day, … |
| binary | coin flip, truth-value judgement, experience with R, … |
| nominal | gender, political party, favorite philosopher, … |
| ordinal | level of education, rating scale judgement, … |
| count | number of cars passing under a bridge in 1h, … |
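One common way (an assumption about how one might encode them, not the only option) to represent these variable types in R is sketched below. All values are hypothetical:

```r
# hypothetical example values, one vector per variable type
metric  <- c(12.3, 8.7, 15.1)              # e.g. reading time in seconds: numeric
binary  <- c(TRUE, FALSE, TRUE)            # e.g. truth-value judgement: logical
nominal <- factor(c("A", "B", "A"))        # e.g. political party: unordered factor
ordinal <- factor(c("low", "high", "mid"),
                  levels  = c("low", "mid", "high"),
                  ordered = TRUE)          # e.g. rating scale: ordered factor
count   <- c(3L, 0L, 11L)                  # e.g. cars per hour: non-negative integer

# ordinal levels carry an order, so comparisons are meaningful
ordinal[1] < ordinal[2]  # "low" < "high"
```

The same comparison on the unordered `nominal` factor returns `NA` with a warning, which mirrors the fact that nominal categories have no inherent order.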

## murder rate data set


```r
library(tidyverse)  # provides %>%, rename(), select() and readr

# load the data and give the long column names shorter aliases
murder_data = readr::read_csv('data/02_murder_rates.csv') %>%
  rename(murder_rate = annual_murder_rate_per_million_inhabitants,
         low_income = percentage_low_income,
         unemployment = percentage_unemployment) %>%
  # keep only the four variables used below
  select(murder_rate, low_income, unemployment, population)
murder_data %>% head
```

```
## # A tibble: 6 x 4
##   murder_rate low_income unemployment population
##         <dbl>      <dbl>        <dbl>      <int>
## 1       11.2        16.5         6.20     587000
## 2       13.4        20.5         6.40     643000
## 3       40.7        26.3         9.30     635000
## 4        5.30       16.5         5.30     692000
## 5       24.8        19.2         7.30    1248000
## 6       12.7        16.5         5.90     643000
```

## visualize data

```r
# scatterplot matrix of all pairwise relations (requires the GGally package)
GGally::ggpairs(murder_data,
                title = "Murder rate data")
```
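By default, `ggpairs()` reports Pearson correlations in its upper panels for pairs of continuous variables. As a rough numeric complement, the sketch below rebuilds only the six rows printed by `head()` above (values as printed, so possibly rounded). It then computes the correlation matrix with base R's `cor()`. The name `murder_head` is hypothetical. With six observations the numbers are illustrative and are not estimates for the full data set:

```r
# only the six rows printed by head() above, typed in as displayed
murder_head <- data.frame(
  murder_rate  = c(11.2, 13.4, 40.7, 5.3, 24.8, 12.7),
  low_income   = c(16.5, 20.5, 26.3, 16.5, 19.2, 16.5),
  unemployment = c(6.2, 6.4, 9.3, 5.3, 7.3, 5.9),
  population   = c(587000L, 643000L, 635000L, 692000L, 1248000L, 643000L)
)

# Pearson correlation matrix (the default method of cor())
round(cor(murder_head), 2)
```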