\[ \definecolor{firebrick}{RGB}{178,34,34} \newcommand{\red}[1]{{\color{firebrick}{#1}}} \] \[ \definecolor{mygray}{RGB}{128,128,128} \newcommand{\mygray}[1]{{\color{mygray}{#1}}} \] \[ \newcommand{\set}[1]{\{#1\}} \] \[ \newcommand{\tuple}[1]{\langle#1\rangle} \] \[\newcommand{\States}{{T}}\] \[\newcommand{\state}{{t}}\] \[\newcommand{\pow}[1]{{\mathcal{P}(#1)}}\]

road map for today

 

  • probability
    • discrete, continuous, cumulative
    • subjective vs. objective chance
    • example distributions

 

  • conditional probability & Bayes rule

 

  • \(p\)-values & confidence intervals

probability: discrete

definition

 

a discrete probability distribution over a finite set of mutually exclusive world states \(\States\) is a function \(P \colon \States \rightarrow [0;1]\) such that \(\sum_{\state \in \States} P(\state) = 1\).

 

for finite \(\States\), \(P(\state)\) is \(\state\)'s probability mass

example

okay, winter is coming; but who will sit the Iron Throne next spring?

 

house.probs = c(6,3,1,2,4) %>% (function(x) x/sum(x))
names(house.probs)= c("Targaryen", "Lannister", "Baratheon", "Greyjoy", "Stark")
round(house.probs,3)
## Targaryen Lannister Baratheon   Greyjoy     Stark 
##     0.375     0.188     0.062     0.125     0.250

 

sum(house.probs)
## [1] 1

example

house.probs.df = as_tibble(house.probs) %>% 
  mutate(house = names(house.probs) %>% factor() %>% fct_inorder()) %>% 
  rename(probability = value)
house.plot = ggplot(house.probs.df, aes(x = house, y = probability)) + 
  geom_bar(stat = "identity", fill = "firebrick")
house.plot

notation

 

if \(f \colon \States \rightarrow \mathbb{R}^{\ge0}\), then

\[ P(\state) \propto f(\state) \]

is shorthand notation for

\[ P(\state) = \frac{f(\state)}{ \sum_{\state' \in \States} f(\state')} \]

 

example

house.probs = c(6,3,1,2,4) %>% (function(x) x/sum(x))

binomial

the binomial distribution gives the probability of observing \(k\) successes in \(n\) coin flips with a bias of \(\theta\):

\[ B(k ; n,\theta) = \binom{n}{k} \theta^{k} \, (1-\theta)^{n-k} \]
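in R, the binomial probability mass is `dbinom`; a quick sanity check of the formula (the concrete numbers are arbitrary):

```r
# P(k = 7 successes in n = 24 flips of a fair coin)
dbinom(7, size = 24, prob = 0.5)
# the same value computed directly from the formula
choose(24, 7) * 0.5^7 * (1 - 0.5)^(24 - 7)
# both give approximately 0.0206
```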

negative binomial

the negative binomial distribution gives the probability of needing \(n\) coin flips to observe \(k\) successes with a bias of \(\theta\):

\[ NB(n ; k, \theta) = \frac{k}{n} \binom{n}{k} \theta^{k} \, (1-\theta)^{n - k}\]
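this can be checked against R's `dnbinom`, which counts the \(n-k\) failures before the \(k\)-th success rather than the total number of flips \(n\) (the concrete numbers are arbitrary):

```r
k = 3; theta = 0.5
n = 7  # total number of flips needed
# formula from above
(k / n) * choose(n, k) * theta^k * (1 - theta)^(n - k)
# R's parameterization: number of failures (n - k) before the k-th success
dnbinom(n - k, size = k, prob = theta)
# both give 15/128 = 0.1171875
```

note that \(\frac{k}{n}\binom{n}{k} = \binom{n-1}{k-1}\): the last flip must be a success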

multinomial

probability of seeing \(\tuple{x_1, \dots, x_k}\) in \(n\) draws from a discrete probability distribution with \(\tuple{p_1, \dots, p_k}\), where \(x_i\) is the number of times that category \(i\) was drawn (\(\sum_{i = 1}^k x_i = n\))

\[ \mathrm{MultiNom}(\tuple{x_1, \dots, x_k} ; \tuple{p_1, \dots, p_k} ) = \frac{n!}{x_1! \dots x_k!} p_1^{x_1} \dots p_k^{x_k} \]
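in R this is `dmultinom`; a small check with made-up numbers (\(k = 3\) categories, \(n = 10\) draws):

```r
x = c(5, 3, 2)         # category counts, summing to n = 10
p = c(0.5, 0.3, 0.2)   # category probabilities
dmultinom(x, prob = p)
# the same value from the formula
factorial(10) / prod(factorial(x)) * prod(p^x)
# both give approximately 0.08505
```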

Poisson

Poisson distribution gives the probability of an event occurring \(k\) times in a fixed interval, when the expected value and variance of \(k\) is \(\lambda\)

\[ \mathrm{Poisson}(k ; \lambda) = \frac{\lambda^k \exp(-\lambda)}{k!}\]
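again easy to verify in R with `dpois` (\(\lambda = 2\), \(k = 3\) chosen arbitrarily):

```r
lambda = 2
dpois(3, lambda)
# the same value from the formula
lambda^3 * exp(-lambda) / factorial(3)
# both give approximately 0.1804
```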

cumulative distribution

\(\States\) is ordinal if there is a strict total order such that for all \(\state, \state' \in \States\) with \(\state \neq \state'\):

\[ \state > \state' \ \ \mathrm{or} \ \ \state < \state'\]

the cumulative distribution of \(P\) is \(P_{\le}(\state) = \sum_{\state' \le \state}P(\state')\)

example: cumulative Poisson
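a sketch of the cumulative Poisson in R, comparing the built-in `ppois` against summing the probability mass directly (\(\lambda = 2\) is an arbitrary choice):

```r
lambda = 2
ppois(3, lambda)          # P(k <= 3)
sum(dpois(0:3, lambda))   # the same, summing the mass function
# both give approximately 0.857
```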

probability: continuous

definition

 

a probability distribution over an infinite set (a convex, continuous interval) \(\States \subseteq \mathbb{R}\) is given by a density function \(f \colon \States \rightarrow \mathbb{R}^{\ge 0}\) such that \(\int f(\state) \, \mathrm{d}\state = 1\)

 

for all intervals \(I = [a;b] \subseteq \States\): \(\Pr(I) = \int_{a}^{b} f(\state) \ \text{d}\state\)

 

for infinite \(\States\), \(f(\state)\) is \(\state\)'s probability density (a density, not a probability)

Normal distribution

for mean \(\mu\) and standard deviation \(\sigma\), the normal distribution is:

\[ \mathcal{N}(x ; \mu, \sigma) = \frac{1}{\sqrt{2 \sigma^2 \pi}} \exp \left ( - \frac{(x-\mu)^2}{2 \sigma^2} \right) \]
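`dnorm` implements this in R; a quick check at an arbitrary point:

```r
mu = 0; sigma = 1
dnorm(1, mean = mu, sd = sigma)
# the same value from the formula
1 / sqrt(2 * sigma^2 * pi) * exp(-(1 - mu)^2 / (2 * sigma^2))
# both give approximately 0.242
```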

 

beta distribution

the beta distribution has support on the unit interval \([0;1]\) and models a wide variety of shapes for shape parameters \(\alpha\) and \(\beta\)

it is the conjugate prior for a binomial model in Bayesian analysis (more soon!)

\[ \text{Beta}(x ; \alpha, \beta) \propto x^{\alpha-1} \, (1-x)^{\beta-1} \]
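the omitted normalizing constant is the beta function \(B(\alpha, \beta)\), available in R as `beta`; a check with arbitrary shape parameters:

```r
a = 2; b = 5
dbeta(0.3, shape1 = a, shape2 = b)
# the same value, normalizing the formula by the beta function
0.3^(a - 1) * (1 - 0.3)^(b - 1) / beta(a, b)
# both give approximately 2.161
```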

 

interpretations of probability

objective vs. subjective probability

 

what's your subjective belief about the chance of drawing a red ball?

urn picture

interpretations of probability

 

objective

  • probabilities correspond to limit frequencies
  • probability is an inherent property of the outside world

 

subjective

  • probabilities are levels of credence of an agent
  • they inform (rational) decision making
  • beliefs can themselves be rational

more on this topic in SEP & an underground classic

conditional probability & Bayes rule

probability logic

 

let \(X,Y \subseteq \States\) for finite \(\States\)

 

axioms

  • \(P(\emptyset) = 0\)
  • \(P(X) = \sum_{\state \in X} P(\state)\)

corollaries

  • \(P(X \cup Y) = P(X) + P(Y) - P(X \cap Y)\)
  • \(P(\States \setminus X) = 1 - P(X)\)
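these corollaries are easy to verify with the house probabilities from above (the sets `X` and `Y` are illustrative choices):

```r
house.probs = c(Targaryen = 6, Lannister = 3, Baratheon = 1, Greyjoy = 2, Stark = 4) / 16
P = function(A) sum(house.probs[A])   # event probability as a sum of masses
X = c("Targaryen", "Lannister")
Y = c("Lannister", "Stark")
P(union(X, Y))                        # 0.8125
P(X) + P(Y) - P(intersect(X, Y))      # the same, by inclusion-exclusion
P(setdiff(names(house.probs), X))     # 0.4375 = 1 - P(X)
```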

more on probability & logic

conditional probability

 

the conditional probability of \(X \subseteq \States\) given \(Y \subseteq \States\) is:

\[ P(X \mid Y) = \frac{P(X \cap Y)}{P(Y)} \]

 

the conditional probability of \(\state\) given \(Y \subseteq \States\) is:

\[ P(\state \mid Y) = \begin{cases}\frac{P(\state)}{P(Y)} & \mathrm{if } \ \ \state \in Y \\ 0 & \mathrm{otherwise} \end{cases} \]

requires \(P(Y) \neq 0\)

example

original belief:

house.probs
## Targaryen Lannister Baratheon   Greyjoy     Stark 
##    0.3750    0.1875    0.0625    0.1250    0.2500

updated after learning that it's not a northern house:

house.probs.updated = c(house.probs[1:3], 0, 0) %>% (function(x) x / sum(x))
house.probs.updated
## Targaryen Lannister Baratheon                     
##       0.6       0.3       0.1       0.0       0.0

NB: probability ratios stay intact

Bayes rule

 

given probability \(P(Y \mid X)\), we derive probability \(P(X \mid Y)\):

\[ \begin{align*} \red{P(X \mid Y)}\ & \red{\propto P(Y\mid X) \cdot P(X)} \\ & = \frac{P(Y\mid X) \cdot P(X)}{\sum_{X'} P(Y \mid X') \cdot P(X')} = \frac{P(Y\mid X) \cdot P(X)}{P(Y)} = \frac{P(X \cap Y)}{P(Y)} \end{align*} \]
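a minimal R sketch of this derivation: which of two candidate coins produced \(k = 7\) successes in \(n = 24\) flips? (the two biases and the flat prior are made up for illustration)

```r
theta = c(fair = 0.5, biased = 0.25)             # hypothetical candidate biases
prior = c(fair = 0.5, biased = 0.5)              # flat prior P(X)
likelihood = dbinom(7, size = 24, prob = theta)  # P(Y | X)
# normalize the product, as in the derivation above
posterior = likelihood * prior / sum(likelihood * prior)
round(posterior, 3)
# fair: approximately 0.115, biased: approximately 0.885
```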

 

why useful?

  • reasoning from effects back to their causes ("abduction")
  • inferring latent, unobservable stuff (model parameters) from concrete data

\(p\)-values and confidence intervals

widespread confusion

many statistical concepts are misunderstood by the majority of researchers

  • Oakes (1986): \(p\)-values are misinterpreted all the time
  • Haller & Kraus (2002): also true for senior researchers and statistics teachers
  • Hoekstra et al. (2014): the same holds true for confidence intervals

plot from Haller

quiz

 

  1. \(p\)-values quantify subjective confidence that the null hypothesis is true/false
  2. \(p\)-values quantify objective evidence for/against the null hypothesis
  3. \(p\)-values quantify how surprising the data is under the null hypothesis
  4. \(p\)-values quantify probability that the null hypothesis is true/false
  5. non-significant \(p\)-values mean that the null hypothesis is true
  6. significant \(p\)-values mean that the null hypothesis is false
  7. significant \(p\)-values mean that the null hypothesis is unlikely

which of these are true?

\(p\)-values

sloppy version: the \(p\)-value (of a two-sided exact test) gives the probability, under the null hypothesis, of an outcome at least as unlikely as the actual outcome

better version: the \(p\)-value is the probability of observing, under infinite hypothetical repetitions of the same experiment, a value of a test statistic at least as extreme as that of the observed data, given that the null hypothesis is true

wagenmakers diagram

example: fair coin?

  • we flip \(n=24\) times and observe \(k = 7\) successes
  • null hypothesis: \(\theta = 0.5\)

example: fair coin?

binom.test(7,24)
## 
##  Exact binomial test
## 
## data:  7 and 24
## number of successes = 7, number of trials = 24, p-value = 0.06391
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.1261521 0.5109478
## sample estimates:
## probability of success 
##              0.2916667
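the reported \(p\)-value can be reproduced by hand: sum the null probability of every outcome that is at most as probable as the observed \(k = 7\):

```r
null.probs = dbinom(0:24, size = 24, prob = 0.5)   # null distribution over k
sum(null.probs[null.probs <= dbinom(7, 24, 0.5)])  # two-sided exact p-value
# gives 0.06391, matching binom.test
```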

significance

fix a significance level, e.g.: \(0.05\)

 

we say that a test result is significant if its \(p\)-value is below the pre-determined significance level

 

we reject the null hypothesis in case of significant test result

 

the significance level thereby determines the \(\alpha\)-error of falsely rejecting the null hypothesis

  • aka: type-I error / incorrect rejection / false positive

confidence interval

the null hypothesis fixes a concrete value of some parameter (e.g., coin bias \(\theta = 0.5\))

we have an experiment \(E\) and we have a set of all possible outcomes \(O(E)\) of that experiment

the observed outcome is \(o^*\)

consider a function \(I \colon o \mapsto I_o\) mapping each possible outcome \(o \in O(E)\) to an interval of relevant parameter values

this construction is a level-\((1-p)\) confidence interval iff, under the assumption that the null hypothesis is correct, when we repeat the experiment infinitely often, the proportion of intervals \(I_{o}\), associated with each outcome \(o\) of a hypothetical repetition, that contain the true parameter value is \(1-p\)
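a simulation sketch of this coverage property, using the exact CI reported by `binom.test` (the true value \(\theta = 0.5\), the seed, and the number of repetitions are arbitrary choices):

```r
set.seed(1234)                   # arbitrary seed for reproducibility
theta = 0.5; n = 24
covered = replicate(2000, {
  k = rbinom(1, n, theta)                 # one hypothetical repetition
  ci = binom.test(k, n)$conf.int          # its level-0.95 interval
  ci[1] <= theta && theta <= ci[2]        # does it contain the true value?
})
mean(covered)  # close to (in fact at least) 0.95: exact CIs are conservative
```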

great visualization

what do we learn from a CI?

 

  1. range of values we would not reject (at the given significance level)
  2. range of values that would not make the data surprising (at the given level)
  3. range of values that are most likely given the data
  4. range of values that it is rational to believe in / bet on
  5. that the true value lies in this interval
  6. that the true value is likely in this interval
  7. that, if we repeat the experiment, the outcome will likely lie in this interval

fini

outlook

 

Friday

  • some oddities of \(p\)-values & Rmarkdown

 

Tuesday

  • introduction to a Bayesian approach to statistical inference

to prevent boredom

obligatory

  • prepare Wagenmakers (2007)
    • only up to page 788 (you may stop at section "Bayesian inference"; you may also go on)

 

optional