\[ \definecolor{firebrick}{RGB}{178,34,34} \newcommand{\red}[1]{{\color{firebrick}{#1}}} \] \[ \definecolor{mygray}{RGB}{128,128,128} \newcommand{\mygray}[1]{{\color{mygray}{#1}}} \] \[ \newcommand{\set}[1]{\{#1\}} \] \[ \newcommand{\tuple}[1]{\langle#1\rangle} \] \[\newcommand{\States}{{T}}\] \[\newcommand{\state}{{t}}\] \[\newcommand{\pow}[1]{{\mathcal{P}(#1)}}\]

road map for today

 

  • probability
    • discrete, continuous, cumulative
    • subjective vs. objective chance
    • example distributions

 

  • conditional probability & Bayes rule

 

  • \(p\)-values & confidence intervals

probability: discrete

definition

 

a discrete probability distribution over a finite set of mutually exclusive world states \(\States\) is a function \(P \colon \States \rightarrow [0;1]\) such that \(\sum_{\state \in \States} P(\state) = 1\).

 

for finite \(\States\), \(P(\state)\) is \(\state\)'s probability mass

example

okay, winter is coming; but who will sit the Iron Throne next spring?

 

house.probs = c(6,3,1,2,4) %>% (function(x) x/sum(x))
names(house.probs)= c("Targaryen", "Lannister", "Baratheon", "Greyjoy", "Stark")
round(house.probs,3)
## Targaryen Lannister Baratheon   Greyjoy     Stark 
##     0.375     0.188     0.062     0.125     0.250

 

sum(house.probs)
## [1] 1

example

house.probs.df = as_tibble(house.probs) %>% 
  mutate(house = names(house.probs) %>% factor() %>% fct_inorder()) %>% 
  rename(probability = value)
house.plot = ggplot(house.probs.df, aes(x = house, y = probability)) + 
  geom_bar(stat = "identity", fill = "firebrick")
house.plot

notation

 

if \(f \colon \States \rightarrow \mathbb{R}^{\ge0}\), then

\[ P(\state) \propto f(\state) \]

is shorthand notation for

\[ P(\state) = \frac{f(\state)}{ \sum_{\state' \in \States} f(\state')} \]

 

example

house.probs = c(6,3,1,2,4) %>% (function(x) x/sum(x))

binomial

the binomial distribution gives the probability of observing \(k\) successes in \(n\) coin flips with a bias of \(\theta\):

\[ B(k ; n,\theta) = \binom{n}{k} \theta^{k} \, (1-\theta)^{n-k} \]
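in R, the binomial probability mass is `dbinom`; a quick sanity check of the formula (the concrete numbers are arbitrary):

```r
# P(k = 7 successes in n = 24 flips of a fair coin)
dbinom(7, size = 24, prob = 0.5)
# the same value computed directly from the formula
choose(24, 7) * 0.5^7 * (1 - 0.5)^(24 - 7)
# both give approximately 0.0206
```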

negative binomial

the negative binomial distribution gives the probability of needing \(n\) coin flips to observe \(k\) successes with a bias of \(\theta\):

\[ NB(n ; k, \theta) = \frac{k}{n} \binom{n}{k} \theta^{k} \, (1-\theta)^{n - k}\]
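this can be checked against R's `dnbinom`, which counts the \(n-k\) failures before the \(k\)-th success rather than the total number of flips \(n\) (the concrete numbers are arbitrary):

```r
k = 3; theta = 0.5
n = 7  # total number of flips needed
# formula from above
(k / n) * choose(n, k) * theta^k * (1 - theta)^(n - k)
# R's parameterization: number of failures (n - k) before the k-th success
dnbinom(n - k, size = k, prob = theta)
# both give 15/128 = 0.1171875
```

note that \(\frac{k}{n}\binom{n}{k} = \binom{n-1}{k-1}\): the last flip must be a success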

multinomial

probability of seeing \(\tuple{x_1, \dots, x_k}\) in \(n\) draws from a discrete probability distribution with \(\tuple{p_1, \dots, p_k}\), where \(x_i\) is the number of times that category \(i\) was drawn (\(\sum_{i = 1}^k x_i = n\))

\[ \mathrm{MultiNom}(\tuple{x_1, \dots, x_k} ; \tuple{p_1, \dots, p_k} ) = \frac{n!}{x_1! \dots x_k!} p_1^{x_1} \dots p_k^{x_k} \]
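in R this is `dmultinom`; a small check with made-up numbers (\(k = 3\) categories, \(n = 10\) draws):

```r
x = c(5, 3, 2)         # category counts, summing to n = 10
p = c(0.5, 0.3, 0.2)   # category probabilities
dmultinom(x, prob = p)
# the same value from the formula
factorial(10) / prod(factorial(x)) * prod(p^x)
# both give approximately 0.08505
```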

Poisson

Poisson distribution gives the probability of an event occurring \(k\) times in a fixed interval, when the expected value and variance of \(k\) is \(\lambda\)

\[ \mathrm{Poisson}(k ; \lambda) = \frac{\lambda^k \exp(-\lambda)}{k!}\]
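again easy to verify in R with `dpois` (\(\lambda = 2\), \(k = 3\) chosen arbitrarily):

```r
lambda = 2
dpois(3, lambda)
# the same value from the formula
lambda^3 * exp(-lambda) / factorial(3)
# both give approximately 0.1804
```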

cumulative distribution

\(\States\) is ordinal if there is a strict total order such that for all \(\state, \state' \in \States\) with \(\state \neq \state'\):

\[ \state > \state' \ \ \mathrm{or} \ \ \state < \state'\]

the cumulative distribution of \(P\) is \(P_{\le}(\state) = \sum_{\state' \le \state}P(\state')\)

example: cumulative Poisson
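a sketch of the cumulative Poisson in R, comparing the built-in `ppois` against summing the probability mass directly (\(\lambda = 2\) is an arbitrary choice):

```r
lambda = 2
ppois(3, lambda)          # P(k <= 3)
sum(dpois(0:3, lambda))   # the same, summing the mass function
# both give approximately 0.857
```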

probability: continuous

definition

 

a probability distribution over an infinite set (a convex, continuous interval) \(\States \subseteq \mathbb{R}\) is given by a density function \(f \colon \States \rightarrow \mathbb{R}^{\ge 0}\) such that \(\int f(\state) \, \mathrm{d}\state = 1\)

 

for all intervals \(I = [a;b] \subseteq \States\): \(\Pr(I) = \int_{a}^{b} f(\state) \ \text{d}\state\)

 

for infinite \(\States\), \(f(\state)\) is \(\state\)'s probability density (a density, not a probability)

Normal distribution

for mean \(\mu\) and standard deviation \(\sigma\), the normal distribution is:

\[ \mathcal{N}(x ; \mu, \sigma) = \frac{1}{\sqrt{2 \sigma^2 \pi}} \exp \left ( - \frac{(x-\mu)^2}{2 \sigma^2} \right) \]
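`dnorm` implements this in R; a quick check at an arbitrary point:

```r
mu = 0; sigma = 1
dnorm(1, mean = mu, sd = sigma)
# the same value from the formula
1 / sqrt(2 * sigma^2 * pi) * exp(-(1 - mu)^2 / (2 * sigma^2))
# both give approximately 0.242
```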

 

beta distribution

the beta distribution has support on the unit interval \([0;1]\) and models a wide variety of shapes for shape parameters \(\alpha\) and \(\beta\)

it is the conjugate prior for a binomial model in Bayesian analysis (more soon!)

\[ \text{Beta}(x ; \alpha, \beta) \propto x^{\alpha-1} \, (1-x)^{\beta-1} \]
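the omitted normalizing constant is the beta function \(B(\alpha, \beta)\), available in R as `beta`; a check with arbitrary shape parameters:

```r
a = 2; b = 5
dbeta(0.3, shape1 = a, shape2 = b)
# the same value, normalizing the formula by the beta function
0.3^(a - 1) * (1 - 0.3)^(b - 1) / beta(a, b)
# both give approximately 2.161
```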

 

interpretations of probability

objective vs. subjective probability

 

what's your subjective belief about the chance of drawing a red ball?

urn picture

interpretations of probability

 

objective

  • probabilities correspond to limit frequencies
  • probability is an inherent property of the outside world

 

subjective

  • probabilities are levels of credence of an agent
  • they inform (rational) decision making
  • beliefs can themselves be rational

more on this topic in SEP & an underground classic

conditional probability & Bayes rule

probability logic

 

let \(X,Y \subseteq \States\) for finite \(\States\)

 

axioms

  • \(P(\emptyset) = 0\)
  • \(P(X) = \sum_{\state \in X} P(\state)\)

corollaries

  • \(P(X \cup Y) = P(X) + P(Y) - P(X \cap Y)\)
  • \(P(\States \setminus X) = 1 - P(X)\)
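these corollaries are easy to verify with the house probabilities from above (the sets `X` and `Y` are illustrative choices):

```r
house.probs = c(Targaryen = 6, Lannister = 3, Baratheon = 1, Greyjoy = 2, Stark = 4) / 16
P = function(A) sum(house.probs[A])   # event probability as a sum of masses
X = c("Targaryen", "Lannister")
Y = c("Lannister", "Stark")
P(union(X, Y))                        # 0.8125
P(X) + P(Y) - P(intersect(X, Y))      # the same, by inclusion-exclusion
P(setdiff(names(house.probs), X))     # 0.4375 = 1 - P(X)
```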

more on probability & logic

conditional probability

 

the conditional probability of \(X \subseteq \States\) given \(Y \subseteq \States\) is:

\[ P(X \mid Y) = \frac{P(X \cap Y)}{P(Y)} \]

 

the conditional probability of \(\state\) given \(Y \subseteq \States\) is:

\[ P(\state \mid Y) = \begin{cases}\frac{P(\state)}{P(Y)} & \mathrm{if } \ \ \state \in Y \\ 0 & \mathrm{otherwise} \end{cases} \]

requires \(P(Y) \neq 0\)

example

original belief:

house.probs
## Targaryen Lannister Baratheon   Greyjoy     Stark 
##    0.3750    0.1875    0.0625    0.1250    0.2500

updated after learning that it's not a northern house:

house.probs.updated = c(house.probs[1:3], 0, 0) %>% (function(x) x / sum(x))
house.probs.updated
## Targaryen Lannister Baratheon                     
##       0.6       0.3       0.1       0.0       0.0

NB: probability ratios stay intact

Bayes rule

 

given probability \(P(Y \mid X)\), we derive probability \(P(X \mid Y)\):

\[ \begin{align*} \red{P(X \mid Y)}\ & \red{\propto P(Y\mid X) \cdot P(X)} \\ & = \frac{P(Y\mid X) \cdot P(X)}{\sum_{X'} P(Y \mid X') \cdot P(X')} = \frac{P(Y\mid X) \cdot P(X)}{P(Y)} = \frac{P(X \cap Y)}{P(Y)} \end{align*} \]
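a minimal R sketch of this derivation: which of two candidate coins produced \(k = 7\) successes in \(n = 24\) flips? (the two biases and the flat prior are made up for illustration)

```r
theta = c(fair = 0.5, biased = 0.25)             # hypothetical candidate biases
prior = c(fair = 0.5, biased = 0.5)              # flat prior P(X)
likelihood = dbinom(7, size = 24, prob = theta)  # P(Y | X)
# normalize the product, as in the derivation above
posterior = likelihood * prior / sum(likelihood * prior)
round(posterior, 3)
# fair: approximately 0.115, biased: approximately 0.885
```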

 

why useful?

  • reasoning from effects back to their causes ("abduction")
  • inferring latent, unobservable stuff (model parameters) from concrete data

\(p\)-values and confidence intervals

widespread confusion

many statistical concepts are misunderstood by the majority of researchers

  • Oakes (1986): \(p\)-values are misinterpreted all the time
  • Haller & Kraus (2002): also true for senior researchers and statistics teachers
  • Hoekstra et al. (2014): the same holds true for confidence intervals

plot from Haller

quiz

 

  1. \(p\)-values quantify subjective confidence that the null hypothesis is true/false
  2. \(p\)-values quantify objective evidence for/against the null hypothesis
  3. \(p\)-values quantify how surprising the data is under the null hypothesis
  4. \(p\)-values quantify probability that the null hypothesis is true/false
  5. non-significant \(p\)-values mean that the null hypothesis is true
  6. significant \(p\)-values mean that the null hypothesis is false
  7. significant \(p\)-values mean that the null hypothesis is unlikely

which of these are true?

\(p\)-values

sloppy version: the \(p\)-value (of a two-sided exact test) gives the probability, under the null hypothesis, of an outcome at least as unlikely as the actual outcome

better version: the \(p\)-value is the probability of observing, under infinite hypothetical repetitions of the same experiment, a value of a test statistic at least as extreme as that of the observed data, given that the null hypothesis is true

wagenmakers diagram

example: fair coin?

  • we flip \(n=24\) times and observe \(k = 7\) successes
  • null hypothesis: \(\theta = 0.5\)

example: fair coin?

binom.test(7,24)
## 
##  Exact binomial test
## 
## data:  7 and 24
## number of successes = 7, number of trials = 24, p-value = 0.06391
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.1261521 0.5109478
## sample estimates:
## probability of success 
##              0.2916667
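the reported \(p\)-value can be reproduced by hand: sum the null probability of every outcome that is at most as probable as the observed \(k = 7\):

```r
null.probs = dbinom(0:24, size = 24, prob = 0.5)   # null distribution over k
sum(null.probs[null.probs <= dbinom(7, 24, 0.5)])  # two-sided exact p-value
# gives 0.06391, matching binom.test
```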

significance

fix a significance level, e.g.: \(0.05\)

 

we say that a test result is significant if its \(p\)-value is below the pre-determined significance level

 

we reject the null hypothesis in case of significant test result

 

the significance level thereby determines the \(\alpha\)-error of falsely rejecting the null hypothesis

  • aka: type-I error / incorrect rejection / false positive

confidence interval

the null hypothesis fixes a concrete value of some parameter (e.g., coin bias \(\theta = 0.5\))

we have an experiment \(E\) and we have a set of all possible outcomes \(O(E)\) of that experiment

the observed outcome is \(o^*\)

consider a function \(I \colon o \mapsto I_o\) mapping each possible outcome \(o \in O(E)\) to an interval of relevant parameter values

this construction is a level-\((1-p)\) confidence interval iff, under the assumption that the null hypothesis is correct, when we repeat the experiment infinitely often, the proportion of intervals \(I_{o}\), associated with each outcome \(o\) of a hypothetical repetition, that contain the true parameter value is \(1-p\)
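a simulation sketch of this coverage property, using the exact CI reported by `binom.test` (the true value \(\theta = 0.5\), the seed, and the number of repetitions are arbitrary choices):

```r
set.seed(1234)                   # arbitrary seed for reproducibility
theta = 0.5; n = 24
covered = replicate(2000, {
  k = rbinom(1, n, theta)                 # one hypothetical repetition
  ci = binom.test(k, n)$conf.int          # its level-0.95 interval
  ci[1] <= theta && theta <= ci[2]        # does it contain the true value?
})
mean(covered)  # close to (in fact at least) 0.95: exact CIs are conservative
```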

great visualization

what do we learn from a CI?

 

  1. range of values we would not reject (at the given significance level)
  2. range of values that would not make the data surprising (at the given level)
  3. range of values that are most likely given the data
  4. range of values that it is rational to believe in / bet on
  5. that the true value lies in this interval
  6. that the true value is likely in this interval
  7. that, if we repeat the experiment, the outcome will likely lie in this interval

fini

outlook

 

Friday

  • some oddities of \(p\)-values & Rmarkdown

 

Tuesday

  • introduction to a Bayesian approach to statistical inference

to prevent boredom

obligatory

  • prepare Wagenmakers (2007)
    • only up to page 788 (you may stop at section "Bayesian inference"; you may also go on)

 

optional