- probability
- discrete, continuous, cumulative
- subjective vs. objective chance
- example distributions
- conditional probability & Bayes' rule
- \(p\)-values & confidence intervals
\[ \definecolor{firebrick}{RGB}{178,34,34} \newcommand{\red}[1]{{\color{firebrick}{#1}}} \] \[ \definecolor{mygray}{RGB}{128,128,128} \newcommand{\mygray}[1]{{\color{mygray}{#1}}} \] \[ \newcommand{\set}[1]{\{#1\}} \] \[ \newcommand{\tuple}[1]{\langle#1\rangle} \] \[\newcommand{\States}{{T}}\] \[\newcommand{\state}{{t}}\] \[\newcommand{\pow}[1]{{\mathcal{P}(#1)}}\]
a discrete probability distribution over a finite set of mutually exclusive world states \(\States\) is a function \(P \colon \States \rightarrow [0;1]\) such that \(\sum_{\state \in \States} P(\state)=1\).
for finite \(\States\), \(P(\state)\) is \(\state\)'s probability mass
okay, winter is coming; but who will sit the Iron Throne next spring?
house.probs = c(6,3,1,2,4) %>% (function(x) x/sum(x))
names(house.probs) = c("Targaryen", "Lannister", "Baratheon", "Greyjoy", "Stark")
round(house.probs, 3)
## Targaryen Lannister Baratheon   Greyjoy     Stark 
##     0.375     0.188     0.062     0.125     0.250
sum(house.probs)
## [1] 1
house.probs.df = as_tibble(house.probs) %>% 
  mutate(house = names(house.probs) %>% factor() %>% fct_inorder()) %>% 
  rename(probability = value)
house.plot = ggplot(house.probs.df, aes(x = house, y = probability)) + 
  geom_bar(stat = "identity", fill = "firebrick")
house.plot
if \(f \colon \States \rightarrow \mathbb{R}^{\ge0}\), then
\[ P(\state) \propto f(\state) \]
is shorthand notation for
\[ P(\state) = \frac{f(\state)}{ \sum_{\state' \in \States} f(\state')} \]
example
house.probs = c(6,3,1,2,4) %>% (function(x) x/sum(x))
the binomial distribution gives the probability of observing \(k\) successes in \(n\) coin flips with a bias of \(\theta\):
\[ B(k ; n,\theta) = \binom{n}{k} \theta^{k} \, (1-\theta)^{n-k} \]
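in R, this is `dbinom`; a quick sanity check of the formula against the built-in function (the values of \(k\), \(n\) and \(\theta\) are arbitrary, chosen just for illustration):

```r
# probability of k = 7 successes in n = 24 flips of a coin with bias theta = 0.5
n = 24; k = 7; theta = 0.5

# by the formula above
manual = choose(n, k) * theta^k * (1 - theta)^(n - k)

# built-in binomial probability mass function
built_in = dbinom(k, size = n, prob = theta)

all.equal(manual, built_in)  # TRUE
```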
the negative binomial distribution gives the probability of needing \(n\) coin flips to observe \(k\) successes with a bias of \(\theta\):
\[ NB(n ; k, \theta) = \frac{k}{n} \binom{n}{k} \theta^{k} \, (1-\theta)^{n - k}\]
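R's `dnbinom` parameterizes this differently: it takes the number of *failures* before the \(k\)-th success, i.e., \(n - k\). A check with arbitrary illustrative values:

```r
# NB(n; k, theta): probability that the k-th success occurs on flip n
k = 3; theta = 0.5; n = 10

# by the formula above
manual = (k / n) * choose(n, k) * theta^k * (1 - theta)^(n - k)

# dnbinom counts the number of failures before the k-th success,
# so we pass n - k
built_in = dnbinom(n - k, size = k, prob = theta)

all.equal(manual, built_in)  # TRUE
```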
probability of seeing \(\tuple{x_1, \dots, x_k}\) in \(n\) draws from a discrete probability distribution with \(\tuple{p_1, \dots, p_k}\) where \(x_i\) is the number of times that category \(i\) was drawn (\(\sum_{i = 1}^k x_i = n\))
\[ \mathrm{MultiNom}(\tuple{x_1, \dots, x_k} ; \tuple{p_1, \dots, p_k} ) = \frac{n!}{x_1! \dots x_k!} p_1^{x_1} \dots p_k^{x_k} \]
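the corresponding built-in function is `dmultinom`; a check with made-up counts and probabilities:

```r
# probability of counts x = (2, 1, 1) in n = 4 draws from
# categories with probabilities p = (0.5, 0.3, 0.2)
x = c(2, 1, 1)
p = c(0.5, 0.3, 0.2)
n = sum(x)

# by the formula above
manual = factorial(n) / prod(factorial(x)) * prod(p^x)

# built-in multinomial probability mass function
built_in = dmultinom(x, prob = p)

all.equal(manual, built_in)  # TRUE
```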
Poisson distribution gives the probability of an event occurring \(k\) times in a fixed interval, when the expected value and variance of \(k\) is \(\lambda\)
\[ \mathrm{Poisson}(k ; \lambda) = \frac{\lambda^k \exp(-\lambda)}{k!}\]
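again, the formula agrees with R's built-in `dpois` (values of \(k\) and \(\lambda\) are arbitrary):

```r
# probability of k = 4 events when the rate is lambda = 2.5
k = 4; lambda = 2.5

# by the formula above
manual = lambda^k * exp(-lambda) / factorial(k)

# built-in Poisson probability mass function
built_in = dpois(k, lambda)

all.equal(manual, built_in)  # TRUE
```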
\(\States\) is ordinal if there is a total strict ordering such that for all \(\state, \state' \in \States\) with \(\state \neq \state'\):
\[ \state > \state' \ \ \mathrm{or} \ \ \state < \state'\]
the cumulative distribution of \(P\) is \(P_{\le}(\state) = \sum_{\state' \le \state}P(\state')\)
example: cumulative Poisson
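in R, the cumulative Poisson distribution is `ppois`, which is just the running sum of the probability masses (a sketch with an arbitrary \(\lambda\)):

```r
lambda = 2.5

# cumulative distribution as the running sum of masses dpois(0), ..., dpois(k)
manual = cumsum(dpois(0:10, lambda))

# built-in cumulative Poisson distribution function
built_in = ppois(0:10, lambda)

all.equal(manual, built_in)  # TRUE
```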
a probability distribution over an infinite set (a convex, continuous interval) \(\States \subseteq \mathbb{R}\) is given by a density function \(f \colon \States \rightarrow \mathbb{R}^{\ge 0}\) such that \(\int_{\States} f(\state) \ \mathrm{d}\state = 1\)
for all intervals \(I = [a;b] \subseteq \States\): \(P(I) = \int_{a}^{b} f(\state) \ \text{d}\state\)
for infinite \(\States\), \(f(\state)\) is \(\state\)'s probability density
for mean \(\mu\) and standard deviation \(\sigma\), the normal distribution is:
\[ \mathcal{N}(x ; \mu, \sigma) = \frac{1}{\sqrt{2 \sigma^2 \pi}} \exp \left ( - \frac{(x-\mu)^2}{2 \sigma^2} \right) \]
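the normal density is `dnorm` in R (note that it takes the standard deviation \(\sigma\), not the variance); a check with arbitrary values:

```r
# density at x = 1.3 for mean mu = 0 and standard deviation sigma = 2
x = 1.3; mu = 0; sigma = 2

# by the formula above
manual = 1 / sqrt(2 * sigma^2 * pi) * exp(-(x - mu)^2 / (2 * sigma^2))

# built-in normal density function
built_in = dnorm(x, mean = mu, sd = sigma)

all.equal(manual, built_in)  # TRUE
```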
the beta distribution has support on the unit interval \([0;1]\) and models a wide variety of shapes for shape parameters \(\alpha\) and \(\beta\)
it is the conjugate prior for a binomial model in Bayesian analysis (more soon!)
\[ \text{Beta}(x ; \alpha, \beta) \propto x^{\alpha-1} \, (1-x)^{\beta-1} \]
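the normalizing constant hidden in the \(\propto\) is \(1 / \mathrm{B}(\alpha, \beta)\), the beta function; R exposes both the density (`dbeta`) and the beta function (`beta`), so we can check this directly (shape parameters chosen arbitrarily):

```r
# shape parameters and an arbitrary point in [0, 1]
a = 2; b = 5
x = 0.3

# proportional form above, normalized by the beta function B(a, b)
manual = x^(a - 1) * (1 - x)^(b - 1) / beta(a, b)

# built-in beta density function
built_in = dbeta(x, shape1 = a, shape2 = b)

all.equal(manual, built_in)  # TRUE
```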
what's your subjective belief about the chance of drawing a red ball?
objective
subjective
more on this topic in SEP & an underground classic
let \(X,Y \subseteq \States\) for finite \(\States\)
axioms
corollaries
more on probability & logic
the conditional probability of \(X \subseteq \States\) given \(Y \subseteq \States\) is:
\[ P(X \mid Y) = \frac{P(X \cap Y)}{P(Y)} \]
the conditional probability of \(\state\) given \(Y \subseteq \States\) is:
\[ P(\state \mid Y) = \begin{cases}\frac{P(\state)}{P(Y)} & \mathrm{if } \ \ \state \in Y \\ 0 & \mathrm{otherwise} \end{cases} \]
requires \(P(Y) \neq 0\)
original belief:
house.probs
## Targaryen Lannister Baratheon   Greyjoy     Stark 
##    0.3750    0.1875    0.0625    0.1250    0.2500
updated after learning that it's not a northern house:
house.probs.updated = c(house.probs[1:3], 0, 0) %>% (function(x) x / sum(x))
house.probs.updated
## Targaryen Lannister Baratheon                     
##       0.6       0.3       0.1       0.0       0.0
NB: probability ratios stay intact
given probability \(P(Y \mid X)\), we derive probability \(P(X \mid Y)\):
\[ \begin{align*} \red{P(X \mid Y)}\ & \red{\propto P(Y\mid X) \cdot P(X)} \\ & = \frac{P(Y\mid X) \cdot P(X)}{\sum_{X'} P(Y \mid X') \cdot P(X')} = \frac{P(Y\mid X) \cdot P(X)}{P(Y)} = \frac{P(X \cap Y)}{P(Y)} \end{align*} \]
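a toy illustration with the house prior from above and a made-up likelihood: suppose some piece of evidence \(Y\) (say, a raven's report) is more likely under some houses than others; the posterior is obtained by multiplying prior and likelihood, then renormalizing (the likelihood values here are invented for illustration):

```r
# prior over houses (as computed above)
prior = c(Targaryen = 0.375, Lannister = 0.1875, Baratheon = 0.0625,
          Greyjoy = 0.125, Stark = 0.25)

# hypothetical likelihood P(Y | X) of the evidence under each house
likelihood = c(0.2, 0.8, 0.5, 0.1, 0.4)

# posterior: P(X | Y) is proportional to P(Y | X) * P(X)
posterior = prior * likelihood / sum(prior * likelihood)

round(posterior, 3)
sum(posterior)  # renormalization guarantees this is 1
```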
why useful?
many statistical concepts are misunderstood by the majority of researchers
which of these are true?
sloppy version: the \(p\)-value (of a two-sided exact test) gives the probability, under the null hypothesis, of an outcome at least as unlikely as the actual outcome
better version: the \(p\)-value is the probability of observing, under infinite hypothetical repetitions of the same experiment, a value of the test statistic at least as extreme as that of the observed data, given that the null hypothesis is true
binom.test(7,24)
## 
##  Exact binomial test
## 
## data:  7 and 24
## number of successes = 7, number of trials = 24, p-value = 0.06391
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.1261521 0.5109478
## sample estimates:
## probability of success 
##              0.2916667
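the reported \(p\)-value can be reconstructed from the binomial masses: sum the probability, under the null hypothesis, of every outcome that is no more likely than the observed one (the small relative tolerance mirrors what `binom.test` itself uses internally for this comparison):

```r
k_obs = 7; n = 24; theta0 = 0.5

# probability mass of every possible outcome under the null hypothesis
masses = dbinom(0:n, size = n, prob = theta0)

# sum the masses of all outcomes at least as unlikely as the observed one
p_value = sum(masses[masses <= dbinom(k_obs, n, theta0) * (1 + 1e-7)])

round(p_value, 5)  # 0.06391, matching the output above
```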
fix a significance level, e.g.: \(0.05\)
we say that a test result is significant if its \(p\)-value is below the pre-determined significance level
we reject the null hypothesis in case of significant test result
the significance level thereby determines the \(\alpha\)-error of falsely rejecting the null hypothesis
the null hypothesis fixes a concrete value of some parameter (e.g., coin bias \(p = 0.5\))
we have an experiment \(E\) and we have a set of all possible outcomes \(O(E)\) of that experiment
the observed outcome is \(o^*\)
consider a function \(I \colon o \mapsto I_o\) mapping each possible outcome \(o \in O(E)\) to an interval of relevant parameter values
this construction is a level-\((1-p)\) confidence interval iff, under the assumption that the null hypothesis is correct, when we repeat the experiment infinitely often, the proportion of intervals \(I_{o}\), associated with each outcome \(o\) of a hypothetical repetition, that contain the true parameter value is \(1-p\)
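this frequentist reading can be checked by simulation: repeat the experiment many times with a known parameter value and count how often the interval covers it (the true bias, sample size and number of repetitions below are made up for illustration):

```r
set.seed(123)

theta_true = 0.5   # hypothetical true coin bias
n = 24             # flips per experiment
n_reps = 2000      # hypothetical repetitions of the experiment

# for each repetition: draw an outcome, build the interval, check coverage
covered = replicate(n_reps, {
  k = rbinom(1, size = n, prob = theta_true)
  ci = binom.test(k, n)$conf.int
  ci[1] <= theta_true && theta_true <= ci[2]
})

# close to (at least) 0.95; Clopper-Pearson intervals are conservative
mean(covered)
```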
Friday
Tuesday
obligatory
optional