\[ \definecolor{firebrick}{RGB}{178,34,34} \newcommand{\red}[1]{{\color{firebrick}{#1}}} \] \[ \definecolor{mygray}{RGB}{128,128,128} \newcommand{\mygray}[1]{{\color{mygray}{#1}}} \] \[ \newcommand{\set}[1]{\{#1\}} \] \[ \newcommand{\tuple}[1]{\langle#1\rangle} \] \[\newcommand{\States}{{T}}\] \[\newcommand{\state}{{t}}\] \[\newcommand{\pow}[1]{{\mathcal{P}(#1)}}\]
parameter estimation
\[\underbrace{P(\theta \mid D)}_{\text{posterior}} \propto \underbrace{P(\theta)}_{\text{prior}} \ \underbrace{P(D \mid \theta)}_{\text{likelihood}}\]
model comparison
\[\underbrace{\frac{P(M_1 \mid D)}{P(M_2 \mid D)}}_{\text{posterior odds}} = \underbrace{\frac{P(D \mid M_1)}{P(D \mid M_2)}}_{\text{Bayes factor}} \ \underbrace{\frac{P(M_1)}{P(M_2)}}_{\text{prior odds}}\]
prior & posterior predictive
\[ P(D) = \int P(\theta) \ P(D \mid \theta) \ \text{d}\theta \] \[ P(D \mid D') = \int P(\theta \mid D') \ P(D \mid \theta) \ \text{d}\theta \]
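As a small illustration of the first integral (not from the slides): for a coin with unknown bias \(\theta\), flipped 10 times with 7 heads, the prior predictive \(P(D)\) under a uniform prior can be approximated on a grid, and should come out close to the exact Beta-Binomial answer \(1/11\):

```r
# illustrative sketch: grid approximation of the prior predictive P(D)
# for D = 7 heads in 10 flips, uniform prior over the coin's bias theta
theta = seq(0.0005, 0.9995, by = 0.001)        # midpoint grid over (0,1)
prior = rep(1/length(theta), length(theta))    # discretized uniform prior
lh    = dbinom(7, size = 10, prob = theta)     # likelihood at each grid point
pD    = sum(prior * lh)                        # approximates 1/11
```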
standard approach
Bayes
```r
# recall rates y for 100 subjects after time t in seconds
y = c(.94, .77, .40, .26, .24, .16)
t = c(  1,   3,   6,   9,  12,  18)
obs = y*100   # number of subjects (out of 100) who recalled
```
example from Myung (2007), JoMP, tutorial on MLE
recall probabilities for different times \(t\)
exponential model
\[P(t \ ; \ a, b) = a \exp (-bt)\]
\[\text{where } a,b>0 \]
power model
\[P(t \ ; \ c, d) = ct^{-d}\]
\[\text{where } c,d>0 \]
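To get a feel for the two candidate models before fitting, we can evaluate both curves at the observed retention intervals. This sketch uses arbitrary illustrative parameter values, not estimates:

```r
# illustrative sketch: both forgetting curves at the observed time points
t = c(1, 3, 6, 9, 12, 18)
expo  = function(t, a, b) a * exp(-b * t)   # exponential model P(t; a, b)
power = function(t, c, d) c * t^(-d)        # power model P(t; c, d)
round(expo(t, a = 1, b = 0.1), 3)
round(power(t, c = 1, d = 0.5), 3)
```

Both curves start near 1 and decay toward 0, but the power model drops faster early and has a heavier tail.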
exponential
```r
nLL.exp <- function(w1, w2) {
  if (w1 < 0 | w2 < 0 | w1 > 20 | w2 > 20) {
    return(NA)
  }
  p = w1*exp(-w2*t)
  p[p <= 0.0] = 1.0e-5        # clamp predicted probabilities into (0,1)
  p[p >= 1.0] = 1 - 1.0e-5
  - sum(dbinom(x = obs, prob = p, size = 100, log = TRUE))
}
show(nLL.exp(1,1))
## [1] 1092.446
```
power
```r
nLL.pow <- function(w1, w2) {
  if (w1 < 0 | w2 < 0 | w1 > 20 | w2 > 20) {
    return(NA)
  }
  p = w1*t^(-w2)
  p[p <= 0.0] = 1.0e-5        # clamp predicted probabilities into (0,1)
  p[p >= 1.0] = 1 - 1.0e-5
  - sum(dbinom(x = obs, prob = p, size = 100, log = TRUE))
}
show(nLL.pow(1,1))
## [1] 142.1096
```
```r
require('stats4')
bestExpo = mle(nLL.exp, start = list(w1 = 1, w2 = 1))
bestPow  = mle(nLL.pow, start = list(w1 = 1, w2 = 1))
MLEstimates = data.frame(model = rep(c("expo", "power"), each = 2),
                         parameter = c("a", "b", "c", "d"),
                         value = c(coef(bestExpo), coef(bestPow)))
knitr::kable(MLEstimates)
```
| model | parameter | value |
|---|---|---|
| expo | a | 1.0701173 |
| expo | b | 0.1308300 |
| power | c | 0.9531186 |
| power | d | 0.4979311 |
which model is better?
```r
wE = coef(bestExpo)                   # MLEs of a, b (stored as w1, w2)
wP = coef(bestPow)                    # MLEs of c, d (stored as w1, w2)
predExp = wE["w1"]*exp(-wE["w2"]*t)   # model predictions at the MLE
predPow = wP["w1"]*t^(-wP["w2"])
modelStats = data.frame(model = c("expo", "power"),
                        logLike = round(c(logLik(bestExpo), logLik(bestPow)), 3),
                        pData = exp(c(logLik(bestExpo), logLik(bestPow))),
                        r = round(c(cor(predExp, y), cor(predPow, y)), 3),
                        LSE = round(c(sum((predExp-y)^2), sum((predPow-y)^2)), 3))
modelStats
##   model logLike        pData     r   LSE
## 1  expo -18.666 7.820887e-09 0.982 0.019
## 2 power -26.726 2.471738e-12 0.948 0.057
```
motivation
definition
Let \(M\) be a model with \(k\) free parameters, \(D\) some data, and \(\hat{\theta}\) the maximum-likelihood estimate of \(M\)'s parameters:
\[\text{AIC}(M, D) = 2k - 2\ln P(D \mid \hat{\theta})\]
The smaller the AIC, the better the model: the term \(2k\) penalizes complexity, while \(-2\ln P(D \mid \hat{\theta})\) rewards fit.
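The definition can be checked by hand against the fitted models above. The helper below is a hypothetical one-liner; the log-likelihoods are the values from the `modelStats` table:

```r
# sketch: AIC from a maximized log-likelihood (k = number of free parameters)
aic = function(logLike, k) 2*k - 2*logLike
aic(-18.666, k = 2)   # exponential model -> 41.332
aic(-26.726, k = 2)   # power model       -> 57.452
```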
```r
##   model logLike        pData     r   LSE      AIC
## 1  expo -18.666 7.820887e-09 0.982 0.019 41.33294
## 2 power -26.726 2.471738e-12 0.948 0.057 57.45220
```

```r
show(AIC(bestExpo, bestPow))
##          df      AIC
## bestExpo  2 41.33294
## bestPow   2 57.45220
```
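A common follow-up (not shown on the slides) is to convert the AIC differences into Akaike weights, which can be read as relative plausibilities of the candidate models:

```r
# sketch: Akaike weights from the AIC values reported above
aics  = c(expo = 41.33294, power = 57.45220)
delta = aics - min(aics)                     # AIC differences
w     = exp(-delta/2) / sum(exp(-delta/2))   # Akaike weights
round(w, 5)   # virtually all weight on the exponential model
```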
\[\underbrace{\frac{P(M_1 \mid D)}{P(M_2 \mid D)}}_{\text{posterior odds}} = \underbrace{\frac{P(D \mid M_1)}{P(D \mid M_2)}}_{\text{Bayes factor}} \ \underbrace{\frac{P(M_1)}{P(M_2)}}_{\text{prior odds}}\]
The Bayes factor is the factor by which our prior odds are changed by the data.
Bayes factor in favor of model \(M_1\)
\[\text{BF}(M_1 > M_2) = \frac{P(D \mid M_1)}{P(D \mid M_2)}\]
marginal likelihood of data under model \(M_i\)
\[P(D \mid M_i) = \int P(\theta_i \mid M_i) \ P(D \mid \theta_i, M_i) \text{ d}\theta_i\]
| BF(M1 > M2) | interpretation |
|---|---|
| 1 | irrelevant data |
| 1 - 3 | hardly worth ink or breath |
| 3 - 6 | anecdotal |
| 6 - 10 | now we're talking: substantial |
| 10 - 30 | strong |
| 30 - 100 | very strong |
| 100 + | decisive (bye, bye \(M_2\)!) |
exponential
```r
priorExp = function(a, b){
  dunif(a, 0, 1.5) * dunif(b, 0, 1.5)
}
lhExp = function(a, b){
  p = a*exp(-b*t)
  p[p <= 0.0] = 1.0e-5
  p[p >= 1.0] = 1 - 1.0e-5
  prod(dbinom(x = obs, prob = p, size = 100))
}
```
power
```r
priorPow = function(c, d){
  dunif(c, 0, 1.5) * dunif(d, 0, 1.5)
}
lhPow = function(c, d){
  p = c*t^(-d)
  p[p <= 0.0] = 1.0e-5
  p[p >= 1.0] = 1 - 1.0e-5
  prod(dbinom(x = obs, prob = p, size = 100))
}
```
```r
library(dplyr)   # for mutate() and the pipe
lhExp = Vectorize(lhExp)
lhPow = Vectorize(lhPow)
stepsize = 0.01
evidence = expand.grid(x = seq(0.005, 1.495, by = stepsize),
                       y = seq(0.005, 1.495, by = stepsize)) %>%
  mutate(lhExp = lhExp(x, y), priExp = 1 / length(x),   # uniform priors!
         lhPow = lhPow(x, y), priPow = 1 / length(x))
paste0("BF in favor of exponential model: ",
       with(evidence, sum(priExp*lhExp) / sum(priPow*lhPow)) %>% round(2))
## [1] "BF in favor of exponential model: 1221.39"
```
overwhelming evidence in favor of the exponential model
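To translate this Bayes factor back into posterior odds, multiply by the prior odds. Assuming equal prior odds (an illustrative assumption, not from the slides):

```r
# sketch: posterior probability of the exponential model,
# assuming equal prior odds P(M1) = P(M2) = 0.5
BF = 1221.39              # grid-approximation result from above
postExp = BF / (BF + 1)   # posterior probability of the exponential model
round(postExp, 5)         # -> 0.99918
```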
generally:
\[\int f(\theta) \ P(\theta) \ \text{d}\theta \approx \frac{1}{n} \sum^{n}_{i=1} f(\theta_i)\ , \quad \theta_i \sim P(\theta)\]
in particular:
\[P(D) = \int P(D \mid \theta) \ P(\theta) \ \text{d}\theta \approx \frac{1}{n} \sum^{n}_{i=1} P(D \mid \theta_i)\ , \quad \theta_i \sim P(\theta)\]
```r
nSamples = 1000000
a = runif(nSamples, 0, 1.5)
b = runif(nSamples, 0, 1.5)
lhExpVec = lhExp(a, b)
lhPowVec = lhPow(a, b)
paste0("BF in favor of exponential model: ",
       sum(lhExpVec) / sum(lhPowVec))
## [1] "BF in favor of exponential model: 1242.49990813332"
```
```r
library(ggplot2)
BFVec = sapply(seq(10000, nSamples, by = 500),
               function(i) sum(lhExpVec[1:i]) / sum(lhPowVec[1:i]))
ggplot(data.frame(i = seq(10000, nSamples, by = 500), BF = BFVec),
       aes(x = i, y = BF)) +
  geom_line() +
  geom_hline(yintercept = 1221, color = "firebrick") +   # grid-approximation result
  xlab("number of samples")
```