which of two models is more likely, given the data?
\[\underbrace{\frac{P(M_1 \mid D)}{P(M_2 \mid D)}}_{\text{posterior odds}} = \underbrace{\frac{P(D \mid M_1)}{P(D \mid M_2)}}_{\text{Bayes factor}} \ \underbrace{\frac{P(M_1)}{P(M_2)}}_{\text{prior odds}}\]
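to get a feel for the arithmetic (numbers invented for illustration): a Bayes factor of 6 in favor of \(M_1\), combined with prior odds of \(\frac{1}{3}\), gives posterior odds of \(6 \cdot \frac{1}{3} = 2\); with equal prior odds, the posterior odds just are the Bayes factor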
| | standard | Bayes |
|---|---|---|
| today's focus | Akaike's information criterion | Bayes factor |
| what counts as model to be compared? | likelihood function \(P(D \mid \theta)\) | likelihood \(P(D \mid \theta)\) + prior \(P(\theta)\) |
| from which point of view do we compare models? | ex post: after seeing the data | ex ante: before seeing the data |
| how to penalize model complexity? | count number of free parameters | implicitly weigh how effective a parameter is |
| are we guaranteed to select the true model in the end? | no | yes |
| hard to compute? | relatively easy | relatively hard |
\[ \begin{align*} \text{BF}(M_0 > M_1) & = \frac{P(D \mid M_0)}{P(D \mid M_1)} \\ & = \frac{\text{Binomial}(k,N,0.5)}{\int_0^1 \text{Beta}(\theta, 1, 1) \ \text{Binomial}(k,N, \theta) \text{ d}\theta} \\ & = \frac{\binom{N}{k} 0.5^{k} \, (1-0.5)^{N - k}}{\int_0^1 \binom{N}{k} \theta^{k} \, (1-\theta)^{N - k} \text{ d}\theta} \\ & = \frac{\binom{N}{k} 0.5^{N}}{\binom{N}{k} \int_0^1 \theta^{k} \, (1-\theta)^{N - k} \text{ d}\theta} \\ & = \frac{0.5^{N}}{\text{B}(k+1, N-k+1)} \approx 0.516 \end{align*} \]
unraveling the Beta function \(\text{B}\):
\[ \begin{align*} \text{B}(k+1, N-k+1) & = \frac{\Gamma(k+1) \Gamma(N-k+1)}{\Gamma(N+2)} \\ & = \frac{k! \, (N-k)!}{(N+1)!} \\ & = \frac{k! \, (N-k)!}{N!} \frac{1}{N+1} \\ & = \frac{1}{\binom{N}{k}} \frac{1}{N+1} \end{align*} \]
so:
\[ \begin{align*} \text{BF}(M_0 > M_1) & = \frac{P(D \mid M_0)}{P(D \mid M_1)} \\ & = \frac{0.5^{N}}{\text{B}(k+1, N-k+1)} \\ & = \binom{N}{k} 0.5^{N} (N + 1) \approx 0.516 \end{align*} \]
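a quick sanity check of the closed-form result in R; \(k\) and \(N\) are not fixed in the text, so the values below are assumptions chosen to reproduce the quoted \(\approx 0.516\):

# sanity check of the closed-form BF; k = 7 and N = 24 are assumed example values
k = 7
N = 24
choose(N, k) * 0.5^N * (N + 1)   # ~0.516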
Savage-Dickey method: let \(M_0\) be properly nested under \(M_1\) s.t. \(M_0\) fixes \(\theta_i = x_i, \dots, \theta_n = x_n\)
\[ \begin{align*} \text{BF}(M_0 > M_1) & = \frac{P(D \mid M_0)}{P(D \mid M_1)} \\ & = \frac{P(\theta_i = x_i, \dots, \theta_n = x_n \mid D, M_1)}{P(\theta_i = x_i, \dots, \theta_n = x_n \mid M_1)} \end{align*} \]
write \(\phi\) for the parameters that \(M_0\) fixes (at value \(\phi_0\)) and \(\psi\) for the remaining free parameters; then

\[ \begin{align*} P(D \mid M_0) & = \int P(D \mid \psi, M_0) P(\psi \mid M_0) \ \text{d}\psi \\ & = \int P(D \mid \psi, \phi = \phi_0, M_1) P(\psi \mid \phi = \phi_0, M_1) \ \text{d}\psi\\ & = P(D \mid \phi = \phi_0, M_1) \\ & = \frac{P(\phi = \phi_0 \mid D, M_1) P(D \mid M_1)}{P(\phi = \phi_0 \mid M_1)} \ \ \ \ \ \ \text{(by Bayes rule)} \end{align*} \]
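the same (assumed) numbers via Savage-Dickey: under \(M_1\) with a Beta(1,1) prior, the posterior of \(\theta\) is Beta(\(k+1, N-k+1\)), and the Bayes factor is the posterior density at \(\theta = 0.5\) over the prior density there; a minimal sketch:

# Savage-Dickey check, again assuming k = 7 and N = 24
k = 7
N = 24
dbeta(0.5, k + 1, N - k + 1) / dbeta(0.5, 1, 1)   # ~0.516, matches the direct calculation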
example from Myung (2003), JoMP, tutorial on MLE

recall probabilities for different times \(t\)

# observed recall rates y (out of 100 trials each) after time t in seconds
y = c(.94, .77, .40, .26, .24, .16)
t = c( 1, 3, 6, 9, 12, 18)
obs = y*100
exponential model
\[P(t \ ; \ a, b) = a \exp (-bt)\]
\[\text{where } a,b>0 \]

power model

\[P(t \ ; \ c, d) = c \, t^{-d}\]

\[\text{where } c,d>0 \]
exponential

# prior and likelihood for the exponential model
priorExp = function(a, b){
  dunif(a, 0, 1.5) * dunif(b, 0, 1.5)
}
lhExp = function(a, b){
  p = a*exp(-b*t)
  p[p <= 0.0] = 1.0e-5       # clip predicted probabilities into (0,1)
  p[p >= 1.0] = 1-1.0e-5
  prod(dbinom(x = obs, prob = p, size = 100))
}

power

# prior and likelihood for the power model
priorPow = function(c, d){
  dunif(c, 0, 1.5) * dunif(d, 0, 1.5)
}
lhPow = function(c, d){
  p = c*t^(-d)
  p[p <= 0.0] = 1.0e-5
  p[p >= 1.0] = 1-1.0e-5
  prod(dbinom(x = obs, prob = p, size = 100))
}

lhExp = Vectorize(lhExp)
lhPow = Vectorize(lhPow)
stepsize = 0.01
evidence = expand.grid(x = seq(0.005, 1.495, by = stepsize),
                       y = seq(0.005, 1.495, by = stepsize)) %>%
  mutate(lhExp = lhExp(x,y), priExp = 1 / length(x),  # uniform priors!
         lhPow = lhPow(x,y), priPow = 1 / length(x))
paste0("BF in favor of exponential model: ",
       with(evidence, sum(priExp*lhExp) / sum(priPow*lhPow)) %>% round(2))
## [1] "BF in favor of exponential model: 1221.39"
overwhelming evidence in favor of the exponential model
generally:
\[\int f(\theta) \ P(\theta) \ \text{d}\theta \approx \frac{1}{n} \sum^{n}_{\theta_i \sim P(\theta)} f(\theta_i)\]
in particular:
\[P(D) = \int P(D \mid \theta) \ P(\theta) \ \text{d}\theta \approx \frac{1}{n} \sum^{n}_{\theta_i \sim P(\theta)} P(D \mid \theta_i)\]
nSamples = 200000
# sample parameter values directly from the (uniform) priors
a = runif(nSamples, 0, 1.5)
b = runif(nSamples, 0, 1.5)
lhExpVec = lhExp(a,b)
lhPowVec = lhPow(a,b)
paste0("BF in favor of exponential model: ",
       (sum(lhExpVec) / sum(lhPowVec)) %>% round(2))
## [1] "BF in favor of exponential model: 1182.32"
BFVec = sapply(seq(10000, nSamples, by = 200),
               function(i) sum(lhExpVec[1:i]) / sum(lhPowVec[1:i]))
ggplot(data.frame(i = seq(10000, nSamples, by = 200), BF = BFVec), aes(x = i, y = BF)) +
  geom_line() + geom_hline(yintercept = 1221, color = "firebrick") + xlab("number of samples")
take an arbitrary function \(h(\theta)\) such that \(\int h(\theta) \text{d}\theta = 1\)
\[ \begin{align*} \frac{1}{P(D)} & = \frac{P(\theta \mid D)}{P(D \mid \theta) P(\theta)}\\ & = \frac{P(\theta \mid D)}{P(D \mid \theta) P(\theta)} \int h(\theta) \text{d}\theta \\ & = \int \frac{h(\theta) P(\theta \mid D)}{P(D \mid \theta) P(\theta)} \text{d}\theta \\ & \approx \frac{1}{n} \sum^{n}_{\theta_i \sim P(\theta \mid D)} \frac{h(\theta_i)}{P(D \mid \theta_i) P(\theta_i)} \end{align*} \]
choose an \(h(\theta)\) that resembles the posterior, e.g. a parametric approximation fitted to posterior samples (as with the Gamma approximations below)
modelString = "
model{
  a ~ dunif(0,1.5)
  b ~ dunif(0,1.5)
  for (i in 1:6){
    pT[i] = a*exp(-t[i]*b)
    p[i] = min(max( pT[i], 0.00001), 0.99999)
    obs[i] ~ dbinom(p[i], 100)
  }
}"
# prepare for JAGS
dataList = list(obs = obs, t = t)
# set up and run model
jagsModel = jags.model(file = textConnection(modelString),
                       data = dataList,
                       n.chains = 2)
update(jagsModel, n.iter = 25000)
codaSamples = coda.samples(jagsModel,
                           variable.names = c("a", "b"),
                           n.iter = 5000)
ms = ggs(codaSamples)
# approximate a sample by a Gamma distribution, deriving shape and rate
# from the sample's mean and standard deviation
getGammaApprox = function(samples){
  s = sd(samples)
  m = mean(samples)
  ra = ( m + sqrt( m^2 + 4*s^2 ) ) / ( 2 * s^2 )
  sh = 1 + m * ra
  return(c(shape = sh, rate = ra))
}
wideSamples = dcast(ms, Iteration + Chain ~ Parameter)
paramA = getGammaApprox(filter(ms, Parameter == "a")$value)
paramB = getGammaApprox(filter(ms, Parameter == "b")$value)
##   parameter     shape      rate
## 1         a 1121.0532 1056.1074
## 2         b  202.4942 1549.5401
## 3         c 2331.6908 2462.5925
## 4         d  232.5322  465.8463
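one way to put such fitted Gammas to work as \(h(\theta)\) in the estimator of \(\frac{1}{P(D)}\) above; this is only a sketch that reuses objects defined earlier, not part of the original code:

postA = filter(ms, Parameter == "a")$value
postB = filter(ms, Parameter == "b")$value
# h(theta): product of the two fitted Gamma densities (integrates to 1)
hTheta = dgamma(postA, paramA["shape"], paramA["rate"]) *
  dgamma(postB, paramB["shape"], paramB["rate"])
# 1/P(D) is approximated by the mean of h(theta) / (likelihood * prior) over posterior samples
margLikExp = 1 / mean(hTheta / (lhExp(postA, postB) * priorExp(postA, postB)))

an analogous computation for the power model would give its marginal likelihood, and the ratio of the two again yields a Bayes factor estimate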
homework!
make model index a parameter
model{
  m ~ dbern(0.5)      # model index: m = 0 -> exponential, m = 1 -> power
  a ~ dunif(0,1.5)
  b ~ dunif(0,1.5)
  c ~ dunif(0,1.5)
  d ~ dunif(0,1.5)
  for (i in 1:6){
    pT[i] = ifelse(m == 0, a*exp(-t[i]*b), c*t[i]^(-d))
    p[i] = min(max( pT[i], 0.0001), 0.9999)
    obs[i] ~ dbinom(p[i], 100)
  }
}
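a sketch of how this model could be run and evaluated; modelStringTrans and the other object names below are invented for illustration, and since the dbern(0.5) prior puts equal prior odds on the two models, the posterior odds of \(m = 0\) vs \(m = 1\) directly estimate the Bayes factor:

# run the model-index model (object names are hypothetical)
transModel = jags.model(file = textConnection(modelStringTrans),
                        data = list(obs = obs, t = t), n.chains = 2)
update(transModel, n.iter = 25000)
codaTrans = coda.samples(transModel, variable.names = c("m"), n.iter = 5000)
mSamples = unlist(lapply(codaTrans, function(chain) chain[ , "m"]))
sum(mSamples == 0) / sum(mSamples == 1)   # estimated BF: exponential over power

with a Bayes factor as lopsided as here, the disfavored model is visited only very rarely, which makes this estimate noisy; that is what the prior tweak and pseudo-priors below address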
tweaking prior model odds & using pseudo-priors
model{
  mT ~ dbern(0.9995)   # prior model odds tweaked so that both models get visited
  m = mT + 1           # model index: m = 1 -> exponential, m = 2 -> power
  a[1] ~ dunif(0,1.5)  # proper priors, used when m = 1 (exponential active)
  b[1] ~ dunif(0,1.5)
  c[2] ~ dunif(0,1.5)  # proper priors, used when m = 2 (power active)
  d[2] ~ dunif(0,1.5)
  # pseudo-priors for the currently idle model, approximating its posterior
  a[2] ~ dgamma(1124.84336, 1058.83299)
  b[2] ~ dgamma(199.82804, 1527.78959)
  c[1] ~ dgamma(61.88311, 29.28403)
  d[1] ~ dgamma(115.44879, 126.33878)
  for (i in 1:6){
    pExp[i] = a[m]*exp(-t[i]*b[m])
    pPow[i] = c[m]*t[i]^(-d[m])
    pT[i] = ifelse(m == 1, pExp[i], pPow[i])
    p[i] = min(max( pT[i], 0.00001), 0.99999)
    obs[i] ~ dbinom(p[i], 100)
  }
}
## [1] "BF in favor of Exponential Model (using transdimensional MCMC): 948.77"
| | standard | Bayes |
|---|---|---|
| today's focus | Akaike's information criterion | Bayes factor |
| what counts as model to be compared? | likelihood function \(P(D \mid \theta)\) | likelihood \(P(D \mid \theta)\) + prior \(P(\theta)\) |
| from which point of view do we compare models? | ex post: after seeing the data | ex ante: before seeing the data |
| how to penalize model complexity? | count number of free parameters | implicitly weigh how effective a parameter is |
| are we guaranteed to select the true model in the end? | no | yes |
| hard to compute? | relatively easy | relatively hard |