Instructions
Use a setup chunk with echo = F (so as not to have it show up in your output):
knitr::opts_chunk$set(
  warning = FALSE, # suppress warnings by default
  message = FALSE  # suppress messages by default
)
library(tidyverse)
Consider a flip-and-draw scenario similar to one we discussed in class. First, we flip a coin with a bias of .75 of landing heads. If we observe heads, we draw from urn 1, otherwise from urn 2. The urns contain white (W), black (B), and red (R) balls: urn 1 in proportions 0.6, 0.3, and 0.1, and urn 2 in proportions 0.2, 0.5, and 0.3.
Calculate and write down (maybe using R Markdown) the joint probability table for this scenario.
A | B | P(A,B) |
---|---|---|
W | H | \(0.75 \cdot 0.6 = 0.45\) |
W | T | \(0.25 \cdot 0.2 = 0.05\) |
B | H | \(0.75 \cdot 0.3 = 0.225\) |
B | T | \(0.25 \cdot 0.5 = 0.125\) |
R | H | \(0.75 \cdot 0.1 = 0.075\) |
R | T | \(0.25 \cdot 0.3 = 0.075\) |
Additionally (not part of the task):
\(\textrm{P(A|B)}\) | A = W | A = B | A = R |
---|---|---|---|
B = H | 0.6 | 0.3 | 0.1 |
B = T | 0.2 | 0.5 | 0.3 |
 | B = H | B = T |
---|---|---|
P(B) | 0.75 | 0.25 |
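These tables can also be reproduced in R as a small sanity check (the variable names here are my own, not part of the task); the six joint probabilities must sum to 1:

```r
# Conditional probabilities P(A | B): rows = coin outcome (H, T), columns = ball color
p_A_given_B <- rbind(H = c(W = 0.6, B = 0.3, R = 0.1),
                     T = c(W = 0.2, B = 0.5, R = 0.3))
p_B <- c(H = 0.75, T = 0.25)  # marginal probability of the coin flip

# Joint probabilities P(A, B) = P(A | B) * P(B); recycling multiplies each row by its P(B)
p_joint <- p_A_given_B * p_B
p_joint
sum(p_joint)  # should be exactly 1
```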
Calculate the marginal probability of drawing a red ball.
\[\begin{align} P(A=R)&=\sum_{i=1}^kP(A=R|B_i) \cdot P(B_i),\\ &=P(A=R|B=H) \cdot P(B=H) + P(A=R|B=T) \cdot P(B=T),\\ &=0.1 \cdot 0.75 + 0.3 \cdot 0.25,\\ &=0.075 + 0.075=0.15. \end{align}\]
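The same law-of-total-probability sum can be computed directly in R (variable names are illustrative):

```r
p_B <- c(H = 0.75, T = 0.25)        # P(B): coin flip
p_R_given_B <- c(H = 0.1, T = 0.3)  # P(A = R | B): red ball given the urn
p_R <- sum(p_R_given_B * p_B)       # law of total probability
p_R
## [1] 0.15
```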
Calculate the conditional probability of observing a red ball given that the ball we observed was not black. In symbols, calculate: \(P(\text{red} \mid \neg \text{black}) = P(\text{red} \mid \{ \text{red}, \text{white} \} )\).
General rule: \[P(A|B)=\frac{P(A\cap B)}{P(B)}.\]
\[\begin{align} P(A=R|B=\{R, W\})&=\frac{P(R \cap \{R, W\})}{P(\{R, W\})}=\frac{P(R)}{P(\{R, W\})}.\\ \textrm{Marginal Probability } P(R) &=0.15.\\ \textrm{Marginal Probability } P(\{R,W\}) &=0.45+0.05+0.075+0.075=0.65.\\ \frac{P(R)}{P(\{R, W\})}&=\frac{0.15}{0.65} \approx 0.23077. \end{align}\]
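This can be checked in R from the joint probability table (the named vector below just flattens that table):

```r
# joint probabilities P(A, B) from the table, named by color and coin outcome
p_joint <- c(WH = 0.45, WT = 0.05, BH = 0.225, BT = 0.125, RH = 0.075, RT = 0.075)
p_R         <- sum(p_joint[c("RH", "RT")])                # P(red)
p_not_black <- sum(p_joint[c("WH", "WT", "RH", "RT")])    # P({red, white})
p_R / p_not_black
## [1] 0.2307692
```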
Using Bayes rule, calculate the probability of a heads outcome given that we observed a draw of a red ball.
General rule: \[P(B_i|A)=\frac{P(B_i\cap A)}{P(A)}=\frac{P(A|B_i) \cdot P(B_i)}{P(A)}.\]
\[\begin{align} P(B=H|A=R)&=\frac{P(A=R|B=H)\cdot P(H)}{P(A=R)}\\ &=\frac{0.1 \cdot 0.75}{0.15}\\ &=0.5 \end{align}\]
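The same Bayes computation as an R one-liner:

```r
# P(H | R) = P(R | H) * P(H) / P(R)
p_H_given_R <- (0.1 * 0.75) / 0.15
p_H_given_R
## [1] 0.5
```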
Here is a common mistake in probabilistic reasoning. Jones knows that a medical test has only a 0.5% chance of yielding a false alarm, i.e., of diagnosing a disease when in fact there is none. The test indicates that Jones has the disease. So, Jones thinks that the chance of being affected is 99.5%. That’s not true. Let’s find out why.
Imagine the following for concreteness. A new blood glucose meter should enable patients to recognize elevated blood glucose. If a certain threshold is exceeded, the device gives a warning signal. It is known that 50 out of 1000 people have elevated blood sugar. If the threshold is exceeded, the device gives a warning signal with 99.5% probability. With a probability of 2%, the device also gives a warning signal although the threshold was not exceeded.
Consider a person X whose blood sugar has not yet been found to be elevated. How certain can person X be that he or she has an elevated blood sugar value, if the device gives a warning signal?
Solution:
Given: \(P(D) = 50/1000 = 0.05\), \(P(W \mid D) = 0.995\), \(P(W \mid \bar D) = 0.02\), where \(D\) is "elevated blood sugar" and \(W\) is "warning signal".
Searched: \[P(D|W)=?\]
Solution: \[P(D|W)=\frac{P(W|D) \cdot P(D)}{P(W|D) \cdot P(D)+P(W|\bar D) \cdot P(\bar D)}\]
(0.995*0.05)/(0.995*0.05+0.02*0.95)
## [1] 0.7236364
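The same computation can be wrapped in a small helper function (the function name and argument names are illustrative, not part of the task):

```r
# Posterior probability of the condition given a positive test, via Bayes' rule.
# prior = P(D), sens = P(W | D), fpr = P(W | not D)
posterior_positive <- function(prior, sens, fpr) {
  (sens * prior) / (sens * prior + fpr * (1 - prior))
}
posterior_positive(prior = 0.05, sens = 0.995, fpr = 0.02)
## [1] 0.7236364
```

Despite the test's impressive 99.5% sensitivity, the posterior is only about 72% because the condition is rare: most warning signals come from the large healthy group.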
Suppose you are presented with three desks, each with two drawers containing one coin each. There are two kinds of coins: silver (S) and gold (G). One desk has gold coins in both drawers (GG), one has silver coins in both (SS), and the third has a gold coin in one drawer and a silver coin in the other (GS).
Now, suppose you are free to choose ONE of the three desks at will. After you choose a desk, one of the two drawers is opened at random, and you find a gold coin inside it. What is the chance that the other drawer also contains a gold coin?
Note: You may be tempted initially to conclude that the said probability of the second drawer also containing a gold coin is 1/2 since the choice is equally likely between the GS desk and the GG desk (SS desk being eliminated now from the scenario). But there is a flaw in this reasoning!
Use Bayes’ rule to find out the answer. The observation here is ‘sighting of a gold coin’ and you are looking for the conditional probability \(P(\text{GG} \mid \textit{sight gold})\).
Solution:
Conditional probability \(P(A \mid B)\) of sighting coin \(A\) given desk \(B\):
A | B | \(P(A \mid B)\) |
---|---|---|
S | \(D_{GG}\) | 0 |
S | \(D_{GS}\) | 0.5 |
S | \(D_{SS}\) | 1 |
G | \(D_{GG}\) | 1 |
G | \(D_{GS}\) | 0.5 |
G | \(D_{SS}\) | 0 |
Marginal probability: P(B)
 | B = \(D_{SS}\) | B = \(D_{GS}\) | B = \(D_{GG}\) |
---|---|---|---|
P(B) | \(\frac{1}{3}\) | \(\frac{1}{3}\) | \(\frac{1}{3}\) |
Joint probability \(P(A, B) = P(A \mid B) \cdot P(B)\):
A | B | P(A,B) |
---|---|---|
S | \(D_{GG}\) | 0 |
S | \(D_{GS}\) | \(\frac{1}{6}\) |
S | \(D_{SS}\) | \(\frac{1}{3}\) |
G | \(D_{GG}\) | \(\frac{1}{3}\) |
G | \(D_{GS}\) | \(\frac{1}{6}\) |
G | \(D_{SS}\) | 0 |
Marginal probability: P(A)
 | A = S | A = G |
---|---|---|
P(A) | \(\frac{1}{2}\) | \(\frac{1}{2}\) |
General rule: \[P(B_i|A)=\frac{P(B_i\cap A)}{P(A)}=\frac{P(A|B_i) \cdot P(B_i)}{P(A)}.\]
\[\begin{align} P(B=D_{GG}|A=G)&=\frac{P(B=D_{GG} \cap A=G)}{P(A=G)} \\ &=\frac{1/3}{1/2} \\ &=\frac{2}{3} \approx 0.66667. \end{align}\]
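As a quick Monte Carlo check of this result (a simulation sketch; the seed and sample size are arbitrary choices, not part of the task):

```r
set.seed(123)                       # arbitrary seed for reproducibility
n_sim  <- 100000
desk   <- sample(c("GG", "GS", "SS"), n_sim, replace = TRUE)  # choose a desk
drawer <- sample(1:2, n_sim, replace = TRUE)                  # open a random drawer
# The coin we see: GG always shows gold, SS always silver,
# GS shows gold only if its gold drawer (say, drawer 1) happens to be opened.
coin <- ifelse(desk == "GG", "G",
        ifelse(desk == "SS", "S",
        ifelse(drawer == 1, "G", "S")))
# Among cases where we saw gold: how often was the desk GG?
est <- mean(desk[coin == "G"] == "GG")
est  # should be close to 2/3, not 1/2
```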
Let’s exercise with creating and plotting samples. Assume there are `n_critters <- 10000` critters, initially aligned vertically at the same horizontal zero position: `critter_positions <- rep(0, n_critters)`. Now each critter performs `n_steps <- 10000` random steps. Each step moves the critter left or right along the \(x\) axis by a random amount between -1 and 1. We can draw such numbers using `runif(n = XXX, min = -1, max = 1)`, which returns a vector of `XXX` samples between -1 and 1, each sampled uniformly at random.
Update the vector `critter_positions` a total of `n_steps` times, using the random procedure described above. The result will be a vector of where each of the critters is located along the \(x\) axis after these steps (having wiggled around with great enthusiasm).
n_steps <- 10000
n_critters <- 10000
critter_positions <- rep(0,n_critters)
for (i in 1:n_steps) {
critter_positions <- critter_positions + runif(n = n_critters, min = -1, max = 1)
}
Calculate the mean and the standard deviation of the critter positions after the wiggling.
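For example (the simulation is re-run here so the chunk is self-contained; the seed is an arbitrary choice):

```r
set.seed(42)  # arbitrary seed, not part of the task
n_steps <- 10000
n_critters <- 10000
critter_positions <- rep(0, n_critters)
for (i in 1:n_steps) {
  critter_positions <- critter_positions + runif(n = n_critters, min = -1, max = 1)
}
mean(critter_positions)  # close to 0: each step is symmetric around 0
sd(critter_positions)    # close to sqrt(n_steps / 3), since Var(Unif(-1, 1)) = 1/3
```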
Draw a density plot of two vectors (overlaid, with alpha = 0.5). The first vector is the critter positions you calculated. The second vector is a vector of the same length with samples from a normal distribution, whose mean is the mean of the critter positions, and whose standard deviation is the standard deviation of the critter positions.
The resulting plot should look (roughly) like this:
tibble(
critter_positions = critter_positions,
normal_samples = rnorm(n = n_critters, mean = mean(critter_positions), sd = sd(critter_positions))
) %>%
pivot_longer(cols = everything(),
names_to = "source",
values_to = "value"
) %>%
ggplot(aes(x = value, color = source, fill = source)) +
geom_density(alpha = 0.5)
Now do the same thing (initializing, wiggling, and plotting), but for only `n_critters <- 50` critters, while still using the full 10000 samples for the normal distribution. Your result might look like the plot below.
n_steps <- 10000
n_critters <- 50
critter_positions <- rep(0,n_critters)
for (i in 1:n_steps) {
critter_positions <- critter_positions + runif(n = n_critters, min = -1, max = 1)
}
tibble(
source = c(rep("critter_positions", n_critters), rep("normal_samples", 10000)),
value = c(critter_positions,rnorm(n = 10000, mean = mean(critter_positions), sd = sd(critter_positions)))
) %>%
ggplot(aes(x = value, color = source, fill = source)) +
geom_density(alpha = 0.5)
Name two or three points that are noteworthy about these two simulations with respect to sampling, probability and/or the normal distribution. Be very brief and to-the-point in your answer.
Please submit exercise 1 of the last homework with this homework set as well, even if you have already done so.