Instructions

You can set the code chunk option `echo = F` (so as not to have the chunk show up in your output). The goal of this exercise is to get more comfortable with probability distributions in R, using functions like `rnorm` and `dbinom`.
Random variable \(K\) follows a binomial distribution, so that \(K \sim \text{Binom}(\theta, N)\), where \(\theta\) is the bias towards heads (= success, number 1) and \(N\) is the number of coin tosses (trials). Plot the probability mass function, with \(0 \le k \le N\) on the \(x\)-axis, for each of the four values \(\theta \in \{0.2, 0.5, 0.7, 0.9\}\), keeping \(N = 24\) fixed. Use a facet plot. The outcome should look roughly like the plot below.
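A minimal sketch of one possible approach (assuming the `tidyverse` is loaded; all variable names are our own choice):

```r
library(tidyverse)

# PMF value for every combination of theta and k = 0, ..., N
N <- 24
pmf_data <- expand_grid(theta = c(0.2, 0.5, 0.7, 0.9), k = 0:N) %>%
  mutate(probability = dbinom(k, size = N, prob = theta))

# one facet per value of theta
ggplot(pmf_data, aes(x = k, y = probability)) +
  geom_col() +
  facet_wrap(~ theta, labeller = label_both)
```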
Describe in one concise sentence what one of the four facets shows. For example (but totally wrong): “The graphs in each facet show the prior probability of latent parameter \(k\) for data observations from different conditions.”
Calculate the probability of observing no more than 7 successes for \(N = 24\) and each \(\theta \in \{0.2, 0.5, 0.7, 0.9\}\).
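A sketch using the cumulative distribution function `pbinom`, which returns \(P(K \le k)\):

```r
# P(K <= 7) for N = 24 and each value of theta
tibble(theta = c(0.2, 0.5, 0.7, 0.9)) %>%
  mutate(prob = pbinom(7, size = 24, prob = theta))
```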
Suppose variable \(X\) follows a normal distribution, so that \(x \sim \text{Norm}(\mu,\sigma)\) for some mean \(\mu\) and standard deviation \(\sigma\). For each combination of parameter values \(\mu \in \{0, 10\}\) and \(\sigma \in \{1,5\}\), take 10,000 random samples from \(X\). Plot the results as a histogram, using facets. The output could look roughly like the plot below.
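One possible way to draw and plot the samples (again a sketch, with our own variable names):

```r
# 10,000 samples for each of the four (mu, sigma) combinations
samples <- expand_grid(mu = c(0, 10), sigma = c(1, 5)) %>%
  mutate(sample = map2(mu, sigma, ~ rnorm(10000, mean = .x, sd = .y))) %>%
  unnest(sample)

ggplot(samples, aes(x = sample)) +
  geom_histogram(bins = 60) +
  facet_grid(mu ~ sigma, labeller = label_both)
```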
Calculate the probability of observing an outcome of at least 0.5, \(P(X \ge 0.5)\), for each of the four parameter value combinations used above. The outcome of your calculation should be:
```
## # A tibble: 4 x 3
##      mu    sd  prob
##   <dbl> <dbl> <dbl>
## 1     0     1 0.309
## 2     0     5 0.460
## 3    10     1 1
## 4    10     5 0.971
```
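One way to reproduce this table uses `pnorm` with `lower.tail = FALSE`, since \(P(X \ge 0.5) = 1 - P(X < 0.5)\):

```r
# upper-tail probability P(X >= 0.5) for each parameter pair
expand_grid(mu = c(0, 10), sd = c(1, 5)) %>%
  mutate(prob = pnorm(0.5, mean = mu, sd = sd, lower.tail = FALSE))
```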
Explain the results from the previous task. Why do we see this ordering of the probabilities for the different parameter pairs, e.g., why does the pair \(\mu = 0\) and \(\sigma = 1\) score the lowest, followed by …, followed by … etc.?
Consider the example introduced in class, where we set out to compare the means of average prices of different types of avocados. We assume that there is an indicator variable \(g_i\) (corresponding to `type`), such that, for example, \(g_i = 0\) is for observations from organic avocados and \(g_i = 1\) is for observations from conventionally grown avocados. The simplest instance of the T-Test Model family assumes that the first group has mean \(\mu_0\) and the second has mean \(\mu_1\), while both share the same standard deviation \(\sigma\). Using the notation adopted in this course (see below), we can write the likelihood function succinctly like so:
\[ y_i \sim \text{Normal}(\mu_{g_i}, \sigma) \]
In a Bayesian approach, the priors over parameter values could use a normal distribution for \(\mu_i\) and a truncated normal distribution for \(\sigma\) (since \(\sigma\) must be positive). The plot below shows the resulting model.
A point of general importance can be made in connection with this example. It is possible to build different prior structures around the same likelihood function. Some parameterizations are more useful for a given purpose than others; some make it easier to formalize prior domain knowledge. To see this, consider the case at hand, where we might be specifically interested in the question of how big the difference between the means is. We can therefore also reformulate the model as shown below.
This variant of the T-Test Model, formulated in terms of the difference between groups (the second plot), is what is standardly used in practice. One advantage is that it makes it easy to address the question of whether the means are identical, a case which is represented by a single parameter value \(\delta = 0\) (rather than an infinite set of pairs with \(\mu_0 = \mu_1\)). It is also easier to specify beliefs about the difference between groups. The prior on \(\delta\) in the second model shown above is centered on \(\delta = 0\), reflecting a prior expectation of no difference, and specifying different standard deviations for this prior allows us to formalize different degrees of certainty. We could, for example, use a skeptical prior, i.e., an initial model configuration that is skeptical about a group difference, by setting the standard deviation of that parameter’s prior very low.
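Concretely (a sketch of the reparameterization, using the same notation as above): keep \(\mu_0\) as the mean of group 0 and express the mean of group 1 through the offset \(\delta = \mu_1 - \mu_0\), so that the likelihood function becomes

\[ y_i \sim \text{Normal}(\mu_0 + \delta \cdot g_i, \sigma) \]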
Load the avocado price data, as shown in the course script. Store it in a variable `avocado_data`.
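A sketch of the loading step (the file name is a placeholder; use whatever source the course script uses; we assume below that the result has, or is wrangled into, the columns `type` and `average_price` shown in the example output further down):

```r
# hypothetical file name; adjust to the data source from the course script
avocado_data <- read_csv("avocado.csv")
```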
Fill in this function:
```r
lh_ttest <- function(y0, y1, sigma, mu, delta) {
  # returns the likelihood of observing data y0 and y1 for the given parameter values
}
```
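One possible completion (a sketch; it reads `mu` as the group-0 mean \(\mu_0\), so that group 1 has mean `mu + delta`):

```r
lh_ttest <- function(y0, y1, sigma, mu, delta) {
  # product of normal densities for group 0 (mean mu) and group 1 (mean mu + delta)
  prod(dnorm(y0, mean = mu, sd = sigma)) *
    prod(dnorm(y1, mean = mu + delta, sd = sigma))
}
```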
Apply this function to the avocado data, where group 0 is organic, fixing \(\delta = -0.5\), \(\sigma = 0.3\) and \(\mu_0 = 1.65\). The output should be zero! This is due to numerical imprecision: there are so many data points that the product of their densities underflows to zero. This is why we want to calculate with log-probabilities. Here is a function that returns the log-likelihood instead.
```r
# this function returns the log-likelihood and also allows delta to be a vector
llh_ttest <- function(y0, y1, mu, sigma, delta) {
  map_dbl(delta, function(d) {
    sum(dnorm(y0, mu, sigma, log = TRUE)) +
      sum(dnorm(y1, mu + d, sigma, log = TRUE))
  })
}
```
Plot the log-likelihood of the avocado data as a function of \(\delta\), fixing \(\sigma = 0.4\) and \(\mu_0 = 1.65\). Let \(\delta\) range from \(-5\) to \(5\). The result could look like the plot below.
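A sketch of this step (assuming `avocado_data` has the columns `type` and `average_price`):

```r
# split the observed prices by group
y0 <- avocado_data %>% filter(type == "organic") %>% pull(average_price)
y1 <- avocado_data %>% filter(type == "conventional") %>% pull(average_price)

# log-likelihood as a function of delta, with the other parameters fixed
tibble(delta = seq(-5, 5, length.out = 401)) %>%
  mutate(llh = llh_ttest(y0, y1, mu = 1.65, sigma = 0.4, delta = delta)) %>%
  ggplot(aes(x = delta, y = llh)) +
  geom_line()
```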
Write an R function that generates one randomly sampled data set, consisting of a pair \(y_0\) and \(y_1\) of the same length as the original data. The function takes as input a specification of all parameters of the likelihood function. Return the data as a tibble. Example output is shown below.
```
## # A tibble: 18,249 x 2
##    type    average_price
##    <chr>           <dbl>
##  1 organic          1.53
##  2 organic          1.25
##  3 organic          1.38
##  4 organic          1.00
##  5 organic          1.35
##  6 organic          1.67
##  7 organic          1.71
##  8 organic          1.67
##  9 organic          1.29
## 10 organic          2.43
## # ... with 18,239 more rows
```
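One possible sampling function (a sketch; the name `sample_prediction` is our own, and the group sizes are read off the observed data so that the sample matches it in length):

```r
sample_prediction <- function(mu, sigma, delta) {
  n0 <- sum(avocado_data$type == "organic")
  n1 <- sum(avocado_data$type == "conventional")
  tibble(
    type          = c(rep("organic", n0), rep("conventional", n1)),
    average_price = c(rnorm(n0, mean = mu, sd = sigma),
                      rnorm(n1, mean = mu + delta, sd = sigma))
  )
}
```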
Write a function that takes as input the parameters of the model, then calls the function you defined in the previous exercise and then plots the predictive sample in the same way as we plotted the avocado data in class. The output could look like the plot below (for some crooked parameter values).
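A sketch of such a plotting function, building on the (hypothetical) `sample_prediction` from above and assuming density plots were used in class (swap in whatever geom the course script used):

```r
plot_prediction <- function(mu, sigma, delta) {
  sample_prediction(mu, sigma, delta) %>%
    ggplot(aes(x = average_price, fill = type)) +
    geom_density(alpha = 0.5)
}

# example call with some (deliberately crooked) parameter values
plot_prediction(mu = 2, sigma = 0.1, delta = 1)
```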
Play around by passing different parameter triples to the plotting function you just created. Compare the plots of the samples to the original data in order to form (subjective!) beliefs, informed by data, about the likely value of \(\delta\) (ignoring (= marginalizing over) the other parameters). Express your informed beliefs about \(\delta\) in language, ideally supported by probability notation like \(\delta \sim \text{Normal}(1000, 0.1)\).
Unicorns exist. They always have. They always will. The only remaining question is: how many are there really? A deep mystery, until one day a Bright One had an idea. The Bright One strolled for days through the vast forest where the unicorns dwell, and it managed to convince all of the \(K = 20\) unicorns that it met to wear a badge-of-recognition, which they all happily did. A month later, the Bright One walked the forest again and met another \(n = 24\) unicorns. Of these, \(k = 7\) carried the badge-of-recognition; the other \(n - k = 17\) did not and swore they had never met the Bright One before. (Unicorns never lie.)
Map this story onto an urn scenario. E.g., maybe a coin has to be flipped to select an urn, maybe not; if so, what is its bias? What do we know about the contents of any urns? In short: what are the known variables and what are the unknown variables? (Hint: there are four variables; one of them is latent, the others are observed.)
Using the conventions introduced in class, draw a picture of the probabilistic dependencies between the variables. (It’s fine to draw by hand and include a (low-resolution) scan or picture. You do not need to give formulas alongside the graph.)
Plot the likelihood function in R. (Hint: you want to use the function `dhyper`. Look up what it is and does! Read up on the hypergeometric distribution in a source you trust and like; this will help with the whole exercise. Plot the latent variable on the \(x\)-axis, keeping all the other variables fixed at their known values. The \(x\)-axis should range from 24 to a reasonable upper bound, which we suggest setting at 300.)
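A sketch of the plot (in `dhyper(x, m, n, k)`, `x` is the number of badge-wearers in the sample, `m` the number of badged unicorns in the population, `n` the number of unbadged ones, and `k` the sample size; the latent variable is the total population size, called `M` below):

```r
# likelihood of observing k = 7 badged unicorns among n = 24 sampled,
# for total population sizes M = 24, ..., 300 with K = 20 badged overall
tibble(M = 24:300) %>%
  mutate(likelihood = dhyper(7, m = 20, n = M - 20, k = 24)) %>%
  ggplot(aes(x = M, y = likelihood)) +
  geom_line()
```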
Suppose you had no prior beliefs about the number of unicorns out there, or you thought that any number (say up to and including 300) was equally likely. Then, based on the plot of the likelihood function, what would be your best guess for how many unicorns there are?