Homework 8: Hypothesis Testing

Instructions

Exercise 1: Addressing hypotheses about coin flips with hypothesis testing (45 points)

The goal of this exercise is to gain experience in calculating \(p\)-values for different kinds of (directed and undirected) hypotheses, using the (arguably simplest) case of coin flips. We would also like to become more confident in how to report the results we obtain from our code.

To obtain the right intuitions for \(p\)-values for directed and undirected hypotheses, you can use the function plot_binomial provided below. It allows you to plot the (binomial) sampling distribution and to specify which (if any) parts of the plot you want to highlight. It also calculates the total probability for all the values (of \(k\)) in the vector supplied to its argument highlight. The result of this calculation is shown in the plot’s title. See the examples below to understand what the function does and how you can use it to develop better intuitions about \(p\)-values.

library(tidyverse)  # provides tibble, ggplot2, and stringr (str_c), which the function below uses

plot_binomial <- function(theta, N, highlight = NULL) {
  # put the data together
  plotData <- tibble(x = 0:N, y = dbinom(0:N, N, theta))
  # make a simple bar plot
  out_plot <- ggplot(plotData, aes(x = x , y = y )) + 
    geom_col(fill = "gray", width = 0.35) +
    labs(
      x = "test statistic k",
      y =  str_c("Binomial(k, ", N, ", ", theta, ")")
    )
  # if given, highlight some bars in red
  if (!is.null(highlight)) {
  plotData2 <- tibble(x = highlight, y = dbinom(highlight, N, theta))
    out_plot <- out_plot + 
      geom_col(
        data = plotData2, 
        aes(x = x, y = y), 
        fill = "firebrick", 
        width = 0.35
      )  +
      ggtitle(
        str_c(
          "Prob. selected values: ", 
          sum(dbinom(highlight, N, theta)) %>% signif(5)
          )
        )
  }
  out_plot
}
plot_binomial(theta = 0.5, N = 24, highlight = 7:16)

plot_binomial(
  theta = 0.5, 
  N = 24, 
  highlight = which(dbinom(0:24, 24, prob = 0.5) <= dbinom(7, 24, prob = 0.5)) - 1
)
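The probability shown in the plot title is just the sum of the binomial probabilities of the highlighted values. As a quick sanity check, you can reproduce the number from the first example above (which highlights \(k = 7, \dots, 16\)) directly:

# total probability of the highlighted values k = 7, ..., 16 under Binomial(24, 0.5)
sum(dbinom(7:16, size = 24, prob = 0.5))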

In the following, you will be confronted with different scenarios, each of which comes with a different research hypothesis. For each of these, think about what it is that you want to learn from a test based on a \(p\)-value, and fix a suitable null hypothesis about the coin’s bias. Remember that a \(p\)-value quantifies evidence against the specified null-hypothesis (keeping an alternative hypothesis in the back of our minds to distinguish the case of genuinely testing a point-valued null hypothesis from the case of testing an interval-based null hypothesis via a single value used to generate the sampling distribution). Specify a null-hypothesis (and an alternative hypothesis) that is most conducive to shedding light on your research question. (Once more, the goal of this exercise is for you to become more comfortable with the whole logic of using \(p\)-values to draw conclusions of interest for a research goal.) When asked to judge significance, please use a significance level of \(\alpha = 0.05\).
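To see the general recipe in action before tackling the cases below, here is a minimal sketch with purely hypothetical numbers (a point-valued null hypothesis of \(\theta = 0.5\), \(N = 10\) flips, \(k = 8\) observed heads): a \(p\)-value is the summed probability, under the null hypothesis, of all values of the test statistic that are at least as extreme as the observation.

# hypothetical numbers, not any of the cases below
N <- 10
k <- 8
theta_0 <- 0.5
# one-sided p-value: only larger values of k count as more extreme
p_one_sided <- sum(dbinom(k:N, size = N, prob = theta_0))
# two-sided p-value: all values of k that are at most as likely as the observed k under the null
k_extreme <- which(dbinom(0:N, N, theta_0) <= dbinom(k, N, theta_0)) - 1
p_two_sided <- sum(dbinom(k_extreme, N, theta_0))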

Case 1: Manufacturer says: “\(\theta = 0.8\)” (20 points)

The manufacturer of a trick coin claims that their product has a bias of \(\theta = 0.8\) of coming up heads on each toss. You make it your “research hypothesis” to find out whether this is true. Suppose you tossed the coin \(N = 45\) times and you observed \(k=42\) heads.

Fix the null-hypothesis (2 points)

What null-hypothesis would you like to fix for a test that might shed light on your research question? What is the alternative hypothesis?

Plot the sampling distribution (2 points)

Use the function plot_binomial to plot the sampling distribution for this null-hypothesis. Highlight a single value in this plot, namely the one for the observed value of the test statistic \(k=42\).

More extreme values of \(k\) (4 points)

Given the research question, what values of \(k\) would count as more extreme evidence against the chosen null-hypothesis? Use the function plot_binomial to plot the sampling distribution, but highlight all the values of \(k\) that provide at least as strong evidence against the null-hypothesis as the observed data \(k=42\) does.

One- or two-sided test? (2 points)

Based on your answer to the previous question, is this a one-sided or a two-sided test?

\(p\)-value (2 points)

What is the \(p\)-value of this test?

Compare to built-in function (2 points)

Use the built-in function binom.test to run that same test. (You should obtain the same \(p\)-value as what you answered in the previous question.)
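In case you have not used it before, the general call pattern is sketched below with placeholder values (these are not the numbers of this case): binom.test takes the observed count as x, the number of trials as n, the null value of \(\theta\) as p, and the direction of the test as alternative ("two.sided", "less", or "greater").

# placeholder values, not the answer to this case
result <- binom.test(x = 8, n = 10, p = 0.5, alternative = "greater")
result$p.value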

Interpret and report your results (6 points)

Give one or two concise sentences stating your results and the interpretation of them regarding your research hypothesis. An example (which is absurdly wrong!) could be:

We conducted a binomial test assuming the null-hypothesis that the coin is fair \(\theta < 0.5\) and observed a significant test result (\(N = 42\), \(p \approx 1.2\)). This means that we find overwhelming evidence in favor of the null-hypothesis. We therefore conclude that the coin is biased towards heads.

Case 2: Manufacturer says: “\(\theta \le 0.3\)” (20 points)

The manufacturer of a trick coin claims that their product has a bias of \(\theta \le 0.3\) of coming up heads on each toss. You make it your “research hypothesis” to find out whether this is true. Suppose you tossed the coin \(N = 32\) times and you observed \(k=15\) heads.

Fix the null-hypothesis (2 points)

What null-hypothesis would you like to fix for a test that might shed light on your research question? What is the alternative hypothesis?

Plot the sampling distribution (2 points)

Use the function plot_binomial to plot the sampling distribution for this null-hypothesis. Highlight a single value in this plot, namely the one for the observed value of the test statistic \(k=15\).

More extreme values of \(k\) (4 points)

Given the research question, what values of \(k\) would count as more extreme evidence against the chosen null-hypothesis? Use the function plot_binomial to plot the sampling distribution, but highlight all the values of \(k\) that provide at least as strong evidence against the null-hypothesis as the observed data \(k=15\) does.

One- or two-sided test? (2 points)

Based on your answer to the previous question, is this a one-sided or a two-sided test?

\(p\)-value (2 points)

What is the \(p\)-value of this test?

Compare to built-in function (2 points)

Use the built-in function binom.test to run that same test. (You should obtain the same \(p\)-value as what you answered in the previous question.)

Interpret and report your results (6 points)

Give one or two concise sentences stating your results and the interpretation of them regarding your research hypothesis.

Case 3: Manufacturer says: “\(\theta \ge 0.6\)” (15 points)

The manufacturer of a trick coin claims that their product has a bias of \(\theta \ge 0.6\) of coming up heads on each toss. You make it your “research hypothesis” to find out whether this is true. Suppose you tossed the coin \(N = 100\) times and you observed \(k=53\) heads.

Use the built-in function binom.test to calculate a \(p\)-value for this case. (Use the previous steps for yourself if it helps you see through how to set this up.) State and interpret your results like you did in the last part of the previous cases.

Exercise 2: Pearson’s \(\chi^2\)-test of goodness of fit (20 points)

The goal of this exercise is to make you feel comfortable with applying and interpreting the results of a Pearson \(\chi^2\)-test of goodness of fit.

Imagine you are at a funfair (German: Kirmes, Jahrmarkt). As usual, you head straight for the lottery booth (German: Losbude). The vendor advertises that of all tickets 5% are mega-winners, 15% are winners, 15% are free rides on the merry-go-round (German: Karussell), 35% are consolation prizes (German: Trostpreise) and only the remaining 30% are blanks (German: Nieten). You are your nerdy self, as usual, so you buy 50 tickets and count the number of tickets in each category. What you got is this:

n_obs <- c(
  mega_winner = 1, # hurray!
  winner = 2,
  free_ride = 10,
  consolation = 18,
  blank = 19
)

Plot data and prediction (10 points)

The goal of this exercise is to further hone your plotting skills, this time also challenging you to come up with your own idea for a good visual presentation.

Find an informative way of plotting the observed counts and the counts you would have expected to see when buying 50 tickets, which is this vector:1

expected <- c(
  mega_winner = 5,
  winner = 15,
  free_ride = 15,
  consolation = 35,
  blank = 30
) * sum(n_obs) / 100

Test the vendor’s claim (10 points)

Use the built-in function chisq.test to test the vendor’s claim about the probability of obtaining a ticket from each category based on the counts you observed. Interpret and report your findings like you would in a research report.
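If you need a reminder of the call pattern, here is a sketch with a made-up toy example (60 rolls of a die that is claimed to be fair, not the lottery data): chisq.test takes the observed counts as x and the claimed category probabilities as p.

# made-up toy example, not the lottery data
observed_rolls <- c(8, 9, 13, 7, 12, 11)  # 60 rolls of a supposedly fair die
chisq.test(x = observed_rolls, p = rep(1/6, 6))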

Exercise 3: Some claims about frequentist testing (15 points)

The goal of this exercise is to make you think deeply about some of the key notions we discussed in class, and how they might relate to each other. On top of this, this exercise is good preparation for the final exam, which is likely to contain some truth-value judgement questions similar to this exercise. (Caveat: this is not to say that the statements occurring in the exam will be or feel exactly like these; they might be easier or, by then, more familiar.)

For the following statements judge whether they are true or false. If you think that a case is somehow controversial, you can give one short sentence to justify your response.

  1. A \(p\)-value of \(0.00615\) should be interpreted as implying that the probability that the null-hypothesis is true is below 1%.
  2. If we obtain a 95% confidence interval of \([0.2; 0.4]\) for a binomial test, then we know that a two-sided binomial test for the null hypothesis \(\theta = 0.1\) will be statistically significant at the significance level \(\alpha = 0.05\).
  3. If we obtain a significant test result for a two-sided binomial test for the null hypothesis \(\theta = 0.1\), we know that the corresponding 95% CI for the observed data will include the value 0.1.
  4. The Central Limit Theorem implies that a binomial distribution for large enough \(N\) is closely approximated by a normal distribution, no matter what \(\theta\) we are assuming.
  5. If we obtain a significant test result at significance level \(\alpha = 0.05\), then this means that when we repeat the exact same experiment we will find a significant result in at least 95% of the cases.

  1. Notice that the Pearson \(\chi^2\)-test rests on an approximation of normality, which is only sufficiently accurate if we have enough samples. A rule of thumb is that at most 20% of all cells should have expected frequencies below 5 in order for the test to be applicable.
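  For the vector expected defined in Exercise 2, one quick way to check this rule of thumb (just a sketch) is:

  mean(expected < 5) <= 0.2  # proportion of cells with expected frequency below 5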