Instructions

Required R packages

library(tidyverse)
library(TeachingDemos)

1. Testing the bias of a coin

This task is all about a coin. Imagine you’re about to make a life-changing decision based on a coin flip (don’t do this at home!). First, you want to check whether the coin is fair. You decide to use your knowledge of experiments and statistics to determine this!

You flip the coin 10 times and count how many times it comes up heads. You count 8 heads. Is the coin biased? Let’s find out!

# # Uncomment and fill in your last name
# # (char2seed() from TeachingDemos seeds the random number generator from a string)
#
# lastname <- "yourlastname"
#
# char2seed(lastname)

a. Understanding the binomial distribution

The number of heads in a set of coin flips follows the binomial distribution:

\[\text{Binom}(K = k ; n, \theta) = \binom{n}{k} \, \theta^{k} \, (1-\theta)^{n-k}\]

Here \(\binom{n}{k} = \frac{n!}{k!(n-k)!}\) (read “n choose k”) is the binomial coefficient. It gives the number of ways to draw an unordered set of \(k\) elements from a set of size \(n\). \(\theta\) is the bias of the process (the probability of success on a single trial).

The binomial distribution is built into R. The following commands allow you to interact with it:

  • Density:
    • dbinom(x, size, prob, log = FALSE)
  • Cumulative probability distribution:
    • pbinom(q, size, prob, lower.tail = TRUE, log.p = FALSE)
  • Quantile:
    • qbinom(p, size, prob, lower.tail = TRUE, log.p = FALSE)
  • Random generation:
    • rbinom(n, size, prob)

You can read about these functions, including the input arguments, by running ?Binomial in the console.
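As a quick sanity check, you can evaluate the formula above by hand and compare it with what the built-in functions return. The numbers below (3 heads in 10 flips of a fair coin) are arbitrary:

n_flips <- 10
k_heads <- 3
theta <- 0.5

choose(n_flips, k_heads) * theta^k_heads * (1 - theta)^(n_flips - k_heads)  # the formula, by hand
dbinom(x = k_heads, size = n_flips, prob = theta)  # the same value from dbinom()

pbinom(q = k_heads, size = n_flips, prob = theta)  # P(K <= 3), the cumulative probability
qbinom(p = 0.5, size = n_flips, prob = theta)      # smallest k with P(K <= k) >= 0.5
rbinom(n = 5, size = n_flips, prob = theta)        # heads counts from 5 simulated experiments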

b. Simulating a fair coin

You realise a reasonable approach would be to investigate how a fair coin behaves. Unfortunately, you don’t have a coin that you know is fair (otherwise you’d probably use it for your decision), so you decide to simulate one in R.

Uncomment and fill in the ??s in the code to see the result of flipping a fair coin 10 times. That is, a single experiment of 10 trials.

# # Uncomment and fill in the code
#
# n_exp <- 1
# trials <- ??
# theta <- ??
#
# outcome <- rbinom(n = n_exp, size = trials, prob = theta)
#
# print(outcome)

How many heads were there?

ANSWER:

(your answer here)

c. Multiple experiments

But that was just a single experiment. Being quite an intelligent statistician, you realise that to get a good estimate you should run more than one simulation. You decide to simulate the experiment 20 times and plot the results.

Fill in the code to do this:

# # Uncomment and fill in the code
#
# n_exp <- ?? # number of experiments
# trials <- ?? # number of trials in an experiment
# theta <- ?? # bias
#
# outcomes <- tibble(outcomes = rbinom(n_exp, trials, theta))
#
# outcomes %>%
#   ggplot(aes(outcomes)) +
#   geom_bar() +
#   scale_x_continuous(breaks = 0:trials, limits = c(-1, trials + 1)) +
#   xlab("Number of heads in ?? flips") +
#   ylab("Frequency") +
#   theme_classic()

What was the most frequently observed number of heads?

ANSWER:

(your answer here)

d. More experiments

You realise that only 20 simulations isn’t really enough for a nice graph, so you try 5000. With modern technology, why not?

# # Uncomment and fill in the code
#
# n_exp <- ?? # number of experiments
# trials <- ?? # number of trials in an experiment
# theta <- ?? # bias
#
# outcomes <- tibble(outcomes = rbinom(n_exp, trials, theta))
#
# outcomes %>%
#   ggplot(aes(outcomes)) +
#   geom_bar() +
#   scale_x_continuous(breaks = 0:trials, limits = c(-1, trials + 1)) +
#   xlab("Number of heads in ?? flips") +
#   ylab("Frequency") +
#   theme_classic()

e. Hypotheses

So you have a pretty nice graph, and it is beginning to approximate the sampling distribution. You see that in some of the simulated experiments the fair coin did show 8 heads out of 10, but not that frequently.

You realise you need a testable hypothesis about your coin. You remember that in classical statistics you have two hypotheses: the null hypothesis \(H_0\) and the alternative hypothesis \(H_1\).

What would these hypotheses be?

(your answer here)

f. Testing the null hypothesis with p-values

To test the null hypothesis, you first decide on a false-positive rate: the acceptable rate at which this procedure will claim the coin is unfair when it is actually fair. In this case you choose 15%. You then calculate the probability of the observed data, assuming the null hypothesis is true.

You do this by summing the probability mass for 8, 9, or 10 heads using dbinom(). Because you had no reason to believe, before seeing the data, that the coin was biased in one particular direction (towards heads or towards tails), you should also add the mass for 0, 1, or 2 heads. The result is the probability of getting 8 or more heads or 8 or more tails from 10 flips. You compare this value to the false-positive rate. This is called a “two-tailed” test.
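To see the mechanics on different numbers first, here is the same two-tailed sum for a hypothetical experiment of 12 flips that showed 10 heads (dbinom() is vectorised over x, so a single sum() suffices):

trials_demo <- 12
extreme_outcomes <- c(0, 1, 2, 10, 11, 12)  # outcomes at least as extreme as 10 heads
sum(dbinom(x = extreme_outcomes, size = trials_demo, prob = 0.5))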

# # Uncomment and fill in the code
#
# outcomes <- c(??,??,??,??,??,??)
# trials <- ??
# theta <- ??
#
# sum(dbinom(x = outcomes, size = trials, prob = theta))

What is the probability of the data assuming the null hypothesis is true? Is this higher or lower than your acceptable false positive rate?

(your answer here)

This value is also called a p-value. To restate: a p-value is the long-run probability of obtaining data as extreme as (or more extreme than) what was observed, assuming the null hypothesis is true.

Compare your outcome to the number provided by the function binom.test() after inserting the correct information into the following template:

# # Uncomment and fill in the code
#
# binom.test(x = ??, n = ??, p = ??, alternative = "two.sided")
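A useful detail: binom.test() returns a list, so if you want the p-value as a plain number rather than as printed output, you can extract it. With the hypothetical numbers from above:

result <- binom.test(x = 10, n = 12, p = 0.5, alternative = "two.sided")
result$p.value  # the two-tailed p-value as a plain number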

g. Confidence intervals

You remember that you can also investigate the null hypothesis with confidence intervals.

You remember something about confidence intervals from your statistics class. You remember that a 95% confidence interval is the range of values that …

Your memory is hazy, so you decide not to trust it and check Wikipedia instead.

It says: “…[I]f confidence intervals are constructed using a given confidence level from an infinite number of independent sample statistics, the proportion of those intervals that contain the true value of the parameter will be equal to the confidence level.”

So you interpret that to mean that if you calculate a hundred 85% confidence intervals from a hundred different experiments with a fair coin, around 85 of them will cover the value 0.5. Is that intuitive? If it’s not, perhaps making a visualisation will help.

# # Uncomment and fill in the code
#
# n_exp <- ?? # number of simulated experiments
# trials <- ?? # number of coin flips per experiment
# theta <- ?? # null bias
# conf_level <- ?? # confidence level
#
# # simulate the experiments
# outcomes <- rbinom(n = ??,
#                    size = ??,
#                    prob = ??)
#
# # create empty tibble to record the data
# conf_int_data <- tibble(experiment = 1:n_exp,
#                         CI_low = NA,
#                         CI_high = NA,
#                         estimate = NA,
#                         covered = NA)
#
# # calculate a confidence interval for each simulated experiment
# for (i in 1:n_exp) {
#   ci <- binom.test(x = outcomes[i],
#                    n = trials,
#                    p = theta,
#                    conf.level = conf_level)
#
#   # record the result into the tibble
#   conf_int_data$experiment[i] <- i
#   conf_int_data$estimate[i] <- outcomes[i]/trials
#   conf_int_data$CI_low[i] <- ci$conf.int[1]
#   conf_int_data$CI_high[i] <- ci$conf.int[2]
#
#   # check if CI covers the null bias
#   conf_int_data$covered[i] <- theta > ci$conf.int[1] &
#                               theta < ci$conf.int[2]
# }
#
#
# # graph the result
# conf_int_data %>%
#   ggplot(mapping = aes(x = experiment)) +
#   geom_point(aes(y = estimate, color = covered), alpha = 0.5) +
#   geom_errorbar(aes(ymin = CI_low, ymax = CI_high, color = covered)) +
#   geom_abline(intercept = theta, slope = 0, linetype = "dashed", alpha = 0.5) +
#   geom_label(aes(0, theta, label = "Null bias", hjust = 0.1), nudge_x = -19) +
#   ylim(0, 1) +
#   labs(colour = "Null bias within CI?") +
#   ylab("estimated theta") +
#   xlab("Experiment No.") +
#   theme_classic()
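A fact that helps with the next step: covered is a logical (TRUE/FALSE) vector, and the mean of a logical vector is the proportion of TRUE values. For example:

mean(c(TRUE, FALSE, TRUE, TRUE))  # 0.75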

What proportion of the calculated intervals covered the null bias value?

# calculate the proportion of CIs that covered the bias

# YOUR CODE HERE

ANSWER:

(your answer here)

h. Confidence intervals in the experiment

The idea of testing the null hypothesis is to see whether the coin could be fair given the data. You realise you can also do this by calculating a confidence interval for the estimate based on the observed data. The calculated confidence interval has a long-run probability, given by the confidence level, of containing the true value. You can think of it as calculating just one of the bars in the graph above.

You decide that you’re okay with an 85% confidence level. And, because you looked carefully at the code in the previous steps, you realised that the function binom.test() also calculates confidence intervals. You use this to calculate the 85% confidence interval for \(\theta\) based on the observed data (8 heads out of 10 flips).
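As a hint, the interval sits in the conf.int element of the object returned by binom.test(). With made-up numbers (60 heads out of 100 flips at a 90% level), the extraction looks like this:

ci_demo <- binom.test(x = 60, n = 100, conf.level = 0.90)
ci_demo$conf.int  # lower and upper bound of the 90% confidence interval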

# YOUR CODE HERE

What is the 85% confidence interval for the bias of the coin (\(\theta\)) based on 8 out of 10 heads? What does this mean?

ANSWER:

(your answer here)

i. Making a decision

You have calculated both the p-value for observing 8 or more heads (or 8 or more tails) out of 10 and a confidence interval for the coin’s bias.

What should you conclude? Why?

ANSWER:

(your answer here)


2. Further exploration of coin flips

a. Larger sample sizes

What would have happened if you flipped the coin 100 times and saw 65 heads? Use a 95% confidence interval this time. Would you be more confident or less confident that the coin is unfair?

First draw the confidence interval graph:

# # Uncomment and fill in the code
#
# n_exp <- ?? # number of simulated experiments
# trials <- ?? # number of coin flips per experiment
# theta <- ?? # null bias
# conf_level <- ?? # confidence level
#
# # simulate the experiments
# outcomes <- rbinom(n = ??,
#                    size = ??,
#                    prob = ??)
#
# # create empty tibble to record the data
# conf_int_data <- tibble(experiment = 1:n_exp,
#                         CI_low = NA,
#                         CI_high = NA,
#                         estimate = NA,
#                         covered = NA)
#
# # calculate a confidence interval for each simulated experiment
# for (i in 1:n_exp) {
#   ci <- binom.test(x = outcomes[i],
#                    n = trials,
#                    p = theta,
#                    conf.level = conf_level)
#
#   # record the result into the tibble
#   conf_int_data$experiment[i] <- i
#   conf_int_data$estimate[i] <- outcomes[i]/trials
#   conf_int_data$CI_low[i] <- ci$conf.int[1]
#   conf_int_data$CI_high[i] <- ci$conf.int[2]
#
#   # check if CI covers the null bias
#   conf_int_data$covered[i] <- theta > ci$conf.int[1] &
#                               theta < ci$conf.int[2]
# }
#
#
# # graph the result
# conf_int_data %>%
#   ggplot(mapping = aes(x = experiment)) +
#   geom_point(aes(y = estimate, color = covered), alpha = 0.5) +
#   geom_errorbar(aes(ymin = CI_low, ymax = CI_high, color = covered)) +
#   geom_abline(intercept = theta, slope = 0, linetype = "dashed", alpha = 0.5) +
#   geom_label(aes(0, theta, label = "Null bias", hjust = 0.1), nudge_x = -19) +
#   ylim(0, 1) +
#   labs(colour = "Null bias within CI?") +
#   ylab("estimated theta") +
#   xlab("Experiment No.") +
#   theme_classic()

Then calculate a confidence interval for the estimate based on the observed data:

# YOUR CODE HERE
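As a sketch, the call would use the numbers from the prompt above (65 heads out of 100 flips, tested against a fair coin at the 95% level):

binom.test(x = 65, n = 100, p = 0.5, conf.level = 0.95)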

3. Explore data from in-class experiment

The experiment you took part in during this class is a replication of this study. Let’s see if you did better than the participants of the previous study.

a. Load and inspect the data

Download the data file (link provided in class) and read it into your R session. Then inspect the data by printing it and by opening it with RStudio’s built-in View() function.

d <- read_csv("results.csv")
print(d)
View(d)

Answer the following questions about this data set:

  • How many rows and columns does the tibble have?
  • What is the data type of the column “condition”? What is the meaning of this information? Do you need it?
  • Where in the data frame are the responses stored? What is the data type of the columns containing the responses?
  • What do you think is stored in the column named “RT”?

b. Did anybody leave nice comments?

To see the comments given by anyone, we take the “comments” column, convert it to a factor, and ask which levels (distinct kinds of entries) it contains.

d$comments %>% as.factor() %>% levels()
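Equivalently, you can skip the detour through factors and ask for the distinct entries directly:

unique(d$comments)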

c. Processing the data for plotting and analysis

To plot and analyse the data, we want to transform the choices “true” and “false” into the numbers 1 and 0. This can be done with mutate(), for instance. Fill in the code below to insert a new “response” column that contains the numbers 1 and 0, not strings. Use the ifelse(CONDITION, what-if-true, what-if-false) construction.

d <- d %>% mutate(response = ifelse( CONDITION, what-if-true, what-if-false ))
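If ifelse() is new to you, here is how it behaves on a small made-up vector (it is vectorised, so it works elementwise):

x <- c("true", "false", "true")
ifelse(x == "true", 1, 0)  # returns 1 0 1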

d. What is the mean endorsement rate for statements?

Calculate the proportion of true answers across all participants and conditions.

e. Separate means for participants and conditions

The overall mean does not tell us whether individual subjects differed in their answers, or whether some test sentences (conditions) were more or less difficult to get right. Let’s therefore first create a summary tibble that contains the mean for each condition. To get it, group d by the variable condition and then use summarize to obtain a summary statistic, here the mean of the responses, for each level of condition. Complete the following code and make sure the variable you introduce to contain the summaries is called mean:

summary_group <- d %>% GROUP %>% SUMMARIZE-MEAN-RESPONSE

Do the same thing, but group by participants (find out which variable records which subject gave which response):

summary_participant <- d %>% GROUP %>% SUMMARIZE-MEAN-RESPONSE
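If the group-then-summarise idiom is unfamiliar, here is the same pattern applied to R’s built-in mtcars data set, grouping by cyl instead of condition:

mtcars %>%
  group_by(cyl) %>%
  summarize(mean = mean(mpg))  # mean mpg for each cylinder count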

f. Plot results

Let’s plot the mean response for each condition.

summary_group %>% ggplot(aes(x = FILLME, y = FILLME)) + geom_bar(stat = "identity")
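(Side note: geom_col() is ggplot2’s shorthand for geom_bar(stat = "identity"), so the final layer could equivalently be written + geom_col().)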

Produce a similar plot for participants. Is this a good representation? Can you think of (and implement) a better one?

g. Compare the results to the previous study

Out of 34 master’s students in the previous study, 11 got the first question (condition 1) wrong. Use a binomial test to check whether the answer rates in the new data set differ from the previous proportion of 11/34.
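A minimal sketch of one way to set this up, assuming condition is coded numerically and response == 0 marks a wrong answer (check both assumptions against your data first):

d1 <- d %>% filter(condition == 1)  # keep only responses to the first question
binom.test(x = sum(d1$response == 0), n = nrow(d1), p = 11/34)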