Required R packages


As always, load the required packages and change the seed to your last name.

# library(tidyverse)
# library(TeachingDemos)
# library(BSDA)
#
# lastname <- "YOURLASTNAME"
#
# char2seed(lastname)

This homework sheet is about analysing IQ score results from two groups of people. One group had a placebo (sugar pill) and the other had a “smart drug” before taking the test.

You have two research questions to answer:

  1. Is the mean IQ of the placebo group different from the supposed population mean?

  2. Is the mean IQ of the “smart drug” group different from the placebo group?

1. Inspecting the data

The data file BDACM_HW2_data.csv contains the results of the experiment.

a. Read the file

Read the data file into a tibble called iq_data using the function read_csv(). Then print the tibble.

# YOUR CODE HERE

b. Tidy the data frame

Did you notice that the data frame is not “tidy”? The grouping variable is spread across two columns, “placebo” and “drug”, and the contents of these columns are the IQ scores.

Use the gather() function to tidy the data frame. In this case, key should be “group” and value should be “IQ_score”. Make sure to specify that factor_key should be TRUE (this will set the type of the group column to factor). Put the tidy data frame into a new variable iq_data_tidy and print it.
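
If you have not used gather() before, here is a minimal sketch on a made-up two-column tibble (the names toy_wide, toy_a, toy_b, condition and score are invented for illustration and have nothing to do with the homework data):

library(tidyverse)

toy_wide <- tibble(toy_a = c(1, 2, 3),
                   toy_b = c(4, 5, 6))

# gather all columns into a key/value pair;
# factor_key = TRUE makes the key column a factor
toy_wide %>%
  gather(key = "condition", value = "score", factor_key = TRUE)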

# YOUR CODE HERE

c. Plot the data

Use ggplot to make a box plot of the IQ scores of the two groups. Show the groups on the x-axis and the IQ scores on the y-axis. Use a ylim() of 50 to 150.
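
If you need a reminder of the ggplot syntax for box plots, a rough sketch on simulated toy data (toy_data, group and score are invented names, not the homework columns) could look like this:

library(tidyverse)

toy_data <- tibble(group = factor(rep(c("A", "B"), each = 10)),
                   score = rnorm(20, mean = 100, sd = 15))

# box plot of score by group, with a fixed y-axis range
toy_data %>%
  ggplot(aes(x = group, y = score)) +
  geom_boxplot() +
  ylim(50, 150)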

# YOUR CODE HERE

d. Another visualisation

It’s often useful to plot the same data in different ways. Another way to plot these data is as a density plot. Use ggplot to plot the density of the samples of the two groups. Plot each group in a different colour using fill = group. Use alpha = 0.5 on the density so that the distributions are visible on top of each other, and an xlim() of 50 to 150.
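
The density-plot syntax follows the same pattern; a sketch on the same kind of simulated toy data (again, these names are invented for illustration):

library(tidyverse)

toy_data <- tibble(group = factor(rep(c("A", "B"), each = 10)),
                   score = rnorm(20, mean = 100, sd = 15))

# overlapping densities, semi-transparent so both groups remain visible
toy_data %>%
  ggplot(aes(x = score, fill = group)) +
  geom_density(alpha = 0.5) +
  xlim(50, 150)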

# YOUR CODE HERE

e. Summarising the data numerically

Now that you have inspected the data visually, it’s time to take a closer look at the numbers. Use the familiar data %>% group_by() %>% summarise() code to find the mean IQ scores and standard deviations of the two groups. Save the result into a variable called iq_summary and print it.
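
As a sketch of the summarising pattern on the same kind of made-up toy data (mean_score is just a suggested name; note that the code template in part 5a later refers to a column called standard_deviation):

library(tidyverse)

toy_data <- tibble(group = factor(rep(c("A", "B"), each = 10)),
                   score = rnorm(20, mean = 100, sd = 15))

# one row per group, with the group mean and standard deviation
toy_data %>%
  group_by(group) %>%
  summarise(mean_score = mean(score),
            standard_deviation = sd(score))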

# YOUR CODE HERE

2. Understanding the data generation process

To answer both of your research questions, it’s important to understand the difference between the data generation distribution and the sampling distribution.

a. The data generation distribution.

If we suppose that IQ scores are normally distributed with a true mean of 100 and a true standard deviation of 15, then choosing a single person from the population at random is akin to sampling from a norm(100, 15).

If we sampled 21000 people and plotted their IQ scores, the resulting distribution would begin to approximate the true data generation distribution.

# # YOUR CODE HERE:
#
# mu <- ... FILL ME ...
# sigma <- ... FILL ME ...
#
# n_samples <- 21000
#
# iq_simulation <- data_frame(sample = 1:n_samples,
#                             iq = NA)
#
# for(i in 1:n_samples){
#
#   # sample a set of values
#   sample_value <- rnorm(n = 1,
#                          mean = mu,
#                          sd = sigma)
#
#   # save value to data frame
#   iq_simulation$iq[i] <- sample_value
# }
#
# iq_simulation %>%
#   ggplot(aes(x = iq)) +
#   geom_histogram(binwidth = 1) +
#   scale_x_continuous(limits = c(50, 150)) +
#   xlab("IQ score") +
#   ylab("Frequency") +
#   ylim(0,1000) +
#   theme_classic()

b. The sampling distribution

However, in reality, our sample sizes are nowhere near large enough to approximate the true data generation distribution. Instead, we aggregate the sample into a single value (the mean).

You can then imagine a hypothetical distribution of sample means, from which a single sample mean is drawn. This is called the sampling distribution.

A sampling distribution is the hypothetical distribution from which a sample statistic (e.g. \(\bar{X}\)) is drawn. The mean of the sampling distribution of \(\bar{X}\) is equal to the true population mean.

We can run a simulation to visualise the sampling distribution. Suppose we have the same data generating distribution norm(100, 15).

If we repeatedly (say 1500 times) sample 14 people at random and calculate the mean IQ of each sample, the distribution of means will approximate the sampling distribution.

# # YOUR CODE HERE
#
# mu <- ... FILL ME ... # true population mean
# sigma <- ... FILL ME ... # true standard deviation
#
# sample_size <- ... FILL ME ... # number of people in each sample
# n_samples <- ... FILL ME ... # number of samples
#
# sample_means <- data_frame(sample = 1:n_samples,
#                            mean = NA,
#                            std_dev = NA)
#
# for(i in 1:n_samples){
#
#   # sample a set of values
#   sample_values <- rnorm(n = sample_size,
#                          mean = mu,
#                          sd = sigma)
#
#   # get the mean of the sample
#   sample_means$mean[i] <- mean(sample_values)
#
#   # get the standard deviation of the sample
#   sample_means$std_dev[i] <- sd(sample_values)
# }
#
# sample_means %>%
#   ggplot(aes(mean)) +
#   geom_histogram(binwidth = 1) +
#   scale_x_continuous(limits = c(50, 150)) +
#   xlab("Mean IQ score of sample") +
#   ylab("Frequency") +
#   ylim(0, 200) +
#   theme_classic()

The sampling distribution is much narrower than the data generating distribution. Why do you think this is?

ANSWER:

(your answer here)

3. \(z\) distributions and \(z\) tests

To answer your first research question (whether the mean IQ in the placebo group is different from the expected population mean), you need to perform a one-sample test.

a. Creating a z-distribution

A \(z\) transformation is a standardisation of a normal distribution such that the mean becomes 0 and the standard deviation becomes 1. We can convert the previously simulated distribution of sample means to a \(z\) distribution like so. Fill in the correct formula for the z-transformation of sample means.
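
As a reminder of the general result (standard textbook material, not specific to this data set): if \(\bar{X}\) is the mean of \(n\) independent samples from a normal distribution with mean \(\mu\) and standard deviation \(\sigma\), then its \(z\)-transformation

\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]

follows a standard normal distribution (mean 0, standard deviation 1).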

# FILL IN THE CODE:

# sample_means <- sample_means %>%
#     mutate(z_score = ... FILL ME... )
#
# sample_means %>%
#   ggplot(aes(x = z_score)) +
#   geom_histogram(binwidth = 0.1) +
#   scale_x_continuous(limits = c(-5, 5)) +
#   xlab("z-score of sample mean") +
#   ylab("Frequency") +
#   theme_classic() +
#   ylim(0, 100)

b. Calculating a z-score

Now that we have seen the connection between the sampling distribution and a derived z distribution, we can use this information for a z-test.

Suppose that you want to know whether the mean of your sample for the placebo group is different from the expected population mean (100). You can do this by asking how probable it would be to draw a sample mean at least as extreme as the placebo group’s mean from the sampling distribution.

Historically, you would first convert your sample mean to a z-score. This allows for easy probability calculation (e.g. using a look-up table), because the probabilities for many z-scores are known.
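
Purely as an illustration with made-up numbers (a hypothetical sample mean of 104 from 10 people; these are not the homework values), the calculation might look like this:

# hypothetical values, not the homework data
example_mean <- 104   # made-up sample mean
example_n    <- 10    # made-up sample size
mu           <- 100   # assumed population mean
sigma        <- 15    # assumed population standard deviation

# z-score of the sample mean relative to the sampling distribution of the mean
(example_mean - mu) / (sigma / sqrt(example_n))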

What is the z-score for the mean of the placebo group, if we assume \(\mu = 100\) and \(\sigma = 15\)?

# YOUR CODE HERE:

(your answer here)

c. Calculating a p-value for the z score

We now have a z-score for our sample mean, but how improbable is it? Write code to find the two-tailed p-value for the calculated z-score.
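
One common pattern for a two-tailed p-value from a z-score uses pnorm(), the cumulative distribution function of the normal distribution; sketched here with the made-up z-score from the illustration above (roughly 0.84):

z_example <- 0.84  # made-up z-score, for illustration only

# probability of a value at least this extreme in either tail
2 * pnorm(-abs(z_example))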

# YOUR CODE HERE:

What is the p-value? What does this mean?

(your answer here)

d. Using built-in functions

The package BSDA has a z.test() function. Steps b and c can be replaced with it.

# # YOUR CODE HERE:
# placebo_data <- iq_data_tidy %>%
#   filter(group == ... FILL ME ...)
#
# placebo_data$IQ_score %>%
#   z.test(mu = ... FILL ME ..., sigma.x = ... FILL ME ...)

4. \(t\) distributions and one sample \(t\)-tests

a. Simulating a t-distribution

\(z\)-tests are a nice tool when we know the true standard deviation; however, this is often not the case. In part 2, we assumed the true standard deviation was 15 (because that’s the standard for IQ scores). However, we probably should not rely on the standard deviation being known in this case. (Actually, the data were generated from a known mean and standard deviation, which will be revealed in class.)

In cases where the standard deviation is unknown, it can be estimated by the standard deviation of the sample. We can use the simulated samples (their means and standard deviations), calculate a t-score for each, and so make an approximate plot of the resulting \(t\)-distribution.
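
For reference, the corresponding one-sample \(t\)-statistic is the same quantity as the \(z\)-score above, with the unknown \(\sigma\) replaced by the sample standard deviation \(S\) (again standard textbook material):

\[ T = \frac{\bar{X} - \mu}{S / \sqrt{n}} \]

Under the null hypothesis, \(T\) follows a \(t\)-distribution with \(n - 1\) degrees of freedom.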

# # YOUR CODE HERE:
#
# sample_means <- sample_means %>%
#   mutate(t_score = ... FILL ME ... )
#
# sample_means %>%
#   ggplot( ... FILL ME ... ) +
#   geom_histogram(binwidth = 0.1) +
#   scale_x_continuous(limits = c(-5, 5)) +
#   xlab("t-score of sample mean") +
#   ylab("Frequency") +
#   ylim(0, 100) +
#   theme_classic()

The \(t\)-distribution looks quite similar to the z-distribution. The main difference is that its “tails” are a bit heavier. In fact, the \(t\)-distribution gets closer to a z-distribution with increasing “degrees of freedom” (in this case just the sample size minus one).
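
You can check this numerically (a quick illustration, not part of the assignment): compare the probability mass beyond 1.96 under the standard normal and under \(t\)-distributions with increasing degrees of freedom. With df = 13 (samples of 14 people, as in the simulation above) the tails are already fairly close.

# upper-tail probability beyond 1.96 under the standard normal
pnorm(1.96, lower.tail = FALSE)          # about 0.025

# the same tail under t-distributions with increasing degrees of freedom
pt(1.96, df = 5, lower.tail = FALSE)     # noticeably larger
pt(1.96, df = 13, lower.tail = FALSE)    # closer
pt(1.96, df = 100, lower.tail = FALSE)   # almost identical to the normal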

b. Calculating a t-score for the sample.

Just like the z-test, we can perform a t-test on our placebo sample, comparing it to the expected mean of 100. Calculate the test statistic (\(t\)-score) for the observed data.

# YOUR CODE HERE:

c. Calculating a \(p\)-value for the \(t\)-score

As before, we can compute the \(p\)-value of the data we observed as the probability of observing a \(t\)-score at least as extreme as the \(t\)-score of the observed data. Use the built-in \(t\)-distribution function pt() to calculate the \(p\)-value.
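
The pattern mirrors the z-test case, only with pt() and \(n - 1\) degrees of freedom; sketched here with made-up values rather than the homework result:

t_example <- 1.3   # made-up t-score, for illustration only
n_example <- 10    # made-up sample size

# two-tailed p-value under a t-distribution with n - 1 degrees of freedom
2 * pt(-abs(t_example), df = n_example - 1)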

# # YOUR CODE HERE:

d. Using built-in functions

Base R has a t.test() function. Steps b and c can be replaced with it.
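
A minimal sketch of a one-sample call on simulated data (the scores are random draws, so the output has nothing to do with your homework result):

set.seed(1)
fake_scores <- rnorm(10, mean = 100, sd = 15)  # simulated, not the homework data

# one-sample t-test against a hypothesised mean of 100
t.test(fake_scores, mu = 100)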

# # YOUR CODE HERE:

5. Two-sample \(t\)-tests

In order to answer the second research question (whether or not the mean IQ in the “smart drug” group is different from the placebo group), we need to perform a two-sample test. We have not yet discussed the two-sample \(t\)-test in class, but it follows the exact same general logic. The purpose of this exercise is to run you through the two-sample \(t\)-test.

The rationale behind the two-sample \(t\)-test is to determine whether two sets of (unpaired) samples are likely to come from the same population or not. (NB: Notice that there are actually two kinds of two-sample \(t\)-tests, one for paired samples and one for unpaired samples. We have paired samples if we have a measurement of the same individual in two experimental conditions. We have unpaired samples if we have measurements from different individuals in each group.)

Formally, we have two random variables \(X\) and \(Y\). The \(n_X\) data points from the placebo group are iid samples from random variable \(X\); the \(n_Y\) data points from the smart-drug group are iid samples from the random variable \(Y\). We assume that both \(X\) and \(Y\) follow a normal distribution with means \(\mu_X\) and \(\mu_Y\) and standard deviations \(\sigma_X\) and \(\sigma_Y\). We do not assume that the standard deviations \(\sigma_X\) and \(\sigma_Y\) are known, but we do assume that they are identical: \(\sigma_X = \sigma_Y\). We are then interested in testing the null-hypothesis:

\[H_0 \colon \mu_X = \mu_Y\]

The corresponding research hypothesis is:

\[H_1 \colon \mu_X \neq \mu_Y\]

If the null-hypothesis is true, it can be shown that the test statistic

\[ T = \sqrt{\frac{n_X \ n_Y \ (n_X + n_Y - 2)}{n_X + n_Y}} \cdot \frac{\bar{X} - \bar{Y}}{\sqrt{(n_X -1) S^2_X + (n_Y -1) S^2_Y}} \]

follows a \(t\)-distribution with \(f = n_X + n_Y - 2\) degrees of freedom. (NB: In the formula above, \(\bar{X}\) and \(\bar{Y}\) are the sample means, and \(S_X\) and \(S_Y\) are the sample standard deviations of the two groups.)

a. Calculate the \(t\)-score for the data

As before, we use the formula above, which defines the test statistic, to compute the \(t\)-score for the observed data.

## YOUR CODE HERE
#
# n_X <- nrow(iq_data_tidy)/2
# n_Y <- ...
# x_bar <- ...
# y_bar <- ...
# sd_X <- filter(iq_summary, group == 'placebo')$standard_deviation
# sd_Y <- ...
#
# t_score <- ...
#
# t_score

b. Calculate the corresponding \(p\)-value

Based on what we know about the distribution of \(T\) (that it follows a \(t\)-distribution with \(f = n_X + n_Y - 2\) degrees of freedom), calculate the probability of seeing an outcome at least as extreme as the observed data.
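
The calculation mirrors the one-sample case, only with \(n_X + n_Y - 2\) degrees of freedom; a sketch with made-up numbers (not the values from your data):

t_example   <- 1.9   # made-up t-score, for illustration only
n_X_example <- 7     # made-up group sizes
n_Y_example <- 7

# two-tailed p-value under a t-distribution with n_X + n_Y - 2 degrees of freedom
2 * pt(-abs(t_example), df = n_X_example + n_Y_example - 2)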

# YOUR CODE HERE

c. Calculate the test using t.test()

The t.test() function can also perform two-sample tests. Make sure that you set the options for an unpaired test with equal variances.

# # YOUR CODE HERE:

# t.test( ... FILL ME ... , var.equal = TRUE, paired = FALSE)

d. How to interpret your result

If everything went well, you obtained a \(p\)-value of around \(0.06\). Suppose that your research hypothesis is: “This smart drug influences cognitive performance, and therefore leads to a difference in measured IQ scores (higher or lower, we do not care, just some difference).” Based on your test result, which of the following statements are true? (Present your answers by typing on separate lines strings like “(xxii) false” with one line for each consecutive statement. We assume the standard \(\alpha\) level of \(0.05\).)

  1. The test results show that it is likely that the smart drug influences IQ-scores because the \(p\)-value is very close to significance.
  2. If we repeat the experiment, we will obtain a significant result in around 6% of the cases. This means that it is likely that the smart drug did have an effect on IQ scores even if this particular test result was not significant.
  3. We are not able to reject the null-hypothesis. This means that the research hypothesis is almost surely false. The smart drug has no effect on IQ scores.
  4. The \(p\)-value has not reached significance. The null-hypothesis is therefore very likely true.
  5. The test result was inconclusive. There are no grounds based on which we could reject the null hypothesis. Thus far we cannot draw any conclusions about the research hypothesis.

Are each of these true or false?

ANSWER:

  1. … true or false???…
  2. … true or false???…
  3. … true or false???…
  4. … true or false???…
  5. … true or false???…

e. How to interpret a fictional result

Since the frequentist paradigm is all about counterfactual data (i.e., data which we could have seen but have not), suppose the \(p\) value had turned out to be \(0.03\). Which of the following statements are true/false then? (Present your answers by typing on separate lines strings like “(xxii) false” with one line for each consecutive statement. We assume the standard \(\alpha\) level of \(0.05\).)

  1. The test results show that it is likely that the smart drug influences IQ-scores.
  2. If we repeat the experiment, we will falsely reject the null-hypothesis in around 3% of the cases.
  3. We can reject the null-hypothesis. This means that the research hypothesis is almost surely false. The smart drug has no effect on IQ scores.
  4. The \(p\)-value has reached significance. The null-hypothesis is therefore very likely false.
  5. The test result was significant. We therefore reject the null-hypothesis and adopt the research hypothesis that the smart drug has an effect on IQ scores.

Are each of these true or false?

ANSWER:

  1. … true or false???…
  2. … true or false???…
  3. … true or false???…
  4. … true or false???…
  5. … true or false???…

End of assignment