Instructions
echo = F (so as not to have it show up in your output):
knitr::opts_chunk$set(
  warning = FALSE, # suppress warnings per default
  message = FALSE  # suppress messages per default
)
Then include a code chunk which loads all required packages (which is just tidyverse). Make sure that this code chunk, too, will not show in your output, using echo = F.
When chaining operations, please use the pipe %>% wherever reasonable. We will not explicitly indicate in each task that the pipe should be used, but we expect you to use it by default, as a matter of elegance.
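To illustrate what chaining with the pipe looks like, here is a minimal sketch on a made-up vector. The name scores and its values are assumptions for illustration only and are not part of the course data.

```r
library(dplyr)  # makes the pipe %>% available (re-exported from magrittr)

# Hypothetical toy vector, not part of the course data
scores <- c(4, 9, 16)

# Nested calls have to be read from the inside out ...
nested <- round(mean(sqrt(scores)), 1)

# ... whereas the pipe reads left to right, one step per line
piped <- scores %>%
  sqrt() %>%
  mean() %>%
  round(1)

# Both versions compute the same value
identical(nested, piped)
```

Each step receives the result of the previous step as its first argument, which is why the piped version needs no intermediate variables.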
We will work with the King of France experiment, in particular with the data generated by participants of this course. For a detailed description of the theoretical background and the procedure, see Appendix D.4 of your lecture script.
Here is a condensed description of the materials. The data set consists of five vignettes:
Each vignette consists of five critical conditions. The following five sentences are examples of the critical conditions for the first vignette.
Additionally, for each vignette there exists a background check. This sentence is intended to find out whether participants know whether the relevant presuppositions are true. The five background checks are:
Finally, there are also 110 filler sentences, which do not have a presupposition, but also require common world knowledge for a correct answer. We will also use the filler sentences as controls, because there is a “correct” answer to each of these.
Look into the procedure described in Appendix D.4 of your script and answer the following questions:
Load the data from the in-class replication, using the following code:
data_KoF_raw_IDA <-
  read_csv(url('https://raw.githubusercontent.com/michael-franke/intro-data-analysis/master/data_sets/king-of-france_data_raw_IDA.csv'))
At this point you should familiarize yourself with the data, e.g., by using glimpse or View (the latter only works in RStudio). After getting familiar with the data, answer the following questions (using appropriate and concise R code, which you should also reproduce as part of your answers to be submitted):
How many rows does the data set in data_KoF_raw_IDA contain? (Hint: use the nrow function!)
How many participants took part in the study? (Hint: use a sequence of operations pull, unique and length.)
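The two hints above can be tried out on a small made-up table before turning to the real data. The tibble toy and its contents are hypothetical and only mimic the idea of several rows per participant.

```r
library(dplyr)

# Hypothetical toy data: 3 participants with 2 trials each (not the course data)
toy <- tibble(
  submission_id = c(1, 1, 2, 2, 3, 3),
  response      = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

# Number of rows, i.e., one row per trial
n_rows <- nrow(toy)

# Number of distinct participants
n_participants <- toy %>%
  pull(submission_id) %>%  # extract the column as a plain vector
  unique() %>%             # drop repeated IDs
  length()                 # count what is left
```

The number of rows and the number of participants differ because every participant contributes more than one row.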
Print and include in your HTML document a list of all comments given in the experiment, but printing each unique comment only once.
Print and include in your HTML document a list of all answers given to the languages question, but printing each unique answer only once.
Calculate the grand average of the variable age, i.e., calculate the average age across all participants. (Hint: As soon as a vector contains missing data (an entry NA), its mean is NA as well. Try removing the missing values when calculating the mean, e.g., by checking the documentation of the function mean for anything helpful.)
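The behaviour described in the hint can be seen on a short made-up vector. The values in ages are hypothetical. The argument na.rm is part of base R's mean.

```r
# Hypothetical ages; the NA stands for a participant who gave no answer
ages <- c(21, 25, NA, 30)

# A single missing value makes the whole mean missing
mean_with_na <- mean(ages)

# With na.rm = TRUE the missing values are dropped before averaging
mean_without_na <- mean(ages, na.rm = TRUE)
```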
Use the summary function to produce the five-number summary of the variable age. (NB: you do not need to remove NAs for this function.) The output of the summary function shows a set of descriptive statistics that is often referred to as the five-number summary (for further explanation see, e.g., Five-number summary). Strictly speaking, the five-number summary comprises the five most important sample percentiles: the sample minimum, the 0.25 quantile or first quartile, the 0.5 quantile or median, the 0.75 quantile or third quartile, and the sample maximum. The summary function reports the mean in addition to these.
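As a rough sketch of what summary returns for a numeric vector with missing values, consider the following made-up example. The vector ages is hypothetical. The comparison with quantile is added here only to show where the percentiles come from.

```r
# Hypothetical ages with one missing value
ages <- c(21, 25, NA, 30, 19)

# summary() handles the NA on its own and reports how many there were
age_summary <- summary(ages)
summary_names <- names(age_summary)

# The same five percentiles via quantile(), which does need na.rm = TRUE
five_numbers <- quantile(ages, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
```

Printing age_summary shows the minimum, first quartile, median, mean, third quartile and maximum, followed by the count of NA entries.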
Give the type of each of the following variables included in the data set (i.e., state whether it is ordinal, metric, etc.).
Follow the preprocessing steps executed in the script up to (but not including) Section 4.5.3 (on cleaning). That is, copy-paste the code from the last code box in Section D.4.2 of the script to add the new column condition, just as done in the script. Store the result in a variable called data_KoF_preprocessed_IDA and select only the columns submission_id, trial_number, condition, vignette, question, correct, response (in that order).
Your output should look like this:
data_KoF_preprocessed_IDA
## # A tibble: 2,040 x 7
##    submission_id trial_number condition  vignette question correct response
##            <dbl>        <dbl> <ord>      <chr>    <chr>    <lgl>   <lgl>
##  1           277            1 filler     none     Big Ben~ FALSE   FALSE
##  2           277            2 filler     none     The Gre~ TRUE    FALSE
##  3           277            3 Condition~ 5        The vol~ FALSE   TRUE
##  4           277            4 filler     none     The Uni~ TRUE    TRUE
##  5           277            5 filler     none     Elvis P~ FALSE   FALSE
##  6           277            6 filler     none     William~ FALSE   FALSE
##  7           277            7 Condition~ 1        Emmanue~ FALSE   TRUE
##  8           277            8 filler     none     There a~ TRUE    TRUE
##  9           277            9 Condition~ 2        The Emp~ FALSE   TRUE
## 10           277           10 filler     none     Monkeys~ TRUE    TRUE
## # ... with 2,030 more rows
Is this last data representation tidy? Why (not)?
Section D.4.1.2 of the script lists a number of research questions that we could raise for this data set. Let’s focus on the second, reproduced here:
While we are still far from performing a statistical analysis, we do already have the tools to get at least an indicative pair of numbers that might help address this question, namely the proportion of “true”-judgements in condition C0 and those in C1. Compute these proportions by:
starting from data_KoF_preprocessed_IDA,
keeping only the rows from conditions C0 and C1 (Hint: you may find the operator %in% very useful, which tests whether some element is included in a vector, e.g., as in the expression condition %in% c("Condition 0", "Condition 1")),
grouping by condition, and
using summarise to obtain the proportion of true judgements (Hint: if x is a boolean vector, then mean(x) will cast x into an integer vector representing each entry of TRUE as 1 and each entry of FALSE as 0, so that the mean will be exactly the proportion of occurrences of TRUE in the vector x.)
Your final output should look like this:
## # A tibble: 2 x 2
##   condition   proportion_true
##   <ord>                 <dbl>
## 1 Condition 0          0.153
## 2 Condition 1          0.0941
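The interplay of %in%, grouping and averaging a boolean vector can be checked on a tiny made-up table. The tibble toy and its values are hypothetical and unrelated to the numbers shown above. This is a sketch of the general pattern, not the reference solution.

```r
library(dplyr)

# Hypothetical toy data, not the course data
toy <- tibble(
  condition = c("Condition 0", "Condition 0", "Condition 1",
                "Condition 1", "filler", "filler"),
  response  = c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)
)

toy_proportions <- toy %>%
  # keep only rows whose condition is one of the two listed values
  filter(condition %in% c("Condition 0", "Condition 1")) %>%
  group_by(condition) %>%
  # TRUE counts as 1 and FALSE as 0, so the mean is the proportion of TRUE
  summarise(proportion_true = mean(response))
```

In this toy table one of the two "Condition 0" responses is TRUE and none of the "Condition 1" responses is, so the proportions come out as 0.5 and 0.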
Notice that there is a perceptible difference between these numbers. However, we still need to learn ways of translating such numbers into mental currency, i.e., methods of translating such numbers (or the data that produced them) into statements of evidence (such as: “The data provides evidence that the conditions are different.”) or into decision criteria regarding whether to act as if we knew beyond doubt whether the proportions are equal or not. This is what we will learn in the remainder of this course.