Instructions

Grading scheme

Suggested readings

Required R packages


1. Installing and running R (2 points)

a. (1 point)

Your first task is simply to show that you can install and run R and R Markdown. You don’t have to change the code below; just uncomment it, and the correct output will appear automatically when you ‘knit’ the document.

# UNCOMMENT THE CODE

#R.version

#sessionInfo()
# SOLUTION:

R.version
##                _                           
## platform       x86_64-pc-linux-gnu         
## arch           x86_64                      
## os             linux-gnu                   
## system         x86_64, linux-gnu           
## status                                     
## major          3                           
## minor          5.1                         
## year           2018                        
## month          07                          
## day            02                          
## svn rev        74947                       
## language       R                           
## version.string R version 3.5.1 (2018-07-02)
## nickname       Feather Spray
sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Debian GNU/Linux buster/sid
## 
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
## LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] bindrcpp_0.2.2     TeachingDemos_2.10 forcats_0.3.0     
##  [4] stringr_1.3.1      dplyr_0.7.7        purrr_0.2.5       
##  [7] readr_1.1.1        tidyr_0.8.1        tibble_1.4.2      
## [10] ggplot2_3.0.0.9000 tidyverse_1.2.1    assignr_0.0.1     
## 
## loaded via a namespace (and not attached):
##  [1] tinytex_0.8      tidyselect_0.2.5 xfun_0.3         haven_1.1.2     
##  [5] lattice_0.20-35  colorspace_1.3-2 htmltools_0.3.6  yaml_2.2.0      
##  [9] utf8_1.1.4       rlang_0.2.2      pillar_1.3.0     glue_1.3.0      
## [13] withr_2.1.2      modelr_0.1.2     readxl_1.1.0     bindr_0.1.1     
## [17] plyr_1.8.4       munsell_0.5.0    gtable_0.2.0     cellranger_1.1.0
## [21] rvest_0.3.2      evaluate_0.12    labeling_0.3     knitr_1.20      
## [25] fansi_0.4.0      broom_0.5.0      Rcpp_0.12.19     scales_1.0.0    
## [29] backports_1.1.2  jsonlite_1.5     hms_0.4.2        digest_0.6.18   
## [33] stringi_1.2.4    grid_3.5.1       rprojroot_1.3-2  cli_1.0.1       
## [37] tools_3.5.1      magrittr_1.5     lazyeval_0.2.1   crayon_1.3.4    
## [41] pkgconfig_2.0.2  xml2_1.2.0       lubridate_1.7.4  assertthat_0.2.0
## [45] rmarkdown_1.10   httr_1.3.1       rstudioapi_0.8   R6_2.3.0        
## [49] nlme_3.1-137     compiler_3.5.1

Which version of R are you running? On which platform are you running it?

ANSWER:

SOLUTION: 3.5.1 on x86_64-pc-linux-gnu
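If you only need the two values the question asks for, you can pull them out of the `R.version` list instead of reading the full printout. The following is a minimal sketch that uses only base R. The variable names `r_version` and `platform` are illustrative:

```r
# R.version is a list; pick out only the fields the question asks about.
r_version <- paste(R.version$major, R.version$minor, sep = ".")
platform  <- R.version$platform

cat("R", r_version, "on", platform, "\n")

# getRversion() returns the same version number as a comparable object,
# which is handy for checks such as "is this R at least 3.5.0?"
print(getRversion() >= "3.5.0")
```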

b. (1 point)

Install the package tidyverse from the console, not in the code below (otherwise it would be reinstalled every time you knit the document). Then write code below to load the package and show the sessionInfo() output again.

# YOUR CODE HERE
# SOLUTION:

library(tidyverse)

sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Debian GNU/Linux buster/sid
## 
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
## LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] bindrcpp_0.2.2     TeachingDemos_2.10 forcats_0.3.0     
##  [4] stringr_1.3.1      dplyr_0.7.7        purrr_0.2.5       
##  [7] readr_1.1.1        tidyr_0.8.1        tibble_1.4.2      
## [10] ggplot2_3.0.0.9000 tidyverse_1.2.1    assignr_0.0.1     
## 
## loaded via a namespace (and not attached):
##  [1] tinytex_0.8      tidyselect_0.2.5 xfun_0.3         haven_1.1.2     
##  [5] lattice_0.20-35  colorspace_1.3-2 htmltools_0.3.6  yaml_2.2.0      
##  [9] utf8_1.1.4       rlang_0.2.2      pillar_1.3.0     glue_1.3.0      
## [13] withr_2.1.2      modelr_0.1.2     readxl_1.1.0     bindr_0.1.1     
## [17] plyr_1.8.4       munsell_0.5.0    gtable_0.2.0     cellranger_1.1.0
## [21] rvest_0.3.2      evaluate_0.12    labeling_0.3     knitr_1.20      
## [25] fansi_0.4.0      broom_0.5.0      Rcpp_0.12.19     scales_1.0.0    
## [29] backports_1.1.2  jsonlite_1.5     hms_0.4.2        digest_0.6.18   
## [33] stringi_1.2.4    grid_3.5.1       rprojroot_1.3-2  cli_1.0.1       
## [37] tools_3.5.1      magrittr_1.5     lazyeval_0.2.1   crayon_1.3.4    
## [41] pkgconfig_2.0.2  xml2_1.2.0       lubridate_1.7.4  assertthat_0.2.0
## [45] rmarkdown_1.10   httr_1.3.1       rstudioapi_0.8   R6_2.3.0        
## [49] nlme_3.1-137     compiler_3.5.1

Which version of tidyverse do you have installed?

ANSWER:

SOLUTION: 1.2.1
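You can also ask R directly for the version of a single package with `packageVersion()`, which saves scanning the whole `sessionInfo()` output. The following is a minimal sketch. The `requireNamespace()` guard is an addition that keeps the code from stopping with an error if tidyverse is not installed:

```r
# Run this once in the console (not in the R Markdown file),
# so the package is not reinstalled every time you knit:
# install.packages("tidyverse")

# Ask for the installed version directly instead of scanning sessionInfo().
if (requireNamespace("tidyverse", quietly = TRUE)) {
  tidyverse_version <- packageVersion("tidyverse")
  print(tidyverse_version)
} else {
  tidyverse_version <- NULL
  message('tidyverse is not installed; run install.packages("tidyverse") in the console.')
}
```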


2. Rolling dice in R (4 points)

a. (1 point)

Install the package TeachingDemos. Then uncomment the code below and change "yourLastName" to your last name, written in all lowercase letters.

# UNCOMMENT THE CODE AND CHANGE YOUR LASTNAME

#library(TeachingDemos)

#lastname <- "yourLastName"
# SOLUTION:

library(TeachingDemos)

lastname <- "bayes"

b. (1 point)

Roll three six-sided dice by uncommenting the code below.

# UNCOMMENT THE CODE

#char2seed(lastname)

#dice(rolls = 1, ndice = 3, sides = 6, plot.it = TRUE)
# SOLUTION:

char2seed(lastname)

dice(rolls = 1, ndice = 3, sides = 6, plot.it = TRUE)

What values did the dice show?

ANSWER:

SOLUTION: 5, 4, 3 (any values are correct as long as they match the plot)

c. (1 point)

Roll the dice again.

# UNCOMMENT THE CODE

#dice(rolls = 1, ndice = 3, sides = 6, plot.it = TRUE)
# SOLUTION:

dice(rolls = 1, ndice = 3, sides = 6, plot.it = TRUE)

What values did the dice show this time?

ANSWER:

SOLUTION: 6, 5, 2 (these may differ if the code chunks were not run in order; no points deducted)

d. (1 point)

Reset the random seed, then roll the dice again.

# UNCOMMENT THE CODE

#char2seed(lastname)

#dice(rolls = 1, ndice = 3, sides = 6, plot.it = TRUE)
# SOLUTION:

char2seed(lastname)

dice(rolls = 1, ndice = 3, sides = 6, plot.it = TRUE)

What values did the dice show this time? Do you think R generates truly random numbers?

ANSWER:

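SOLUTION: 5, 4, 3, the same values as the first roll. No: R generates pseudo-random numbers, which are fully determined by the seed, so resetting the seed reproduces the same sequence.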
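You can reproduce the same seed behaviour without TeachingDemos. The following base R sketch makes two substitutions. It uses `set.seed()` with an arbitrary seed (42, chosen only for illustration) in place of `char2seed(lastname)`, and `sample()` in place of `dice()`. Because of these substitutions, the numbers will not match the rolls above:

```r
# Three rolls of three six-sided dice, mirroring parts b-d.
# set.seed(42) stands in for char2seed(lastname); 42 is arbitrary.
set.seed(42)
roll_b <- sample(1:6, size = 3, replace = TRUE)  # first roll after setting the seed
roll_c <- sample(1:6, size = 3, replace = TRUE)  # continues the same pseudo-random stream

set.seed(42)                                     # reset the seed ...
roll_d <- sample(1:6, size = 3, replace = TRUE)  # ... so the first roll is reproduced

print(rbind(roll_b, roll_c, roll_d))
print(identical(roll_b, roll_d))
```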


3. Tibbles (tidy tables) (6 points)

The iris data set comes with base R. You can read about this data set by running ?iris in the console. It is a data frame. In this course, we prefer to use tibbles (tidy tables) instead of data frames.

a. (1 point)

Convert the iris data frame into a tibble using as_tibble(). Put this in a new variable called iris_tibble. Then print the tibble using the print() function.

# YOUR CODE HERE
# SOLUTION:

iris_tibble <- as_tibble(iris)
print(iris_tibble)
## # A tibble: 150 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
##  1          5.1         3.5          1.4         0.2 setosa 
##  2          4.9         3            1.4         0.2 setosa 
##  3          4.7         3.2          1.3         0.2 setosa 
##  4          4.6         3.1          1.5         0.2 setosa 
##  5          5           3.6          1.4         0.2 setosa 
##  6          5.4         3.9          1.7         0.4 setosa 
##  7          4.6         3.4          1.4         0.3 setosa 
##  8          5           3.4          1.5         0.2 setosa 
##  9          4.4         2.9          1.4         0.2 setosa 
## 10          4.9         3.1          1.5         0.1 setosa 
## # ... with 140 more rows

Which data type is the variable “Species”? How do you know?

ANSWER:

SOLUTION: Species is a factor, as shown by the <fct> label under its column name.
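The <fct> label corresponds to R's `factor` class. A factor is a categorical variable, stored as integer codes plus a set of labels called levels. You can confirm this on the original data frame with base R, as in this minimal sketch:

```r
# A factor: integer codes under the hood, with one label (level) per category.
species_class  <- class(iris$Species)
species_levels <- levels(iris$Species)

print(species_class)    # "factor"
print(species_levels)   # "setosa" "versicolor" "virginica"
```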

b. (1 point)

Starting from the complete iris data set, keep only the flowers with a sepal length of at least 4.5 cm. Do this by piping (%>%) iris_tibble to the filter() function. Hint: in RStudio you can type the pipe quickly with the shortcut Ctrl/Cmd + Shift + M.

# YOUR CODE HERE
# SOLUTION:

iris_tibble %>%
  filter(Sepal.Length >= 4.5)
## # A tibble: 146 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
##  1          5.1         3.5          1.4         0.2 setosa 
##  2          4.9         3            1.4         0.2 setosa 
##  3          4.7         3.2          1.3         0.2 setosa 
##  4          4.6         3.1          1.5         0.2 setosa 
##  5          5           3.6          1.4         0.2 setosa 
##  6          5.4         3.9          1.7         0.4 setosa 
##  7          4.6         3.4          1.4         0.3 setosa 
##  8          5           3.4          1.5         0.2 setosa 
##  9          4.9         3.1          1.5         0.1 setosa 
## 10          5.4         3.7          1.5         0.2 setosa 
## # ... with 136 more rows

How many data points (i.e. flowers) are left? How do you know?

ANSWER:

SOLUTION: 146, because the tibble header reports 146 x 5 and each row is one data point (one flower).
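You can also compute the count instead of reading it off the header. Note that `iris_tibble %>% filter(cond)` is the same call as `filter(iris_tibble, cond)`. The following sketch is only a cross-check: it swaps in base R logical indexing for dplyr, so it runs without any packages:

```r
# Logical indexing: keep the rows where the condition is TRUE.
long_sepals <- iris[iris$Sepal.Length >= 4.5, ]

print(nrow(long_sepals))              # flowers kept: 146
print(sum(iris$Sepal.Length < 4.5))   # flowers dropped: 4
```

With dplyr loaded, you get the same number by piping the filtered tibble on into `nrow()`.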

c. (1 point)

Starting from the complete iris data set, create a new variable called petal_area, defined as petal width times petal length. Do this by piping iris_tibble to mutate().

# YOUR CODE HERE
# SOLUTION:

iris_tibble %>%
  mutate(petal_area = Petal.Width * Petal.Length)
## # A tibble: 150 x 6
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species petal_area
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>        <dbl>
##  1          5.1         3.5          1.4         0.2 setosa       0.280
##  2          4.9         3            1.4         0.2 setosa       0.280
##  3          4.7         3.2          1.3         0.2 setosa       0.26 
##  4          4.6         3.1          1.5         0.2 setosa       0.3  
##  5          5           3.6          1.4         0.2 setosa       0.280
##  6          5.4         3.9          1.7         0.4 setosa       0.68 
##  7          4.6         3.4          1.4         0.3 setosa       0.42 
##  8          5           3.4          1.5         0.2 setosa       0.3  
##  9          4.4         2.9          1.4         0.2 setosa       0.280
## 10          4.9         3.1          1.5         0.1 setosa       0.15 
## # ... with 140 more rows
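`mutate()` computes the new column element-wise, producing one value per row. It also returns a new table: the call above only prints the result and leaves iris_tibble unchanged unless you assign it. For comparison, here is a sketch using the base R function `transform()`. This swaps in a different function from the `mutate()` the exercise asks for:

```r
# transform() adds petal_area to a copy of iris; iris itself is not modified.
iris_area <- transform(iris, petal_area = Petal.Width * Petal.Length)

print(head(iris_area$petal_area))     # first row: 0.2 * 1.4 = 0.28
print("petal_area" %in% names(iris))  # FALSE: the original is unchanged
```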

d. (1 point)

Find the mean sepal length for each species. Do this by piping iris_tibble to group_by() and then to summarise(). For details, read the help page for summarise().

# YOUR CODE HERE
# SOLUTION:

iris_tibble %>%
  group_by(Species) %>%
  summarise(mean_sepal_length = mean(Sepal.Length))
## # A tibble: 3 x 2
##   Species    mean_sepal_length
##   <fct>                  <dbl>
## 1 setosa                  5.01
## 2 versicolor              5.94
## 3 virginica               6.59

What is the mean sepal length for virginica?

ANSWER:

SOLUTION: 6.588
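The tibble printout rounds numbers to three significant digits. That is why it shows 6.59, while the exact mean is 6.588. The following base R sketch uses `tapply()`, which applies `mean()` to each species group much as `group_by()` plus `summarise()` does, and shows the unrounded values:

```r
# tapply() splits Sepal.Length by Species and applies mean() to each group.
mean_sepal_length <- tapply(iris$Sepal.Length, iris$Species, mean)

print(mean_sepal_length)   # setosa 5.006, versicolor 5.936, virginica 6.588
```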

e. (2 points)

Starting from the complete iris data set, keep only the flowers that are either ‘versicolor’ or ‘virginica’ and have a petal width between 1.5 and 2.0 cm (inclusive). Hint: read the help pages for %in% and between().

# YOUR CODE HERE
# SOLUTION:

iris_tibble %>%
  filter(Species %in% c("virginica", "versicolor"),
         between(Petal.Width, 1.5, 2.0))
## # A tibble: 41 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species   
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>     
##  1          6.4         3.2          4.5         1.5 versicolor
##  2          6.9         3.1          4.9         1.5 versicolor
##  3          6.5         2.8          4.6         1.5 versicolor
##  4          6.3         3.3          4.7         1.6 versicolor
##  5          5.9         3            4.2         1.5 versicolor
##  6          5.6         3            4.5         1.5 versicolor
##  7          6.2         2.2          4.5         1.5 versicolor
##  8          5.9         3.2          4.8         1.8 versicolor
##  9          6.3         2.5          4.9         1.5 versicolor
## 10          6.7         3            5           1.7 versicolor
## # ... with 31 more rows

How many data points (i.e. flowers) are left? How do you know?

ANSWER:

SOLUTION: 41, because the tibble header reports 41 x 5 and each row is one data point (one flower).
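Here is how the two helpers work. `%in%` checks, element by element, whether a value appears in a set. `between(x, left, right)` is shorthand for `x >= left & x <= right`. When `filter()` receives several conditions separated by commas, it keeps only the rows where all of them are `TRUE`. The following base R sketch writes out the same two conditions and combines them with `&`:

```r
# %in% on a toy vector: which elements belong to the set?
print(c("setosa", "virginica", "other") %in% c("virginica", "versicolor"))
# FALSE TRUE FALSE

# The two conditions from the exercise, written out in base R.
species_ok <- iris$Species %in% c("virginica", "versicolor")
width_ok   <- iris$Petal.Width >= 1.5 & iris$Petal.Width <= 2.0  # inclusive bounds

n_kept <- sum(species_ok & width_ok)   # TRUE counts as 1
print(n_kept)                          # 41
```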


4. Plotting data (4 points)

a. (2 points)

Using the iris data set, create a scatterplot with sepal width on the x axis and sepal length on the y axis using ggplot(). Show each species in a different colour.

# YOUR CODE HERE
# SOLUTION:

iris_tibble %>%
  ggplot(mapping = aes(x = Sepal.Width,
                       y = Sepal.Length,
                       color = Species)) +
    geom_point()

Which species stands out visually? Why?

ANSWER:

SOLUTION: Setosa, because its points form a separate cluster: setosa flowers have shorter but wider sepals than the other two species.
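You can check the visual impression numerically. The following base R sketch uses `aggregate()` to average both sepal measurements per species:

```r
# Mean sepal width and length for each species (formula interface of aggregate()).
sepal_means <- aggregate(cbind(Sepal.Width, Sepal.Length) ~ Species,
                         data = iris, FUN = mean)
print(sepal_means)
```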

b. (2 points)

Using the iris data set, create a scatterplot with petal width on the x axis and petal length on the y axis. Map the point size to sepal width and the point colour to sepal length. Use ggplot().

# YOUR CODE HERE
# SOLUTION:

iris_tibble %>%
  ggplot(mapping = aes(x = Petal.Width,
                       y = Petal.Length,
                       size = Sepal.Width,
                       colour = Sepal.Length)) +
    geom_point()

What do you notice about the relationship between petal length and sepal length?

ANSWER:

SOLUTION: Short petals seem to go with short sepals and long petals with long sepals.
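To put a number on "short goes with short, long goes with long", you can compute the Pearson correlation between the two measurements. Values near 1 indicate a strong positive linear relationship. This is a minimal base R sketch:

```r
# Pearson correlation between petal length and sepal length across all 150 flowers.
r_petal_sepal <- cor(iris$Petal.Length, iris$Sepal.Length)
print(round(r_petal_sepal, 2))   # about 0.87: a strong positive association
```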


End of homework sheet