# Instructions

• Work on your own. This homework should not be incredibly hard. If you need help, take a look at the suggested readings.
• Make sure you have R and RStudio installed. If you are an advanced user and aren’t using RStudio, make sure you have rmarkdown in order to ‘knit’ the HTML output.
• Download this R Markdown file. This is the file you will edit and submit.
• Open the .Rmd file in RStudio.
• Fill in the required code and answers.
• ‘Knit’ the document (ctrl/cmd + shift + K in RStudio) to produce an HTML file.
• Create a ZIP archive called “BDACM_HW1-LastnameFirstname.zip” containing:
  • an R Markdown file “BDACM_HW1-LastnameFirstname.Rmd”
  • a rendered HTML document “BDACM_HW1-LastnameFirstname.html”

• Total points: 16
• Points required to pass: 10
• For questions that have a coding part and an interpretation part, the total points are divided equally between the two.

• Suggested reading: R4DS (R for Data Science).

# Required R packages

• rmarkdown (comes with RStudio)
• tidyverse
• TeachingDemos

# 1. Installing and running R (2 points)

## a. (1 point)

Your first task is simply to show that you have been able to install and run R and R Markdown. You don’t have to change this code; just uncomment it. The correct output will then appear automatically when you ‘knit’ the document.

# UNCOMMENT THE CODE

#R.version

#sessionInfo()

Which version of R are you running? On which platform are you running it?

## b. (1 point)

Install the package tidyverse. Don’t install it in the code below; install it through the console instead. Then write code below to load the package and display the output of sessionInfo() again.

# YOUR CODE HERE

Which version of tidyverse do you have installed?

# 2. Rolling dice in R (4 points)

## a. (1 point)

Install the package TeachingDemos. Then uncomment the code below and replace "yourLastName" with your last name, written in all lowercase letters.

# UNCOMMENT THE CODE AND CHANGE YOUR LASTNAME

#library(TeachingDemos)

#lastname <- "yourLastName"

## b. (1 point)

Roll three six-sided dice by uncommenting the code below.

# UNCOMMENT THE CODE

#char2seed(lastname)

#dice(rolls = 1, ndice = 3, sides = 6, plot.it = TRUE)

What values did the dice show?

## c. (1 point)

Roll the dice again.

# UNCOMMENT THE CODE

#dice(rolls = 1, ndice = 3, sides = 6, plot.it = TRUE)

What values did the dice show this time?

## d. (1 point)

Roll the dice again, but first reset the random seed.

# UNCOMMENT THE CODE

#char2seed(lastname)

#dice(rolls = 1, ndice = 3, sides = 6, plot.it = TRUE)

What values did the dice show this time? Do you think R generates truly random numbers?

# 3. Tibbles (tidy tables) (6 points)

The iris data set comes with base R. You can read about this data set by running ?iris in the console. It is a data frame. In this course, we prefer to use tibbles (tidy tables) instead of data frames.

## a. (1 point)

Convert the iris data frame into a tibble using as_tibble(). Put this in a new variable called iris_tibble. Then print the tibble using the print() function.

# YOUR CODE HERE

Which data type is the variable “Species”? How do you know?

## b. (1 point)

Starting from the complete iris data set, keep only the flowers with a sepal length of at least 4.5 cm. Do this by piping (%>%) the iris_tibble to the filter() function. Hint: You can type the pipe quickly in RStudio with the shortcut ctrl/cmd + shift + M.
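If the pipe-and-filter pattern is new to you, the following minimal sketch shows it on a different built-in data set, mtcars, so that it does not answer the question for you. The names mtcars_tibble and efficient_cars and the threshold of 25 miles per gallon are chosen purely for illustration.

```r
library(tidyverse)

# mtcars ships with base R; convert it to a tibble first
mtcars_tibble <- as_tibble(mtcars)

# Pipe the tibble into filter() and keep only the rows
# where fuel economy is at least 25 miles per gallon
efficient_cars <- mtcars_tibble %>%
  filter(mpg >= 25)

print(efficient_cars)
```

The header line of the printed tibble reports its dimensions, which is one way to see how many rows survived the filter.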

# YOUR CODE HERE

How many data points (i.e. flowers) are left? How do you know?

## c. (1 point)

Starting from the complete iris data set, create a new variable called petal_area (the area of a petal = petal width times petal length). Do this by piping iris_tibble to mutate().

# YOUR CODE HERE

## d. (1 point)

Find out the mean sepal length for each species. Do this by piping iris_tibble to group_by() and then to summarise(). For instructions, read the help page for summarise().
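As a rough illustration of how group_by() and summarise() work together, here is a sketch on the built-in mtcars data set rather than iris. The names mtcars_tibble, mpg_by_cyl and mean_mpg are illustrative choices and not part of the assignment.

```r
library(tidyverse)

mtcars_tibble <- as_tibble(mtcars)

# group_by() marks the groups; summarise() then collapses each group
# to a single row, here holding the mean fuel economy per cylinder count
mpg_by_cyl <- mtcars_tibble %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

print(mpg_by_cyl)
```

The result has one row per distinct value of the grouping variable and one column per summary you define.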

# YOUR CODE HERE

What is the mean sepal length for virginica?

## e. (2 points)

Starting from the complete iris data set, keep only the flowers that are either ‘versicolor’ or ‘virginica’ and have a petal width between 1.5 and 2.0 cm (inclusive). Hint: read the help pages on %in% and between().
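The two helpers from the hint can be combined in a single filter() call. The sketch below demonstrates this on the built-in mtcars data set; the cylinder counts and horsepower range are arbitrary values picked for the example, and selected_cars is an illustrative name.

```r
library(tidyverse)

mtcars_tibble <- as_tibble(mtcars)

# x %in% set is TRUE when x matches any element of the set.
# between(x, left, right) is shorthand for x >= left & x <= right,
# so both end points are included.
# Conditions separated by commas in filter() are combined with AND.
selected_cars <- mtcars_tibble %>%
  filter(cyl %in% c(4, 6), between(hp, 100, 120))

print(selected_cars)
```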

# YOUR CODE HERE

How many data points (i.e. flowers) are left? How do you know?

# 4. Plotting data (4 points)

## a. (2 points)

Using the iris data set, create a scatterplot of sepal width (x axis) against sepal length (y axis) using ggplot(). Show each species in a different colour.
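For orientation, a coloured scatterplot in ggplot2 generally follows the pattern sketched below, shown here on the built-in mtcars data set instead of iris. The choice of variables and the name mpg_plot are assumptions made only for this example.

```r
library(tidyverse)

# Map variables to the x axis, the y axis and the colour inside aes().
# factor(cyl) makes ggplot treat the cylinder count as discrete groups,
# so each group gets its own colour and a legend entry.
mpg_plot <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point()

print(mpg_plot)
```

A column that is already a factor can be mapped to colour directly, without wrapping it in factor().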

# YOUR CODE HERE

Which species stands out visually? Why?

## b. (2 points)

Using the iris data set, create a scatterplot of petal width (x axis) against petal length (y axis). Vary the size of the points depending on the sepal width and the colour depending on the sepal length. Use ggplot().

# YOUR CODE HERE

What do you notice about the relationship between petal length and sepal length?