2.3 Functions

2.3.1 Some important built-in functions

Many helpful functions are defined in base R or supplied by packages. We recommend browsing the Cheat Sheets every now and then to pick up more useful stuff for your inventory. Here are some functions that are very basic and generally useful.

2.3.1.1 Standard logic

  • &: “and”
  • |: “or”
  • !: “not”
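  • negate(): negates a predicate function, e.g. negate(is.na) returns a function that is TRUE for non-missing values (from purrr; for a pipe-friendly ! applied to values, magrittr offers not(); see Section 2.5 for more on piping)
  • all(): returns TRUE if all elements of a logical vector are TRUE
  • any(): returns TRUE if at least one element of a logical vector is TRUE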
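The operators above work element-wise on logical vectors, while all() and any() collapse a whole vector into a single truth value. The following base-R sketch illustrates this. The helper is_even is hypothetical, and base R's Negate() is used here as a stand-in for purrr's negate(), since both turn a predicate function into its opposite.

```r
# a small logical vector to play with
v <- c(TRUE, FALSE, TRUE)

v & TRUE      # element-wise "and": TRUE FALSE TRUE
v | FALSE     # element-wise "or":  TRUE FALSE TRUE
!v            # element-wise "not": FALSE TRUE FALSE
all(v)        # FALSE, because one element is FALSE
any(v)        # TRUE, because at least one element is TRUE

# negating a *function* (a predicate) rather than a value:
is_even <- function(n) n %% 2 == 0   # hypothetical helper predicate
is_odd  <- Negate(is_even)           # base R counterpart of purrr's negate()
is_odd(1:4)   # TRUE FALSE TRUE FALSE
```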

2.3.1.2 Comparisons

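  • <: less than
  • >: greater than
  • ==: equal (for floating-point numbers, dplyr's near() is safer than == because it allows a small tolerance, e.g. near(sqrt(2)^2, 2) returns TRUE while sqrt(2)^2 == 2 returns FALSE)
  • >=: greater than or equal to
  • <=: less than or equal to
  • !=: not equal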
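Why a tolerance matters: floating-point arithmetic is not exact, so == can report FALSE for numbers that are mathematically equal. The sketch below uses a hypothetical helper near_sketch() that compares with a small tolerance, similar in spirit to dplyr::near(); the tolerance value is an assumption chosen to mirror near()'s documented default.

```r
# floating-point arithmetic is not exact, so == can surprise you
sqrt(2)^2 == 2          # FALSE
0.1 + 0.2 == 0.3        # FALSE

# hypothetical tolerance-based comparison, similar in spirit to dplyr::near();
# the default tolerance is assumed to match near()'s default
near_sketch <- function(x, y, tol = .Machine$double.eps^0.5) {
  abs(x - y) < tol
}
near_sketch(sqrt(2)^2, 2)     # TRUE
near_sketch(0.1 + 0.2, 0.3)   # TRUE
near_sketch(1, 1.1)           # FALSE: the difference exceeds the tolerance
```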

2.3.1.3 Set theory

  • %in%: for each element of the left-hand vector, whether it occurs in the right-hand vector
  • union(x, y): union of x and y
  • intersect(x, y): intersection of x and y
  • setdiff(x, y): all elements in x that are not in y
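A quick illustration of the set operations on two small numeric vectors (the vectors are arbitrary example data). Note that %in% returns one answer per element of its left-hand side, and that setdiff() is not symmetric.

```r
x <- c(1, 2, 3, 4)
y <- c(3, 4, 5)

x %in% y          # FALSE FALSE TRUE TRUE (one answer per element of x)
union(x, y)       # 1 2 3 4 5
intersect(x, y)   # 3 4
setdiff(x, y)     # 1 2
setdiff(y, x)     # 5 (order of the arguments matters)
```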

2.3.1.4 Sampling and combinatorics

  • runif(n): n random numbers drawn uniformly from the unit interval [0, 1]
  • sample(x, size, replace): take size samples from x (with replacement if replace is TRUE)
  • choose(n, k): number of subsets of size k out of a set of size n (binomial coefficient)
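The sketch below exercises these functions. The seed value is arbitrary; it only makes the random draws reproducible, so no specific drawn values are claimed here. The die-rolling and permutation examples are illustrations, and combn() is used only to confirm choose() by listing all subsets explicitly.

```r
set.seed(123)   # arbitrary seed, makes the random draws reproducible

u <- runif(3)   # three draws from the uniform distribution on [0, 1]
u

# ten rolls of a fair six-sided die: sampling *with* replacement
rolls <- sample(1:6, size = 10, replace = TRUE)
rolls

# a random ordering of 1:5: sampling *without* replacement (the default)
perm <- sample(1:5, size = 5)
perm

# binomial coefficient: number of ways to choose 2 items out of 5
choose(5, 2)          # 10
ncol(combn(5, 2))     # 10, each column of combn() is one 2-element subset
```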

2.3.2 Defining your own functions

If you find yourself in a situation in which you would like to copy-paste some code, possibly with minor amendments, this usually means that you should wrap some recurring operations into a custom-defined function. There are two ways of defining your own functions: as a named function, or an anonymous function.

2.3.2.1 Named functions

The special operator supplied by base R to create new functions is the keyword function. Here is an example of defining a new function with two input variables x and y that returns a computation based on these numbers. We assign the newly created function to the variable cool_function so that we can use this name to call the function later. Notice that the return keyword is optional here: if it is left out, the value of the last evaluated expression is returned.

# define a new function
# - takes two numbers x & y as argument
# - returns x * y + 1
cool_function <- function(x, y) {
  return(x * y + 1)   
}

# apply `cool_function` to some numbers:
cool_function(3, 3)     # returns 10
cool_function(1, 1)     # returns 2
cool_function(1:2, 1)   # returns vector [2,3]
cool_function(1)        # throws error: 'argument "y" is missing, with no default'
cool_function()         # throws error: 'argument "x" is missing, with no default'
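To illustrate the remark that return is optional, here is the same computation without it, together with a case where return() is still useful: leaving a function early. The names cool_function_implicit and safe_divide are hypothetical and chosen only for this example.

```r
# same computation, but without an explicit `return()`:
# the value of the last evaluated expression is returned
cool_function_implicit <- function(x, y) {
  x * y + 1
}
cool_function_implicit(3, 3)   # 10

# return() is still handy for exiting early (a "guard clause")
safe_divide <- function(x, y) {   # hypothetical example function
  if (y == 0) {
    return(NA)                    # leave the function right here
  }
  x / y
}
safe_divide(6, 3)   # 2
safe_divide(1, 0)   # NA
```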

We can give default values for the parameters passed to a function:

# same function as before but with
# default values for each argument
cool_function_2 <- function(x = 2, y = 3) {
  return(x * y + 1)
}

# apply `cool_function_2` to some numbers:
cool_function_2(3, 3)     # returns 10
cool_function_2(1, 1)     # returns 2
cool_function_2(1:2, 1)   # returns vector [2,3]
cool_function_2(1)        # returns 4 (= 1 * 3 + 1)
cool_function_2()         # returns 7 (= 2 * 3 + 1)
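Default values combine naturally with named arguments: naming an argument lets us override only that one and keep the default for the other, and the order of named arguments does not matter.

```r
cool_function_2 <- function(x = 2, y = 3) {
  return(x * y + 1)
}

cool_function_2(y = 10)        # 21 (= 2 * 10 + 1): x keeps its default
cool_function_2(y = 1, x = 5)  # 6  (= 5 * 1 + 1): order is irrelevant when naming
```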

Exercise 2.10

Create a function called bigger_100 which takes two numbers as input and outputs 0 if their product is less than or equal to 100, and 1 otherwise. (Hint: remember that you can cast a Boolean value to an integer with as.integer.)

bigger_100 <- function(x, y) {
  return(as.integer(x * y > 100))
}
bigger_100(40, 3)
## [1] 1
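Two details of this solution are worth checking: a product of exactly 100 yields 0 (since the comparison is a strict >), and because * and > are vectorized, the function also works element-wise on vectors.

```r
bigger_100 <- function(x, y) {
  return(as.integer(x * y > 100))
}

bigger_100(10, 10)          # 0, since 100 is not greater than 100
bigger_100(c(10, 20), 10)   # 0 1, because the comparison is vectorized
```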

2.3.2.2 Anonymous functions

Notice that we can feed functions as parameters to other functions. This is an important ingredient of a functional style of programming, and something that we will rely on heavily in this book (see Section 2.4). When supplying a function as an argument to another function, we might not want to name the function that is passed. Here’s a (contrived, but hopefully illustrative) example.

We first define the named function new_applier_function which takes two arguments as input: an input vector, which is locally called input in the scope of the function’s body, and a function, which is locally called function_to_apply. Our new function new_applier_function first checks whether the input vector has more than one element, throws an error if not, and otherwise applies the argument function function_to_apply to the vector input.

# define a function that takes a vector and a function as an argument
new_applier_function <- function(input, function_to_apply) {
  # check if input vector has at least 2 elements
  if (length(input) <= 1) {
    # terminate and show informative error message
    stop("Error in 'new_applier_function': input vector has length <= 1.")
  } 
  # otherwise apply the function to the input vector
  return(function_to_apply(input))
}
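The guard clause in new_applier_function stops execution for inputs of length one or less. The following sketch repeats the definition so that it runs on its own, and uses base R's tryCatch() to catch that error and inspect its message instead of letting it halt the script.

```r
new_applier_function <- function(input, function_to_apply) {
  if (length(input) <= 1) {
    stop("Error in 'new_applier_function': input vector has length <= 1.")
  }
  return(function_to_apply(input))
}

# a valid call: the input has more than one element
new_applier_function(input = c(4, 9), function_to_apply = max)   # 9

# an invalid call: catch the error and keep only its message
msg <- tryCatch(
  new_applier_function(input = 5, function_to_apply = sum),
  error = function(e) conditionMessage(e)
)
msg
```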

We use this new function to show the difference between named and unnamed functions, in particular why the latter can be very handy and elegant. First, we consider a case where we use new_applier_function in connection with the named built-in function sum:

# sum vector with built-in & named function
new_applier_function(
  input = 1:3,              # input vector 
  function_to_apply = sum   # built-in & named function to apply
)   # returns 6

If, instead of an existing named function, we want to supply a new function to new_applier_function, we could define that function first and give it a name; but if we only need it “in situ” for a single call of new_applier_function, we can also write this:

# Sum vector with anonymous function
new_applier_function(
  input = 1:3,              # input vector 
  function_to_apply = function(in_vec) {
    return(in_vec[1] + in_vec[2])
  } 
)   # returns 3 (as it only sums the first two elements)
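Assuming R version 4.1.0 or later, anonymous functions can also be written with the shorthand \(args) body, which is equivalent to function(args) body. The sketch below repeats the definition of new_applier_function so it runs on its own, and also shows an anonymous function passed to base R's sapply(), a common place where unnamed functions appear.

```r
new_applier_function <- function(input, function_to_apply) {
  if (length(input) <= 1) {
    stop("Error in 'new_applier_function': input vector has length <= 1.")
  }
  return(function_to_apply(input))
}

# R >= 4.1.0: `\(in_vec) ...` is shorthand for `function(in_vec) ...`
res <- new_applier_function(
  input = 1:3,
  function_to_apply = \(in_vec) in_vec[1] + in_vec[2]
)
res   # 3

# anonymous functions are also common with base R's apply family
squares <- sapply(1:3, function(i) i^2)
squares   # 1 4 9
```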

Exercise 2.11

How many arguments should you pass to a function that…

  1. …tells whether the sum of two numbers is even?

  2. …applies two different operations to a variable and sums the results? The operations are not fixed in the function.

  1. Two arguments.

  2. Three arguments.
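One possible way to write the two functions from this exercise, matching the argument counts above. The names sum_is_even and apply_two_and_sum are hypothetical; the second function receives the operations themselves as arguments, which is exactly what makes it need three.

```r
# (1) two arguments: the two numbers
sum_is_even <- function(a, b) {
  (a + b) %% 2 == 0
}
sum_is_even(3, 5)   # TRUE
sum_is_even(2, 5)   # FALSE

# (2) three arguments: the variable and the two operations
apply_two_and_sum <- function(x, op_1, op_2) {
  op_1(x) + op_2(x)
}
apply_two_and_sum(4, sqrt, function(v) v^2)   # 18 (= 2 + 16)
```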

Call the function new_applier_function with input = 1:3 and an anonymous function that returns just the first two elements of the input vector in reverse order (as a vector).

new_applier_function(
  input = 1:3,              # input vector 
  function_to_apply = function(in_vec) {
    return(in_vec[c(2, 1)])   # second element first, then the first
  } 
)   # returns vector [2,1]
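An equivalent anonymous function can be built from the base R helpers head(), which keeps the first elements of a vector, and rev(), which reverses a vector. The definition of new_applier_function is repeated so the sketch runs on its own.

```r
new_applier_function <- function(input, function_to_apply) {
  if (length(input) <= 1) {
    stop("Error in 'new_applier_function': input vector has length <= 1.")
  }
  return(function_to_apply(input))
}

result <- new_applier_function(
  input = 1:3,
  function_to_apply = function(in_vec) rev(head(in_vec, 2))   # take 2, then reverse
)
result   # 2 1
```

Indexing with c(2, 1) makes the desired order explicit, while rev(head(...)) generalizes more easily if we later want the first k elements reversed.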