7.3 Conditional statements

x * y does not apply any logic. It merely takes the value of x and multiplies it by the value of y. Conditional statements are how you inject some logic into your code. The most commonly used conditional statement is if. Whenever you see an if statement, read it as ‘If X is TRUE, do a thing’. Including an else statement simply extends the logic to ‘If X is TRUE, do a thing, or else do something different’.

Both the if and else statements allow you to run sections of code, depending on a condition is either TRUE or FALSE. The pseudocode below shows you the general form.

  if (condition) {
  Code executed when condition is TRUE
  } else {
  Code executed when condition is FALSE
  }

To delve into this a bit more, we can use an old programmer joke to set up a problem.

A programmer’s partner says: ‘Please go to the store and buy a carton of milk and if they have eggs, get six.’

The programmer returned with 6 cartons of milk.

When the partner sees this, and exclaims ‘Why the heck did you buy six cartons of milk?’

The programmer replied ‘They had eggs’.

At the risk of explaining a joke, the conditional statement here is whether or not the store had eggs. If coded as per the original request, the programmer should bring 6 cartons of milk if the store had eggs (condition = TRUE), or else bring 1 carton of milk if there weren’t any eggs (condition = FALSE). In R this is coded as:

eggs <- TRUE # Whether there were eggs in the store

  if (eggs == TRUE) { # If there are eggs
  n.milk <- 6 # Get 6 cartons of milk
    } else { # If there are not eggs
  n.milk <- 1 # Get 1 carton of milk
  }

We can then check n.milk to see how many milk cartons they returned with.

n.milk
## [1] 6

And just like the joke, our R code has missed that the condition was to determine whether or not to buy eggs, not more milk (this is actually a loose example of the Winograd Scheme, designed to test the intelligence of artificial intelligence by whether it can reason what the intended referent of a sentence is).

We could code the exact same egg-milk joke conditional statement using an ifelse() function.

eggs <- TRUE
n.milk <- ifelse(eggs == TRUE, yes = 6, no = 1)

This ifelse() function is doing exactly the same as the more fleshed out version from earlier, but is now condensed down into a single line of code. It has the added benefit of working on vectors as opposed to single values (more on this later when we introduce loops). The logic is read in the same way; “If there are eggs, assign a value of 6 to n.milk, if there isn’t any eggs, assign the value 1 to n.milk”.

We can check again to make sure the logic is still returning 6 cartons of milk:

n.milk
## [1] 6

Currently we’d have to copy and paste code if we wanted to change if eggs were in the store or not. We learned above how to avoid lots of copy and pasting by creating a function. Just as with the simple x * y expression in our previous multiply_columns() function, the logical statements above are straightforward to code and well suited to be turned into a function. How about we do just that and wrap this logical statement up in a function?

milk <- function(eggs) {
  if (eggs == TRUE) {
    6
  } else {
    1
  }
}

We’ve now created a function called milk() where the only argument is eggs. The user of the function specifies if eggs is either TRUE or FALSE, and the function will then use a conditional statement to determine how many cartons of milk are returned.

Let’s quickly try:

milk(eggs = TRUE)
## [1] 6

And the joke is maintained. Notice in this case we have actually specified that we are fulfilling the eggs argument (eggs = TRUE). In some functions, as with ours here, when a function only has a single argument we can be lazy and not name which argument we are fulfilling. In reality, it’s generally viewed as better practice to explicitly state which arguments you are fulfilling to avoid potential mistakes.

OK, lets go back to the multiply_columns() function we created above and explain how we’ve used conditional statements to warn the user if NA values are produced when we multiple any two columns together.

multiply_columns <- function(x, y) {
  temp_var <- x * y
  if (any(is.na(temp_var))) {
    warning("The function has produced NAs")
    return(temp_var)
  } else {
    return(temp_var)
  }
}

In this new version of the function we still use x * y as before but this time we’ve assigned the values from this calculation to a temporary vector called temp_var so we can use it in our conditional statements. Note, this temp_var variable is local to our function and will not exist outside of the function due something called R’s scoping rules. We then use an if statement to determine whether our temp_var variable contains any NA values. The way this works is that we first use the is.na() function to test whether each value in our temp_var variable is an NA. The is.na() function returns TRUE if the value is an NA and FALSE if the value isn’t an NA. We then nest the is.na(temp_var) function inside the function any() to test whether any of the values returned by is.na(temp_var) are TRUE. If at least one value is TRUE the any() function will return a TRUE. So, if there are any NA values in our temp_var variable the condition for the if() function will be TRUE whereas if there are no NA values present then the condition will be FALSE. If the condition is TRUE the warning() function generates a warning message for the user and then returns the temp_var variable. If the condition is FALSE the code below the else statement is executed which just returns the temp_var variable.

So if we run our modified multiple_columns() function on the columns city$aberdeen and city$nairobi (which contains NAs) we will receive an warning message.

aberdeen_nairobi_func <- multiply_columns(city$aberdeen, city$nairobi)
## Warning in multiply_columns(city$aberdeen, city$nairobi): The function has
## produced NAs

Whereas if we multiple two columns that don’t contain NA values we don’t receive a warning message.

porto_aberdeen_func <- multiply_columns(city$porto, city$aberdeen)