7.3 Conditional statements
x * y does not apply any logic. It merely takes the value of
x and multiplies it by the value of
y. Conditional statements are how you inject some logic into your code. The most commonly used conditional statement is
if. Whenever you see an
if statement, read it as ‘If X is TRUE, do a thing’. Including an
else statement simply extends the logic to ‘If X is TRUE, do a thing, or else do something different’.
Both the if and else statements allow you to run sections of code depending on whether a condition is either TRUE or
FALSE. The pseudocode below shows you the general form.
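As a rough sketch, the general form looks like the following. The names condition and outcome are placeholders invented for illustration, and the assignments stand in for whatever code you would actually run in each branch.

```r
condition <- TRUE  # placeholder: any expression that evaluates to a single TRUE or FALSE

if (condition) {
  # code executed when condition is TRUE
  outcome <- "ran the if branch"
} else {
  # code executed when condition is FALSE
  outcome <- "ran the else branch"
}

outcome
```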
To delve into this a bit more, we can use an old programmer joke to set up a problem.
A programmer’s partner says: ‘Please go to the store and buy a carton of milk and if they have eggs, get six.’
The programmer returned with 6 cartons of milk.
When the partner saw this, they exclaimed ‘Why the heck did you buy six cartons of milk?’
The programmer replied ‘They had eggs.’
At the risk of explaining a joke, the conditional statement here is whether or not the store had eggs. If coded as per the original request, the programmer should bring 6 cartons of milk if the store had eggs (condition = TRUE), or else bring 1 carton of milk if there weren’t any eggs (condition = FALSE). In R this is coded as:
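One way to write this, assuming a logical variable called eggs records whether the store had eggs and n.milk holds the number of cartons bought:

```r
eggs <- TRUE  # assumption: whether there were eggs in the store

if (eggs == TRUE) {  # if there are eggs
  n.milk <- 6        # get 6 cartons of milk
} else {             # if there are no eggs
  n.milk <- 1        # get 1 carton of milk
}
```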
We can then check
n.milk to see how many milk cartons they returned with.
And just like the joke, our R code has missed that the condition was meant to determine whether or not to buy eggs, not more milk (this is a loose example of a Winograd schema, a test of machine intelligence based on whether a system can work out the intended referent in an ambiguous sentence).
We could code the exact same egg-milk joke conditional statement using the
ifelse() function. The ifelse() function does exactly the same as the more fleshed-out version from earlier, but condenses it into a single line of code. It has the added benefit of working on vectors as opposed to single values (more on this later when we introduce loops). The logic is read in the same way: ‘If there are eggs, assign a value of 6 to
n.milk; if there aren’t any eggs, assign the value 1 to n.milk’.
We can check again to make sure the logic is still returning 6 cartons of milk:
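A minimal sketch of the ifelse() version, again assuming a logical variable eggs. The vector eggs_by_store is a made-up example added only to show the vectorised behaviour.

```r
eggs <- TRUE  # assumption: whether there were eggs in the store

# ifelse(test, yes, no): returns `yes` where test is TRUE, `no` where it is FALSE
n.milk <- ifelse(eggs == TRUE, yes = 6, no = 1)
n.milk  # 6

# Unlike if, ifelse() evaluates every element of a vector
eggs_by_store <- c(TRUE, FALSE, TRUE)  # hypothetical: three different stores
milk_by_store <- ifelse(eggs_by_store, yes = 6, no = 1)
milk_by_store  # 6 1 6
```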
Currently we’d have to copy and paste code if we wanted to change whether eggs were in the store or not. We learned above how to avoid lots of copying and pasting by creating a function. Just as with the simple
x * y expression in our previous
multiply_columns() function, the logical statements above are straightforward to code and well suited to be turned into a function. How about we do just that and wrap this logical statement up in a function?
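Wrapped up as a function, the logic might look like the sketch below. It returns the number of cartons rather than assigning to n.milk, which is one reasonable design among several.

```r
milk <- function(eggs) {
  if (eggs == TRUE) {
    6  # eggs available: return 6 cartons
  } else {
    1  # no eggs: return 1 carton
  }
}
```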
We’ve now created a function called
milk() where the only argument is
eggs. The user of the function specifies whether eggs is either TRUE or
FALSE, and the function will then use a conditional statement to determine how many cartons of milk are returned.
Let’s quickly try:
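For example, assuming milk() is defined along the lines described above (it is repeated here in compact form so that the snippet runs on its own):

```r
# Compact restatement of the milk() function so this snippet is self-contained
milk <- function(eggs) {
  if (eggs == TRUE) 6 else 1
}

milk(eggs = TRUE)   # 6: argument named explicitly
milk(eggs = FALSE)  # 1
milk(TRUE)          # 6: same call with the argument matched by position
```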
And the joke is maintained. Notice in this case we have actually specified that we are fulfilling the
eggs argument (
eggs = TRUE)? When a function has only a single argument, as ours does here, we can be lazy and not name the argument we are fulfilling. In reality, it’s generally viewed as better practice to explicitly state which arguments you are fulfilling to avoid potential mistakes.
OK, let’s go back to the
multiply_columns() function we created above and explain how we’ve used conditional statements to warn the user if
NA values are produced when we multiply any two columns together.
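A sketch of what such a function could look like is given here. The wording of the warning message is an assumption made for illustration.

```r
multiply_columns <- function(x, y) {
  temp_var <- x * y                 # store the result so it can be tested
  if (any(is.na(temp_var))) {       # TRUE if at least one value is NA
    warning("The function has produced NAs")  # illustrative message text
    return(temp_var)
  } else {
    return(temp_var)
  }
}
```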
In this new version of the function we still use
x * y as before but this time we’ve assigned the values from this calculation to a temporary vector called
temp_var so we can use it in our conditional statements. Note, this
temp_var variable is local to our function and will not exist outside of the function due to R’s scoping rules. We then use an
if statement to determine whether our
temp_var variable contains any
NA values. The way this works is that we first use the
is.na() function to test whether each value in our
temp_var variable is an NA. The is.na() function returns TRUE if the value is an NA and FALSE if the value isn’t an
NA. We then nest the
is.na(temp_var) call inside the function
any() to test whether any of the values returned by is.na(temp_var) is
TRUE. If at least one value is TRUE, the
any() function will return
TRUE. So, if there are any
NA values in our
temp_var variable the condition for the
if statement will be
TRUE whereas if there are no
NA values present then the condition will be
FALSE. If the condition is TRUE, the
warning() function generates a warning message for the user and the function then returns the
temp_var variable. If the condition is
FALSE, the code below the
else statement is executed, which just returns the temp_var variable.
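To see how the nested test behaves, here is a small sketch using a made-up vector in place of the result of x * y:

```r
temp_var <- c(4, NA, 10)  # hypothetical values standing in for x * y

na_flags <- is.na(temp_var)
na_flags                  # FALSE TRUE FALSE: one flag per element

any(na_flags)             # TRUE: at least one element is NA
any(is.na(c(4, 6, 10)))   # FALSE: no NA values present
```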
So if we run our modified
multiply_columns() function on a pair of columns that includes
city$nairobi (which contains
NAs), we will receive a warning message.
Whereas if we multiply two columns that don’t contain
NA values, we don’t receive a warning message.
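Both cases can be illustrated with a small stand-in data frame. The values below, and the porto and aberdeen column names, are invented for this sketch, and the function is repeated (with an assumed warning message) so that the snippet runs on its own.

```r
# Stand-in data: values are made up; only nairobi contains an NA
city <- data.frame(
  porto    = c(22, 18, 30),
  aberdeen = c(12, 16, 10),
  nairobi  = c(20, NA, 25)
)

multiply_columns <- function(x, y) {
  temp_var <- x * y
  if (any(is.na(temp_var))) {
    warning("The function has produced NAs")  # illustrative message text
  }
  temp_var
}

# One column contains an NA, so a warning is raised and the result includes NA
with_na <- multiply_columns(x = city$porto, y = city$nairobi)

# Neither column contains an NA, so no warning is raised
without_na <- multiply_columns(x = city$porto, y = city$aberdeen)
without_na  # 264 288 300
```

This version calls warning() inside a plain if and returns temp_var once at the end, which behaves the same as returning it from both branches of an if/else.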