7.5 Loops

R is very good at performing repetitive tasks. If we want a set of operations to be repeated several times we use what’s known as a loop. When you create a loop, R will execute the instructions in the loop a specified number of times or until a specified condition is met. There are three main types of loop in R: the for loop, the while loop and the repeat loop.

Loops are one of the staples of all programming languages, not just R, and can be a powerful tool (although in our opinion, used far too frequently when writing R code).

7.5.1 For loop

The most commonly used loop structure when you want to repeat a task a defined number of times is the for loop. The most basic example of a for loop is:

for (i in 1:5) {
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5

But what’s the code actually doing? This is a dynamic bit of code where an index i is iteratively replaced by each value in the vector 1:5. Let’s break it down. Because the first value in our sequence (1:5) is 1, the loop starts by replacing i with 1 and runs everything between the { }. Loops conventionally use i as the counter, short for iteration, but you are free to use whatever you like, even your pet’s name; it really does not matter (except when using nested loops, in which case the counters must be called different things, like SenorWhiskers and HerrFlufferkins).

So, if we were to do the first iteration of the loop manually, it would look like this:

i <- 1
print(i)
## [1] 1

Once this first iteration is complete, the for loop loops back to the beginning and replaces i with the next value in our 1:5 sequence (2 in this case):

i <- 2
print(i)
## [1] 2

This process is then repeated until the loop reaches the final value in the sequence (5 in this example) after which point it stops.
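
The earlier remark about nested loops needing differently named counters can be sketched as follows. This is only a minimal illustration: the result matrix and the multiplication inside the loop are our own made-up example, and the counter names are borrowed from the text for fun.

```r
# Nested for loops: the outer and inner counters need different names
result <- matrix(NA, nrow = 2, ncol = 3)  # made-up container for the output
for (SenorWhiskers in 1:2) {              # outer loop runs over the rows
  for (HerrFlufferkins in 1:3) {          # inner loop runs over the columns
    result[SenorWhiskers, HerrFlufferkins] <- SenorWhiskers * HerrFlufferkins
  }
}
result
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    2    4    6
```

For each value of the outer counter, the inner loop runs through its whole sequence before the outer counter moves on to its next value.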

To reinforce how for loops work and introduce you to a valuable feature of loops, we’ll use our counter in a calculation within the loop. This can be useful, for example, if we’re using a loop to iterate through a vector but want to select the next element (or any other value). To show this we’ll simply add 1 to the value of our index every time we iterate our loop.

for (i in 1:5) {
  print(i + 1)
}
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6

As in the previous loop, the first value in our sequence is 1. The loop begins by replacing i with 1, but this time we’ve specified that a value of 1 must be added to i in the expression resulting in a value of 1 + 1.

i <- 1
i + 1
## [1] 2

As before, once the iteration is complete, the loop moves on to the next value in the sequence and replaces i with it (2 in this case) so that i + 1 becomes 2 + 1.

i <- 2
i + 1
## [1] 3

And so on. We think you get the idea! In essence this is all a for loop is doing and nothing more.
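
To see why using i + 1 can be handy, here is a minimal sketch that uses it to pick out the next element of a vector. The vector x and the diffs object are made-up names and values for illustration only.

```r
x <- c(3, 8, 10, 15)               # made-up example values
diffs <- numeric(length(x) - 1)    # somewhere to store the results
for (i in 1:(length(x) - 1)) {
  diffs[i] <- x[i + 1] - x[i]      # the next element minus the current one
}
diffs
## [1] 5 2 5
```

Notice that the loop stops one short of the end of x, because the last element has no "next" element to compare with.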

Whilst above we have been using simple addition in the body of the loop, you can also combine loops with functions.

Let’s go back to our data frame city. Earlier in the chapter we created a function to multiply two columns and used it to create our porto_aberdeen, aberdeen_nairobi, and nairobi_genoa objects. We could have used a loop for this. Let’s remind ourselves what our data look like and the code for the multiply_columns() function.

# Recreating our dataset
city <- data.frame(
  porto = rnorm(100),
  aberdeen = rnorm(100),
  nairobi = c(rep(NA, 10), rnorm(90)),
  genoa = rnorm(100)
)

# Our function
multiply_columns <- function(x, y) {
  temp <- x * y
  if (any(is.na(temp))) {
    warning("The function has produced NAs")
    return(temp)
  } else {
    return(temp)
  }
}

To use a loop to iterate over these columns we first need to create an empty list (remember lists?) which we call temp (short for temporary) and which will be used to store the output of the for loop.

temp <- list()
for (i in 1:(ncol(city) - 1)) {
  temp[[i]] <- multiply_columns(x = city[, i], y = city[, i + 1])
}
## Warning in multiply_columns(x = city[, i], y = city[, i + 1]): The function has
## produced NAs

## Warning in multiply_columns(x = city[, i], y = city[, i + 1]): The function has
## produced NAs

When we specify our for loop notice how we subtracted 1 from ncol(city). The ncol() function returns the number of columns in our city data frame which is 4 and so our loop runs from i = 1 to i = 4 - 1 which is i = 3. We’ll come back to why we need to subtract 1 from this in a minute.

So in the first iteration of the loop i takes on the value 1. The multiply_columns() function multiplies the city[, 1] (porto) and city[, 1 + 1] (aberdeen) columns and stores the result in temp[[1]], which is the first element of the temp list.

In the second iteration of the loop i takes on the value 2. The multiply_columns() function multiplies the city[, 2] (aberdeen) and city[, 2 + 1] (nairobi) columns and stores the result in temp[[2]], which is the second element of the temp list.

In the third and final iteration of the loop i takes on the value 3. The multiply_columns() function multiplies the city[, 3] (nairobi) and city[, 3 + 1] (genoa) columns and stores the result in temp[[3]], which is the third element of the temp list.

So can you see why we used ncol(city) - 1 when we first set up our loop? As we have four columns in our city data frame, if we didn’t use ncol(city) - 1 then eventually we’d try to multiply the 4th column with the non-existent 5th column.
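
If you’d like to see what happens when R is asked for a column that doesn’t exist, here is a small sketch. To keep it short we use a tiny stand-in for city with made-up values (only the column names match the real data), and the msg object is our own name.

```r
# A small stand-in for the city data frame (made-up values, same four columns)
city <- data.frame(porto = 1:3, aberdeen = 4:6, nairobi = 7:9, genoa = 10:12)

ncol(city)            # the number of columns: 4
1:(ncol(city) - 1)    # the values i takes in our loop: 1, 2 and 3

# Asking for a 5th column fails; tryCatch() lets us capture the error message
msg <- tryCatch(city[, 5], error = function(e) conditionMessage(e))
msg
## [1] "undefined columns selected"
```

Had the loop run up to ncol(city), this is the error it would have hit on its final iteration.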

Again, it’s a good idea to test that we are getting something sensible from our loop (remember, check, check and check again!). To do this we can use the identical() function to compare the objects we create by calling the function directly with what each iteration of the loop produces when we step through it manually.

porto_aberdeen_func <- multiply_columns(city$porto, city$aberdeen)
i <- 1
identical(multiply_columns(city[, i], city[, i + 1]), porto_aberdeen_func)
## [1] TRUE

aberdeen_nairobi_func <- multiply_columns(city$aberdeen, city$nairobi)
## Warning in multiply_columns(city$aberdeen, city$nairobi): The function has
## produced NAs
i <- 2
identical(multiply_columns(city[, i], city[, i + 1]), aberdeen_nairobi_func)
## Warning in multiply_columns(city[, i], city[, i + 1]): The function has
## produced NAs
## [1] TRUE
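
We could also check what the loop actually stored in temp. The sketch below is one way of doing this under a few assumptions of ours: set.seed(1) is added only so the random numbers are reproducible, multiply_columns() is condensed (it behaves the same as the version above), and checks is a made-up name.

```r
set.seed(1)  # assumed seed, only so the example is reproducible
city <- data.frame(
  porto = rnorm(100),
  aberdeen = rnorm(100),
  nairobi = c(rep(NA, 10), rnorm(90)),
  genoa = rnorm(100)
)

# Condensed version of the function from the text (same behaviour)
multiply_columns <- function(x, y) {
  temp <- x * y
  if (any(is.na(temp))) warning("The function has produced NAs")
  temp
}

# Run the loop (warnings are silenced here only to keep the output short)
temp <- list()
suppressWarnings(
  for (i in 1:(ncol(city) - 1)) {
    temp[[i]] <- multiply_columns(x = city[, i], y = city[, i + 1])
  }
)

# Compare each stored element with the product calculated directly
checks <- logical(length(temp))
for (i in 1:length(temp)) {
  checks[i] <- identical(temp[[i]], city[, i] * city[, i + 1])
}
checks
## [1] TRUE TRUE TRUE
```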

If you can follow the examples above, you’ll be in a good spot to begin writing some of your own for loops. That said there are other types of loops available to you.

7.5.2 While loop

Another type of loop that you may use (albeit less frequently) is the while loop. The while loop is used when you want to keep looping until a specific logical condition is satisfied (contrast this with the for loop which will always iterate through an entire sequence).

The basic structure of the while loop is:

while(logical_condition){ expression }

A simple example of a while loop is:

i <- 0
while (i <= 4) {
  i <- i + 1
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5

Here the loop will keep running the main body of the loop (the expression) for as long as i is less than or equal to 4 (specified using the <= operator in this example). The condition is checked before each iteration, so once i is greater than 4 the loop will stop.
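
A while loop is most useful when we can’t easily say in advance how many iterations we need. As a minimal sketch (the starting value, the threshold and the steps counter are all made up for illustration), we can keep halving a number until it drops to 1 or below and count how many halvings that took:

```r
x <- 100      # made-up starting value
steps <- 0    # counts how many times the body of the loop has run
while (x > 1) {
  x <- x / 2
  steps <- steps + 1
}
steps
## [1] 7
x
## [1] 0.78125
```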

There is another, very rarely used type of loop: the repeat loop. The repeat loop has no conditional check so can keep iterating indefinitely (meaning a break, or “stop here”, has to be coded into it). It’s worthwhile being aware of its existence, but for now we don’t think you need to worry about it; the for and while loops will see you through the vast majority of your looping needs.
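
Just so you’ve seen one, here is a minimal sketch of a repeat loop that mimics the while loop above. The stopping rule (i >= 5) is our own choice for illustration.

```r
i <- 0
repeat {
  i <- i + 1
  print(i)
  if (i >= 5) {
    break    # without this the loop would never stop
  }
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
```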

7.5.3 When to use a loop?

Loops are fairly commonly used, though sometimes a little overused in our opinion. Equivalent tasks can often be performed with functions, which are frequently more efficient than loops. This raises the question: when should you use a loop?

In general, loops in R tend to be slower than the alternatives and should be avoided when better options exist, especially when you’re working with large datasets. However, loops are sometimes the only way to achieve the result we want.

Some examples of when using loops can be appropriate:

  • Some simulations (e.g. the Ricker model can, in part, be built using loops)

  • Recursive relationships (a relationship which depends on the value of the previous relationship [“to understand recursion, you must understand recursion”])

  • More complex problems (e.g., how long since the last badger was seen at site \(j\), given a pine marten was seen at time \(t\), at the same location \(j\) as the badger, where the pine marten was detected in a specific 6 hour period, but exclude badgers seen 30 minutes before the pine marten arrival, repeated for all pine marten detections)

  • While loops (keep jumping until you’ve reached the moon)
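
To give a flavour of the first two examples in the list, here is a rough sketch of a simulation where each value depends on the one before it. We assume one commonly used form of the Ricker model, N[t + 1] = N[t] * exp(r * (1 - N[t] / K)), and all of the parameter values and object names below are made up for illustration.

```r
r <- 0.5         # assumed growth rate
K <- 100         # assumed carrying capacity
n_steps <- 50    # assumed number of time steps
N <- numeric(n_steps)
N[1] <- 10       # assumed starting population size

for (t in 1:(n_steps - 1)) {
  # the population at time t + 1 depends on the population at time t
  N[t + 1] <- N[t] * exp(r * (1 - N[t] / K))
}
round(N[c(1, 2, n_steps)], 2)   # first, second and last population sizes
```

Because every step needs the result of the previous step, this kind of calculation is awkward to do without a loop.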

7.5.4 If not loops, then what?

In short, use the apply family of functions: apply(), lapply(), tapply(), sapply(), vapply(), and mapply(). The apply functions can often do the tasks of most “home-brewed” loops, sometimes faster (though that won’t really be an issue for most people) but more importantly with a much lower risk of error. A strategy worth keeping in the back of your mind is: for every loop you make, try to remake it using an apply function (often lapply() or sapply() will work). If you can, use the apply version. There’s nothing worse than realising that a tiny, seemingly meaningless mistake in a loop has, weeks, months or years down the line, propagated into a huge mess. We strongly recommend trying to use the apply functions whenever possible.

lapply

Your go-to apply function will often be lapply(), at least in the beginning. The way that lapply() works, and the reason it is often a good alternative to for loops, is that it will go through each element of a list or vector and perform a task (i.e. run a function). It has the added benefit that it will output the results as a list, something you’d otherwise have to code into a loop yourself.

An lapply() has the following structure:

lapply(X, FUN)

Here X is the vector (or list) which we want to do something to. FUN stands for how much fun this is (just kidding!). It’s actually short for “function”.

Let’s start with a simple demonstration first. Let’s use the lapply() function to add 1 to each element of the sequence 0:4, giving us the values 1 to 5 (much like we did when we used a for loop):

lapply(0:4, function(a) {a + 1})
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] 3
## 
## [[4]]
## [1] 4
## 
## [[5]]
## [1] 5

Notice that we need to specify our sequence as 0:4 to get the output 1, 2, 3, 4, 5 as we are adding 1 to each element of the sequence. See what happens if you use 1:5 instead.

Equivalently, we could have defined the function first and then used it in lapply():

add_fun <- function(a) {a + 1}
lapply(0:4, add_fun)
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] 3
## 
## [[4]]
## [1] 4
## 
## [[5]]
## [1] 5

The sapply() function does the same thing as lapply() but instead of always storing the results as a list, it simplifies them where it can; here, that gives us a vector.

sapply(0:4, function(a) {a + 1})
## [1] 1 2 3 4 5

As you can see, in both cases, we get the same values (1 to 5) as we did with our first for loop.
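
Following the strategy suggested earlier (try to remake each loop with an apply function), here is one way the column-multiplying for loop might be rewritten with lapply(). Treat it as a sketch under our own assumptions: set.seed(1) is added only for reproducibility, multiply_columns() is condensed (it behaves the same as the earlier version), and temp_apply is a made-up name.

```r
set.seed(1)  # assumed seed, only so the example is reproducible
city <- data.frame(
  porto = rnorm(100),
  aberdeen = rnorm(100),
  nairobi = c(rep(NA, 10), rnorm(90)),
  genoa = rnorm(100)
)

# Condensed version of the function from the text (same behaviour)
multiply_columns <- function(x, y) {
  temp <- x * y
  if (any(is.na(temp))) warning("The function has produced NAs")
  temp
}

# The for loop version (warnings silenced only to keep the output short)
temp <- list()
suppressWarnings(
  for (i in 1:(ncol(city) - 1)) {
    temp[[i]] <- multiply_columns(x = city[, i], y = city[, i + 1])
  }
)

# The lapply() version: no empty list to set up, the output is already a list
temp_apply <- suppressWarnings(
  lapply(1:(ncol(city) - 1), function(i) {
    multiply_columns(x = city[, i], y = city[, i + 1])
  })
)

identical(temp, temp_apply)
## [1] TRUE
```

Here lapply() iterates over the same column positions (1, 2 and 3) as the loop did, so we still need ncol(city) - 1, but we no longer have to create temp beforehand or remember to index into it.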