# Control flow structures

## Remarks

For loops are a flow control method for repeating a task or set of tasks over a domain. The core structure of a for loop is

``````for ( [index] in [domain]){
[body]
}
``````

Where

1. `[index]` is a name takes exactly one value of `[domain]` over each iteration of the loop.
2. `[domain]` is a vector of values over which to iterate.
3. `[body]` is the set of instructions to apply on each iteration.

As a trivial example, consider the use of a for loop to obtain the cumulative sum of a vector of values.

``````x <- 1:4
cumulative_sum <- 0
for (i in x){
cumulative_sum <- cumulative_sum + x[i]
}
cumulative_sum
``````

### Optimizing Structure of For Loops

For loops can be useful for conceptualizing and executing tasks to repeat. If not constructed carefully, however, they can be very slow to execute compared to the preferred used of the `apply` family of functions. Nonetheless, there are a handful of elements you can include in your for loop construction to optimize the loop. In many cases, good construction of the for loop will yield computational efficiency very close to that of an apply function.

A 'properly constructed' for loop builds on the core structure and includes a statement declaring the object that will capture each iteration of the loop. This object should have both a class and a length declared.

``````[output] <- [vector_of_length]
for ([index] in [length_safe_domain]){
[output][index] <- [body]
}
``````

To illustrate, let us write a loop to square each value in a numeric vector (this is a trivial example for illustration only. The 'correct' way of completing this task would be `x_squared <- x^2`).

``````x <- 1:100
x_squared <- vector("numeric", length = length(x))
for (i in seq_along(x)){
x_squared[i] <- x[i]^2
}
``````

Again, notice that we first declared a receptacle for the output `x_squared`, and gave it the class "numeric" with the same length as `x`. Additionally, we declared a "length safe domain" using the `seq_along` function. `seq_along` generates a vector of indices for an object that is suited for use in for loops. While it seems intuitive to use `for (i in 1:length(x))`, if `x` has 0 length, the loop will attempt to iterate over the domain of `1:0`, resulting in an error (the 0th index is undefined in R).

Receptacle objects and length safe domains are handled internally by the `apply` family of functions and users are encouraged to adopt the `apply` approach in place of for loops as much as possible. However, if properly constructed, a for loop may occasionally provide greater code clarity with minimal loss of efficiency.

### Vectorizing For Loops

For loops can often be a useful tool in conceptualizing the tasks that need to be completed within each iteration. When the loop is completely developed and conceptualized, there may be advantages to turning the loop into a function.

In this example, we will develop a for loop to calculate the mean of each column in the `mtcars` dataset (again, a trivial example as it could be accomplished via the `colMeans` function).

``````column_mean_loop <- vector("numeric", length(mtcars))
for (k in seq_along(mtcars)){
column_mean_loop[k] <- mean(mtcars[[k]])
}
``````

The for loop can be converted to an apply function by rewriting the body of the loop as a function.

``````col_mean_fn <- function(x) mean(x)
column_mean_apply <- vapply(mtcars, col_mean_fn, numeric(1))
``````

And to compare the results:

``````identical(column_mean_loop,
unname(column_mean_apply)) #* vapply added names to the elements
#* remove them for comparison
``````

The advantages of the vectorized form is that we were able to eliminate a few lines of code. The mechanics of determining the length and type of the output object and iterating over a length safe domain are handled for us by the apply function. Additionally, the apply function is a little bit faster than the loop. The difference of speed is often negligible in human terms depending on the number of iterations and the complexity of the body.

## Basic For Loop Construction

In this example we will calculate the squared deviance for each column in a data frame, in this case the `mtcars`.

Option A: integer index

``````squared_deviance <- vector("list", length(mtcars))
for (i in seq_along(mtcars)){
squared_deviance[[i]] <- (mtcars[[i]] - mean(mtcars[[i]]))^2
}
``````

`squared_deviance` is an 11 elements list, as expected.

``````class(squared_deviance)
length(squared_deviance)
``````

Option B: character index

``````squared_deviance <- vector("list", length(mtcars))
Squared_deviance <- setNames(squared_deviance, names(mtcars))
for (k in names(mtcars)){
squared_deviance[[k]] <- (mtcars[[k]] - mean(mtcars[[k]]))^2
}
``````

What if we want a `data.frame` as a result? Well, there are many options for transforming a list into other objects. However, and maybe the simplest in this case, will be to store the `for` results in a `data.frame`.

``````squared_deviance <- mtcars #copy the original
squared_deviance[TRUE]<-NA  #replace with NA or do squared_deviance[,]<-NA
for (i in seq_along(mtcars)){
squared_deviance[[i]] <- (mtcars[[i]] - mean(mtcars[[i]]))^2
}
dim(squared_deviance)
 32 11
``````

The result will be the same event though we use the character option (B).

## Optimal Construction of a For Loop

To illustrate the effect of good for loop construction, we will calculate the mean of each column in four different ways:

1. Using a poorly optimized for loop
2. Using a well optimized for for loop
3. Using an `*apply` family of functions
4. Using the `colMeans` function

Each of these options will be shown in code; a comparison of the computational time to execute each option will be shown; and lastly a discussion of the differences will be given.

### Poorly optimized for loop

``````column_mean_poor <- NULL
for (i in 1:length(mtcars)){
column_mean_poor[i] <- mean(mtcars[[i]])
}
``````

### Well optimized for loop

``````column_mean_optimal <- vector("numeric", length(mtcars))
for (i in seq_along(mtcars)){
column_mean_optimal <- mean(mtcars[[i]])
}
``````

### `vapply` Function

``````column_mean_vapply <- vapply(mtcars, mean, numeric(1))
``````

### `colMeans` Function

``````column_mean_colMeans <- colMeans(mtcars)
``````

### Efficiency comparison

The results of benchmarking these four approaches is shown below (code not displayed)

``````Unit: microseconds
expr     min       lq     mean   median       uq     max neval  cld
poor 240.986 262.0820 287.1125 275.8160 307.2485 442.609   100    d
optimal 220.313 237.4455 258.8426 247.0735 280.9130 362.469   100   c
vapply 107.042 109.7320 124.4715 113.4130 132.6695 202.473   100 a
colMeans 155.183 161.6955 180.2067 175.0045 194.2605 259.958   100  b
``````

Notice that the optimized `for` loop edged out the poorly constructed for loop. The poorly constructed for loop is constantly increasing the length of the output object, and at each change of the length, R is reevaluating the class of the object.

Some of this overhead burden is removed by the optimized for loop by declaring the type of output object and its length before starting the loop.

In this example, however, the use of an `vapply` function doubles the computational efficiency, largely because we told R that the result had to be numeric (if any one result were not numeric, an error would be returned).

Use of the `colMeans` function is a touch slower than the `vapply` function. This difference is attributable to some error checks performed in `colMeans` and mainly to the `as.matrix` conversion (because `mtcars` is a `data.frame`) that weren't performed in the `vapply` function.

## The Other Looping Constructs: while and repeat

R provides two additional looping constructs, `while` and `repeat`, which are typically used in situations where the number of iterations required is indeterminate.

### The `while` loop

The general form of a `while` loop is as follows,

``````while (condition) {
## do something
## in loop body
}
``````

where `condition` is evaluated prior to entering the loop body. If `condition` evaluates to `TRUE`, the code inside of the loop body is executed, and this process repeats until `condition` evaluates to `FALSE` (or a `break` statement is reached; see below). Unlike the `for` loop, if a `while` loop uses a variable to perform incremental iterations, the variable must be declared and initialized ahead of time, and must be updated within the loop body. For example, the following loops accomplish the same task:

``````for (i in 0:4) {
cat(i, "\n")
}
# 0
# 1
# 2
# 3
# 4

i <- 0
while (i < 5) {
cat(i, "\n")
i <- i + 1
}
# 0
# 1
# 2
# 3
# 4
``````

In the `while` loop above, the line `i <- i + 1` is necessary to prevent an infinite loop.

Additionally, it is possible to terminate a `while` loop with a call to `break` from inside the loop body:

``````iter <- 0
while (TRUE) {
if (runif(1) < 0.25) {
break
} else {
iter <- iter + 1
}
}
iter
# 4
``````

In this example, `condition` is always `TRUE`, so the only way to terminate the loop is with a call to `break` inside the body. Note that the final value of `iter` will depend on the state of your PRNG when this example is run, and should produce different results (essentially) each time the code is executed.

### The `repeat` loop

The `repeat` construct is essentially the same as `while (TRUE) { ## something }`, and has the following form:

``````repeat ({
## do something
## in loop body
})
``````

The extra `{}` are not required, but the `()` are. Rewriting the previous example using `repeat`,

``````iter <- 0
repeat ({
if (runif(1) < 0.25) {
break
} else {
iter <- iter + 1
}
})
iter
# 2
``````

### More on `break`

It's important to note that `break` will only terminate the immediately enclosing loop. That is, the following is an infinite loop:

``````while (TRUE) {
while (TRUE) {
cat("inner loop\n")
break
}
cat("outer loop\n")
}
``````

With a little creativity, however, it is possible to break entirely from within a nested loop. As an example, consider the following expression, which, in its current state, will loop infinitely:

``````while (TRUE) {
cat("outer loop body\n")
while (TRUE) {
cat("inner loop body\n")
x <- runif(1)
if (x < .3) {
break
} else {
cat(sprintf("x is %.5f\n", x))
}
}
}
``````

One possibility is to recognize that, unlike `break`, the `return` expression does have the ability to return control across multiple levels of enclosing loops. However, since `return` is only valid when used within a function, we cannot simply replace `break` with `return()` above, but also need to wrap the entire expression as an anonymous function:

``````(function() {
while (TRUE) {
cat("outer loop body\n")
while (TRUE) {
cat("inner loop body\n")
x <- runif(1)
if (x < .3) {
return()
} else {
cat(sprintf("x is %.5f\n", x))
}
}
}
})()
``````

Alternatively, we can create a dummy variable (`exit`) prior to the expression, and activate it via `<<-` from the inner loop when we are ready to terminate:

``````exit <- FALSE
while (TRUE) {
cat("outer loop body\n")
while (TRUE) {
cat("inner loop body\n")
x <- runif(1)
if (x < .3) {
exit <<- TRUE
break
} else {
cat(sprintf("x is %.5f\n", x))
}
}
if (exit) break
}
``````

2016-07-22
2016-08-06
R Language Pedia
Icon