3.4 Wrangling data frames

Now that you’re able to successfully import your data from an external file into R our next task is to do something useful with our data. Working with data is a fundamental skill which you’ll need to develop and get comfortable with as you’ll likely do a lot of it during any project. The good news is that R is especially good at manipulating, summarising and visualising data. Manipulating data (often known as data wrangling or munging) in R can at first seem a little daunting for the new user but if you follow a few simple logical rules then you’ll quickly get the hang of it, especially with some practice.

See this video for a general overview on how to use positional and logical indexes to extract data from a data frame object in R

Let’s remind ourselves of the structure of the flowers data frame we imported in the previous section.

flowers <- read.table(file = 'data/flower.txt', header = TRUE, sep = "\t")
str(flowers)
## 'data.frame':    96 obs. of  8 variables:
##  $ treat    : chr  "tip" "tip" "tip" "tip" ...
##  $ nitrogen : chr  "medium" "medium" "medium" "medium" ...
##  $ block    : int  1 1 1 1 1 1 1 1 2 2 ...
##  $ height   : num  7.5 10.7 11.2 10.4 10.4 9.8 6.9 9.4 10.4 12.3 ...
##  $ weight   : num  7.62 12.14 12.76 8.78 13.58 ...
##  $ leafarea : num  11.7 14.1 7.1 11.9 14.5 12.2 13.2 14 10.5 16.1 ...
##  $ shootarea: num  31.9 46 66.7 20.3 26.9 72.7 43.1 28.5 57.8 36.9 ...
##  $ flowers  : int  1 10 10 1 4 9 7 6 5 8 ...

To access the data in any of the variables (columns) in our data frame we can use the $ notation. For example, to access the height variable in our flowers data frame we can use flowers$height. This tells R that the height variable is contained within the data frame flowers.

flowers$height
##  [1]  7.5 10.7 11.2 10.4 10.4  9.8  6.9  9.4 10.4 12.3 10.4 11.0  7.1  6.0  9.0
## [16]  4.5 12.6 10.0 10.0  8.5 14.1 10.1  8.5  6.5 11.5  7.7  6.4  8.8  9.2  6.2
## [31]  6.3 17.2  8.0  8.0  6.4  7.6  9.7 12.3  9.1  8.9  7.4  3.1  7.9  8.8  8.5
## [46]  5.6 11.5  5.8  5.6  5.3  7.5  4.1  3.5  8.5  4.9  2.5  5.4  3.9  5.8  4.5
## [61]  8.0  1.8  2.2  3.9  8.5  8.5  6.4  1.2  2.6 10.9  7.2  2.1  4.7  5.0  6.5
## [76]  2.6  6.0  9.3  4.6  5.2  3.9  2.3  5.2  2.2  4.5  1.8  3.0  3.7  2.4  5.7
## [91]  3.7  3.2  3.9  3.3  5.5  4.4

This will return a vector of the height data. If we want we can assign this vector to another object and do stuff with it, like calculate a mean or get a summary of the variable using the summary() function.

f_height <- flowers$height
mean(f_height)
## [1] 6.839583
summary(f_height)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.200   4.475   6.450   6.840   9.025  17.200

Or if we don’t want to create an additional object we can use functions ‘on-the-fly’ to only display the value in the console.

mean(flowers$height)
## [1] 6.839583
summary(flowers$height)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.200   4.475   6.450   6.840   9.025  17.200

Just as we did with vectors, we also can access data in data frames using the square bracket [ ] notation. However, instead of just using a single index, we now need to use two indexes, one to specify the rows and one for the columns. To do this, we can use the notation my_data[rows, columns] where rows and columns are indexes and my_data is the name of the data frame. Again, just like with our vectors our indexes can be positional or the result of a logical test.

3.4.1 Positional indexes

To use positional indexes we simple have to write the position of the rows and columns we want to extract inside the [ ]. For example, if for some reason we wanted to extract the first value (1^st row ) of the height variable (4^th column).

flowers[1, 4]
## [1] 7.5

# this would give you the same
flowers$height[1]
## [1] 7.5

We can also extract values from multiple rows or columns by specifying these indexes as vectors inside the [ ]. To extract the first 10 rows and the first 4 columns we simple supply a vector containing a sequence from 1 to 10 for the rows index (1:10) and a vector from 1 to 4 for the column index (1:4).

flowers[1:10, 1:4]
##    treat nitrogen block height
## 1    tip   medium     1    7.5
## 2    tip   medium     1   10.7
## 3    tip   medium     1   11.2
## 4    tip   medium     1   10.4
## 5    tip   medium     1   10.4
## 6    tip   medium     1    9.8
## 7    tip   medium     1    6.9
## 8    tip   medium     1    9.4
## 9    tip   medium     2   10.4
## 10   tip   medium     2   12.3

Or for non sequential rows and columns then we can supply vectors of positions using the c() function. To extract the 1^st, 5^th, 12^th, 30^th rows from the 1^st, 3^rd, 6^th and 8^th columns.

flowers[c(1, 5, 12, 30), c(1, 3, 6, 8)]
##    treat block leafarea flowers
## 1    tip     1     11.7       1
## 5    tip     1     14.5       4
## 12   tip     2     12.6       6
## 30   tip     2     11.6       5

All we are doing in the two examples above is creating vectors of positions for the rows and columns that we want to extract. We have done this by using the skills we developed in Chapter 2 when we generated vectors using the c() function or using the : notation.

But what if we want to extract either all of the rows or all of the columns? It would be extremely tedious to have to generate vectors for all rows or for all columns. Thankfully R has a shortcut. If you don’t specify either a row or column index in the [ ] then R interprets it to mean you want all rows or all columns. For example, to extract the first 8 rows and all of the columns in the flower data frame

flowers[1:8, ]
##   treat nitrogen block height weight leafarea shootarea flowers
## 1   tip   medium     1    7.5   7.62     11.7      31.9       1
## 2   tip   medium     1   10.7  12.14     14.1      46.0      10
## 3   tip   medium     1   11.2  12.76      7.1      66.7      10
## 4   tip   medium     1   10.4   8.78     11.9      20.3       1
## 5   tip   medium     1   10.4  13.58     14.5      26.9       4
## 6   tip   medium     1    9.8  10.08     12.2      72.7       9
## 7   tip   medium     1    6.9  10.11     13.2      43.1       7
## 8   tip   medium     1    9.4  10.28     14.0      28.5       6

or all of the rows and the first 3 columns. If you’re reading the web version of this book scroll down in output panel to see all of the data. Note, if you’re reading the pdf version of the book some of the output has been truncated to save some space.

flowers[, 1:3]

##    treat nitrogen block
## 1    tip   medium     1
## 2    tip   medium     1
## 3    tip   medium     1
## 4    tip   medium     1
## 5    tip   medium     1
## 6    tip   medium     1
## 7    tip   medium     1
## 8    tip   medium     1
## 9    tip   medium     2
## 10   tip   medium     2
## 11   tip   medium     2
## 12   tip   medium     2
## 13   tip   medium     2
## 14   tip   medium     2
## 15   tip   medium     2
## 16   tip   medium     2
## 17   tip     high     1
## 18   tip     high     1
## 19   tip     high     1
## 20   tip     high     1
## 21   tip     high     1
## 22   tip     high     1
## 23   tip     high     1
## 24   tip     high     1
## 25   tip     high     2
## 26   tip     high     2
## 27   tip     high     2
## 28   tip     high     2
## 29   tip     high     2
## 30   tip     high     2
## 31   tip     high     2
## 32   tip     high     2
## 33   tip      low     1
## 34   tip      low     1
## 35   tip      low     1
## 36   tip      low     1
## 37   tip      low     1
## 38   tip      low     1
## 39   tip      low     1
## 40   tip      low     1
## 41   tip      low     2
## 42   tip      low     2
## 43   tip      low     2
## 44   tip      low     2
## 45   tip      low     2
## 46   tip      low     2
## 47   tip      low     2
## 48   tip      low     2
## 49 notip   medium     1
## 50 notip   medium     1
## 51 notip   medium     1
## 52 notip   medium     1
## 53 notip   medium     1
## 54 notip   medium     1
## 55 notip   medium     1
## 56 notip   medium     1
## 57 notip   medium     2
## 58 notip   medium     2
## 59 notip   medium     2
## 60 notip   medium     2
## 61 notip   medium     2
## 62 notip   medium     2
## 63 notip   medium     2
## 64 notip   medium     2
## 65 notip     high     1
## 66 notip     high     1
## 67 notip     high     1
## 68 notip     high     1
## 69 notip     high     1
## 70 notip     high     1
## 71 notip     high     1
## 72 notip     high     1
## 73 notip     high     2
## 74 notip     high     2
## 75 notip     high     2
## 76 notip     high     2
## 77 notip     high     2
## 78 notip     high     2
## 79 notip     high     2
## 80 notip     high     2
## 81 notip      low     1
## 82 notip      low     1
## 83 notip      low     1
## 84 notip      low     1
## 85 notip      low     1
## 86 notip      low     1
## 87 notip      low     1
## 88 notip      low     1
## 89 notip      low     2
## 90 notip      low     2
## 91 notip      low     2
## 92 notip      low     2
## 93 notip      low     2
## 94 notip      low     2
## 95 notip      low     2
## 96 notip      low     2

We can even use negative positional indexes to exclude certain rows and columns. As an example, lets extract all of the rows except the first 85 rows and all columns except the 4^th, 7^th and 8^th columns. Notice we need to use -() when we generate our row positional vectors. If we had just used -1:85 this would actually generate a regular sequence from -1 to 85 which is not what we want (we can of course use -1:-85).

flowers[-(1:85), -c(4, 7, 8)]
##    treat nitrogen block weight leafarea
## 86 notip      low     1   6.01     17.6
## 87 notip      low     1   9.93     12.0
## 88 notip      low     1   7.03      7.9
## 89 notip      low     2   9.10     14.5
## 90 notip      low     2   9.05      9.6
## 91 notip      low     2   8.10     10.5
## 92 notip      low     2   7.45     14.1
## 93 notip      low     2   9.19     12.4
## 94 notip      low     2   8.92     11.6
## 95 notip      low     2   8.44     13.5
## 96 notip      low     2  10.60     16.2

In addition to using a positional index for extracting particular columns (variables) we can also name the variables directly when using the square bracket [ ] notation. For example, let’s extract the first 5 rows and the variables treat, nitrogen and leafarea. Instead of using flowers[1:5, c(1, 2, 6)] we can instead use

flowers[1:5, c("treat", "nitrogen", "leafarea")]
##   treat nitrogen leafarea
## 1   tip   medium     11.7
## 2   tip   medium     14.1
## 3   tip   medium      7.1
## 4   tip   medium     11.9
## 5   tip   medium     14.5

We often use this method in preference to the positional index for selecting columns as it will still give us what we want even if we’ve changed the order of the columns in our data frame for some reason.

3.4.2 Logical indexes

Just as we did with vectors, we can also extract data from our data frame based on a logical test. We can use all of the logical operators that we used for our vector examples so if these have slipped your mind maybe pop back and refresh your memory. Let’s extract all rows where height is greater than 12 and extract all columns by default (remember, if you don’t include a column index after the comma it means all columns).

big_flowers <- flowers[flowers$height > 12, ]
big_flowers
##    treat nitrogen block height weight leafarea shootarea flowers
## 10   tip   medium     2   12.3  13.48     16.1      36.9       8
## 17   tip     high     1   12.6  18.66     18.6      54.0       9
## 21   tip     high     1   14.1  19.12     13.1     113.2      13
## 32   tip     high     2   17.2  19.20     10.9      89.9      14
## 38   tip      low     1   12.3  11.27     13.7      28.7       5

Notice in the code above that we need to use the flowers$height notation for the logical test. If we just named the height variable without the name of the data frame we would receive an error telling us R couldn’t find the variable height. The reason for this is that the height variable only exists inside the flowers data frame so you need to tell R exactly where it is.

big_flowers <- flowers[height > 12, ]
Error in `[.data.frame`(flowers, height > 12, ) : 
  object 'height' not found

So how does this work? The logical test is flowers$height > 12 and R will only extract those rows that satisfy this logical condition. If we look at the output of just the logical condition you can see this returns a vector containing TRUE if height is greater than 12 and FALSE if height is not greater than 12.

flowers$height > 12
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
## [37] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

So our row index is a vector containing either TRUE or FALSE values and only those rows that are TRUE are selected.

Other commonly used operators are shown below.

flowers[flowers$height >= 6, ]        # values greater or equal to 6

flowers[flowers$height <= 6, ]          # values less than or equal to 6

flowers[flowers$height == 8, ]         # values  equal to 8

flowers[flowers$height != 8, ]         # values  not equal to 8

We can also extract rows based on the value of a character string or factor level. Let’s extract all rows where the nitrogen level is equal to high (again we will output all columns). Notice that the double equals == sign must be used for a logical test and that the character string must be enclosed in either single or double quotes (i.e. "high").

nit_high <- flowers[flowers$nitrogen == "high", ]        
nit_high

##    treat nitrogen block height weight leafarea shootarea flowers
## 17   tip     high     1   12.6  18.66     18.6      54.0       9
## 18   tip     high     1   10.0  18.07     16.9      90.5       3
## 19   tip     high     1   10.0  13.29     15.8     142.7      12
## 20   tip     high     1    8.5  14.33     13.2      91.4       5
## 21   tip     high     1   14.1  19.12     13.1     113.2      13
## 22   tip     high     1   10.1  15.49     12.6      77.2      12
## 23   tip     high     1    8.5  17.82     20.5      54.4       3
## 24   tip     high     1    6.5  17.13     24.1     147.4       6
## 25   tip     high     2   11.5  23.89     14.3     101.5      12
## 26   tip     high     2    7.7  14.77     17.2     104.5       4
## 27   tip     high     2    6.4  13.60     13.6     152.6       7
## 28   tip     high     2    8.8  16.58     16.7     100.1       9
## 29   tip     high     2    9.2  13.26     11.3     108.0       9
## 30   tip     high     2    6.2  17.32     11.6      85.9       5
## 31   tip     high     2    6.3  14.50     18.3      55.6       8
## 32   tip     high     2   17.2  19.20     10.9      89.9      14
## 65 notip     high     1    8.5  22.53     20.8     166.9      16
## 66 notip     high     1    8.5  17.33     19.8     184.4      12
## 67 notip     high     1    6.4  11.52     12.1     140.5       7
## 68 notip     high     1    1.2  18.24     16.6     148.1       7
## 69 notip     high     1    2.6  16.57     17.1     141.1       3
## 70 notip     high     1   10.9  17.22     49.2     189.6      17
## 71 notip     high     1    7.2  15.21     15.9     135.0      14
## 72 notip     high     1    2.1  19.15     15.6     176.7       6
## 73 notip     high     2    4.7  13.42     19.8     124.7       5
## 74 notip     high     2    5.0  16.82     17.3     182.5      15
## 75 notip     high     2    6.5  14.00     10.1     126.5       7
## 76 notip     high     2    2.6  18.88     16.4     181.5      14
## 77 notip     high     2    6.0  13.68     16.2     133.7       2
## 78 notip     high     2    9.3  18.75     18.4     181.1      16
## 79 notip     high     2    4.6  14.65     16.7      91.7      11
## 80 notip     high     2    5.2  17.70     19.1     181.1       8

Or we can extract all rows where nitrogen level is not equal to medium (using !=) and only return columns 1 to 4.

nit_not_medium <- flowers[flowers$nitrogen != "medium", 1:4]        
nit_not_medium

##    treat nitrogen block height
## 17   tip     high     1   12.6
## 18   tip     high     1   10.0
## 19   tip     high     1   10.0
## 20   tip     high     1    8.5
## 21   tip     high     1   14.1
## 22   tip     high     1   10.1
## 23   tip     high     1    8.5
## 24   tip     high     1    6.5
## 25   tip     high     2   11.5
## 26   tip     high     2    7.7
## 27   tip     high     2    6.4
## 28   tip     high     2    8.8
## 29   tip     high     2    9.2
## 30   tip     high     2    6.2
## 31   tip     high     2    6.3
## 32   tip     high     2   17.2
## 33   tip      low     1    8.0
## 34   tip      low     1    8.0
## 35   tip      low     1    6.4
## 36   tip      low     1    7.6
## 37   tip      low     1    9.7
## 38   tip      low     1   12.3
## 39   tip      low     1    9.1
## 40   tip      low     1    8.9
## 41   tip      low     2    7.4
## 42   tip      low     2    3.1
## 43   tip      low     2    7.9
## 44   tip      low     2    8.8
## 45   tip      low     2    8.5
## 46   tip      low     2    5.6
## 47   tip      low     2   11.5
## 48   tip      low     2    5.8
## 65 notip     high     1    8.5
## 66 notip     high     1    8.5
## 67 notip     high     1    6.4
## 68 notip     high     1    1.2
## 69 notip     high     1    2.6
## 70 notip     high     1   10.9
## 71 notip     high     1    7.2
## 72 notip     high     1    2.1
## 73 notip     high     2    4.7
## 74 notip     high     2    5.0
## 75 notip     high     2    6.5
## 76 notip     high     2    2.6
## 77 notip     high     2    6.0
## 78 notip     high     2    9.3
## 79 notip     high     2    4.6
## 80 notip     high     2    5.2
## 81 notip      low     1    3.9
## 82 notip      low     1    2.3
## 83 notip      low     1    5.2
## 84 notip      low     1    2.2
## 85 notip      low     1    4.5
## 86 notip      low     1    1.8
## 87 notip      low     1    3.0
## 88 notip      low     1    3.7
## 89 notip      low     2    2.4
## 90 notip      low     2    5.7
## 91 notip      low     2    3.7
## 92 notip      low     2    3.2
## 93 notip      low     2    3.9
## 94 notip      low     2    3.3
## 95 notip      low     2    5.5
## 96 notip      low     2    4.4

We can increase the complexity of our logical tests by combining them with Boolean expressions just as we did for vector objects. For example, to extract all rows where height is greater or equal to 6 AND nitrogen is equal to medium AND treat is equal to notip we combine a series of logical expressions with the & symbol.

low_notip_heigh6 <- flowers[flowers$height >= 6 & flowers$nitrogen == "medium" &
                             flowers$treat == "notip", ]        
low_notip_heigh6
##    treat nitrogen block height weight leafarea shootarea flowers
## 51 notip   medium     1    7.5  13.60     13.6     122.2      11
## 54 notip   medium     1    8.5  10.04     12.3     113.6       4
## 61 notip   medium     2    8.0  11.43     12.6      43.2      14

To extract rows based on an ‘OR’ Boolean expression we can use the | symbol. Let’s extract all rows where height is greater than 12.3 OR less than 2.2.

height2.2_12.3 <- flowers[flowers$height > 12.3 | flowers$height < 2.2, ]        
height2.2_12.3
##    treat nitrogen block height weight leafarea shootarea flowers
## 17   tip     high     1   12.6  18.66     18.6      54.0       9
## 21   tip     high     1   14.1  19.12     13.1     113.2      13
## 32   tip     high     2   17.2  19.20     10.9      89.9      14
## 62 notip   medium     2    1.8  10.47     11.8     120.8       9
## 68 notip     high     1    1.2  18.24     16.6     148.1       7
## 72 notip     high     1    2.1  19.15     15.6     176.7       6
## 86 notip      low     1    1.8   6.01     17.6      46.2       4

An alternative method of selecting parts of a data frame based on a logical expression is to use the subset() function instead of the [ ]. The advantage of using subset() is that you no longer need to use the $ notation when specifying variables inside the data frame as the first argument to the function is the name of the data frame to be subsetted. The disadvantage is that subset() is less flexible than the [ ] notation.

tip_med_2 <- subset(flowers, treat == "tip" & nitrogen == "medium" & block == 2)
tip_med_2
##    treat nitrogen block height weight leafarea shootarea flowers
## 9    tip   medium     2   10.4  10.48     10.5      57.8       5
## 10   tip   medium     2   12.3  13.48     16.1      36.9       8
## 11   tip   medium     2   10.4  13.18     11.1      56.8      12
## 12   tip   medium     2   11.0  11.56     12.6      31.3       6
## 13   tip   medium     2    7.1   8.16     29.6       9.7       2
## 14   tip   medium     2    6.0  11.22     13.0      16.4       3
## 15   tip   medium     2    9.0  10.20     10.8      90.1       6
## 16   tip   medium     2    4.5  12.55     13.4      14.4       6

And if you only want certain columns you can use the select = argument.

tipplants <- subset(flowers, treat == "tip" & nitrogen == "medium" & block == 2, 
                            select = c("treat", "nitrogen", "leafarea"))
tipplants
##    treat nitrogen leafarea
## 9    tip   medium     10.5
## 10   tip   medium     16.1
## 11   tip   medium     11.1
## 12   tip   medium     12.6
## 13   tip   medium     29.6
## 14   tip   medium     13.0
## 15   tip   medium     10.8
## 16   tip   medium     13.4

3.4.3 Ordering data frames

Remember when we used the function order() to order one vector based on the order of another vector (way back in Chapter 2). This comes in very handy if you want to reorder rows in your data frame. For example, if we want all of the rows in the data frame flowers to be ordered in ascending value of height and output all columns by default. If you’re reading this section of the book on the web you can scroll down in the output panels to see the entire ordered data frame. If you’re reading the pdf version of the book, note that some of the output from the code chunks has been truncated to save some space.

height_ord <- flowers[order(flowers$height), ]        
height_ord

##    treat nitrogen block height weight leafarea shootarea flowers
## 68 notip     high     1    1.2  18.24     16.6     148.1       7
## 62 notip   medium     2    1.8  10.47     11.8     120.8       9
## 86 notip      low     1    1.8   6.01     17.6      46.2       4
## 72 notip     high     1    2.1  19.15     15.6     176.7       6
## 63 notip   medium     2    2.2  10.70     15.3      97.1       7
## 84 notip      low     1    2.2   9.97      9.6      63.1       2
## 82 notip      low     1    2.3   7.28     13.8      32.8       6
## 89 notip      low     2    2.4   9.10     14.5      78.7       8
## 56 notip   medium     1    2.5  14.85     17.5      77.8      10
## 69 notip     high     1    2.6  16.57     17.1     141.1       3
## 76 notip     high     2    2.6  18.88     16.4     181.5      14
## 87 notip      low     1    3.0   9.93     12.0      56.6       6
## 42   tip      low     2    3.1   8.74     16.1      39.1       3
## 92 notip      low     2    3.2   7.45     14.1      38.1       4
## 94 notip      low     2    3.3   8.92     11.6      55.2       6
## 53 notip   medium     1    3.5  12.93     16.6     109.3       3
## 88 notip      low     1    3.7   7.03      7.9      36.7       5
## 91 notip      low     2    3.7   8.10     10.5      60.5       6
## 58 notip   medium     2    3.9   9.07      9.6      90.4       7
## 64 notip   medium     2    3.9  12.97     17.0      97.5       5
## 81 notip      low     1    3.9   7.17     13.5      52.8       6
## 93 notip      low     2    3.9   9.19     12.4      52.6       9
## 52 notip   medium     1    4.1  12.58     13.9     136.6      11
## 96 notip      low     2    4.4  10.60     16.2      63.3       6
## 16   tip   medium     2    4.5  12.55     13.4      14.4       6
## 60 notip   medium     2    4.5  13.68     14.8     125.5       9
## 85 notip      low     1    4.5   8.60      9.4     113.5       7
## 79 notip     high     2    4.6  14.65     16.7      91.7      11
## 73 notip     high     2    4.7  13.42     19.8     124.7       5
## 55 notip   medium     1    4.9   6.89      8.2      52.9       3
## 74 notip     high     2    5.0  16.82     17.3     182.5      15
## 80 notip     high     2    5.2  17.70     19.1     181.1       8
## 83 notip      low     1    5.2   5.79     11.0      67.4       5
## 50 notip   medium     1    5.3   9.29     11.5      82.3       6
## 57 notip   medium     2    5.4  11.36     17.8     104.6      12
## 95 notip      low     2    5.5   8.44     13.5      77.6       9
## 46   tip      low     2    5.6   8.10     10.1       5.8       2
## 49 notip   medium     1    5.6  11.03     18.6      49.9       8
## 90 notip      low     2    5.7   9.05      9.6      63.2       6
## 48   tip      low     2    5.8   8.04      5.8      30.7       7
## 59 notip   medium     2    5.8  10.18     15.7      88.8       6
## 14   tip   medium     2    6.0  11.22     13.0      16.4       3
## 77 notip     high     2    6.0  13.68     16.2     133.7       2
## 30   tip     high     2    6.2  17.32     11.6      85.9       5
## 31   tip     high     2    6.3  14.50     18.3      55.6       8
## 27   tip     high     2    6.4  13.60     13.6     152.6       7
## 35   tip      low     1    6.4   5.97      8.7       7.3       2
## 67 notip     high     1    6.4  11.52     12.1     140.5       7
## 24   tip     high     1    6.5  17.13     24.1     147.4       6
## 75 notip     high     2    6.5  14.00     10.1     126.5       7
## 7    tip   medium     1    6.9  10.11     13.2      43.1       7
## 13   tip   medium     2    7.1   8.16     29.6       9.7       2
## 71 notip     high     1    7.2  15.21     15.9     135.0      14
## 41   tip      low     2    7.4  10.89     13.3       9.5       5
## 1    tip   medium     1    7.5   7.62     11.7      31.9       1
## 51 notip   medium     1    7.5  13.60     13.6     122.2      11
## 36   tip      low     1    7.6  13.05      7.2      47.2       8
## 26   tip     high     2    7.7  14.77     17.2     104.5       4
## 43   tip      low     2    7.9   8.89      8.4      34.1       4
## 33   tip      low     1    8.0   6.88      9.3      16.1       4
## 34   tip      low     1    8.0  10.23     11.9      88.1       4
## 61 notip   medium     2    8.0  11.43     12.6      43.2      14
## 20   tip     high     1    8.5  14.33     13.2      91.4       5
## 23   tip     high     1    8.5  17.82     20.5      54.4       3
## 45   tip      low     2    8.5   7.16      8.7      29.9       4
## 54 notip   medium     1    8.5  10.04     12.3     113.6       4
## 65 notip     high     1    8.5  22.53     20.8     166.9      16
## 66 notip     high     1    8.5  17.33     19.8     184.4      12
## 28   tip     high     2    8.8  16.58     16.7     100.1       9
## 44   tip      low     2    8.8   9.39      7.1      38.9       4
## 40   tip      low     1    8.9  11.48     11.1      39.4       7
## 15   tip   medium     2    9.0  10.20     10.8      90.1       6
## 39   tip      low     1    9.1   8.96      9.7      23.8       3
## 29   tip     high     2    9.2  13.26     11.3     108.0       9
## 78 notip     high     2    9.3  18.75     18.4     181.1      16
## 8    tip   medium     1    9.4  10.28     14.0      28.5       6
## 37   tip      low     1    9.7   6.49      8.1      18.0       3
## 6    tip   medium     1    9.8  10.08     12.2      72.7       9
## 18   tip     high     1   10.0  18.07     16.9      90.5       3
## 19   tip     high     1   10.0  13.29     15.8     142.7      12
## 22   tip     high     1   10.1  15.49     12.6      77.2      12
## 4    tip   medium     1   10.4   8.78     11.9      20.3       1
## 5    tip   medium     1   10.4  13.58     14.5      26.9       4
## 9    tip   medium     2   10.4  10.48     10.5      57.8       5
## 11   tip   medium     2   10.4  13.18     11.1      56.8      12
## 2    tip   medium     1   10.7  12.14     14.1      46.0      10
## 70 notip     high     1   10.9  17.22     49.2     189.6      17
## 12   tip   medium     2   11.0  11.56     12.6      31.3       6
## 3    tip   medium     1   11.2  12.76      7.1      66.7      10
## 25   tip     high     2   11.5  23.89     14.3     101.5      12
## 47   tip      low     2   11.5   8.72     10.2      28.3       6
## 10   tip   medium     2   12.3  13.48     16.1      36.9       8
## 38   tip      low     1   12.3  11.27     13.7      28.7       5
## 17   tip     high     1   12.6  18.66     18.6      54.0       9
## 21   tip     high     1   14.1  19.12     13.1     113.2      13
## 32   tip     high     2   17.2  19.20     10.9      89.9      14

We can also order by descending order of a variable (i.e. leafarea) using the decreasing = TRUE argument.

leafarea_ord <- flowers[order(flowers$leafarea, decreasing = TRUE), ]        
leafarea_ord

##    treat nitrogen block height weight leafarea shootarea flowers
## 70 notip     high     1   10.9  17.22     49.2     189.6      17
## 13   tip   medium     2    7.1   8.16     29.6       9.7       2
## 24   tip     high     1    6.5  17.13     24.1     147.4       6
## 65 notip     high     1    8.5  22.53     20.8     166.9      16
## 23   tip     high     1    8.5  17.82     20.5      54.4       3
## 66 notip     high     1    8.5  17.33     19.8     184.4      12
## 73 notip     high     2    4.7  13.42     19.8     124.7       5
## 80 notip     high     2    5.2  17.70     19.1     181.1       8
## 17   tip     high     1   12.6  18.66     18.6      54.0       9
## 49 notip   medium     1    5.6  11.03     18.6      49.9       8
## 78 notip     high     2    9.3  18.75     18.4     181.1      16
## 31   tip     high     2    6.3  14.50     18.3      55.6       8
## 57 notip   medium     2    5.4  11.36     17.8     104.6      12
## 86 notip      low     1    1.8   6.01     17.6      46.2       4
## 56 notip   medium     1    2.5  14.85     17.5      77.8      10
## 74 notip     high     2    5.0  16.82     17.3     182.5      15
## 26   tip     high     2    7.7  14.77     17.2     104.5       4
## 69 notip     high     1    2.6  16.57     17.1     141.1       3
## 64 notip   medium     2    3.9  12.97     17.0      97.5       5
## 18   tip     high     1   10.0  18.07     16.9      90.5       3
## 28   tip     high     2    8.8  16.58     16.7     100.1       9
## 79 notip     high     2    4.6  14.65     16.7      91.7      11
## 53 notip   medium     1    3.5  12.93     16.6     109.3       3
## 68 notip     high     1    1.2  18.24     16.6     148.1       7
## 76 notip     high     2    2.6  18.88     16.4     181.5      14
## 77 notip     high     2    6.0  13.68     16.2     133.7       2
## 96 notip      low     2    4.4  10.60     16.2      63.3       6
## 10   tip   medium     2   12.3  13.48     16.1      36.9       8
## 42   tip      low     2    3.1   8.74     16.1      39.1       3
## 71 notip     high     1    7.2  15.21     15.9     135.0      14
## 19   tip     high     1   10.0  13.29     15.8     142.7      12
## 59 notip   medium     2    5.8  10.18     15.7      88.8       6
## 72 notip     high     1    2.1  19.15     15.6     176.7       6
## 63 notip   medium     2    2.2  10.70     15.3      97.1       7
## 60 notip   medium     2    4.5  13.68     14.8     125.5       9
## 5    tip   medium     1   10.4  13.58     14.5      26.9       4
## 89 notip      low     2    2.4   9.10     14.5      78.7       8
## 25   tip     high     2   11.5  23.89     14.3     101.5      12
## 2    tip   medium     1   10.7  12.14     14.1      46.0      10
## 92 notip      low     2    3.2   7.45     14.1      38.1       4
## 8    tip   medium     1    9.4  10.28     14.0      28.5       6
## 52 notip   medium     1    4.1  12.58     13.9     136.6      11
## 82 notip      low     1    2.3   7.28     13.8      32.8       6
## 38   tip      low     1   12.3  11.27     13.7      28.7       5
## 27   tip     high     2    6.4  13.60     13.6     152.6       7
## 51 notip   medium     1    7.5  13.60     13.6     122.2      11
## 81 notip      low     1    3.9   7.17     13.5      52.8       6
## 95 notip      low     2    5.5   8.44     13.5      77.6       9
## 16   tip   medium     2    4.5  12.55     13.4      14.4       6
## 41   tip      low     2    7.4  10.89     13.3       9.5       5
## 7    tip   medium     1    6.9  10.11     13.2      43.1       7
## 20   tip     high     1    8.5  14.33     13.2      91.4       5
## 21   tip     high     1   14.1  19.12     13.1     113.2      13
## 14   tip   medium     2    6.0  11.22     13.0      16.4       3
## 12   tip   medium     2   11.0  11.56     12.6      31.3       6
## 22   tip     high     1   10.1  15.49     12.6      77.2      12
## 61 notip   medium     2    8.0  11.43     12.6      43.2      14
## 93 notip      low     2    3.9   9.19     12.4      52.6       9
## 54 notip   medium     1    8.5  10.04     12.3     113.6       4
## 6    tip   medium     1    9.8  10.08     12.2      72.7       9
## 67 notip     high     1    6.4  11.52     12.1     140.5       7
## 87 notip      low     1    3.0   9.93     12.0      56.6       6
## 4    tip   medium     1   10.4   8.78     11.9      20.3       1
## 34   tip      low     1    8.0  10.23     11.9      88.1       4
## 62 notip   medium     2    1.8  10.47     11.8     120.8       9
## 1    tip   medium     1    7.5   7.62     11.7      31.9       1
## 30   tip     high     2    6.2  17.32     11.6      85.9       5
## 94 notip      low     2    3.3   8.92     11.6      55.2       6
## 50 notip   medium     1    5.3   9.29     11.5      82.3       6
## 29   tip     high     2    9.2  13.26     11.3     108.0       9
## 11   tip   medium     2   10.4  13.18     11.1      56.8      12
## 40   tip      low     1    8.9  11.48     11.1      39.4       7
## 83 notip      low     1    5.2   5.79     11.0      67.4       5
## 32   tip     high     2   17.2  19.20     10.9      89.9      14
## 15   tip   medium     2    9.0  10.20     10.8      90.1       6
## 9    tip   medium     2   10.4  10.48     10.5      57.8       5
## 91 notip      low     2    3.7   8.10     10.5      60.5       6
## 47   tip      low     2   11.5   8.72     10.2      28.3       6
## 46   tip      low     2    5.6   8.10     10.1       5.8       2
## 75 notip     high     2    6.5  14.00     10.1     126.5       7
## 39   tip      low     1    9.1   8.96      9.7      23.8       3
## 58 notip   medium     2    3.9   9.07      9.6      90.4       7
## 84 notip      low     1    2.2   9.97      9.6      63.1       2
## 90 notip      low     2    5.7   9.05      9.6      63.2       6
## 85 notip      low     1    4.5   8.60      9.4     113.5       7
## 33   tip      low     1    8.0   6.88      9.3      16.1       4
## 35   tip      low     1    6.4   5.97      8.7       7.3       2
## 45   tip      low     2    8.5   7.16      8.7      29.9       4
## 43   tip      low     2    7.9   8.89      8.4      34.1       4
## 55 notip   medium     1    4.9   6.89      8.2      52.9       3
## 37   tip      low     1    9.7   6.49      8.1      18.0       3
## 88 notip      low     1    3.7   7.03      7.9      36.7       5
## 36   tip      low     1    7.6  13.05      7.2      47.2       8
## 3    tip   medium     1   11.2  12.76      7.1      66.7      10
## 44   tip      low     2    8.8   9.39      7.1      38.9       4
## 48   tip      low     2    5.8   8.04      5.8      30.7       7

We can even order data frames based on multiple variables. For example, to order the data frame flowers in ascending order of both block and height.

block_height_ord <- flowers[order(flowers$block, flowers$height), ]        
block_height_ord

##    treat nitrogen block height weight leafarea shootarea flowers
## 68 notip     high     1    1.2  18.24     16.6     148.1       7
## 86 notip      low     1    1.8   6.01     17.6      46.2       4
## 72 notip     high     1    2.1  19.15     15.6     176.7       6
## 84 notip      low     1    2.2   9.97      9.6      63.1       2
## 82 notip      low     1    2.3   7.28     13.8      32.8       6
## 56 notip   medium     1    2.5  14.85     17.5      77.8      10
## 69 notip     high     1    2.6  16.57     17.1     141.1       3
## 87 notip      low     1    3.0   9.93     12.0      56.6       6
## 53 notip   medium     1    3.5  12.93     16.6     109.3       3
## 88 notip      low     1    3.7   7.03      7.9      36.7       5
## 81 notip      low     1    3.9   7.17     13.5      52.8       6
## 52 notip   medium     1    4.1  12.58     13.9     136.6      11
## 85 notip      low     1    4.5   8.60      9.4     113.5       7
## 55 notip   medium     1    4.9   6.89      8.2      52.9       3
## 83 notip      low     1    5.2   5.79     11.0      67.4       5
## 50 notip   medium     1    5.3   9.29     11.5      82.3       6
## 49 notip   medium     1    5.6  11.03     18.6      49.9       8
## 35   tip      low     1    6.4   5.97      8.7       7.3       2
## 67 notip     high     1    6.4  11.52     12.1     140.5       7
## 24   tip     high     1    6.5  17.13     24.1     147.4       6
## 7    tip   medium     1    6.9  10.11     13.2      43.1       7
## 71 notip     high     1    7.2  15.21     15.9     135.0      14
## 1    tip   medium     1    7.5   7.62     11.7      31.9       1
## 51 notip   medium     1    7.5  13.60     13.6     122.2      11
## 36   tip      low     1    7.6  13.05      7.2      47.2       8
## 33   tip      low     1    8.0   6.88      9.3      16.1       4
## 34   tip      low     1    8.0  10.23     11.9      88.1       4
## 20   tip     high     1    8.5  14.33     13.2      91.4       5
## 23   tip     high     1    8.5  17.82     20.5      54.4       3
## 54 notip   medium     1    8.5  10.04     12.3     113.6       4
## 65 notip     high     1    8.5  22.53     20.8     166.9      16
## 66 notip     high     1    8.5  17.33     19.8     184.4      12
## 40   tip      low     1    8.9  11.48     11.1      39.4       7
## 39   tip      low     1    9.1   8.96      9.7      23.8       3
## 8    tip   medium     1    9.4  10.28     14.0      28.5       6
## 37   tip      low     1    9.7   6.49      8.1      18.0       3
## 6    tip   medium     1    9.8  10.08     12.2      72.7       9
## 18   tip     high     1   10.0  18.07     16.9      90.5       3
## 19   tip     high     1   10.0  13.29     15.8     142.7      12
## 22   tip     high     1   10.1  15.49     12.6      77.2      12
## 4    tip   medium     1   10.4   8.78     11.9      20.3       1
## 5    tip   medium     1   10.4  13.58     14.5      26.9       4
## 2    tip   medium     1   10.7  12.14     14.1      46.0      10
## 70 notip     high     1   10.9  17.22     49.2     189.6      17
## 3    tip   medium     1   11.2  12.76      7.1      66.7      10
## 38   tip      low     1   12.3  11.27     13.7      28.7       5
## 17   tip     high     1   12.6  18.66     18.6      54.0       9
## 21   tip     high     1   14.1  19.12     13.1     113.2      13
## 62 notip   medium     2    1.8  10.47     11.8     120.8       9
## 63 notip   medium     2    2.2  10.70     15.3      97.1       7
## 89 notip      low     2    2.4   9.10     14.5      78.7       8
## 76 notip     high     2    2.6  18.88     16.4     181.5      14
## 42   tip      low     2    3.1   8.74     16.1      39.1       3
## 92 notip      low     2    3.2   7.45     14.1      38.1       4
## 94 notip      low     2    3.3   8.92     11.6      55.2       6
## 91 notip      low     2    3.7   8.10     10.5      60.5       6
## 58 notip   medium     2    3.9   9.07      9.6      90.4       7
## 64 notip   medium     2    3.9  12.97     17.0      97.5       5
## 93 notip      low     2    3.9   9.19     12.4      52.6       9
## 96 notip      low     2    4.4  10.60     16.2      63.3       6
## 16   tip   medium     2    4.5  12.55     13.4      14.4       6
## 60 notip   medium     2    4.5  13.68     14.8     125.5       9
## 79 notip     high     2    4.6  14.65     16.7      91.7      11
## 73 notip     high     2    4.7  13.42     19.8     124.7       5
## 74 notip     high     2    5.0  16.82     17.3     182.5      15
## 80 notip     high     2    5.2  17.70     19.1     181.1       8
## 57 notip   medium     2    5.4  11.36     17.8     104.6      12
## 95 notip      low     2    5.5   8.44     13.5      77.6       9
## 46   tip      low     2    5.6   8.10     10.1       5.8       2
## 90 notip      low     2    5.7   9.05      9.6      63.2       6
## 48   tip      low     2    5.8   8.04      5.8      30.7       7
## 59 notip   medium     2    5.8  10.18     15.7      88.8       6
## 14   tip   medium     2    6.0  11.22     13.0      16.4       3
## 77 notip     high     2    6.0  13.68     16.2     133.7       2
## 30   tip     high     2    6.2  17.32     11.6      85.9       5
## 31   tip     high     2    6.3  14.50     18.3      55.6       8
## 27   tip     high     2    6.4  13.60     13.6     152.6       7
## 75 notip     high     2    6.5  14.00     10.1     126.5       7
## 13   tip   medium     2    7.1   8.16     29.6       9.7       2
## 41   tip      low     2    7.4  10.89     13.3       9.5       5
## 26   tip     high     2    7.7  14.77     17.2     104.5       4
## 43   tip      low     2    7.9   8.89      8.4      34.1       4
## 61 notip   medium     2    8.0  11.43     12.6      43.2      14
## 45   tip      low     2    8.5   7.16      8.7      29.9       4
## 28   tip     high     2    8.8  16.58     16.7     100.1       9
## 44   tip      low     2    8.8   9.39      7.1      38.9       4
## 15   tip   medium     2    9.0  10.20     10.8      90.1       6
## 29   tip     high     2    9.2  13.26     11.3     108.0       9
## 78 notip     high     2    9.3  18.75     18.4     181.1      16
## 9    tip   medium     2   10.4  10.48     10.5      57.8       5
## 11   tip   medium     2   10.4  13.18     11.1      56.8      12
## 12   tip   medium     2   11.0  11.56     12.6      31.3       6
## 25   tip     high     2   11.5  23.89     14.3     101.5      12
## 47   tip      low     2   11.5   8.72     10.2      28.3       6
## 10   tip   medium     2   12.3  13.48     16.1      36.9       8
## 32   tip     high     2   17.2  19.20     10.9      89.9      14

What if we wanted to order flowers by ascending order of block but descending order of height? We can use a simple trick by adding a - symbol before the flowers$height variable when we use the order() function. This will essentially turn all of the height values negative which will result in reversing the order. Note, that this trick will only work with numeric variables.

block_revheight_ord <- flowers[order(flowers$block, -flowers$height), ]        
block_revheight_ord

##    treat nitrogen block height weight leafarea shootarea flowers
## 21   tip     high     1   14.1  19.12     13.1     113.2      13
## 17   tip     high     1   12.6  18.66     18.6      54.0       9
## 38   tip      low     1   12.3  11.27     13.7      28.7       5
## 3    tip   medium     1   11.2  12.76      7.1      66.7      10
## 70 notip     high     1   10.9  17.22     49.2     189.6      17
## 2    tip   medium     1   10.7  12.14     14.1      46.0      10
## 4    tip   medium     1   10.4   8.78     11.9      20.3       1
## 5    tip   medium     1   10.4  13.58     14.5      26.9       4
## 22   tip     high     1   10.1  15.49     12.6      77.2      12
## 18   tip     high     1   10.0  18.07     16.9      90.5       3
## 19   tip     high     1   10.0  13.29     15.8     142.7      12
## 6    tip   medium     1    9.8  10.08     12.2      72.7       9
## 37   tip      low     1    9.7   6.49      8.1      18.0       3
## 8    tip   medium     1    9.4  10.28     14.0      28.5       6
## 39   tip      low     1    9.1   8.96      9.7      23.8       3
## 40   tip      low     1    8.9  11.48     11.1      39.4       7
## 20   tip     high     1    8.5  14.33     13.2      91.4       5
## 23   tip     high     1    8.5  17.82     20.5      54.4       3
## 54 notip   medium     1    8.5  10.04     12.3     113.6       4
## 65 notip     high     1    8.5  22.53     20.8     166.9      16
## 66 notip     high     1    8.5  17.33     19.8     184.4      12
## 33   tip      low     1    8.0   6.88      9.3      16.1       4
## 34   tip      low     1    8.0  10.23     11.9      88.1       4
## 36   tip      low     1    7.6  13.05      7.2      47.2       8
## 1    tip   medium     1    7.5   7.62     11.7      31.9       1
## 51 notip   medium     1    7.5  13.60     13.6     122.2      11
## 71 notip     high     1    7.2  15.21     15.9     135.0      14
## 7    tip   medium     1    6.9  10.11     13.2      43.1       7
## 24   tip     high     1    6.5  17.13     24.1     147.4       6
## 35   tip      low     1    6.4   5.97      8.7       7.3       2
## 67 notip     high     1    6.4  11.52     12.1     140.5       7
## 49 notip   medium     1    5.6  11.03     18.6      49.9       8
## 50 notip   medium     1    5.3   9.29     11.5      82.3       6
## 83 notip      low     1    5.2   5.79     11.0      67.4       5
## 55 notip   medium     1    4.9   6.89      8.2      52.9       3
## 85 notip      low     1    4.5   8.60      9.4     113.5       7
## 52 notip   medium     1    4.1  12.58     13.9     136.6      11
## 81 notip      low     1    3.9   7.17     13.5      52.8       6
## 88 notip      low     1    3.7   7.03      7.9      36.7       5
## 53 notip   medium     1    3.5  12.93     16.6     109.3       3
## 87 notip      low     1    3.0   9.93     12.0      56.6       6
## 69 notip     high     1    2.6  16.57     17.1     141.1       3
## 56 notip   medium     1    2.5  14.85     17.5      77.8      10
## 82 notip      low     1    2.3   7.28     13.8      32.8       6
## 84 notip      low     1    2.2   9.97      9.6      63.1       2
## 72 notip     high     1    2.1  19.15     15.6     176.7       6
## 86 notip      low     1    1.8   6.01     17.6      46.2       4
## 68 notip     high     1    1.2  18.24     16.6     148.1       7
## 32   tip     high     2   17.2  19.20     10.9      89.9      14
## 10   tip   medium     2   12.3  13.48     16.1      36.9       8
## 25   tip     high     2   11.5  23.89     14.3     101.5      12
## 47   tip      low     2   11.5   8.72     10.2      28.3       6
## 12   tip   medium     2   11.0  11.56     12.6      31.3       6
## 9    tip   medium     2   10.4  10.48     10.5      57.8       5
## 11   tip   medium     2   10.4  13.18     11.1      56.8      12
## 78 notip     high     2    9.3  18.75     18.4     181.1      16
## 29   tip     high     2    9.2  13.26     11.3     108.0       9
## 15   tip   medium     2    9.0  10.20     10.8      90.1       6
## 28   tip     high     2    8.8  16.58     16.7     100.1       9
## 44   tip      low     2    8.8   9.39      7.1      38.9       4
## 45   tip      low     2    8.5   7.16      8.7      29.9       4
## 61 notip   medium     2    8.0  11.43     12.6      43.2      14
## 43   tip      low     2    7.9   8.89      8.4      34.1       4
## 26   tip     high     2    7.7  14.77     17.2     104.5       4
## 41   tip      low     2    7.4  10.89     13.3       9.5       5
## 13   tip   medium     2    7.1   8.16     29.6       9.7       2
## 75 notip     high     2    6.5  14.00     10.1     126.5       7
## 27   tip     high     2    6.4  13.60     13.6     152.6       7
## 31   tip     high     2    6.3  14.50     18.3      55.6       8
## 30   tip     high     2    6.2  17.32     11.6      85.9       5
## 14   tip   medium     2    6.0  11.22     13.0      16.4       3
## 77 notip     high     2    6.0  13.68     16.2     133.7       2
## 48   tip      low     2    5.8   8.04      5.8      30.7       7
## 59 notip   medium     2    5.8  10.18     15.7      88.8       6
## 90 notip      low     2    5.7   9.05      9.6      63.2       6
## 46   tip      low     2    5.6   8.10     10.1       5.8       2
## 95 notip      low     2    5.5   8.44     13.5      77.6       9
## 57 notip   medium     2    5.4  11.36     17.8     104.6      12
## 80 notip     high     2    5.2  17.70     19.1     181.1       8
## 74 notip     high     2    5.0  16.82     17.3     182.5      15
## 73 notip     high     2    4.7  13.42     19.8     124.7       5
## 79 notip     high     2    4.6  14.65     16.7      91.7      11
## 16   tip   medium     2    4.5  12.55     13.4      14.4       6
## 60 notip   medium     2    4.5  13.68     14.8     125.5       9
## 96 notip      low     2    4.4  10.60     16.2      63.3       6
## 58 notip   medium     2    3.9   9.07      9.6      90.4       7
## 64 notip   medium     2    3.9  12.97     17.0      97.5       5
## 93 notip      low     2    3.9   9.19     12.4      52.6       9
## 91 notip      low     2    3.7   8.10     10.5      60.5       6
## 94 notip      low     2    3.3   8.92     11.6      55.2       6
## 92 notip      low     2    3.2   7.45     14.1      38.1       4
## 42   tip      low     2    3.1   8.74     16.1      39.1       3
## 76 notip     high     2    2.6  18.88     16.4     181.5      14
## 89 notip      low     2    2.4   9.10     14.5      78.7       8
## 63 notip   medium     2    2.2  10.70     15.3      97.1       7
## 62 notip   medium     2    1.8  10.47     11.8     120.8       9

If we wanted to do the same thing with a factor (or character) variable like nitrogen we would need to use the function xtfrm() for this variable inside our order() function.

block_revheight_ord <- flowers[order(-xtfrm(flowers$nitrogen), flowers$height), ]        
block_revheight_ord

##    treat nitrogen block height weight leafarea shootarea flowers
## 62 notip   medium     2    1.8  10.47     11.8     120.8       9
## 63 notip   medium     2    2.2  10.70     15.3      97.1       7
## 56 notip   medium     1    2.5  14.85     17.5      77.8      10
## 53 notip   medium     1    3.5  12.93     16.6     109.3       3
## 58 notip   medium     2    3.9   9.07      9.6      90.4       7
## 64 notip   medium     2    3.9  12.97     17.0      97.5       5
## 52 notip   medium     1    4.1  12.58     13.9     136.6      11
## 16   tip   medium     2    4.5  12.55     13.4      14.4       6
## 60 notip   medium     2    4.5  13.68     14.8     125.5       9
## 55 notip   medium     1    4.9   6.89      8.2      52.9       3
## 50 notip   medium     1    5.3   9.29     11.5      82.3       6
## 57 notip   medium     2    5.4  11.36     17.8     104.6      12
## 49 notip   medium     1    5.6  11.03     18.6      49.9       8
## 59 notip   medium     2    5.8  10.18     15.7      88.8       6
## 14   tip   medium     2    6.0  11.22     13.0      16.4       3
## 7    tip   medium     1    6.9  10.11     13.2      43.1       7
## 13   tip   medium     2    7.1   8.16     29.6       9.7       2
## 1    tip   medium     1    7.5   7.62     11.7      31.9       1
## 51 notip   medium     1    7.5  13.60     13.6     122.2      11
## 61 notip   medium     2    8.0  11.43     12.6      43.2      14
## 54 notip   medium     1    8.5  10.04     12.3     113.6       4
## 15   tip   medium     2    9.0  10.20     10.8      90.1       6
## 8    tip   medium     1    9.4  10.28     14.0      28.5       6
## 6    tip   medium     1    9.8  10.08     12.2      72.7       9
## 4    tip   medium     1   10.4   8.78     11.9      20.3       1
## 5    tip   medium     1   10.4  13.58     14.5      26.9       4
## 9    tip   medium     2   10.4  10.48     10.5      57.8       5
## 11   tip   medium     2   10.4  13.18     11.1      56.8      12
## 2    tip   medium     1   10.7  12.14     14.1      46.0      10
## 12   tip   medium     2   11.0  11.56     12.6      31.3       6
## 3    tip   medium     1   11.2  12.76      7.1      66.7      10
## 10   tip   medium     2   12.3  13.48     16.1      36.9       8
## 86 notip      low     1    1.8   6.01     17.6      46.2       4
## 84 notip      low     1    2.2   9.97      9.6      63.1       2
## 82 notip      low     1    2.3   7.28     13.8      32.8       6
## 89 notip      low     2    2.4   9.10     14.5      78.7       8
## 87 notip      low     1    3.0   9.93     12.0      56.6       6
## 42   tip      low     2    3.1   8.74     16.1      39.1       3
## 92 notip      low     2    3.2   7.45     14.1      38.1       4
## 94 notip      low     2    3.3   8.92     11.6      55.2       6
## 88 notip      low     1    3.7   7.03      7.9      36.7       5
## 91 notip      low     2    3.7   8.10     10.5      60.5       6
## 81 notip      low     1    3.9   7.17     13.5      52.8       6
## 93 notip      low     2    3.9   9.19     12.4      52.6       9
## 96 notip      low     2    4.4  10.60     16.2      63.3       6
## 85 notip      low     1    4.5   8.60      9.4     113.5       7
## 83 notip      low     1    5.2   5.79     11.0      67.4       5
## 95 notip      low     2    5.5   8.44     13.5      77.6       9
## 46   tip      low     2    5.6   8.10     10.1       5.8       2
## 90 notip      low     2    5.7   9.05      9.6      63.2       6
## 48   tip      low     2    5.8   8.04      5.8      30.7       7
## 35   tip      low     1    6.4   5.97      8.7       7.3       2
## 41   tip      low     2    7.4  10.89     13.3       9.5       5
## 36   tip      low     1    7.6  13.05      7.2      47.2       8
## 43   tip      low     2    7.9   8.89      8.4      34.1       4
## 33   tip      low     1    8.0   6.88      9.3      16.1       4
## 34   tip      low     1    8.0  10.23     11.9      88.1       4
## 45   tip      low     2    8.5   7.16      8.7      29.9       4
## 44   tip      low     2    8.8   9.39      7.1      38.9       4
## 40   tip      low     1    8.9  11.48     11.1      39.4       7
## 39   tip      low     1    9.1   8.96      9.7      23.8       3
## 37   tip      low     1    9.7   6.49      8.1      18.0       3
## 47   tip      low     2   11.5   8.72     10.2      28.3       6
## 38   tip      low     1   12.3  11.27     13.7      28.7       5
## 68 notip     high     1    1.2  18.24     16.6     148.1       7
## 72 notip     high     1    2.1  19.15     15.6     176.7       6
## 69 notip     high     1    2.6  16.57     17.1     141.1       3
## 76 notip     high     2    2.6  18.88     16.4     181.5      14
## 79 notip     high     2    4.6  14.65     16.7      91.7      11
## 73 notip     high     2    4.7  13.42     19.8     124.7       5
## 74 notip     high     2    5.0  16.82     17.3     182.5      15
## 80 notip     high     2    5.2  17.70     19.1     181.1       8
## 77 notip     high     2    6.0  13.68     16.2     133.7       2
## 30   tip     high     2    6.2  17.32     11.6      85.9       5
## 31   tip     high     2    6.3  14.50     18.3      55.6       8
## 27   tip     high     2    6.4  13.60     13.6     152.6       7
## 67 notip     high     1    6.4  11.52     12.1     140.5       7
## 24   tip     high     1    6.5  17.13     24.1     147.4       6
## 75 notip     high     2    6.5  14.00     10.1     126.5       7
## 71 notip     high     1    7.2  15.21     15.9     135.0      14
## 26   tip     high     2    7.7  14.77     17.2     104.5       4
## 20   tip     high     1    8.5  14.33     13.2      91.4       5
## 23   tip     high     1    8.5  17.82     20.5      54.4       3
## 65 notip     high     1    8.5  22.53     20.8     166.9      16
## 66 notip     high     1    8.5  17.33     19.8     184.4      12
## 28   tip     high     2    8.8  16.58     16.7     100.1       9
## 29   tip     high     2    9.2  13.26     11.3     108.0       9
## 78 notip     high     2    9.3  18.75     18.4     181.1      16
## 18   tip     high     1   10.0  18.07     16.9      90.5       3
## 19   tip     high     1   10.0  13.29     15.8     142.7      12
## 22   tip     high     1   10.1  15.49     12.6      77.2      12
## 70 notip     high     1   10.9  17.22     49.2     189.6      17
## 25   tip     high     2   11.5  23.89     14.3     101.5      12
## 17   tip     high     1   12.6  18.66     18.6      54.0       9
## 21   tip     high     1   14.1  19.12     13.1     113.2      13
## 32   tip     high     2   17.2  19.20     10.9      89.9      14

Notice that the nitrogen variable has been reverse ordered alphabetically and height has been ordered by increasing values within each level of nitrogen.

If we wanted to order the data frame by nitrogen but this time order it from low -> medium -> high instead of the default alphabetically (high, low, medium), we need to first change the order of our levels of the nitrogen factor in our data frame using the factor() function. Once we’ve done this we can then use the order() function as usual. Note, if you’re reading the pdf version of this book, the output has been truncated to save space.

flowers$nitrogen <- factor(flowers$nitrogen, 
                           levels = c("low", "medium", "high"))    
nit_ord <- flowers[order(flowers$nitrogen),]
nit_ord

##    treat nitrogen block height weight leafarea shootarea flowers
## 33   tip      low     1    8.0   6.88      9.3      16.1       4
## 34   tip      low     1    8.0  10.23     11.9      88.1       4
## 35   tip      low     1    6.4   5.97      8.7       7.3       2
## 36   tip      low     1    7.6  13.05      7.2      47.2       8
## 37   tip      low     1    9.7   6.49      8.1      18.0       3
## 38   tip      low     1   12.3  11.27     13.7      28.7       5
## 39   tip      low     1    9.1   8.96      9.7      23.8       3
## 40   tip      low     1    8.9  11.48     11.1      39.4       7
## 41   tip      low     2    7.4  10.89     13.3       9.5       5
## 42   tip      low     2    3.1   8.74     16.1      39.1       3
## 43   tip      low     2    7.9   8.89      8.4      34.1       4
## 44   tip      low     2    8.8   9.39      7.1      38.9       4
## 45   tip      low     2    8.5   7.16      8.7      29.9       4
## 46   tip      low     2    5.6   8.10     10.1       5.8       2
## 47   tip      low     2   11.5   8.72     10.2      28.3       6
## 48   tip      low     2    5.8   8.04      5.8      30.7       7
## 81 notip      low     1    3.9   7.17     13.5      52.8       6
## 82 notip      low     1    2.3   7.28     13.8      32.8       6
## 83 notip      low     1    5.2   5.79     11.0      67.4       5
## 84 notip      low     1    2.2   9.97      9.6      63.1       2
## 85 notip      low     1    4.5   8.60      9.4     113.5       7
## 86 notip      low     1    1.8   6.01     17.6      46.2       4
## 87 notip      low     1    3.0   9.93     12.0      56.6       6
## 88 notip      low     1    3.7   7.03      7.9      36.7       5
## 89 notip      low     2    2.4   9.10     14.5      78.7       8
## 90 notip      low     2    5.7   9.05      9.6      63.2       6
## 91 notip      low     2    3.7   8.10     10.5      60.5       6
## 92 notip      low     2    3.2   7.45     14.1      38.1       4
## 93 notip      low     2    3.9   9.19     12.4      52.6       9
## 94 notip      low     2    3.3   8.92     11.6      55.2       6
## 95 notip      low     2    5.5   8.44     13.5      77.6       9
## 96 notip      low     2    4.4  10.60     16.2      63.3       6
## 1    tip   medium     1    7.5   7.62     11.7      31.9       1
## 2    tip   medium     1   10.7  12.14     14.1      46.0      10
## 3    tip   medium     1   11.2  12.76      7.1      66.7      10
## 4    tip   medium     1   10.4   8.78     11.9      20.3       1
## 5    tip   medium     1   10.4  13.58     14.5      26.9       4
## 6    tip   medium     1    9.8  10.08     12.2      72.7       9
## 7    tip   medium     1    6.9  10.11     13.2      43.1       7
## 8    tip   medium     1    9.4  10.28     14.0      28.5       6
## 9    tip   medium     2   10.4  10.48     10.5      57.8       5
## 10   tip   medium     2   12.3  13.48     16.1      36.9       8
## 11   tip   medium     2   10.4  13.18     11.1      56.8      12
## 12   tip   medium     2   11.0  11.56     12.6      31.3       6
## 13   tip   medium     2    7.1   8.16     29.6       9.7       2
## 14   tip   medium     2    6.0  11.22     13.0      16.4       3
## 15   tip   medium     2    9.0  10.20     10.8      90.1       6
## 16   tip   medium     2    4.5  12.55     13.4      14.4       6
## 49 notip   medium     1    5.6  11.03     18.6      49.9       8
## 50 notip   medium     1    5.3   9.29     11.5      82.3       6
## 51 notip   medium     1    7.5  13.60     13.6     122.2      11
## 52 notip   medium     1    4.1  12.58     13.9     136.6      11
## 53 notip   medium     1    3.5  12.93     16.6     109.3       3
## 54 notip   medium     1    8.5  10.04     12.3     113.6       4
## 55 notip   medium     1    4.9   6.89      8.2      52.9       3
## 56 notip   medium     1    2.5  14.85     17.5      77.8      10
## 57 notip   medium     2    5.4  11.36     17.8     104.6      12
## 58 notip   medium     2    3.9   9.07      9.6      90.4       7
## 59 notip   medium     2    5.8  10.18     15.7      88.8       6
## 60 notip   medium     2    4.5  13.68     14.8     125.5       9
## 61 notip   medium     2    8.0  11.43     12.6      43.2      14
## 62 notip   medium     2    1.8  10.47     11.8     120.8       9
## 63 notip   medium     2    2.2  10.70     15.3      97.1       7
## 64 notip   medium     2    3.9  12.97     17.0      97.5       5
## 17   tip     high     1   12.6  18.66     18.6      54.0       9
## 18   tip     high     1   10.0  18.07     16.9      90.5       3
## 19   tip     high     1   10.0  13.29     15.8     142.7      12
## 20   tip     high     1    8.5  14.33     13.2      91.4       5
## 21   tip     high     1   14.1  19.12     13.1     113.2      13
## 22   tip     high     1   10.1  15.49     12.6      77.2      12
## 23   tip     high     1    8.5  17.82     20.5      54.4       3
## 24   tip     high     1    6.5  17.13     24.1     147.4       6
## 25   tip     high     2   11.5  23.89     14.3     101.5      12
## 26   tip     high     2    7.7  14.77     17.2     104.5       4
## 27   tip     high     2    6.4  13.60     13.6     152.6       7
## 28   tip     high     2    8.8  16.58     16.7     100.1       9
## 29   tip     high     2    9.2  13.26     11.3     108.0       9
## 30   tip     high     2    6.2  17.32     11.6      85.9       5
## 31   tip     high     2    6.3  14.50     18.3      55.6       8
## 32   tip     high     2   17.2  19.20     10.9      89.9      14
## 65 notip     high     1    8.5  22.53     20.8     166.9      16
## 66 notip     high     1    8.5  17.33     19.8     184.4      12
## 67 notip     high     1    6.4  11.52     12.1     140.5       7
## 68 notip     high     1    1.2  18.24     16.6     148.1       7
## 69 notip     high     1    2.6  16.57     17.1     141.1       3
## 70 notip     high     1   10.9  17.22     49.2     189.6      17
## 71 notip     high     1    7.2  15.21     15.9     135.0      14
## 72 notip     high     1    2.1  19.15     15.6     176.7       6
## 73 notip     high     2    4.7  13.42     19.8     124.7       5
## 74 notip     high     2    5.0  16.82     17.3     182.5      15
## 75 notip     high     2    6.5  14.00     10.1     126.5       7
## 76 notip     high     2    2.6  18.88     16.4     181.5      14
## 77 notip     high     2    6.0  13.68     16.2     133.7       2
## 78 notip     high     2    9.3  18.75     18.4     181.1      16
## 79 notip     high     2    4.6  14.65     16.7      91.7      11
## 80 notip     high     2    5.2  17.70     19.1     181.1       8

3.4.4 Adding columns and rows

Sometimes it’s useful to be able to add extra rows and columns of data to our data frames. There are multiple ways to achieve this (as there always is in R!) depending on your circumstances. To simply append additional rows to an existing data frame we can use the rbind() function and to append columns the cbind() function. Let’s create a couple of test data frames to see this in action using our old friend the data.frame() function.

# rbind for rows
df1 <- data.frame(id = 1:4, height = c(120, 150, 132, 122),
                        weight = c(44, 56, 49, 45))
df1
##   id height weight
## 1  1    120     44
## 2  2    150     56
## 3  3    132     49
## 4  4    122     45

df2 <- data.frame(id = 5:6, height = c(119, 110),
                        weight = c(39, 35))
df2
##   id height weight
## 1  5    119     39
## 2  6    110     35

df3 <- data.frame(id = 1:4, height = c(120, 150, 132, 122),
                        weight = c(44, 56, 49, 45))
df3
##   id height weight
## 1  1    120     44
## 2  2    150     56
## 3  3    132     49
## 4  4    122     45

df4 <- data.frame(location = c("UK", "CZ", "CZ", "UK"))
df4
##   location
## 1       UK
## 2       CZ
## 3       CZ
## 4       UK

We can use the rbind() function to append the rows of data in df2 to the rows in df1 and assign the new data frame to df_rcomb.

df_rcomb <- rbind(df1, df2)
df_rcomb
##   id height weight
## 1  1    120     44
## 2  2    150     56
## 3  3    132     49
## 4  4    122     45
## 5  5    119     39
## 6  6    110     35

And cbind to append the column in df4 to the df3 data frame and assign to df_ccomb.

df_ccomb <- cbind(df3, df4)
df_ccomb
##   id height weight location
## 1  1    120     44       UK
## 2  2    150     56       CZ
## 3  3    132     49       CZ
## 4  4    122     45       UK

Another situation when adding a new column to a data frame is useful is when you want to perform some kind of transformation on an existing variable. For example, say we wanted to apply a log₁₀ transformation on the height variable in the df_rcomb data frame we created above. We could just create a separate variable to contains these values but it’s good practice to create this variable as a new column inside our existing data frame so we keep all of our data together. Let’s call this new variable height_log10.

# log10 transformation
df_rcomb$height_log10 <- log10(df_rcomb$height)
df_rcomb
##   id height weight height_log10
## 1  1    120     44     2.079181
## 2  2    150     56     2.176091
## 3  3    132     49     2.120574
## 4  4    122     45     2.086360
## 5  5    119     39     2.075547
## 6  6    110     35     2.041393

This situation also crops up when we want to convert an existing variable in a data frame from one data class to another data class. For example, the id variable in the df_rcomb data frame is numeric type data (use the str() or class() functions to check for yourself). If we wanted to convert the id variable to a factor to use later in our analysis we can create a new variable called Fid in our data frame and use the factor() function to convert the id variable.

# convert to a factor 
df_rcomb$Fid <- factor(df_rcomb$id)
df_rcomb
##   id height weight height_log10 Fid
## 1  1    120     44     2.079181   1
## 2  2    150     56     2.176091   2
## 3  3    132     49     2.120574   3
## 4  4    122     45     2.086360   4
## 5  5    119     39     2.075547   5
## 6  6    110     35     2.041393   6
str(df_rcomb)
## 'data.frame':    6 obs. of  5 variables:
##  $ id          : int  1 2 3 4 5 6
##  $ height      : num  120 150 132 122 119 110
##  $ weight      : num  44 56 49 45 39 35
##  $ height_log10: num  2.08 2.18 2.12 2.09 2.08 ...
##  $ Fid         : Factor w/ 6 levels "1","2","3","4",..: 1 2 3 4 5 6

3.4.5 Merging data frames

Instead of just appending either rows or columns to a data frame we can also merge two data frames together. Let’s say we have one data frame that contains taxonomic information on some common UK rocky shore invertebrates (called taxa) and another data frame that contains information on where they are usually found on the rocky shore (called zone). We can merge these two data frames together to produce a single data frame with both taxonomic and location information. Let’s first create both of these data frames (in reality you would probably just import your different datasets).

taxa <- data.frame(GENUS = c("Patella", "Littorina", "Halichondria", "Semibalanus"),
         species = c("vulgata", "littoria", "panacea", "balanoides"),
         family = c("patellidae", "Littorinidae", "Halichondriidae", "Archaeobalanidae"))
taxa
##          GENUS    species           family
## 1      Patella    vulgata       patellidae
## 2    Littorina   littoria     Littorinidae
## 3 Halichondria    panacea  Halichondriidae
## 4  Semibalanus balanoides Archaeobalanidae

zone <- data.frame(genus = c("Laminaria", "Halichondria", "Xanthoria", "Littorina", 
                             "Semibalanus", "Fucus"),
                   species = c("digitata", "panacea", "parietina", "littoria", 
                               "balanoides", "serratus"),
                   zone = c( "v_low", "low", "v_high", "low_mid", "high", "low_mid"))
zone
##          genus    species    zone
## 1    Laminaria   digitata   v_low
## 2 Halichondria    panacea     low
## 3    Xanthoria  parietina  v_high
## 4    Littorina   littoria low_mid
## 5  Semibalanus balanoides    high
## 6        Fucus   serratus low_mid

Because both of our data frames contains at least one variable in common (species in our case) we can simply use the merge() function to create a new data frame called taxa_zone.

taxa_zone <- merge(x = taxa, y = zone)
taxa_zone
##      species        GENUS           family        genus    zone
## 1 balanoides  Semibalanus Archaeobalanidae  Semibalanus    high
## 2   littoria    Littorina     Littorinidae    Littorina low_mid
## 3    panacea Halichondria  Halichondriidae Halichondria     low

Notice that the merged data frame contains only the rows that have species information in both data frames. There are also two columns called GENUS and genus because the merge() function treats these as two different variables that originate from the two data frames.

If we want to include all data from both data frames then we will need to use the all = TRUE argument. The missing values will be included as NA.

taxa_zone <- merge(x = taxa, y = zone, all = TRUE)
taxa_zone
##      species        GENUS           family        genus    zone
## 1 balanoides  Semibalanus Archaeobalanidae  Semibalanus    high
## 2   digitata         <NA>             <NA>    Laminaria   v_low
## 3   littoria    Littorina     Littorinidae    Littorina low_mid
## 4    panacea Halichondria  Halichondriidae Halichondria     low
## 5  parietina         <NA>             <NA>    Xanthoria  v_high
## 6   serratus         <NA>             <NA>        Fucus low_mid
## 7    vulgata      Patella       patellidae         <NA>    <NA>

If the variable names that you want to base the merge on are different in each data frame (for example GENUS and genus) you can specify the names in the first data frame (known as x) and the second data frame (known as y) using the by.x = and by.y = arguments.

taxa_zone <- merge(x = taxa, y = zone, by.x = "GENUS", by.y = "genus", all = TRUE)
taxa_zone
##          GENUS  species.x           family  species.y    zone
## 1        Fucus       <NA>             <NA>   serratus low_mid
## 2 Halichondria    panacea  Halichondriidae    panacea     low
## 3    Laminaria       <NA>             <NA>   digitata   v_low
## 4    Littorina   littoria     Littorinidae   littoria low_mid
## 5      Patella    vulgata       patellidae       <NA>    <NA>
## 6  Semibalanus balanoides Archaeobalanidae balanoides    high
## 7    Xanthoria       <NA>             <NA>  parietina  v_high

Or using multiple variable names.

taxa_zone <- merge(x = taxa, y = zone, by.x = c("species", "GENUS"), 
                    by.y = c("species", "genus"), all = TRUE)
taxa_zone
##      species        GENUS           family    zone
## 1 balanoides  Semibalanus Archaeobalanidae    high
## 2   digitata    Laminaria             <NA>   v_low
## 3   littoria    Littorina     Littorinidae low_mid
## 4    panacea Halichondria  Halichondriidae     low
## 5  parietina    Xanthoria             <NA>  v_high
## 6   serratus        Fucus             <NA> low_mid
## 7    vulgata      Patella       patellidae    <NA>

3.4.6 Reshaping data frames

Reshaping data into different formats is a common task. With rectangular type data (data frames have the same number of rows in each column) there are two main data frame shapes that you will come across: the ‘long’ format (sometimes called stacked) and the ‘wide’ format. An example of a long format data frame is given below. We can see that each row is a single observation from an individual subject and each subject can have multiple rows. This results in a single column of our measurement.

long_data <- data.frame(
             subject = rep(c("A", "B", "C", "D"), each = 3),
             sex = rep(c("M", "F", "F", "M"), each =3),
             condition = rep(c("control", "cond1", "cond2"), times = 4),
             measurement = c(12.9, 14.2, 8.7, 5.2, 12.6, 10.1, 8.9,
                             12.1, 14.2, 10.5, 12.9, 11.9))
long_data
##    subject sex condition measurement
## 1        A   M   control        12.9
## 2        A   M     cond1        14.2
## 3        A   M     cond2         8.7
## 4        B   F   control         5.2
## 5        B   F     cond1        12.6
## 6        B   F     cond2        10.1
## 7        C   F   control         8.9
## 8        C   F     cond1        12.1
## 9        C   F     cond2        14.2
## 10       D   M   control        10.5
## 11       D   M     cond1        12.9
## 12       D   M     cond2        11.9

We can also format the same data in the wide format as shown below. In this format we have multiple observations from each subject in a single row with measurements in different columns (control, cond1 and cond2). This is a common format when you have repeated measurements from sampling units.

wide_data <- data.frame(subject = c("A", "B", "C", "D"),
               sex = c("M", "F", "F", "M"),
               control = c(12.9, 5.2, 8.9, 10.5),
               cond1 = c(14.2, 12.6, 12.1, 12.9),
               cond2 = c(8.7, 10.1, 14.2, 11.9))
wide_data
##   subject sex control cond1 cond2
## 1       A   M    12.9  14.2   8.7
## 2       B   F     5.2  12.6  10.1
## 3       C   F     8.9  12.1  14.2
## 4       D   M    10.5  12.9  11.9

Whilst there’s no inherent problem with either of these formats we will sometimes need to convert between the two because some functions will require a specific format for them to work. The most common format is the long format.

There are many ways to convert between these two formats but we’ll use the melt() and dcast() functions from the reshape2 package (you will need to install this package first). The melt() function is used to convert from wide to long formats. The first argument for the melt() function is the data frame we want to melt (in our case wide_data). The id.vars = c("subject", "sex") argument is a vector of the variables you want to stack, the measured.vars = c("control", "cond1", "cond2") argument identifies the columns of the measurements in different conditions, the variable.name = "condition" argument specifies what you want to call the stacked column of your different conditions in your output data frame and value.name = "measurement" is the name of the column of your stacked measurements in your output data frame.

library(reshape2)
wide_data    # remind ourselves what the wide format looks like
##   subject sex control cond1 cond2
## 1       A   M    12.9  14.2   8.7
## 2       B   F     5.2  12.6  10.1
## 3       C   F     8.9  12.1  14.2
## 4       D   M    10.5  12.9  11.9

# convert wide to long
my_long_df <- melt(data = wide_data, id.vars = c("subject", "sex"),
                   measured.vars = c("control", "cond1", "cond2"),
                   variable.name = "condition", value.name = "measurement")
my_long_df
##    subject sex condition measurement
## 1        A   M   control        12.9
## 2        B   F   control         5.2
## 3        C   F   control         8.9
## 4        D   M   control        10.5
## 5        A   M     cond1        14.2
## 6        B   F     cond1        12.6
## 7        C   F     cond1        12.1
## 8        D   M     cond1        12.9
## 9        A   M     cond2         8.7
## 10       B   F     cond2        10.1
## 11       C   F     cond2        14.2
## 12       D   M     cond2        11.9

The dcast() function is used to convert from a long format data frame to a wide format data frame. The first argument is again is the data frame we want to cast (long_data for this example). The second argument is in formula syntax. The subject + sex bit of the formula means that we want to keep these columns separate, and the ~ condition part is the column that contains the labels that we want to split into new columns in our new data frame. The value.var = "measurement" argument is the column that contains the measured data.

long_data   # remind ourselves what the long format look like
##    subject sex condition measurement
## 1        A   M   control        12.9
## 2        A   M     cond1        14.2
## 3        A   M     cond2         8.7
## 4        B   F   control         5.2
## 5        B   F     cond1        12.6
## 6        B   F     cond2        10.1
## 7        C   F   control         8.9
## 8        C   F     cond1        12.1
## 9        C   F     cond2        14.2
## 10       D   M   control        10.5
## 11       D   M     cond1        12.9
## 12       D   M     cond2        11.9

# convert long to wide
my_wide_df <- dcast(data = long_data, subject + sex ~ condition, 
                    value.var = "measurement")
my_wide_df
##   subject sex cond1 cond2 control
## 1       A   M  14.2   8.7    12.9
## 2       B   F  12.6  10.1     5.2
## 3       C   F  12.1  14.2     8.9
## 4       D   M  12.9  11.9    10.5