1.5 R packages

The base installation of R comes with many useful packages as standard, and these contain many of the functions you will use on a daily basis. However, as you start using R for more diverse projects (and as your own use of R evolves) there will come a time when you need to extend R’s capabilities. Happily, many thousands of R users have developed useful code and shared it as installable packages. You can think of a package as a collection of functions, data and help files collated into a well-defined standard structure which you can download and install in R.

Packages can be downloaded from a variety of sources, but the most popular are CRAN, Bioconductor and GitHub. CRAN is the official repository for user-contributed R packages and, at the time of writing, hosts over 15000 packages. Bioconductor provides open source software oriented towards bioinformatics and hosts over 1800 R packages. GitHub is a website that hosts git repositories for all sorts of software and projects (not just R). Cutting-edge development versions of R packages are often hosted on GitHub, so if you need all the new bells and whistles then this may be an option. However, a potential downside of using the development version of an R package is that it might not be as stable as the version hosted on CRAN (it’s in development!) and updating it won’t be automatic.

1.5.1 CRAN packages

To install a package from CRAN you can use the install.packages() function. For example, if you want to install the remotes package, enter the following code into the Console window of RStudio (note: you will need a working internet connection to do this):

install.packages('remotes', dependencies = TRUE)

You may be asked to select a CRAN mirror; if so, just select ‘0-cloud’ or a mirror near your location. The dependencies = TRUE argument ensures that any additional packages required by the package you are installing are also installed.
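If you rerun the same script often, you may prefer not to reinstall packages that are already present. The following is a minimal sketch of one way to do this; the helper names find_missing() and install_if_missing() are our own inventions for illustration and are not part of R.

```r
# Return the subset of `pkgs` that is not yet installed in the current library paths.
find_missing <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Install only the packages that are missing; do nothing if all are present.
# (Hypothetical helper: the name is ours, not an R built-in.)
install_if_missing <- function(pkgs) {
  to_install <- find_missing(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install, dependencies = TRUE)
  }
  invisible(to_install)
}

# 'stats' ships with R, so this call downloads nothing.
install_if_missing("stats")
```

In your own scripts you would replace "stats" with the packages you need, for example install_if_missing("remotes").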

It’s good practice to occasionally update your previously installed packages to get access to new functionality and bug fixes. To update CRAN packages you can use the update.packages() function (you will need a working internet connection for this):

update.packages(ask = FALSE)

The ask = FALSE argument avoids having to confirm every package download, which can be a pain if you have many packages installed.
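Before or after updating, it can be handy to check which version of a package you currently have installed. A small sketch using the packageVersion() function, shown here with the stats package because it ships with every R installation (the version threshold "3.0.0" is an arbitrary value chosen for illustration):

```r
# Look up the installed version of a package (here 'stats', which ships with R).
installed_version <- packageVersion("stats")
print(installed_version)

# Version objects can be compared directly with version strings.
is_recent <- installed_version >= "3.0.0"
print(is_recent)
```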

1.5.2 Bioconductor packages

To install packages from Bioconductor the process is a little different. You first need to install the BiocManager package. You only need to do this once, unless you subsequently reinstall or upgrade R:

install.packages('BiocManager', dependencies = TRUE)

Once the BiocManager package has been installed you can either install all of the ‘core’ Bioconductor packages with

BiocManager::install()

or install specific packages, such as the ‘GenomicRanges’ and ‘edgeR’ packages:

BiocManager::install(c("GenomicRanges", "edgeR"))

To update Bioconductor packages, just use the BiocManager::install() function again:

BiocManager::install(ask = FALSE)

Again, you can use the ask = FALSE argument to avoid having to confirm every package download.

1.5.3 GitHub packages

There are multiple options for installing packages hosted on GitHub. Perhaps the most efficient method is to use the install_github() function from the remotes package (you installed this package previously). Before you use the function you will need to know the GitHub username of the repository owner and the name of the repository. For example, the development version of dplyr from Hadley Wickham is hosted on the tidyverse GitHub account and has the repository name ‘dplyr’ (just Google ‘github dplyr’). To install this version from GitHub use:

remotes::install_github('tidyverse/dplyr')

The safest way (that we know of) to update a package installed from GitHub is to just reinstall it using the above command.
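The owner and repository name are combined into a single ‘owner/repo’ string. As a rough sketch, the hypothetical helper github_spec() below (our own name, not part of remotes) builds that string. The install itself is guarded so that it only runs in an interactive session where remotes is available.

```r
# Build the "owner/repo" string that install_github() expects.
# (Hypothetical helper: the name is ours, not part of the remotes package.)
github_spec <- function(owner, repo) {
  paste(owner, repo, sep = "/")
}

spec <- github_spec("tidyverse", "dplyr")
print(spec)

# Only attempt the install interactively and when remotes is installed.
if (interactive() && requireNamespace("remotes", quietly = TRUE)) {
  remotes::install_github(spec)
}
```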

1.5.4 Using packages

Once you have installed a package onto your computer it is not immediately available for you to use. To use a package you first need to load it with the library() function. For example, to load the remotes package you previously installed:

library(remotes)

The library() function will also load any additional packages required and may print out additional package information. It is important to realise that every time you start a new R session (or restore a previously saved session) you need to load the packages you will be using. We tend to put all the library() statements required for an analysis near the top of our R scripts, which makes them easy to find and easy to add to as our code develops.

If you try to use a function without first loading the relevant R package you will receive an error message saying that R could not find the function. For example, if you try to use the install_github() function without loading the remotes package first you will receive the following error:

install_github('tidyverse/dplyr')

# Error in install_github("tidyverse/dplyr") : 
#  could not find function "install_github"
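You can reproduce this behaviour without installing anything by using a package that ships with R but is not loaded by default. The sketch below uses the file_ext() function from the tools package; the file name "report.csv" is made up for illustration. In a fresh R session the first call fails, and the same call succeeds once the package has been loaded with library().

```r
# In a fresh session 'tools' is not attached, so this call fails.
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(
  file_ext("report.csv"),
  error = function(e) conditionMessage(e)
)
print(msg)

# Attach the package, then call the same function again.
library(tools)
ext <- file_ext("report.csv")
print(ext)
```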

Sometimes it can be useful to use a function without first calling library(). If, for example, you will only be using one or two functions in your script and don’t want to load all of the other functions in a package, you can access the function directly by specifying the package name followed by two colons and then the function name:

remotes::install_github('tidyverse/dplyr')

This is how we were able to use the install() and install_github() functions above without first loading the packages BiocManager and remotes. Most of the time we recommend using the library() function.
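To see the difference between the two approaches, the short sketch below calls detectCores() from the parallel package (which ships with R) using the double colon notation, and then checks what happened. In a fresh session we would expect the package’s namespace to be loaded, but the package itself not to appear on the search path, which is why its other functions remain unavailable without the parallel:: prefix.

```r
# Call a single function from 'parallel' without using library().
n_cores <- parallel::detectCores()
print(n_cores)

# The namespace has been loaded behind the scenes ...
print(isNamespaceLoaded("parallel"))

# ... but the package is only on the search path if library() was called.
print("package:parallel" %in% search())
```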