The base installation of R comes with many useful packages as standard. These packages contain many of the functions you will use on a daily basis. However, as you start using R for more diverse projects (and as your own use of R evolves) there will come a time when you need to extend R’s capabilities. Happily, many thousands of R users have developed useful code and shared it as installable packages. You can think of a package as a collection of functions, data and help files collated into a well-defined standard structure which you can download and install in R.
These packages can be downloaded from a variety of sources but the most popular are CRAN, Bioconductor and GitHub. Currently, CRAN hosts over 15,000 packages and is the official repository for user-contributed R packages. Bioconductor provides open source software oriented towards bioinformatics and hosts over 1,800 R packages. GitHub is a website that hosts git repositories for all sorts of software and projects (not just R). Cutting-edge development versions of R packages are often hosted on GitHub, so if you need all the new bells and whistles then this may be an option. However, a potential downside of using the development version of an R package is that it might not be as stable as the version hosted on CRAN (it’s in development!) and updating packages won’t be automatic.
To install a package from CRAN you can use the install.packages() function. For example, to install the remotes package, call install.packages() with the package name in the Console window of RStudio (note: you will need a working internet connection to do this).
You may be asked to select a CRAN mirror; just select ‘0-Cloud’ or a mirror near to your location. The dependencies = TRUE argument ensures that additional packages that are required will also be installed.
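As a minimal sketch, the install call described above might look like this. The package name and the dependencies argument come from the text; the double-quote style is just a convention.

```r
# Install the remotes package from CRAN (needs an internet connection).
# dependencies = TRUE also installs the additional packages that remotes requires.
install.packages("remotes", dependencies = TRUE)
```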
It’s good practice to occasionally update your previously installed packages to get access to new functionality and bug fixes. To update CRAN packages you can use the update.packages() function (you will need a working internet connection for this). The ask = FALSE argument avoids having to confirm every package download, which can be a pain if you have many packages installed.
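A short sketch of the update call, using only the function and argument named in the text:

```r
# Update all installed CRAN packages (needs an internet connection).
# ask = FALSE skips the confirmation prompt for each package.
update.packages(ask = FALSE)
```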
To install packages from Bioconductor the process is a little different. You first need to install the BiocManager package, which is itself available from CRAN. You only need to do this once unless you subsequently reinstall or upgrade R.
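One way this step could be written is sketched below. The requireNamespace() guard is an addition for illustration, not something the text prescribes; it simply skips the download when BiocManager is already installed.

```r
# Install BiocManager from CRAN only if it is not already available.
# The guard is an illustrative addition; a plain install.packages() call also works.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
```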
Once the BiocManager package has been installed you can either install all of the ‘core’ Bioconductor packages by calling BiocManager::install() with no arguments, or install specific packages such as ‘GenomicRanges’ and ‘edgeR’ by passing their names to the same function.
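The two options might be sketched as follows, assuming BiocManager is already installed and an internet connection is available. The package names are the ones mentioned in the text.

```r
# Option 1: install the 'core' Bioconductor packages.
BiocManager::install()

# Option 2: install specific Bioconductor packages by name.
BiocManager::install(c("GenomicRanges", "edgeR"))
```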
To update Bioconductor packages just use the BiocManager::install() function again. As before, you can use the ask = FALSE argument to avoid having to confirm every package download.
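A sketch of the update call, again assuming BiocManager is installed:

```r
# Update installed Bioconductor packages without a prompt for each one.
BiocManager::install(ask = FALSE)
```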
There are multiple options for installing packages hosted on GitHub. Perhaps the most efficient method is to use the install_github() function from the remotes package (you installed this package previously). Before you use the function you will need to know the GitHub username of the repository owner and also the name of the repository. For example, the development version of dplyr from Hadley Wickham is hosted on the tidyverse GitHub account and has the repository name ‘dplyr’ (just Google ‘github dplyr’). To install this version from GitHub, pass the username and repository name to install_github() as a single string separated by a forward slash, ‘tidyverse/dplyr’.
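As a sketch, the call could look like this. The remotes:: prefix is used so that the function can be called without loading the package first; it assumes remotes is installed.

```r
# Install the development version of dplyr from GitHub.
# The argument has the form "username/repository".
remotes::install_github("tidyverse/dplyr")
```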
The safest way (that we know of) to update a package installed from GitHub is to just reinstall it with install_github().
Once you have installed a package onto your computer it is not immediately available for you to use. To use a package you first need to load it with the library() function. For example, to load the remotes package you previously installed, pass its name to library().
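A minimal sketch, assuming remotes is installed. The second line is an optional check, added here for illustration, that the package now appears on the search path.

```r
# Load (attach) the remotes package for this R session.
library(remotes)

# Optional check: TRUE once the package has been attached.
"package:remotes" %in% search()
```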
The library() function will also load any additional packages required and may print out additional package information. It is important to realise that every time you start a new R session (or restore a previously saved session) you need to load the packages you will be using. We tend to put all the library() statements required for our analysis near the top of our R scripts to make them easily accessible and easy to add to as our code develops. If you try to use a function without first loading the relevant R package you will receive an error message that R could not find the function. For example, if you try to use the install_github() function without loading the remotes package first, R will report that it could not find the function install_github.
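A minimal sketch of this failure, assuming a fresh session in which remotes has not been loaded with library(). The tryCatch() wrapper is an addition here so that the message can be captured and printed rather than stopping a script; at the console the same message appears prefixed with ‘Error in’.

```r
# Assumes library(remotes) has NOT been called in this session.
# (If remotes were attached, this call would succeed and install dplyr.)
msg <- tryCatch(
  install_github("tidyverse/dplyr"),
  error = function(e) conditionMessage(e)  # keep only the error text
)

# Print the captured text, which says that R could not find the function.
print(msg)
```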
Sometimes it can be useful to use a function without first using the library() function. If, for example, you will only be using one or two functions in your script and don’t want to load all of the other functions in a package, then you can access the function directly by specifying the package name followed by two colons and then the function name, as in remotes::install_github().
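To try the double-colon syntax without installing anything, here is a small sketch using tools, a package that ships with R but is not attached by default. The package, the function file_ext() and the file name are chosen purely for illustration and do not come from the text.

```r
# Call one function from the tools package without attaching the package.
# file_ext() returns the extension of a file name.
ext <- tools::file_ext("analysis.R")
print(ext)

# The package has still not been attached to the search path.
"package:tools" %in% search()
```

In a fresh session the last line should return FALSE, showing that only the one function was accessed.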
This is how we were able to use the install() and install_github() functions above without first loading the BiocManager and remotes packages. Most of the time we recommend using the