Chapter 5 Graphics with ggplot
For many people, making figures is one of their favourite uses of R. These can take the form of a quick and dirty plot to get a feel for what’s going on in your dataset, or a fancier, more complex figure for a publication or a report. This process is often as close as many scientists get to having a professional creative side (at least that’s true for me), and it’s a source of pride for some folk.
As stated in the Introduction, one of the many reasons for the rise in the popularity of R is its ability to produce publication standard figures (as well as those quick and dirty figures, which are the type we all produce most of). Not only can R users make figures well suited for publication, but the means by which the figures are produced also offer a wide range of customisation. This in turn allows users to create their own particular styles and brands of figures (well beyond the cookie-cutter styles in more traditional point-and-click programs). Because of this inherent flexibility, the data visualisation side of R and its supporting packages has grown substantially over the years.
In this Chapter, we will focus on creating figures using a specialised package called ggplot2.
Before we get going with making some plots of the gg variety, how about a quick history of one of the most commonly used packages in R?
ggplot2 was based on a book called Grammar of Graphics by Leland Wilkinson (hence the gg in ggplot2), yours for only £100 or so. But before you spend all that money, see here for an interesting summary of Wilkinson’s book.
Grammar of Graphics tried to move away from the idea that to create a scatterplot, a user should click the scatterplot button or use the scatterplot() function. Instead, by splitting figures into their various components (e.g. the underlying statistics, the geometric arrangement, the theme, see Fig. 4.1), a user is able to manipulate each of these components (i.e. layers) and produce a tailor-made figure fit for their very specific needs. Contrast this approach with the one used by, for example, Microsoft Excel. The user specifies the data and then clicks the bar chart, scatterplot, line graph etc. button. This inherently locks the user into many choices made by the software developer rather than by the user. Think of how easily you can spot an Excel scatterplot: other than a couple of pre-set options, there’s really not much you can do to change the way the plot displays the data. You are at the mercy of the [insert corporation here] Gods.
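To make the idea of components concrete, here is a minimal sketch of a figure built up one layer at a time. The choice of the built-in mtcars dataset and of these particular layers is ours, purely for illustration; any data frame with two numeric columns would do.

```r
library(ggplot2)

# Start with the data and the aesthetic mappings only; no geometry yet
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg))

# Add a geometric layer: one point per car
p <- p + geom_point()

# Add a statistical layer: a fitted straight line with its confidence band
p <- p + geom_smooth(method = "lm", formula = y ~ x)

# Swap the overall look without touching the data or the layers
p <- p + theme_classic()

# Printing the object draws the figure
print(p)
```

Each line changes one component and leaves the others alone, which is the kind of control a single scatterplot button cannot offer.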
While Wilkinson would eventually go on to become vice-president of SPSS, his (and his oft forgotten co-author’s) ideas would, nevertheless, make their way into R via ggplot2 as well as other implementations (e.g. Tableau).
ggplot2 was released by Hadley Wickham. By 2017 the package was said to have been downloaded 10 million times.
ggplot2 now has many secondary packages that either build on it or interface with it (some statistical packages now have accompanying ggplot2 interfaces for producing figures, e.g. mgcViz), and it is now part of the tidyverse.
It’s important to note that ggplot2 is not required to make “fancy” figures. If you’d prefer to use base R then go ahead. Almost, if not completely, equivalent figures are possible in base R. The difference between ggplot2 and base R lies in how you get to the end product rather than in the end product itself. The idea that you need ggplot2 for attractive figures is, nevertheless, a common belief, almost certainly because making a moderately attractive figure is, in our opinion at least, easier to do with ggplot2: at the start, many decisions are made for you, without you necessarily even knowing that a decision was ever made!
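As a rough illustration of that difference in route, the sketch below draws the same scatterplot twice, once with base R and once with ggplot2. The mtcars dataset and the axis labels are our own choices for the example, not anything the packages require.

```r
library(ggplot2)

# Base R: a single call draws straight to the graphics device
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# ggplot2: build a plot object from components, then print it
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
print(p)
```

Both calls show the same data. The ggplot2 version arrives with a background grid, a theme and point styling already chosen, which are exactly the sort of decisions made on your behalf.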
With that in mind, let’s get started making some figures.