In this blog post, I elaborate on setting axis limits in a plot generated by ggplot2. There are two ways: one where you pretend the data outside the limits doesn’t exist (using lims, xlim or ylim), and one where you respect that the data outside the limits exists (using coord_cartesian).
The documentation for the lims, xlim and ylim functions states the following about values outside the limits:
This is a shortcut for supplying the limits argument to the individual scales. Note that, by default, any values outside the limits will be replaced with NA.
And this is what the documentation says about coord_cartesian:
Setting limits on the coordinate system will zoom the plot (like you’re looking at it with a magnifying glass), and will not change the underlying data like setting limits on a scale will.
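To make the replacement on the scale side concrete, here is a minimal sketch. It assumes that continuous scales handle out-of-range values by default with censor() from the scales package, which ggplot2 builds its scales on. The vector x is made up for illustration.

```r
library(scales)  # installed as a dependency of ggplot2

# Hypothetical values, some of which fall outside the range c(-2, 7)
x <- c(-5, 0, 3, 7, 9.5)

# censor() replaces anything outside the range with NA;
# values exactly on a limit are kept
censored <- censor(x, range = c(-2, 7))
print(censored)
```

Layers that compute a statistic, such as geom_smooth, then drop the rows containing NA. That is why limits set on a scale can change the result, while limits set on the coordinate system cannot.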
Hadley Wickham, one of the most important figures in the R community, wrote about it in his book:
Here’s an example. First, we create some dummy data, an X and a Y that are closely correlated. We also add some outliers to the data. Lastly, we plot it, without setting any limits on the axes.
library(ggplot2)
library(data.table)

set.seed(10)
normal_data_x <- rnorm(100, 3, 2)
normal_data_y <- normal_data_x + runif(100, -2, 2)
outliers_x <- runif(25, 8, 10)
outliers_y <- outliers_x ^ runif(25, 1, 2)
d <- data.table(x = c(normal_data_x, outliers_x),
                y = c(normal_data_y, outliers_y))

ggplot(d, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = 'lm')
This is what the data looks like: two series that are strongly correlated where X is smaller than 7, with the outliers visible on the right. I also added a linear smoother to demonstrate my point later on. What we see:
- All the data is visible, even the outliers.
- This smoother is based on all the data, even the outliers.
We can limit our X and Y axes using the xlim and ylim functions as follows.
ggplot(d, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = 'lm') +
  xlim(-2, 7) +
  ylim(-1, 12)
We now observe:
- The outliers are no longer visible.
- The smoother is fitted only to the data inside the limits, so the outliers no longer influence it.
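The effect on the fit can be checked outside ggplot2. The sketch below rebuilds the same dummy data (as a plain data.frame rather than a data.table, to avoid the extra dependency) and compares a linear model on all rows with one on only the rows inside the limits. The names inside, fit_all and fit_inside are introduced here for illustration, and the sketch assumes that a point is dropped when either its x or its y value falls outside the limits.

```r
# Same dummy data as in the post, rebuilt so this block runs on its own
set.seed(10)
normal_data_x <- rnorm(100, 3, 2)
normal_data_y <- normal_data_x + runif(100, -2, 2)
outliers_x <- runif(25, 8, 10)
outliers_y <- outliers_x ^ runif(25, 1, 2)
d <- data.frame(x = c(normal_data_x, outliers_x),
                y = c(normal_data_y, outliers_y))

# Rows that would survive xlim(-2, 7) and ylim(-1, 12)
inside <- d$x >= -2 & d$x <= 7 & d$y >= -1 & d$y <= 12

fit_all <- lm(y ~ x, data = d)               # what the unrestricted plot fits
fit_inside <- lm(y ~ x, data = d[inside, ])  # what the xlim/ylim plot fits

print(sum(!inside))  # number of rows dropped by the limits
print(rbind(all = coef(fit_all), inside = coef(fit_inside)))
```

All 25 outliers have an x value above 7, so at least those rows are removed before the smoother is computed.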
Finally, we can limit our X and Y axes using the coord_cartesian function.
ggplot(d, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = 'lm') +
  coord_cartesian(xlim = c(-2, 7), ylim = c(-1, 12))
As you can see, now:
- Once again, we no longer observe our outliers.
- However, the outliers still exist in the underlying data, so the smoother is based on all of it.
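One way to confirm this is to pull the fitted line out of both plots and compare it with linear models fitted by hand. This is a sketch, assuming that layer_data() returns the values computed for the smoother layer (layer 2 in these plots). The data is rebuilt as a plain data.frame so the block runs on its own, and names such as p_scale, p_zoom and inside are introduced here for illustration.

```r
library(ggplot2)

set.seed(10)
normal_data_x <- rnorm(100, 3, 2)
normal_data_y <- normal_data_x + runif(100, -2, 2)
outliers_x <- runif(25, 8, 10)
outliers_y <- outliers_x ^ runif(25, 1, 2)
d <- data.frame(x = c(normal_data_x, outliers_x),
                y = c(normal_data_y, outliers_y))

base <- ggplot(d, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

p_scale <- base + xlim(-2, 7) + ylim(-1, 12)                         # limits on the scales
p_zoom <- base + coord_cartesian(xlim = c(-2, 7), ylim = c(-1, 12))  # zoom only

# Fitted line as computed for each plot (layer 2 is the smoother)
sm_scale <- suppressWarnings(layer_data(p_scale, 2))
sm_zoom <- layer_data(p_zoom, 2)

# Reference models fitted by hand
inside <- d$x >= -2 & d$x <= 7 & d$y >= -1 & d$y <= 12
fit_all <- lm(y ~ x, data = d)
fit_inside <- lm(y ~ x, data = d[inside, ])

# The zoomed plot should match the model on all rows
zoom_uses_all <- isTRUE(all.equal(
  sm_zoom$y, unname(predict(fit_all, data.frame(x = sm_zoom$x)))))

# The scale-limited plot should match the model on the rows inside the limits;
# parts of the line outside the y limits may be NA, so compare only where it is drawn
ok <- !is.na(sm_scale$y)
scale_uses_inside <- isTRUE(all.equal(
  sm_scale$y[ok], unname(predict(fit_inside, data.frame(x = sm_scale$x[ok])))))

print(c(zoom_uses_all = zoom_uses_all, scale_uses_inside = scale_uses_inside))
```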
By the way, if you’re having trouble understanding some of the code and concepts, I can highly recommend “An Introduction to Statistical Learning: with Applications in R”, which is the must-have data science bible. If you simply need an introduction to R, and less to the data science part, I can absolutely recommend this book by Richard Cotton. Hope it helps!