How to set axis limits in ggplot2 without losing data

In this blog post, I elaborate on setting axis limits in a plot generated by ggplot2. There are two ways: one where you pretend the data outside the limits doesn’t exist (using lims, xlim or ylim), and one where you respect that the data outside the limits exists (using coord_cartesian).

The documentation for the lims, xlim and ylim functions states the following about values outside the limits:

This is a shortcut for supplying the limits argument to the individual scales. Note that, by default, any values outside the limits will be replaced with NA.

And this is what the documentation says about coord_cartesian:

Setting limits on the coordinate system will zoom the plot (like you’re looking at it with a magnifying glass), and will not change the underlying data like setting limits on a scale will.
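To make the difference concrete, here is a minimal base R sketch of the two behaviours, without any plotting. The helper censor_to_limits is a hypothetical name used only for illustration: it imitates what the documentation describes (out-of-range values become NA) and is not ggplot2's own code.

```r
# Hypothetical helper: imitate a scale limit by replacing out-of-range values with NA.
censor_to_limits <- function(values, limits) {
  values[values < limits[1] | values > limits[2]] <- NA
  values
}

x <- c(1, 2, 3, 50)                        # 50 plays the role of an outlier
censored <- censor_to_limits(x, c(0, 10))

censored                      # 1 2 3 NA
mean(censored, na.rm = TRUE)  # 2: the outlier no longer contributes
mean(x)                       # 14: a zoom leaves the data, and any statistic on it, unchanged
```

Any statistic computed after the censoring step, such as a smoother, only sees the values that survived.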

Hadley Wickham, one of the most important figures in the R community, wrote about it in his book:

Here’s an example. First, we create some dummy data, an X and a Y that are closely correlated. We also add some outliers to the data. Lastly, we plot it, without setting any limits on the axes.

library(ggplot2)
library(data.table)

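set.seed(10)  # fix the seed so the data are reproducible
# 100 regular points: y follows x plus uniform noise
normal_data_x <- rnorm(100, 3, 2)
normal_data_y <- normal_data_x + runif(100, -2, 2)
# 25 outliers: large x, with y growing much faster than x
outliers_x <- runif(25, 8, 10)
outliers_y <- outliers_x ^ runif(25, 1, 2)

d <- data.table(x = c(normal_data_x, outliers_x), y = c(normal_data_y, outliers_y))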

ggplot(d,aes(x = x,y = y)) + 
  geom_point() +
  geom_smooth(method = 'lm')

The plot shows two strongly correlated series where X is smaller than 7, with the outliers on the right. I also added a linear smoother to demonstrate my point later on. What we see:

  • All the data is visible, even the outliers.
  • This smoother is based on all the data, even the outliers.

We can limit our X and Y axes using the xlim and ylim functions as follows.

ggplot(d,aes(x = x,y = y)) + 
  geom_point() +
  geom_smooth(method = 'lm') +
  xlim(-2,7) + ylim(-1,12)

The result:

  • We no longer observe the outliers (ggplot2 warns that it removed rows).
  • The smoother is based on the data without the outliers.
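As a rough check of what happens to the fit, the sketch below counts the rows that fall outside the limits and compares a linear fit on the remaining rows with a fit on all rows. It rebuilds the same data with a plain data.frame instead of data.table (my simplification), and it assumes that a row is discarded as soon as either its x or its y lies outside the limits.

```r
# Recreate the post's data (plain data.frame used here instead of data.table).
set.seed(10)
normal_data_x <- rnorm(100, 3, 2)
normal_data_y <- normal_data_x + runif(100, -2, 2)
outliers_x <- runif(25, 8, 10)
outliers_y <- outliers_x ^ runif(25, 1, 2)
d <- data.frame(x = c(normal_data_x, outliers_x),
                y = c(normal_data_y, outliers_y))

# Rows that survive xlim(-2, 7) and ylim(-1, 12); all other rows are treated as missing.
inside <- d$x >= -2 & d$x <= 7 & d$y >= -1 & d$y <= 12
n_dropped <- sum(!inside)

# Slope of a linear fit on the surviving rows versus on all rows.
slope_inside <- unname(coef(lm(y ~ x, data = d[inside, ]))["x"])
slope_all    <- unname(coef(lm(y ~ x, data = d))["x"])

c(dropped = n_dropped, slope_inside = slope_inside, slope_all = slope_all)
```

All 25 outliers have an x above 7, so at least 25 rows are dropped, and the slope fitted without them is flatter than the slope fitted on everything.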

Finally, we can limit our X and Y axes using the coord_cartesian function.

ggplot(d,aes(x = x,y = y)) + 
  geom_point() +
  geom_smooth(method = 'lm') +
  coord_cartesian(xlim = c(-2, 7), ylim = c(-1, 12))

As you can see, now:

  • Once again, we no longer observe our outliers.
  • However, we respect that outliers exist and the smoother is based on all the data.
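If you would rather verify this than trust the picture, one option is to read the fitted line back out of the built plot with layer_data() and compare its slope with a plain lm() fit. This is a sketch under a few assumptions: smoother_slope is a hypothetical helper of mine, layer 2 is the smoother because it was added second, and the data is rebuilt as a data.frame rather than a data.table.

```r
library(ggplot2)

# Recreate the post's data (plain data.frame used here instead of data.table).
set.seed(10)
normal_data_x <- rnorm(100, 3, 2)
normal_data_y <- normal_data_x + runif(100, -2, 2)
outliers_x <- runif(25, 8, 10)
outliers_y <- outliers_x ^ runif(25, 1, 2)
d <- data.frame(x = c(normal_data_x, outliers_x),
                y = c(normal_data_y, outliers_y))

base_plot <- ggplot(d, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Hypothetical helper: slope of the straight line drawn by the smoother (layer 2).
# Refitting lm() on the predicted points skips any points the scale turned into NA.
smoother_slope <- function(plot) {
  fit <- layer_data(plot, 2)
  unname(coef(lm(y ~ x, data = fit))["x"])
}

slope_zoomed <- smoother_slope(
  base_plot + coord_cartesian(xlim = c(-2, 7), ylim = c(-1, 12))
)
# The scale-limit version warns about removed rows; the warnings are silenced here.
slope_limited <- suppressWarnings(
  smoother_slope(base_plot + xlim(-2, 7) + ylim(-1, 12))
)
slope_lm <- unname(coef(lm(y ~ x, data = d))["x"])

c(zoomed = slope_zoomed, limited = slope_limited, lm_all_data = slope_lm)
```

The zoomed slope should match the lm() fit on all the data, while the slope under xlim and ylim should be flatter, since the outliers were discarded before fitting.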

Great success!
