Roel Peters

Add a regression equation and R² in ggplot2

In my early days as an analyst, adding regression line equations and R² to my plots in Microsoft Excel was a good way to make an impression on the management. Because maths. In R, it takes a little more work. There are two main ways to do it: manually, and using the ggpubr library. In this blog post, I explain both.

First, let’s get some dummy data from the mtcars data set, load necessary packages and remove scientific notation. Our first plot — without the equation — looks like this.

library(ggplot2)

options(scipen=999) # no scientific notation

data(mtcars)
df <- mtcars

ggplot(df,aes(x = wt, y = hp)) + 
  geom_point() + 
  geom_smooth(method = "lm", se=FALSE)
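
Before adding any label, it can help to see the numbers behind the blue line. The sketch below assumes that geom_smooth(method = "lm") fits the default formula y ~ x, so fitting hp ~ wt with lm() should give the same line. The variable names intercept, slope and r2 are my own.

```r
# Fit the same simple linear model that geom_smooth(method = "lm") draws
df <- mtcars
m <- lm(hp ~ wt, data = df)

# unname() drops the coefficient names so we keep plain numbers
intercept <- unname(coef(m)[1])  # predicted hp at wt = 0
slope     <- unname(coef(m)[2])  # change in hp per unit of wt
r2        <- summary(m)$r.squared

print(c(intercept = intercept, slope = slope, r2 = r2))
```

These are the values that should end up in the equation on the plot.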

The following solution was proposed ten years ago in a Google Group and only uses base functions. I updated the solution a little bit and this is the resulting code. The eq function takes the x and y variables, fits a linear model and stores it in m. The coefficients and the R² are then substituted into a plotmath expression, which is converted to one long string that ggplot2 can parse.

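eq <- function(x, y) {
  m <- lm(y ~ x)  # fit the simple linear regression
  # Build a plotmath expression and return it as a string for parse = TRUE
  as.character(
    as.expression(
      substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,
                 # unname() keeps the coefficient names out of the label
                 list(a = format(unname(coef(m)[1]), digits = 4),
                      b = format(unname(coef(m)[2]), digits = 4),
                      r2 = format(summary(m)$r.squared, digits = 3)))
    )
  )
}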

ggplot(df, aes(x = wt, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  # annotate() draws the label once; geom_text() would redraw it for every row of df
  annotate("text", x = 2, y = 300, label = eq(df$wt, df$hp), parse = TRUE)

I haven’t changed the theme one little bit to keep the solution as simple as possible.
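
The hard-coded label position (x = 2, y = 300) only suits this data set. Here is a minimal sketch that anchors the label to the top-left corner of the data range instead. The names lab and p and the hjust/vjust choices are my own, not part of the original solution.

```r
library(ggplot2)

df <- mtcars
m  <- lm(hp ~ wt, data = df)

# Build the plotmath label; unname() keeps coefficient names out of the string
lab <- as.character(as.expression(substitute(
  italic(y) == a + b %.% italic(x) * "," ~~ italic(r)^2 ~ "=" ~ r2,
  list(a  = format(unname(coef(m)[1]), digits = 4),
       b  = format(unname(coef(m)[2]), digits = 4),
       r2 = format(summary(m)$r.squared, digits = 3))
)))

# Place the label at the smallest x and largest y in the data,
# left-aligned (hjust = 0) and hanging below that point (vjust = 1)
p <- ggplot(df, aes(x = wt, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  annotate("text", x = min(df$wt), y = max(df$hp), label = lab,
           parse = TRUE, hjust = 0, vjust = 1)
print(p)
```

The label can still overlap points in that corner, so treat the position as a starting point.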

True, that’s a lot of code for something that seems obvious to an Excel user. With the ggpubr package, stat_regline_equation() adds the regression line equation and several related measures to the plot. The eq.label and rr.label variables are used to access the regression line equation and the R², respectively.

library(ggpubr)
# after_stat() replaces the ..name.. notation, which is deprecated in recent ggplot2 versions
ggplot(df, aes(x = wt, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  stat_regline_equation(label.y = 400, aes(label = after_stat(eq.label))) +
  stat_regline_equation(label.y = 350, aes(label = after_stat(rr.label)))

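Besides eq.label and rr.label, stat_regline_equation() also computes adj.rr.label (the adjusted R²), AIC.label (the AIC) and BIC.label (the BIC).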
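
Several of these measures can also go on one line. The sketch below pastes eq.label and adj.rr.label together with a plotmath spacer ("~~~~"). It assumes ggplot2 3.3.0 or later for after_stat(), and the label.y value is simply what looked right for mtcars.

```r
library(ggplot2)
library(ggpubr)

df <- mtcars

# Combine the equation and the adjusted R² into a single parsed label
p <- ggplot(df, aes(x = wt, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  stat_regline_equation(
    aes(label = after_stat(paste(eq.label, adj.rr.label, sep = "~~~~"))),
    label.y = 400
  )
print(p)
```

Swapping adj.rr.label for AIC.label or BIC.label should work the same way.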

By the way, you can easily use the measures from ggpubr in facets using facet_wrap() or facet_grid(). For every subset of your data, there is a different regression line equation and accompanying measures.

ggplot(df, aes(x = wt, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  # each facet gets its own fit, so each panel shows its own equation and R²
  stat_regline_equation(label.y = 400, aes(label = after_stat(eq.label))) +
  stat_regline_equation(label.y = 350, aes(label = after_stat(rr.label))) +
  facet_wrap(~vs)
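
To check the numbers in each panel, you can fit the model per subset yourself. This is a minimal base R sketch; it assumes each facet is fitted as hp ~ wt on the rows with that value of vs, and the name fits is my own.

```r
df <- mtcars

# Split the data by vs and fit one linear model per subset
fits <- lapply(split(df, df$vs), function(d) {
  m <- lm(hp ~ wt, data = d)
  c(intercept = unname(coef(m)[1]),
    slope     = unname(coef(m)[2]),
    r2        = summary(m)$r.squared)
})

# One row per facet, one column per measure
print(do.call(rbind, fits))
```

Each row should match the equation and R² shown in the corresponding facet, up to rounding.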

If you’re having trouble understanding some of the code and concepts, I can highly recommend “An Introduction to Statistical Learning: with Applications in R”, which is the must-have data science bible. If you simply need an introduction to R, and less to the data science part, I can absolutely recommend this book by Richard Cotton. Hope it helps!

Congratulations, you can now add the regression line equation and several measures to your ggplot2 visualizations.

