Add a regression equation and R² in ggplot2

In my early days as an analyst, adding regression line equations and R² to my plots in Microsoft Excel was a good way to make an impression on management. Because maths. In R, it is a little harder. There are two main ways to do it: manually, or with the ggpubr package. In this blog post, I explain both.

First, let’s load the necessary packages, turn off scientific notation and get some dummy data from the mtcars data set. Our first plot, without the equation, looks like this.

library(ggplot2)

options(scipen=999) # no scientific notation

data(mtcars)
df <- mtcars

ggplot(df,aes(x = wt, y = hp)) + 
  geom_point() + 
  geom_smooth(method = "lm", se=FALSE)

The following solution was proposed ten years ago in a Google Group and relies only on base R functions. I updated it a little, and this is the resulting code. The eq function takes the x and y variables, fits a linear model with lm() and stores it in m. The two coefficients and the R² are then formatted and substituted into a plotmath expression, which is returned as one long string so that ggplot2 can parse it.

eq <- function(x,y) {
  m <- lm(y ~ x)
  as.character(
    as.expression(
      substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,
                list(a = format(coef(m)[1], digits = 4),
                b = format(coef(m)[2], digits = 4),
                r2 = format(summary(m)$r.squared, digits = 3)))
    )
  )
}

ggplot(df,aes(x = wt, y = hp)) + 
  geom_point() + 
  geom_smooth(method = "lm", se=FALSE) +
  # annotate() draws the label once; geom_text() would repeat it for every row of df
  annotate("text", x = 2, y = 300, label = eq(df$wt, df$hp), parse = TRUE)

I haven’t changed the theme one little bit to keep the solution as simple as possible.
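
If you want to check the numbers in the label, you can fit the same model yourself and look at the values that eq() works with. This is only a sketch in base R; the names fit, intercept, slope and r2 are mine and not part of the original solution.

```r
# Fit the same simple linear regression that the plot label is based on
fit <- lm(hp ~ wt, data = mtcars)

intercept <- unname(coef(fit)[1])    # the "a" in y = a + b * x
slope     <- unname(coef(fit)[2])    # the "b" in y = a + b * x
r2        <- summary(fit)$r.squared  # R² of the fitted model

# Format each value the same way eq() does before building the label
format(intercept, digits = 4)
format(slope, digits = 4)
format(r2, digits = 3)
```

For a regression with a single predictor, r2 should equal the squared correlation between wt and hp, which is a quick sanity check.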

True, that’s a lot of code for something that seems obvious to an Excel user. With the ggpubr package, you can add the regression equation and a range of related measures through stat_regline_equation(). The computed variables eq.label and rr.label are used to access the regression line equation and the R², respectively.

library(ggpubr)
ggplot(df,aes(x = wt, y = hp)) + 
  geom_point() + 
  geom_smooth(method = "lm", se=FALSE) +
  stat_regline_equation(label.y = 400, aes(label = ..eq.label..)) +
  stat_regline_equation(label.y = 350, aes(label = ..rr.label..))

Here are all the measures you can access:

  • ..eq.label..: equation for the fitted polynomial as a character string to be parsed
  • ..rr.label..: R² of the fitted model as a character string to be parsed
  • ..adj.rr.label..: Adjusted R² of the fitted model as a character string to be parsed
  • ..AIC.label..: AIC for the fitted model.
  • ..BIC.label..: BIC for the fitted model.
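
As an illustration of how these measures can be combined, here is a sketch that puts the equation and the adjusted R² on a single line. It assumes ggplot2 3.3.0 or later, where after_stat() is available as the newer spelling of the ..name.. notation; the object name p is just for this example.

```r
library(ggplot2)
library(ggpubr)

p <- ggplot(mtcars, aes(x = wt, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x) +
  # paste() joins the two plotmath strings; "~~~~" adds some space between them
  stat_regline_equation(
    label.y = 400,
    aes(label = paste(after_stat(eq.label), after_stat(adj.rr.label), sep = "~~~~"))
  )

p
```

The older ..eq.label.. form still works in the versions I have used, but recent ggplot2 releases print a deprecation warning for it.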

By the way, you can easily use the measures from ggpubr in facets using facet_wrap() or facet_grid(). For every subset of your data, there is a different regression line equation and accompanying measures.

ggplot(df,aes(x = wt, y = hp)) + 
  geom_point() + 
  geom_smooth(method = "lm", se=FALSE) +
  stat_regline_equation(label.y = 400, aes(label = ..eq.label..)) +
  stat_regline_equation(label.y = 350, aes(label = ..rr.label..)) +
  facet_wrap(~vs)
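
To see where the per-facet numbers come from, you can fit the model separately for each value of vs. This is a base R sketch, not something ggpubr requires; fits and per_facet are names I made up for the example.

```r
# Split mtcars by vs and fit hp ~ wt within each subset
fits <- lapply(split(mtcars, mtcars$vs), function(d) lm(hp ~ wt, data = d))

# One row per facet: intercept, slope and R²
per_facet <- t(sapply(fits, function(m) {
  c(intercept = unname(coef(m)[1]),
    slope     = unname(coef(m)[2]),
    r2        = summary(m)$r.squared)
}))

per_facet
```

Each row should match the equation and R² shown in the corresponding facet, up to rounding.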

Congratulations, you can now add the regression line equation and several measures to your ggplot2 visualizations.

