Add a regression equation and R² in ggplot2

In my early days as an analyst, adding regression line equations and R² to my plots in Microsoft Excel was a good way to impress management. Because maths. In R, it is a little harder. There are two main ways to do it: manually, or with the ggpubr package. In this blog post, I explain both.

First, let’s get some dummy data from the mtcars data set, load necessary packages and remove scientific notation. Our first plot — without the equation — looks like this.

library(ggplot2)

options(scipen=999) # no scientific notation

data(mtcars)
df <- mtcars

ggplot(df,aes(x = wt, y = hp)) + 
  geom_point() + 
  geom_smooth(method = "lm", se=FALSE)

The following solution was proposed ten years ago in a Google Group and involves only base R functions. I updated it a little, and this is the resulting code. The eq function takes the x and y variables, fits a linear model with lm() and stores it in m. The coefficients and the R² are then combined into one long plotmath string that ggplot2 can parse.

eq <- function(x, y) {
  m <- lm(y ~ x)
  as.character(
    as.expression(
      substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,
                 # unname() drops the coefficient names so only the number ends up in the label
                 list(a = format(unname(coef(m)[1]), digits = 4),
                      b = format(unname(coef(m)[2]), digits = 4),
                      r2 = format(summary(m)$r.squared, digits = 3)))
    )
  )
}

ggplot(df,aes(x = wt, y = hp)) + 
  geom_point() + 
  geom_smooth(method = "lm", se=FALSE) +
  # annotate() draws the label once; geom_text() would redraw it for every row of df
  annotate("text", x = 2, y = 300, label = eq(df$wt, df$hp), parse = TRUE)
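
If you want to check the numbers that appear in the label, a quick sketch like the one below fits the same model directly and prints the values that eq() formats. It uses only base R and the built-in mtcars data; the object name m_check is just an illustrative choice.

```r
# Fit the same simple regression that eq() fits internally: hp explained by wt
m_check <- lm(hp ~ wt, data = mtcars)

# Intercept (a) and slope (b) that end up in the equation
coef(m_check)

# R² that ends up after the comma in the label
summary(m_check)$r.squared

# For a simple linear regression, R² equals the squared correlation of x and y
cor(mtcars$wt, mtcars$hp)^2
```

The last two values should match, which is a handy sanity check when the label looks suspicious.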

I haven’t changed the theme one little bit to keep the solution as simple as possible.

True, that’s a lot of code for something that seems obvious to an Excel user. With the ggpubr package, you can plot the regression equation and a range of other measures. The eq.label and rr.label variables give access to the regression line equation and the R², respectively.

library(ggpubr)
ggplot(df,aes(x = wt, y = hp)) + 
  geom_point() + 
  geom_smooth(method = "lm", se=FALSE) +
  stat_regline_equation(label.y = 400, aes(label = ..eq.label..)) +
  stat_regline_equation(label.y = 350, aes(label = ..rr.label..))

Here are the other measures you can access:

  • ..eq.label..: equation for the fitted polynomial as a character string to be parsed
  • ..rr.label..: R² of the fitted model as a character string to be parsed
  • ..adj.rr.label..: adjusted R² of the fitted model as a character string to be parsed
  • ..AIC.label..: AIC for the fitted model
  • ..BIC.label..: BIC for the fitted model
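
A side note on syntax: recent ggplot2 releases (3.4.0 and later) flag the ..name.. notation as deprecated in favour of after_stat(). The sketch below shows how the same labels might be written in the newer style, and pastes the equation and the adjusted R² into one label so that only a single stat_regline_equation() layer is needed. The object name p_labels and the "~~~~" spacer (plotmath for a few spaces) are illustrative choices, and I am assuming a ggplot2 version that provides after_stat().

```r
library(ggplot2)
library(ggpubr)

p_labels <- ggplot(mtcars, aes(x = wt, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # after_stat() refers to variables computed by the stat, here eq.label and adj.rr.label
  stat_regline_equation(
    label.y = 400,
    aes(label = after_stat(paste(eq.label, adj.rr.label, sep = "~~~~")))
  )

p_labels
```

Swap adj.rr.label for any of the other measures listed above to show a different statistic.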

By the way, you can easily use the measures from ggpubr in facets using facet_wrap() or facet_grid(). For every subset of your data, there is a different regression line equation and accompanying measures.

ggplot(df,aes(x = wt, y = hp)) + 
  geom_point() + 
  geom_smooth(method = "lm", se=FALSE) +
  stat_regline_equation(label.y = 400, aes(label = ..eq.label..)) +
  stat_regline_equation(label.y = 350, aes(label = ..rr.label..)) +
  facet_wrap(~vs)
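
To see where the per-facet numbers come from, here is a minimal base R sketch that fits one regression per value of vs, which is what the facetted plot does behind the scenes for each panel. The name fits_by_vs is made up for this example.

```r
# Split mtcars by engine shape (vs = 0 or 1) and fit hp ~ wt within each subset
fits_by_vs <- lapply(split(mtcars, mtcars$vs), function(d) lm(hp ~ wt, data = d))

# One column per facet: intercept and slope
sapply(fits_by_vs, coef)

# R² per facet
sapply(fits_by_vs, function(m) summary(m)$r.squared)
```

The coefficients and R² values printed here should agree with the labels in the corresponding panels, up to rounding.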

If you’re having trouble understanding some of the code and concepts, I can highly recommend “An Introduction to Statistical Learning: with Applications in R”, which is the must-have data science bible. If you simply need an introduction to R, and less to the data science part, I can absolutely recommend this book by Richard Cotton. Hope it helps!

Congratulations, you can now add the regression line equation and several measures to your ggplot2 visualizations.

Say thanks, ask questions or give feedback

Technologies get updated, syntax changes and honestly… I make mistakes too. If something is incorrect, incomplete or doesn’t work, let me know in the comments below and help thousands of visitors.

6 thoughts on “Add a regression equation and R² in ggplot2”

  1. Thank you for your assistance. It worked. However, when I tried to include another factor denoted by colour, the graphs came out well, but the equations and R square overlapped. How do I separate the two equations?

    1. Hi Raphael! If you drop the label.y parameter, ggplot2 will position them nicely below each other. You can position this block using label.x.npc and label.y.npc.

  2. Thank you for your assistance. It worked. However, when I tried to include another factor denoted by colour, the graphs came out well, but the labels for the equations and R square overlapped. How do I separate the two equations?

  3. I dropped the label.y for the equation and maneuvered with the R-squared by using label.x. The results were amazing!!!! Thanks a lot. Your R commands are very simple but super effective!!!

  4. Thank you so much for this. I am new to R and I cannot believe it was so difficult to show something that is so easy to do in Excel.
