In my early days as an analyst, adding regression line equations and R² to my plots in Microsoft Excel was a good way to impress management. Because maths. In R, it is a little harder. There are two main ways to do it: manually, and with the ggpubr package. In this blog post, I explain both.
First, let’s load the necessary packages, take the built-in mtcars data set as example data and turn off scientific notation. Here is our first plot, without the equation.
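library(ggplot2)
options(scipen = 999) # avoid scientific notation when printing numbers
data(mtcars)
df <- mtcars

# Horsepower against weight, with a straight regression line (se = FALSE hides the confidence band)
ggplot(df, aes(x = wt, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)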
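Because geom_smooth(method = "lm") fits an ordinary linear model of y on x, you can fit the same model directly with lm() to see the numbers any equation label should report. A small sanity-check sketch using only base R:

```r
# Fit the same straight-line model that geom_smooth(method = "lm") draws
fit <- lm(hp ~ wt, data = mtcars)

coef(fit)                      # intercept and slope: the a and b in y = a + b x
r2 <- summary(fit)$r.squared
r2

# With a single predictor, R² equals the squared Pearson correlation
all.equal(r2, cor(mtcars$wt, mtcars$hp)^2)   # should print TRUE
```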
The following solution was proposed ten years ago in a Google Group and uses only base R functions. I updated it a little, and this is the resulting code. The eq function takes the x and y variables, fits a linear model with lm() and stores the regression object in m. It then substitutes the coefficients and the R² into a plotmath expression and returns it as a character string, which ggplot2 parses into a formatted label.
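# Fit y ~ x and return "y = a + b x, r² = ..." as a plotmath string
eq <- function(x, y) {
  m <- lm(y ~ x)
  as.character(
    as.expression(
      substitute(italic(y) == a + b %.% italic(x) * "," ~~ italic(r)^2 ~ "=" ~ r2,
                 # unname() keeps the coefficient names out of the label
                 list(a = format(unname(coef(m)[1]), digits = 4),   # intercept
                      b = format(unname(coef(m)[2]), digits = 4),   # slope
                      r2 = format(summary(m)$r.squared, digits = 3)))
    )
  )
}

ggplot(df, aes(x = wt, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  # annotate() draws the label once; geom_text() would draw it once per row of df
  annotate("text", x = 2, y = 300, label = eq(df$wt, df$hp), parse = TRUE)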
I have left the default theme untouched to keep the solution as simple as possible.
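One limitation of eq(): it always writes a plus sign before the slope, so a negative slope shows up as "y = a + -b · x". Below is a minimal sketch of a sign-aware variant. eq_signed() is a hypothetical helper, not part of the original solution, and building the plotmath string with sprintf() instead of substitute() is my own choice. The mpg ~ wt regression on mtcars has a negative slope, which makes it a convenient test case; the label position is arbitrary.

```r
library(ggplot2)

# Hypothetical helper (not from the original post): like eq(), but picks "+" or "-"
# from the sign of the slope and prints the slope's absolute value
eq_signed <- function(x, y, digits = 4) {
  m <- lm(y ~ x)
  a <- unname(coef(m)[1])   # intercept
  b <- unname(coef(m)[2])   # slope
  r2 <- summary(m)$r.squared
  op <- if (b < 0) "-" else "+"
  # "%%.%%" turns into the plotmath %.% (centred dot) operator after sprintf()
  sprintf('italic(y) == "%s" %s "%s" %%.%% italic(x) * "," ~~ italic(r)^2 ~ "=" ~ "%s"',
          format(a, digits = digits), op,
          format(abs(b), digits = digits),
          format(r2, digits = 3))
}

# mpg falls as weight rises, so this label reads "y = a - b x"
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  annotate("text", x = 4.5, y = 32,
           label = eq_signed(mtcars$wt, mtcars$mpg), parse = TRUE)
```

Call print(p) to draw the plot. For a positive slope, eq_signed() produces the same label as eq().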
True, that’s a lot of code for something an Excel user takes for granted. With the ggpubr package, stat_regline_equation() fits the regression for you and can add its equation and a range of fit measures to the plot. The computed variables ..eq.label.. and ..rr.label.. give you the regression line equation and the R², respectively.
library(ggpubr)

ggplot(df, aes(x = wt, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  # label.y sets the vertical position of each label
  stat_regline_equation(label.y = 400, aes(label = ..eq.label..)) +  # equation
  stat_regline_equation(label.y = 350, aes(label = ..rr.label..))    # R²
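If you prefer a single label to two stacked ones, you can paste the computed variables together inside aes(), a pattern that also appears in ggpubr's own examples. The "~~~" separator is just plotmath spacing (each ~ adds a space); the label position is an arbitrary choice.

```r
library(ggplot2)
library(ggpubr)

df <- mtcars

# One label: equation and R² joined by plotmath spaces ("~" renders as a space)
p <- ggplot(df, aes(x = wt, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  stat_regline_equation(
    aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")),
    label.y = 400
  )
```

Call print(p) to draw it.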
Here are all the measures you can access:
- ..eq.label..: equation of the fitted polynomial, as a character string to be parsed
- ..rr.label..: R² of the fitted model, as a character string to be parsed
- ..adj.rr.label..: adjusted R² of the fitted model, as a character string to be parsed
- ..AIC.label..: AIC of the fitted model
- ..BIC.label..: BIC of the fitted model
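The dot-dot names in the list above are the form used in ggpubr's documentation. Since ggplot2 3.3.0, after_stat() is the recommended way to refer to computed variables, and since ggplot2 3.4.0 the ..var.. form triggers a deprecation warning. Here is a sketch of the remaining measures written in that style; the label.y positions are arbitrary choices.

```r
library(ggplot2)
library(ggpubr)

df <- mtcars

# after_stat(var) refers to a variable computed by the stat, like ..var..
p <- ggplot(df, aes(x = wt, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  stat_regline_equation(label.y = 450, aes(label = after_stat(adj.rr.label))) +
  stat_regline_equation(label.y = 420, aes(label = after_stat(AIC.label))) +
  stat_regline_equation(label.y = 390, aes(label = after_stat(BIC.label)))
```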
You can also use the ggpubr measures in facets created with facet_wrap() or facet_grid(). Because the regression is fitted separately for each panel, every subset of your data gets its own regression line equation and accompanying measures.
ggplot(df, aes(x = wt, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  stat_regline_equation(label.y = 400, aes(label = ..eq.label..)) +
  stat_regline_equation(label.y = 350, aes(label = ..rr.label..)) +
  facet_wrap(~vs)  # one panel per engine type (vs: 0 = V-shaped, 1 = straight)
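To check what each panel should show, you can fit the model yourself for every subset. A small base R sketch that fits hp ~ wt once per value of vs, the same grouping facet_wrap(~vs) uses:

```r
df <- mtcars

# One lm() fit per value of vs (0 = V-shaped engine, 1 = straight engine)
fits <- lapply(split(df, df$vs), function(d) lm(hp ~ wt, data = d))

# Intercept and slope per panel: one column per value of vs
sapply(fits, coef)

# R² per panel
sapply(fits, function(m) summary(m)$r.squared)
```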
By the way, if you’re having trouble understanding some of the code and concepts, I highly recommend “An Introduction to Statistical Learning: with Applications in R”, the must-have data science bible. If you simply need an introduction to R rather than to data science, I can also recommend this book by Richard Cotton. Hope it helps!
Congratulations, you can now add the regression line equation and several measures to your ggplot2 visualizations.