Something that took me a while to get right in ggplot2 is adding a percent sign as a suffix to the tick labels, controlling the number of decimals, and still being able to set the limits of the axis at the same time.
I’ll show an example using the iris data set. Let’s say I want to show the mean sepal length per species as a percentage of the maximum sepal length in the data set.
In the first chunk of code, I load the packages and the data set, and divide every sepal length by the maximum so that all values lie between 0 and 1.
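```r
library(datasets)
library(scales)
library(data.table)
library(ggplot2)

# Express each sepal length as a share of the largest one (1 = 100%)
dt <- as.data.table(iris)
dt[, Sepal.Length := Sepal.Length / max(Sepal.Length)]

# Keep only the columns needed for the plot
dt <- dt[, .(Species, Sepal.Length)]
```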
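As a quick sanity check (not part of the original walkthrough), the snippet below repeats the transformation and prints the per-species means that the bars will display. It assumes the data.table package is installed; the column name `mean_share` is just an illustrative name.

```r
library(data.table)

# Same transformation as above: sepal length as a share of the maximum
dt <- as.data.table(datasets::iris)
dt[, Sepal.Length := Sepal.Length / max(Sepal.Length)]
dt <- dt[, .(Species, Sepal.Length)]

# The largest value is now exactly 1, i.e. 100%
print(max(dt$Sepal.Length))

# Per-species means: these are the bar heights in the plots that follow
means <- dt[, .(mean_share = mean(Sepal.Length)), by = Species]
print(means)
```

All three means fall between 0.5 and 1, which is why a 50% to 100% axis range makes sense later on.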
We can make the visualization as follows. When you pass a function to the labels argument of scale_y_continuous(), ggplot2 runs every tick value through it; here, the function multiplies each value by 100 and appends a percent sign.
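```r
ggplot(dt, aes(x = Species, y = Sepal.Length, fill = Species)) +
  # `fun` replaces the deprecated `fun.y` argument (ggplot2 >= 3.3.0)
  geom_bar(stat = 'summary', fun = mean) +
  # Turn each break (e.g. 0.25) into a label (e.g. "25%")
  scale_y_continuous(labels = function(x) paste0(x * 100, '%'))
```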
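To see what the anonymous labeller does, you can call it directly on a few break values. The name `pct_label` is hypothetical; it is the same function as in the plot, just given a name so it can be tested on its own. Note that it does no rounding, so a break that is not a whole percentage keeps whatever decimals the multiplication produces.

```r
# Hypothetical name for the anonymous labeller passed to scale_y_continuous()
pct_label <- function(x) paste0(x * 100, '%')

# ggplot2 passes the computed break values to this function
print(pct_label(c(0, 0.25, 0.5, 0.75)))

# No rounding is applied, so fractional percentages keep their decimals
print(pct_label(0.125))
```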
There is an easier way, though. The scales package provides percent_format(), and its accuracy parameter controls how many decimals are shown: accuracy = 1 rounds to whole percentages, while accuracy = 0.1 keeps one decimal.
```r
ggplot(dt, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_bar(stat = 'summary', fun = mean) +
  # accuracy = 1: whole percentages, no decimals
  scale_y_continuous(labels = scales::percent_format(accuracy = 1))
```
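The effect of `accuracy` can be checked outside of a plot by calling scales::percent() on a few values. percent_format() returns a labelling function that produces the same strings, and that function is what scale_y_continuous() receives. The sample values are made up for illustration; in recent versions of scales, label_percent() is the preferred name for percent_format() and accepts the same `accuracy` argument.

```r
library(scales)

# Made-up sample values, as fractions of 1
values <- c(0.1234, 0.5, 0.987)

# accuracy = 1 rounds to whole percentages
print(percent(values, accuracy = 1))

# accuracy = 0.1 keeps one decimal
print(percent(values, accuracy = 0.1))

# percent_format() builds a labelling function; ggplot2 calls it on the breaks
fmt <- percent_format(accuracy = 1)
print(fmt(values))
```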
Finally, another thing I struggled with is setting the limits of my y axis. Let’s say you only want to show the range from 50% to 100%. If you use the limits parameter of scale_y_continuous(), or the lims() or ylim() functions, ggplot2 removes all data that falls outside those limits. Because every bar starts at 0, which lies below 50%, all bars are dropped and you end up with an empty visualization (plus a warning about removed rows). coord_cartesian(), on the other hand, only zooms in on the given range without discarding any data, so the bars are kept.
```r
ggplot(dt, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_bar(stat = 'summary', fun = mean) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  # Zoom in on 50% to 100% without removing any data
  coord_cartesian(ylim = c(0.5, 1))
```
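The difference between scale limits and coord_cartesian() can be made visible without drawing anything, by inspecting the data ggplot2 computes for the bar layer with layer_data(). This is a sketch assuming ggplot2 >= 3.3.0 (for the `fun` argument); the object names are illustrative. With scale limits, the bottom of each bar (`ymin = 0`) is outside 50% to 100% and gets replaced by NA, which is why the bars disappear; with coord_cartesian() it stays at 0.

```r
library(data.table)
library(ggplot2)

dt <- as.data.table(datasets::iris)
dt[, Sepal.Length := Sepal.Length / max(Sepal.Length)]

base_plot <- ggplot(dt, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_bar(stat = 'summary', fun = mean)

# Option 1: limits on the scale censor out-of-range values to NA
with_scale_limits <- base_plot + scale_y_continuous(limits = c(0.5, 1))

# Option 2: limits on the coordinate system only zoom the view
with_coord_zoom <- base_plot + coord_cartesian(ylim = c(0.5, 1))

# layer_data() returns the computed data of the first layer (the bars)
bars_scale <- suppressWarnings(layer_data(with_scale_limits))
bars_coord <- layer_data(with_coord_zoom)

print(bars_scale[, c("x", "y", "ymin", "ymax")])
print(bars_coord[, c("x", "y", "ymin", "ymax")])
```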
Here’s the final viz.
By the way, if you’re having trouble understanding some of the code and concepts, I can highly recommend “An Introduction to Statistical Learning: with Applications in R”, which is the must-have data science bible. If you simply need an introduction to R, rather than to the data science side, I can absolutely recommend this book by Richard Cotton. Hope it helps!