2020: the year that every data scientist became a virologist. For the past few weeks we’ve been numbed by statistics and plots about the coronavirus. A recurring feature of these plots is that they often have logarithmic axes. Here’s how to achieve this in R’s ggplot2.
Let’s say you are trying to plot daily new COVID-19 cases in the United States (up to 2020-03-28) using ggplot2. That would look like this.
ggplot(usa,aes(x = DATE, y = NEW_CASES)) + geom_line() + geom_point() + t + ylab('DAILY NEW CASES (LINEAR SCALE)')
However, epidemics tend to grow exponentially. Given this property, it often makes sense to plot the case counts on a logarithmic scale rather than a linear one: exponential growth then shows up as a straight line. In ggplot2, we can do this fairly easily with a scale function such as scale_y_log10().
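As a rough sketch of the options, here are two common ways to get a log10 y axis. The data frame `usa_demo` is synthetic and only stands in for the real `usa` data used in this post, and the custom theme `t` is left out. Both calls should give the same axis. Note that newer ggplot2 releases call the `trans` argument `transform`, so you may see a deprecation warning depending on your version.

```r
library(ggplot2)

# Synthetic, illustrative data only: these are NOT real COVID-19 counts.
days <- 0:27
usa_demo <- data.frame(
  DATE = as.Date("2020-03-01") + days,
  NEW_CASES = round(10 * 1.3^days)  # made-up exponential growth
)

base_plot <- ggplot(usa_demo, aes(x = DATE, y = NEW_CASES)) +
  geom_line() +
  geom_point()

# Option 1: the shorthand scale function
p_log_a <- base_plot + scale_y_log10()

# Option 2: a continuous scale with an explicit log10 transformation
# (recent ggplot2 versions prefer transform = "log10")
p_log_b <- base_plot + scale_y_continuous(trans = "log10")
```

Keep in mind that a log scale cannot show zeros or negative values, so days with zero new cases will be dropped from the plot with a warning.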
Of course, you can do exactly the same for the x axis. Furthermore, the library also allows for logarithmic tick marks using annotation_logticks().
ggplot(usa,aes(x = DATE, y = NEW_CASES)) + geom_line() + geom_point() + t + ylab('DAILY NEW CASES (LOG SCALE)') + scale_y_log10() + annotation_logticks(sides = 'l')
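The same idea works for the x axis. Since a date axis can't be log-transformed, the sketch below uses a small synthetic data frame (`demo`, invented purely for illustration) with a numeric x. It puts both axes on a log10 scale and draws tick marks on the bottom and left sides.

```r
library(ggplot2)

# Synthetic power-law data, invented for illustration only.
demo <- data.frame(x = 10^seq(0, 3, length.out = 20))
demo$y <- 2 * demo$x^1.5

p_loglog <- ggplot(demo, aes(x = x, y = y)) +
  geom_line() +
  geom_point() +
  scale_x_log10() +                   # log10 x axis
  scale_y_log10() +                   # log10 y axis
  annotation_logticks(sides = "bl")   # ticks on bottom and left
```

The `sides` argument takes any combination of the letters `t`, `r`, `b` and `l`, so `"bl"` matches the two transformed axes here.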
By the way, if you’re having trouble understanding some of the code and concepts, I can highly recommend “An Introduction to Statistical Learning: with Applications in R”, which is the must-have data science bible. If you simply need an introduction to R, and less to the data science part, I can absolutely recommend this book by Richard Cotton. Hope it helps!
Wanna know how I made these charts so crisp in ggplot2? Have a look at this blog post.