## Calculate cumulative sum (cumsum) by group in R

In this blog post, I tackle a question that comes up again and again on many forums: calculating a cumulative sum within the group each row belongs to. Thanks to vegetableagony for pointing out that, depending on the size of the dataset, other conclusions…
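
A minimal base R sketch of a grouped cumulative sum, using `ave()`. This is not necessarily the approach the post settles on, and the toy data frame below is hypothetical.

```r
# Hypothetical toy data (not the data used in the post)
df <- data.frame(
  group = c("a", "a", "b", "a", "b"),
  value = c(1, 2, 3, 4, 5)
)

# ave() splits `value` by `group`, applies cumsum() within each group,
# and returns the results in the original row order
df$cum_value <- ave(df$value, df$group, FUN = cumsum)
print(df)
```

Group "a" accumulates to 1, 3, 7 and group "b" to 3, 8, while the rows keep their original order. With data.table, the equivalent would be `dt[, cum_value := cumsum(value), by = group]`.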

## Reorder a ggplot2 bar chart by count

Plotting bars in ggplot2 is easy. Yet, in many cases, you want to order these bars according to their frequency (count) or according to some other numeric value. In this blog post, I show you three ways to achieve this. First, let’s load the libraries and create the Titanic data…
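
The core idea behind ordering bars by count is that ggplot2 draws a discrete axis in factor-level order. Below is one base R way to set those levels by descending frequency. It is a sketch with hypothetical data, not one of the post's three methods verbatim.

```r
# Hypothetical passenger classes; the post builds its own Titanic data
passenger_class <- c("3rd", "1st", "3rd", "crew", "3rd",
                     "crew", "2nd", "crew", "3rd", "1st")

# Count each class, sort the counts in decreasing order,
# and use that order as the factor levels
class_counts <- sort(table(passenger_class), decreasing = TRUE)
passenger_class_f <- factor(passenger_class, levels = names(class_counts))

levels(passenger_class_f)
# → "3rd" "crew" "1st" "2nd"
```

Mapping this factor to the x aesthetic, e.g. `ggplot(data.frame(class = passenger_class_f), aes(x = class)) + geom_bar()`, should then draw the bars from most to least frequent.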

## Another way of merging tables with data.table in R

Merging data.tables in R works much like merging regular data.frames, with one difference: there is also a shorthand way to merge two tables that feels more data.table-esque. In this blog post I elaborate on merging tables using R’s data.table library. If you’re…
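
A small sketch comparing the familiar `merge()` with data.table's bracket join syntax `X[Y, on = "key"]`. I am assuming this bracket syntax is the shorthand the post refers to, and the two tables are hypothetical.

```r
library(data.table)

# Hypothetical tables for illustration
orders <- data.table(customer_id = c(1, 2, 2, 3), amount = c(10, 20, 5, 7))
customers <- data.table(customer_id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))

# The familiar merge(), which works just like it does for data.frames
merged_classic <- merge(orders, customers, by = "customer_id")

# The bracket shorthand: look up `customers` for every row of `orders`.
# X[Y, on = "key"] returns one row per row of Y (unmatched rows get NA),
# similar to a right join on X
merged_short <- customers[orders, on = "customer_id"]
print(merged_short)
```

Note that `merge()` defaults to an inner join, whereas `X[Y, on = ...]` keeps every row of `Y`; with this data both give the same four rows.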

## Creating and managing a list of dataframes in R

Why do people put data in a list in the first place? Because it can be really darn handy. In this blog post I elaborate on some good use cases for putting data frames in a list. **Loading a lot of files.** In many situations you will be confronted with…
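
For the "loading a lot of files" use case, a common pattern is to read every file into a list with `lapply()` and then work on the list or stack it. This is a sketch under my own assumptions: the CSV files are hypothetical and are written to a temporary directory first so the example runs anywhere.

```r
# Write a few hypothetical CSV files to a temp directory
tmp_dir <- tempfile("csv_demo_")
dir.create(tmp_dir)
for (i in 1:3) {
  write.csv(data.frame(file_id = i, value = i * 10),
            file.path(tmp_dir, paste0("part_", i, ".csv")),
            row.names = FALSE)
}

# Read every CSV into one list of data frames, named by file
paths <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
df_list <- lapply(paths, read.csv)
names(df_list) <- basename(paths)

# Apply the same operation to every data frame at once
row_counts <- sapply(df_list, nrow)

# Or stack them into a single data frame
combined <- do.call(rbind, df_list)
print(combined)
```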

## Displaying weekdays in R’s ggplot2

For the past couple of years, I have been struggling to get weekdays to display properly in my ggplot2 charts. Recently, however, I discovered a pretty straightforward method to do it. All those hours wasted, but no more. I will demonstrate with an example. I use data.table for wrangling, and lubridate for…
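
The usual trick, and my assumption about the underlying idea, is to turn the weekday into an ordered factor with fixed levels, because ggplot2 follows the level order on a discrete axis. lubridate's `wday(x, label = TRUE)` already returns an ordered factor. The base R sketch below builds one by hand from `as.POSIXlt()$wday`, which avoids locale-dependent day names; the dates and the `day_labels` vector are hypothetical.

```r
# Hypothetical dates; 2024-01-01 is a Monday
dates <- as.Date("2024-01-01") + 0:9

# POSIXlt$wday is 0 = Sunday ... 6 = Saturday (locale-independent)
wday_num <- as.POSIXlt(dates)$wday

# Shift to a Monday-first index: Monday = 1 ... Sunday = 7
iso_wday <- (wday_num + 6) %% 7 + 1

# Fixed English labels in the desired axis order
day_labels <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
weekday <- factor(day_labels[iso_wday], levels = day_labels, ordered = TRUE)
print(weekday)
```

Used as an x aesthetic, this factor keeps Monday through Sunday in calendar order instead of alphabetical order.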

## Applying as.factor (or numeric) to multiple columns in R

In this blog post I tackle a problem for which I have been looking for an off-the-shelf solution: converting all columns of a certain class to another class, while preserving the data frame. I haven’t found one to this day. That’s why I wrote this function. Let’s say…
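
Here is a sketch of what such a helper could look like. `convert_columns()` is a hypothetical name, and this is not necessarily the function from the post. It selects every column of class `from`, converts those columns with `convert`, and returns a data frame.

```r
# Hypothetical helper: convert every column of class `from` using `convert`
convert_columns <- function(df, from = "character", convert = as.factor) {
  # Logical vector: which columns have the requested class?
  cols <- vapply(df, function(col) inherits(col, from), logical(1))
  # Assigning through df[cols] keeps the result a data frame
  df[cols] <- lapply(df[cols], convert)
  df
}

df <- data.frame(name = c("a", "b"), city = c("x", "y"), n = 1:2,
                 stringsAsFactors = FALSE)

df2 <- convert_columns(df, from = "character", convert = as.factor)
df3 <- convert_columns(df, from = "integer", convert = as.numeric)
sapply(df2, class)
```

Be careful when going from factor to numeric: `as.numeric()` on a factor returns the level codes, not the labels.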

## How to only select categorical or numerical columns in R

Let’s say you want to use principal component analysis on the numerical columns in your data set to reduce the number of features in your model and get rid of multicollinearity. For that, you’d need to select the numerical columns only. Now how could you do that properly? In the…
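
One base R way to split a data frame into numeric and non-numeric columns before running PCA, sketched on the built-in `iris` data. The post may use a different method; `Filter(is.numeric, iris)` would be an equivalent one-liner for the numeric part.

```r
# Flag each column: TRUE if numeric (integer or double)
numeric_cols <- vapply(iris, is.numeric, logical(1))

# Numeric columns only; drop = FALSE keeps a data frame even for one column
iris_num <- iris[, numeric_cols, drop = FALSE]

# Everything else (here just the Species factor; would also catch
# character, logical or date columns)
iris_other <- iris[, !numeric_cols, drop = FALSE]

# PCA on the numeric columns, scaled to unit variance
pca <- prcomp(iris_num, scale. = TRUE)
summary(pca)
```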

## Solve DBI returning integer64 in R

Recently I bumped into an issue where a query run with the DBI library’s dbGetQuery function returned an integer64 column, without me even noticing it. When converting my data frame (or data.table) to a matrix for clustering purposes, I ran into the following error: Error in if (changes ==…
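
A sketch of one fix after the fact: find every `integer64` column (from the bit64 package) and convert it to a regular double before building the matrix. The query result below is simulated, and this is my assumption of a workable approach rather than the post's exact solution.

```r
library(bit64)

# Hypothetical query result: with some DBI drivers, BIGINT columns
# come back as bit64::integer64 instead of a base R numeric type
res <- data.frame(score = c(0.5, 1.2, 3.4))
res$id <- as.integer64(c(1, 2, 3))

# Find every integer64 column and convert it to a regular double.
# Caveat: doubles represent integers exactly only up to 2^53
is_int64 <- vapply(res, function(col) inherits(col, "integer64"), logical(1))
res[is_int64] <- lapply(res[is_int64], as.numeric)

# The data frame now converts to a plain numeric matrix
m <- as.matrix(res)
```

Some drivers can also avoid integer64 at the source; RPostgres, for example, accepts a `bigint` argument in `dbConnect()`. Check your driver’s documentation for the exact option.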