# how to

## Solve ‘RIGHT JOIN must be parenthesized when following a comma join’ in BigQuery

• 2 min read

In this blog post, I explain why a certain error in BigQuery arises and how you can get rid of it. Although I abandoned the comma join syntax a while ago, I still use it in the context of arrays in Google BigQuery (all the cool kids…

## Optimizing the number of clusters using Tibshirani’s gap statistic

• 5 min read

When you are clustering, what you are actually trying to do is find groups of objects that are similar to one another and different from the objects in other groups. In other words, you want to minimize the intra-cluster distance and maximize the inter-cluster distance. Clustering algorithms…

## Statistical power, it matters. Even in R.

• 5 min read

Although every statistics book will tell you not to go looking for statistical significance, sadly that’s still what happens in many analyses and much scientific research. What is often forgotten is checking for statistical power. Here’s a refresher, and how to do it in R. Remember statistical significance? A finding is statistically…

## The t-test in R revisited

• 6 min read

A two-sample t-test (aka an independent t-test or Student’s t-test) is most often used to compare the means of two samples. For example, you might be checking whether black cats, on average, have the same body weight as white cats. In this blog post I explain how to do a…

## How to prevent scientific notation in R

• 3 min read

Scientific notation can be handy if you want to save digits. However, if you need to present your results to the board, there’s gonna be that one guy who will ask what the ‘e’-thingy stands for. Here’s how you can make sure that your results are not returned in scientific…

## Applying as.factor (or numeric) to multiple columns in R

• 2 min read

In this blog post I tackle a problem for which I have been looking for an off-the-shelf solution: converting the columns of a certain class to another class, while preserving the data frame. To this day I haven’t found one. That’s why I wrote this function. Let’s say…

## Solve ‘The device does not recognize the command’ when saving a file in RStudio

• 2 min read

The following error message is something that my colleagues and I keep running into when using RStudio: ‘Error Saving File - The device does not recognize the command’. There are multiple things that cause this, but basically it’s because RStudio is having trouble writing to the file. Here’s what you…

## How to set axis limits in ggplot2 without losing data

• 3 min read

In this blog post, I elaborate on setting axis limits in a plot generated by ggplot2. There are two ways: one where you pretend the data outside the limits doesn’t exist (using lims), and one where you respect that the data outside the limits exists (using coord_cartesian). The documentation for…

## How to only select categorical or numerical columns in R

Let’s say you want to use principal component analysis on the numerical columns in your data set to reduce the number of features in your model and get rid of multicollinearity. For that, you’d need to select the numerical columns only. Now how could you do that properly? In the…

## Associate Thonny with a Virtual Environment on Raspberry Pi

In this short blog post I explain how you can work with a virtual environment in Thonny on a Raspberry Pi. But first, let’s create a virtual environment using venv. Open the terminal. Change directory to the folder of your project (cd), or create a new one (mkdir). Inside that…