machine learning

Undersampling a Pandas DataFrame

by roelpi
September 15, 2020 (updated March 30, 2021)
11 Comments
2 min read

In a previous post, I explained how you can sample two Pandas DataFrames in exactly the same way. In this blog post, I want to use that helper function to undersample your predictors and target variable. When you are working with an imbalanced data set, it’s often good practice to under-…

Dealing with right-censored data in machine learning: Random Survival Forests

by roelpi
January 27, 2020 (updated April 5, 2021)
3 Comments
4 min read

A couple of weeks ago, I started working with survival analysis. It was fairly new to me, so I had to dig into some new methods. There was one method that captured my attention: random survival forests (RSFs). It’s one of many statistical learning techniques designed to work with right-censored…

One-hot encoding with a TensorFlow DataGenerator

by roelpi
November 21, 2019 (updated August 31, 2020)
3 min read

In this blog post I explain how to create a DataGenerator with a one-hot encoder to encode your labels in the same way for every batch. Some months ago, I tried training a text generator on a huge corpus of text with an LSTM model. Basically, it’s a model that…

Federated Learning: An End to the Privacy Debate?

by roelpi
August 10, 2019 (updated May 15, 2020)
8 min read

When Roger McNamee declared in his March 2019 podcast interview with Sam Harris that Android, Google's mobile operating system, is a vacuum cleaner for your data, he could not have foreseen what would be announced two months later at Google I/O: federated learning. Soon the tech giant will no longer need your data…

Using Keras in R: Training a model

by roelpi
July 21, 2019 (updated April 5, 2021)
5 min read

In this blog post I will introduce you to building and training your own neural network algorithm in R through Keras & TensorFlow. If you haven’t installed Keras for R yet, please follow the instructions explained in part 1. I have explicitly chosen to work with structured data in this…

randomForest gives NA/NaN/Inf in foreign function call and how to solve it

by roelpi
June 14, 2019 (updated April 5, 2021)
1 Comment
3 min read

Personally, Random Forest is one of my favorite algorithms for supervised learning. It’s quick and dirty and still allows for some interpretation. However, R and the randomForest package are somewhat cryptic about which requirements have not been met to train the algorithm properly. I ran into this error a lot…

machine learning

Undersampling a Pandas DataFrame

Dealing with right-censored data in machine learning: Random Survival Forests

One-hot encoding with a TensorFlow DataGenerator

Federated Learning: An End to the Privacy Debate?

Using Keras in R: Training a model

randomForest gives NA/NaN/Inf in foreign function call and how to solve it