Dealing with right-censored data in machine learning: Random Survival Forests

A couple of weeks ago, I started working with survival analysis. It was fairly new to me, so I had to dig into some new methods. There was one method that captured my attention: random survival forests (RSFs). It’s one of many statistical learning techniques designed to work with right-censored… 

Solved: “running scripts is disabled on this system” in PowerShell

Since I started using Visual Studio Code intensively across my devices, with PowerShell as my main terminal, I have been running into the following error quite a lot: <file> cannot be loaded because running scripts is disabled on this system. For more information, see about_Execution_Policies at https:/go.microsoft.com/fwlink/?LinkID=135170. What’s happening here is… 

Explaining ‘promises’ in D3.js: the what and the why

  • 3 min read

A while ago I started working with the JavaScript library D3.js to create some interactive visualizations. I even took a rather great Coursera course on the subject: Information Visualization: Programming with D3.js. If you’re not familiar with modern JavaScript syntax, D3.js has a rather steep learning curve. During this… 

Solved: “The caller does not have permission” – Using the API with a private Google Spreadsheet

Ahh, user rights. The cause of and solution to all of life’s identification problems. In this blog post I explain how you can access (private) Google spreadsheets using the Python gspread library. Before you get started: make sure you have administrator rights to the spreadsheets you are trying to work with.… 

“A Field Guide to Lies and Statistics”: arm yourself against bullshit and data overload

  • 3 min read

In his book “A Field Guide to Lies and Statistics”, psychologist Daniel Levitin elaborates on some common mistakes in interpreting data. Although a lot of the topics are closely related to the chapters of the best-selling 1954 book “How to Lie with Statistics” by Darrell Huff,… 

When speed matters: going from randomForest to ranger

Random Forest remains my number one go-to algorithm for quickly prototyping prediction models. Last week, I worked on speeding up a feature engineering and training workflow for a marketing project. I moved from the traditional randomForest package to ranger, a package that is already three years old. Here are my… 

Pandas’ pivot_table vs. pivot

  • 4 min read

When you’re an R power user, pivoting tables in pandas feels unnecessarily complex. Why are there two pivot functions? Why does it return an index when you wanted a column? Why does it generate MultiIndex columns? Those are the questions I tackle in this blog post. 💥 This blog post… 
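
As a rough illustration of the first two questions, here is a minimal sketch. The `sales` data and all variable names are made up for this example and do not come from the post. It assumes the usual behaviour that `pivot` only reshapes and refuses duplicate index/column pairs, while `pivot_table` aggregates them.

```python
import pandas as pd

# Hypothetical data: "south"/"pears" appears twice on purpose.
sales = pd.DataFrame({
    "store": ["north", "north", "south", "south", "south"],
    "product": ["apples", "pears", "apples", "pears", "pears"],
    "units": [10, 5, 7, 3, 4],
})

# pivot_table aggregates duplicate (store, product) pairs.
wide = sales.pivot_table(
    index="store", columns="product", values="units", aggfunc="sum"
)

# pivot only reshapes, so duplicate pairs raise a ValueError.
try:
    sales.pivot(index="store", columns="product", values="units")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# "store" ends up in the index; reset_index turns it back into a column.
flat = wide.reset_index()
flat.columns.name = None  # drop the leftover "product" label on the columns
```

Passing a list of several value columns would produce MultiIndex columns; the sketch keeps to a single value column to stay short.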

Working with NaN’s (nulls/NA’s) in pandas: per column, per row and per group

  • 2 min read

Getting a firm understanding of the NaNs in your dataset ensures you don’t draw wrong conclusions from an incomplete dataset. In this blog post I show how you can list the number of NaNs per column, per row, and per group. First, let’s create some dummy data, and add some NaNs.…