Roel Peters

Dealing with right-censored data in machine learning: Random Survival Forests

by roelpi
January 27, 2020 (updated April 5, 2021)
3 Comments
4 min read

A couple of weeks ago, I started working with survival analysis. It was fairly new to me, so I had to dig into some new methods. There was one method that captured my attention: random survival forests (RSFs). It’s one of many statistical learning techniques designed to work with right-censored…

by roelpi
January 13, 2020 (updated August 12, 2020)
1338 Comments
2 min read

Since I started using Visual Studio Code intensively across my devices, with PowerShell as my main terminal, I have been running into the following error quite a lot: “<file> cannot be loaded because running scripts is disabled on this system. For more information, see about_Execution_Policies at https:/go.microsoft.com/fwlink/?LinkID=135170.” What’s happening here is…

by roelpi
January 5, 2020 (updated August 31, 2020)
3 min read

A while ago I started working with the JavaScript library D3.js to create some interactive visualizations. I even took a rather great Coursera course on the subject, Information Visualization: Programming with D3.js. If you’re not familiar with modern JavaScript syntax, D3.js has a rather steep learning curve. During this…

by roelpi
December 30, 2019 (updated August 31, 2020)
7 Comments
3 min read

Ahh, user rights. The cause of and solution to all of life’s identification problems. In this blog post I explain how you can access (private) Google spreadsheets using the Python gspread library. Before you get started: make sure you have administrator rights to the spreadsheets you are trying to work with.…

by roelpi
December 25, 2019 (updated May 15, 2020)
3 min read

In his book “A Field Guide to Lies and Statistics”, psychologist Daniel Levitin elaborates on some commonly made mistakes when it comes to interpreting data. Although a lot of the topics are closely related to the chapters from the best-selling 1954 booklet “How to Lie with Statistics” by Darrell Huff,…

by roelpi
December 19, 2019 (updated April 5, 2021)
2 Comments
5 min read

Random Forest remains my number one go-to algorithm for quickly prototyping prediction models. Last week, I worked on speeding up a feature engineering and training workflow for a marketing project. I moved from the traditional randomForest package to ranger, a package that is already three years old. Here are my…

by roelpi
December 18, 2019 (updated October 18, 2021)
4 min read

When you’re an R power user, pivoting tables in pandas feels unnecessarily complex. Why are there two pivot functions? Why does it return an index when you wanted a column? Why does it generate MultiIndex columns? Those are the questions I tackle in this blog post. 💥 This blog post…
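To make the difference between the two pivot functions concrete, here is a minimal sketch. The `sales` DataFrame and its column names are made up for illustration and do not come from the post itself. It shows that `pivot_table` aggregates duplicate index/column pairs, that `pivot` refuses them, and how to turn the resulting index back into a regular column.

```python
import pandas as pd

# Hypothetical example data: store B has two rows for "feb".
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "month": ["jan", "feb", "jan", "feb", "feb"],
    "revenue": [10, 20, 30, 40, 50],
})

# pivot_table aggregates duplicate (store, month) pairs, here by summing.
wide = sales.pivot_table(
    index="store", columns="month", values="revenue", aggfunc="sum"
)

# "store" ends up as the index; reset_index() turns it back into a column.
wide_flat = wide.reset_index()
wide_flat.columns.name = None  # drop the leftover "month" label on the columns

# pivot does no aggregation, so duplicate pairs raise a ValueError.
try:
    sales.pivot(index="store", columns="month", values="revenue")
    pivot_failed = False
except ValueError:
    pivot_failed = True

print(wide_flat)
print("pivot failed on duplicates:", pivot_failed)
```

As a rule of thumb under these assumptions: `pivot` is a pure reshape, while `pivot_table` is a reshape plus a group-by aggregation.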

by roelpi
December 16, 2019 (updated April 5, 2021)
1 Comment
3 min read

In this blog post, I will elaborate on a specific warning, the contexts in which it occurs and how you can solve or prevent it. It’s definitely in my top three of generic warnings that I bump into: “NAs introduced by coercion”. Apparently, some NAs were added to my data…

by roelpi
December 12, 2019 (updated August 31, 2020)
2 Comments
5 min read

I recently bought a Google Chromebook. It’s light, it’s fast, great battery, but it doesn’t support all the development tools that I’m used to when working on a Windows computer. There’s no Visual Studio Code (Python) and no RStudio (R). Me being a data scientist, that hurt in the beginning.…

by roelpi
December 10, 2019 (updated August 12, 2020)
2 min read

Getting a firm understanding of NaNs in your dataset ensures you don’t draw wrong conclusions from an incomplete dataset. In this blog post I show how you can list the number of NaNs per column, per row, and per group. First, let’s create some dummy data, and add some NaNs.…
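A minimal sketch of the three counts mentioned above, using made-up dummy data. The DataFrame, its column names, and the `group` column are assumptions for illustration, not the post's own example.

```python
import numpy as np
import pandas as pd

# Hypothetical dummy data with a few NaNs sprinkled in.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "x": [1.0, np.nan, 3.0, np.nan],
    "y": [np.nan, np.nan, 1.0, 2.0],
})

# NaNs per column: isna() gives a boolean frame, sum() counts the True values.
nan_per_column = df.isna().sum()

# NaNs per row: same idea, summed across columns.
nan_per_row = df.isna().sum(axis=1)

# NaNs per group: group the boolean frame by the group labels, then sum.
nan_per_group = df[["x", "y"]].isna().groupby(df["group"]).sum()

print(nan_per_column)
print(nan_per_row)
print(nan_per_group)
```

Summing the boolean mask from `isna()` is one common way to do this; other approaches, such as comparing `count()` against the number of rows, would give the same numbers.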

Dealing with right-censored data in machine learning: Random Survival Forests

Solved: “running scripts is disabled on this system” in PowerShell

Explaining ‘promises’ in D3.js: the what and the why

Solved: “The caller does not have permission” – Using the API with a private Google Spreadsheet

“A Field Guide to Lies and Statistics”: arm yourself against bullshit and data overload

When speed matters: going from randomForest to ranger

Pandas’ pivot_table vs. pivot

Solving R’s “NAs introduced by coercion”

Data Science in the Cloud: Azure Notebooks + GitHub

Working with NaN’s (nulls/NA’s) in pandas: per column, per row and per group