Solving “Found unknown categories […] in column” with sklearn OneHotEncoder

In this short blog post, I tackle an error related to a classic problem within machine learning: how to treat unseen categorical values and solve the “found unknown categories” error. Imagine you have a train and a test data set with the following values in a column: Train: [‘green’, ‘red’,… 
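A minimal sketch of how the error arises and one common way around it, assuming scikit-learn's `handle_unknown="ignore"` option is acceptable for your model. The colour values below are illustrative only and are not the data from the post.

```python
from sklearn.preprocessing import OneHotEncoder

# Illustrative data (an assumption, not the post's data set):
# "yellow" appears in the test set but never in the training set.
train = [["green"], ["red"], ["blue"]]
test = [["red"], ["yellow"]]

# Default behaviour (handle_unknown="error"): transform raises a ValueError
# reporting the unknown category.
strict = OneHotEncoder()
strict.fit(train)
try:
    strict.transform(test)
    strict_failed = False
except ValueError as err:
    strict_failed = True
    print(err)

# With handle_unknown="ignore", an unseen category is encoded as all zeros.
tolerant = OneHotEncoder(handle_unknown="ignore")
tolerant.fit(train)
encoded = tolerant.transform(test).toarray()
print(tolerant.categories_)  # learned categories, sorted alphabetically
print(encoded)
```

Encoding an unseen value as an all-zero row silently treats it as "none of the known categories", so it is worth checking how often that happens before relying on it.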

Solving “trend lines are not supported when marks are stacked” in Tableau

When you have a scatter plot in Tableau, you often want to add a trend line to indicate a relationship between two variables. However, sometimes it appears that it’s impossible: it’s greyed out and there is a tooltip that says “trend lines are not supported when marks are stacked”. Let’s… 

Undersampling a Pandas DataFrame

In a previous post, I explained how you can sample two Pandas DataFrames in exactly the same way. In this blog post, I want to use that helper function to undersample your predictors and target variable. When you are working with an imbalanced data set, it’s often good practice to under-… 
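As a rough illustration of the idea, here is a minimal sketch of random undersampling with pandas. It is not the helper function from the previous post; the name `undersample` and the toy data are hypothetical, and the sketch assumes `X` and `y` share the same unique index.

```python
import pandas as pd


def undersample(X, y, random_state=0):
    """Randomly undersample every class down to the size of the rarest class.

    Hypothetical helper: assumes X (DataFrame) and y (Series) share a unique index.
    """
    n_min = y.value_counts().min()
    # Draw n_min row labels per class, then select the same rows from X and y.
    parts = [y[y == cls].sample(n=n_min, random_state=random_state) for cls in y.unique()]
    keep = pd.concat(parts).index
    return X.loc[keep], y.loc[keep]


# Toy imbalanced data: eight rows of class 0, two rows of class 1.
X = pd.DataFrame({"feature": range(10)})
y = pd.Series([0] * 8 + [1] * 2)

X_bal, y_bal = undersample(X, y)
print(y_bal.value_counts())
```

Selecting rows by index label is what keeps the predictors and the target aligned after sampling.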

Prevent column type list when using read_sheet from R’s googlesheets4

Google Spreadsheets and R: a dynamic duo! An annoying feature of googlesheets4’s read_sheet() is that, when it cannot settle on a single type for a column, it assigns a type to each cell individually. However, this makes perfect sense. Sometimes, a column contains a mix of values that could be integers and values that… 

Excel’s LEFT, RIGHT and MID in R

LEFT() and RIGHT() are probably the most used string functions within Microsoft Excel (or other spreadsheet software). Many people who make the transition to R wonder where to find these functions. Long story short: they’re not available out of the box. However, why not create them yourself or find a library that suits your…