Google Spreadsheets and R: a dynamic duo! An annoying feature in googlesheets4’s read_sheet(), is that within a column, it assigns a type to each cell individually when it is confused. However, this makes perfect sense. Sometimes, a column contains a mix of values that could be integers and values that should be strings. In this blog post I explain how to use the col_types argument so that the function no longer has to guess the column type.
Let’s say one of the columns in your Google spreadsheet had the following values. It could be product IDs from different suppliers, for example.
XA12, ID-19, 1267, 890, DD4, S.1, Q04, 823, 10003
All values could be interpreted as a string, while only some of them can be interpreted as integers. This process of guessing is what read_sheet() does when it’s converting your sheet to a data frame. Because of the ambiguity of our series of values, read_sheet() sets the type for each cell individually.
To prevent this behavior, we have to set the column types manually: the col_types argument is what we’re after. Contrary to more traditional libraries, you can set all column types with a single letter.
- -: skip the column
- ?: guess
- l: logical
- i: integer
- d: double
- D: date
- t: time of day
- T: POSIXct datetime
- c: character
- L: list-column <– we don’t want this
Let’s say our sheet has four columns, three integers and our last one is ambiguous, but we want to set it to be a character column. We would do that like this:
library(googlesheets4) df <- read_sheet(ss = your_sheet_link, col_types = 'iiic')
By the way: if you still want googlesheets4 to make a guess, you can pass the ? parameter. Although, guessing slows down the process of loading the table, especially with large spreadsheets. A good tip: by setting the guess_max parameter, you will likely speed uploading the data frame, because you’ll tell the function to look at fewer rows to make an educated guess.
By the way, if you’re having trouble understanding some of the code and concepts, I can highly recommend “An Introduction to Statistical Learning: with Applications in R”, which is the must-have data science bible. If you simply need an introduction into R, and less into the Data Science part, I can absolutely recommend this book by Richard Cotton. Hope it helps!
Great success!