Combining gsubs to replace multiple patterns

A problem you run into fairly early in a data scientist’s career is replacing a lot of patterns in a string. Of course, you can write a whole load of gsub calls, but that becomes tiring really fast. In this blog post I elaborate on three functions from three separate libraries that can do the same thing in a more concise way.

First, let’s create a dummy sentence. Each example below overwrites s, so re-create it before trying the next one.

s <- 'The quick brown fox jumps over the lazy dog'

The gsubfn function (from the library with the same name) accepts a pattern to look for and a named list that maps each match to its replacement. It’s not really fast: using microbenchmark, this call took 250 microseconds to run.

library(gsubfn)
s <- gsubfn('fox|over|dog', list('fox' = 'horse', 'over' = 'on', 'dog' = 'wolf'), s)
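
To make the lookup idea concrete, here is a minimal base R sketch of roughly the same behaviour: find every match of the combined pattern, then look each match up in a named vector. This is only an illustration under my own naming (lookup, matches, replaced); it is not how gsubfn is implemented internally.

```r
s <- 'The quick brown fox jumps over the lazy dog'

# Named vector: names are the matched text, values are the replacements
lookup <- c(fox = 'horse', over = 'on', dog = 'wolf')

# Locate every match of the combined pattern
matches <- gregexpr('fox|over|dog', s)

# Swap each matched piece for its entry in the lookup vector
replaced <- s
regmatches(replaced, matches) <- lapply(
  regmatches(replaced, matches),
  function(found) unname(lookup[found])
)

print(replaced)
```

Because every match is looked up in a single pass, a replacement is never re-scanned by a later pattern, which is a difference from chaining separate gsub calls.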

We can also use the popular magrittr package to achieve the same goal. By chaining gsub calls with the pipe operator, and using the compound assignment pipe %<>% to write the result back to s, this can be quite concise. It’s also twice as fast as gsubfn(). However, it’s still a lot to write, and you only save typing if your variable name is long.

library(magrittr)

s %<>% 
  gsub('fox','horse',.) %>%
  gsub('over','on',.) %>%
  gsub('dog','wolf',.)
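
If the list of patterns grows, the chain itself can be generated instead of typed out. The sketch below is a base R alternative that is not part of the original comparison: it folds gsub over two parallel vectors with Reduce. The names patterns, replacements and chained are made up for this example.

```r
s <- 'The quick brown fox jumps over the lazy dog'

# Parallel vectors: patterns[i] is replaced by replacements[i]
patterns     <- c('fox', 'over', 'dog')
replacements <- c('horse', 'on', 'wolf')

# Apply one gsub per pattern, feeding each result into the next call
chained <- Reduce(
  function(text, i) gsub(patterns[i], replacements[i], text),
  seq_along(patterns),
  s
)

print(chained)
```

Like the pipe chain, this applies the replacements one after another, so the order of the patterns matters when one replacement could match a later pattern.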

Finally, there’s stringi. In my opinion, it’s the Swiss Army knife of string manipulation, and it’s known to be blazingly fast. Running a microbenchmark shows it is up to 8 times faster than the gsubfn() call we started with.

library(stringi)
# vectorize_all = FALSE applies every pattern to the string, one after another
s <- stri_replace_all_regex(s, c('fox', 'over', 'dog'), c('horse', 'on', 'wolf'), vectorize_all = FALSE)
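
The timings quoted in this post can be checked with something along these lines. It is a sketch, assuming the gsubfn, magrittr, stringi and microbenchmark packages are installed; the labels and the number of repetitions are my own choice, and the absolute numbers will differ per machine and package version.

```r
library(gsubfn)
library(magrittr)
library(stringi)
library(microbenchmark)

s <- 'The quick brown fox jumps over the lazy dog'

# Time the three approaches on the same, unmodified input string
bench <- microbenchmark(
  gsubfn = gsubfn('fox|over|dog',
                  list(fox = 'horse', over = 'on', dog = 'wolf'), s),
  gsub_chain = s %>%
    gsub('fox', 'horse', .) %>%
    gsub('over', 'on', .) %>%
    gsub('dog', 'wolf', .),
  stringi = stri_replace_all_regex(s,
                                   c('fox', 'over', 'dog'),
                                   c('horse', 'on', 'wolf'),
                                   vectorize_all = FALSE),
  times = 1000L
)

print(bench)
```

None of the expressions assigns back to s, so every repetition works on the original sentence rather than on an already replaced one.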

Conclusion: of the three approaches compared here, the fastest way to replace multiple patterns in a string is stringi. Another problem solved!

By the way, if you’re having trouble understanding some of the code and concepts, I can highly recommend “An Introduction to Statistical Learning: with Applications in R”, which is the must-have data science bible. If you simply need an introduction to R, and less to the data science part, I can absolutely recommend this book by Richard Cotton. Hope it helps!
