The many ways to vectorize operations in Julia

I recently started working with Julia. I’ve been reading some great things about it and I felt I needed to try it out. As I’m working my way through a data science project, I plan on writing a couple of blog posts on things that need some explanation. In this first blog post I want to write about the syntax and semantics of vectorized operations.

In many programming languages, vectorizing (or broadcasting) your operations is recommended, because a hand-written loop is often many times slower than its vectorized variant. In Python, arithmetic operations are easily vectorized through the NumPy package, and pandas relies on it heavily. In R, there are the apply functions, dplyr and data.table to vectorize operations.

Vectorization in Julia is really elegant, but contrary to other programming languages, it’s not necessarily faster.

In Julia, vectorized functions are not required for performance, and indeed it is often beneficial to write your own loops, but they can still be convenient.

Julia Documentation

I’m going to show you five ways to proceed with vectorization and I’m closing off with two alternatives.

But first, let’s create some dummy data: three input arrays with 5 floats each, plus an uninitialized array z of the same shape to hold the result. It’s worth noting that you can also mix scalars with arrays. Let’s say we want to calculate a + b + sin(c) for each element in these arrays.

using BenchmarkTools

a = [0.,1.,2.,3.,4.]
b = [5.,6.,7.,8.,9.]
c = [10.,11.,12.,13.,14.]
z = similar(a)
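
To make the remark about mixing scalars with arrays concrete, here is a minimal sketch. The scalar `s` and the result names are my own and only serve as illustration.

```julia
a = [0.0, 1.0, 2.0, 3.0, 4.0]
s = 10.0  # hypothetical scalar, not part of the dummy data above

# With the dot syntax, the scalar is combined with every element of `a`.
shifted = a .+ s
scaled = 2 .* a .+ s

println(shifted)
println(scaled)
```

The scalar behaves as if it were an array of the same length filled with one value, without that array ever being created.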

The workhorse of vectorization in Julia is the broadcast function. Since we perform three operations, we need three nested calls, which is not really elegant. But hey, it works.

@benchmark z = broadcast(+,a,broadcast(+,b,broadcast(sin,c))) # 360 ns
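
If the nesting bothers you, one possible alternative is to hand broadcast a single anonymous function that takes one element from each array. This is just a sketch; the names `z_nested` and `z_single` are mine.

```julia
a = [0.0, 1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0, 9.0]
c = [10.0, 11.0, 12.0, 13.0, 14.0]

# Nested form: every inner broadcast call produces its own intermediate array.
z_nested = broadcast(+, a, broadcast(+, b, broadcast(sin, c)))

# Single call: the anonymous function is applied once per element triple.
z_single = broadcast((x, y, w) -> x + y + sin(w), a, b, c)

println(z_nested ≈ z_single)
```

Both forms should give the same values up to floating-point rounding, since the additions are grouped slightly differently.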

One of the most elegant ways is the dot syntax, which is more or less equivalent to broadcast. This works for operators and functions: truly, any function can be vectorized using a dot. Dot operations get fused. Instead of performing three loops (two additions and a sine), Julia converts the example below into one loop.

@benchmark z = a .+ b .+ sin.(c) # Median time: 1950 ns
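
To picture what fusion means, here is a rough sketch of a hand-written loop that does the same work as the dotted expression. The function `fused_loop` is a hypothetical helper of mine, not what Julia literally generates.

```julia
a = [0.0, 1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0, 9.0]
c = [10.0, 11.0, 12.0, 13.0, 14.0]

# Hypothetical helper: one pass over the data, one output array, no temporaries.
function fused_loop(a, b, c)
    z = similar(a)
    for i in eachindex(a, b, c)  # errors if the arrays do not share the same indices
        z[i] = a[i] + b[i] + sin(c[i])
    end
    return z
end

z_dots = a .+ b .+ sin.(c)
z_loop = fused_loop(a, b, c)

println(z_dots ≈ z_loop)
```

This is also the kind of loop the Julia documentation has in mind when it says writing your own loops can be beneficial.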

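Dot notation has multiple faces. For example, the macro @. lets you drop the dots, as it converts every operator, function call and assignment in the expression into its dotted version. Because the assignment becomes .= as well, the example below writes into the existing array z instead of allocating a new one.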

@benchmark @. z = a + b + sin(c) # Median time: 975 ns
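
A small sketch to check what the macro does with the assignment. The extra name `z_before` is mine and only exists to show that no new array is created.

```julia
a = [0.0, 1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0, 9.0]
c = [10.0, 11.0, 12.0, 13.0, 14.0]
z = similar(a)
z_before = z  # second reference to the same array

# Dots are added to +, sin and =, so this behaves like z .= a .+ b .+ sin.(c)
@. z = a + b + sin(c)

println(z === z_before)  # checks that z is still the very same array
```

If you are curious about the exact expansion, `@macroexpand @. z = a + b + sin(c)` should show it.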

Then, there is the dot call, which is like the raw version of the dot operators. If I can believe the StackOverflow boards, this was the default way of vectorizing up until Julia 0.5.

@benchmark z = (+).(a,(+).(b,(sin).(c))) # Median time: 2678 ns

Of course, Julia’s default behaviour for the sum of two arrays of the same shape is to add them element-wise. That’s why the following is also possible. (You still need to vectorize sin().)

@benchmark z = a + b + sin.(c) # Median time: 775 ns
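
One difference with the dotted version is worth a sketch: as far as I can tell, on Julia 1.x the plain + only accepts two arrays, so mixing in a scalar needs the dot. The variable `err` is just my way of capturing the error.

```julia
a = [0.0, 1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0, 9.0]

# Array + array works without dots.
plain_sum = a + b

# Array + scalar is not defined for the plain operator, so we catch the error.
err = try
    a + 1.0
    nothing
catch e
    e
end

# The dotted operator handles the scalar fine.
dotted_sum = a .+ 1.0

println(typeof(err))
```

Also note that the plain + is not part of the dot fusion, so a + b is computed as a separate array before sin.(c) is added.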

As I wrote earlier, vectorization is not necessarily the fastest way to perform operations over arrays. We can simply use a comprehension, like a list comprehension in Python, which is still faster than the dot call.

@benchmark z = [a[i] + b[i] + sin(c[i]) for i = 1:length(a)] # Median time: 1350 ns
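
The comprehension can be written in a couple of ways. Below is a sketch of two variants I would consider; the names `z_index` and `z_zip` are mine, and I have not benchmarked them.

```julia
a = [0.0, 1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0, 9.0]
c = [10.0, 11.0, 12.0, 13.0, 14.0]

# Index-based, using eachindex instead of 1:length(a).
z_index = [a[i] + b[i] + sin(c[i]) for i in eachindex(a, b, c)]

# Index-free, walking the three arrays in lockstep.
z_zip = [x + y + sin(w) for (x, y, w) in zip(a, b, c)]

println(z_index ≈ z_zip)
```

The eachindex form also checks that the three arrays share the same indices, which 1:length(a) does not.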

And finally, the map function, which is found in many programming languages, and of course, also in Julia.

function calc(a,b,c)
    a + b + sin(c)
end

@benchmark z = map(calc,a,b,c) # Median time: 551 ns
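
map has a few variations that may be handy. This sketch shows a do-block instead of a named function, and map! for writing into an existing array; the names `z_map`, `z_do` and `z` are placeholders.

```julia
a = [0.0, 1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0, 9.0]
c = [10.0, 11.0, 12.0, 13.0, 14.0]

calc(x, y, w) = x + y + sin(w)  # same function as above, in short form

# Named function, returns a new array.
z_map = map(calc, a, b, c)

# Do-block: the body becomes an anonymous function passed as the first argument.
z_do = map(a, b, c) do x, y, w
    x + y + sin(w)
end

# map! stores the result in a preallocated array.
z = similar(a)
map!(calc, z, a, b, c)

println(z_map ≈ z)
```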

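Although it can be slower, I really prefer the dot syntax. It’s a clever and elegant way to perform element-wise operations relatively fast. (The timings above were taken on non-constant global variables without BenchmarkTools’ $ interpolation, so treat them as rough indications.) You can also use the dot for updating a variable. If your variable already exists, you should use the in-place assignment operator .= so that no new array is allocated. The following two lines are exactly equivalent!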

a .= a .+ b
a .+= b
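
To see why the in-place form matters, here is a small sketch contrasting it with a plain assignment. The name `alias` is mine and simply keeps a second reference to the original array.

```julia
a = [0.0, 1.0, 2.0]
b = [5.0, 6.0, 7.0]
alias = a  # second name for the same array

# In place: the existing array is overwritten, so alias sees the change.
a .+= b
in_place_shared = alias === a

# Plain assignment: a new array is created and the name a is rebound to it.
a = a .+ b
rebound_shared = alias === a

println(in_place_shared)
println(rebound_shared)
```

After the last line, alias still holds the values from the in-place update, while a points to a fresh array.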

Great success!
