I recently started working with Julia. I’d been reading some great things about it and felt I needed to try it out. As I work my way through a data science project, I plan to write a couple of blog posts on the parts that need some explanation. In this first post I want to write about the syntax and semantics of vectorized operations.
In many programming languages, vectorizing (or broadcasting) your operations is recommended, because a hand-written loop is often many times slower than its vectorized variant. In Python, arithmetic operations are easily vectorized through the NumPy package, and pandas relies on it heavily. In R, there are the apply functions, dplyr and data.table to vectorize operations.
Vectorization in Julia is really elegant, but unlike in many other programming languages, it’s not necessarily faster than a loop.
“In Julia, vectorized functions are not required for performance, and indeed it is often beneficial to write your own loops, but they can still be convenient.” (Julia Documentation)
I’m going to show you five ways to vectorize an operation, and I’ll close with two alternatives.
But first, let’s create some dummy data: three input arrays with five floats each, plus an uninitialized output array z of the same shape. Let’s say we want to calculate a + b + sin(c) for each element in these arrays. It’s worth noting that you can also mix scalars with arrays.
using BenchmarkTools
a = [0.,1.,2.,3.,4.]
b = [5.,6.,7.,8.,9.]
c = [10.,11.,12.,13.,14.]
z = similar(a)
The workhorse of vectorization in Julia is the broadcast function. Since we perform three operations, we need three nested calls, which is not really elegant. But hey, it works.
@benchmark z = broadcast(+,a,broadcast(+,b,broadcast(sin,c))) # 360 ns
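To make the nesting a bit more concrete, here is a minimal sketch using only Base Julia. The variable names (z_nested, z_single, shifted) are my own and purely illustrative. It shows the nested form, a single broadcast call with a three-argument function, and the scalar case mentioned earlier.

```julia
a = [0.0, 1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0, 9.0]
c = [10.0, 11.0, 12.0, 13.0, 14.0]

# Nested broadcast calls are evaluated inside out:
# first sin over c, then the two additions.
z_nested = broadcast(+, a, broadcast(+, b, broadcast(sin, c)))

# broadcast also accepts a function of several arguments,
# so a single call can do the same job.
z_single = broadcast((x, y, w) -> x + y + sin(w), a, b, c)

# A scalar is combined with every element of the array.
shifted = broadcast(+, a, 10.0)
```

The two results agree up to floating-point rounding, since the additions are grouped differently in the two versions.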
One of the most elegant ways is the dot syntax, which is more or less equivalent to broadcast. It works for operators and functions: truly, any function can be vectorized using a dot. Dot operations also get fused. Instead of performing three loops (two additions and a sine), Julia converts the example below into one loop.
@benchmark z = a .+ b .+ sin.(c) # Median time: 1950 ns
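As a rough mental model of fusion, the dotted expression behaves like one hand-written loop over all three arrays. The sketch below is only an illustration of that idea, not what the compiler literally generates, and the function name fused_by_hand is hypothetical.

```julia
a = [0.0, 1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0, 9.0]
c = [10.0, 11.0, 12.0, 13.0, 14.0]

# One pass over the data and one output array,
# with no temporary arrays for the intermediate results.
function fused_by_hand(a, b, c)
    out = similar(a)
    for i in eachindex(a, b, c)   # also checks that the indices match
        out[i] = a[i] + b[i] + sin(c[i])
    end
    return out
end

z_loop = fused_by_hand(a, b, c)
z_dots = a .+ b .+ sin.(c)
```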
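Dot notation has multiple faces. One of them is the macro @., which lets you drop the individual dots because it adds a dot to every operator and function call in the expression, including the assignment. That means @. z = ... is treated as z .= ..., so the result is written into the existing array z.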
@benchmark @. z = a + b + sin(c) # Median time: 975 ns
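One detail is easy to miss here: because @. also dots the assignment, it overwrites the existing array rather than creating a new one. A small sketch to check this, where z_before, same_array and z2 are illustrative names of my own:

```julia
a = [0.0, 1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0, 9.0]
c = [10.0, 11.0, 12.0, 13.0, 14.0]

z = similar(a)
z_before = z                   # a second name for the same array

@. z = a + b + sin(c)          # treated as: z .= a .+ b .+ sin.(c)

same_array = z === z_before    # the existing array was overwritten in place

# Without the macro, a plain `=` binds the name to a newly allocated array.
z2 = a .+ b .+ sin.(c)
```

This difference probably matters when comparing timings, since the in-place version does not have to allocate an output array.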
Then there is the dot call, which is like the raw version of the dot operators. If I can believe the Stack Overflow threads, this was the default way of vectorizing up until Julia 0.5.
@benchmark z = (+).(a,(+).(b,(+).((sin).(c)))) # Median time: 2678 ns
Of course, Julia’s default behaviour for the sum of two arrays of the same shape is to add them element-wise. That’s why the following is also possible. (You still need to vectorize sin().)
@benchmark z = a + b + sin.(c) # Median time: 775 ns
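The plain + only covers the array-plus-array case. As a small sketch of where it stops working (variable names are illustrative), adding a scalar to an array needs the dot:

```julia
a = [0.0, 1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0, 9.0]

sum_plain = a + b     # array addition, defined for arrays of equal shape
sum_dots  = a .+ b    # broadcasted addition, same values in this case

# Plain `+` between an array and a scalar is not defined ...
plain_scalar_fails = try
    a + 1.0
    false
catch err
    err isa MethodError
end

# ... while the dotted form applies the scalar to every element.
shifted = a .+ 1.0
```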
As I wrote earlier, vectorization is not necessarily the fastest way to perform operations over arrays. We can simply use an array comprehension, much like a list comprehension in Python, which here is still faster than the dot call.
@benchmark z = [a[i] + b[i] + sin(c[i]) for i = 1:length(a)] # Median time: 1350 ns
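If you would rather not index by hand, a comprehension can also iterate over the three arrays together with zip. This is just a sketch of an alternative spelling, and I have not benchmarked it; the names z_index and z_zip are illustrative.

```julia
a = [0.0, 1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0, 9.0]
c = [10.0, 11.0, 12.0, 13.0, 14.0]

# Index-based comprehension, as in the benchmark above.
z_index = [a[i] + b[i] + sin(c[i]) for i = 1:length(a)]

# The same computation, iterating over the elements directly.
z_zip = [x + y + sin(w) for (x, y, w) in zip(a, b, c)]
```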
And finally, the map function, which is found in many programming languages, and of course, also in Julia.
function calc(a,b,c)
    a + b + sin(c)
end
@benchmark z = map(calc,a,b,c) # Median time: 551 ns
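A named helper is not strictly needed. As a sketch, map also takes an anonymous function, and map! writes the result into an existing array. The names z_map and z_inplace are illustrative, and these variants are not part of the benchmarks in this post.

```julia
a = [0.0, 1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0, 9.0]
c = [10.0, 11.0, 12.0, 13.0, 14.0]

# Anonymous function instead of a separate calc definition.
z_map = map((x, y, w) -> x + y + sin(w), a, b, c)

# map! stores the results in an existing destination array
# instead of allocating a new one.
z_inplace = similar(a)
map!((x, y, w) -> x + y + sin(w), z_inplace, a, b, c)
```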
Although it can be slower, I really prefer the dot syntax. It’s a clever and elegant way to perform element-wise operations relatively fast. You can also use the dot to update a variable. If the array already exists, use the in-place assignment operator .= to overwrite its elements instead of allocating a new array. The following two examples are exactly the same!
a .= a .+ b
a .+= b
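To see what "in place" means in practice, here is a small sketch that keeps a second name for the array (alias, a name I made up for this example) and compares the dotted update with a plain assignment:

```julia
a = [0.0, 1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0, 9.0]
alias = a                    # a second name for the same array

a .+= b                      # in place: the existing array is modified
in_place = alias === a       # both names still refer to one array
after_update = copy(alias)   # snapshot of the values after the update

a = a .+ b                   # allocates a new array and rebinds the name a
rebound = alias !== a        # alias still refers to the old array
```

After the first update both names see the new values. After the plain assignment only a does, while alias keeps the values from the in-place update.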