How Rust makes the Rayon data parallelism library magical

Rayon is a data parallelism library for the Rust programming language. Common reactions from programmers who start to use Rayon express how it seems magical: "I changed one line and my code now runs in parallel!" As one of Rayon’s authors, I am of course glad to see happy users, but I want to dispel some of the magic and give credit where it’s due—to Rust itself.

How Rust supports Rayon's parallelism

A best-case "magical" scenario often looks something like this with a sequential iterator:

let total = foo_vector.iter_mut()
    .filter(|foo| foo.is_interesting())
    .map(|foo| foo.heavy_computation())
    .sum();

To make this a parallel iterator with Rayon, simply change the first line to call par_iter_mut(), as shown in the following example, and watch it light up all of your CPUs! It’s no problem that foo_vector is a local variable on the stack, or that the computation might be mutating the values. The whole collection is automatically split among multiple threads, updates being processed independently without issues:

let total = foo_vector.par_iter_mut()
    .filter(|foo| foo.is_interesting())
    .map(|foo| foo.heavy_computation())
    .sum();

As a Rust developer, it’s not enough just to go fast. Everything must still also be checked for memory and thread safety. That principle is maintained with Rayon. Suppose the sequential iterator had been written more like this:

let mut total = 0;
foo_vector.iter_mut()
    .filter(|foo| foo.is_interesting())
    .for_each(|foo| total += foo.heavy_computation());

This code is fine sequentially, but if you try to make it parallel with par_iter_mut(), the compiler will return an error:

cannot assign to `total`, as it is a captured variable in a `Fn` closure

There would be a data race if multiple threads tried to update total at the same time. You could solve this by using an atomic type for total, or by using Rayon’s built-in parallel sum() or your own custom fold+reduce on the iterator.

But Rayon is a plain library, with no compiler integration whatsoever, so how can it know there was a data race? That magic is in the Rust language itself, just from Rayon expressing the right generic constraints.

Ownership and borrowing in Rust

Rust has strong semantics for the ownership of values of any type T. The compiler’s static borrow-checker keeps track of where a value has been borrowed as a reference, either as &mut T or &T. When a value is owned without any borrows, the owner can do whatever it wants with the value. Borrowing the value as &mut T is exclusive access, where only the borrower can do anything with the value—but even the borrower must leave the underlying T in a valid state. Borrowing the value as &T is immutable shared access, where both the borrower and the owner can only read the value. The borrow checker enforces all of this in statically determined regions of code (lifetimes), where only unborrowed owners of T or exclusive borrowers of &mut T are allowed to make any modifications. The only time &T can be modified is with types built on UnsafeCell, like atomics or Mutex, that add runtime synchronization.

I recommend reading Rust: A unique perspective for more on this topic.

Thread safety traits

There are two auto-traits controlling all of Rust’s thread safety: Send and Sync. Being automatic means that these traits are inferred based on their composition: a struct will implement Send or Sync only if all of its fields do. If some field does not, such as a raw pointer, the traits do not apply to the struct unless a trait is added with an unsafe impl declaration where the author asserts that thread safety will be maintained in some other way (at risk of undefined behavior).

Send means that T can move control to another thread. For owned values, this simply means the value can be moved entirely, and the original thread has no more access. But we can send references too. For a unique &mut T, the reference can be sent if T: Send is satisfied, passing the unique borrow to the other thread. For a shared &T, the reference can be sent only based on the other trait, T: Sync, which indicates that T can be safely shared with another thread.

The Rustonomicon has a detailed chapter on Send and Sync.

Function traits

There are three traits related to the ways that functions can be called: FnOnce, FnMut, and Fn. Plain functions automatically implement all three traits, but closures implement the traits depending on how the closures use their captured state. If a closure would move or consume any part of its state, it implements only FnOnce called with self by value, because it wouldn’t have state remaining to move or consume a second time. If a closure modifies its state, it implements FnMut called with &mut self, and it can also be called as FnOnce. If a closure just reads its state, or has no state at all like a plain function, it implements Fn called with &self, as well as FnMut and FnOnce.

The blog post Closures: Magic Functions goes into detail about this implementation.

Generic constraints in Rayon

With these powerful tools in the Rust language, Rayon only has to specify its constraints. The parallel iterators and their items have to implement Send, simply because they will be sent between threads. Iterator methods such as filter, map, and for_each have a few more constraints on their callback function/closure F:

  • It must implement Send so you can send it to the thread pool.
  • It must implement Sync so you can share &F references to that callback across multiple threads.
  • It must implement Fn so it can be called from any of those threads with a shared reference.

Thus Rayon requires F: Send + Sync + Fn.

Let's look again at the example that the compiler would reject:

let mut total = 0;
foo_vector.par_iter_mut()
    .filter(|foo| foo.is_interesting())
    .for_each(|foo| total += foo.heavy_computation());

We'll take for granted that the type Foo in this vector implements Send, so it's also perfectly fine to send &mut Foo references among threads as the parallel iterator items. The for_each closure would capture a mutable reference to the accumulated total variable. Assuming that the number has a simple type such as i32, it would be acceptable to Send the closure with &mut i32 to another thread. It would even be fine with Sync to share references to that closure between threads. However, the mutation would make it FnMut, requiring &mut self to actually call it. The error from the compiler should now make sense:

cannot assign to `total`, as it is a captured variable in a `Fn` closure

Because Rayon requires Fn here, that gets locked in by the compiler, and you're not allowed to do anything that would make it FnMut. If you change the total to a type such as AtomicI32, which does allow updates with a shared reference, the code will compile and work just fine:

let mut total = AtomicI32::new(0);
foo_vector.par_iter_mut()
    .filter(|foo| foo.is_interesting())
    .for_each(|foo| total.fetch_add(foo.heavy_computation(), Ordering::Relaxed));

This is the impact of Rust’s fearless concurrency (or parallelism)—not that you will never write bugs in threaded code, but that the compiler will catch the bugs before they can hurt you. Rayon can make it really easy to dip your toes into parallelism, feeling almost magical, but in truth Rayon doesn’t know anything about your code: it just specifies simple constraints and lets the Rust compiler do the hard work of proving it.

For more articles about Rust, please visit Red Hat's topic page.

Last updated: May 3, 2021

Comments