How to create C binding for a Rust library

This article is the second installment of a series about how to take advantage of the recent Rust support added to Linux. The first article in the series, 3 essentials for writing a Linux system library in Rust, describes special considerations that system libraries require when you are writing in Rust. This article demonstrates how to create a C binding so that programmers in C or C++ can call your Rust library. Rust has not conquered the Linux world yet, so our system library needs to provide bindings to other languages.


The Rust team has created a great document, Rust API Guidelines, about how to create a robust Rust library and crate. This article focuses on Linux-specific topics.

You can download the demo code from its GitHub repository. The package contains:

  • An echo server listening on the Unix socket /tmp/librabc
  • A Rust crate that connects to the socket and sends a ping packet every 2 seconds
  • A C/Python binding
  • A command-line interface (CLI) for the client

Elements of a C binding

Rust can generate C dynamic libraries (.so files) as well as static libraries (.a files), which can be easily wrapped in Go bindings and Python bindings and used in code written in those languages.

You can refer to the full code of the C library written in Rust in the clib folder of the GitHub repository.

The following line in the repository's Cargo.toml file generates a C binding to your Rust library as a .so file:

[lib]
crate-type = ["cdylib"]

Use staticlib in place of cdylib to generate a static library.

Two elements of the library code lib.rs should be noted here:

  • The #[no_mangle] attribute before each function instructs the Rust compiler not to mangle the symbol name as it does for native Rust code. The symbols are left plain so that C code can link to this library and refer to them by name.
  • The extern "C" keywords on functions instruct Rust to use the platform's C ABI (calling convention) instead of the Rust ABI, so that code built by a C compiler can call these functions.

An example of a function follows:

#[no_mangle]
pub extern "C" fn rabc_client_new(
    client: *mut *mut RabcClient,
    log: *mut *mut c_char,
    err_kind: *mut *mut c_char,
    err_msg: *mut *mut c_char,
) -> u32 {
    RABC_PASS
}

You might wonder about the use of *mut *mut. Each one is a raw pointer that points to another raw pointer, used as an output pointer in the same way as void ** in C.

The function has two types of output pointer:

  • *mut *mut c_char is an output pointer to a C string (char *)
  • *mut *mut RabcClient is an output pointer to an opaque struct RabcClient

We will look at each of these types.

Output pointer for a string

In the C world, char ** in a function argument returns a string (char *) to the library consumer like this:

if (result != NULL) {
    *result = malloc(strlen("ping") + 1);
    if (*result != NULL) {
        snprintf(*result, strlen("ping") + 1, "%s", "ping");
    }
}

The following Rust code does the same using std::ffi::CString:

if !result.is_null() {
    unsafe {
        *result = std::ptr::null_mut();
    }
    if let Ok(s) = std::ffi::CString::new("ping") {
        unsafe {
            *result = s.into_raw();
        }
    }
}

The Rust documentation states: "Failure to call CString::from_raw will lead to a memory leak." To prevent the leak, the memory handed out by CString::into_raw() must be returned to Rust through std::ffi::CString::from_raw(); the C free() function must not be used on it:

#[no_mangle]
pub extern "C" fn free_foo(
    result: *mut libc::c_char,
) {
    if !result.is_null() {
        unsafe {
            // Retake ownership of the string and drop it to free the memory.
            drop(std::ffi::CString::from_raw(result));
        }
    }
}
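
As a hedged, self-contained sketch of the allocate-and-free pairing described above, the following program hands a string out through an output pointer and then returns it to Rust. The names demo_get_ping and demo_free_string and the numeric return codes are hypothetical and are not part of the demo repository. The #[no_mangle] attribute is left off because nothing links against this standalone program.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical function: writes a newly allocated "ping" string to `result`.
/// Returns 0 on success and 1 on failure (illustrative codes only).
pub extern "C" fn demo_get_ping(result: *mut *mut c_char) -> u32 {
    if result.is_null() {
        return 1;
    }
    // Always leave the output pointer in a defined state.
    unsafe {
        *result = std::ptr::null_mut();
    }
    if let Ok(s) = CString::new("ping") {
        // into_raw() transfers ownership of the buffer to the caller.
        unsafe {
            *result = s.into_raw();
        }
        return 0;
    }
    1
}

/// Hypothetical function: returns a string from demo_get_ping() to Rust.
pub extern "C" fn demo_free_string(s: *mut c_char) {
    if !s.is_null() {
        // from_raw() retakes ownership; dropping the CString frees the buffer.
        unsafe {
            drop(CString::from_raw(s));
        }
    }
}

/// Plays the role of the C caller: get the string, copy it, free it.
fn ping_round_trip() -> String {
    let mut out: *mut c_char = std::ptr::null_mut();
    let rc = demo_get_ping(&mut out);
    assert_eq!(rc, 0);
    let text = unsafe { CStr::from_ptr(out) }.to_string_lossy().into_owned();
    demo_free_string(out);
    text
}

fn main() {
    println!("{}", ping_round_trip());
    // A null output pointer is rejected instead of being dereferenced.
    assert_eq!(demo_get_ping(std::ptr::null_mut()), 1);
}
```

The caller copies the text before freeing it, which is also what a Python or Go binding would typically do before handing the pointer back to the library.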

Output pointer for an opaque struct

In the C world, it is common to expose an opaque struct in a library to let the developer maintain a compatible ABI when adding properties to the struct.

In our example, we need to expose the Rust struct RabcClient. The C compiler does not know the size of the opaque struct, so we can expose it only as a pointer whose size is known to the compiler.

Let's see how this workaround is done in Rust:

#[no_mangle]
pub extern "C" fn rabc_client_new(
    client: *mut *mut RabcClient,
    log: *mut *mut c_char,
    err_kind: *mut *mut c_char,
    err_msg: *mut *mut c_char,
) -> u32 {

    // Many lines omitted
    if client.is_null() {
        return RABC_FAIL_NULL_POINTER;
    }

    unsafe {
        *client = std::ptr::null_mut();
    }

    unsafe {
        *client = Box::into_raw(Box::new(c));
        RABC_PASS
    }
}

The client: *mut *mut RabcClient clause is an output pointer to RabcClient.

Because we should never dereference a null pointer, we put in the check client.is_null().

The line *client = std::ptr::null_mut(); makes sure we always set the output pointer to NULL when an error happens.

The Box::into_raw() function gets the pointer to RabcClient and removes its memory chunk from the Rust memory management system, trusting the library's user to manage the memory.

After finishing all work with RabcClient, the user has to call rabc_client_free(), which releases the memory handed out by Box::into_raw(). The Rust library reclaims ownership of this memory chunk with Box::from_raw() and drops it, thus freeing the memory:

#[no_mangle]
pub extern "C" fn rabc_client_free(client: *mut RabcClient) {
    if !client.is_null() {
        unsafe {
            drop(Box::from_raw(client));
        }
    }
}
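
The whole life cycle of an opaque handle can be sketched in one small program. This is only an illustration under stated assumptions: DemoClient, its pings_sent field, the demo_client_* functions, and the DEMO_* constants are hypothetical stand-ins for the RabcClient code in the repository, and #[no_mangle] is omitted because the program is standalone.

```rust
/// Hypothetical stand-in for the article's RabcClient. C code never sees
/// the fields; it only holds a pointer to the struct.
pub struct DemoClient {
    pings_sent: u32,
}

const DEMO_PASS: u32 = 0;
const DEMO_FAIL_NULL_POINTER: u32 = 1;

/// Creates a client and hands ownership to the caller via `client`.
pub extern "C" fn demo_client_new(client: *mut *mut DemoClient) -> u32 {
    if client.is_null() {
        return DEMO_FAIL_NULL_POINTER;
    }
    unsafe {
        // Box::into_raw() stops Rust from dropping the value automatically.
        *client = Box::into_raw(Box::new(DemoClient { pings_sent: 0 }));
    }
    DEMO_PASS
}

/// Uses the handle without taking ownership. Returns the number of pings
/// sent so far, or 0 when the handle is null.
pub extern "C" fn demo_client_ping(client: *mut DemoClient) -> u32 {
    // as_mut() yields None for a null pointer.
    match unsafe { client.as_mut() } {
        Some(c) => {
            c.pings_sent += 1;
            c.pings_sent
        }
        None => 0,
    }
}

/// Retakes ownership of the handle and drops it.
pub extern "C" fn demo_client_free(client: *mut DemoClient) {
    if !client.is_null() {
        unsafe {
            drop(Box::from_raw(client));
        }
    }
}

/// Plays the role of the C caller: create, use twice, free.
fn client_round_trip() -> (u32, u32) {
    let mut client: *mut DemoClient = std::ptr::null_mut();
    let rc = demo_client_new(&mut client);
    demo_client_ping(client);
    let pings = demo_client_ping(client);
    demo_client_free(client);
    (rc, pings)
}

fn main() {
    let (rc, pings) = client_round_trip();
    println!("rc = {}, pings = {}", rc, pings);
}
```

After demo_client_free() returns, the pointer is dangling, so the caller must not use it again; the C header of a real library should document that rule.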

Logging in the C binding

C lacks a standard logging system, so the Rust library in this example returns its JSON-formatted logs as a char * string through an output pointer of each function. A Python or Go binding reads this string and forwards the entries to its own logging system.

The full logging code is in the repository's logger.rs file. The code is based on the work of Gabriel Bastos; many thanks to Gabriel.

Given the Rust log crate infrastructure, you just need to implement the log::Log trait so that it stores logs in an in-memory Vec<LogEntry>, then retrieve them with MemoryLogger::drain(). The most difficult parts are:

  • The MemoryLogger instance must have a static lifetime so that it can be invoked by log::debug!() and similar macros in any thread context.
  • Draining the logs must be done in a thread-safe manner.

We will look at each of these issues.

Static lifetime for the MemoryLogger instance

To give struct MemoryLogger a static lifetime, use OnceCell.set():

static INSTANCE: OnceCell<MemoryLogger> = OnceCell::new();

// Many lines omitted
fn init_logger() -> Result<&'static MemoryLogger, RabcError> {
    match INSTANCE.get() {
        Some(l) => {
            Ok(l)
        }
        None => {
            if INSTANCE.set(MemoryLogger::new()).is_err() {
                return Err(foo);
            }
            if let Some(l) = INSTANCE.get() {
                if let Err(e) = log::set_logger(l) {
                    Err(foo)
                } else {
                    Ok(l)
                }
            } else {
                Err(foo)
            }
        }
    }
}
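
If you prefer to stay within the standard library, std::sync::OnceLock (stable since Rust 1.70) offers similar one-time initialization semantics. The sketch below is a swapped-in alternative, not the code used in the repository: DemoLogger is a hypothetical, simplified stand-in for MemoryLogger, and the call to log::set_logger() is left out so that the example has no external dependencies.

```rust
use std::sync::{Mutex, OnceLock};
use std::thread;

/// Hypothetical, simplified stand-in for the article's MemoryLogger.
struct DemoLogger {
    logs: Mutex<Vec<String>>,
}

// A static gives the logger a 'static lifetime; OnceLock makes the
// one-time initialization safe even when several threads race for it.
static INSTANCE: OnceLock<DemoLogger> = OnceLock::new();

/// Returns the process-wide logger, creating it on first use.
fn demo_logger() -> &'static DemoLogger {
    INSTANCE.get_or_init(|| DemoLogger {
        logs: Mutex::new(Vec::new()),
    })
}

/// Logs one line from each of `n` threads and returns how many entries
/// the shared logger holds afterward.
fn log_from_threads(n: usize) -> usize {
    let handles: Vec<_> = (0..n)
        .map(|i| {
            thread::spawn(move || {
                demo_logger()
                    .logs
                    .lock()
                    .expect("lock poisoned")
                    .push(format!("hello from thread {}", i));
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("thread panicked");
    }
    demo_logger().logs.lock().expect("lock poisoned").len()
}

fn main() {
    println!("entries: {}", log_from_threads(4));
    // Every call returns the same instance.
    assert!(std::ptr::eq(demo_logger(), demo_logger()));
}
```

Because get_or_init() runs the closure at most once, the separate get() and set() branches of the OnceCell version collapse into a single call.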

Thread-safe draining of logs

The C library should be thread-safe, so all the data in MemoryLogger should be an atomic type or be protected by a lock. In this example, we protect the log entries with a std::sync::Mutex:

pub(crate) struct MemoryLogger {
    consumer_count: AtomicU16,
    logs: Mutex<Vec<LogEntry>>,
}

The consumer_count variable tracks how many log consumers are active. Every call to MemoryLogger::drain() returns a JSON copy of the entries logged since the specified time. When the last consumer calls drain(), the library also clears the Vec<LogEntry>; otherwise, the entries are kept for the remaining consumers.

pub(crate) fn drain(&self, since: SystemTime) -> String {
    let mut logs = self.logs.lock().expect("inner lock poisoned");
    let ret = serde_json::to_string(
        &logs
            .as_slice()
            .iter()
            .filter(|l| l.time >= since)
            .collect::<Vec<&LogEntry>>(),
    )
    .unwrap_or_default();
    if self.consumer_count.fetch_sub(1, Ordering::SeqCst) == 1 {
        logs.clear();
    }
    ret
}
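
To make the counting behavior concrete, here is a minimal sketch that uses only the standard library. It is an assumption-laden simplification: DemoLogger, DemoEntry, add_consumer(), and log() are hypothetical names, the result is a Vec<String> instead of JSON so that serde is not needed, and the article does not show where the real code increments consumer_count.

```rust
use std::sync::atomic::{AtomicU16, Ordering};
use std::sync::Mutex;
use std::time::SystemTime;

/// Hypothetical, simplified stand-in for the article's LogEntry.
struct DemoEntry {
    time: SystemTime,
    message: String,
}

/// Hypothetical, simplified stand-in for the article's MemoryLogger.
struct DemoLogger {
    consumer_count: AtomicU16,
    logs: Mutex<Vec<DemoEntry>>,
}

impl DemoLogger {
    fn new() -> Self {
        DemoLogger {
            consumer_count: AtomicU16::new(0),
            logs: Mutex::new(Vec::new()),
        }
    }

    /// Assumed helper: registers one consumer that will call drain() later.
    fn add_consumer(&self) {
        self.consumer_count.fetch_add(1, Ordering::SeqCst);
    }

    fn log(&self, message: &str) {
        self.logs.lock().expect("inner lock poisoned").push(DemoEntry {
            time: SystemTime::now(),
            message: message.to_string(),
        });
    }

    /// Copies out the entries logged at or after `since`. The buffer is
    /// cleared only when the caller was the last registered consumer.
    fn drain(&self, since: SystemTime) -> Vec<String> {
        let mut logs = self.logs.lock().expect("inner lock poisoned");
        let ret: Vec<String> = logs
            .iter()
            .filter(|l| l.time >= since)
            .map(|l| l.message.clone())
            .collect();
        // fetch_sub() returns the previous value, so 1 means "last consumer".
        if self.consumer_count.fetch_sub(1, Ordering::SeqCst) == 1 {
            logs.clear();
        }
        ret
    }

    fn len(&self) -> usize {
        self.logs.lock().expect("inner lock poisoned").len()
    }
}

/// Two consumers drain the same two entries; only the second drain clears.
fn demo_two_consumers() -> (usize, usize, usize, usize) {
    let logger = DemoLogger::new();
    logger.add_consumer();
    logger.add_consumer();
    logger.log("connected");
    logger.log("ping sent");
    let first = logger.drain(SystemTime::UNIX_EPOCH).len();
    let kept = logger.len();
    let second = logger.drain(SystemTime::UNIX_EPOCH).len();
    (first, kept, second, logger.len())
}

fn main() {
    println!("{:?}", demo_two_consumers());
}
```

Note that calling drain() without a matching add_consumer() would wrap the counter around, so a real implementation needs to keep the two calls strictly paired.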

If you try this C library in a multithreaded program, you will find that the logs retrieved are mixed with output from other threads. I don't have a good solution for a thread-local logger yet, but I found suggestions for one at rust-lang/log. A pull request to this demo GitHub project will be much appreciated.

Fixing the SONAME

Rust does not support the SONAME naming convention yet as I draft this article. As a workaround, you can define the following in .cargo/config.toml:

[build]
rustflags = "-Clink-arg=-Wl,-soname=libfoo.so.0"

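You can also use patchelf to modify the SONAME after the cargo build. patchelf needs the path of the library to modify; the path below assumes a default release build of a crate named foo:

patchelf --set-soname libfoo.so.0 target/release/libfoo.so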

Memory leak test for the C binding

The C binding uses a lot of unsafe keywords to work with Rust raw pointers. So I recommend running a memory leak test for the C binding. The example project offers a make clib_check command that uses Valgrind against a C program linked to our C binding .so file. The code is in a file named rabc_test.c and the project's Makefile.

Packaging for the C binding

To ship a C binding, you need:

  • The following .so files:
    • librabc.so.0.1
    • librabc.so.0, linked to librabc.so.0.1
    • librabc.so, linked to librabc.so.0
  • The C header file: rabc.h
  • A pkg-config file: rabc.pc.in

You can write a Makefile containing a make install command to install all those files.

For Fedora RPM packaging, the .cargo/config.toml fix shown earlier for SONAME will not work, because Fedora uses .cargo/config to change the folder that Cargo searches for dependency sources. Therefore, we will use the patchelf method for Fedora:

%prep
%setup -q
rm .cargo/config.toml
%cargo_prep

%install
env SKIP_PYTHON_INSTALL=1 \
    PREFIX=%{_prefix} \
    LIBDIR=%{_libdir} \
    %make_install
patchelf --set-soname librabc.so.0 \
    %{buildroot}/%{_libdir}/librabc.so.%{version}

Red Hat Enterprise Linux has no patchelf in the build root, so you need to merge the SONAME flags into the flags already defined in .cargo/config. The following snippet shows how I set the flags:

%prep
# Source1 is vendored dependencies
%cargo_prep -V 1

_FLAGS=`sed -ne 's/rustflags = "\(.\+\)"/\1/p' .cargo/config.toml`
sed -i -e "s/rustflags = \[\(.\+\), \]$/rustflags = [\1, \"$_FLAGS\"]/" \
    .cargo/config

What's next?

Requirements might change from time to time, so please refer to the latest Fedora and Red Hat Enterprise Linux packaging guides.

The C binding can also be used in other languages such as Python and Go. The next article in this series demonstrates how to create a Python binding on top of the C binding from this article.
