This article is the second installment of a series about how to take advantage of the recent Rust support added to Linux. The first article in the series, 3 essentials for writing a Linux system library in Rust, describes special considerations that system libraries require when you are writing in Rust. This article demonstrates how to create a C binding so that programmers in C or C++ can call your Rust library. Rust has not conquered the Linux world yet, so our system library needs to provide bindings to other languages.
Check out the other three articles in this series:
- Part 1: 3 essentials for writing a Linux system library in Rust
- Part 4: Build trust in continuous integration for your Rust library
The Rust team has created a great document, Rust API Guidelines, about how to create a robust Rust library and crate. This article focuses on Linux-specific topics.
You can download the demo code from its GitHub repository. The package contains:
- An echo server listening on the Unix socket /tmp/librabc
- A Rust crate that connects to the socket and sends a ping packet every 2 seconds
- A C/Python binding
- A command-line interface (CLI) for the client
Elements of a C binding
Rust can generate C dynamic libraries (.so files) as well as static libraries (.a files), which can be easily wrapped in Go bindings and Python bindings and used in code written in those languages.
You can refer to the full code of the C library written in Rust in the clib folder of the GitHub repository.
The following line in the repository's Cargo.toml file generates a C binding to your Rust library as a .so file:
crate-type = ["cdylib"]
Use staticlib in place of cdylib to generate a static library.
Two elements of the library code lib.rs should be noted here:
- The #[no_mangle] attribute before each function instructs the Rust compiler not to mangle the symbol name as it does for native Rust code. The symbols are left plain so that C code can link to this file and refer to the symbols.
- The extern "C" keywords on functions instruct Rust to use the platform's C ABI (calling convention) instead of the Rust ABI, so that C callers can invoke the functions.
An example of a function follows:
#[no_mangle]
pub extern "C" fn rabc_client_new(
    client: *mut *mut RabcClient,
    log: *mut *mut c_char,
    err_kind: *mut *mut c_char,
    err_msg: *mut *mut c_char,
) -> u32 {
    RABC_PASS
}
You might wonder about the use of *mut *mut. These are raw pointers pointing to a raw pointer in Rust, like using void ** in C as an output pointer.
The function has two types of output pointer:
- *mut *mut c_char is an output pointer to a string (char *)
- *mut *mut RabcClient is an output pointer to an opaque struct RabcClient
We will look at each of these types.
Output pointer for a string
In the C world, char ** in a function argument returns a string (char *) to the library consumer like this:
if (result != NULL) {
    *result = malloc(strlen("ping") + 1);
    if (*result != NULL) {
        snprintf(*result, strlen("ping") + 1, "%s", "ping");
    }
}
The following Rust code does the same using std::ffi::CString:
if !result.is_null() {
    unsafe {
        *result = std::ptr::null_mut();
    }
    if let Ok(s) = std::ffi::CString::new("ping") {
        unsafe {
            *result = s.into_raw();
        }
    }
}
The Rust documentation states: "Failure to call CString::from_raw will lead to a memory leak." To prevent the leak, the memory returned by CString::into_raw() must be released via std::ffi::CString::from_raw (the C free function must not be used):
#[no_mangle]
pub extern "C" fn free_foo(
    result: *mut libc::c_char,
) {
    if !result.is_null() {
        unsafe {
            // Retake ownership of the string and drop it to free the memory.
            drop(std::ffi::CString::from_raw(result));
        }
    }
}
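To see the whole life cycle of such a string in one place, here is a minimal sketch that uses only the standard library. The names demo_ping, demo_cstring_free, demo_round_trip, and the DEMO_* return codes are hypothetical and are not part of the demo repository; demo_round_trip simply plays the role of the C caller.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// Hypothetical return codes, in the style of RABC_PASS.
pub const DEMO_PASS: u32 = 0;
pub const DEMO_FAIL_NULL_POINTER: u32 = 1;

/// Writes a newly allocated "ping" string to the output pointer.
/// The caller owns the string and must release it with demo_cstring_free().
#[no_mangle]
pub extern "C" fn demo_ping(result: *mut *mut c_char) -> u32 {
    if result.is_null() {
        return DEMO_FAIL_NULL_POINTER;
    }
    // Leave the output in a defined state even if allocation fails below.
    unsafe {
        *result = std::ptr::null_mut();
    }
    if let Ok(s) = CString::new("ping") {
        // into_raw() hands ownership of the buffer over to the caller.
        unsafe {
            *result = s.into_raw();
        }
    }
    DEMO_PASS
}

/// Releases a string previously returned by demo_ping().
#[no_mangle]
pub extern "C" fn demo_cstring_free(s: *mut c_char) {
    if !s.is_null() {
        // from_raw() retakes ownership; dropping the CString frees the buffer.
        unsafe {
            drop(CString::from_raw(s));
        }
    }
}

/// Plays the role of the C caller: receive the string, copy it, free it.
pub fn demo_round_trip() -> String {
    let mut out: *mut c_char = std::ptr::null_mut();
    let rc = demo_ping(&mut out);
    assert_eq!(rc, DEMO_PASS);
    assert!(!out.is_null());
    // Copy the bytes out before giving the buffer back to Rust.
    let copied = unsafe { CStr::from_ptr(out) }.to_string_lossy().into_owned();
    demo_cstring_free(out);
    copied
}
```

The point of the pairing is that every pointer produced by into_raw() goes back through exactly one from_raw() call, so a C consumer needs a dedicated free function exported by the library rather than free().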
Output pointer for an opaque struct
In the C world, it is common to expose an opaque struct in a library to let the developer maintain a compatible ABI when adding properties to the struct.
In our example, we need to expose the Rust struct RabcClient. The C compiler does not know the size of the opaque struct, so we can expose it only as a pointer whose size is known to the compiler.
Let's see how this workaround is done in Rust:
#[no_mangle]
pub extern "C" fn rabc_client_new(
    client: *mut *mut RabcClient,
    log: *mut *mut c_char,
    err_kind: *mut *mut c_char,
    err_msg: *mut *mut c_char,
) -> u32 {
    // Many lines omitted
    if client.is_null() {
        return RABC_FAIL_NULL_POINTER;
    }
    unsafe {
        *client = std::ptr::null_mut();
    }
    unsafe {
        *client = Box::into_raw(Box::new(c));
        RABC_PASS
    }
}
The client: *mut *mut RabcClient clause is an output pointer to RabcClient.
Because we should never dereference a null pointer, we put in the check client.is_null().
The line *client = std::ptr::null_mut(); makes sure we always set the output pointer to NULL when an error happens.
The Box::into_raw() function gets the pointer to RabcClient and removes its memory chunk from the Rust memory management system, trusting the library's user to manage the memory.
After finishing all work with RabcClient, the user has to call rabc_client_free(), which frees the memory leaked by Box::into_raw(). The Rust library claims ownership of this memory chunk and drops it, thus freeing the memory:
#[no_mangle]
pub extern "C" fn rabc_client_free(client: *mut RabcClient) {
    if !client.is_null() {
        unsafe {
            drop(Box::from_raw(client));
        }
    }
}
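The omitted error path can be sketched as follows, using only the standard library. DemoClient, the demo_* functions, and the DEMO_* codes are hypothetical stand-ins for RabcClient and its API, and the rule that an id of 0 is invalid is an assumption made purely to have a failure to report.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// Hypothetical return codes, in the style of RABC_PASS.
pub const DEMO_PASS: u32 = 0;
pub const DEMO_FAIL: u32 = 1;
pub const DEMO_FAIL_NULL_POINTER: u32 = 2;

/// Hypothetical stand-in for RabcClient; C code only ever holds a pointer to it.
pub struct DemoClient {
    id: u32,
}

impl DemoClient {
    /// Plain Rust constructor that can fail (assumption: id 0 is invalid).
    fn new(id: u32) -> Result<Self, String> {
        if id == 0 {
            Err("client id must not be 0".to_string())
        } else {
            Ok(Self { id })
        }
    }
}

#[no_mangle]
pub extern "C" fn demo_client_new(
    id: u32,
    client: *mut *mut DemoClient,
    err_msg: *mut *mut c_char,
) -> u32 {
    if client.is_null() || err_msg.is_null() {
        return DEMO_FAIL_NULL_POINTER;
    }
    // Reset both outputs so the caller never reads a stale pointer.
    unsafe {
        *client = std::ptr::null_mut();
        *err_msg = std::ptr::null_mut();
    }
    match DemoClient::new(id) {
        Ok(c) => {
            // Move the client to the heap and hand the pointer to the caller.
            unsafe {
                *client = Box::into_raw(Box::new(c));
            }
            DEMO_PASS
        }
        Err(e) => {
            // Report the failure through the string output pointer.
            if let Ok(s) = CString::new(e) {
                unsafe {
                    *err_msg = s.into_raw();
                }
            }
            DEMO_FAIL
        }
    }
}

/// Accessor: the only way C code can read a field of the opaque struct.
#[no_mangle]
pub extern "C" fn demo_client_id(client: *const DemoClient) -> u32 {
    // as_ref() turns a null pointer into None instead of dereferencing it.
    match unsafe { client.as_ref() } {
        Some(c) => c.id,
        None => 0,
    }
}

#[no_mangle]
pub extern "C" fn demo_client_free(client: *mut DemoClient) {
    if !client.is_null() {
        unsafe {
            drop(Box::from_raw(client));
        }
    }
}

#[no_mangle]
pub extern "C" fn demo_err_msg_free(s: *mut c_char) {
    if !s.is_null() {
        unsafe {
            drop(CString::from_raw(s));
        }
    }
}
```

With this shape, a caller can rely on one invariant: after any call with valid output pointers, each output is either NULL or something that must be passed to the matching free function.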
Logging in the C binding
C lacks a standard logging system, so the Rust library in this example returns its JSON-formatted logs as char * strings through an output pointer of a function. A Python or Go binding reads this string and converts it to that language's own logging system.
The full logging code is in the repository's logger.rs file. The code is based on the work of Gabriel Bastos—many thanks to Gabriel.
Given the Rust log crate infrastructure, you just need to implement the log::Log trait by storing logs in a Vec<LogEntry> in memory, then retrieving them with MemoryLogger::drain(). The most difficult parts are:
- The struct MemoryLogger instance should have a static lifetime so that it can be invoked by log::debug() and similar functions in any thread context.
- Draining the logs must be done in a thread-safe manner.
We will look at each of these issues.
Static lifetime for the MemoryLogger instance
To give struct MemoryLogger a static lifetime, use OnceCell::set():
static INSTANCE: OnceCell<MemoryLogger> = OnceCell::new();
// Many lines omitted
fn init_logger() -> Result<&'static MemoryLogger, RabcError> {
    match INSTANCE.get() {
        Some(l) => {
            Ok(l)
        }
        None => {
            if INSTANCE.set(MemoryLogger::new()).is_err() {
                return Err(foo);
            }
            if let Some(l) = INSTANCE.get() {
                if let Err(e) = log::set_logger(l) {
                    Err(foo)
                } else {
                    Ok(l)
                }
            } else {
                Err(foo)
            }
        }
    }
}
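If you prefer to avoid an extra dependency, the same idea can be sketched with std::sync::OnceLock from the standard library (stable since Rust 1.70). This is a swapped-in technique, not what the demo project does: DemoLogger and demo_logger are hypothetical names, the logger stores plain strings, and the registration with log::set_logger() is left out.

```rust
use std::sync::{Mutex, OnceLock};

/// Hypothetical, trimmed-down logger that only stores plain strings.
pub struct DemoLogger {
    logs: Mutex<Vec<String>>,
}

impl DemoLogger {
    fn new() -> Self {
        Self {
            logs: Mutex::new(Vec::new()),
        }
    }

    /// Appends a message; safe to call from any thread.
    pub fn push(&self, msg: &str) {
        self.logs
            .lock()
            .expect("log lock poisoned")
            .push(msg.to_string());
    }

    /// Number of messages currently stored.
    pub fn len(&self) -> usize {
        self.logs.lock().expect("log lock poisoned").len()
    }
}

// The one process-wide instance; it is empty until first use.
static INSTANCE: OnceLock<DemoLogger> = OnceLock::new();

/// Returns the shared logger, creating it on the first call.
/// get_or_init() runs the constructor at most once, even when
/// several threads race to initialize it.
pub fn demo_logger() -> &'static DemoLogger {
    INSTANCE.get_or_init(DemoLogger::new)
}
```

Because the value lives in a static, the reference returned by demo_logger() has the 'static lifetime, which is what a global logger needs.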
Thread-safe draining of logs
The C library should be thread-safe, so all the data in MemoryLogger should be an atomic type or be protected by a lock. In this example, the log entries are protected by a std::sync::Mutex and the consumer counter is an atomic:
pub(crate) struct MemoryLogger {
    consumer_count: AtomicU16,
    logs: Mutex<Vec<LogEntry>>,
}
The consumer_count variable tracks the number of threads consuming logs. Every MemoryLogger::drain() call returns a copy of the logs that were recorded since the specified time. When the last C function using logs invokes drain(), the library also drops the logs from Vec<LogEntry>.
pub(crate) fn drain(&self, since: SystemTime) -> String {
    let mut logs = self.logs.lock().expect("inner lock poisoned");
    let ret = serde_json::to_string(
        &logs
            .as_slice()
            .iter()
            .filter(|l| l.time >= since)
            .collect::<Vec<&LogEntry>>(),
    )
    .unwrap_or_default();
    if self.consumer_count.fetch_sub(1, Ordering::SeqCst) == 1 {
        logs.clear();
    }
    ret
}
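The counting behavior is easier to follow in a reduced model that uses only the standard library. In this sketch, DemoLogger, DemoEntry, add_consumer(), log(), and stored() are hypothetical names, and drain() returns the messages as a Vec<String> instead of a JSON string so that no serde dependency is needed.

```rust
use std::sync::atomic::{AtomicU16, Ordering};
use std::sync::Mutex;
use std::time::SystemTime;

/// Hypothetical log record: a timestamp plus the message text.
pub struct DemoEntry {
    pub time: SystemTime,
    pub msg: String,
}

pub struct DemoLogger {
    consumer_count: AtomicU16,
    logs: Mutex<Vec<DemoEntry>>,
}

impl DemoLogger {
    pub fn new() -> Self {
        Self {
            consumer_count: AtomicU16::new(0),
            logs: Mutex::new(Vec::new()),
        }
    }

    /// Registers one consumer; call it once for every later drain().
    pub fn add_consumer(&self) {
        self.consumer_count.fetch_add(1, Ordering::SeqCst);
    }

    /// Stores a message with the current time.
    pub fn log(&self, msg: &str) {
        let mut logs = self.logs.lock().expect("log lock poisoned");
        logs.push(DemoEntry {
            time: SystemTime::now(),
            msg: msg.to_string(),
        });
    }

    /// Returns a copy of the messages logged at or after `since`.
    /// The stored entries are cleared only when the last consumer drains.
    pub fn drain(&self, since: SystemTime) -> Vec<String> {
        let mut logs = self.logs.lock().expect("log lock poisoned");
        let ret = logs
            .iter()
            .filter(|l| l.time >= since)
            .map(|l| l.msg.clone())
            .collect();
        // fetch_sub() returns the previous value, so 1 means
        // this call came from the last registered consumer.
        if self.consumer_count.fetch_sub(1, Ordering::SeqCst) == 1 {
            logs.clear();
        }
        ret
    }

    /// Number of entries still held in memory.
    pub fn stored(&self) -> usize {
        self.logs.lock().expect("log lock poisoned").len()
    }
}
```

With two registered consumers, the first drain() leaves the entries in place and the second one clears them. Note that an unmatched drain() would make the atomic counter wrap around, so each drain() must be paired with an earlier add_consumer().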
If you try this C library in a multithreaded program, you will find that the logs retrieved are mixed with output from other threads. I don't have a good solution for a thread-local logger yet, but I found suggestions for one at rust-lang/log. A pull request to this demo GitHub project will be much appreciated.
Fixing the SONAME
Rust does not support the SONAME naming convention yet as I draft this article. As a workaround, you can define the following in .cargo/config.toml:
[build]
rustflags = "-Clink-arg=-Wl,-soname=libfoo.so.0"
You can also use patchelf to modify the SONAME after the cargo build, passing the path of the built library as the last argument:
patchelf --set-soname libfoo.so.0 <path to the built .so file>
Memory leak test for the C binding
The C binding uses a lot of unsafe keywords to work with Rust raw pointers. So I recommend running a memory leak test for the C binding. The example project offers a make clib_check command that uses Valgrind against a C program linked to our C binding .so file. The code is in a file named rabc_test.c and the project's Makefile.
Packaging for the C binding
To ship a C binding, you need:
- The following .so files:
  - librabc.so.0.1
  - librabc.so.0, linked to librabc.so.0.1
  - librabc.so, linked to librabc.so.0
- The C header file: rabc.h
- A pkg-config file: rabc.pc.in
You can write a Makefile containing a make install command to install all those files.
For Fedora RPM packaging, the .cargo/config.toml fix shown earlier for SONAME will not work, because Fedora uses .cargo/config to change the dependency source searching folder. Therefore, we will use the patchelf method for Fedora:
%prep
%setup -q
rm .cargo/config.toml
%cargo_prep
%install
env SKIP_PYTHON_INSTALL=1 \
PREFIX=%{_prefix} \
LIBDIR=%{_libdir} \
%make_install
patchelf --set-soname librabc.so.2 \
%{buildroot}/%{_libdir}/librabc.so.%{version}
Red Hat Enterprise Linux has no patchelf in the build root, so you need to merge the flags defined in .cargo/config.toml into .cargo/config. The following snippet shows how I set the flags:
%prep
# Source1 is vendored dependencies
%cargo_prep -V 1
_FLAGS=`sed -ne 's/rustflags = "\(.\+\)"/\1/p' .cargo/config.toml`
sed -i -e "s/rustflags = \[\(.\+\), \]$/rustflags = [\1, \"$_FLAGS\"]/" \
.cargo/config
What's next?
Requirements might change from time to time, so please refer to the latest Fedora and Red Hat Enterprise Linux packaging guides.
The C binding can also be used in other languages such as Python and Go. The next article in this series demonstrates how to create a Python binding on top of the C binding from this article.
Last updated: August 14, 2023