...And In the Darkness Bind Them: Fluent Rust Bindings for C APIs

Josh Matthews, @lastontheboat

Goals

Rust Exercise Prerequisites

If you're uncertain about writing Rust code, consider pairing with someone more experienced!

The basics

lib.h
                  void init();

lib.rs
                  extern "C" {
                      pub fn init();
                  }
                

Convention

Unsafe, low-level bindings:
                  mod ffi {
                      extern "C" {
                          pub fn init();
                      }
                  }

Safe, high-level bindings:
                  pub fn init() {
                      unsafe {
                          ffi::init();
                      }
                  }
                

The goal: encapsulate unsafety behind a safe API boundary.
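A minimal sketch of this convention, assuming the C standard library's abs as the "C API" (chosen only because Rust's standard library already links it on mainstream platforms; std::os::raw stands in for the libc crate so there are no dependencies). It also shows that the safe layer is more than a pass-through: it must rule out inputs that are undefined behaviour in C.

```rust
use std::os::raw::c_int;

// Unsafe, low-level declarations live in their own module.
mod ffi {
    use std::os::raw::c_int;
    extern "C" {
        // int abs(int j); from the C standard library that std already links.
        pub fn abs(j: c_int) -> c_int;
    }
}

/// Safe, high-level binding. C's abs(INT_MIN) is undefined behaviour, so
/// the wrapper rejects that input instead of forwarding it to C.
pub fn checked_abs(x: i32) -> Option<i32> {
    if x == i32::MIN {
        return None;
    }
    // c_int is i32 on the targets assumed here, so these casts are lossless.
    Some(unsafe { ffi::abs(x as c_int) } as i32)
}
```

Callers never write unsafe: checked_abs(-5) yields Some(5), and checked_abs(i32::MIN) yields None rather than reaching C.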

Primitive C types

Use the libc crate for maximum portability:

use libc::{c_void, c_char, c_short, c_uint, c_float};

Most primitive C types only require casting the equivalent Rust value.

More C types

lib.h
                  void* accept_double_return_void_ptr(double d);
                
lib.rs
                  extern "C" {
                      fn accept_double_return_void_ptr(d: c_double)
                          -> *mut c_void;
                  }
                

C Strings

Strings in C are null-terminated:

const char* s = "Rust Belt Rust";
0x8000 = Rust
0x8004 =  Bel
0x8008 = t Ru
0x800C = st\0

Rust Strings

Strings in Rust are more complicated (addresses below assume 32-bit pointers):

let s: &str = "Rust Belt Rust";
0x8000 = 0x9000
0x8004 = 14
0x9000 = Rust
0x9004 =  Bel
0x9008 = t Ru
0x900C = st..
                  struct StringSlice {
                      data: *const u8,
                      len: usize,
                  }
                

No null terminator in memory.

C Strings in Rust

            // Panics if the string contains an interior null byte
            let s = CString::new("Rust Belt Rust").unwrap();
          

Allocates enough memory for string contents + null terminator.

Use s.as_ptr() to pass a pointer to C functions.
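A small sketch of the full round trip, assuming the C standard library's strlen as the C function (it is linked by Rust's std on mainstream platforms). The CString is kept in a named binding so it outlives the call, and the interior-NUL error is surfaced as None instead of a panic.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

mod ffi {
    use std::os::raw::c_char;
    extern "C" {
        // size_t strlen(const char *s); size_t maps to usize.
        pub fn strlen(s: *const c_char) -> usize;
    }
}

/// Length in bytes as C sees it, or None if `s` contains an interior NUL
/// (CString::new rejects such strings).
pub fn c_strlen(s: &str) -> Option<usize> {
    // Named binding: the allocation lives until the end of this function,
    // so the pointer stays valid for the whole FFI call.
    let owned = CString::new(s).ok()?;
    let ptr: *const c_char = owned.as_ptr();
    Some(unsafe { ffi::strlen(ptr) })
}
```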

C Strings in Rust

            let s = CString::new("Rust Belt Rust").unwrap();
            let s_ptr = s.as_ptr();
            unsafe {
                ffi::puts(s_ptr);
            }
          

C Strings in Rust

            let s = CString::new("Rust Belt Rust").unwrap();
            let s_ptr = s.as_ptr();
            drop(s);
            unsafe {
                ffi::puts(s_ptr);
            }
          

Warning: pointer is invalid after destruction.

C Strings in Rust

This is safe:

            unsafe {
                ffi::puts(CString::new("Rust!").unwrap().as_ptr());
            }
          

C Strings in Rust

But this is not:

            unsafe {
                let s = CString::new("Rust!").unwrap().as_ptr();
                ffi::puts(s);
            }
          

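Temporaries live until the end of the enclosing statement. In the first example that statement includes the call to puts, so the CString outlives the call; in the second, the CString is destroyed as soon as the let statement ends, leaving s dangling.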

String literals

Sometimes you need static strings, or worry-free literals.

            unsafe {
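                ffi::puts(b"Rust Belt Rust\0".as_ptr() as *const c_char);
            }
          

The type of a byte string literal is &'static [u8; N], so convert it with .as_ptr() and cast the result to *const c_char.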

Warning: it's easy to forget the null-terminator.
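One way to catch a forgotten terminator before it reaches C is CStr::from_bytes_with_nul, which rejects byte strings that lack a trailing NUL or contain an interior one. A sketch, again assuming strlen from the C standard library as the consumer:

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

mod ffi {
    use std::os::raw::c_char;
    extern "C" {
        pub fn strlen(s: *const c_char) -> usize;
    }
}

/// Validates a byte-string literal before handing it to C: it must end in
/// exactly one NUL and contain no interior NULs.
pub fn literal_len(bytes: &'static [u8]) -> Option<usize> {
    let checked: &CStr = CStr::from_bytes_with_nul(bytes).ok()?;
    let ptr: *const c_char = checked.as_ptr();
    Some(unsafe { ffi::strlen(ptr) })
}
```

On Rust 1.77 and later, c"Rust Belt Rust" literals produce a &'static CStr directly, which moves the same check to compile time.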

C Strings in Rust (redux)

CString is for Rust->C. For C->Rust, we need CStr.

            unsafe {
                let s_ptr: *const c_char = ffi::getenv(b"PATH\0".as_ptr() as *const c_char);
                assert!(!s_ptr.is_null()); // getenv returns null if PATH is unset
                let s: &CStr = CStr::from_ptr(s_ptr);
            }
          

Warning: the &CStr returned by from_ptr has an unbounded lifetime; nothing ties it to how long C keeps the memory valid.

Either copy the value or use it with caution.

C Strings in Rust (redux)

Copying the value from a CStr into a Rust string is straightforward:

            unsafe {
                let s_ptr: *const c_char = ffi::getenv(b"PATH\0".as_ptr() as *const c_char);
                assert!(!s_ptr.is_null()); // getenv returns null if PATH is unset
                let s: &CStr = CStr::from_ptr(s_ptr);
                let s: &str = s.to_str().unwrap();
                let s: String = s.to_owned();
            }
          

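CStr::to_string_lossy() avoids the intermediate Result: invalid UTF-8 is replaced with U+FFFD, and the result is a Cow<str> that only allocates when a replacement was needed.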
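Putting these pieces behind a safe function: a sketch of a getenv wrapper (assuming the C standard library's getenv) that checks for null and copies the value out immediately, so no reference to C-owned memory escapes. Note that getenv is not synchronized with concurrent environment changes made by other threads.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

mod ffi {
    use std::os::raw::c_char;
    extern "C" {
        // char *getenv(const char *name); returns NULL when unset.
        pub fn getenv(name: *const c_char) -> *const c_char;
    }
}

/// Returns an owned copy of the variable's value, or None if it is unset
/// or the name contains a NUL byte.
pub fn env_var(name: &str) -> Option<String> {
    let name = CString::new(name).ok()?;
    unsafe {
        let ptr = ffi::getenv(name.as_ptr());
        if ptr.is_null() {
            return None; // CStr::from_ptr must never see a null pointer
        }
        // Copy now: the &CStr's lifetime is not tied to C's storage.
        Some(CStr::from_ptr(ptr).to_string_lossy().into_owned())
    }
}
```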

Vectors and C

C APIs often deal with pointers and explicit length values.

lib.h
                  void fill_int_buffer(int* buf, unsigned int n);
                
lib.rs
                  extern "C" {
                      fn fill_int_buffer(buf: *mut c_int, n: c_uint);
                  }
                

Vectors and C

Extracting these from Rust slices is straightforward:

            let ints = &mut [0, 0, 0, 0];
            unsafe {
                ffi::fill_int_buffer(ints.as_mut_ptr(),
                                     ints.len() as c_uint);
            }
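Wrapping this in a safe function lets the slice supply both the pointer and the length, so the caller can never pass a length longer than the buffer. A sketch using the C standard library's memset as the C function (an assumption for runnability); note that memset counts bytes, so a slice of c_int would need len() * size_of::<c_int>().

```rust
use std::os::raw::{c_int, c_void};

mod ffi {
    use std::os::raw::{c_int, c_void};
    extern "C" {
        // void *memset(void *s, int c, size_t n);
        pub fn memset(s: *mut c_void, c: c_int, n: usize) -> *mut c_void;
    }
}

/// Fills every byte of `buf` with `value` via C's memset.
pub fn fill_bytes(buf: &mut [u8], value: u8) {
    // An empty slice's pointer is dangling; C requires a valid pointer
    // even when n == 0, so skip the call entirely.
    if buf.is_empty() {
        return;
    }
    unsafe {
        ffi::memset(buf.as_mut_ptr() as *mut c_void, value as c_int, buf.len());
    }
}
```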
          

Dealing with pointers to pointers

We can transform Vec<T>/&[T] into *const T with as_ptr(). Vec<Vec<T>>/&[&[T]] requires allocating an intermediate vector.


A C API that takes T** expects a pointer to a contiguous array of pointers.
A Vec<Vec<T>> stores its inner vectors contiguously, but each element is a whole Vec (pointer, capacity and length), not a bare pointer:

Vec<Vec<T>> is different from Vec<*const T>

Dealing with pointers to pointers

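            let mut outer = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
             
            // Intermediate array of inner pointers. Raw pointers are not
            // tracked by the borrow checker, so `outer` must stay alive
            // and unmodified while C uses them.
            let mut outer_c: Vec<*mut c_float> =
                outer.iter_mut()
                     .map(|inner| inner.as_mut_ptr())
                     .collect();
            unsafe {
                ffi::scale_2x2_matrix(outer_c.as_mut_ptr(), 2.0);
            }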
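A runnable version of the same technique. Because no C library is at hand, scale_rows below is a hypothetical stand-in written in Rust with the C ABI; the safe wrapper builds the intermediate pointer array and keeps the mutable borrow of the matrix for the duration of the call.

```rust
use std::os::raw::c_float;

/// Hypothetical stand-in for a C function with the signature
/// `void scale_rows(float **rows, size_t nrows, size_t ncols, float k);`,
/// written in Rust with the C ABI so this sketch needs no C library.
unsafe extern "C" fn scale_rows(rows: *mut *mut c_float, nrows: usize, ncols: usize, k: c_float) {
    for r in 0..nrows {
        let row = *rows.add(r); // r-th inner pointer
        for c in 0..ncols {
            *row.add(c) *= k;
        }
    }
}

/// Safe wrapper: builds the Vec<*mut c_float> that C expects.
/// Every row must have the same length.
pub fn scale_matrix(outer: &mut [Vec<f32>], k: f32) {
    let ncols = outer.first().map_or(0, |row| row.len());
    assert!(outer.iter().all(|row| row.len() == ncols), "ragged matrix");
    let mut rows: Vec<*mut c_float> = outer.iter_mut().map(|row| row.as_mut_ptr()).collect();
    // `outer` stays mutably borrowed for this whole function, so the inner
    // buffers cannot move or be freed while C holds these pointers.
    unsafe { scale_rows(rows.as_mut_ptr(), rows.len(), ncols, k) };
}
```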
          

Exercise #1

  1. cd exercise1; cargo build && cargo test
  2. Given src/library.h, fill in the ffi module in src/lib.rs
  3. Add safe bindings to src/lib.rs that wrap the FFI functions

The safe bindings should accept Rust types as arguments and convert them as appropriate in order to invoke FFI functions.

See hints in comments; see working code in src/solution.rs.

Exercise #1 discussion

Next on the agenda - complex types!

C structures

C structures require very little modification for use with FFI.

lib.h

                  struct image_t {
                      unsigned char* pixels;
                      unsigned int width;
                      unsigned int height;
                  };

lib.rs

                  #[repr(C)]
                  struct image_t {
                      pixels: *mut c_uchar,
                      width: c_uint,
                      height: c_uint,
                  }
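A real struct from the C standard library makes a convenient test case: div returns a div_t by value. This sketch assumes the field order quot, rem, as declared by glibc, musl and macOS; the C standard allows either order, so check your platform's stdlib.h before relying on it.

```rust
use std::os::raw::c_int;

/// Mirrors C's div_t. #[repr(C)] gives C-compatible layout; the
/// quot-then-rem field order is an assumption about the platform header.
#[allow(non_camel_case_types)]
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct div_t {
    pub quot: c_int,
    pub rem: c_int,
}

mod ffi {
    use super::div_t;
    use std::os::raw::c_int;
    extern "C" {
        // div_t div(int numer, int denom); the struct is returned by value.
        pub fn div(numer: c_int, denom: c_int) -> div_t;
    }
}

/// Safe wrapper: C's div is undefined for a zero denominator and for
/// INT_MIN / -1 (the quotient overflows), so both are rejected.
pub fn c_div(numer: i32, denom: i32) -> Option<(i32, i32)> {
    if denom == 0 || (numer == i32::MIN && denom == -1) {
        return None;
    }
    let r = unsafe { ffi::div(numer as c_int, denom as c_int) };
    Some((r.quot as i32, r.rem as i32))
}
```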
                

Opaque C structures

Sometimes C headers only provide forward declarations as part of APIs.

            struct image_t;
             
            struct image_t* image_create();
            unsigned int image_get_width(struct image_t*);
            unsigned int image_get_height(struct image_t*);
          

Opaque C structures (cont.)

This does not pose a problem for Rust bindings:

            struct image_t(c_void);
            extern "C" {
                fn image_create() -> *mut image_t;
                fn image_get_width(image: *const image_t) -> c_uint;
                fn image_get_height(image: *const image_t) -> c_uint;
            }
          

As with C users of the library, Rust code cannot construct these values; it can only hold pointers handed out by the library's functions.

Managing memory for C types

If a C API provides construction and deletion functions, this maps nicely to Rust's unique ownership model.

            struct image_t;
            struct image_t* image_create(unsigned int w,
                                         unsigned int h);
            void image_free(struct image_t* image);
          

Managing memory for C types (cont.)

The low level types are unsurprising:

            struct image_t(c_void);
            extern "C" {
                fn image_create(w: c_uint, h: c_uint) -> *mut image_t;
                fn image_free(image: *mut image_t);
            }
          

Managing memory for C types (cont.)

We can encapsulate low-level types in high-level wrappers:

            struct Image {
                ptr: *mut image_t,
            }
          

There is no need to duplicate any state from the low-level type.
The purpose of the wrapper is to mediate access to the unsafe pointer.

Managing memory for C types (cont.)

We can follow Rust construction idioms:

            impl Image {
                pub fn new(w: usize, h: usize) -> Image {
                    Image {
                        ptr: unsafe { ffi::image_create(w as c_uint, h as c_uint) },
                    }
                }
            }
          

Managing memory for C types (cont.)

Let's ensure that the memory does not leak or get used unsafely:

            impl Drop for Image {
                fn drop(&mut self) {
                    unsafe {
                        ffi::image_free(self.ptr);
                    }
                }
            }
          

Managing memory for C types (cont.)

We also need to wrap any related APIs that require raw pointers:

            impl Image {
                pub fn width(&self) -> usize {
                    unsafe { ffi::image_get_width(self.ptr) as usize }
                }
                pub fn height(&self) -> usize {
                    unsafe { ffi::image_get_height(self.ptr) as usize }
                }
            }
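The whole unique-ownership pattern can be tried against a real C type: stdio's FILE is opaque, created by fopen and destroyed by fclose. This sketch declares FILE with the zero-sized-array-plus-PhantomData pattern the Rustonomicon suggests for opaque types, and its constructor returns Option so a null pointer from fopen never ends up inside a wrapper. CFile and its methods are illustrative names.

```rust
use std::ffi::CString;
use std::marker::{PhantomData, PhantomPinned};

/// Opaque stand-in for C's FILE: zero-sized, not constructible outside
/// this module, and neither Send, Sync nor Unpin.
#[repr(C)]
pub struct FILE {
    _data: [u8; 0],
    _marker: PhantomData<(*mut u8, PhantomPinned)>,
}

mod ffi {
    use super::FILE;
    use std::os::raw::{c_char, c_int};
    extern "C" {
        pub fn fopen(path: *const c_char, mode: *const c_char) -> *mut FILE;
        pub fn fputs(s: *const c_char, stream: *mut FILE) -> c_int;
        pub fn fclose(stream: *mut FILE) -> c_int;
    }
}

/// Safe owner of a FILE*: created by fopen, closed exactly once in Drop.
pub struct CFile {
    ptr: *mut FILE,
}

impl CFile {
    /// Opens `path` for writing; None instead of a wrapper around null.
    pub fn create(path: &str) -> Option<CFile> {
        let path = CString::new(path).ok()?;
        let mode = CString::new("w").ok()?;
        let ptr = unsafe { ffi::fopen(path.as_ptr(), mode.as_ptr()) };
        if ptr.is_null() { None } else { Some(CFile { ptr }) }
    }

    /// Writes `text`; false on an interior NUL or a C-level error (EOF).
    pub fn write_text(&mut self, text: &str) -> bool {
        match CString::new(text) {
            Ok(s) => unsafe { ffi::fputs(s.as_ptr(), self.ptr) >= 0 },
            Err(_) => false,
        }
    }
}

impl Drop for CFile {
    fn drop(&mut self) {
        // fclose flushes buffered output and frees the FILE.
        unsafe { ffi::fclose(self.ptr); }
    }
}
```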
          

Managing memory for C types (cont.)

Beware of null pointers!

What if image_create can't allocate? Either:

Managing refcounted C types

Some APIs define types that have shared instead of unique ownership:

            struct image_t;
            struct image_t* image_create(unsigned int w,
                                         unsigned int h);
            void image_addref(struct image_t* image);
            void image_release(struct image_t* image);
          

Managing refcounted C types (cont.)

The FFI declarations remain unsurprising:

            struct image_t(c_void);
            extern "C" {
                fn image_create(w: c_uint, h: c_uint) -> *mut image_t;
                fn image_addref(image: *mut image_t);
                fn image_release(image: *mut image_t);
            }
          

Managing refcounted C types (cont.)

Our constructor needs some care:

            impl Image {
                pub fn new(w: usize, h: usize) -> Image {
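                    let ptr = unsafe { ffi::image_create(w as c_uint, h as c_uint) };
                    // Assumes image_create returns an object with no references held;
                    // if it already holds one for the caller, skip this addref.
                    unsafe { ffi::image_addref(ptr) };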
                    Image {
                        ptr: ptr,
                    }
                }
            }
          

Managing refcounted C types (cont.)

Our destructor looks very similar:

            impl Drop for Image {
                fn drop(&mut self) {
                    unsafe {
                        ffi::image_release(self.ptr);
                    }
                }
            }
          

Managing refcounted C types (cont.)

And now something new - enabling shared ownership:

            impl Clone for Image {
                fn clone(&self) -> Image {
                    unsafe { ffi::image_addref(self.ptr); }
                    Image {
                        ptr: self.ptr
                    }
                }
            }
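A runnable sketch of the refcounted wrapper. The image_t API here is hypothetical, written in Rust with the C ABI, and unlike the slide's API its image_create returns an object that already holds one reference, so new() does not call addref. A LIVE counter exists only so the behaviour can be observed. Because the count is not atomic, the wrapper is deliberately neither Send nor Sync (the raw pointer already prevents that).

```rust
use std::os::raw::c_uint;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical C-side object; a real library would expose it opaquely.
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct image_t {
    refcount: c_uint,
    width: c_uint,
    height: c_uint,
}

/// Number of image_t objects currently alive (bookkeeping for the sketch).
static LIVE: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical refcounting API, written in Rust with the C ABI.
mod ffi {
    use super::{image_t, LIVE};
    use std::os::raw::c_uint;
    use std::sync::atomic::Ordering;

    /// Returns an object that already holds one reference for the caller.
    pub unsafe extern "C" fn image_create(w: c_uint, h: c_uint) -> *mut image_t {
        LIVE.fetch_add(1, Ordering::SeqCst);
        Box::into_raw(Box::new(image_t { refcount: 1, width: w, height: h }))
    }
    pub unsafe extern "C" fn image_addref(image: *mut image_t) {
        (*image).refcount += 1;
    }
    pub unsafe extern "C" fn image_release(image: *mut image_t) {
        (*image).refcount -= 1;
        if (*image).refcount == 0 {
            drop(Box::from_raw(image));
            LIVE.fetch_sub(1, Ordering::SeqCst);
        }
    }
}

/// Safe handle: each Image value owns exactly one reference.
pub struct Image {
    ptr: *mut image_t,
}

impl Image {
    pub fn new(w: u32, h: u32) -> Image {
        // image_create already returned our reference, so no addref here.
        Image { ptr: unsafe { ffi::image_create(w as c_uint, h as c_uint) } }
    }
    pub fn width(&self) -> u32 { unsafe { (*self.ptr).width as u32 } }
    pub fn height(&self) -> u32 { unsafe { (*self.ptr).height as u32 } }
    pub fn refcount(&self) -> u32 { unsafe { (*self.ptr).refcount as u32 } }
}

impl Clone for Image {
    fn clone(&self) -> Image {
        unsafe { ffi::image_addref(self.ptr) };
        Image { ptr: self.ptr }
    }
}

impl Drop for Image {
    fn drop(&mut self) {
        unsafe { ffi::image_release(self.ptr) };
    }
}
```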
          

Summary

Who is responsible for creating and destroying values?

Can those responsibilities be associated with Rust's lifetimes?

Keep unsafe pointers hidden behind safe interfaces.

Exercise #2

  1. cd exercise2; cargo build && cargo test
  2. Given an ffi module in src/lib.rs, write safe wrappers for the string_t and slice_t types.
  3. Create a safe binding for the substring function. Provide a more meaningful return type than int.

See hints in comments; see working code in src/solution.rs.

Exercise #2 discussion

Next up - allocation and iteration!

Allocation Rule of Thumb

Don't let C code free memory allocated by Rust.
Be careful when freeing memory allocated by C.

Allocator mismatch errors are frustrating.

Long-term loans from Rust->C

So far we've only lent memory to C within a single stack frame.
Longer loans need well-defined endpoints for reclaiming the memory.

For strings:
CString::into_raw pairs with CString::from_raw

For arbitrary values:
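Box::into_raw pairs with Box::from_raw

Each into_raw needs exactly one matching from_raw: a missing from_raw leaks memory, and a second from_raw on the same pointer is a double free.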

Long-term loans from Rust->C

            // Set the prefix that will be added to any subsequent
            // prints. The string will be copied for internal use.
            void set_print_prefix(const char* prefix);
          

versus

            // Set the prefix that will be added to any subsequent
            // prints. String must remain valid during all prints.
            void set_print_prefix(const char* prefix);
          

Long-term loans from Rust->C

This implementation for set_print_prefix only works if the string is copied. Otherwise, the pointer is invalid after returning.

            pub fn set_print_prefix(prefix: &str) {
                let s = CString::new(prefix).unwrap();
                unsafe {
                    ffi::set_print_prefix(s.as_ptr());
                }
            }
          

Long-term loans from Rust->C

Solution: transfer ownership using into_raw and from_raw

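            let s = CString::new("Rust Belt Rust").unwrap();
            unsafe {
                // Ownership moves to C; Rust will not free the string here.
                ffi::set_print_prefix(s.into_raw());
            }
            unsafe {
                // Later, once C no longer uses the prefix, take ownership
                // back so Rust's allocator frees it when _s is dropped.
                let _s = CString::from_raw(ffi::get_print_prefix());
            }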
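A single-threaded sketch of a safe binding for the "string must remain valid" contract. The C side is a hypothetical stand-in written in Rust with the C ABI that stores the pointer without copying it; the wrapper lends each new string with into_raw and reclaims the previously installed one with from_raw, so every allocation is freed exactly once.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

// Hypothetical C-side state: the library keeps the caller's pointer
// without copying it.
static PREFIX: AtomicPtr<c_char> = AtomicPtr::new(ptr::null_mut());

// Hypothetical C functions, written in Rust with the C ABI.
extern "C" fn c_set_print_prefix(prefix: *const c_char) {
    PREFIX.store(prefix as *mut c_char, Ordering::SeqCst);
}
extern "C" fn c_get_print_prefix() -> *const c_char {
    PREFIX.load(Ordering::SeqCst)
}

/// Takes back a string previously lent with CString::into_raw, if any.
fn reclaim(old: *const c_char) {
    if !old.is_null() {
        // Sound only because every non-null prefix came from into_raw below.
        drop(unsafe { CString::from_raw(old as *mut c_char) });
    }
}

/// Lends a fresh copy of `prefix` to "C", then frees the previous one.
/// Panics if `prefix` contains a NUL byte.
pub fn set_print_prefix(prefix: &str) {
    let new = CString::new(prefix).expect("prefix must not contain NUL");
    let old = c_get_print_prefix();
    c_set_print_prefix(new.into_raw());
    reclaim(old);
}

/// Uninstalls the prefix and frees it.
pub fn clear_print_prefix() {
    let old = c_get_print_prefix();
    c_set_print_prefix(ptr::null());
    reclaim(old);
}

/// Copies the current prefix out of the memory lent to C.
pub fn current_prefix() -> Option<String> {
    let p = c_get_print_prefix();
    if p.is_null() {
        None
    } else {
        Some(unsafe { CStr::from_ptr(p) }.to_string_lossy().into_owned())
    }
}
```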
          

Rust iterators

At its core, Rust's support for iteration rests on two things.

The Iterator trait:

            trait Iterator {
                type Item;
                fn next(&mut self) -> Option<Self::Item>;
            }
          

This defines a protocol which declares the type of item returned, and how to obtain the next item.

Rust iterators (cont.)

The other important ingredient is that iterator types typically borrow the value being iterated, to prevent it from being modified or destroyed.

            struct MatrixIterator<'a> {
                matrix: &'a Matrix,
                index: usize,
            }
          

Iterators retain all state necessary to obtain the next item when requested.

Rust iterators (cont.)

An iterator type implements the Iterator trait:

            impl<'a> Iterator for MatrixIterator<'a> {
                type Item = f32;
                fn next(&mut self) -> Option<f32> {
                    let item = match self.index {
                        0 => Some(self.matrix.m0),
                        1 => Some(self.matrix.m1),
                        _ => None,
                    };
                    self.index += 1;
                    item
                }
            }
          

Rust iterators (cont.)

Finally some other code constructs an iterator value:

            impl Matrix {
                pub fn iter(&self) -> MatrixIterator {
                    MatrixIterator {
                        matrix: self,
                        index: 0,
                    }
                }
            }
          

Rust iterators (cont.)

Suddenly it Just Works:

            for val in matrix.iter().map(|v| v * 2.0) {
                println!("{}", val);
            }
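The same recipe applies when the items live behind C accessor functions. In this sketch results_t, results_len and results_get are hypothetical stand-ins written in Rust with the C ABI; the iterator borrows the safe wrapper, so the underlying results cannot be freed or mutated while iteration is in progress.

```rust
use std::os::raw::{c_int, c_uint};

/// Hypothetical C result set; a real library would keep this opaque.
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct results_t {
    values: [c_int; 3],
}

// Hypothetical C accessors, written in Rust with the C ABI.
unsafe extern "C" fn results_len(r: *const results_t) -> c_uint {
    (*r).values.len() as c_uint
}
unsafe extern "C" fn results_get(r: *const results_t, i: c_uint) -> c_int {
    (*r).values[i as usize]
}

/// Safe wrapper; in this sketch it simply owns the storage.
pub struct Results {
    raw: results_t,
}

/// Borrows the Results, so they cannot be mutated or dropped mid-iteration.
pub struct ResultsIter<'a> {
    results: &'a Results,
    index: c_uint,
}

impl<'a> Iterator for ResultsIter<'a> {
    type Item = i32;
    fn next(&mut self) -> Option<i32> {
        let raw: *const results_t = &self.results.raw;
        if self.index >= unsafe { results_len(raw) } {
            return None;
        }
        // In bounds: index < results_len(raw).
        let value = unsafe { results_get(raw, self.index) };
        self.index += 1;
        Some(value as i32)
    }
}

impl Results {
    pub fn new(values: [i32; 3]) -> Results {
        Results { raw: results_t { values } }
    }
    pub fn iter(&self) -> ResultsIter<'_> {
        ResultsIter { results: self, index: 0 }
    }
}
```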
          

Exercise #3

  1. cd exercise3; cargo build && cargo test
  2. Given an ffi module in src/lib.rs, write a safe wrapper for the query_result_t type.
  3. Create an iterator implementation that enables iterating over the results in a result set in order.
  4. Transfer ownership of a string to the query_result_t type and back.

See hints in comments; see working code in src/solution.rs.

Exercise #3 discussion

Next up - calling from Rust -> C -> Rust!

Function pointers

Function pointers in Rust look similar to function declarations.

            fn div2(v: u32) -> f32 { v as f32 / 2.0 }
             
            let f: fn(u32) -> f32;
            f = div2;
             
            assert!(f(5) == div2(5));
          

Function pointers

C callbacks are Rust function pointers with more extern and unsafe.

lib.h
                  typedef void* (*allocator_t)(unsigned int);
                  void init(allocator_t allocator);
                
lib.rs
                  type allocator_t =
                      Option<unsafe extern "C" fn(c_uint) -> *mut c_void>;
                  extern "C" {
                      fn init(allocator: allocator_t);
                  }
                

Function pointers

            use std::ptr;
             
            unsafe extern "C" fn null_allocator(_: c_uint) -> *mut c_void {
                ptr::null_mut()
            }
             
            unsafe {
                // Pass a pointer to a Rust function,
                ffi::init(Some(null_allocator));
                // or a null pointer...
                ffi::init(None);
            }
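A callback that C actually invokes: qsort from the C standard library takes a comparator function pointer. In this sketch the comparator is an unsafe extern "C" fn, the parameter type is wrapped in Option because C permits a null function pointer, and a safe wrapper derives the element count and size from the slice.

```rust
use std::os::raw::{c_int, c_void};

type Comparator = Option<unsafe extern "C" fn(*const c_void, *const c_void) -> c_int>;

mod ffi {
    use super::Comparator;
    use std::os::raw::c_void;
    extern "C" {
        // void qsort(void *base, size_t nmemb, size_t size,
        //            int (*compar)(const void *, const void *));
        pub fn qsort(base: *mut c_void, nmemb: usize, size: usize, compar: Comparator);
    }
}

/// Called back by C; it only ever receives pointers to c_int elements.
unsafe extern "C" fn compare_c_ints(a: *const c_void, b: *const c_void) -> c_int {
    let (a, b) = (*(a as *const c_int), *(b as *const c_int));
    // -1, 0 or 1, without the overflow risk of `a - b`.
    (a > b) as c_int - (a < b) as c_int
}

/// Safe wrapper: the slice fixes the element count and size qsort needs.
pub fn sort_c_ints(values: &mut [c_int]) {
    if values.is_empty() {
        return; // avoid handing C a dangling pointer
    }
    unsafe {
        ffi::qsort(
            values.as_mut_ptr() as *mut c_void,
            values.len(),
            std::mem::size_of::<c_int>(),
            Some(compare_c_ints),
        );
    }
}
```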
          

High-level callbacks

Callbacks from C APIs will receive low-level arguments. Our bindings need a translation layer.

There are two common mechanisms for this:

High-level callbacks

C API for per-context callback:

            void* context_get_private(struct context_t*);
            void context_set_private(struct context_t*, void*);
             
            typedef void (*operation_callback_t)(struct context_t*);
            void context_set_callback(struct context_t*,
                                      operation_callback_t);
          

High-level callbacks

Rust FFI declarations:

            type operation_callback_t =
                Option<unsafe extern "C" fn(*mut context_t)>;
             
            extern "C" {
                fn context_get_private(cx: *mut context_t) -> *mut c_void;
                // `priv` is a reserved keyword in Rust, so the parameter is renamed.
                fn context_set_private(cx: *mut context_t, data: *mut c_void);
                fn context_set_callback(cx: *mut context_t,
                                        cb: operation_callback_t);
            }
          

High-level callbacks

Associate high-level state with low-level data in the constructor:

            struct ContextPrivate(u32);
             
            let cx = Context { ptr: unsafe { ffi::context_create() } };
            let prv = Box::new(ContextPrivate(0));
            unsafe {
                ffi::context_set_private(cx.ptr,
                                         Box::into_raw(prv) as *mut c_void)
            }
          

High-level callbacks

Invoke high-level callbacks with high-level state:

            unsafe extern "C" fn callback(cx: *mut context_t) {
                let p = ffi::context_get_private(cx) as *mut ContextPrivate;
                rust_callback(&mut *p);
            }
             
            unsafe {
                ffi::context_set_callback(cx.ptr, Some(callback))
            }
          

High-level callbacks

High level callback (no hint of unsafety):

            struct ContextPrivate(u32);
             
            fn rust_callback(data: &mut ContextPrivate) {
                // Count the number of times invoked
                data.0 += 1;
            }
          

High-level callbacks

Clean up in the associated destructor:

            impl Drop for Context {
                fn drop(&mut self) {
                    let p = unsafe { ffi::context_get_private(self.ptr) };
                    // Reconstitute the Box so the private data is freed.
                    let _ = unsafe { Box::from_raw(p as *mut ContextPrivate) };
                    // ...
                }
            }
          

Using closures as function pointers

Sometimes more dynamic function pointers are desirable.

A closure's type is an anonymous struct holding its captured environment, and a boxed closure (Box<dyn FnMut(...)>) is a fat pointer (data + vtable), so neither can be used in place of a C function pointer.

We can build on the previous pattern to enable high-level APIs that run closures when C callbacks are invoked.

Using closures as function pointers

Storing closure trait objects in the private data allows subsequent execution:

            type OperationCallback = Box<dyn FnMut(&mut ContextState)>;
             
            struct ContextState(u32);
             
            struct ContextPrivate {
                operation: OperationCallback,
                state: ContextState,
            }
          

Using closures as function pointers

Some careful partitioning of the members of the private state may be necessary to satisfy the borrow checker:

            unsafe extern "C" fn callback(cx: *mut context_t) {
                let p = ffi::context_get_private(cx) as *mut ContextPrivate;
                let p = &mut *p;
                (p.operation)(&mut p.state);
            }
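An end-to-end sketch combining the private-data pattern with boxed closures. The context_t library is a hypothetical stand-in written in Rust with the C ABI; the only function C ever sees is the trampoline, which recovers the private data and runs the stored closure. Drop reclaims the private Box before destroying the context.

```rust
use std::os::raw::c_void;

// ---- Hypothetical C library, written in Rust with the C ABI ----
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct context_t {
    private: *mut c_void,
    callback: Option<unsafe extern "C" fn(*mut context_t)>,
}

mod ffi {
    use super::context_t;
    use std::os::raw::c_void;
    use std::ptr;

    pub extern "C" fn context_create() -> *mut context_t {
        Box::into_raw(Box::new(context_t { private: ptr::null_mut(), callback: None }))
    }
    pub unsafe extern "C" fn context_destroy(cx: *mut context_t) {
        drop(Box::from_raw(cx));
    }
    pub unsafe extern "C" fn context_get_private(cx: *mut context_t) -> *mut c_void {
        (*cx).private
    }
    pub unsafe extern "C" fn context_set_private(cx: *mut context_t, data: *mut c_void) {
        (*cx).private = data;
    }
    pub unsafe extern "C" fn context_set_callback(
        cx: *mut context_t,
        cb: Option<unsafe extern "C" fn(*mut context_t)>,
    ) {
        (*cx).callback = cb;
    }
    /// Performs one "operation", invoking the registered callback.
    pub unsafe extern "C" fn context_run(cx: *mut context_t) {
        if let Some(cb) = (*cx).callback {
            cb(cx);
        }
    }
}

// ---- Safe, high-level wrapper ----
pub struct ContextState(pub u32);
type OperationCallback = Box<dyn FnMut(&mut ContextState)>;

struct ContextPrivate {
    operation: OperationCallback,
    state: ContextState,
}

/// Recovers the private data and runs the stored closure.
unsafe extern "C" fn trampoline(cx: *mut context_t) {
    let p = &mut *(ffi::context_get_private(cx) as *mut ContextPrivate);
    (p.operation)(&mut p.state); // disjoint field borrows
}

pub struct Context {
    ptr: *mut context_t,
}

impl Context {
    pub fn new<F: FnMut(&mut ContextState) + 'static>(op: F) -> Context {
        let prv = Box::new(ContextPrivate { operation: Box::new(op), state: ContextState(0) });
        unsafe {
            let ptr = ffi::context_create();
            ffi::context_set_private(ptr, Box::into_raw(prv) as *mut c_void);
            ffi::context_set_callback(ptr, Some(trampoline));
            Context { ptr }
        }
    }
    pub fn run(&mut self) {
        unsafe { ffi::context_run(self.ptr) }
    }
    pub fn state(&self) -> u32 {
        unsafe { (*(ffi::context_get_private(self.ptr) as *const ContextPrivate)).state.0 }
    }
}

impl Drop for Context {
    fn drop(&mut self) {
        unsafe {
            let p = ffi::context_get_private(self.ptr) as *mut ContextPrivate;
            drop(Box::from_raw(p)); // frees the closure and its state
            ffi::context_destroy(self.ptr);
        }
    }
}
```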
          
        

Upgrading low-level types

One issue that crops up in callbacks is how to deal with passing arguments down to high-level callbacks.

            unsafe extern "C" fn callback(cx: *mut context_t)
            type OperationCallback = Box<dyn FnMut(&mut Context)>;

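If Context is a Rust type with unique ownership semantics, it is not safe to synthesize an instance for the duration of the callback: when that temporary instance is dropped, its destructor would destroy the context out from under its real owner.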

What we really need is to borrow the high-level type for the callback, but it may not be reachable from the callback's code.

Upgrading low-level types

Let's split the Context type into OwnedContext and BorrowedContext:

            // Responsible for APIs that cannot destroy this context.
            struct BorrowedContext { ptr: *mut ffi::context_t }
             
            // Responsible only for APIs that destroy this context.
            struct OwnedContext { cx: BorrowedContext }
          

There is a clear similarity to Vec<T>/[T], String/str, T/&T...

Upgrading low-level types

To minimize friction, we can delegate from OwnedContext to BorrowedContext as appropriate:

            impl Deref for OwnedContext {
                type Target = BorrowedContext;
                fn deref(&self) -> &BorrowedContext {
                    &self.cx
                }
            }
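A runnable sketch of the split, against a hypothetical context_t API written in Rust with the C ABI. Only OwnedContext implements Drop; inside the callback a BorrowedContext is synthesized from the raw pointer, and dropping it frees nothing. Deref lets owners call every borrowed-context method directly.

```rust
use std::ops::Deref;

// ---- Hypothetical C library, written in Rust with the C ABI ----
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct context_t {
    counter: u32,
}

mod ffi {
    use super::context_t;
    pub extern "C" fn context_create() -> *mut context_t {
        Box::into_raw(Box::new(context_t { counter: 0 }))
    }
    pub unsafe extern "C" fn context_destroy(cx: *mut context_t) {
        drop(Box::from_raw(cx));
    }
    pub unsafe extern "C" fn context_bump(cx: *mut context_t) -> u32 {
        (*cx).counter += 1;
        (*cx).counter
    }
    /// Invokes `cb` with the same context, as a C callback API would.
    pub unsafe extern "C" fn context_visit(cx: *mut context_t, cb: unsafe extern "C" fn(*mut context_t)) {
        cb(cx);
    }
}

/// Non-owning view: APIs that cannot destroy the context. No Drop impl.
pub struct BorrowedContext {
    ptr: *mut ffi_context,
}
type ffi_context = context_t;

impl BorrowedContext {
    pub fn bump(&self) -> u32 {
        unsafe { ffi::context_bump(self.ptr) }
    }
}

/// Owning handle: the only type that destroys the context.
pub struct OwnedContext {
    cx: BorrowedContext,
}

impl OwnedContext {
    pub fn new() -> OwnedContext {
        OwnedContext { cx: BorrowedContext { ptr: ffi::context_create() } }
    }
    pub fn visit(&self) {
        unsafe { ffi::context_visit(self.cx.ptr, on_visit) }
    }
}

impl Deref for OwnedContext {
    type Target = BorrowedContext;
    fn deref(&self) -> &BorrowedContext {
        &self.cx
    }
}

impl Drop for OwnedContext {
    fn drop(&mut self) {
        unsafe { ffi::context_destroy(self.cx.ptr) }
    }
}

/// Only a raw pointer is available here, so a BorrowedContext is
/// synthesized; it goes out of scope without destroying anything.
unsafe extern "C" fn on_visit(raw: *mut context_t) {
    let borrowed = BorrowedContext { ptr: raw };
    high_level_callback(&borrowed);
}

fn high_level_callback(cx: &BorrowedContext) {
    cx.bump();
}
```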
          

Exercise #4

  1. cd exercise4; cargo build && cargo test
  2. Given an ffi module in src/lib.rs, write a safe wrapper for logger_t.
  3. Initialize the high-level wrapper with data that is used to make decisions in the logging callback.
  4. Use the finalizer callback to perform any clean up necessary.

See hints in comments; see working code in src/solution.rs.

Exercise #4 discussion

  1. How would the high level API design change if the logger's print hook invoked a closure?
  2. Does the API support not setting callbacks?
  3. Could a logger instance store copies of the logged messages?

Next up - the end!

Further resources

Thanks

Red panda (Firefox) Photo by Yortw