Passing Strings

Description

When passing strings to FFI functions, there are four principles that should be followed:

  1. Make the lifetime of owned strings as long as possible.
  2. Minimize unsafe code during the conversion.
  3. If the C code can modify the string data, use Vec instead of CString.
  4. Unless the Foreign Function API requires it, the ownership of the string should not transfer to the callee.
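When an API does require the callee to take ownership (the exception in principle 4), one option is CString::into_raw, with the allocation later returned to Rust and freed through CString::from_raw. The sketch below is a minimal illustration under stated assumptions. The foreign functions `hypothetical_take_name` and `hypothetical_release_name` are invented for this example and are simulated in Rust with C-ABI functions so that it runs without a C library. A real library's ownership contract must come from its own documentation.

```rust
use std::ffi::{c_char, CString, NulError};
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

// Simulated foreign-side storage (hypothetical). In real code this state
// would live inside the C library.
static STORED: AtomicPtr<c_char> = AtomicPtr::new(ptr::null_mut());

/// Hypothetical foreign function documented as taking ownership of `name`.
extern "C" fn hypothetical_take_name(name: *mut c_char) {
    // Any previously stored string is handed back to Rust to be freed.
    let old = STORED.swap(name, Ordering::SeqCst);
    if !old.is_null() {
        // SAFETY: every stored pointer came from `CString::into_raw`.
        drop(unsafe { CString::from_raw(old) });
    }
}

/// Hypothetical foreign function that gives ownership back to the caller.
extern "C" fn hypothetical_release_name() -> *mut c_char {
    STORED.swap(ptr::null_mut(), Ordering::SeqCst)
}

/// Moves an owned, NUL-terminated copy of `name` to the foreign side.
fn hand_over_name(name: &str) -> Result<(), NulError> {
    let owned = CString::new(name)?;
    // `into_raw` gives up Rust's ownership: the allocation is not freed here.
    hypothetical_take_name(owned.into_raw());
    Ok(())
}

/// Takes the string back from the foreign side and lets Rust free it.
fn reclaim_name() -> Option<String> {
    let raw = hypothetical_release_name();
    if raw.is_null() {
        return None;
    }
    // SAFETY: `raw` came from `CString::into_raw` and its length was not
    // changed, which `CString::from_raw` requires.
    let owned = unsafe { CString::from_raw(raw) };
    owned.into_string().ok()
}
```

The important property is that the same allocator frees the string that allocated it. A C library that calls `free` on a pointer from `into_raw` would be undefined behavior, which is why the string travels back to Rust here.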

Motivation

Rust has built-in support for C-style strings with its CString and CStr types. However, there are different approaches one can take with strings that are being sent to a foreign function call from a Rust function.

The best practice is simple: use CString in such a way as to minimize unsafe code. A secondary caveat is that the CString must live at least as long as the foreign code uses its pointer, so its lifetime should be maximized. In addition, the documentation of CString::from_raw explains that "round-tripping" a CString through into_raw and from_raw after foreign code has changed its length is undefined behavior, so additional work is necessary in that case, such as passing a Vec<u8> buffer instead.
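Following principle 3, a buffer the callee may modify can be sketched as a Rust-owned Vec<u8> that is read back up to its first NUL byte, with no CString round-trip involved. In this sketch, `hypothetical_fill` is an invented stand-in for a C function with snprintf-like behavior (it truncates and always NUL-terminates), written in Rust so the example is self-contained. It also assumes Rust 1.69 or later for CStr::from_bytes_until_nul.

```rust
use std::ffi::{c_char, c_int, CStr};
use std::ptr;

/// Hypothetical stand-in for a C function that copies a message into `buf`,
/// truncating to `size - 1` bytes and always NUL-terminating.
///
/// # Safety
/// `buf` must point to at least `size` writable bytes.
unsafe extern "C" fn hypothetical_fill(buf: *mut c_char, size: c_int) {
    let msg = b"disk full";
    let size = match usize::try_from(size) {
        Ok(0) | Err(_) => return,
        Ok(n) => n,
    };
    let n = msg.len().min(size - 1);
    // SAFETY: the caller guarantees room for `size >= n + 1` bytes.
    unsafe {
        ptr::copy_nonoverlapping(msg.as_ptr().cast::<c_char>(), buf, n);
        *buf.add(n) = 0;
    }
}

/// Lets the (simulated) foreign side write into a Rust-owned buffer and
/// copies the result out as an owned String.
fn read_message(capacity: usize) -> Option<String> {
    let mut buf = vec![0u8; capacity];
    let size = c_int::try_from(buf.len()).ok()?;
    // SAFETY: `buf` is exactly `size` bytes long and outlives the call.
    unsafe { hypothetical_fill(buf.as_mut_ptr().cast::<c_char>(), size) };
    // Stop at the first NUL rather than trusting a length reported by C.
    let text = CStr::from_bytes_until_nul(&buf).ok()?;
    text.to_str().ok().map(str::to_owned)
}
```

Because the Vec never passes through into_raw, the callee is free to shorten the string. Rust only inspects the bytes afterwards.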

Code Example

pub mod unsafe_module {

    // other module content

    extern "C" {
        fn seterr(message: *const libc::c_char);
        fn geterr(buffer: *mut libc::c_char, size: libc::c_int) -> libc::c_int;
    }

    fn report_error_to_ffi<S: Into<String>>(err: S) -> Result<(), std::ffi::NulError> {
        let c_err = std::ffi::CString::new(err.into())?;

        unsafe {
            // SAFETY: calling an FFI whose documentation says the pointer is
            // const, so no modification should occur
            seterr(c_err.as_ptr());
        }

        Ok(())
        // The lifetime of c_err continues until here
    }

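    fn get_error_from_ffi() -> Result<String, std::ffi::IntoStringError> {
        let mut buffer = vec![0u8; 1024];
        let written = unsafe {
            // SAFETY: calling an FFI whose documentation implies
            // that the input need only live as long as the call;
            // one byte is reserved for the NUL terminator
            geterr(buffer.as_mut_ptr().cast::<libc::c_char>(), 1023)
        };

        // Assumes geterr returns the number of bytes written, excluding the
        // NUL terminator, or a negative value on failure
        let written = usize::try_from(written).expect("geterr failed");
        buffer.truncate(written);

        // CString::new rejects interior NUL bytes, so the terminator is dropped first
        std::ffi::CString::new(buffer)
            .expect("geterr wrote an interior NUL byte")
            .into_string()
    }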
}

Advantages

The example is written to ensure that:

  1. The unsafe block is as small as possible.
  2. The CString lives long enough.
  3. Conversion errors, such as an interior NUL byte, are propagated to the caller whenever possible.

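A common mistake (so common that the CString::as_ptr documentation warns about it) is to take the pointer from a temporary CString instead of binding the CString to a variable, as the first example does: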

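pub mod unsafe_module {

    // other module content

    fn report_error<S: Into<String>>(err: S) -> Result<(), std::ffi::NulError> {
        // Whoops: the temporary CString is dropped at the end of this
        // statement, so `ptr` is dangling from here on
        let ptr = std::ffi::CString::new(err.into())?.as_ptr();
        unsafe {
            // SAFETY: none, `ptr` is dangling!
            seterr(ptr);
        }
        Ok(())
    }
}

This code will result in a dangling pointer. as_ptr returns a raw pointer, which the borrow checker does not track, so nothing keeps the temporary CString alive past the end of the let statement that creates it. Had the code borrowed a &CStr from the temporary instead, the compiler would have rejected the later use.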
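The drop timing behind this mistake can be observed without any undefined behavior. The sketch below uses a hypothetical type `Noisy` as a stand-in for CString. It logs when it is dropped, and its `as_ptr` method, like CString::as_ptr, returns a raw pointer that the borrow checker ignores.

```rust
use std::cell::RefCell;

thread_local! {
    // Records events in order so the drop timing can be inspected afterwards.
    static LOG: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

fn log(event: &'static str) {
    LOG.with(|l| l.borrow_mut().push(event));
}

/// Drains and returns the recorded events.
fn take_log() -> Vec<&'static str> {
    LOG.with(|l| std::mem::take(&mut *l.borrow_mut()))
}

/// Hypothetical stand-in for CString that logs when it is dropped.
struct Noisy;

impl Noisy {
    // Deliberately not `const`, so the value cannot be promoted to a static.
    fn new() -> Self {
        Noisy
    }

    /// Like CString::as_ptr: the returned raw pointer carries no lifetime.
    fn as_ptr(&self) -> *const Noisy {
        self
    }
}

impl Drop for Noisy {
    fn drop(&mut self) {
        log("dropped");
    }
}

/// The mistake: the temporary is dropped at the end of the `let` statement.
fn pointer_from_temporary() {
    let _ptr = Noisy::new().as_ptr();
    log("pointer still held"); // `_ptr` already dangles here
}

/// The fix: binding the owner keeps it alive until the end of the scope.
fn pointer_from_binding() {
    let owner = Noisy::new();
    let _ptr = owner.as_ptr();
    log("pointer still held"); // `owner` is alive, so `_ptr` is valid
}
```

Calling `pointer_from_temporary` logs "dropped" before "pointer still held", while `pointer_from_binding` logs them in the opposite order.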

Another concern frequently raised is that initializing a 1 KiB vector of zeroes is "slow". However, `vec![0u8; n]` is specialized to request already-zeroed memory from the allocator (`alloc_zeroed`, which the default system allocator implements with `calloc`), so it costs about as much as the operating system's ability to return zeroed memory, which is quite fast.
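To make the mechanism concrete, the sketch below builds a zeroed buffer by calling std::alloc::alloc_zeroed directly. This is roughly the allocation path `vec![0u8; n]` takes internally. It is shown only for illustration: in practice the `vec!` macro is simpler and equally fast, and the function name `zeroed_buffer` is hypothetical.

```rust
use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};

/// Builds a zero-filled byte buffer by asking the global allocator for
/// zeroed memory directly, instead of writing zeroes in a loop.
fn zeroed_buffer(len: usize) -> Vec<u8> {
    if len == 0 {
        // Zero-sized allocations are not allowed; an empty Vec needs none.
        return Vec::new();
    }
    let layout = Layout::array::<u8>(len).expect("buffer too large");
    // SAFETY: `layout` has a non-zero size.
    let ptr = unsafe { alloc_zeroed(layout) };
    if ptr.is_null() {
        handle_alloc_error(layout);
    }
    // SAFETY: `ptr` was allocated by the global allocator with `layout`
    // (align 1, size `len`), and all `len` bytes are initialized to zero.
    unsafe { Vec::from_raw_parts(ptr, len, len) }
}
```

For example, `zeroed_buffer(1024)` yields the same contents as `vec![0u8; 1024]`.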

Disadvantages

None?

Last change: 2024-10-17, commit: 2e96120