FFI around mpv, are my data types sound?

⚓ rust    📅 2025-06-07    👤 surdeus    đŸ‘ī¸ 2      

surdeus

I'm working on a libmpv client crate.

My main concern is the handling of data that crosses the FFI boundary, which can be found in src/types/, specifically, node.rs, node_array.rs, node_map.rs, and byte_array.rs.

The Node type is an enum that can hold, among other things, a NodeArray or a NodeMap, each of which contains other Nodes which themselves may be NodeArrays or NodeMaps. A Node can also contain a ByteArray, though there is no further nesting after that.

The bindgen types that actually must be sent to mpv are:

#[repr(C)]
#[derive(Copy, Clone)]
pub struct mpv_node {
    pub u: mpv_node__bindgen_ty_1,
    pub format: mpv_format,
}

#[repr(C)]
#[derive(Copy, Clone)]
pub union mpv_node__bindgen_ty_1 {
    pub string: *mut ::std::os::raw::c_char,
    pub flag: ::std::os::raw::c_int,
    pub int64: i64,
    pub double_: f64,
    pub list: *mut mpv_node_list,
    pub ba: *mut mpv_byte_array,
}

#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct mpv_node_list {
    pub num: ::std::os::raw::c_int,
    pub values: *mut mpv_node,
    // keys is only valid if the mpv_format tag in the containing mpv_node is MPV_FORMAT_NODE_MAP, otherwise, NULL
    pub keys: *mut *mut ::std::os::raw::c_char,
}

#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct mpv_byte_array {
    pub data: *mut ::std::os::raw::c_void,
    pub size: usize,
}

Since these types contain raw pointers to structures with different layouts than the native Rust types I'm using for the more ergonomic API, I have to first deconstruct the Rust types into an intermediate type which can hold any owned buffers[1], as well as the underlying mpv representation, and that intermediate step is primarily what I'm concerned about.

Here direct links to the MpvRepr types for each of the above: MpvNode, MpvNodeArray, MpvNodeMap, and MpvByteArray.

And then here are direct links to the ToMpvRepr implementations for each: Node, NodeArray, NodeMap, and ByteArray.

These MpvRepr types are not public, and are designed such that they are only used in the types MpvSend implementation; here is Node's, for example. As such, their lifetime should be pretty short, limited only to the completion of the mpv function call (which is wrapped in the lambda passed into to_mpv(), see here for an example).

However, because I'm dealing with raw pointers, I'm a bit concerned about maintaining validity. In my first attempt at this, I goofed and didn't Box the underlying mpv types, like mpv_node. When I started writing my set of test cases, the first few passed, but once I started nesting more deeply, I started getting some corruption. I guess after a few months of Rust I forgot about stack vs. heap.

This was fixed by Box'ing the underlying mpv_* structure and ensuring all the other fields which are referenced by that structure are allocated on the heap somehow (for example, in NodeArray, _flat_reprs, which node_list contains a pointer to).

However, after reading about Pin, I'm a bit concerned about the possibility of data moving. As far as I know, Box shouldn't move it's allocation, and Vec shouldn't if it doesn't reallocate (which I try to prevent by creating the Vec with the correct capacity). But I'm wondering if there's something I'm overlooking, and whether I should be using Pin? These structures aren't exactly self-referential, but they do have raw pointers to other heap structures, which do have to be stored, even if only for a very limited lifetime.

I'll take any feedback on any part of the library, the main user interface is handle.rs and the docs are live here: libmpv_client - Rust

However my primary goal is feedback on the way I'm handling the conversion to mpv-compatible types, and any potential soundness or pointer validity issues.

Thank you for any advice you can give, I'm very grateful!

P.S., one idea I already have, though I don't think it'll really have a practical effect, is moving the _flat_reprs in NodeArray and the _flat_reprs and _flat_keys in NodeMap to be Box'd slices, so there is a clear delineation between the buffers which get sent to mpv as pointers, and the Vecs which exist merely to keep the intermediate MpvRepr alive.


  1. The biggest reason why I have to store these intermediate types is because Node can hold a CString, which we have to own. If not, then each of these types could just have a to_mpv() which takes a reference to the original type and returns the relevant mpv_* data type already constructed; however, since NodeMap and NodeArray could contain a Node which is a string, in which case we need to retain ownership so it's CString doesn't get dropped. There's also the matter of the arrays of mpv_nodes in NodeArray and NodeMap, but I had a different solution for that as well at one point. I thought about flattening out all of the referenced CStrings in the whole Node tree into a single Vec<CString>, but the logic for this got very complicated, so I went with this sort of recursive solution. â†Šī¸Ž

1 post - 1 participant

Read full topic

đŸˇī¸ rust_feed