Review this fix for unsafe_libyaml, yaml_string_extend

โš“ Rust    ๐Ÿ“… 2025-09-17    ๐Ÿ‘ค surdeus    ๐Ÿ‘๏ธ 1      

surdeus

The advisory has recently been posted addressing problems in libyml::string::yaml_string_extend, part of unsafe-libyaml.

This library, in one form or another, is still part of many YAML implementations (see here), including my own crate, until very recently. This code, auto-generated from C, provides low-level YAML parsing and generation. Hence, this issue potentially affects a large number of users. Let's not leave without attention.

Responding to the advisory, I switched immediately to unsafe-libyaml-norway as they recommend. However, after careful review to yaml_string_extend there, I suspect that the problem persists. The code there seems not much different from the old unsafe-libyaml and still computes sizes with
(*end).c_offset_from(*start)

which seem resolving to ptr::offset_from on *const u8. That operation is only defined when both pointers are derived from the same allocation (or oneโ€‘past). Otherwise, it is undefined behavior. Also, the size math uses force_mul(2_i64) which can silently overflow if buffers get large and offset_from for the pointer rebase (*pointer = new_start.wrapping_offset((*pointer).c_offset_from(*start) โ€ฆ)) has the same precondition: *pointer must be within the same allocation as *start**. This may be UB too.

I propose an alternative that handles the uninitialized (NULL) case, avoiding offset_from entirely. For size calculations, it uses integer address diffs and checks growth with checked_mul to prevent integer overflow.

pub(crate) unsafe fn yaml_string_extend(
    start: *mut *mut yaml_char_t,
    pointer: *mut *mut yaml_char_t,
    end: *mut *mut yaml_char_t,
) {
    let cur_start = *start;
    let cur_ptr   = *pointer;
    let cur_end   = *end;

    // Compute old capacity and used bytes without `offset_from`.
    // Handle the uninitialized (NULL) case explicitly.
    let (old_cap, used) = if cur_start.is_null() {
        (0usize, 0usize)
    } else {
        let s = cur_start as usize;
        let p = cur_ptr   as usize;
        let e = cur_end   as usize;

        // Sanity checks in debug builds; keeps Miri happy if invariants are violated upstream.
        debug_assert!(p >= s && p <= e, "pointer must be within [start, end]");
        debug_assert!(e >= s, "end must be >= start");

        (e - s, p.saturating_sub(s))
    };

    // Pick a new capacity (bytes). Start with a small nonzero capacity if previously uninitialized.
    let base = if old_cap == 0 { 16usize } else { old_cap };
    let new_cap = match base.checked_mul(2) {
        Some(v) => v,
        None => {
            // Overflow: leave unchanged or handle via your error reporting.
            // You could also clamp to isize::MAX if your allocator accepts it.
            return;
        }
    };

    // Allocate / reallocate.
    let new_start = yaml_realloc(cur_start as *mut c_void, new_cap as size_t) as *mut yaml_char_t;
    if new_start.is_null() {
        // Allocation failed; leave inputs unchanged (or set an error flag if you have one).
        return;
    }

    // Zero only the grown tail.
    if new_cap > old_cap {
        let tail = new_start.add(old_cap);
        memset(tail as *mut c_void, 0, (new_cap - old_cap) as size_t);
    }

    // Rebase `pointer` and `end` into the new allocation.
    let new_ptr = new_start.add(used);
    let new_end = new_start.add(new_cap);

    *pointer = new_ptr;
    *end     = new_end;
    *start   = new_start;
}

I also looked at STRING_EXTEND here, I think that also can be rewritten more robustly, I suggest

macro_rules! STRING_EXTEND {
    ($string:expr) => {{
        // Number of bytes you need ahead of `pointer`
        const NEED: usize = 5;
        let p = $string.pointer as usize;
        let e = $string.end     as usize;
        if $string.start.is_null() || p.saturating_add(NEED) > e {
            unsafe {
                yaml_string_extend(
                    core::ptr::addr_of_mut!($string.start),
                    core::ptr::addr_of_mut!($string.pointer),
                    core::ptr::addr_of_mut!($string.end),
                );
            }
        }
    }};
}

I would be great to have the community check if these fixes are sound. Then we can open a pull request to the maintained branches of unsafe-libyaml, or if (hopefully unlikely) the community is too passive, I will create my own fork.

1 post - 1 participant

Read full topic

๐Ÿท๏ธ Rust_feed