Dereferencing out of bounds under Miri

⚓ Rust    📅 2026-01-30    👤 surdeus    👁️ 9      

I'm trying to read data that comes in messages of the following format:

// A single zero byte signals a special message (outside of scope here)
0x0
// A message starting with a byte 1 <= tag < 0x80 is a "small" message
// and is immediately followed by (tag - 1) payload bytes
0xXY | message bytes
// A large message starts with a u32 length encoded in big-endian, with the
// most significant bit of the first byte set, followed by (length - 4) payload bytes
0x8Y 0xYY 0xYY 0xYY | message bytes

// Example
// [12, 104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100]
//  ^^ a small message of 12 bytes
//      ^--------------------payload-----------------------^
//                 decodes to b"hello world"
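For reference, the format above can be decoded with safe slicing alone. The following is a minimal sketch, not the code from the playground: the `Message` enum and the `decode` function are hypothetical names introduced here, and it assumes the length in the header counts the header bytes themselves, as described in the comments above.

```rust
/// One decoded message, borrowing its payload from the input buffer.
/// (Hypothetical type, introduced only for this sketch.)
#[derive(Debug, PartialEq, Eq)]
enum Message<'a> {
    Special,
    Small(&'a [u8]),
    Large(&'a [u8]),
}

/// Decodes the message at the start of `bytes`.
/// Returns the message and the total number of bytes it occupies,
/// or `None` if the buffer is empty, truncated, or the length is too small.
fn decode(bytes: &[u8]) -> Option<(Message<'_>, usize)> {
    let tag = *bytes.first()?;
    match tag {
        // A single zero byte is the special message.
        0 => Some((Message::Special, 1)),
        // Small message: the tag is the total length, including the tag byte.
        1..=0x7F => {
            let total = tag as usize;
            let payload = bytes.get(1..total)?;
            Some((Message::Small(payload), total))
        }
        // Large message: big-endian u32 with the top bit set; the remaining
        // 31 bits are the total length, including the 4 header bytes.
        _ => {
            let header: [u8; 4] = bytes.get(..4)?.try_into().ok()?;
            let total = (u32::from_be_bytes(header) & 0x7FFF_FFFF) as usize;
            let payload = bytes.get(4..total)?;
            Some((Message::Large(payload), total))
        }
    }
}
```

This copies nothing and needs no alignment, but it returns a borrowed enum rather than a reference into the buffer, which is the part the union below tries to achieve.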

My first thought was to encode this as an enum, but no repr(_) annotation seems to produce the right tag encoding here, so I went with the following union instead:

#[repr(C)]
union Payload {
    small: ManuallyDrop<SmallPayload>,
    large: ManuallyDrop<LargePayload>,
    special: SpecialPayload,
}
#[repr(C)]
struct SmallPayload {
    size: u8, // invariant: 0 < size < 0x80
}
#[repr(C, align(4))]
struct LargePayload {
    size: [u8; 4], // first bits of [0] are 0b1...
}
#[derive(Clone, Copy, Debug)]
#[repr(u8)]
enum Marker {
    SpecialPad = 0b0000_0000,
}
#[derive(Clone, Copy, Debug)]
#[repr(C)]
struct SpecialPayload {
    marker: Marker,
}

It is obviously incorrect to just construct values here, since the message bytes are out-of-band and need to be accessible immediately after the Payload structure. I have a load method verifying this (panics if assumptions are violated, real implementation returns an Error).

fn load(bytes: &[u8]) -> &Payload {
    assert!(bytes.as_ptr().addr().is_multiple_of(align_of::<Payload>()), "wrong alignment");
    let size = bytes[0];
    // A zero byte is the special message: it has no payload to bounds-check.
    if (1..0x80).contains(&size) { // small
        let _payload = &bytes[1..size as usize];
    } else if size >= 0x80 { // large
        let size = u32::from_be_bytes(bytes[0..4].try_into().unwrap()) & 0x7FFF_FFFF;
        let _payload = &bytes[4..size as usize];
    }
    // Reinterpret the start of the buffer as a Payload header.
    unsafe { &*bytes.as_ptr().cast::<Payload>() }
}

I then provide wrappers to access the message bytes from a payload. For example, for a small message I have

impl SmallPayload {
    fn message(&self) -> &[u8] {
        let message_slice = 1..self.size as usize;
        let as_bytes = unsafe { std::slice::from_raw_parts((self as *const Self).cast(), message_slice.end) };
        &as_bytes[message_slice]
    }
}

Full playground link.

Now, the problem: Miri does not agree with me.

error: Undefined Behavior: trying to retag from <512076> for SharedReadOnly permission at alloc37223[0x1], but that tag does not exist in the borrow stack for this location

As far as I understand it, the problem is that the reference to a Payload (and subsequently to a SmallPayload) does not get to read bytes past its own end, even though they are in the same allocation and, in spirit at least, part of the payload object.

How should I model this to make Miri agree? For that matter, how does rkyv achieve this for its ArchivedString?
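One direction that might work is to make the reference itself span the whole message, so that reading the payload never leaves the region the reference was created for. Below is a minimal sketch of that idea for the small case only, assuming a dynamically sized wrapper around `[u8]`. `SmallMessage` and `from_bytes` are hypothetical names, not part of the code above, and this says nothing about how rkyv does it.

```rust
/// Hypothetical view over one whole small message: the tag byte followed by
/// its payload. Invariant: `self.0.len() == self.0[0] as usize`.
#[repr(transparent)]
struct SmallMessage([u8]);

impl SmallMessage {
    /// Validates the tag and length, then reinterprets the leading bytes of
    /// `bytes` as a `SmallMessage`. Returns `None` for non-small tags or
    /// truncated input.
    fn from_bytes(bytes: &[u8]) -> Option<&SmallMessage> {
        let tag = *bytes.first()?;
        if tag == 0 || tag >= 0x80 {
            return None;
        }
        // Slice out exactly the bytes that belong to this message.
        let whole = bytes.get(..tag as usize)?;
        // SAFETY: `SmallMessage` is `repr(transparent)` over `[u8]`, so both
        // pointer types share layout and slice metadata. The new reference
        // covers the same bytes as `whole`, nothing more.
        Some(unsafe { &*(whole as *const [u8] as *const SmallMessage) })
    }

    /// The payload, i.e. everything after the tag byte.
    fn message(&self) -> &[u8] {
        &self.0[1..]
    }
}
```

Because the `&SmallMessage` is derived from a slice that already includes the payload, `message` is plain safe indexing, and I would expect Miri to accept it. The cost is that the reference becomes a fat pointer, and a single type covering small, large, and special messages would need the tag check at access time rather than in the type.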
