Dereferencing out of bounds under Miri
⚓ Rust 📅 2026-01-30 👤 surdeus
I'm trying to read data that comes in messages of the following format:
// A single zero byte signals a special message (outside of scope here)
0x0
// A message starting with a byte 1 <= tag < 0x80 is a "small" message;
// the tag is the total length, so it is immediately followed by tag - 1 message bytes
0xXY | message bytes
// A "large" message starts with a big-endian u32 whose most significant bit
// (the first bit of the first byte) is set, followed by the indicated number - 4 message bytes
0x8Y 0xYY 0xYY 0xYY | message bytes
// Example
// [12, 104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100]
//  ^^ a small message of 12 bytes
//      ^--------------------payload-----------------------^
// decodes to b"hello world"
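The format above can be sketched as a purely safe decoder over a byte slice. This is only an illustration of the wire format, not the zero-copy design discussed below; the names `Decoded` and `decode` are hypothetical, and masking off the marker bit of a large header is an assumption taken from the `load` function further down.

```rust
/// Hypothetical decoded view of one message (illustration only).
#[derive(Debug, PartialEq)]
enum Decoded<'a> {
    Special,
    Small(&'a [u8]),
    Large(&'a [u8]),
}

/// Decodes the first message in `bytes`.
/// Returns the message and the total number of bytes it occupies,
/// or `None` if the buffer is too short or the header is inconsistent.
fn decode(bytes: &[u8]) -> Option<(Decoded<'_>, usize)> {
    let tag = *bytes.first()?;
    if tag == 0 {
        // A single zero byte: special message, nothing follows.
        return Some((Decoded::Special, 1));
    }
    if tag < 0x80 {
        // Small: the tag is the total length, header byte included.
        let total = tag as usize;
        return Some((Decoded::Small(bytes.get(1..total)?), total));
    }
    // Large: big-endian u32 header; the marker bit is masked off (assumption).
    let header: [u8; 4] = bytes.get(0..4)?.try_into().ok()?;
    let total = (u32::from_be_bytes(header) & 0x7FFF_FFFF) as usize;
    // `get` returns None if total < 4 or the buffer is too short.
    Some((Decoded::Large(bytes.get(4..total)?), total))
}
```

Applied to the example bytes, `decode` yields the small variant holding b"hello world" and a total length of 12.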
My first thought was to encode this as an enum, but no repr(_) annotation seems to produce the right tag encoding here, so I went with the following union instead:
use std::mem::ManuallyDrop;
#[repr(C)]
union Payload {
    small: ManuallyDrop<SmallPayload>,
    large: ManuallyDrop<LargePayload>,
    special: SpecialPayload,
}
#[repr(C)]
struct SmallPayload {
    size: u8, // invariant: 0 < size < 0x80
}
#[repr(C, align(4))]
struct LargePayload {
    size: [u8; 4], // invariant: the most significant bit of size[0] is set (0b1...)
}
#[derive(Clone, Copy, Debug)]
#[repr(u8)] // repr(C, u8) on a fieldless enum is rejected as conflicting representation hints
enum Marker {
    SpecialPad = 0b0000_0000,
}
#[derive(Clone, Copy, Debug)]
#[repr(C)]
struct SpecialPayload {
    marker: Marker,
}
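To make the layout concrete, here is a small sketch that measures what a `&Payload` actually spans. It repeats the definitions above (with `#[repr(u8)]` on the fieldless enum) and adds a hypothetical helper, `payload_layout`, that is not part of the original code.

```rust
use std::mem::{align_of, size_of, ManuallyDrop};

#[repr(C)]
union Payload {
    small: ManuallyDrop<SmallPayload>,
    large: ManuallyDrop<LargePayload>,
    special: SpecialPayload,
}

#[repr(C)]
struct SmallPayload {
    size: u8,
}

#[repr(C, align(4))]
struct LargePayload {
    size: [u8; 4],
}

#[derive(Clone, Copy, Debug)]
#[repr(u8)]
enum Marker {
    SpecialPad = 0,
}

#[derive(Clone, Copy, Debug)]
#[repr(C)]
struct SpecialPayload {
    marker: Marker,
}

/// Hypothetical helper: (size, alignment) of `Payload` in bytes.
fn payload_layout() -> (usize, usize) {
    (size_of::<Payload>(), align_of::<Payload>())
}
```

With these definitions the helper returns (4, 4): a `&Payload` always claims four bytes at a 4-aligned address, whichever variant is meant, and never the message bytes behind them.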
Constructing these values directly would obviously be incorrect, since the message bytes are not part of the struct and must sit immediately after the Payload structure in memory. I have a load function that verifies this (it panics here if an assumption is violated; the real implementation returns an Error).
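fn load(bytes: &[u8]) -> &Payload {
    assert!(bytes.as_ptr().addr().is_multiple_of(align_of::<Payload>()), "wrong alignment");
    let size = bytes[0];
    if size == 0 {
        // special: a single zero byte, no message bytes follow
    } else if size < 0x80 {
        // small: bounds-check the message bytes that follow the tag
        let _payload = &bytes[1..size as usize];
    } else {
        // large: the length is a big-endian u32 with the marker bit masked off
        let size = u32::from_be_bytes(bytes[0..4].try_into().unwrap()) & 0x7FFF_FFFF;
        let _payload = &bytes[4..size as usize];
    }
    // Reinterpret the start of the buffer as a Payload
    unsafe { &*bytes.as_ptr().cast::<Payload>() }
}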
I then provide a wrapper to access the message bytes from a payload. For example, for a small message I have:
impl SmallPayload {
    fn message(&self) -> &[u8] {
        let message_slice = 1..self.size as usize;
        // View `self` plus the trailing message bytes as one byte slice
        let as_bytes = unsafe {
            std::slice::from_raw_parts((self as *const Self).cast(), message_slice.end)
        };
        &as_bytes[message_slice]
    }
}
Now, the problem: Miri does not agree with me.
error: Undefined Behavior: trying to retag from <512076> for SharedReadOnly permission at alloc37223[0x1], but that tag does not exist in the borrow stack for this location
As far as I understand it, the problem is that the reference to a Payload (and subsequently to a SmallPayload) is not allowed to read the bytes past its own end, even though they are in the same allocation and, in spirit at least, part of the payload object.
How should I model this to make Miri agree? For that matter, how does rkyv achieve this for its ArchivedString anyway?
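One possible direction, as a minimal sketch rather than a confirmed answer: make the reference itself span the header and the message bytes by wrapping an unsized `[u8]`, the same pattern the standard library uses for types like `Path`. The name `PayloadBytes` and its `load` and `message` methods are hypothetical, and returning `Option` instead of an Error is a simplification. This says nothing about how rkyv does it.

```rust
/// Hypothetical slice-backed view: the reference covers the header
/// *and* the message bytes, so no access goes past its end.
#[repr(transparent)]
struct PayloadBytes([u8]);

impl PayloadBytes {
    /// Validates one message at the start of `bytes` and returns a view
    /// over exactly that message, or `None` if the buffer is inconsistent.
    fn load(bytes: &[u8]) -> Option<&PayloadBytes> {
        let tag = *bytes.first()?;
        let total = if tag == 0 {
            1 // special: just the zero byte
        } else if tag < 0x80 {
            tag as usize // small: the tag is the total length
        } else {
            // large: big-endian u32 with the marker bit masked off
            let header: [u8; 4] = bytes.get(0..4)?.try_into().ok()?;
            let n = (u32::from_be_bytes(header) & 0x7FFF_FFFF) as usize;
            if n < 4 {
                return None;
            }
            n
        };
        let whole = bytes.get(..total)?;
        // SAFETY: PayloadBytes is repr(transparent) over [u8], so the cast
        // keeps the same address, length metadata, and borrowed range.
        Some(unsafe { &*(whole as *const [u8] as *const PayloadBytes) })
    }

    /// Returns the message bytes that follow the header.
    fn message(&self) -> &[u8] {
        match self.0[0] {
            0 => &[],
            1..=0x7F => &self.0[1..],
            _ => &self.0[4..],
        }
    }
}
```

Because the returned reference is derived from a slice of the full message, reading the trailing bytes stays inside the borrowed range, which should avoid the kind of retag error shown above. The trade-off is that the header fields are read through indexing instead of typed struct fields, and the alignment requirement disappears since `[u8]` has alignment 1.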