Is this possible or always UB?
Rust · 2026-02-02 · surdeus
In short: I am wondering whether I can express something in Rust/LLVM, or whether it's impossible/always UB. It involves reading memory that may be simultaneously written to by a different thread, where that memory is allocated by Rust.
On the side, I write about and learn about high-performance storage engines for fun. I want to implement a technique used in B-tree-based storage engines called "optimistic lock coupling".
The mechanism is relatively simple, but it is almost immediately UB when expressed in Rust (or in LLVM in general, in fact) in any way I can think of. I am not a hugely experienced unsafe user, though, so maybe someone can come up with a solution.
Optimistic lock coupling works as follows in this context:
We have a "frame", which (for the sake of this question) is just a struct that holds a reference to a buffer (a chunk of bytes like [u8; 4096]) and an atomic "version" variable (an AtomicU64, let's say).
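For concreteness, here is a minimal, hypothetical rendering of that frame as a Rust struct. The names `Frame`, `PAGE_SIZE`, and `is_stable` are illustrative assumptions, not from the post. The buffer sits behind an `UnsafeCell` because a writer has to mutate it through a shared reference. Note that `UnsafeCell` only permits mutation through `&Frame`; it does not make a plain read that overlaps a write well-defined, and sharing this struct across threads would additionally need an `unsafe impl Sync`, which is exactly where the question below arises.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicU64, Ordering};

/// Page size assumed for this sketch, matching the [u8; 4096] example.
const PAGE_SIZE: usize = 4096;

/// Hypothetical frame: a version counter plus a plain byte buffer.
struct Frame {
    /// Even: no write in progress. Odd: a writer is mid-update
    /// (it increments once before and once after writing).
    version: AtomicU64,
    /// Plain bytes behind UnsafeCell so a writer can mutate them through &Frame.
    buf: Box<UnsafeCell<[u8; PAGE_SIZE]>>,
}

impl Frame {
    fn new() -> Self {
        Frame {
            version: AtomicU64::new(0),
            buf: Box::new(UnsafeCell::new([0u8; PAGE_SIZE])),
        }
    }

    /// True if no write was in progress at the moment of the load.
    fn is_stable(&self) -> bool {
        self.version.load(Ordering::Acquire) % 2 == 0
    }
}

fn main() {
    let frame = Frame::new();
    // SAFETY: this is single-threaded, so no write can overlap this read.
    let first = unsafe { (*frame.buf.get())[0] };
    println!("stable={} first_byte={}", frame.is_stable(), first);
}
```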
We want to allow multiple readers (threads) to read from the frame optimistically, without modifying any shared value (no fetch_add or CAS on the version atomic, no taking a lock), because every such write forces the cache line to be invalidated in the other cores' caches. In the happy path, a reader just does: load version -> read buffer -> load version, and causes no cache invalidations.
However, a writer thread can, and at times will, write to the page while we're in the middle of reading it. This is why we check the version again at the end: the writer increments the version counter before AND after its buffer write. We WILL incur data races, but we simply discard the data read during them.
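The writer side described above can be sketched as follows. This is a minimal sketch under several assumptions not in the post: the page is stored as 512 `AtomicU64` words (the atomic layout the post later rules out for SIMD reasons, used here only so the sketch is free of data races), writers are already serialized by some exclusive latch, and `write` is a hypothetical method name. The ordering pattern (relaxed odd store, release fence, relaxed data stores, release even store) is the usual seqlock writer.

```rust
use std::sync::atomic::{fence, AtomicU64, Ordering};

/// 4096-byte page stored as 512 u64 words (an assumption of this sketch).
const WORDS: usize = 512;

struct Frame {
    version: AtomicU64,
    words: Box<[AtomicU64]>,
}

impl Frame {
    fn new() -> Self {
        Frame {
            version: AtomicU64::new(0),
            words: (0..WORDS).map(|_| AtomicU64::new(0)).collect(),
        }
    }

    /// Hypothetical writer path. Assumes writers are already serialized
    /// (e.g. by an exclusive latch); it does not exclude other writers itself.
    fn write(&self, f: impl FnOnce(&[AtomicU64])) {
        let v = self.version.load(Ordering::Relaxed);
        debug_assert!(v % 2 == 0, "write already in progress");
        // Bump #1: the version becomes odd, meaning "page is being written".
        self.version.store(v + 1, Ordering::Relaxed);
        // Orders the odd store before the data stores below: a reader that
        // sees any new data (and then issues an Acquire fence) also sees v + 1.
        fence(Ordering::Release);
        f(&self.words[..]);
        // Bump #2: the version becomes even again; Release publishes the data.
        self.version.store(v + 2, Ordering::Release);
    }
}

fn main() {
    let frame = Frame::new();
    frame.write(|words| words[0].store(42, Ordering::Relaxed));
    println!("version after one write: {}", frame.version.load(Ordering::Relaxed));
}
```

The release fence is the subtle part: without it, the data stores could become visible before the odd version, and a reader could see new data while both of its version loads still return the old even value.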
To summarize the read path:
- we (the reading thread) load the atomic version counter and remember the value
- we read the page (this involves some SIMD strcmp stuff to find our value)
- we load the atomic version counter again, and if it has changed, we discard our result from step 2 and start over.
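The three-step read path above is essentially a seqlock read. The sketch below expresses it without data races, again under assumptions not in the post: the page is stored as `AtomicU64` words read with `Relaxed` loads (the layout the author objects to for codegen reasons), writers are serialized externally, and `read`/`write` are hypothetical names. An `Acquire` fence before the second version load keeps the page loads from drifting past the re-check.

```rust
use std::sync::atomic::{fence, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// 4096-byte page stored as 512 u64 words (an assumption of this sketch).
const WORDS: usize = 512;

struct Frame {
    version: AtomicU64,
    words: Box<[AtomicU64]>,
}

impl Frame {
    fn new() -> Self {
        Frame {
            version: AtomicU64::new(0),
            words: (0..WORDS).map(|_| AtomicU64::new(0)).collect(),
        }
    }

    /// Writer: version is odd while writing. Assumes writers are serialized externally.
    fn write(&self, f: impl FnOnce(&[AtomicU64])) {
        let v = self.version.load(Ordering::Relaxed);
        self.version.store(v + 1, Ordering::Relaxed);
        fence(Ordering::Release);
        f(&self.words[..]);
        self.version.store(v + 2, Ordering::Release);
    }

    /// Optimistic reader: reruns `f` until an attempt overlaps no write.
    /// `f` may observe torn words on an attempt that later fails validation,
    /// so it must tolerate arbitrary values (e.g. no unchecked indexing with them).
    fn read<R>(&self, mut f: impl FnMut(&[AtomicU64]) -> R) -> R {
        loop {
            // Step 1: load the version; odd means a write is in progress.
            let v1 = self.version.load(Ordering::Acquire);
            if v1 % 2 == 1 {
                std::hint::spin_loop();
                continue;
            }
            // Step 2: read the page with Relaxed atomic loads (racy, but not UB).
            let result = f(&self.words[..]);
            // Step 3: the Acquire fence keeps the loads in `f` before the re-check.
            fence(Ordering::Acquire);
            if self.version.load(Ordering::Relaxed) == v1 {
                return result;
            }
            // The version moved: discard `result` and retry.
        }
    }
}

fn main() {
    let frame = Arc::new(Frame::new());
    let writer = {
        let frame = Arc::clone(&frame);
        thread::spawn(move || {
            for n in 1..=1_000u64 {
                // Each write fills the whole page with `n`.
                frame.write(|w| w.iter().for_each(|x| x.store(n, Ordering::Relaxed)));
            }
        })
    };
    for _ in 0..1_000 {
        // A validated snapshot must have all words equal.
        let consistent = frame.read(|w| {
            let first = w[0].load(Ordering::Relaxed);
            w.iter().all(|x| x.load(Ordering::Relaxed) == first)
        });
        assert!(consistent, "validated read observed a torn page");
    }
    writer.join().unwrap();
    println!("all validated reads were consistent");
}
```

This pins down the orderings but does not answer the actual question: every word still goes through an atomic load, which is what the post says blocks SIMD codegen for the strcmp step.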
The problem:
This is instantly UB: we WILL be reading while someone else is writing, so we WILL have a data race. However, putting Rust/LLVM aside for a moment, we are handling this logically: we discard the torn/invalid data, and we never do anything dangerous with it (like dereferencing it) until we have re-checked the version atomic and confirmed that our read did not overlap a write.
And so my question: is there a way to soundly express this in Rust (or LLVM in general) without UB? Or is it genuinely not expressible?
A couple notes here to preempt certain answers:
- We could use something like &[AtomicU8] or a packed &[AtomicU64], but this prevents the compiler from emitting SIMD for our strcmp logic. This is unacceptable.
- Things like read_volatile still count as UB here, because a) the data at the pointer is not "valid" for reads (it may be written to by another thread at the same time), and b) the memory does live inside a Rust allocation (source: read_volatile in std::ptr - Rust). The intent of read_volatile is to reason about memory that may change for reasons outside our program, which is not the case here.
- We could write our read logic in asm!(), but that's horrible, and it may still be UB anyway (I don't know offhand).
- This is a real technique used in real systems. Please spare me the "don't design it this way" replies.
  - http://sites.computer.org/debull/A19mar/p73.pdf
  - leanstore/backend/leanstore/sync-primitives/PageGuard.hpp at master · leanstore/leanstore · GitHub
  - (I don't read C++/Templatese, so don't ask me about that code.)
Thank you for any insight! I am quite curious about this.