Info
This post is auto-generated from RSS feed The Rust Programming Language Forum - Latest topics. Source: Premature optimization – trying to understand atomics and ordering
Premature optimization – trying to understand atomics and ordering
Rust · 2025-09-03 · surdeus
This is the first time I'm using atomics in any language. I'm hoping to receive some feedback on how I understand them.
Literature that I've already read:
In my application I found the need for a set data structure shared between threads[1]. The set only needs to support insert and remove (remove doubles as the existence check, since each item only needs to be checked once).
In its current form it's an Arc<Mutex<HashSet<T>>>. In actual usage, I found that the set is going to be empty most of the time. This leads me to think that, if I could separately track whether the set is empty, most of the checking could perhaps be done without locking and therefore be faster[2][3].
Below is the version I came up with (for the sake of readability I'm using u32 for T and thus omitting all the HashSet trait bounds):
use std::collections::HashSet;
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering::*};

use parking_lot::Mutex;

pub struct SharedSet {
    set: Arc<Mutex<HashSet<u32>>>,
    is_empty: Arc<AtomicBool>,
}

impl SharedSet {
    pub fn insert(&self, value: u32) {
        let mut set = self.set.lock();
        self.is_empty.store(false, SeqCst);
        set.insert(value);
    }

    /// Returns true if the value was present in the set.
    pub fn remove(&self, value: &u32) -> bool {
        if self.is_empty.load(Relaxed) {
            false
        } else {
            self.remove_slow(value)
        }
    }

    fn remove_slow(&self, value: &u32) -> bool {
        let mut set = self.set.lock();
        let contained = set.remove(value);
        self.is_empty.store(set.is_empty(), Release);
        contained
    }
}
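One detail the snippet leaves out is construction. A minimal sketch of a constructor (assumed here, not part of the original code) would initialize the flag to true so that it matches the initially empty set:

impl SharedSet {
    // Assumed constructor: the set starts empty, so the flag starts out true.
    pub fn new() -> Self {
        Self {
            set: Arc::new(Mutex::new(HashSet::new())),
            is_empty: Arc::new(AtomicBool::new(true)),
        }
    }
}

Since both fields are already Arcs, sharing between threads presumably happens by deriving Clone on SharedSet (or by wrapping the whole struct in an Arc); either way the handles refer to the same set and the same flag.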
Here are my thought processes on each part:
insert and SeqCst
pub fn insert(&self, value: u32) {
    let mut set = self.set.lock();
    self.is_empty.store(false, SeqCst);
    set.insert(value);
}
This sets is_empty to false before inserting the value. The intuition, and how I initially wrote it, was to set the flag afterwards:
set.insert(value);
self.is_empty.store(false, Release);
However, I believe this would allow a situation in which remove from another thread checks is_empty between set.insert and is_empty.store, and so incorrectly takes the fast path. Essentially, setting the flag before insertion prevents "false positives" (believing the set is empty when it is not), which are unacceptable for the application, in exchange for "false negatives" (believing the set is not empty when it could be), which only result in remove taking the slow (albeit correct) path.
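To make that concrete, the problematic interleaving with the flag-after-insert ordering can be written out as a timeline (thread A inserting, thread B calling remove; v is whatever value A is inserting):

// Thread A: insert(v)                  Thread B: remove(&v)
// ----------------------------------   -------------------------------------
// lock the mutex
// set.insert(v)
//                                      is_empty.load(Relaxed) reads true
//                                      -> fast path: returns false, even
//                                         though v is already in the set
// is_empty.store(false, Release)
// unlock the mutex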
My question is: is the use of SeqCst necessary here to prevent the aforementioned "false positives", i.e. to prevent the set from being populated while is_empty is still true, which, if I interpret this correctly, could theoretically happen due to compiler or hardware reordering? I understand that a Release store ensures operations before it are visible ("happen-before") to another thread after a corresponding Acquire load, but I think here I am trying to achieve somewhat the opposite: ensuring set.insert always comes after the flag is set ("NOT happen-before"?) and is not somehow flipped, however unlikely that might be, and I don't see how this could be done with any other ordering[4]. Meanwhile, a lot of places seem to suggest that SeqCst is rarely necessary, hence why I'm so iffy about this.
remove and Relaxed
pub fn remove(&self, value: &u32) -> bool {
    if self.is_empty.load(Relaxed) {
        false
    } else {
        self.remove_slow(value)
    }
}
This does a Relaxed load to decide whether to take the fast path. My reasoning for not using a stronger ordering is that insert and remove_slow already use stronger orderings to guarantee that the value of is_empty is logically correct with respect to the state of set, so a Relaxed read is allowed here, with the worst case being an occasional false negative. Is this correct?
Any resources, and critique of how I use the terminology, are also appreciated. Thanks!
[1] The actual problem: this is a packet-processing program and I need to track certain packets to prevent them from being processed more than once. It is not possible to modify or tag the packets, so I decided to temporarily store seen packets in this set and compare subsequent packets against them.
[2] I titled this "premature optimization" because I doubt lock contention will be all that noticeable in real-world use. I am also aware of mature libraries that implement concurrent sets/maps, so this is more about me trying to understand atomics. Still, in a benchmark that runs remove in a hot loop across multiple threads, in the best-case scenario (the set is empty for all but the first check), the optimized version takes 4% of the wall time of the mutex-only version.
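For reference, a benchmark of that shape could be sketched as below (an assumed harness with made-up parameters, not the one actually used for the quoted numbers): every thread calls remove on a set that stays empty, so each call should hit the fast path.

use std::time::{Duration, Instant};

// Assumed benchmark harness: `threads` threads hammer remove() on a SharedSet
// that stays empty, so every call exercises the fast path.
fn bench_remove(shared: &SharedSet, threads: usize, iters: u32) -> Duration {
    let start = Instant::now();
    std::thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for i in 0..iters {
                    // black_box keeps the call from being optimized away
                    std::hint::black_box(shared.remove(&i));
                }
            });
        }
    });
    start.elapsed()
}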
[3] RwLock didn't make sense at the time because, if an item does turn out to be in the set after checking, the lock still needs to be a write lock for the item to be removed. Although, given the usage pattern, one could argue that promoting a read lock to a write lock in such cases would still eliminate most of the contention and would likely suffice.
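For illustration, parking_lot's upgradable read lock is one way to express that read-to-write promotion; this is a minimal sketch of the check-then-upgrade idea (not code from the post), and note that parking_lot only allows one upgradable reader at a time:

use std::collections::HashSet;

use parking_lot::{RwLock, RwLockUpgradableReadGuard};

// Check under an upgradable read lock and promote to a write lock only when
// the value is actually present.
fn remove_if_present(set: &RwLock<HashSet<u32>>, value: &u32) -> bool {
    let guard = set.upgradable_read();
    if guard.contains(value) {
        let mut guard = RwLockUpgradableReadGuard::upgrade(guard);
        guard.remove(value)
    } else {
        false
    }
}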
[4] I did also ask Claude Sonnet and GPT-5 about this, but neither conversation felt trustworthy. ChatGPT did suggest replacing is_empty.store(false, SeqCst) with a Release store followed by an acquire fence, i.e. is_empty.store(false, Release); fence(Acquire), but I don't see how that is correct at all.