Info
This post is auto-generated from RSS feed The Rust Programming Language Forum - Latest topics. Source: Premature optimization – trying to understand atomics and ordering
Premature optimization – trying to understand atomics and ordering
Rust · 2025-09-03 · surdeus
This is the first time I'm using atomics in any language. I'm hoping to receive some feedback on how I understand them.
Literature that I've already read:
In my application I found the need for a set data structure shared between threads[1]. The set only needs to support insert and remove (remove doubles as the existence check, since each item only needs to be checked once).
In its current form it's an Arc<Mutex<HashSet<T>>>. In actual usage, I found that the set is going to be empty most of the time. This leads me to think that, if I could separately track whether the set is empty, most of the checking could perhaps be done without locking and therefore be faster[2][3].
Below is the version I came up with (for the sake of readability I'm using u32 for T and thus omitting all the HashSet trait bounds):
use std::collections::HashSet;
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering::*};

use parking_lot::Mutex;

pub struct SharedSet {
    set: Arc<Mutex<HashSet<u32>>>,
    is_empty: Arc<AtomicBool>,
}

impl SharedSet {
    pub fn insert(&self, value: u32) {
        let mut set = self.set.lock();
        self.is_empty.store(false, SeqCst);
        set.insert(value);
    }

    /// Returns true if the value was present in the set.
    pub fn remove(&self, value: &u32) -> bool {
        if self.is_empty.load(Relaxed) {
            false
        } else {
            self.remove_slow(value)
        }
    }

    fn remove_slow(&self, value: &u32) -> bool {
        let mut set = self.set.lock();
        let contained = set.remove(value);
        self.is_empty.store(set.is_empty(), Release);
        contained
    }
}
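One detail the snippet leaves out is construction. A minimal sketch of a constructor (assumed here, not part of the original code) would initialize the flag to true so that it matches the initially empty set:

impl SharedSet {
    // Assumed constructor: the set starts empty, so the flag starts out true.
    pub fn new() -> Self {
        Self {
            set: Arc::new(Mutex::new(HashSet::new())),
            is_empty: Arc::new(AtomicBool::new(true)),
        }
    }
}

Since both fields are already Arcs, sharing between threads presumably happens by deriving Clone on SharedSet (or by wrapping the whole struct in an Arc); either way the handles refer to the same set and the same flag.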
Here are my thought processes on each part:
insert and SeqCst
pub fn insert(&self, value: u32) {
    let mut set = self.set.lock();
    self.is_empty.store(false, SeqCst);
    set.insert(value);
}
This sets is_empty to false before inserting the value. The intuition, and how I initially wrote it, was to set the flag afterwards:
set.insert(value);
self.is_empty.store(false, Release);
However, I believe this would allow a situation in which remove from another thread checks is_empty between set.insert and is_empty.store, and so incorrectly takes the fast path. Essentially, setting the flag before insertion prevents "false positives" (believing the set is empty when it is not), which are unacceptable for the application, in exchange for "false negatives" (believing the set is not empty when it could be), which only result in remove taking the slow (albeit correct) path.
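To make that concrete, the problematic interleaving with the flag-after-insert ordering can be written out as a timeline (thread A inserting, thread B calling remove; v is whatever value A is inserting):

// Thread A: insert(v)                  Thread B: remove(&v)
// ----------------------------------   -------------------------------------
// lock the mutex
// set.insert(v)
//                                      is_empty.load(Relaxed) reads true
//                                      -> fast path: returns false, even
//                                         though v is already in the set
// is_empty.store(false, Release)
// unlock the mutex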
My question is: is the use of SeqCst necessary here to prevent the aforementioned "false positives", i.e. to prevent the set from being populated while is_empty is still true, which, if I interpret this correctly, could theoretically happen due to compiler or hardware reordering? I understand that a Release store ensures operations before it are visible ("happen-before") to another thread after a corresponding Acquire load, but I think here I am trying to achieve somewhat the opposite: ensuring set.insert always comes after the flag is set ("NOT happen-before"?) and is not somehow flipped, however unlikely that might be, and I don't see how this could be done with any other ordering[4]. Meanwhile, a lot of places seem to suggest that SeqCst is rarely necessary, hence why I'm so iffy about this.
remove and Relaxed
pub fn remove(&self, value: &u32) -> bool {
    if self.is_empty.load(Relaxed) {
        false
    } else {
        self.remove_slow(value)
    }
}
This does a Relaxed load to decide whether to take the fast path. My reasoning for not using a stronger ordering is that insert and remove_slow already use stronger orderings to guarantee that the value of is_empty is logically correct with respect to the state of set, so a Relaxed read is allowed here, with the worst case being an occasional false negative. Is this correct?
Any resources, and critique of how I use the terminology, are also appreciated. Thanks!
[1] The actual problem: this is a packet-processing program and I need to track certain packets to prevent them from being processed more than once. It is not possible to modify or tag the packets, so I decided to temporarily store seen packets in this set and compare subsequent packets against them.
[2] I titled this "premature optimization" because I doubt lock contention will be all that noticeable in real-world use. I am also aware of mature libraries that implement concurrent sets/maps, so this is more about me trying to understand atomics. Still, in a benchmark that runs remove in a hot loop across multiple threads, in the best-case scenario (the set is empty for all but the first check), the optimized version takes 4% of the wall time of the mutex-only version.
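For reference, a benchmark of that shape could be sketched as below (an assumed harness with made-up parameters, not the one actually used for the quoted numbers): every thread calls remove on a set that stays empty, so each call should hit the fast path.

use std::time::{Duration, Instant};

// Assumed benchmark harness: `threads` threads hammer remove() on a SharedSet
// that stays empty, so every call exercises the fast path.
fn bench_remove(shared: &SharedSet, threads: usize, iters: u32) -> Duration {
    let start = Instant::now();
    std::thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for i in 0..iters {
                    // black_box keeps the call from being optimized away
                    std::hint::black_box(shared.remove(&i));
                }
            });
        }
    });
    start.elapsed()
}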
[3] RwLock didn't make sense at the time because, if an item does turn out to be in the set after checking, the lock still needs to be a write lock for the item to be removed. Although, given the usage pattern, one could argue that promoting a read lock to a write lock in such cases would still eliminate most of the contention and would likely suffice.
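For illustration, parking_lot's upgradable read lock is one way to express that read-to-write promotion; this is a minimal sketch of the check-then-upgrade idea (not code from the post), and note that parking_lot only allows one upgradable reader at a time:

use std::collections::HashSet;

use parking_lot::{RwLock, RwLockUpgradableReadGuard};

// Check under an upgradable read lock and promote to a write lock only when
// the value is actually present.
fn remove_if_present(set: &RwLock<HashSet<u32>>, value: &u32) -> bool {
    let guard = set.upgradable_read();
    if guard.contains(value) {
        let mut guard = RwLockUpgradableReadGuard::upgrade(guard);
        guard.remove(value)
    } else {
        false
    }
}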
[4] I did also ask Claude Sonnet and GPT-5 about this, but neither conversation felt trustworthy. ChatGPT did suggest replacing is_empty.store(false, SeqCst) with a Release store followed by an acquire fence, i.e. is_empty.store(false, Release); fence(Acquire), but I don't see how that is correct at all.