On SpinLock performance
⚓ Rust 📅 2025-10-24 👤 surdeus 👁️ 2

I was experimenting with atomic types and, by pure chance, I stumbled upon a strange way to improve the performance of my naive spin lock, and by quite a margin (>10x on a couple of machines I tested): simply adding a sleep. I’m sure this is a known phenomenon, but I couldn’t find much about it, which is why I’m posting here.
From what I’ve read, spinlocks are sometimes frowned upon, but I wrote this as an MVP to show what I’m talking about, so, as a disclaimer, it isn’t representative of my real use case. Anyway, I’ve included my implementation and the tests I’ve been using to benchmark it.
use std::{
    cell::UnsafeCell,
    marker::PhantomData,
    ops::{Deref, DerefMut},
    sync::atomic::{AtomicBool, Ordering},
    time::Duration,
};

struct SpinLock<T> {
    inner: UnsafeCell<T>,
    locked: AtomicBool,
    with_sleep: bool,
}

impl<T> SpinLock<T> {
    pub fn new(inner: T, with_sleep: bool) -> Self {
        Self {
            inner: UnsafeCell::new(inner),
            locked: AtomicBool::new(false),
            with_sleep,
        }
    }

    pub fn lock(&self) -> LockGuard<'_, T> {
        while self.locked.swap(true, Ordering::Acquire) {
            if self.with_sleep {
                std::thread::sleep(Duration::from_nanos(1));
            }
            std::hint::spin_loop();
        }
        LockGuard {
            lock: self,
            _p: PhantomData,
        }
    }
}

unsafe impl<T> Sync for SpinLock<T> where T: Send {}

struct LockGuard<'a, T> {
    lock: &'a SpinLock<T>,
    _p: PhantomData<&'a mut T>,
}

impl<T> Deref for LockGuard<'_, T> {
    type Target = T;
    fn deref(&self) -> &Self::Target {
        unsafe { &*self.lock.inner.get() }
    }
}

impl<T> DerefMut for LockGuard<'_, T> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        unsafe { &mut *self.lock.inner.get() }
    }
}

impl<T> Drop for LockGuard<'_, T> {
    fn drop(&mut self) {
        self.lock.locked.store(false, Ordering::Release);
    }
}
#[cfg(test)]
mod nascar {
    use crate::lock::SpinLock;
    use std::sync::Mutex;

    const THREADS: usize = 4;
    const ITERATIONS: usize = 300000;

    #[test]
    fn spinlock_slow() {
        let q = SpinLock::new(Box::new(0), false);
        std::thread::scope(|s| {
            for _ in 0..THREADS {
                s.spawn(|| {
                    for _ in 0..ITERATIONS {
                        **q.lock() += 1;
                    }
                });
            }
        });
        assert_eq!(**q.lock(), ITERATIONS * THREADS);
    }

    #[test]
    fn spinlock_fast() {
        let q = SpinLock::new(Box::new(0), true);
        std::thread::scope(|s| {
            for _ in 0..THREADS {
                s.spawn(|| {
                    for _ in 0..ITERATIONS {
                        **q.lock() += 1;
                    }
                });
            }
        });
        assert_eq!(**q.lock(), ITERATIONS * THREADS);
    }

    #[test]
    fn mutex() {
        let q = Mutex::new(Box::new(0));
        std::thread::scope(|s| {
            for _ in 0..THREADS {
                s.spawn(|| {
                    for _ in 0..ITERATIONS {
                        **q.lock().unwrap() += 1;
                    }
                });
            }
        });
        assert_eq!(**q.lock().unwrap(), THREADS * ITERATIONS);
    }
}
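For reference, one commonly cited refinement of a naive spinlock is test-and-test-and-set: spin on a plain load and only attempt the swap once the lock looks free, so waiters don't constantly pull the cache line into exclusive state. A minimal sketch, using a hypothetical `TtasLock` with a closure-based API instead of a guard to keep it short:

```rust
use std::{
    cell::UnsafeCell,
    sync::atomic::{AtomicBool, Ordering},
};

// Hypothetical test-and-test-and-set variant of the lock above.
pub struct TtasLock<T> {
    locked: AtomicBool,
    inner: UnsafeCell<T>,
}

unsafe impl<T: Send> Sync for TtasLock<T> {}

impl<T> TtasLock<T> {
    pub fn new(inner: T) -> Self {
        Self {
            locked: AtomicBool::new(false),
            inner: UnsafeCell::new(inner),
        }
    }

    pub fn with_lock<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        // Only attempt the read-modify-write (swap) when a cheap load says
        // the lock looks free; while it is held, spin on the load instead
        // of hammering the line with exclusive-ownership requests.
        while self.locked.swap(true, Ordering::Acquire) {
            while self.locked.load(Ordering::Relaxed) {
                std::hint::spin_loop();
            }
        }
        let r = f(unsafe { &mut *self.inner.get() });
        self.locked.store(false, Ordering::Release);
        r
    }
}
```

The closure API also means the unlock can't be forgotten; whether this variant closes the gap with the sleep version on a given machine is something I'd measure rather than assume.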
Is the implementation correct, or is this a fluke? Why does that happen, and does it happen on all architectures (mine is x86_64)? If inserting a sleep makes it faster, is there a heuristic for picking the sleep duration? Is there any other mechanism that achieves a better result than sleeping? (e.g. I tried a short loop followed by yield_now(), but the result wasn't impressive, and also _mm_pause(), with slightly better results.)
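On the heuristic question, one standard alternative to a fixed sleep is exponential backoff: spin a bit longer after each failed attempt, and past some threshold yield to the scheduler. A sketch of just the acquire loop, reusing the same AtomicBool layout as above (the constants are illustrative, not tuned):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Acquire loop with exponential backoff; `locked` plays the same role as
// the `locked` field of the SpinLock above. Constants are illustrative.
fn lock_with_backoff(locked: &AtomicBool) {
    let mut spins: u32 = 1;
    while locked.swap(true, Ordering::Acquire) {
        for _ in 0..spins {
            std::hint::spin_loop();
        }
        if spins < 1 << 8 {
            // Double the spin budget after each failed attempt, so waiters
            // contend on the line less often the longer the lock is held.
            spins *= 2;
        } else {
            // Past the threshold, hand the core back to the scheduler.
            std::thread::yield_now();
        }
    }
}

fn unlock(locked: &AtomicBool) {
    locked.store(false, Ordering::Release);
}
```

The doubling also desynchronizes the waiters, which may be part of why the 1ns sleep helps: it stops every waiter from retrying in lockstep.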
Any feedback is appreciated!