BitWheel: A hierarchical timer wheel for low-latency event loops
⚓ Rust 📅 2025-12-15 👤 surdeus

I've been working on a custom runtime for trading systems and needed a timer driver that could handle high-frequency insert/cancel cycles with predictable latency. The standard approaches (BTreeMap, cascading timer wheels, etc.) all had characteristics that didn't quite fit, so I built BitWheel.
The problem
Event-driven systems often need to schedule a procedure to fire after some delay—then cancel it when a subsequent event makes it irrelevant. Think request timeouts, debouncing, or deferred cleanup. The typical pattern is: insert timer, wait, cancel timer because something else happened first. High insert/cancel churn with relatively few timers actually firing.
For this workload, tail latency matters more than throughput. A p99.9 spike during a burst can cascade into missed deadlines elsewhere in your system. Two things hurt tails here:
Allocations: If every insert or cancel touches the allocator, you're at the mercy of its latency distribution. BTreeMap gives you O(log n) operations, but each one may allocate or rebalance.
Cascading: Traditional hierarchical timer wheels move timers from coarser to finer gears as their deadline approaches. When many timers cascade simultaneously (e.g., a batch of requests all timing out together), you get latency spikes in your poll path.
BitWheel avoids both: pre-allocated slots with no cascading between gears.
Design decisions
No cascading: Most timer wheels move timers from coarser to finer gears as their deadline approaches. BitWheel doesn't—timers stay in their original slot until they fire. The tradeoff is precision: a timer in a higher gear (longer delay) has coarser resolution. A 1-hour timer might fire within a ~64ms window. For most applications, this is fine.
Fixed-capacity slots: Each slot is a pre-allocated slab with intrusive linked lists. Insert and remove are O(1) with no allocator calls. When a slot fills, we linear-probe to adjacent slots. This trades memory for determinism.
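The sketch below is not BitWheel's source, just the shape of the technique: a slab allocated once up front, with entries threaded into an intrusive list by index, so adding and removing a timer is index surgery with no allocator calls (per-slot lists work the same way). The `Entry`/`Slab` names and layout are illustrative assumptions.

```rust
// Illustrative sketch only, not BitWheel's internals.
const EMPTY: u32 = u32::MAX;

struct Entry<T> {
    timer: Option<T>,
    next: u32, // index of the next entry in this list, EMPTY at the tail
}

struct Slab<T> {
    entries: Vec<Entry<T>>, // allocated once at construction
    free_head: u32,         // intrusive free list threaded through `next`
}

impl<T> Slab<T> {
    fn with_capacity(cap: u32) -> Self {
        let entries = (0..cap)
            .map(|i| Entry {
                timer: None,
                next: if i + 1 < cap { i + 1 } else { EMPTY },
            })
            .collect();
        Slab { entries, free_head: if cap > 0 { 0 } else { EMPTY } }
    }

    // O(1), no allocation; returns None when the slab is full, which is where
    // the fallible insert described next comes from.
    fn alloc(&mut self, timer: T) -> Option<u32> {
        let idx = self.free_head;
        if idx == EMPTY {
            return None;
        }
        self.free_head = self.entries[idx as usize].next;
        self.entries[idx as usize] = Entry { timer: Some(timer), next: EMPTY };
        Some(idx)
    }

    // O(1), no deallocation; the entry goes back on the free list.
    fn release(&mut self, idx: u32) -> Option<T> {
        let timer = self.entries[idx as usize].timer.take();
        self.entries[idx as usize].next = self.free_head;
        self.free_head = idx;
        timer
    }
}
```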
Insert can fail: Unlike most timer implementations, insert() returns Result. If your slots overflow, that's a signal to apply backpressure—your configuration doesn't match your workload. For systems where this is unacceptable, BitWheelWithFailover catches overflow in a BTreeMap.
Skip-ahead: The wheel tracks the next fire time globally. Empty polls jump directly there instead of iterating idle slots.
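Again a sketch rather than the actual field names: cache the earliest outstanding deadline so an idle poll is one comparison instead of a walk over empty slots. A cancelled timer can leave the cache slightly conservative, which costs one wasted scan but never a missed deadline.

```rust
use std::time::Instant;

// Sketch of the skip-ahead idea; names are made up for illustration.
struct SkipAhead {
    next_fire: Option<Instant>, // earliest deadline seen so far
}

impl SkipAhead {
    // Called on every insert: keep the cached minimum up to date.
    fn note_deadline(&mut self, deadline: Instant) {
        self.next_fire = Some(match self.next_fire {
            Some(current) if current <= deadline => current,
            _ => deadline,
        });
    }

    // Called at the top of poll: if nothing can be due yet, return without
    // touching any slot.
    fn anything_due(&self, now: Instant) -> bool {
        matches!(self.next_fire, Some(t) if t <= now)
    }
}
```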
Performance
Tested against BTreeMap and hierarchical_hash_wheel_timer (HHWT in the table below) on a realistic trading workload (80% inserts, 5% cancels, continuous polling):
| Operation | BitWheel | BitWheel+Failover | BTreeMap | HHWT |
|---|---|---|---|---|
| Insert p50 | 26 ns | 27 ns | 42 ns | 28 ns |
| Insert p99.9 | 61 ns | 66 ns | 79 ns | 87 ns |
| Poll p50 | 15 ns | 17 ns | 17 ns | 19 ns |
| Poll p99.9 | 61 ns | 65 ns | 149 ns | 1515 ns |
| Cancel p50 | 14 ns | 14 ns | 54 ns | 20 ns |
| Cancel p99.9 | 33 ns | 36 ns | 156 ns | 45 ns |
The HHWT poll tail is the cascading cost—when timers migrate between gears, you pay for it.
Usage
```rust
use bitwheel::{BalancedWheel, Timer};
use std::time::{Duration, Instant};

struct OrderTimeout { order_id: u64 }

impl Timer for OrderTimeout {
    type Context = Vec<u64>;

    fn fire(&mut self, _now: Instant, ctx: &mut Self::Context) -> Option<Instant> {
        ctx.push(self.order_id);
        None // one-shot
    }
}

let mut wheel: Box<BalancedWheel<OrderTimeout>> = BalancedWheel::boxed();

// Insert
let handle = wheel.insert(Instant::now() + Duration::from_millis(100),
                          OrderTimeout { order_id: 123 })?;

// Cancel (order filled)
wheel.cancel(handle);

// Poll in event loop
let mut fired = Vec::new();
wheel.poll(Instant::now(), &mut fired)?;
```
The wheel is generic over a Timer trait. Returning Some(next_instant) from fire() reschedules the timer (useful for periodic heartbeats).
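A periodic heartbeat, for example, could look like this; the only assumption beyond the snippet above is that the returned instant is when the timer is re-armed, as just described.

```rust
use bitwheel::Timer;
use std::time::{Duration, Instant};

// Counts heartbeats in its context and re-arms itself each time it fires.
struct Heartbeat {
    interval: Duration,
}

impl Timer for Heartbeat {
    type Context = u64; // number of heartbeats emitted so far

    fn fire(&mut self, now: Instant, ctx: &mut Self::Context) -> Option<Instant> {
        *ctx += 1;
        // Some(..) reschedules; returning None here would make it one-shot.
        Some(now + self.interval)
    }
}
```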
When to use this
BitWheel is designed for single-threaded event loops where you control the poll cadence. It's a good fit if:
- You're building a custom runtime and need a timer driver
- You have high insert/cancel churn (order timeouts, request deadlines)
- Tail latency matters more than memory footprint
- You can tolerate coarser precision for long-delay timers
BitWheel has no internal synchronization, but that doesn't preclude use in multi-threaded systems. The typical pattern is one wheel per thread via thread-local storage—each thread owns its wheel exclusively, no locking required.
What BitWheel doesn't provide is a thread-safe API for cross-thread timer scheduling. If you need to schedule a timer from thread A to fire on thread B, you'll need your own channel or queue in front of it.
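Here's a hedged sketch of that pattern, reusing the OrderTimeout type from the Usage snippet. The request enum and channel are my own scaffolding, not part of BitWheel's API.

```rust
use bitwheel::BalancedWheel;
use std::sync::mpsc;
use std::time::Instant;

// Scaffolding around BitWheel, not part of it: other threads send requests,
// and only the owning thread ever touches the wheel.
enum TimerRequest {
    Schedule { deadline: Instant, order_id: u64 },
}

fn timer_thread(rx: mpsc::Receiver<TimerRequest>) {
    // Reuses OrderTimeout from the Usage snippet above.
    let mut wheel: Box<BalancedWheel<OrderTimeout>> = BalancedWheel::boxed();
    let mut fired: Vec<u64> = Vec::new();

    loop {
        // Drain cross-thread scheduling requests into the thread-owned wheel.
        while let Ok(TimerRequest::Schedule { deadline, order_id }) = rx.try_recv() {
            // Overflow handling elided; a full slot is a backpressure signal.
            let _ = wheel.insert(deadline, OrderTimeout { order_id });
        }

        // Poll as usual; no lock is needed because nothing else can reach the wheel.
        fired.clear();
        let _ = wheel.poll(Instant::now(), &mut fired);

        // ... handle `fired`, run the rest of the event loop ...
    }
}
```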
Links
Feedback welcome—particularly on the API design and whether the tradeoffs make sense for your use cases.