BitWheel: A hierarchical timer wheel for low-latency event loops
⚓ Rust 📅 2025-12-15 👤 surdeus

I've been working on a custom runtime for trading systems and needed a timer driver that could handle high-frequency insert/cancel cycles with predictable latency. The standard approaches (BTreeMap, cascading timer wheels, etc.) all had characteristics that didn't quite fit, so I built BitWheel.
The problem
Event-driven systems often need to schedule a procedure to fire after some delay—then cancel it when a subsequent event makes it irrelevant. Think request timeouts, debouncing, or deferred cleanup. The typical pattern is: insert timer, wait, cancel timer because something else happened first. High insert/cancel churn with relatively few timers actually firing.
For this workload, tail latency matters more than throughput. A p99.9 spike during a burst can cascade into missed deadlines elsewhere in your system. Two things hurt tails here:
Allocations: If every insert or cancel touches the allocator, you're at the mercy of its latency distribution. BTreeMap gives you O(log n) operations, but each one may allocate or rebalance.
Cascading: Traditional hierarchical timer wheels move timers from coarser to finer gears as their deadline approaches. When many timers cascade simultaneously (e.g., a batch of requests all timing out together), you get latency spikes in your poll path.
BitWheel avoids both: pre-allocated slots with no cascading between gears.
Design decisions
No cascading: Most timer wheels move timers from coarser to finer gears as their deadline approaches. BitWheel doesn't—timers stay in their original slot until they fire. The tradeoff is precision: a timer in a higher gear (longer delay) has coarser resolution. A 1-hour timer might fire within a ~64ms window. For most applications, this is fine.
Fixed-capacity slots: Each slot is a pre-allocated slab with intrusive linked lists. Insert and remove are O(1) with no allocator calls. When a slot fills, we linear-probe to adjacent slots. This trades memory for determinism.
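The sketch below is not BitWheel's source, just the shape of the technique: a slab allocated once up front, with entries threaded into an intrusive list by index, so adding and removing a timer is index surgery with no allocator calls (per-slot lists work the same way). The `Entry`/`Slab` names and layout are illustrative assumptions.

```rust
// Illustrative sketch only, not BitWheel's internals.
const EMPTY: u32 = u32::MAX;

struct Entry<T> {
    timer: Option<T>,
    next: u32, // index of the next entry in this list, EMPTY at the tail
}

struct Slab<T> {
    entries: Vec<Entry<T>>, // allocated once at construction
    free_head: u32,         // intrusive free list threaded through `next`
}

impl<T> Slab<T> {
    fn with_capacity(cap: u32) -> Self {
        let entries = (0..cap)
            .map(|i| Entry {
                timer: None,
                next: if i + 1 < cap { i + 1 } else { EMPTY },
            })
            .collect();
        Slab { entries, free_head: if cap > 0 { 0 } else { EMPTY } }
    }

    // O(1), no allocation; returns None when the slab is full, which is where
    // the fallible insert described next comes from.
    fn alloc(&mut self, timer: T) -> Option<u32> {
        let idx = self.free_head;
        if idx == EMPTY {
            return None;
        }
        self.free_head = self.entries[idx as usize].next;
        self.entries[idx as usize] = Entry { timer: Some(timer), next: EMPTY };
        Some(idx)
    }

    // O(1), no deallocation; the entry goes back on the free list.
    fn release(&mut self, idx: u32) -> Option<T> {
        let timer = self.entries[idx as usize].timer.take();
        self.entries[idx as usize].next = self.free_head;
        self.free_head = idx;
        timer
    }
}
```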
Insert can fail: Unlike most timer implementations, insert() returns Result. If your slots overflow, that's a signal to apply backpressure—your configuration doesn't match your workload. For systems where this is unacceptable, BitWheelWithFailover catches overflow in a BTreeMap.
Skip-ahead: The wheel tracks the next fire time globally. Empty polls jump directly there instead of iterating idle slots.
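Again a sketch rather than the actual field names: cache the earliest outstanding deadline so an idle poll is one comparison instead of a walk over empty slots. A cancelled timer can leave the cache slightly conservative, which costs one wasted scan but never a missed deadline.

```rust
use std::time::Instant;

// Sketch of the skip-ahead idea; names are made up for illustration.
struct SkipAhead {
    next_fire: Option<Instant>, // earliest deadline seen so far
}

impl SkipAhead {
    // Called on every insert: keep the cached minimum up to date.
    fn note_deadline(&mut self, deadline: Instant) {
        self.next_fire = Some(match self.next_fire {
            Some(current) if current <= deadline => current,
            _ => deadline,
        });
    }

    // Called at the top of poll: if nothing can be due yet, return without
    // touching any slot.
    fn anything_due(&self, now: Instant) -> bool {
        matches!(self.next_fire, Some(t) if t <= now)
    }
}
```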
Performance
Tested against BTreeMap and hierarchical_hash_wheel_timer (HHWT in the table below) on a realistic trading workload (80% inserts, 5% cancels, continuous polling):
| Operation | BitWheel | BitWheel+Failover | BTreeMap | HHWT |
|---|---|---|---|---|
| Insert p50 | 26 ns | 27 ns | 42 ns | 28 ns |
| Insert p99.9 | 61 ns | 66 ns | 79 ns | 87 ns |
| Poll p50 | 15 ns | 17 ns | 17 ns | 19 ns |
| Poll p99.9 | 61 ns | 65 ns | 149 ns | 1515 ns |
| Cancel p50 | 14 ns | 14 ns | 54 ns | 20 ns |
| Cancel p99.9 | 33 ns | 36 ns | 156 ns | 45 ns |
The HHWT poll tail is the cascading cost—when timers migrate between gears, you pay for it.
Usage
```rust
use bitwheel::{BalancedWheel, Timer};
use std::time::{Duration, Instant};

struct OrderTimeout { order_id: u64 }

impl Timer for OrderTimeout {
    type Context = Vec<u64>;

    fn fire(&mut self, _now: Instant, ctx: &mut Self::Context) -> Option<Instant> {
        ctx.push(self.order_id);
        None // one-shot
    }
}

let mut wheel: Box<BalancedWheel<OrderTimeout>> = BalancedWheel::boxed();

// Insert
let handle = wheel.insert(Instant::now() + Duration::from_millis(100),
                          OrderTimeout { order_id: 123 })?;

// Cancel (order filled)
wheel.cancel(handle);

// Poll in event loop
let mut fired = Vec::new();
wheel.poll(Instant::now(), &mut fired)?;
```
The wheel is generic over a Timer trait. Returning Some(next_instant) from fire() reschedules the timer (useful for periodic heartbeats).
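A periodic heartbeat, for example, could look like this; the only assumption beyond the snippet above is that the returned instant is when the timer is re-armed, as just described.

```rust
use bitwheel::Timer;
use std::time::{Duration, Instant};

// Counts heartbeats in its context and re-arms itself each time it fires.
struct Heartbeat {
    interval: Duration,
}

impl Timer for Heartbeat {
    type Context = u64; // number of heartbeats emitted so far

    fn fire(&mut self, now: Instant, ctx: &mut Self::Context) -> Option<Instant> {
        *ctx += 1;
        // Some(..) reschedules; returning None here would make it one-shot.
        Some(now + self.interval)
    }
}
```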
When to use this
BitWheel is designed for single-threaded event loops where you control the poll cadence. It's a good fit if:
- You're building a custom runtime and need a timer driver
- You have high insert/cancel churn (order timeouts, request deadlines)
- Tail latency matters more than memory footprint
- You can tolerate coarser precision for long-delay timers
BitWheel has no internal synchronization, but that doesn't preclude use in multi-threaded systems. The typical pattern is one wheel per thread via thread-local storage—each thread owns its wheel exclusively, no locking required.
What BitWheel doesn't provide is a thread-safe API for cross-thread timer scheduling. If you need to schedule a timer from thread A to fire on thread B, you'll need your own channel or queue in front of it.
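Here's a hedged sketch of that pattern, reusing the OrderTimeout type from the Usage snippet. The request enum and channel are my own scaffolding, not part of BitWheel's API.

```rust
use bitwheel::BalancedWheel;
use std::sync::mpsc;
use std::time::Instant;

// Scaffolding around BitWheel, not part of it: other threads send requests,
// and only the owning thread ever touches the wheel.
enum TimerRequest {
    Schedule { deadline: Instant, order_id: u64 },
}

fn timer_thread(rx: mpsc::Receiver<TimerRequest>) {
    // Reuses OrderTimeout from the Usage snippet above.
    let mut wheel: Box<BalancedWheel<OrderTimeout>> = BalancedWheel::boxed();
    let mut fired: Vec<u64> = Vec::new();

    loop {
        // Drain cross-thread scheduling requests into the thread-owned wheel.
        while let Ok(TimerRequest::Schedule { deadline, order_id }) = rx.try_recv() {
            // Overflow handling elided; a full slot is a backpressure signal.
            let _ = wheel.insert(deadline, OrderTimeout { order_id });
        }

        // Poll as usual; no lock is needed because nothing else can reach the wheel.
        fired.clear();
        let _ = wheel.poll(Instant::now(), &mut fired);

        // ... handle `fired`, run the rest of the event loop ...
    }
}
```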
Links
Feedback welcome—particularly on the API design and whether the tradeoffs make sense for your use cases.