Taskvisor 0.2: event-driven task supervision for Tokio (restart policies, backoff, event bus)

Rust · 2026-06-11 · surdeus

Hi all!
I just released taskvisor 0.2.1 and this is its first public announcement.

taskvisor is a small library on top of Tokio (no unsafe, no heavy deps) that runs your background tasks, restarts them according to a per-task policy, and publishes a structured event for every lifecycle step.

What you get:

  • Restart policies as data: Never, OnFailure, Always { interval } per task;
  • Backoff with jitter: exponential / constant; Full / Equal / Decorrelated jitter;
  • A lifecycle event bus: implement one trait method, on_event, for metrics, alerts, and logging. Each subscriber gets its own bounded queue, so slow subscribers never block the runtime;
  • Panics are supervised too: a panicking task is caught, surfaced as a failure event, and retried per policy (it won't take down the process or leak);
  • Dynamic management: add / cancel / remove tasks at runtime, addressed by a runtime TaskId;
  • Optional admission control (feature: controller): named slots with Queue, Replace, DropIfRunning policies;
  • Graceful shutdown with a grace period, then force-abort of stragglers.
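
For readers who have not met the jitter names before, here is a rough, standalone sketch of how exponential backoff with Full, Equal, and Decorrelated jitter is commonly computed. It does not use taskvisor's API: the Backoff struct, its fields, and the delay function are hypothetical names for illustration, and the formulas follow the widely used definitions, not the crate's source. The random sample u is passed in as a parameter so the sketch stays deterministic and dependency-free.

```rust
use std::time::Duration;

/// Jitter strategies. The formulas used below are the commonly cited ones
/// and are an assumption here, not taken from taskvisor's source.
#[derive(Clone, Copy, Debug)]
pub enum Jitter {
    None,
    Full,
    Equal,
    Decorrelated,
}

/// Hypothetical backoff parameters. Set `factor` to 1.0 for constant backoff.
#[derive(Clone, Copy, Debug)]
pub struct Backoff {
    pub base: Duration,
    pub factor: f64,
    pub max: Duration,
    pub jitter: Jitter,
}

impl Backoff {
    /// Un-jittered delay for a zero-based attempt:
    /// base * factor^attempt, capped at `max`.
    pub fn raw_delay(&self, attempt: u32) -> Duration {
        let exp = attempt.min(i32::MAX as u32) as i32;
        let secs = self.base.as_secs_f64() * self.factor.powi(exp);
        Duration::from_secs_f64(secs.min(self.max.as_secs_f64()))
    }

    /// Jittered delay. `prev` is the previous delay (only used by
    /// `Decorrelated`); `u` is a uniform random sample in [0, 1).
    pub fn delay(&self, attempt: u32, prev: Duration, u: f64) -> Duration {
        let cap = self.max.as_secs_f64();
        let raw = self.raw_delay(attempt).as_secs_f64();
        let secs = match self.jitter {
            Jitter::None => raw,
            // Anywhere between zero and the raw delay.
            Jitter::Full => u * raw,
            // Half fixed, half random.
            Jitter::Equal => raw / 2.0 + u * raw / 2.0,
            // Between `base` and three times the previous delay.
            Jitter::Decorrelated => {
                let base = self.base.as_secs_f64();
                let hi = (prev.as_secs_f64() * 3.0).max(base);
                base + u * (hi - base)
            }
        };
        Duration::from_secs_f64(secs.min(cap))
    }
}
```

In real code u would come from a random number generator (for example the rand crate). Full jitter spreads retries the most, Equal keeps a guaranteed minimum wait, and Decorrelated grows from the previous delay instead of from the attempt number.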

Example:
A flaky task that fails twice and then succeeds, with a subscriber printing what the supervisor does:

use std::sync::Arc;
use std::sync::atomic::{AtomicU32, Ordering};

use taskvisor::prelude::*;

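// Subscriber that prints the kind of every event that carries a task name.
struct Printer;
impl Subscribe for Printer {
    fn on_event(&self, ev: &Event) {
        // Events without a task name are skipped.
        if let Some(task) = ev.task.as_deref() {
            println!("  {:?} (task={task})", ev.kind);
        }
    }
}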

#[tokio::main(flavor = "current_thread")]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
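    // Shared attempt counter: the first two runs fail, the third succeeds.
    let attempts = Arc::new(AtomicU32::new(0));
    let flaky: TaskRef = TaskFn::arc("flaky", move |_ctx| {
        // Each run gets its own handle to the counter.
        let attempts = Arc::clone(&attempts);
        async move {
            if attempts.fetch_add(1, Ordering::Relaxed) < 2 {
                Err(TaskError::Fail { reason: "boom".into(), exit_code: None })
            } else {
                Ok(())
            }
        }
    });

    // Restartable spec, so the failed runs are retried.
    let spec = TaskSpec::restartable(flaky);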
    Supervisor::new(SupervisorConfig::default(), vec![Arc::new(Printer)])
        .run(vec![spec])
        .await?;
    Ok(())
}

Resulting event sequence:
TaskAddRequested -> TaskAdded -> TaskStarting -> TaskFailed -> BackoffScheduled -> TaskStarting -> TaskFailed -> BackoffScheduled -> TaskStarting -> TaskStopped -> ActorExhausted -> TaskRemoved
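
The example above relies on the property that a slow subscriber never blocks the runtime. As a rough illustration of how a bus with one bounded queue per subscriber can provide that, here is a std-only sketch. It is not taskvisor's implementation: Bus, its Event type, and the drop-on-full policy are assumptions made for this sketch, and the crate may handle overflow differently.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical event type for this sketch (not taskvisor's `Event`).
#[derive(Clone, Debug, PartialEq)]
pub struct Event {
    pub kind: &'static str,
}

/// Hypothetical bus: one bounded queue per subscriber.
#[derive(Default)]
pub struct Bus {
    subscribers: Vec<SyncSender<Event>>,
    dropped: usize,
}

impl Bus {
    /// Register a subscriber with its own bounded queue and return the
    /// receiving end. `capacity` should be at least 1.
    pub fn subscribe(&mut self, capacity: usize) -> Receiver<Event> {
        let (tx, rx) = sync_channel(capacity);
        self.subscribers.push(tx);
        rx
    }

    /// Publish without ever blocking the caller. If a subscriber's queue is
    /// full, the event is dropped for that subscriber only and counted.
    /// Subscribers whose receiver is gone are removed.
    pub fn publish(&mut self, ev: &Event) {
        let mut dropped = 0;
        self.subscribers.retain(|tx| match tx.try_send(ev.clone()) {
            Ok(()) => true,
            Err(TrySendError::Full(_)) => {
                dropped += 1;
                true
            }
            Err(TrySendError::Disconnected(_)) => false,
        });
        self.dropped += dropped;
    }

    /// Number of events dropped because a queue was full.
    pub fn dropped(&self) -> usize {
        self.dropped
    }

    /// Number of subscribers still registered.
    pub fn subscriber_count(&self) -> usize {
        self.subscribers.len()
    }
}
```

The key design choice is try_send instead of send: the publisher pays a constant cost per subscriber, and a full queue affects only the subscriber that owns it.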

What it is not:

  • not an actor framework
  • not a job queue
  • not a tower replacement

Where it fits:
taskvisor is for long-running services that own a set of resident background tasks.
The things that must be running the whole time the process is up:

  • queue consumers;
  • pollers;
  • sync loops;
  • connection keepers;
  • periodic jobs;
  • embedded workers.

If a task dies, you want it restarted with backoff;
if it misbehaves, you want to see it (metrics, alerts);
if the set changes at runtime, you want to add and remove tasks without restarting the service.
That's the niche.

Links:
crates.io · docs.rs · github · examples


193 tests, #![forbid(unsafe_code)], MSRV 1.90.

taskvisor is the supervision core of a larger toolkit I'm building (subprocess execution, HTTP/gRPC control plane), but it stands on its own.

I'd especially appreciate feedback on the SupervisorHandle API (TaskId-based addressing) and the slot/admission-control design.

Thanks!
