Tokio's `block_in_place` and deferred wakers

⚓ Rust    📅 2026-02-02    👤 surdeus

Hi, everyone.

So, in the project I'm working on there are a number of long-running "subsystem" tasks, each of which looks like this (pseudo-code):

let subsystem_object = ...;
loop {
    select! {
        biased;

        result = (&mut shutdown_rx) => {
            // shutdown_rx is a one-shot receiver indicating
            // that the app is being shut down.
            break;
        }
        
        Some(action) = action_rx.recv() => {
            // "action_rx" is mpsc::UnboundedReceiver.
            // "action" contains a closure (returning a future) to run on
            // subsystem_object and a one-shot sender to send the result
            // back to the "caller".
            run the closure on subsystem_object, await, and send the result.
        }
    }
}

Normally, a "subsystem object" is async, but there are two subsystems, A and B, whose objects are sync.
A's object needs to call B, so it sends an action to B's channel as usual and then blocks on the result inside block_in_place.

I know this is not pretty, but IMO it should work: the blocking call is only made in one direction, from A to B, and there is at most one such call in flight at a time.
However, in one particular scenario (a slow machine under heavy load, where other subsystems bombard B with actions to perform), B stalls completely and stays stalled until shutdown is initiated.

When it happens, I can see the following:

  1. B is not coping well with the load: its action_rx usually contains over a dozen unprocessed actions.
  2. Right before calling block_in_place, A switches to the same thread where B is running.
  3. After A calls block_in_place, B is never polled again until shutdown.

After browsing the Tokio (1.49) source code for a while, I think the following might be happening under the hood:

  1. B's budget is exhausted at some point, so when it's polled, select!'s future returns Pending and the waker is deferred.
  2. When A calls block_in_place, the current "worker core" gets stolen and put on another thread via spawn_blocking. But the deferred waker stays on the current thread and it's only triggered once the blocking call completes.
  3. Since B's action channel already had a bunch of items in it when it was last checked, it didn't register a waker, so sending new items to the channel won't wake it up.
  4. A's call to block_in_place can only finish when B handles the action, but B won't be informed of new actions until block_in_place finishes, because A and B are both on the same thread.

The stalling goes away if I wrap the whole select! in a call to task::unconstrained.

So my questions are:

  • First of all, is my understanding of the Tokio machinery correct?

  • If so, can the described behavior be considered a bug that should be reported? Perhaps the deferred waker should be triggered before the blocking call is made.

  • Is there a better workaround than using unconstrained?
    I understand that the correct solution is to at least make A async and use spawn_blocking instead of block_in_place (or, ideally, to make both A and B async and get rid of the blocking calls altogether), but that would be a major refactoring.
    So for now I'm looking for an "easy" workaround.
