Need help finding thread starvation source
I have a service that experiences severe thread starvation, but I can't seem to pin down the reason. The details follow.
The service consists of the following parts, mainly:
- A `tokio` runtime without a specified flavour, though I guess it ends up single-threaded because of the k8s setup (setting a higher worker thread count manually did not remove all starvation, but I didn't check whether it improved things); a minimal example of what I mean by pinning the flavour is sketched right after this list
- Running in k8s with less than 1 CPU in limits, but CPU consumption is usually less than one percent of requests or limits
- Several `rdkafka` consumers for separate brokers
- A single `flume` MPMC channel that was supposed to allow for parallel processing
- A single consumer of that channel; for some messages this consumer makes network requests with `reqwest`
- Several `rdkafka` producers that publish all of the results from (5)
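By "specified flavour" I mean something like the explicit form below. This is just a minimal sketch; the worker count of 4 is an arbitrary example, not what the service actually uses.

```rust
// Minimal sketch of pinning the runtime flavour and worker count explicitly,
// instead of relying on whatever the default picks up from the environment.
// The worker_threads value of 4 is an arbitrary example.
#[tokio::main(flavor = "multi_thread", worker_threads = 4)]
async fn main() {
    // ... set up consumers, the flume channel, and producers here ...
}
```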
What I tried:
- Added measurements for all the futures in my code; they show that many of them take longer than 0.1ms to complete, but practically none take more than 3ms
- Measured poll delays in the same wrapper (a simplified sketch of it follows this list); those show that futures regularly go more than 1 second between polls, sometimes more than 10 seconds, and upwards of 1000 seconds in some cases
- Used `tokio-console` to try to find very busy tasks, but the busiest ones showed 30s Busy vs 2h Idle, and those were the axum tasks serving Prometheus metrics. `tokio-console` also pointed to blocking DNS resolution that takes about 200ms roughly half of the time, but AFAIK reqwest does that resolution on a separate thread so it doesn't block everything
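The wrapper mentioned above is roughly like the sketch below (simplified, not my exact code; the 100ms threshold and the `name` label are placeholders picked for the example):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};
use std::time::Instant;

// Simplified sketch of the measurement wrapper: it records the time between
// consecutive polls of the inner future and reports unusually long gaps.
struct PollGap<F> {
    inner: Pin<Box<F>>,
    last_poll: Option<Instant>,
    name: &'static str,
}

impl<F> PollGap<F> {
    fn new(name: &'static str, inner: F) -> Self {
        Self { inner: Box::pin(inner), last_poll: None, name }
    }
}

impl<F: Future> Future for PollGap<F> {
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // All fields are Unpin (the inner future is boxed), so it is fine to
        // take a plain mutable reference here.
        let this = self.get_mut();
        let now = Instant::now();
        if let Some(prev) = this.last_poll {
            let gap = now - prev;
            if gap.as_millis() > 100 {
                eprintln!("{}: {} ms between polls", this.name, gap.as_millis());
            }
        }
        this.last_poll = Some(now);
        this.inner.as_mut().poll(cx)
    }
}
```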
Right now I'm trying to log when every future wrapper finishes a poll and correlate that with the moments other futures are woken, but this feels like the wrong direction: full traces from tokio show very few wake-ups during the periods when many futures are waiting to be woken, so it feels like there should be some task with a massive poll duration, yet there is none.
An example: at a time T1 several future wrappers report that they were not polled for 3 seconds, but the TRACE logs for the previous 3 seconds only show a handful of wake-ups for tasks with long numerical IDs (I also can't connect those to my own tasks, because the latter seem to have short two-digit IDs, not seven or more digits). I can provide cleaned logs if needed, but right now I need advice on how to proceed.
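One thing I'm considering to make the task IDs in the traces legible is naming tasks at spawn time. As far as I understand this needs `RUSTFLAGS="--cfg tokio_unstable"` plus tokio's `tracing` feature, and the name below is just an example:

```rust
// Hypothetical example: naming a task so tokio-console and trace output show
// a readable name instead of only a numeric ID. Requires tokio_unstable.
let handle = tokio::task::Builder::new()
    .name("flume-consumer")
    .spawn(async move {
        // ... consume from the flume channel, make the reqwest calls ...
    })
    .expect("failed to spawn named task");
```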
Now I also think that a long wait to be polled might be OK for IO operations, but the main symptom that started this whole investigation is that network interaction sometimes behaves like the following after I wrapped the networking in `tokio::time::timeout` (a simplified sketch of the wrapping follows the list):
- Connection is created and the `await` point is reached
- The timeout of 30 seconds is reached
- Only then does a poll of the connection happen; some data is transmitted and received
- The body is about to be read from the result, but that's another `await`, and by this time we're past the timeout
- We get a timeout error, even though the actual network request has only just started
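For context, the wrapping looks roughly like this (a simplified sketch; `client`, `url`, and the use of `anyhow` are just placeholders to keep the example short). Both awaits share the single 30-second budget, which is why the body read above lands past the deadline:

```rust
use std::time::Duration;

// Simplified sketch of how the networking is wrapped in a single timeout.
// Both awaits (sending the request and reading the body) share the 30s budget,
// so a task that isn't polled for ~30s burns the whole budget before the
// request has really started.
async fn fetch(client: &reqwest::Client, url: &str) -> anyhow::Result<String> {
    let body = tokio::time::timeout(Duration::from_secs(30), async {
        let resp = client.get(url).send().await?; // await: connect + response headers
        let text = resp.text().await?;            // await: body read
        Ok::<_, reqwest::Error>(text)
    })
    .await??; // first ? is the timeout (Elapsed), second ? is the reqwest error
    Ok(body)
}
```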
There was also a suspicious threshold of 3600 seconds, at which point network interactions without the timeout wrapper timed out without any legible result; I suspect that's some internal maximum poll-wait timeout or something?
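One thing I plan to try is pinning every client-side timeout explicitly, to rule out reqwest's own defaults as the source of that behaviour. The values below are arbitrary examples, not recommendations:

```rust
use std::time::Duration;

// Sketch: setting client-side timeouts explicitly so that any remaining hang
// can't be attributed to the HTTP client's defaults. Values are arbitrary.
let client = reqwest::Client::builder()
    .connect_timeout(Duration::from_secs(10))
    .timeout(Duration::from_secs(60))
    .pool_idle_timeout(Duration::from_secs(30))
    .tcp_keepalive(Duration::from_secs(60))
    .build()
    .expect("failed to build reqwest client");
```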