Streaming CSV allocates too much memory

surdeus

I have a problem with my containers OOMing in production.

What seems to be the problem is that streaming data via reqwest and processing it through csv_async allocates memory for the full file being read:

		let resp = client
			.get(&url)
			.send_with_digest_auth(&self.i.config.credentials.user, &self.i.config.credentials.password)
			.await
			.with_context(|| format!("get feed: '{url}'"))?
			.error_for_status()
			.with_context(|| format!("get feed status: '{url}'"))?;
		// Adapt the response body stream into an AsyncRead and decompress it on the fly.
		let r = StreamReader::new(resp.bytes_stream().map_err(std::io::Error::other));
		let br = BufReader::new(r);
		let mut dec = GzipDecoder::new(br);
		dec.multiple_members(true);
		let mut reader = csv_async::AsyncReaderBuilder::new()
			.delimiter(b',')
			.double_quote(true)
			.has_headers(true)
			.create_reader(dec);
		// Stream records one at a time; `records` borrows `reader` mutably.
		let mut records = reader.records();
		while let Some(record) = records.next().await {
		    // ... processing data
		}

Even if I don't do any processing, the full size of the file being read ends up allocated in memory, which can be tens of GiB.

I need to change this so only the "current window" of the stream is allocated (which is what I had expected to begin with).
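
Independently of the root cause, one thing I may try is reading into a single reused buffer instead of pulling owned records off the stream, so every row overwrites the same allocation. A minimal sketch, assuming csv_async's `read_byte_record` API and the same `dec` decoder as above:

	let mut reader = csv_async::AsyncReaderBuilder::new()
		.delimiter(b',')
		.double_quote(true)
		.has_headers(true)
		.create_reader(dec);
	// One buffer reused for every row; read_byte_record returns Ok(false) at EOF.
	let mut record = csv_async::ByteRecord::new();
	while reader.read_byte_record(&mut record).await? {
		// ... process `record` without keeping references past this iteration
	}

That said, I wouldn't expect the plain `records()` stream to retain more than one record at a time either, so this feels like a mitigation rather than an explanation.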

Do you know why this happens and how to fix it?

Imports:

	use async_compression::tokio::bufread::GzipDecoder;
	use csv_async::StringRecord;
	use diqwest::WithDigestAuth;
	use futures_util::{StreamExt, TryStreamExt};
	use tokio::io::BufReader;
	use tokio_util::io::StreamReader;

Is there a way to find out which layer of the streaming pipeline causes the memory to grow?
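
One idea: wrap each layer in a small byte-counting `AsyncRead` and log the counters next to process RSS; whichever layer's counter runs far ahead of what the CSV loop has consumed points at where the buffering happens. A rough, untested sketch (`CountingReader` is my own hypothetical helper, not part of any of these crates):

	use std::pin::Pin;
	use std::sync::Arc;
	use std::sync::atomic::{AtomicU64, Ordering};
	use std::task::{Context, Poll};
	use tokio::io::{AsyncRead, ReadBuf};

	/// Counts every byte that passes through an AsyncRead layer,
	/// so memory growth can be attributed to a specific layer.
	struct CountingReader<R> {
		inner: R,
		bytes: Arc<AtomicU64>,
	}

	impl<R> CountingReader<R> {
		fn new(inner: R, bytes: Arc<AtomicU64>) -> Self {
			Self { inner, bytes }
		}
	}

	impl<R: AsyncRead + Unpin> AsyncRead for CountingReader<R> {
		fn poll_read(
			self: Pin<&mut Self>,
			cx: &mut Context<'_>,
			buf: &mut ReadBuf<'_>,
		) -> Poll<std::io::Result<()>> {
			let this = self.get_mut();
			let filled_before = buf.filled().len();
			let poll = Pin::new(&mut this.inner).poll_read(cx, buf);
			if let Poll::Ready(Ok(())) = &poll {
				// Record only the bytes added by this poll.
				let n = (buf.filled().len() - filled_before) as u64;
				this.bytes.fetch_add(n, Ordering::Relaxed);
			}
			poll
		}
	}

For example, wrapping both the `StreamReader` and the `GzipDecoder` would show whether the compressed or the decompressed bytes are the ones racing ahead of the CSV loop.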
