Streaming CSV allocates too much memory
Source: The Rust Programming Language Forum - Latest topics
I have a problem with my containers OOMing in production.
The problem seems to be that streaming data via reqwest and processing it through csv_async allocates memory for the entire file being read:
let resp = client
    .get(&url)
    .send_with_digest_auth(&self.i.config.credentials.user, &self.i.config.credentials.password)
    .await
    .with_context(|| format!("get feed: '{url}'"))?
    .error_for_status()
    .with_context(|| format!("get feed status: '{url}'"))?;
let r = StreamReader::new(resp.bytes_stream().map_err(std::io::Error::other));
let br = BufReader::new(r);
let mut dec = GzipDecoder::new(br);
dec.multiple_members(true);
let mut reader = csv_async::AsyncReaderBuilder::new()
    .delimiter(b',')
    .double_quote(true)
    .has_headers(true)
    .create_reader(dec);
let mut records = reader.records(); // stream of StringRecord results
while let Some(record) = records.next().await {
    // ... processing data
}
Even if I don't do any processing, memory equal to the full size of the file being read is allocated, which can be tens of GiB.
I need to change this so that only the "current window" of the stream is held in memory (which is what I had expected to begin with).
Do you know why this happens and how to fix it?
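For context, here is a variant I was considering (just a sketch; it assumes csv_async exposes the same read_record API as the synchronous csv crate, and it reuses the reader built above). Reusing a single StringRecord should keep per-record allocations bounded, though it would not by itself explain memory growing with the full file size:

let mut record = StringRecord::new();
// Read each record into the same reusable buffer instead of pulling owned
// records out of a stream; per-record allocations stay bounded this way.
while reader.read_record(&mut record).await? {
    // ... processing data
}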
Imports:
use async_compression::tokio::bufread::GzipDecoder;
use csv_async::StringRecord;
use diqwest::WithDigestAuth;
use futures_util::{StreamExt, TryStreamExt};
use tokio::io::BufReader;
use tokio_util::io::StreamReader;
Is there a way to find out which layer of the streaming pipeline causes the memory to grow?
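One isolation test I could imagine (a rough sketch using tokio's standard copy and sink utilities, not something I have confirmed) is to replace the CSV layer with a plain drain and watch the container's memory:

// Build the same HTTP + gzip pipeline, but drain it without csv_async.
// If memory still grows to the full file size here, the HTTP or gzip layer
// is buffering; if it stays flat, the growth comes from the CSV layer.
let r = StreamReader::new(resp.bytes_stream().map_err(std::io::Error::other));
let mut dec = GzipDecoder::new(BufReader::new(r));
dec.multiple_members(true);
let drained = tokio::io::copy(&mut dec, &mut tokio::io::sink()).await?;
println!("drained {drained} decompressed bytes");

The same drain could be applied one layer lower (directly on resp.bytes_stream()) to separate the HTTP layer from the gzip layer.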