Info
This post is auto-generated from RSS feed The Rust Programming Language Forum - Latest topics. Source: Preserve the String object from File::lines, but only pass a slice on
Hi,
So this may very well be a recurring topic, and I apologize in advance if it has been answered many, many times before... It feels like a problem others must have stumbled upon, and maybe I should have asked before implementing my own solution from scratch.
So here's the thing: I'm parsing a file with a format that is heavily line-based, and I have written a deserialization function that reads an iterator of lines, treats them as string slices, and builds a structure that may contain smaller slices of those lines. Obviously, this means that the built structure has a lifetime related to the lifetime of the input data - and therein lies the problem.
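To make the setup concrete, here is a minimal sketch of that kind of deserialization function. The `Field` struct, the `parse_fields` and `first_value` names, and the "Name: value" line format are all assumptions made for illustration; they are not taken from the actual project.

```rust
/// Hypothetical parsed field: both parts are slices borrowed from an input line.
#[derive(Debug, PartialEq)]
pub struct Field<'a> {
    pub name: &'a str,
    pub value: &'a str,
}

/// Build fields from an iterator of line slices.
/// Assumed format: "Name: value"; lines without ": " are skipped.
/// The result borrows from the input, hence the shared lifetime 'a.
pub fn parse_fields<'a>(lines: impl Iterator<Item = &'a str>) -> Vec<Field<'a>> {
    lines
        .filter_map(|line| line.split_once(": "))
        .map(|(name, value)| Field { name, value: value.trim_end() })
        .collect()
}

/// Whole-file case: the caller owns the storage behind `contents`,
/// so the returned slice may live as long as that storage does.
pub fn first_value(contents: &str) -> Option<&str> {
    parse_fields(contents.lines()).first().map(|field| field.value)
}
```

This works as long as something outlives the parsed structure and owns the text, which is exactly what the whole-file case provides.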
If I read the full contents of the file into memory, I have a String object that I own, and I can pass str::lines to my deserialization function. However, if I want to read the file line by line, e.g. from a subprocess or from a decompression function, or if the file is simply a bit too large, then there seems to be a problem:
- BufRead::lines and similar methods return Strings - quite understandably! There is no pre-owned storage to point at!
- if I pass each String as a slice to my deserialization function, the compiler obviously complains that the slice refers to a String that will be dropped very, very soon
- I could collect the Strings into an array and keep it until I'm done, but that kind of defeats the purpose of not reading the full file's contents into memory!
So I guess what I'm asking for is, is there a way to preserve the contents of the String, but let the deserialization function use it as a slice? My first thought was to use Arc or something similar, but whatever I do in a function that handles successive lines and provides them to the iterator, any objects I create there will be dropped very, very soon, just as the String itself.
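The streaming case can be sketched roughly like this. Both function names are hypothetical; the first shows that a slice of a streamed line is only usable inside the loop iteration that owns the String, and the second shows the "keep everything" workaround mentioned above.

```rust
use std::io::{self, BufRead};

/// Hypothetical consumer: it may inspect each line slice, but cannot keep it.
pub fn count_colon_lines<R: BufRead>(reader: R) -> io::Result<usize> {
    let mut count = 0;
    for line in reader.lines() {
        let line: String = line?; // freshly allocated and owned by this iteration
        let slice: &str = line.as_str(); // this borrow is tied to `line`
        if slice.contains(':') {
            count += 1;
        }
        // `line` is dropped here, so `slice` cannot be stored in anything
        // that outlives this iteration.
    }
    Ok(count)
}

// Handing the slices to an iterator-based parser is rejected by the compiler,
// because each slice would refer to a String that is dropped inside the closure:
//
//     let slices = reader.lines().map(|l| l.unwrap().as_str());

/// The workaround of keeping every line alive: it compiles,
/// but it holds the whole input in memory.
pub fn collect_lines<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    reader.lines().collect()
}
```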
So my solution was to go off and implement a new trait that provides a very small subset of the str methods - just as much as I need for this particular project - and proxies them to bytes::Bytes internal storage that is, hopefully, always a valid UTF-8-encoded string. Of course, I'm aware of a couple of serious drawbacks of this method:
- I cannot just rely on Deref for &str, since the str methods are defined to return str objects, which would drop the ownership; I have to proxy them into returning Self instead
- I have to implement as many str methods as I - or others - may possibly want to use at some future point, and even though the implementations are usually trivial, it still feels sort of wrong
- there are some str methods that I simply cannot proxy in the same way, e.g. ones using the std::str::pattern::Pattern trait, or at least I cannot implement them in a no-std library, since core::str::pattern is still experimental
- there may be some #[cfg(..)] shenanigans related to different Rust versions in the future; right now I'm really happy that str::from_utf8_unchecked() is stable in 1.87, and that's what I'm targeting
All that said, what I have so far is the StrLike and AdoptableStr traits and the StrOfBytes implementation in my - still unreleased - str-of-bytes library; for the moment it is only used in another still unreleased module, facet-deb822
...but the main point of this post is to ask: what have I missed? Is there already an implementation of something like that - a struct that behaves as much like str as possible, but retains ownership of the data? As pointed out above, implementations that provide a Deref are not enough, since the ownership will be dropped as soon as the deserialization library invokes .split_once() or .strip_prefix() or something like that.
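For readers who want to see what "returning Self instead of str" means in practice, here is a minimal sketch of the idea. It is not the StrOfBytes implementation: it swaps bytes::Bytes for a std-only Arc<str> plus a byte range, and the `SharedStr` name and its methods are hypothetical.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical str-like handle: shared ownership of the line plus a byte range.
#[derive(Clone, Debug)]
pub struct SharedStr {
    buf: Arc<str>,
    range: Range<usize>,
}

impl SharedStr {
    /// Take ownership of a line, e.g. one returned by BufRead::lines.
    pub fn new(line: String) -> Self {
        let buf: Arc<str> = Arc::from(line);
        let range = 0..buf.len();
        Self { buf, range }
    }

    /// Borrowed view, valid only while this handle is alive.
    pub fn as_str(&self) -> &str {
        &self.buf[self.range.clone()]
    }

    /// Sub-handle for a range relative to this handle; shares the same storage.
    fn sub(&self, rel: Range<usize>) -> Self {
        let start = self.range.start + rel.start;
        let end = self.range.start + rel.end;
        Self { buf: Arc::clone(&self.buf), range: start..end }
    }

    /// Like str::split_once with a char delimiter, but both halves keep the storage alive.
    pub fn split_once(&self, delim: char) -> Option<(Self, Self)> {
        let s = self.as_str();
        let idx = s.find(delim)?;
        Some((self.sub(0..idx), self.sub(idx + delim.len_utf8()..s.len())))
    }

    /// Like str::strip_prefix with a &str prefix, returning an owning handle.
    pub fn strip_prefix(&self, prefix: &str) -> Option<Self> {
        let s = self.as_str();
        let rest = s.strip_prefix(prefix)?;
        Some(self.sub(s.len() - rest.len()..s.len()))
    }
}
```

Each returned handle clones the Arc, so the pieces stay valid after the original line handle is dropped. The cost is the one described above: every str method that should be available has to be proxied by hand.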
Of course, "oh come on, you're looking at it totally the wrong way! here's a much better way to do what you really need" answers will also be welcome. And yes, I know how to use parser combinators, but I think that at least nom has the same issue - a string slice in the result has no knowledge of the memory storage it refers to.
Thanks in advance for any insights!