New Serde deserialization framework for YAML data that parses YAML into Rust structures without building a syntax tree

⚓ Rust    📅 2025-09-30    👤 surdeus    👁️ 13      

I am pleased to share a new library, serde-saphyr, for using the Serde serialization framework with YAML data. It takes a different approach from the classic serde-yaml, which is no longer maintained and has been much discussed here.

The main difference from other YAML implementations (including my own serde-yaml-bw) is that this library parses YAML directly into Rust data structures without first building an intermediate syntax tree. There is no generic Value type. Parsing happens only once the target Rust type is known (effectively making the type definition the schema). This approach required special handling of YAML anchors: anchored sequences are “replayed” by retaining their parser events. Merge keys, however, turned out to be more problematic. I found no practical way to implement them, and since in YAML 1.2 they are considered optional and even deprecated, I decided not to support them.
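The anchor "replay" idea can be illustrated with a small, self-contained sketch. It does not use serde-saphyr or saphyr-parser: the `Event` enum and the `expand_aliases` function are hypothetical names invented for this illustration, and a real parser emits a much richer event set. The sketch only shows the general technique of retaining the events of an anchored sequence and feeding them back when an alias is encountered.

```rust
use std::collections::HashMap;

/// Toy parser events (hypothetical; not the saphyr-parser event type).
#[derive(Debug, Clone, PartialEq)]
enum Event {
    SeqStart { anchor: Option<String> },
    SeqEnd,
    Scalar(String),
    Alias(String),
}

/// Expands aliases by replaying the events recorded for each anchored sequence.
/// Anchors are stripped from the output, so the result is a plain event stream.
fn expand_aliases(events: &[Event]) -> Result<Vec<Event>, String> {
    // Events retained per anchor name.
    let mut recorded: HashMap<String, Vec<Event>> = HashMap::new();
    // Stack of open sequences: (anchor name, index in `out` where the sequence starts).
    let mut open: Vec<(Option<String>, usize)> = Vec::new();
    let mut out = Vec::new();

    for ev in events {
        match ev {
            Event::SeqStart { anchor } => {
                open.push((anchor.clone(), out.len()));
                out.push(Event::SeqStart { anchor: None });
            }
            Event::SeqEnd => {
                let (anchor, start) = open.pop().ok_or("unbalanced SeqEnd")?;
                out.push(Event::SeqEnd);
                if let Some(name) = anchor {
                    // Retain the whole sequence, including its start and end events.
                    recorded.insert(name, out[start..].to_vec());
                }
            }
            Event::Scalar(s) => out.push(Event::Scalar(s.clone())),
            Event::Alias(name) => {
                let replay = recorded
                    .get(name)
                    .ok_or_else(|| format!("unknown anchor: {name}"))?;
                // Replay the retained events as if they appeared here in the input.
                out.extend(replay.iter().cloned());
            }
        }
    }
    if !open.is_empty() {
        return Err("unclosed sequence".to_string());
    }
    Ok(out)
}
```

In this sketch, an input of one anchored two-element sequence followed by an alias to it yields the same four events twice. The cost is that every anchored sequence must be kept in memory, which is consistent with the point that the approach works best when anchors are used sparingly.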

The library also relies on saphyr-parser, a pure Rust YAML parser. This removes the dependency on translated C code with its unavoidable unsafe constructs. While unsafe-libyaml generally works well, extended fuzz testing showed it can stall when faced with large amounts of crafted “pseudo-YAML” input. By contrast, saphyr handles these cases much more robustly. These findings also led me to adopt saphyr as a front-line preparser for serde-yaml-bw.

Why this approach?

  • Light on resources: With almost no intermediate allocations, parsing is more efficient—especially if anchors are used only sparingly.
  • Simpler: No need to maintain code for handling all possible Value variants.
  • Type-driven parsing: Input that does not match the expected Rust types is rejected immediately.
  • Safer by construction: No dynamic “any” objects; the common YAML-based code-execution exploits are not applicable.
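To make the type-driven idea concrete, here is a minimal sketch that builds Rust values straight from a stream of parser events, with the target type deciding what is acceptable. It is an illustration only and not the serde-saphyr implementation: the `Event` enum, the `FromEvents` trait, and the `parse` helper are hypothetical names, and the real library plugs into Serde's `Deserialize` machinery instead.

```rust
use std::iter::Peekable;

/// Toy parser events (hypothetical; a real YAML parser has many more).
#[derive(Debug, Clone, PartialEq)]
enum Event {
    SeqStart,
    SeqEnd,
    Scalar(String),
}

/// A type that can build itself directly from events, with no `Value` in between.
trait FromEvents: Sized {
    fn from_events<I: Iterator<Item = Event>>(events: &mut Peekable<I>) -> Result<Self, String>;
}

impl FromEvents for i64 {
    fn from_events<I: Iterator<Item = Event>>(events: &mut Peekable<I>) -> Result<Self, String> {
        match events.next() {
            Some(Event::Scalar(s)) => s
                .parse()
                .map_err(|_| format!("expected integer, got {s:?}")),
            other => Err(format!("expected scalar, got {other:?}")),
        }
    }
}

impl<T: FromEvents> FromEvents for Vec<T> {
    fn from_events<I: Iterator<Item = Event>>(events: &mut Peekable<I>) -> Result<Self, String> {
        match events.next() {
            Some(Event::SeqStart) => {}
            other => return Err(format!("expected sequence, got {other:?}")),
        }
        let mut items = Vec::new();
        loop {
            match events.peek() {
                Some(Event::SeqEnd) => {
                    events.next(); // consume the end marker
                    return Ok(items);
                }
                // The element type, not the input, decides how to read the next item.
                Some(_) => items.push(T::from_events(events)?),
                None => return Err("unexpected end of input".to_string()),
            }
        }
    }
}

/// Parses a complete event stream into `T`, rejecting trailing events.
fn parse<T: FromEvents>(events: Vec<Event>) -> Result<T, String> {
    let mut iter = events.into_iter().peekable();
    let value = T::from_events(&mut iter)?;
    match iter.next() {
        None => Ok(value),
        Some(extra) => Err(format!("trailing event: {extra:?}")),
    }
}
```

Asking for `Vec<i64>` accepts a sequence of integer scalars and fails on the first scalar that is not an integer, while asking for `i64` fails as soon as a sequence start is seen. No dynamically typed intermediate value is ever created, which is the property the bullets above describe.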

Benchmarking

Parsing generated YAML, file size 25.00 MiB, release build:

| Crate | Time (ms) | Notes |
|---|---|---|
| serde-saphyr | 290.54 | No unsafe, no unsafe-libyaml |
| serde-yaml-ng | 470.72 | |
| serde-yaml | 477.33 | Original, deprecated, repo archived |
| serde-norway | 479.57 | |
| serde-yml | 490.92 | Repo archived |
| serde-yaml-bw | 702.99 | Slower because Saphyr does a budget check upfront before calling libyaml |

Execution time varied slightly between runs; I ran the benchmarks multiple times and recorded values that looked representative.
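For readers who want to repeat such a measurement, one way to reduce run-to-run noise is to time the same work several times and take the median. The helper below is a sketch under that assumption, not the harness used for the numbers above (those were picked by eye as representative values); `median_time` is a hypothetical name.

```rust
use std::time::{Duration, Instant};

/// Runs `work` exactly `runs` times and returns the median wall-clock duration.
/// For an even number of runs this returns the upper of the two middle samples.
fn median_time<F: FnMut()>(runs: usize, mut work: F) -> Duration {
    assert!(runs > 0, "need at least one run");
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            work();
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples[samples.len() / 2]
}
```

The closure passed as `work` would wrap one full parse of the test file with the crate being measured. As with the table above, such timings are only meaningful for a release build.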

This library will not fit every use case: it is parser-only, it does not support merge keys, and it depends on smallvec 2.0.0, which is still in alpha. If you prefer a type-driven, low-overhead design without a Value abstraction, though, it may be worth trying out. Feedback and contributions are welcome.
