Crate: An XML / XHTML parser

⚓ Rust    📅 2025-07-28    👤 surdeus    👁️ 12      



This is a simple, fast XML/XHTML parser that builds a read-only, DOM-like tree from a Vec holding the contents of an XML/XHTML file.

Loosely based on the PUGIXML parsing method and structure that is described here, it is an in-place parser: all strings are kept in the received Vec, of which the parser takes ownership. The Vec's content is modified to expand entities to their UTF-8 representation (in attribute values and PCData). The position index of each element is preserved in the vector. Tree nodes are kept to their minimum size for memory-constrained environments. A single pre-allocated vector contains all the nodes of the tree. Its maximum size depends on the xxx_node_count feature selected.
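
To illustrate the in-place idea, here is a minimal sketch of entity expansion that rewrites a byte range of the buffer without allocating. It is not the crate's code: the function names `expand_entities_in_place`, `decode_entity`, and `parse_code_point` are hypothetical, and the sketch assumes the buffer holds bytes (`u8`) and that only the five predefined XML entities and numeric character references are expanded. It works because the UTF-8 form of each of these is always shorter than the reference it replaces, so the write cursor never overtakes the read cursor and nothing after the range has to move.

```rust
/// Decodes an entity name (the bytes between '&' and ';').
/// Hypothetical helper: handles the five predefined XML entities
/// and decimal or hexadecimal character references only.
fn decode_entity(name: &[u8]) -> Option<char> {
    match name {
        b"amp" => Some('&'),
        b"lt" => Some('<'),
        b"gt" => Some('>'),
        b"quot" => Some('"'),
        b"apos" => Some('\''),
        [b'#', b'x', hex @ ..] => parse_code_point(hex, 16),
        [b'#', dec @ ..] => parse_code_point(dec, 10),
        _ => None,
    }
}

/// Parses the digits of a character reference into a `char`.
fn parse_code_point(digits: &[u8], radix: u32) -> Option<char> {
    // Reject empty input and signs, which `from_str_radix` would accept.
    if digits.is_empty() || !digits.iter().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let text = std::str::from_utf8(digits).ok()?;
    char::from_u32(u32::from_str_radix(text, radix).ok()?)
}

/// Expands entities inside `buf[start..end]`, writing the result back over
/// the same bytes. Returns the index one past the last byte written.
/// Bytes between the returned index and `end` are left stale.
fn expand_entities_in_place(buf: &mut [u8], start: usize, end: usize) -> usize {
    let mut read = start;
    let mut write = start;
    while read < end {
        if buf[read] == b'&' {
            // Offset of the terminating ';' relative to `read`, if any.
            if let Some(len) = buf[read..end].iter().position(|&b| b == b';') {
                if let Some(ch) = decode_entity(&buf[read + 1..read + len]) {
                    let mut utf8 = [0u8; 4];
                    let n = ch.encode_utf8(&mut utf8).len();
                    // Safe to overwrite: the expansion is shorter than the reference.
                    buf[write..write + n].copy_from_slice(&utf8[..n]);
                    write += n;
                    read += len + 1;
                    continue;
                }
            }
        }
        // Ordinary byte (or an unrecognized reference): copy it through unchanged.
        buf[write] = buf[read];
        write += 1;
        read += 1;
    }
    write
}
```

A caller would record `start` and the returned index as the bounds of the string; how the crate itself marks the end of a shortened string is not stated in this post.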

The parsing process is limited to normal tags, attributes, and PCData content. No processing instruction (<? .. ?>), comment (<!-- .. -->), CDATA section (<![CDATA[ .. ]]>), DOCTYPE (<!DOCTYPE .. >), or DTD inside a DOCTYPE ([ .. ]) is retrieved. Basic validation of the XHTML structure is done to ensure content coherence.
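
As a rough sketch of what "not retrieved" can mean in practice, the function below steps over the constructs listed above so that a parser only ever sees normal tags and text. The names `skip_ignored` and `find_end` are hypothetical and not taken from the crate. The DOCTYPE handling is a simplification: it tracks a single level of `[ .. ]` and does not account for `>` or `]` inside quoted strings.

```rust
/// Returns the index just past the first occurrence of `needle`
/// at or after `from`, or `None` if it is not found.
fn find_end(buf: &[u8], from: usize, needle: &[u8]) -> Option<usize> {
    buf.get(from..)?
        .windows(needle.len())
        .position(|w| w == needle)
        .map(|i| from + i + needle.len())
}

/// If `buf[pos..]` starts with a processing instruction, comment, CDATA
/// section, or DOCTYPE, returns the index just past it. Returns `None`
/// when the input at `pos` is something else, or when the construct is
/// unterminated (the sketch does not distinguish the two cases).
fn skip_ignored(buf: &[u8], pos: usize) -> Option<usize> {
    let rest = buf.get(pos..)?;
    if rest.starts_with(b"<?") {
        find_end(buf, pos + 2, b"?>")
    } else if rest.starts_with(b"<!--") {
        find_end(buf, pos + 4, b"-->")
    } else if rest.starts_with(b"<![CDATA[") {
        find_end(buf, pos + 9, b"]]>")
    } else if rest.starts_with(b"<!DOCTYPE") {
        // A '>' inside the internal DTD subset ([ .. ]) does not close the DOCTYPE.
        let mut in_subset = false;
        for (i, &b) in rest.iter().enumerate() {
            match b {
                b'[' => in_subset = true,
                b']' => in_subset = false,
                b'>' if !in_subset => return Some(pos + i + 1),
                _ => {}
            }
        }
        None
    } else {
        None
    }
}
```

Calling `skip_ignored` in a loop until it returns `None` leaves the cursor at the next normal tag or PCData.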

You can find it here.


🏷️ Rust_feed