🧵 Stringlet UTF-8 Hack Option Niche?

⚓ Rust    📅 2026-05-14    👤 surdeus    👁️ 1

surdeus

With the release of Stringlet 0.10, I've looked at Option and Result niche optimization. Alas, I didn't find anything that seems applicable here.

I've come up with a scheme that would work for all inline strings (and, if needed, other strings too): according to the UTF-8 standard, no byte may currently (and maybe forever) be 0b1111_1xxx. That gives eight possible niche values in the first byte of any non-empty UTF-8 byte array. Any way of expressing this to the compiler would be highly welcome!
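To make the idea concrete, here is a minimal sketch of doing the niche by hand, since the compiler does not know about it. `OptInline`, `NONE_TAG` and the methods are hypothetical names for illustration, not part of Stringlet, and the sketch assumes N > 0 and an exactly full buffer:

```rust
/// Hypothetical inline string of exactly N bytes (N > 0).
/// An illustration only, not the Stringlet API.
#[derive(Clone, Copy)]
struct OptInline<const N: usize> {
    bytes: [u8; N],
}

/// 0b1111_1111 falls in the 0b1111_1xxx range that valid UTF-8 never uses,
/// so it can mark "no string" in the first byte.
const NONE_TAG: u8 = 0xFF;

impl<const N: usize> OptInline<N> {
    /// The "None" state: first byte holds the sentinel, the rest is padding.
    fn none() -> Self {
        let mut bytes = [0u8; N];
        bytes[0] = NONE_TAG; // panics for N == 0, which this sketch excludes
        Self { bytes }
    }

    /// The "Some" state; returns None unless `s` is exactly N bytes long.
    fn some(s: &str) -> Option<Self> {
        let bytes: [u8; N] = s.as_bytes().try_into().ok()?;
        Some(Self { bytes })
    }

    /// Manual niche check: any first byte of the form 0b1111_1xxx means "None".
    fn get(&self) -> Option<&str> {
        if self.bytes[0] >= 0xF8 {
            None
        } else {
            std::str::from_utf8(&self.bytes).ok()
        }
    }
}
```

This keeps the optional value at N bytes with no extra tag, but every access has to go through `get`. What the post asks for is a way to have `Option<T>` itself use those eight values.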

I have another, related need: I want to introduce a convenience wrapper that unifies all kinds. Since each kind is repr(C), the enum would be too, implicitly or, if need be, explicitly. So each kind, and thus the enum, would share the niche described above:

enum Stringlet<const SIZE: usize> {
    Fixed(FixedStringlet<SIZE>),
      Var(  VarStringlet<SIZE>),
     Trim( TrimStringlet<SIZE>),
     Slim( SlimStringlet<SIZE>),
}

Here only VarStringlet stores the actual length, in one extra byte. Since stringlets will rarely be 255 bytes long, that byte has room left to store the discriminant. (Currently SlimStringlet, and hence the whole enum, is even capped at size 64, but I'm looking at relaxing that.) Again, any way of expressing this to the compiler would be highly welcome!
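As a rough sketch of the problem and of the hand-rolled alternative: the stand-in types below are not the real Stringlet types, and the tag values and the 64-byte cap are assumptions picked for illustration. A plain enum over payloads with no known niche gets a separate tag, whereas a single header byte could carry either a length or a kind marker:

```rust
use std::mem::size_of;

// Hypothetical stand-ins for two of the kinds; NOT the real Stringlet types.
#[repr(C)]
struct FixedLike<const N: usize>([u8; N]);

#[repr(C)]
struct VarLike<const N: usize> {
    bytes: [u8; N],
    len: u8, // the one extra length byte
}

// Neither payload has a niche the compiler knows about,
// so this enum needs room for a separate tag.
enum Plain<const N: usize> {
    Fixed(FixedLike<N>),
    Var(VarLike<N>),
}

// Hand-rolled alternative: one header byte is either a length (0..=MAX_LEN)
// or one of three kind tags. All four constants are assumptions of this sketch.
const MAX_LEN: u8 = 64;
const TAG_FIXED: u8 = 0xFD;
const TAG_TRIM: u8 = 0xFE;
const TAG_SLIM: u8 = 0xFF;

#[derive(Debug, PartialEq, Clone, Copy)]
enum Kind {
    Fixed,
    Trim,
    Slim,
    Var { len: u8 },
}

/// Packs the kind (and, for Var, the length) into one byte.
fn encode(kind: Kind) -> Option<u8> {
    match kind {
        Kind::Fixed => Some(TAG_FIXED),
        Kind::Trim => Some(TAG_TRIM),
        Kind::Slim => Some(TAG_SLIM),
        Kind::Var { len } if len <= MAX_LEN => Some(len),
        Kind::Var { .. } => None, // length over the cap
    }
}

/// Reads the header byte back; unused values decode to None.
fn decode(header: u8) -> Option<Kind> {
    match header {
        TAG_FIXED => Some(Kind::Fixed),
        TAG_TRIM => Some(Kind::Trim),
        TAG_SLIM => Some(Kind::Slim),
        len if len <= MAX_LEN => Some(Kind::Var { len }),
        _ => None,
    }
}
```

Comparing `size_of::<Plain<8>>()` with `size_of::<VarLike<8>>()` shows the extra tag the compiler adds. The `encode`/`decode` pair shows that the length byte alone has enough spare values to tell the four kinds apart if managed manually.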

๐Ÿท๏ธ Rust_feed