Getting rustc to allocate a particular local in a register

Rust    2025-07-29    surdeus

I have a problem where a tight loop in Rust compiles worse than the C++ loop it competes with. For the kind of input that stays in the tight loop, the Rust version takes over 2x as long to execute as the C++ version.

Specifically, the Rust loop from icu4x/components/normalizer/src/lib.rs at 932aaa3f6b7afe322c305d124fe5c237f008be6d (unicode-org/icu4x on GitHub) back to line 2537 compiles worse than the C++ loop in normalizer2impl.cpp (mozsearch). See "1932875 - (icu_normalizer) Use ICU4X for str_normalize" for the instructions that these compile to.

It appears that rustc puts composition_passthrough_bound on the stack while clang puts minNoMaybeCP in a register, which is probably the main explanation for the performance difference.

Additionally, the Rust instruction sequence is one instruction longer (9 instead of 8) due to rustc using setne as part of the check for reaching the end of the input. (It seems that simpler iteration over &[u16] doesn't result in this setne + lea pattern.)
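For anyone who wants to look at the codegen in isolation, here is a minimal sketch of the loop shape under discussion. It is a hypothetical reduction, not the ICU4X code: the `Scanner` struct, the `passthrough_len` method, and the bound value `0x00C0` are all made up for illustration; only the field name comes from the description above. It copies the bound into a local before iterating over a plain `&[u16]`, which makes the value visibly loop-invariant to the optimizer but does not guarantee that it ends up in a register.

```rust
// Hypothetical reduction for inspecting codegen; NOT the ICU4X implementation.
struct Scanner {
    // Stand-in for the field named in the post; the real type lives elsewhere.
    composition_passthrough_bound: u16,
}

impl Scanner {
    /// Returns the index of the first code unit that is >= the bound,
    /// or the slice length if every unit is below the bound.
    #[inline(never)] // keep a standalone symbol so the loop is easy to find in the asm
    fn passthrough_len(&self, input: &[u16]) -> usize {
        // Copy the field into a local before the loop. This is only a hint:
        // rustc/LLVM still decide where the value actually lives.
        let bound = self.composition_passthrough_bound;
        input
            .iter()
            .position(|&unit| unit >= bound)
            .unwrap_or(input.len())
    }
}

fn main() {
    // 0x00C0 is an arbitrary illustrative bound, not a value taken from ICU4X.
    let scanner = Scanner { composition_passthrough_bound: 0x00C0 };
    let input = [0x0061, 0x0062, 0x00E9, 0x0063];
    println!("{}", scanner.passthrough_len(&input)); // prints 2
}
```

Compiling a reduction like this in release mode with `--emit asm` (or pasting it into Compiler Explorer) shows whether the comparison in the loop body reads the bound from a register or from memory.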

What is the appropriate way to guide rustc to put composition_passthrough_bound in a register?
