Absolutely absurd bench results with fat LTO
Rust · 2026-06-13 · surdeus
I'm seeing an impossible benchmark improvement when using fat LTO: from roughly 50 ns/iter down to 0.25 ns/iter. My best guess is some kind of timing error, but I don't know how to check the generated assembly for the actual code. (╥﹏╥)
I prepared a repository so anyone interested can have a look.
Release profile (default settings):
cargo bench --test bench --profile release
running 2 tests
test gcd_naive_2_test ... bench: 44.29 ns/iter (+/- 6.54)
test gcd_naive_test ... bench: 54.92 ns/iter (+/- 3.56)
Bench profile (fat LTO):
[profile.bench]
opt-level = 3
lto = "fat"
debug = false
split-debuginfo = 'off'
strip = "none"
debug-assertions = false
overflow-checks = false
panic = 'abort'
incremental = false
codegen-units = 16
rpath = false
cargo bench --test bench --profile bench
test gcd_naive_2_test ... bench: 0.25 ns/iter (+/- 0.01)
test gcd_naive_test ... bench: 0.25 ns/iter (+/- 0.01)
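For scale: 0.25 ns/iter is about one clock cycle on a ~4 GHz CPU. One common explanation, offered here as an assumption rather than a diagnosis of this repository, is that with fat LTO the optimizer can inline the GCD into the benchmark loop, constant-fold the fixed inputs, and effectively compute the result once. A minimal stable-Rust sketch of the usual countermeasure is below: `std::hint::black_box` (stable since Rust 1.66, documented as best-effort) hides the inputs from the optimizer and marks the result as used. The names `gcd_euclid` and `time_per_iter` are hypothetical, and `gcd_euclid` is a plain Euclidean stand-in, not the author's binary GCD.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in GCD (Euclid's algorithm), not the code from the post.
/// `#[inline(never)]` keeps it as its own symbol, which makes it easier to
/// locate when inspecting the generated assembly.
#[inline(never)]
fn gcd_euclid(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a
}

/// Average wall-clock time per call of `f(a, b)` over `iters` calls.
/// Wrapping the inputs in `black_box` stops the optimizer from treating them
/// as constants; wrapping the result stops it from discarding the call.
fn time_per_iter(f: fn(u64, u64) -> u64, a: u64, b: u64, iters: u32) -> (u64, Duration) {
    assert!(iters > 0, "iters must be non-zero");
    let mut last = 0;
    let start = Instant::now();
    for _ in 0..iters {
        last = black_box(f(black_box(a), black_box(b)));
    }
    // Return the last result too, so callers can sanity-check correctness.
    (last, start.elapsed() / iters)
}
```

In a libtest `#[bench]` the same idea applies inside `b.iter(...)`: pass the inputs through `black_box` and return (or `black_box`) the computed value so it is not optimized away.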
The benchmarked code is a recursive binary GCD implementation.
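The repository's code isn't reproduced here. To give a rough idea of what a recursive binary GCD (Stein's algorithm) looks like, here is a hypothetical sketch for `u64`; the name `gcd_binary` is illustrative and this is not the author's implementation. It relies on three identities: gcd(2a, 2b) = 2·gcd(a, b), gcd(2a, b) = gcd(a, b) for odd b, and gcd(a, b) = gcd((a − b)/2, b) for odd a > b.

```rust
/// Hypothetical recursive binary GCD (Stein's algorithm); not the post's code.
fn gcd_binary(a: u64, b: u64) -> u64 {
    // Base cases: equal values, or one side is zero.
    if a == b {
        return a;
    }
    if a == 0 {
        return b;
    }
    if b == 0 {
        return a;
    }
    match (a & 1 == 0, b & 1 == 0) {
        // Both even: 2 is a common factor.
        (true, true) => gcd_binary(a >> 1, b >> 1) << 1,
        // Exactly one even: its factor of 2 is not shared, so drop it.
        (true, false) => gcd_binary(a >> 1, b),
        (false, true) => gcd_binary(a, b >> 1),
        // Both odd: the difference is even, so halve it and keep the smaller value.
        (false, false) => {
            if a > b {
                gcd_binary((a - b) >> 1, b)
            } else {
                gcd_binary((b - a) >> 1, a)
            }
        }
    }
}
```

For example, `gcd_binary(48, 18)` returns 6. Each step removes at least one bit from one argument, so for `u64` the recursion depth stays in the low hundreds at most.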