Absolutely absurd bench results with fat LTO

⚓ Rust    📅 2026-06-13    👤 surdeus    👁️ 1

I'm seeing an impossible benchmark improvement when using fat LTO: from roughly 50 ns/iter down to 0.25 ns/iter. My best guess is some error in the time measurement. I don't know how to inspect the generated assembly to check what code actually runs. (╥﹏╥)

I prepared a repository so anyone interested can have a look.

Release: Default

cargo bench --test bench --profile release

running 2 tests
test gcd_naive_2_test ... bench:          44.29 ns/iter (+/- 6.54)
test gcd_naive_test   ... bench:          54.92 ns/iter (+/- 3.56)

Bench: fat LTO

[profile.bench]
opt-level = 3
lto = "fat"
debug = false
split-debuginfo = 'off'
strip = "none"
debug-assertions = false
overflow-checks = false
panic = 'abort'
incremental = false
codegen-units = 16
rpath = false

cargo bench --test bench --profile bench

test gcd_naive_2_test ... bench:           0.25 ns/iter (+/- 0.01)
test gcd_naive_test   ... bench:           0.25 ns/iter (+/- 0.01)

The benchmarked code is a recursive binary GCD implementation.
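One way to test the measurement is sketched below. It rests on an assumption, not a confirmed diagnosis: 0.25 ns/iter is about one CPU cycle, which would fit the optimizer evaluating the GCD at compile time once fat LTO lets it see through the call with constant inputs. The `gcd_binary` function and the `ns_per_iter` helper are hypothetical stand-ins, not the code from the repository. The sketch times the same call twice, once with plain constants and once with inputs passed through `std::hint::black_box`.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical recursive binary GCD (Stein's algorithm).
/// This is an illustration, not the implementation from the repository.
fn gcd_binary(a: u64, b: u64) -> u64 {
    match (a, b) {
        // gcd(0, n) = gcd(n, 0) = n
        (0, n) | (n, 0) => n,
        // Both even: factor out a common 2.
        (a, b) if a % 2 == 0 && b % 2 == 0 => 2 * gcd_binary(a / 2, b / 2),
        // Exactly one even: that factor of 2 is not shared, so drop it.
        (a, b) if a % 2 == 0 => gcd_binary(a / 2, b),
        (a, b) if b % 2 == 0 => gcd_binary(a, b / 2),
        // Both odd: the difference is even, so subtract and halve.
        (a, b) if a >= b => gcd_binary((a - b) / 2, b),
        (a, b) => gcd_binary(a, (b - a) / 2),
    }
}

/// Hypothetical helper: average wall-clock nanoseconds per call of `f`.
/// The result goes through `black_box` so the call is not discarded as unused.
fn ns_per_iter(iters: u32, mut f: impl FnMut() -> u64) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed().as_nanos() as f64 / f64::from(iters)
}

fn main() {
    // Inputs visible to the optimizer: the whole call may be folded to a constant.
    let folded = ns_per_iter(1_000_000, || gcd_binary(1_071, 462));
    // Inputs hidden behind black_box: the compiler has to assume they are unknown.
    let opaque = ns_per_iter(1_000_000, || gcd_binary(black_box(1_071), black_box(462)));
    println!("constant inputs: {folded:.2} ns/iter");
    println!("opaque inputs:   {opaque:.2} ns/iter");
}
```

If the two numbers differ as sharply as the results above, the fast figure most likely measures an empty loop and not the GCD. Wrapping the inputs in `black_box` inside the real benchmark closure would be the corresponding change; whether that applies here depends on how the benchmark in the repository passes its arguments.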
