Absolutely absurd bench results with fat LTO

⚓ Rust 📅 2026-06-13 👤 surdeus 👁️ 1

Info

This post is auto-generated from RSS feed The Rust Programming Language Forum - Latest topics. Source: Absolutely absurd bench results with fat LTO

I'm seeing an impossible benchmark improvement when using fat LTO: from roughly 50 ns/iter down to 0.25 ns/iter. My best guess is some kind of timing error. I don't know how to inspect the generated assembly to verify what code actually runs. (╥﹏╥)

I prepared a repository so anybody interested can have a look.

Release: Default

cargo bench --test bench --profile release

running 2 tests
test gcd_naive_2_test ... bench:          44.29 ns/iter (+/- 6.54)
test gcd_naive_test   ... bench:          54.92 ns/iter (+/- 3.56)

[profile.bench]
opt-level = 3
lto = "fat"
debug = false
split-debuginfo = 'off'
strip = "none"
debug-assertions = false
overflow-checks = false
panic = 'abort'
incremental = false
codegen-units = 16
rpath = false

cargo bench --test bench --profile bench

test gcd_naive_2_test ... bench:           0.25 ns/iter (+/- 0.01)
test gcd_naive_test   ... bench:           0.25 ns/iter (+/- 0.01)

The benchmarked code is a recursive binary GCD implementation.
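For context, 0.25 ns/iter is on the order of a single clock cycle on a multi-GHz CPU, which is too little for any recursive GCD. One common cause of numbers like this (not confirmed for this repository) is that the optimizer, seeing constant inputs across the whole program, folds the call away at compile time. Below is a minimal sketch of guarding against that with `std::hint::black_box`. The names `gcd_binary` and `time_gcd` and the input values are hypothetical stand-ins, not the code from the repository.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Recursive binary GCD (Stein's algorithm).
/// Hypothetical stand-in for the benchmarked function.
pub fn gcd_binary(a: u64, b: u64) -> u64 {
    if a == b {
        return a;
    }
    if a == 0 {
        return b;
    }
    if b == 0 {
        return a;
    }
    match (a & 1 == 0, b & 1 == 0) {
        // Both even: 2 is a common factor.
        (true, true) => gcd_binary(a >> 1, b >> 1) << 1,
        // Exactly one even: the factor 2 is not shared, drop it.
        (true, false) => gcd_binary(a >> 1, b),
        (false, true) => gcd_binary(a, b >> 1),
        // Both odd: the difference is even, so halve it.
        (false, false) => {
            if a > b {
                gcd_binary((a - b) >> 1, b)
            } else {
                gcd_binary((b - a) >> 1, a)
            }
        }
    }
}

/// Times `iters` calls and returns (last result, average ns per call).
/// `black_box` hides the inputs and the result from the optimizer, so the
/// call can neither be constant-folded nor removed as dead code.
pub fn time_gcd(iters: u32) -> (u64, f64) {
    let start = Instant::now();
    let mut last = 0;
    for _ in 0..iters {
        // Arbitrary example inputs (assumption, not from the repository).
        last = black_box(gcd_binary(black_box(1071), black_box(462)));
    }
    let avg_ns = start.elapsed().as_nanos() as f64 / f64::from(iters);
    (last, avg_ns)
}
```

If wrapping the inputs and the result in `black_box` brings the fat-LTO timing back into the tens of nanoseconds, the 0.25 ns figure was most likely measuring an empty loop rather than the GCD itself.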

4 posts - 3 participants
