Info
This post is auto-generated from RSS feed The Rust Programming Language Forum - Latest topics. Source: Unexpected parallel sort performance
Hello Rust community!
I would love to receive some thoughts on why a safe parallel mergesort performs better than an unsafe implementation.
I compare three simple parallel merge sorts (all implementations and benchmark can be found here):
split_at_mut
*mut i32
thereby does not need any splittingWhat I expect:
In both parallel and sequential case, Slices is slower than both Slices Unchecked and Single Array.
What I observed after benchmarking:
Why when I switch to parallel execution, Single Array sorting method performs worse than Slice? Single Array doesn't do any index bounds checking, so doesn't Slice Unchecked, so I would assume they would perform similarly, as in sequential case. What could be the reasons?
Array Size | Slices (baseline) | Single Array | Diff | Slices Unchecked | Diff |
---|---|---|---|---|---|
512 | 16µs | 15µs | -7% | 16µs | 0% |
2048 | 77µs | 70µs | -8% | 76µs | -1% |
16384 | 761µs | 687µs | -9% | 730µs | -4% |
65536 | 3290µs | 3000µs | -8% | 3195µs | -2% |
250k | 14ms | 13ms | -4% | 13.9ms | -0.7% |
1kk | 62ms | 57ms | -8% | 60ms | -3% |
5kk | 332ms | 325ms | -2% | 322ms | -3% |
10kk | 737ms | 684ms | -7% | 716ms | -3% |
Array Size | Slices (baseline) | Single Array | Diff | Slices Unchecked | Diff |
---|---|---|---|---|---|
250k | 15ms | 18ms | +20% | 15.2ms | +1% |
500k | 26ms | 31ms | +19% | 25.8ms | -1% |
2kk | 60ms | 69ms | +15% | 58.6ms | -2% |
8kk | 141ms | 156ms | +10% | 133ms | -6% |
16kk | 210ms | 235ms | +11% | 192ms | -9% |
64kk | 642ms | 748ms | +16% | 599ms | -7% |
100kk | 990ms | 1126ms | +13% | 901ms | -9% |
Disclaimer: I am very new to benchmarking so please let me know if I did something wrong
I run SLURM tasks running in a Singularity container with Rust 1.88 installed,
requesting 32G of memory and exclusive mode (#SBATCH --exclusive)
I use Criterion, see the benchmarking code for full details, but in short:
Ubuntu 20.04.6 LTS
> uname -a
Linux xcs-vnode-1 5.4.0-208-generic #228-Ubuntu SMP Fri Feb 7 19:41:33 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
> systemd-detect-virt
kvm
I do not try to write a super fast parallel merge sort, I was just curious why different versions of the same algorithm perform this way.
The original project, if you are interested, came from trying to write a formally verified parallel merge sort written in Verus, you can see it here
2 posts - 2 participants
🏷️ rust_feed