Are vectorization failures due to Rust or LLVM?

Category: Rust | Date: 2025-11-19 | Author: surdeus

surdeus

Here's a simplified piece of extremely performance-sensitive decompression code I would like to use:

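/// # Safety
/// For every element, `src` must hold 8 readable bytes starting at byte
/// `(base_bit_idx + offset_bits_csum) / 8`.
pub unsafe fn decompress_offsets(
    base_bit_idx: usize,
    src: &[u8],
    offset_bits_csum_scratch: &[u32],
    offset_bits_scratch: &[u32],
    latents: &mut [u64],
) {
    // Per element: bit width of the packed value, its bit offset, and the
    // latent that the extracted value is added to in place.
    for (&offset_bits, (&offset_bits_csum, latent)) in offset_bits_scratch
        .iter()
        .zip(offset_bits_csum_scratch.iter().zip(latents.iter_mut()))
    {
        let bit_idx = base_bit_idx as u32 + offset_bits_csum;
        let byte_idx = bit_idx / 8;
        let bits_past_byte = bit_idx % 8;
        // SAFETY: upheld by the caller (see the doc comment above).
        *latent = unsafe { read_u64_at(src, byte_idx as usize, bits_past_byte, offset_bits) }
            .wrapping_add(*latent);
    }
}

/// Extracts `n` bits starting `bits_past_byte` bits into `src[byte_idx]`.
#[inline]
unsafe fn read_u64_at(src: &[u8], byte_idx: usize, bits_past_byte: u32, n: u32) -> u64 {
    // bits_past_byte <= 7, so n <= 57 keeps all requested bits inside one 8-byte load.
    debug_assert!(n <= 57);
    // Unchecked 8-byte load; [u8; 8] has alignment 1, so any address is fine.
    let raw_bytes = unsafe { *(src.as_ptr().add(byte_idx) as *const [u8; 8]) };
    let value = u64::from_le_bytes(raw_bytes);
    (value >> bits_past_byte) & ((1 << n) - 1)
}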

godbolt link
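To make the expected input layout concrete, here is a round-trip sketch. It assumes (inferred from the parameter names and the arithmetic in read_u64_at, not stated in the post) that offset_bits_csum_scratch is the exclusive prefix sum of offset_bits_scratch, that values are packed LSB-first, and that src carries at least 8 bytes of padding past the last packed bit so every unchecked 8-byte load stays in bounds. pack_bits and example are hypothetical helpers, not part of the original code.

```rust
/// Copy of the loop from the post, included so this sketch compiles alone.
///
/// # Safety
/// For every element, `src` must hold 8 readable bytes starting at byte
/// `(base_bit_idx + offset_bits_csum) / 8`.
pub unsafe fn decompress_offsets(
    base_bit_idx: usize,
    src: &[u8],
    offset_bits_csum_scratch: &[u32],
    offset_bits_scratch: &[u32],
    latents: &mut [u64],
) {
    for (&offset_bits, (&offset_bits_csum, latent)) in offset_bits_scratch
        .iter()
        .zip(offset_bits_csum_scratch.iter().zip(latents.iter_mut()))
    {
        let bit_idx = base_bit_idx as u32 + offset_bits_csum;
        let extracted =
            unsafe { read_u64_at(src, (bit_idx / 8) as usize, bit_idx % 8, offset_bits) };
        *latent = extracted.wrapping_add(*latent);
    }
}

#[inline]
unsafe fn read_u64_at(src: &[u8], byte_idx: usize, bits_past_byte: u32, n: u32) -> u64 {
    debug_assert!(n <= 57);
    let raw_bytes = unsafe { *(src.as_ptr().add(byte_idx) as *const [u8; 8]) };
    (u64::from_le_bytes(raw_bytes) >> bits_past_byte) & ((1u64 << n) - 1)
}

/// Hypothetical encoder (assumed layout): writes each value's low `width` bits
/// LSB-first, starting at bit `base_bit_idx`. Returns the packed bytes and the
/// exclusive prefix sums of the widths.
fn pack_bits(base_bit_idx: usize, values: &[u64], widths: &[u32]) -> (Vec<u8>, Vec<u32>) {
    let total_bits = base_bit_idx + widths.iter().map(|&w| w as usize).sum::<usize>();
    // 8 spare bytes so the unchecked 8-byte load for the last value stays in bounds.
    let mut src = vec![0u8; total_bits.div_ceil(8) + 8];
    let mut csum = Vec::with_capacity(widths.len());
    let mut offset = 0u32;
    for (&value, &width) in values.iter().zip(widths) {
        csum.push(offset); // bit offset of this value, relative to base_bit_idx
        for b in 0..width {
            if ((value >> b) & 1) == 1 {
                let bit = base_bit_idx + offset as usize + b as usize;
                src[bit / 8] |= 1u8 << (bit % 8);
            }
        }
        offset += width;
    }
    (src, csum)
}

/// Hypothetical usage: pack four values, then add them onto latents of 100.
fn example() -> Vec<u64> {
    let base_bit_idx = 3; // start mid-byte to exercise bits_past_byte
    let values = [5u64, 0x1ff, 1, 0];
    let widths = [3u32, 9, 1, 0];
    let (src, csum) = pack_bits(base_bit_idx, &values, &widths);
    let mut latents = vec![100u64; values.len()];
    // SAFETY: pack_bits pads src by 8 bytes past the last packed bit.
    unsafe { decompress_offsets(base_bit_idx, &src, &csum, &widths, &mut latents) };
    latents
}
```

Under these assumptions, example() yields 100 plus each packed value; a width of 0 extracts nothing, because the mask (1 << 0) - 1 is 0.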

decompress_offsets vectorizes on x86-64 but not on aarch64. I can get some very similar loops to vectorize on aarch64 if I

  1. remove the final wrapping_add, or
  2. write to a separate dst: &mut [u64] buffer instead of updating latents in place.
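A sketch of what the two variants might look like, reconstructed from the descriptions in the list; the function names are hypothetical, and the exact code behind the x86-64/aarch64 observations is not shown in the post. Only the loop body differs from the original, so whether either version vectorizes on a given target is the thing to compare (for example on godbolt), not something this snippet demonstrates.

```rust
#[inline]
unsafe fn read_u64_at(src: &[u8], byte_idx: usize, bits_past_byte: u32, n: u32) -> u64 {
    debug_assert!(n <= 57);
    let raw_bytes = unsafe { *(src.as_ptr().add(byte_idx) as *const [u8; 8]) };
    (u64::from_le_bytes(raw_bytes) >> bits_past_byte) & ((1u64 << n) - 1)
}

/// Variant (1), hypothetical name: overwrite each latent instead of accumulating,
/// so the old value of `*latent` is never loaded.
pub unsafe fn decompress_offsets_no_add(
    base_bit_idx: usize,
    src: &[u8],
    offset_bits_csum_scratch: &[u32],
    offset_bits_scratch: &[u32],
    latents: &mut [u64],
) {
    for (&offset_bits, (&offset_bits_csum, latent)) in offset_bits_scratch
        .iter()
        .zip(offset_bits_csum_scratch.iter().zip(latents.iter_mut()))
    {
        let bit_idx = base_bit_idx as u32 + offset_bits_csum;
        *latent = unsafe { read_u64_at(src, (bit_idx / 8) as usize, bit_idx % 8, offset_bits) };
    }
}

/// Variant (2), hypothetical name: keep the wrapping add, but read latents from
/// one buffer and write results to a separate `dst` buffer.
pub unsafe fn decompress_offsets_to_dst(
    base_bit_idx: usize,
    src: &[u8],
    offset_bits_csum_scratch: &[u32],
    offset_bits_scratch: &[u32],
    latents: &[u64],
    dst: &mut [u64],
) {
    for (&offset_bits, (&offset_bits_csum, (&latent, out))) in offset_bits_scratch
        .iter()
        .zip(offset_bits_csum_scratch.iter().zip(latents.iter().zip(dst.iter_mut())))
    {
        let bit_idx = base_bit_idx as u32 + offset_bits_csum;
        let extracted =
            unsafe { read_u64_at(src, (bit_idx / 8) as usize, bit_idx % 8, offset_bits) };
        *out = extracted.wrapping_add(latent);
    }
}
```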

However, I would rather avoid both for performance reasons. Also, in reality I have several generic versions of this loop, so I can't easily fall back to inline assembly.

Things I've tried:

  • Looked at the LLVM IR. The versions that vectorize contain a vector.body block, but I'm not sure whether rustc emits that or LLVM does; I may just be looking at the IR after all the optimization passes have run.
  • Looked at the assembly on both platforms. As far as I can tell, what I want is possible: it could be done by tweaking the assembly generated for (1.) above.

So how can I tell if a vectorization failure is due to Rust or LLVM? If the former, how can we improve the compiler in this case? Are there any good workarounds for the moment?
