Disable unsupported SIMD code generation

⚓ Rust    📅 2025-07-06    👤 surdeus    👁️ 19      

surdeus

Warning

This post was published 122 days ago. The information described in this article may have changed.

Problem

I've written some SIMD intrinsics code targeting x86-64: SSE4.1, AVX, and AVX2. I began reading some of the assembly output, and came across a troubling result. When I made the mistake of not explicitly enabling AVX2 but only AVX, the compiler generated weird code sequences for AVX2 intrinsics.

This is very problematic since code like this may go unnoticed unless the actual assembly is inspected with every version. Is there any way to disable or protect against this code generation?

Example

See that no_avx2 below still compiles, but to a very slow version of yes_avx2. One expects _mm256_and_si256 to compile to nearly a single vandps instruction.

Code:

use std::arch::x86_64::*;

#[target_feature(enable = "avx2")]
pub unsafe fn yes_avx2(a: __m256i, b: __m256i) -> __m256i {
    _mm256_and_si256(a, b)
}

#[target_feature(enable = "avx")]
pub unsafe fn no_avx2(a: __m256i, b: __m256i) -> __m256i {
    _mm256_and_si256(a, b)
}

Assembly (compiler explorer with -C opt-level=3):

core::core_arch::x86::avx2::_mm256_and_si256::hda082db2f6c5855a:
        vmovaps (%rdx), %ymm0
        vandps  (%rsi), %ymm0, %ymm0
        vmovaps %ymm0, (%rdi)
        vzeroupper
        retq

example::no_avx2::h5e8ad84ba7048fcb:
        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %rbx
        andq    $-32, %rsp
        subq    $96, %rsp
        movq    %rdi, %rbx
        vmovaps (%rsi), %ymm0
        vmovaps %ymm0, (%rsp)
        vmovaps (%rdx), %ymm0
        vmovaps %ymm0, 32(%rsp)
        movq    %rsp, %rsi
        leaq    32(%rsp), %rdx
        vzeroupper
        callq   core::core_arch::x86::avx2::_mm256_and_si256::hda082db2f6c5855a
        movq    %rbx, %rax
        leaq    -8(%rbp), %rsp
        popq    %rbx
        popq    %rbp
        retq

example::yes_avx2::ha14dc1d4af094302:
        movq    %rdi, %rax
        vmovaps (%rdx), %ymm0
        vandps  (%rsi), %ymm0, %ymm0
        vmovaps %ymm0, (%rdi)
        vzeroupper
        retq

2 posts - 2 participants

Read full topic

🏷️ rust_feed