Surprisingly bad codegen fixed by `extern "C"`
Rust · 2026-06-16 · surdeus

It is only because I read The Rust Calling Convention We Deserve · mcyoung a while back that I had a hunch something is going really wrong here:
pub type Reg = u8;

// `repr` does not matter here, but added for completeness
#[repr(C)]
pub struct Rs1Rs2Operands {
    pub rs1: Reg,
    pub rs2: Reg,
}

#[repr(u16, align(4))]
pub enum ContractInstruction {
    CAddi4spn {
        rs1: Reg,
        rs2: Reg,
        rd: Reg,
        nzuimm: u16,
    },
    CLw {
        rs1: Reg,
        rs2: Reg,
        rd: Reg,
        uimm: u8,
    },
}

impl ContractInstruction {
    #[unsafe(no_mangle)]
    pub fn get_rs1_rs2_operands(self) -> Rs1Rs2Operands {
        match self {
            Self::CAddi4spn { rs1, rs2, .. } => Rs1Rs2Operands { rs1, rs2 },
            Self::CLw { rs1, rs2, .. } => Rs1Rs2Operands { rs1, rs2 },
        }
    }

    #[unsafe(no_mangle)]
    pub extern "C" fn get_rs1_rs2_operands_c(self) -> Rs1Rs2Operands {
        match self {
            Self::CAddi4spn { rs1, rs2, .. } => Rs1Rs2Operands { rs1, rs2 },
            Self::CLw { rs1, rs2, .. } => Rs1Rs2Operands { rs1, rs2 },
        }
    }
}
I would have expected the return value to occupy a single register (after all, it is just two `u8`s) and the two methods to generate exactly the same code. Surprisingly (and not in a good way), the code is not the same:
get_rs1_rs2_operands:
        mov     rax, rdi
        mov     edx, eax
        shr     edx, 24
        shr     eax, 16
        ret

get_rs1_rs2_operands_c:
        mov     rax, rdi
        shr     eax, 16
        ret
https://rust.godbolt.org/z/qbWTsb6v1
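For anyone wondering where the shifts by 16 and 24 come from: with `repr(u16)` the tag sits in bytes 0 and 1 of every variant, so `rs1` lands in byte 2 and `rs2` in byte 3, and the whole enum fits in the 8 bytes passed in `rdi`. The `extern "C"` version appears to return the two-byte struct packed in `ax` (one shift), while the default ABI seems to return the two fields as separate values in `al` and `dl` (two shifts). The sketch below only checks the sizes and alignments this reasoning relies on; `layout_summary` is a hypothetical helper name, not something from the original code.

```rust
use std::mem::{align_of, size_of};

pub type Reg = u8;

#[repr(C)]
pub struct Rs1Rs2Operands {
    pub rs1: Reg,
    pub rs2: Reg,
}

#[repr(u16, align(4))]
pub enum ContractInstruction {
    // Layout per variant: u16 tag at offset 0, then the fields in order.
    CAddi4spn { rs1: Reg, rs2: Reg, rd: Reg, nzuimm: u16 },
    CLw { rs1: Reg, rs2: Reg, rd: Reg, uimm: u8 },
}

/// Hypothetical helper: (size, align) of the operand pair, then of the enum.
pub fn layout_summary() -> [(usize, usize); 2] {
    [
        (size_of::<Rs1Rs2Operands>(), align_of::<Rs1Rs2Operands>()),
        (size_of::<ContractInstruction>(), align_of::<ContractInstruction>()),
    ]
}
```

The operand pair is two bytes with alignment 1, and the enum is eight bytes with alignment 4, so both fit in a single general-purpose register on x86-64.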
In a tight loop, having twice as many instructions for no reason is obviously very bad for performance.
I'm sure there have been discussions about this in the past, but has really nothing happened about it yet? I'm really not looking forward to slapping `extern "C"` everywhere just to get more reasonable assembly.
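One possible way to avoid `extern "C"`, as a rough sketch rather than a recommendation, is to return a single integer that packs both operands, since a lone `u16` should be returned in one register under either ABI. `PackedOperands` and its methods are hypothetical names introduced here for illustration, and I have not checked the generated assembly for this variant.

```rust
pub type Reg = u8;

/// Hypothetical packed pair: `rs1` in the low byte, `rs2` in the high byte.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(transparent)]
pub struct PackedOperands(u16);

impl PackedOperands {
    /// Packs both register indices into one 16-bit value.
    #[inline]
    pub const fn new(rs1: Reg, rs2: Reg) -> Self {
        Self(u16::from_le_bytes([rs1, rs2]))
    }

    /// Extracts the low byte.
    #[inline]
    pub const fn rs1(self) -> Reg {
        self.0.to_le_bytes()[0]
    }

    /// Extracts the high byte.
    #[inline]
    pub const fn rs2(self) -> Reg {
        self.0.to_le_bytes()[1]
    }
}
```

The cost is ergonomics: callers go through accessors instead of plain fields, which is its own kind of workaround.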