Any way to force the *omission* of frame pointers?

⚓ Rust    📅 2025-10-30    👤 surdeus    👁️ 7      

I'm using Rust to write code for an embedded target of mine -- a CPU made on a digital circuit simulator, implementing a subset of Thumb (specifically, thumbv6m-none-eabi).

The CPU is slow (~40 kHz), so every wasted instruction noticeably slows down whatever I run. I understand this is nowhere near anything Rust was designed to run on.

One thing is blocking me: every function, even a leaf function, sets up a frame pointer, which I don't need.

For example, here's a small function that implements division using a hardware divider mapped to memory:

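#[unsafe(export_name = "__aeabi_uidiv")]
pub extern "C" fn __aeabi_uidiv(a: u32, b: u32) -> u32 {
    let res: u32;
    unsafe {
        // Load the result from the memory-mapped hardware divider.
        core::arch::asm!("ldr {res}, [{addr}]",
            // Typed as u32: the value does not fit the default i32.
            addr = in(reg) 0xffff_ff20_u32,
            res = lateout(reg) res,
            in("r0") a, // dividend, already in r0 per the ABI
            in("r1") b, // divisor, already in r1 per the ABI
        )
    }
    res
}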

Since the ABI passes parameters in sequential registers (r0, r1) and returns the result in r0, I would expect the following code:

__aeabi_uidiv:
    movs r2, #223
    mvns r2, r2 ; simply getting the address of the MMIO port
    ldr r0, [r2] ; r0 and r1 are already populated with the parameters
    bx lr

This is the smallest possible Thumb code for what I'm trying to do.
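As a quick sanity check on the two-instruction address load above, here is a minimal sketch that models `movs r2, #223` followed by `mvns r2, r2` in plain Rust. The function name `mmio_address` is made up for illustration; it only shows that the bitwise NOT of 223 is the MMIO address used in the Rust source.

```rust
/// Models the sequence `movs r2, #223` / `mvns r2, r2` (hypothetical helper).
fn mmio_address() -> u32 {
    let r2: u32 = 223; // movs r2, #223: small immediate, zero-extended
    !r2 // mvns r2, r2: bitwise NOT of the register
}

fn main() {
    // Prints 0xffffff20, the address passed to the inline asm.
    println!("{:#010x}", mmio_address());
}
```

This is why the expected listing needs no literal pool entry: Thumb `movs` only takes a small immediate, and inverting it reaches the high address in one more instruction.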

But here is what I'm getting with opt-level=3, lto="fat", --release:

__aeabi_uidiv:
	.fnstart
	.save	{r7, lr}
	push	{r7, lr}
	.setfp	r7, sp
	add	r7, sp, #0
	movs	r2, #223
	mvns	r2, r2
	@APP
	ldr	r0, [r2]
	@NO_APP
	pop	{r7, pc}

There is no reason to save lr here, since the function is a leaf. More importantly, it uses r7 as a frame pointer, which I would not expect at opt-level=3. That's 2 additional instructions (and since push/pop take multiple cycles, the cost is actually more like 6). In relative terms, that is 50% to 150% more than the 4-instruction version!
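The percentages above follow from simple arithmetic against the 4-instruction baseline. A small sketch, assuming the post's own estimate of roughly 6 instructions' worth of cost for the push/pop pair (the helper `overhead_percent` is hypothetical):

```rust
/// Relative overhead, in percent, of `extra` units over `baseline` units.
fn overhead_percent(extra: u32, baseline: u32) -> u32 {
    extra * 100 / baseline
}

fn main() {
    let baseline = 4; // movs, mvns, ldr, bx lr
    // Counting instructions only: push and add are extra, pop replaces bx lr.
    println!("by instruction count: {}%", overhead_percent(2, baseline));
    // Using the post's rough estimate for multi-cycle push/pop.
    println!("with push/pop weighted: {}%", overhead_percent(6, baseline));
}
```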

There is a codegen parameter to force frame pointers (-C force-frame-pointers), but nothing to force their omission, to my knowledge.

Is there anything I can pass to the compiler to force it to generate the optimal code? Otherwise, I'll have to rewrite all those small functions in assembly. For the example I gave here, that's not really an issue, but I have more complex functions, such as memcpy4, that end up completely botched if I write them in Rust. Of course, this is the kind of thing global_asm! is for, but it would be really nice to write this in Rust.

Thanks!

Edit: changed the example code to use a constant to make the point clearer.
