Open source LLM compiler in Rust for models on Huggingface
⚓ Rust 📅 2026-03-13 👤 surdeus 👁️ 2

Open source LLM compiler for models on Huggingface. UNC: 152 tok/s, 11.3 W, 5.3B CPU instructions. mlx-lm: 113 tok/s, 14.1 W, 31.4B CPU instructions. Measured on a MacBook M1 Pro.
UNC is 1.35x faster while using 25% less GPU power, resulting in 1.7x better energy efficiency. The compiled approach eliminates Python runtime and framework dispatch overhead entirely — 8.4x fewer CPU instructions means less heat, less power, and more headroom for the GPU.
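The efficiency ratio follows directly from the throughput and power figures quoted above: dividing tokens per second by watts gives tokens per joule, and comparing the two systems yields roughly the claimed 1.7x. A quick arithmetic check (the helper function is ours, not part of UNC):

```rust
// Tokens per joule = (tokens/second) / (joules/second).
fn tokens_per_joule(tok_per_s: f64, watts: f64) -> f64 {
    tok_per_s / watts
}

fn main() {
    let unc = tokens_per_joule(152.0, 11.3); // ~13.5 tok/J
    let mlx = tokens_per_joule(113.0, 14.1); // ~8.0 tok/J
    println!(
        "UNC: {unc:.1} tok/J, mlx-lm: {mlx:.1} tok/J, gain: {:.2}x",
        unc / mlx
    );
}
```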
UNC uses an LLVM-inspired IR frontend to abstract over weights and architecture before applying optimisations:
HuggingFace model
       |
 [ Frontend ]   Parse config.json + safetensors
       |
 [ IR Graph ]   Hardware-agnostic tensor graph
       |
 [ Compiler ]   Fusion, quantization, memory planning
       |
       +------------+------------+------------+
       |            |            |            |
   [ Metal ]    [ CUDA ]     [ ROCm ]     [ WASM ]
   Obj-C +      PTX          HIP          WebGPU
   Metal        kernels      kernels      shaders
   shaders      (planned)    (planned)    (planned)
       |
 Native binary:
 Mach-O (AOT) or
 .unc bundle (JIT)
IR: Hardware-agnostic typed tensor graph with BatchMatMul, QuantizedMatVec, RMSNorm, LayerNorm, QKNorm, RoPE, SDPA, SwiGLU, KVCacheAppend, Gather, etc. The IR is target-independent — the same graph can be lowered to Metal (current), CUDA, ROCm, WASM, or CPU-only backends with acceleration providers like Intel oneDNN.
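As a rough illustration of what a typed, hardware-agnostic tensor graph can look like, here is a minimal Rust sketch. The op names mirror the list above, but the data layout (node indices, dtype set, field names) is entirely assumed; UNC's actual IR surely carries more information:

```rust
// Hypothetical sketch of a typed tensor-graph IR. Variants and fields
// are invented for illustration; only the op names come from the post.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum DType { F16, F32, Q4 }

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Tensor { shape: Vec<usize>, dtype: DType }

#[allow(dead_code)]
#[derive(Debug, Clone)]
enum Op {
    BatchMatMul { a: usize, b: usize },          // operands are node ids
    QuantizedMatVec { weight: usize, x: usize },
    RmsNorm { x: usize, eps: f32 },
    RoPe { x: usize, base: f32 },
    SwiGlu { gate: usize, up: usize },
    KvCacheAppend { k: usize, v: usize },
}

// A graph is nodes plus the tensor type each node produces; a backend
// lowers it by walking the nodes and emitting target-specific kernels.
struct Graph { nodes: Vec<Op>, types: Vec<Tensor> }

fn main() {
    let g = Graph {
        nodes: vec![Op::RmsNorm { x: 0, eps: 1e-6 }],
        types: vec![Tensor { shape: vec![1, 4096], dtype: DType::F16 }],
    };
    println!("{} node(s)", g.nodes.len());
}
```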
Compiler passes: Weight binding, dead code elimination, QKV fusion, Gate+Up fusion, SwiGLU fusion, Add+RMSNorm fusion, RoPE+KV fusion, PSQ pipeline, dual-path (GEMM/GEMV), kernel matching, barrier analysis, memory planning with buffer aliasing.
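To make one of these passes concrete, here is a minimal sketch of Add+RMSNorm fusion over a flat op sequence. The `Op` type and pass shape are invented for illustration (a real pass would operate on the graph, not a list); the point is simply that an adjacent Add followed by RMSNorm collapses into one fused kernel, saving a round trip through memory:

```rust
// Toy op set for demonstrating a fusion pass; not UNC's real IR.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    MatMul,
    Add,
    RmsNorm,
    AddRmsNorm, // fused: one pass over the activations instead of two
}

// Replace each adjacent Add -> RmsNorm pair with the fused op.
fn fuse_add_rmsnorm(ops: &[Op]) -> Vec<Op> {
    let mut out = Vec::with_capacity(ops.len());
    let mut i = 0;
    while i < ops.len() {
        if ops[i] == Op::Add && ops.get(i + 1) == Some(&Op::RmsNorm) {
            out.push(Op::AddRmsNorm);
            i += 2; // consume both ops
        } else {
            out.push(ops[i].clone());
            i += 1;
        }
    }
    out
}

fn main() {
    let ops = vec![Op::MatMul, Op::Add, Op::RmsNorm, Op::MatMul];
    println!("{:?}", fuse_add_rmsnorm(&ops)); // [MatMul, AddRmsNorm, MatMul]
}
```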