Lele: Bare-Metal ML Inference Engine in Pure Rust (compiles ONNX into Rust)
⚓ Rust 📅 2026-02-10 👤 surdeus 👁️ 8

Hi everyone! I'd like to introduce lele - a standalone inference engine for running ML models without runtime dependencies.
What makes it different?
Instead of wrapping C++ libraries like ONNX Runtime, lele compiles ONNX models directly into specialized Rust code with hand-crafted SIMD kernels. Think of it as an ahead-of-time compiler for neural networks.
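To make the "ahead-of-time compiler" idea concrete, here is a minimal sketch of the kind of code such a compiler *could* emit for a single fixed-shape linear layer. All names, shapes, and weight values below are illustrative assumptions, not lele's actual generated output - the point is that with shapes known at compile time, the kernel can use fixed-size arrays, bake weights into the binary, and fuse the activation into the same loop:

```rust
// Hypothetical generated kernel for y = relu(W * x + b) with shapes
// fixed at compile time (illustrative; not lele's real output).
const IN: usize = 4;
const OUT: usize = 3;

// In real generated code these would be baked in from the ONNX file;
// identity-like values here keep the example self-contained.
static WEIGHTS: [[f32; IN]; OUT] = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
];
static BIAS: [f32; OUT] = [0.5, 0.5, 0.5];

/// Generated kernel: fixed trip counts let the optimizer unroll and
/// vectorize the inner loop; ReLU is fused instead of a separate pass.
fn linear_relu(x: &[f32; IN]) -> [f32; OUT] {
    let mut y = [0.0f32; OUT];
    for o in 0..OUT {
        let mut acc = BIAS[o];
        for i in 0..IN {
            acc += WEIGHTS[o][i] * x[i];
        }
        y[o] = acc.max(0.0);
    }
    y
}

fn main() {
    let y = linear_relu(&[1.0, 2.0, 3.0, 4.0]);
    assert_eq!(y, [1.5, 2.5, 3.5]);
    println!("{:?}", y);
}
```

A runtime like ONNX Runtime has to dispatch on shapes and dtypes at inference time; a compiled kernel like this has no dispatch at all, which is where the specialization win comes from.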
Key Features
- Zero dependencies - Generated models are pure Rust
- SIMD optimized - Hand-written kernels for NEON (Apple Silicon), AVX/SSE (x86_64), and WASM SIMD128
- WebAssembly ready - Full browser support with small binary sizes (~1.7MB for complete models)
- Memory efficient - Static buffer allocation and zero-copy weight loading
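The last two bullets can also be sketched in a few lines. Assuming (the post doesn't spell this out) that "static buffer allocation" means intermediate tensors live in fixed-size arrays and "zero-copy weight loading" means weights are embedded in the binary and read in place, a minimal illustration might look like this - names and layouts are hypothetical:

```rust
// Sketch of static scratch buffers plus embedded weights
// (illustrative assumptions, not lele's actual generated layout).
const HIDDEN: usize = 8;

/// All intermediate tensors live in fixed-size arrays, so an
/// inference pass performs no heap allocation.
struct Scratch {
    hidden: [f32; HIDDEN],
}

impl Scratch {
    const fn new() -> Self {
        Scratch { hidden: [0.0; HIDDEN] }
    }
}

/// Weights as little-endian bytes, as they might appear after
/// `include_bytes!`-ing a weight blob into the binary.
static WEIGHT_BYTES: [u8; 8] = [
    0x00, 0x00, 0x80, 0x3f, // 1.0f32
    0x00, 0x00, 0x00, 0x40, // 2.0f32
];

/// Decode one f32 from the embedded bytes. Truly zero-copy reuse would
/// reinterpret the bytes in place (alignment permitting); per-element
/// `from_le_bytes` keeps this sketch safe and portable.
fn weight(i: usize) -> f32 {
    let b = &WEIGHT_BYTES[i * 4..i * 4 + 4];
    f32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

fn main() {
    let mut scratch = Scratch::new();
    for i in 0..2 {
        scratch.hidden[i] = weight(i);
    }
    assert_eq!(scratch.hidden[0], 1.0);
    assert_eq!(scratch.hidden[1], 2.0);
    println!("ok");
}
```

Because buffer sizes fall out of the model's shapes at compile time, this style also explains how the generated code can stay dependency-free and `no_std`-friendly.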
Currently Supported Models
- SenseVoice (Multi-lingual ASR)
- Silero VAD (Voice Activity Detection)
- Supertonic (Text-to-Speech)
- YOLO26 (Object Detection)
Performance Snapshot
On Apple Silicon, lele achieves 1.7x faster inference than ONNX Runtime for Silero VAD. Other models are still being optimized; it's a work in progress, but it shows that the compile-time specialization approach can pay off.
Repository: https://github.com/miuda-ai/lele
Thanks for reading! Happy to answer any questions.
WASM demo here:
