Lele: Bare-Metal ML Inference Engine in Pure Rust (compile ONNX into Rust)

⚓ Rust    📅 2026-02-10    👤 surdeus


Hi everyone! I'd like to introduce lele - a standalone inference engine for running ML models without runtime dependencies.

What makes it different?

Instead of wrapping C++ libraries like ONNX Runtime, lele compiles ONNX models directly into specialized Rust code with hand-crafted SIMD kernels. Think of it as an ahead-of-time compiler for neural networks.
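As a rough illustration of the idea (hypothetical code, not lele's actual output): when a layer's shapes are known at compile time, the generated code can use loops with fixed bounds that rustc/LLVM will unroll and auto-vectorize, instead of dispatching on dynamic tensor shapes at runtime.

```rust
// Hypothetical sketch of compile-time shape specialization; illustrative
// only, not code generated by lele. With bounds fixed at compile time,
// LLVM can fully unroll and vectorize the inner loop.
fn matvec_4x4(w: &[f32; 16], x: &[f32; 4], out: &mut [f32; 4]) {
    for i in 0..4 {
        let mut acc = 0.0f32;
        for j in 0..4 {
            acc += w[i * 4 + j] * x[j];
        }
        out[i] = acc;
    }
}

fn main() {
    // Identity weights: the output should equal the input vector.
    let mut w = [0.0f32; 16];
    for i in 0..4 {
        w[i * 4 + i] = 1.0;
    }
    let x = [1.0f32, 2.0, 3.0, 4.0];
    let mut out = [0.0f32; 4];
    matvec_4x4(&w, &x, &mut out);
    println!("{:?}", out);
}
```

The same specialization is what makes hand-written SIMD kernels practical: each generated kernel targets one concrete shape rather than a general-purpose tensor loop.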

Key Features

  • Zero dependencies - Generated models are pure Rust
  • SIMD optimized - Hand-written kernels for NEON (Apple Silicon), AVX/SSE (x86_64), and WASM SIMD128
  • WebAssembly ready - Full browser support with small binary sizes (~1.7MB for complete models)
  • Memory efficient - Static buffer allocation and zero-copy weight loading
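The zero-copy bullet can be sketched like this (an assumed pattern, not lele's actual API): weight bytes embedded in the binary, e.g. via `include_bytes!`, are reinterpreted in place as `f32` after checking length and alignment, so loading a model performs no allocation or copy.

```rust
// Hypothetical sketch of zero-copy weight loading; names are illustrative,
// not lele's API. Embedded weight bytes are viewed in place as f32 once
// length and alignment have been verified.
fn as_f32_slice(bytes: &[u8]) -> Option<&[f32]> {
    let align = std::mem::align_of::<f32>();
    if bytes.len() % 4 != 0 || bytes.as_ptr() as usize % align != 0 {
        return None; // misaligned or truncated blob would need a copy
    }
    // SAFETY: length and alignment checked above, and every 4-byte bit
    // pattern is a valid f32, so reinterpreting the bytes is sound.
    Some(unsafe {
        std::slice::from_raw_parts(bytes.as_ptr() as *const f32, bytes.len() / 4)
    })
}

fn main() {
    // Simulate an embedded weight blob with guaranteed f32 alignment.
    let weights = [0.5f32, -1.0, 2.0];
    let bytes: &[u8] = unsafe {
        std::slice::from_raw_parts(weights.as_ptr() as *const u8, weights.len() * 4)
    };
    let view = as_f32_slice(bytes).expect("aligned");
    println!("{:?}", view);
}
```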

Currently Supported Models

  • SenseVoice (Multi-lingual ASR)
  • Silero VAD (Voice Activity Detection)
  • Supertonic (Text-to-Speech)
  • YOLO26 (Object Detection)

Performance Snapshot

On Apple Silicon, lele achieves 1.7x faster inference than ONNX Runtime for Silero VAD. Other models are still being optimized; the project is a work in progress, but it shows that the compile-time specialization approach can work.

Repository: https://github.com/miuda-ai/lele

Thanks for reading! Happy to answer any questions.

WASM demo here: yolo26

