Lele: Bare-Metal ML Inference Engine in Pure Rust (compile ONNX into Rust)

⚓ Rust    📅 2026-02-10    👤 surdeus


Hi everyone! I'd like to introduce lele - a standalone inference engine for running ML models without runtime dependencies.

What makes it different?

Instead of wrapping C++ libraries like ONNX Runtime, lele compiles ONNX models directly into specialized Rust code with hand-crafted SIMD kernels. Think of it as an ahead-of-time compiler for neural networks.
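As a rough illustration of the idea (hypothetical code, not lele's actual output): when a layer's shapes are known at compile time, the generated code can use loops with fixed bounds that rustc/LLVM will unroll and auto-vectorize, instead of dispatching on dynamic tensor shapes at runtime.

```rust
// Hypothetical sketch of compile-time shape specialization; illustrative
// only, not code generated by lele. With bounds fixed at compile time,
// LLVM can fully unroll and vectorize the inner loop.
fn matvec_4x4(w: &[f32; 16], x: &[f32; 4], out: &mut [f32; 4]) {
    for i in 0..4 {
        let mut acc = 0.0f32;
        for j in 0..4 {
            acc += w[i * 4 + j] * x[j];
        }
        out[i] = acc;
    }
}

fn main() {
    // Identity weights: the output should equal the input vector.
    let mut w = [0.0f32; 16];
    for i in 0..4 {
        w[i * 4 + i] = 1.0;
    }
    let x = [1.0f32, 2.0, 3.0, 4.0];
    let mut out = [0.0f32; 4];
    matvec_4x4(&w, &x, &mut out);
    println!("{:?}", out);
}
```

The same specialization is what makes hand-written SIMD kernels practical: each generated kernel targets one concrete shape rather than a general-purpose tensor loop.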

Key Features

  • Zero dependencies - Generated models are pure Rust
  • SIMD optimized - Hand-written kernels for NEON (Apple Silicon), AVX/SSE (x86_64), and WASM SIMD128
  • WebAssembly ready - Full browser support with small binary sizes (~1.7MB for complete models)
  • Memory efficient - Static buffer allocation and zero-copy weight loading
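The zero-copy bullet can be sketched like this (an assumed pattern, not lele's actual API): weight bytes embedded in the binary, e.g. via `include_bytes!`, are reinterpreted in place as `f32` after checking length and alignment, so loading a model performs no allocation or copy.

```rust
// Hypothetical sketch of zero-copy weight loading; names are illustrative,
// not lele's API. Embedded weight bytes are viewed in place as f32 once
// length and alignment have been verified.
fn as_f32_slice(bytes: &[u8]) -> Option<&[f32]> {
    let align = std::mem::align_of::<f32>();
    if bytes.len() % 4 != 0 || bytes.as_ptr() as usize % align != 0 {
        return None; // misaligned or truncated blob would need a copy
    }
    // SAFETY: length and alignment checked above, and every 4-byte bit
    // pattern is a valid f32, so reinterpreting the bytes is sound.
    Some(unsafe {
        std::slice::from_raw_parts(bytes.as_ptr() as *const f32, bytes.len() / 4)
    })
}

fn main() {
    // Simulate an embedded weight blob with guaranteed f32 alignment.
    let weights = [0.5f32, -1.0, 2.0];
    let bytes: &[u8] = unsafe {
        std::slice::from_raw_parts(weights.as_ptr() as *const u8, weights.len() * 4)
    };
    let view = as_f32_slice(bytes).expect("aligned");
    println!("{:?}", view);
}
```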

Currently Supported Models

  • SenseVoice (Multi-lingual ASR)
  • Silero VAD (Voice Activity Detection)
  • Supertonic (Text-to-Speech)
  • YOLO26 (Object Detection)

Performance Snapshot

On Apple Silicon, lele achieves 1.7x faster inference than ONNX Runtime for Silero VAD. Other models are still being optimized; the project is a work in progress, but it shows that the compile-time specialization approach can work.

Repository: https://github.com/miuda-ai/lele

Thanks for reading! Happy to answer any questions.

WASM demo here: yolo26

