MetaXuda: Metal GPU runtime for ML on Apple Silicon (1.1 TOPS with Tokio async)
⚓ Rust 📅 2026-01-18 👤 surdeus 👁️ 5

Hey Rustaceans!
I built MetaXuda - a native GPU runtime for machine learning on Apple Silicon, entirely in Rust.
Motivation:
I got tired of the "buy Windows for ML" advice. Most ML libraries are CUDA-only with zero macOS GPU support, and translation layers like ZLUDA add overhead, so I built a runtime from scratch on Metal.
Tech Stack:
- Rust core with Tokio async runtime
- Metal for GPU acceleration
- PyO3 for Python bindings (cuda_pipeline.so)
- Arrow-based in-kernel quantization
- Multi-tier memory manager (GPU → RAM → SSD)
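To make the tiering idea concrete, here's a toy std-only sketch of the GPU → RAM → SSD demotion logic. Everything here (`Tier`, `TierManager`, the LRU policy) is illustrative, not the actual MetaXuda API — the real manager tracks buffer sizes and does async I/O:

```rust
use std::collections::{HashMap, VecDeque};

// Hypothetical tiers mirroring the GPU -> RAM -> SSD hierarchy.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Tier {
    Gpu,
    Ram,
    Ssd,
}

/// Toy tier manager: new buffers land in the fastest tier; when a tier
/// is full, its least-recently-used buffer is demoted one level down.
pub struct TierManager {
    gpu_capacity: usize,
    ram_capacity: usize,
    gpu_lru: VecDeque<u64>, // buffer ids, front = least recently used
    ram_lru: VecDeque<u64>,
    placement: HashMap<u64, Tier>,
}

impl TierManager {
    pub fn new(gpu_capacity: usize, ram_capacity: usize) -> Self {
        Self {
            gpu_capacity,
            ram_capacity,
            gpu_lru: VecDeque::new(),
            ram_lru: VecDeque::new(),
            placement: HashMap::new(),
        }
    }

    /// Place a new buffer, demoting LRU buffers down the hierarchy as needed.
    pub fn allocate(&mut self, id: u64) {
        if self.gpu_lru.len() == self.gpu_capacity {
            // GPU full: demote its LRU buffer to RAM.
            let victim = self.gpu_lru.pop_front().unwrap();
            if self.ram_lru.len() == self.ram_capacity {
                // RAM full too: spill its LRU buffer to SSD.
                let spilled = self.ram_lru.pop_front().unwrap();
                self.placement.insert(spilled, Tier::Ssd);
            }
            self.ram_lru.push_back(victim);
            self.placement.insert(victim, Tier::Ram);
        }
        self.gpu_lru.push_back(id);
        self.placement.insert(id, Tier::Gpu);
    }

    pub fn tier_of(&self, id: u64) -> Option<Tier> {
        self.placement.get(&id).copied()
    }
}

fn main() {
    let mut mgr = TierManager::new(2, 1);
    for id in 0..4 {
        mgr.allocate(id);
    }
    // Buffers 2 and 3 are hot on the GPU; 1 was demoted to RAM; 0 spilled to SSD.
    println!(
        "{:?} {:?} {:?} {:?}",
        mgr.tier_of(0),
        mgr.tier_of(1),
        mgr.tier_of(2),
        mgr.tier_of(3)
    );
}
```

The point of the hierarchy: hot working sets stay on the GPU, and spills cascade one level at a time instead of thrashing straight to disk.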
Performance:
- 1.1 TOPS throughput (95% of M3 Max theoretical peak)
- 230+ GPU operations (math, transform, ML primitives)
- 93.37% GPU utilization cap (leaves headroom so macOS's own GPU work isn't starved)
- Zero race conditions via centralized scheduler
Architecture Highlights:
- Migrated from sync → async (40+ iterations to get it right!)
- Stream managers + thread-pool groups coordinated by scheduler
- Handles 100GB+ workloads through intelligent memory tiering
- CUDA-compatible API naming for library interop
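The scheduler/stream-manager coordination, boiled down to a std-threads analogue (the real runtime uses Tokio tasks and Metal command buffers; `Op` and `run_scheduler` are illustrative names, not MetaXuda APIs):

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical op descriptor; the real runtime dispatches Metal kernels.
struct Op {
    id: u64,
    stream: usize, // which "stream" (worker) should run it
}

/// Minimal centralized-scheduler sketch: one scheduler owns the queue and
/// fans ops out to per-stream workers over channels, so workers never
/// contend on shared state — the "zero race conditions" property comes
/// from ownership, not from locks.
fn run_scheduler(ops: Vec<Op>, num_streams: usize) -> Vec<u64> {
    let mut to_workers = Vec::new();
    let (done_tx, done_rx) = mpsc::channel::<u64>();
    let mut workers = Vec::new();

    for _ in 0..num_streams {
        let (tx, rx) = mpsc::channel::<Op>();
        let done = done_tx.clone();
        to_workers.push(tx);
        workers.push(thread::spawn(move || {
            // Each worker drains its own channel; a real stream manager
            // would submit Metal command buffers here.
            for op in rx {
                done.send(op.id).unwrap();
            }
        }));
    }
    drop(done_tx);

    let total = ops.len();
    for op in ops {
        let stream = op.stream % num_streams;
        to_workers[stream].send(op).unwrap();
    }
    drop(to_workers); // close channels so workers exit

    let mut completed: Vec<u64> = done_rx.iter().take(total).collect();
    for w in workers {
        w.join().unwrap();
    }
    completed.sort();
    completed
}

fn main() {
    let ops: Vec<Op> = (0..8).map(|id| Op { id, stream: id as usize }).collect();
    let completed = run_scheduler(ops, 3);
    println!("all {} ops completed", completed.len());
}
```

In the async version the workers are Tokio tasks and the channels are `tokio::sync::mpsc`, but the shape is the same: a single owner of the queue, fan-out by stream, fan-in of completions.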
Current Status:
- Works with Numba (bypasses its standard execution path)
- pip install metaxuda
- Toolkit integration (scikit-learn, XGBoost) coming next
- CUDA API coverage still in progress
Known Challenges:
- Apple's Metal stream limits are undocumented (I reverse-engineered what I could)
- Some intentional blocking favors stability over raw speed
- ~1-in-a-million scheduler notification misses (rare edge case)
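For context on that last point: the classic guard against lost wakeups is to pair the notification with a predicate checked under a lock, so a signal sent before the waiter sleeps is never missed. Here's a sync-Rust illustration of the pattern (MetaXuda's scheduler uses Tokio primitives, so this is only an analogue, not the actual code):

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Lost-wakeup-safe waiting: re-check a predicate under the mutex, so a
/// notification that fired before `wait_for_ready` was called still
/// unblocks us (the predicate is already true on entry).
fn wait_for_ready(pair: &(Mutex<bool>, Condvar)) {
    let (lock, cvar) = pair;
    let mut ready = lock.lock().unwrap();
    while !*ready {
        // wait() atomically releases the lock and sleeps; spurious
        // wakeups are handled by re-checking the predicate in the loop.
        ready = cvar.wait(ready).unwrap();
    }
}

fn main() {
    let pair = Arc::new((Mutex::new(false), Condvar::new()));
    let p2 = Arc::clone(&pair);
    let notifier = thread::spawn(move || {
        let (lock, cvar) = &*p2;
        *lock.lock().unwrap() = true; // set the predicate first...
        cvar.notify_one();            // ...then notify; order matters
    });
    wait_for_ready(&pair); // returns even if notify fired before we waited
    notifier.join().unwrap();
    println!("no missed notification");
}
```

A bare `notify_one()` with no predicate is exactly where rare 1-in-a-million misses come from: if the notify lands between the waiter's check and its sleep, the wakeup is lost.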
Links:
- GitHub: Perinban/MetaXuda- (a Metal-based CUDA framework)
- PyPI: pip install metaxuda
- Show HN discussion: MetaXuda – 1.1 TOPS GPU Runtime for Apple Silicon ML (Rust and Metal)
Looking for feedback on:
- Async scheduler design patterns (Tokio + Metal coordination)
- Memory tier eviction strategies
- Anyone hitting Apple GPU quirks I should know about?
License inquiries: p.perinban@gmail.com
Would love thoughts from the community, especially on the Rust/async architecture choices!