Take a look at the neural network framework I wrote, which is implemented in Rust + CUDA

⚓ Rust    📅 2026-05-01    👤 surdeus

Highlights

  • Rust-first, not Rust-only implementation
    • Rust owns the framework structure and most high-level logic.
    • CUDA C++ is used for optional GPU acceleration.
    • CPU-only builds remain available without the cuda feature.
  • Dynamic autograd built around tensor graph construction
  • Module-style abstraction for model components
  • Layers, ops, and models kept separate for easier experimentation
  • Flexible precision system
    • parameter dtype
    • runtime dtype
    • activation dtype
    • KV-cache dtype
  • Quantization-aware loading
    • load float weights as-is
    • quantize to i8 at load time
    • generate quantized safetensors offline
  • CPU and CUDA execution paths with explicit kernel/backend work
  • Hugging Face tokenizers integration
  • Safetensors support with memory-mapped and streamed loading modes
  • Release profile tuned with lto, panic = "abort", and strip
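
To give a rough idea of what "dynamic autograd built around tensor graph construction" can look like, here is a minimal sketch of a tape that records operations as they run and then walks them in reverse to accumulate gradients. It is not the framework's actual API: `Tape`, `Node`, `leaf`, `add`, `mul`, and `backward` are hypothetical names, and scalars stand in for tensors to keep it short.

```rust
/// One recorded operation: two parent indices and the local gradient
/// of this node's value with respect to each parent.
/// (Hypothetical type, for illustration only.)
struct Node {
    parents: [usize; 2],
    local_grads: [f64; 2],
}

/// Hypothetical tape: the graph is built dynamically while ops execute.
struct Tape {
    nodes: Vec<Node>,
    values: Vec<f64>,
}

impl Tape {
    fn new() -> Self {
        Tape { nodes: Vec::new(), values: Vec::new() }
    }

    /// Record a node and return its index on the tape.
    fn push(&mut self, value: f64, parents: [usize; 2], local_grads: [f64; 2]) -> usize {
        self.nodes.push(Node { parents, local_grads });
        self.values.push(value);
        self.nodes.len() - 1
    }

    /// An input value. It has no real parents, so it points at itself
    /// with zero local gradients.
    fn leaf(&mut self, value: f64) -> usize {
        let id = self.nodes.len();
        self.push(value, [id, id], [0.0, 0.0])
    }

    fn add(&mut self, a: usize, b: usize) -> usize {
        let v = self.values[a] + self.values[b];
        self.push(v, [a, b], [1.0, 1.0])
    }

    fn mul(&mut self, a: usize, b: usize) -> usize {
        let (va, vb) = (self.values[a], self.values[b]);
        // d(a*b)/da = b and d(a*b)/db = a
        self.push(va * vb, [a, b], [vb, va])
    }

    /// Reverse-mode pass: nodes were appended in execution order, so
    /// iterating indices backwards visits each node after all its users.
    fn backward(&self, output: usize) -> Vec<f64> {
        let mut grads = vec![0.0; self.nodes.len()];
        grads[output] = 1.0;
        for i in (0..=output).rev() {
            let g = grads[i];
            let node = &self.nodes[i];
            for k in 0..2 {
                grads[node.parents[k]] += g * node.local_grads[k];
            }
        }
        grads
    }
}

fn main() {
    let mut tape = Tape::new();
    let x = tape.leaf(3.0);
    let y = tape.leaf(4.0);
    let xy = tape.mul(x, y);
    let z = tape.add(xy, x); // z = x * y + x
    let grads = tape.backward(z);
    println!("z = {}, dz/dx = {}, dz/dy = {}", tape.values[z], grads[x], grads[y]);
}
```

A real tensor version would store shapes and per-op backward closures or kernels instead of two scalar partials, but the record-then-reverse structure is the same idea.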

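For readers unfamiliar with "quantize on load to i8", the sketch below shows one common scheme, symmetric per-tensor quantization, where a single scale maps float weights into the range [-127, 127]. This is an assumption about the general technique, not a description of what the framework does internally (it may well use per-channel or per-group scales). `QuantizedI8`, `quantize_i8`, and `dequantize` are hypothetical names.

```rust
/// Hypothetical container for an int8-quantized tensor
/// (not the framework's actual type).
#[derive(Debug)]
struct QuantizedI8 {
    data: Vec<i8>,
    scale: f32,
}

/// Symmetric per-tensor quantization:
/// scale = max|w| / 127, q = round(w / scale), clamped to [-127, 127].
fn quantize_i8(weights: &[f32]) -> QuantizedI8 {
    let max_abs = weights.iter().fold(0.0_f32, |m, w| m.max(w.abs()));
    // An all-zero tensor would give scale 0; fall back to 1.0 so the
    // division below stays well defined.
    let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
    let data = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    QuantizedI8 { data, scale }
}

/// Map the stored integers back to approximate float values.
fn dequantize(q: &QuantizedI8) -> Vec<f32> {
    q.data.iter().map(|&v| v as f32 * q.scale).collect()
}

fn main() {
    let weights = [0.5_f32, -1.0, 0.25, 0.0];
    let q = quantize_i8(&weights);
    let restored = dequantize(&q);
    println!("quantized = {:?}, scale = {}", q.data, q.scale);
    println!("restored  = {:?}", restored);
}
```

Doing this at load time trades a one-off pass over the weights for roughly a 4x smaller in-memory footprint compared with f32, while the offline variant would pay that cost once and store the i8 data and scales in the file.
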
🏷️ Rust_feed