Announcing `numr`: A "Batteries-Included" Numerical Library for Rust (NumPy + GPU + Autograd)

⚓ Rust    📅 2026-02-04    👤 surdeus    👁️ 7      


Hi everyone,

I’ve started working on a new project called numr, and I wanted to share the vision and get early feedback.

The core idea is simple: What if NumPy were built today, in Rust, with the features we always wished it had built in?

We all love ndarray and the existing Rust ecosystem, but fragmentation is a real pain point. You often need separate crates for BLAS, LAPACK, sparse arrays, and especially GPU support. If you need gradients, you usually have to switch to a full-blown DL framework like burn or candle.

numr aims to be the foundational numerical layer that unifies these. It is designed to be backend-agnostic, differentiable, and extensible.

🚀 What Makes numr Different?

1. "Same Code, Any Backend" Architecture
numr is built around a generic Tensor<R: Runtime> abstraction. You write your logic once, and it runs on:

  • CPU: AVX2/AVX-512/NEON-accelerated
  • CUDA: native PTX kernels for NVIDIA GPUs
  • WebGPU: cross-platform support for AMD, Intel, and Apple Silicon

Unlike wrappers around cuBLAS or MKL, numr implements native kernels for operations, meaning no massive external C++ dependencies and full transparency down to the metal.
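To make the "same code, any backend" idea concrete, here is a minimal sketch of what a backend-generic design can look like. The trait, struct, and method names below are assumptions for illustration, not numr's actual API:

```rust
// Sketch of a backend-generic tensor: algorithms are written once against a
// Runtime trait, and each backend (CPU, CUDA, WebGPU) provides its own kernels.
trait Runtime {
    fn name(&self) -> &'static str;
}

#[derive(Clone, Copy)]
struct CpuRuntime;

impl Runtime for CpuRuntime {
    fn name(&self) -> &'static str { "cpu" }
}

struct Tensor<R: Runtime> {
    data: Vec<f32>,
    shape: Vec<usize>,
    runtime: R,
}

impl<R: Runtime + Copy> Tensor<R> {
    // Elementwise scale; a real backend would dispatch to SIMD/GPU kernels here.
    fn scale(&self, factor: f32) -> Tensor<R> {
        Tensor {
            data: self.data.iter().map(|x| x * factor).collect(),
            shape: self.shape.clone(),
            runtime: self.runtime,
        }
    }
}

// Generic algorithm: written once, runs on any Runtime implementation.
fn double<R: Runtime + Copy>(t: &Tensor<R>) -> Tensor<R> {
    t.scale(2.0)
}

fn main() {
    let t = Tensor { data: vec![1.0, 2.0, 3.0], shape: vec![3], runtime: CpuRuntime };
    let d = double(&t);
    println!("{} -> {:?}", t.runtime.name(), d.data); // cpu -> [2.0, 4.0, 6.0]
}
```

The point of the pattern is that `double` never names a backend; swapping `CpuRuntime` for a GPU runtime requires no changes to the algorithm itself.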

2. Built-in Autograd (Reverse & Forward Mode)
Differentiation isn't an afterthought. It supports:

  • Reverse-mode: For standard gradient descent/training.
  • Forward-mode: For efficient Jacobian-Vector Products (JVP), crucial for scientific computing tasks like stiff ODE solvers.
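For readers unfamiliar with forward mode, a tiny dual-number implementation shows the idea behind a JVP: each value carries a tangent that is propagated alongside it. This is a self-contained illustration of the technique, not numr's implementation:

```rust
// Forward-mode AD via dual numbers: (val, tan) where tan is the directional
// derivative. Seeding tan = 1 on the input x yields df/dx after one pass.
#[derive(Clone, Copy, Debug)]
struct Dual {
    val: f64, // primal value
    tan: f64, // tangent (directional derivative)
}

impl Dual {
    fn new(val: f64, tan: f64) -> Self { Dual { val, tan } }
    fn add(self, o: Dual) -> Dual {
        Dual { val: self.val + o.val, tan: self.tan + o.tan }
    }
    fn mul(self, o: Dual) -> Dual {
        // Product rule: (uv)' = u'v + uv'
        Dual { val: self.val * o.val, tan: self.tan * o.val + self.val * o.tan }
    }
    fn sin(self) -> Dual {
        // Chain rule: (sin u)' = u' * cos(u)
        Dual { val: self.val.sin(), tan: self.tan * self.val.cos() }
    }
}

// f(x) = x^2 + sin(x), so f'(x) = 2x + cos(x)
fn f(x: Dual) -> Dual {
    x.mul(x).add(x.sin())
}

fn main() {
    let y = f(Dual::new(1.0, 1.0)); // tangent seed 1.0
    println!("f(1) = {}, f'(1) = {}", y.val, y.tan); // f'(1) = 2 + cos(1) ≈ 2.5403
}
```

A full JVP generalizes this to vectors: seed the input with a direction `v` and the single forward pass returns `J·v`, which is why forward mode suits stiff ODE solvers and other Jacobian-heavy workloads.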

3. Modern & Comprehensive Dtypes
Beyond standard f32/f64, numr has native support for:

  • f16 / bf16 (Half precision)
  • fp8 (FP8E4M3, FP8E5M2 for modern ML workloads)
  • Complex numbers (Complex64/128)
  • Sparse tensors (CSR, CSC, COO formats), integrated directly rather than split into a separate crate
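As a quick reminder of what the sparse formats store, here is a minimal COO (coordinate) matrix with a matrix-vector product, written in plain Rust purely to illustrate the format (the type and method names are not numr's API):

```rust
// COO format: parallel arrays of (row, col, value) triples for the nonzeros.
struct CooMatrix {
    rows: Vec<usize>,
    cols: Vec<usize>,
    vals: Vec<f64>,
    shape: (usize, usize),
}

impl CooMatrix {
    // y = A * x, touching only the stored nonzeros.
    fn matvec(&self, x: &[f64]) -> Vec<f64> {
        let mut y = vec![0.0; self.shape.0];
        for ((&r, &c), &v) in self.rows.iter().zip(&self.cols).zip(&self.vals) {
            y[r] += v * x[c];
        }
        y
    }
}

fn main() {
    // [[2, 0],
    //  [1, 3]]  stored as three nonzero triples
    let m = CooMatrix {
        rows: vec![0, 1, 1],
        cols: vec![0, 0, 1],
        vals: vec![2.0, 1.0, 3.0],
        shape: (2, 2),
    };
    println!("{:?}", m.matvec(&[1.0, 1.0])); // [2.0, 4.0]
}
```

CSR and CSC compress the row (respectively column) coordinates into offset arrays, trading the simplicity of COO for faster row- or column-oriented traversal.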

🛠️ The "SciPy" Layer: solvr

To prove the robustness of numr, I am simultaneously building solvr, a library for higher-level scientific computing (equivalent to SciPy). It currently implements algorithms for:

  • Optimization: BFGS (built on tensor ops, fully GPU-accelerated) and simple gradient descent.
  • Integration: the trapezoidal and Simpson's rules, plus ODE solvers (RK45, Dop853).
  • Signal Processing: FFT, convolution, STFT.

Because solvr is built on numr traits, all of these algorithms run seamlessly on CUDA or WebGPU without changing a single line of code.
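To give a feel for the simplest item on that list, here is the composite trapezoidal rule in plain Rust. This is a textbook sketch of the algorithm, not solvr's actual signature:

```rust
// Composite trapezoidal rule: approximate the integral of f over [a, b]
// using n equal subintervals. Interior points are weighted 1, endpoints 1/2.
fn trapezoid<F: Fn(f64) -> f64>(f: F, a: f64, b: f64, n: usize) -> f64 {
    let h = (b - a) / n as f64;
    let mut sum = 0.5 * (f(a) + f(b));
    for i in 1..n {
        sum += f(a + i as f64 * h);
    }
    sum * h
}

fn main() {
    // ∫₀¹ x² dx = 1/3; with n = 1000 the error is O(h²) ≈ 1.7e-7
    let approx = trapezoid(|x| x * x, 0.0, 1.0, 1000);
    println!("{approx:.6}");
}
```

In a trait-based design like solvr's, the scalar `f64` work above would instead be expressed as tensor operations, which is what lets the same routine execute on GPU backends.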

⚠️ Current Status

This is currently experimental (beta) software.

  • The architecture is stable.
  • Many kernels (Matmul, Unary, Binary, Reductions) are implemented for all backends.
  • However, performance tuning (vs. vendor libs) is ongoing, and the API is subject to change.

🔗 Check it out

I’m looking for feedback on the API design and contributors who are interested in writing native kernels (WGSL/CUDA/Rust) or high-level scientific algorithms.

Repository:

Example usage:

```rust
use numr::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Define a device (CPU, Cuda, or Wgpu)
    let device = CudaRuntime::default_device()?;

    // Create tensors directly on the GPU
    let a = Tensor::<CudaRuntime>::randn(&[1024, 1024], &device)?;
    let b = Tensor::<CudaRuntime>::randn(&[1024, 1024], &device)?;

    // Operations use native GPU kernels
    let _c = a.matmul(&b)?;
    Ok(())
}
```

Thanks for reading!
