Load raw .gguf models in Rust on ROCm

surdeus

I'm building a toy project where I want to use some raw .gguf models. Since it's a toy project, there aren't really any end goals other than "stuff I want to do".

The catch is, my host is an AMD device and must use ROCm acceleration. (Beelink GTR 9 Pro with a Ryzen AI Max, which has 128 GB of unified memory, currently set to 96 GB VRAM.)

My requirements are:

  1. The whole thing needs to be portable: it must fit in a single Docker image and load model files directly.
  2. It must use AMD ROCm acceleration.

What I've already done

  1. Non-ROCm versions with both llama_cpp_2 and mistral_rs, using the CPU directly (see the first sketch below).
  2. Versions with ollama running inside the same container, and versions with a separate ollama container; these actually use ROCm acceleration (see the second sketch below).
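
For context, the CPU-only llama_cpp_2 version boils down to something like the sketch below. It's patterned after the crate's own `simple` example, so treat it as a rough outline rather than a definitive implementation: the exact API surface varies between llama_cpp_2 versions, and "model.gguf" and the prompt are placeholders.

```rust
// Minimal sketch: load a raw .gguf file with llama_cpp_2 on the CPU.
// Based on the crate's `simple` example; API details may vary by version.
use llama_cpp_2::context::params::LlamaContextParams;
use llama_cpp_2::llama_backend::LlamaBackend;
use llama_cpp_2::model::params::LlamaModelParams;
use llama_cpp_2::model::{AddBos, LlamaModel};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize the llama.cpp backend once per process.
    let backend = LlamaBackend::init()?;

    // Default params run everything on the CPU; a GPU-enabled build
    // would raise the number of offloaded layers here instead.
    let model_params = LlamaModelParams::default();

    // "model.gguf" is a placeholder path to a model file baked into the image.
    let model = LlamaModel::load_from_file(&backend, "model.gguf", &model_params)?;
    let _ctx = model.new_context(&backend, LlamaContextParams::default())?;

    // Tokenizing a prompt is a cheap sanity check that the model loaded.
    let tokens = model.str_to_token("Hello, world", AddBos::Always)?;
    println!("prompt tokenized into {} tokens", tokens.len());
    Ok(())
}
```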
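
The separate-container ollama variant just talks HTTP from the Rust side. Here's another minimal sketch, assuming the reqwest (with the `blocking` and `json` features) and serde_json crates; the container hostname and model name are placeholders, while the /api/generate route and payload shape come from Ollama's documented API.

```rust
// Minimal sketch: query a sidecar ollama container over its HTTP API.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();

    // "ollama" assumes a container with that name on the same Docker
    // network; 11434 is Ollama's default port.
    let resp: serde_json::Value = client
        .post("http://ollama:11434/api/generate")
        .json(&json!({
            "model": "llama3.2",              // placeholder model name
            "prompt": "Why is the sky blue?",
            "stream": false,                  // one JSON body instead of a stream
        }))
        .send()?
        .json()?;

    println!("{}", resp["response"]);
    Ok(())
}
```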

Neither mistral.rs nor llama_cpp_2 seems to have full ROCm support yet:

  1. mistral.rs
  2. llama_cpp_2

The actual question

  1. Does anyone have any suggestions for how I could use .gguf models (or any other format) directly with ROCm support in Rust?

Note that I have zero experience training LLMs and no real knowledge of how they actually work.
