Load raw .gguf models in Rust on ROCm
⚓ Rust 📅 2025-12-25 👤 surdeus

I'm building a toy project where I want to use some raw .gguf models. Since it's a toy project, there aren't really any end goals other than "Stuff I want to do".
The catch is that my host is an AMD device and must use ROCm acceleration (a Beelink GTR 9 Pro with a Ryzen AI Max, which has 128 GB of unified memory, currently set to 96 GB VRAM).
My requirements are:
- The whole thing needs to be portable: it must fit in a single Docker image and load model files directly.
- It must use AMD ROCm acceleration.
What I've already done:
- Non-ROCm versions with both llama_cpp_2 and mistral_rs (using the CPU directly); a rough sketch of the llama_cpp_2 setup follows this list.
- Versions with ollama running inside the same container, and versions with a separate ollama container. These actually use ROCm acceleration (a sketch of how I drive that API from Rust is at the end of the post).
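For reference, the CPU-only llama_cpp_2 version boils down to something like the sketch below. It's trimmed from my working code rather than a complete program, and the exact module paths and signatures depend on the llama-cpp-2 crate version:

```rust
// Minimal CPU-only load of a raw .gguf with the llama-cpp-2 crate.
// Module paths and signatures may differ between crate versions.
use llama_cpp_2::context::params::LlamaContextParams;
use llama_cpp_2::llama_backend::LlamaBackend;
use llama_cpp_2::model::params::LlamaModelParams;
use llama_cpp_2::model::{AddBos, LlamaModel};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize the llama.cpp backend (CPU only in this build).
    let backend = LlamaBackend::init()?;

    // Load the raw .gguf straight from disk -- no server in between.
    let params = LlamaModelParams::default();
    let model = LlamaModel::load_from_file(&backend, "model.gguf", &params)?;

    // Create an inference context and tokenize a prompt as a smoke test.
    let ctx = model.new_context(&backend, LlamaContextParams::default())?;
    let tokens = model.str_to_token("Hello, world!", AddBos::Always)?;
    println!("prompt = {} tokens, n_ctx = {}", tokens.len(), ctx.n_ctx());
    Ok(())
}
```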
Neither mistral_rs nor llama_cpp_2 seems to have full ROCm support yet.
The actual question
- Anyone have any suggestions on how I could use .gguf models (or any other format) directly with ROCm support in Rust?
Note that I have zero experience with training LLMs or with how they actually work.
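For completeness, the ollama variants mentioned above reduce to HTTP calls from my Rust code to the ollama server, which is the part that actually gets the ROCm acceleration. A minimal sketch, assuming reqwest with the blocking and json features plus serde_json (the model name is just a placeholder):

```rust
// Call ollama's documented /api/generate endpoint over HTTP.
// The .gguf is loaded by ollama, not by this process.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();
    let resp: serde_json::Value = client
        .post("http://localhost:11434/api/generate")
        .json(&json!({
            "model": "llama3.2", // placeholder: whatever model ollama has pulled
            "prompt": "Hello!",
            "stream": false      // return one JSON object instead of a stream
        }))
        .send()?
        .json()?;
    println!("{}", resp["response"]);
    Ok(())
}
```

This works, but it means shipping a second inference runtime inside the image instead of loading the .gguf from my own process, which is exactly what I'd like to avoid.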