Load raw .gguf models in Rust on ROCm

surdeus

I'm building a toy project where I want to use some raw .gguf models. Since it's a toy project, there aren't really any end goals other than "stuff I want to do".

The catch is, my host is an AMD device and must use ROCm acceleration. (Beelink GTR 9 Pro with a Ryzen AI Max, which has 128 GB of unified memory, currently set to 96 GB VRAM.)

My requirements are:

  1. The whole thing needs to be portable: it must fit in a single Docker image and load model files directly.
  2. It must use AMD ROCm acceleration.

What I've already done

  1. Non-ROCm versions with both llama_cpp_2 and mistral_rs, using the CPU directly (see the first sketch below).
  2. Versions with ollama running inside the same container, and versions with a separate ollama container; these actually use ROCm acceleration (see the second sketch below).
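
For context, the CPU-only llama_cpp_2 version boils down to something like the sketch below. It's patterned after the crate's own `simple` example, so treat it as a rough outline rather than a definitive implementation: the exact API surface varies between llama_cpp_2 versions, and "model.gguf" and the prompt are placeholders.

```rust
// Minimal sketch: load a raw .gguf file with llama_cpp_2 on the CPU.
// Based on the crate's `simple` example; API details may vary by version.
use llama_cpp_2::context::params::LlamaContextParams;
use llama_cpp_2::llama_backend::LlamaBackend;
use llama_cpp_2::model::params::LlamaModelParams;
use llama_cpp_2::model::{AddBos, LlamaModel};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize the llama.cpp backend once per process.
    let backend = LlamaBackend::init()?;

    // Default params run everything on the CPU; a GPU-enabled build
    // would raise the number of offloaded layers here instead.
    let model_params = LlamaModelParams::default();

    // "model.gguf" is a placeholder path to a model file baked into the image.
    let model = LlamaModel::load_from_file(&backend, "model.gguf", &model_params)?;
    let _ctx = model.new_context(&backend, LlamaContextParams::default())?;

    // Tokenizing a prompt is a cheap sanity check that the model loaded.
    let tokens = model.str_to_token("Hello, world", AddBos::Always)?;
    println!("prompt tokenized into {} tokens", tokens.len());
    Ok(())
}
```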
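
The separate-container ollama variant just talks HTTP from the Rust side. Here's another minimal sketch, assuming the reqwest (with the `blocking` and `json` features) and serde_json crates; the container hostname and model name are placeholders, while the /api/generate route and payload shape come from Ollama's documented API.

```rust
// Minimal sketch: query a sidecar ollama container over its HTTP API.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();

    // "ollama" assumes a container with that name on the same Docker
    // network; 11434 is Ollama's default port.
    let resp: serde_json::Value = client
        .post("http://ollama:11434/api/generate")
        .json(&json!({
            "model": "llama3.2",              // placeholder model name
            "prompt": "Why is the sky blue?",
            "stream": false,                  // one JSON body instead of a stream
        }))
        .send()?
        .json()?;

    println!("{}", resp["response"]);
    Ok(())
}
```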

Neither mistral.rs nor llama_cpp_2 seems to have full ROCm support yet:

  1. mistral.rs
  2. llama_cpp_2

The actual question

  1. Does anyone have any suggestions for how I could use .gguf models (or any other format) directly with ROCm support in Rust?

Note that I have zero experience training LLMs and no real knowledge of how they actually work.
