Structuring an algorithm for multiple threads
โ Rust ๐ 2026-02-18 ๐ค surdeus ๐๏ธ 8Hi all, I am learning Rust by implementing an algorithm that I have already implemented in Kotlin.
I am trying now to work out how I can structure the algorithm to fit most neatly within Rusts ownership model. (I am an experienced coder but without formal CompSci training).
The algorithm is a clustering algorithm for partitioning an image. The object of the excercise is to partition the image into โclustersโ each of which represent a group of pixels that are close to each other. Here โcloseโ means wiithin five dimensional space (x, y, r, g b).
- Each Pixel is associated with a Cluster.
- There are far fewer Clusters than Pixels, so the values in a Cluster are affected by multiple Pixels
- Each Cluster has a running total of red, green blue, x, y and count of pixels
- There is no need for a Cluster to know its Pixels, only to be able to calculate the average of each of the five values (x, y, r, g, b)
The algorithm for each iteration:
for each pixel
find nearest cluster
if this is not the current cluster
subtract x,y,r,g,b from current cluster (count--)
set current cluster to new cluster
add x,y,r,g,b to new cluster (count++)
for each cluster
recalculate average x, y, r, g, b for use in distance calculations
It seems to me that iterating over the pixels is the best place to introduce threading. Rayon works well.
Problem 1: how do I associate a Cluster with a Pixel. I could imagine using a HashMap, but that would need to be updated by multiple threads.
Problem 2: how do I update the Cluster returned from the HashMap - the Cluster will be accessed by multiple threads.
The simplest way may be to create a โgateโ (mutex?) at the point where a Pixel needs to reference and update the Clusters.This retains the parallelism for the distance calculations, but looses it for the updates, and the bottleneck may slow the algorithm down.
Alternately the updates could be sent down a channel and handled by multiple threads. There could however be an update for each pixel (certainly during the earlier iterations). Would this amount of allocation/de-allocation kill the performance?
As soon as I add mutex or channel I need per-thread storage of the cloned values, so do I lose the ability to use Rayon?
I would appreciate the views of experienced Rustaceans on which approach to take.
Thanks
6 posts - 3 participants
๐ท๏ธ Rust_feed