Structuring an algorithm for multiple threads

โš“ Rust    ๐Ÿ“… 2026-02-18    ๐Ÿ‘ค surdeus    ๐Ÿ‘๏ธ 8      

surdeus

Hi all, I am learning Rust by implementing an algorithm that I have already implemented in Kotlin.

I am trying now to work out how I can structure the algorithm to fit most neatly within Rusts ownership model. (I am an experienced coder but without formal CompSci training).

The algorithm is a clustering algorithm for partitioning an image. The object of the excercise is to partition the image into โ€˜clustersโ€™ each of which represent a group of pixels that are close to each other. Here โ€˜closeโ€™ means wiithin five dimensional space (x, y, r, g b).

  • Each Pixel is associated with a Cluster.
  • There are far fewer Clusters than Pixels, so the values in a Cluster are affected by multiple Pixels
  • Each Cluster has a running total of red, green blue, x, y and count of pixels
  • There is no need for a Cluster to know its Pixels, only to be able to calculate the average of each of the five values (x, y, r, g, b)

The algorithm for each iteration:

for each pixel
    find nearest cluster
    if this is not the current cluster 
        subtract x,y,r,g,b from current cluster (count--)
        set current cluster to new cluster
        add x,y,r,g,b to new cluster (count++)

for each cluster
    recalculate average x, y, r, g, b for use in distance calculations

It seems to me that iterating over the pixels is the best place to introduce threading. Rayon works well.
Problem 1: how do I associate a Cluster with a Pixel. I could imagine using a HashMap, but that would need to be updated by multiple threads.

Problem 2: how do I update the Cluster returned from the HashMap - the Cluster will be accessed by multiple threads.

The simplest way may be to create a โ€˜gateโ€™ (mutex?) at the point where a Pixel needs to reference and update the Clusters.This retains the parallelism for the distance calculations, but looses it for the updates, and the bottleneck may slow the algorithm down.

Alternately the updates could be sent down a channel and handled by multiple threads. There could however be an update for each pixel (certainly during the earlier iterations). Would this amount of allocation/de-allocation kill the performance?

As soon as I add mutex or channel I need per-thread storage of the cloned values, so do I lose the ability to use Rayon?

I would appreciate the views of experienced Rustaceans on which approach to take.

Thanks

6 posts - 3 participants

Read full topic

๐Ÿท๏ธ Rust_feed