Beginner: need help optimizing nested parallel iterators

⚓ rust    📅 2025-05-18    👤 surdeus    👁️ 5      

Hi, I'm pretty new to Rust and wanted a project to get my hands dirty with the language. I'm writing a bioinformatics tool that reads a multiple sequence alignment file and calculates a pairwise distance metric between the sequences. My entire repository is here for reference: theabhirath/pairsnp-rs on GitHub (pairwise SNP distance matrices from multiple sequence alignments, written in Rust).

I'm running into an issue when I profile this code: compared to C++ code parallelized using MPI, one of my functions is very slow. Here is the function:

use rayon::prelude::*;
use roaring::RoaringBitmap;

fn calculate_pairwise_snp_distances(
    a_snps: &[RoaringBitmap],
    c_snps: &[RoaringBitmap],
    g_snps: &[RoaringBitmap],
    t_snps: &[RoaringBitmap],
    nseqs: usize,
    seq_length: u64,
) -> Vec<Vec<u64>> {
    (0..nseqs)
        .into_par_iter()
        .map(|i| {
            (i + 1..nseqs)
                .into_par_iter()
                .map(|j| {
                    let mut res = &a_snps[i] & &a_snps[j];
                    res |= &c_snps[i] & &c_snps[j];
                    res |= &g_snps[i] & &g_snps[j];
                    res |= &t_snps[i] & &t_snps[j];
                    seq_length - res.len()
                })
                .collect()
        })
        .collect()
}

I'm using roaring-rs for faster bitmaps (Roaring Bitmaps) and rayon for parallelization via into_par_iter. When I profile the function, most of the time is spent waiting and extending the result vector. Is there a more efficient way to write this sort of parallel code in Rust? Any help optimizing the performance of this function would be appreciated!
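One direction that might be worth trying, sketched here under stated assumptions rather than measured against the repository: parallelize only over rows, keep the inner loop sequential, and count intersections instead of materializing them. The profile described above would be consistent with the nested into_par_iter splitting the work into many tiny tasks and collecting many small vectors, plus several temporary bitmaps per pair, though that is a guess. The sketch below is dependency-free, so `Bits` (a plain packed bitset), `intersection_len`, `row`, `pairwise_distances`, and `nthreads` are hypothetical stand-ins, not roaring-rs or rayon APIs. It also assumes each alignment column holds a single base per sequence, so the four per-base intersections are disjoint and the size of their union equals the sum of their sizes.

```rust
use std::thread;

/// Hypothetical stand-in for a RoaringBitmap: one bit per alignment column,
/// packed into 64-bit words.
type Bits = Vec<u64>;

/// Number of positions set in both bitsets, counted without allocating a
/// temporary intersection.
fn intersection_len(x: &[u64], y: &[u64]) -> u64 {
    x.iter().zip(y).map(|(a, b)| u64::from((a & b).count_ones())).sum()
}

/// Distances from sequence `i` to every later sequence `j > i`, computed
/// sequentially. `snps` holds the A, C, G, and T bitsets for all sequences.
fn row(i: usize, snps: [&[Bits]; 4], seq_length: u64) -> Vec<u64> {
    let nseqs = snps[0].len();
    (i + 1..nseqs)
        .map(|j| {
            // Assumption: per-base intersections are disjoint, so their
            // cardinalities can simply be added.
            let shared: u64 = snps.iter().map(|b| intersection_len(&b[i], &b[j])).sum();
            seq_length - shared
        })
        .collect()
}

/// Parallelize over rows only; each worker fills whole rows on its own.
fn pairwise_distances(snps: [&[Bits]; 4], seq_length: u64, nthreads: usize) -> Vec<Vec<u64>> {
    let nseqs = snps[0].len();
    let nthreads = nthreads.max(1);
    thread::scope(|s| {
        // Worker t takes rows t, t + nthreads, ... so that long early rows
        // and short late rows are spread across workers.
        let handles: Vec<_> = (0..nthreads)
            .map(|t| {
                s.spawn(move || {
                    (t..nseqs)
                        .step_by(nthreads)
                        .map(|i| (i, row(i, snps, seq_length)))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        // Place each finished row at its index in the output.
        let mut out = vec![Vec::new(); nseqs];
        for h in handles {
            for (i, r) in h.join().expect("worker panicked") {
                out[i] = r;
            }
        }
        out
    })
}
```

With rayon, the same shape would presumably be the existing outer into_par_iter with the inner into_par_iter replaced by a plain sequential range, so each task produces a whole row. If the roaring-rs version in use provides an intersection-cardinality method on RoaringBitmap, that could play the role of the hypothetical `intersection_len` here. Whether either change helps would need to be confirmed by profiling.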
