Hi, I'm pretty new to Rust and wanted a project to get my hands dirty with the language and really figure it out. I'm writing a bioinformatics tool that reads a multiple sequence alignment file and calculates a pairwise distance metric between the sequences. My entire repository is here for reference: GitHub - theabhirath/pairsnp-rs: Pairwise SNP distance matrices from Multiple Sequence Alignments, written in Rust.
I'm running into an issue when I profile this code: compared to C++ code parallelized using MPI, one of my functions is very, very slow. I've added the code for this function below:
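    use rayon::prelude::*;
    use roaring::RoaringBitmap;

    // Returns the upper triangle of the distance matrix:
    // row i holds the distances from sequence i to sequences i+1..nseqs.
    fn calculate_pairwise_snp_distances(
        a_snps: &[RoaringBitmap],
        c_snps: &[RoaringBitmap],
        g_snps: &[RoaringBitmap],
        t_snps: &[RoaringBitmap],
        nseqs: usize,
        seq_length: u64,
    ) -> Vec<Vec<u64>> {
        (0..nseqs)
            .into_par_iter()
            .map(|i| {
                (i + 1..nseqs)
                    .into_par_iter()
                    .map(|j| {
                        // Positions where sequences i and j carry the same base.
                        let mut res = &a_snps[i] & &a_snps[j];
                        res |= &c_snps[i] & &c_snps[j];
                        res |= &g_snps[i] & &g_snps[j];
                        res |= &t_snps[i] & &t_snps[j];
                        // Distance = alignment length minus matching positions.
                        seq_length - res.len()
                    })
                    .collect()
            })
            .collect()
    }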
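To make it explicit what the function computes, here is a minimal std-only model of the same calculation. It is a sketch under stated assumptions, not the code from the repository: each per-base bitmap is replaced by a single `u64` mask (so it only works for alignments of at most 64 columns), and the name `pairwise_snp_distances_model` is hypothetical.

```rust
/// Toy model of the pairwise distance calculation (hypothetical helper).
/// Assumption: bit k of a[i] is set when sequence i has an 'A' in column k,
/// and likewise for c, g and t. A u64 mask stands in for a RoaringBitmap,
/// so this only covers alignments of at most 64 columns.
fn pairwise_snp_distances_model(
    a: &[u64],
    c: &[u64],
    g: &[u64],
    t: &[u64],
    seq_length: u64,
) -> Vec<Vec<u64>> {
    let nseqs = a.len();
    (0..nseqs)
        .map(|i| {
            (i + 1..nseqs)
                .map(|j| {
                    // Columns where sequences i and j carry the same base.
                    let same = (a[i] & a[j]) | (c[i] & c[j]) | (g[i] & g[j]) | (t[i] & t[j]);
                    // Distance = alignment length minus matching columns.
                    seq_length - u64::from(same.count_ones())
                })
                .collect()
        })
        .collect()
}
```

For the three sequences "ACGT", "ACGA" and "TCGT" this gives the ragged upper triangle `[[1, 1], [2], []]`: row `i` has `nseqs - i - 1` entries, so the rows get shorter and the last one is empty.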
I'm using roaring-rs for faster bitmaps (Roaring Bitmaps) and rayon for parallelization using into_par_iter, but when I profile it I see that most of the time in this code is spent waiting and extending the result vector. Is there a more efficient way to write this sort of parallel code in Rust? Any help in optimizing the performance of this function would be appreciated!
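One restructuring that may be worth profiling is to parallelize only over rows and fill each row sequentially into a preallocated vector, so no nested parallel `collect` has to stitch small pieces together. The sketch below shows that shape using only the standard library (`std::thread::scope`) so it runs without extra crates; it is not a drop-in replacement. The helper name `upper_triangle_rows`, the `n_threads` parameter, and the `dist` closure are all hypothetical. With rayon, the corresponding change would be to keep the outer `into_par_iter` and turn the inner one into a plain sequential iterator.

```rust
use std::thread;

/// Hypothetical helper: builds the upper-triangle rows, with parallelism
/// over rows only. `dist(i, j)` is an assumed caller-supplied distance
/// function for the pair (i, j) with i < j.
fn upper_triangle_rows<F>(nseqs: usize, n_threads: usize, dist: F) -> Vec<Vec<u64>>
where
    F: Fn(usize, usize) -> u64 + Sync,
{
    let mut rows: Vec<Vec<u64>> = vec![Vec::new(); nseqs];
    // Rows per thread, rounded up and never zero (chunks_mut panics on 0).
    let n_threads = n_threads.max(1);
    let chunk = ((nseqs + n_threads - 1) / n_threads).max(1);

    thread::scope(|s| {
        for (chunk_idx, chunk_rows) in rows.chunks_mut(chunk).enumerate() {
            let dist = &dist;
            s.spawn(move || {
                for (offset, row) in chunk_rows.iter_mut().enumerate() {
                    let i = chunk_idx * chunk + offset;
                    // The row length is known up front, so allocate once.
                    row.reserve_exact(nseqs - i - 1);
                    // Sequential inner loop: no nested parallel collect.
                    for j in i + 1..nseqs {
                        row.push(dist(i, j));
                    }
                }
            });
        }
    });
    rows
}
```

Two caveats. The contiguous row chunks here are unbalanced, because early rows of the triangle are longer than late ones; a work-stealing scheduler such as rayon's handles that better than this fixed split. Separately, if each alignment column holds at most one base per sequence (an assumption about how the bitmaps are built), the four intersections are disjoint, so summing their cardinalities gives the same count as building the union; roaring's `intersection_len`, if the installed version provides it, would then avoid allocating temporary bitmaps for each pair.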