Bad idea? Using Criterion to measure something other than time

⚓ Rust    📅 2025-12-02    👤 surdeus    👁️ 1

surdeus

Hello there,
This might be a bad idea, but I have started using Criterion to benchmark time, and it occurred to me that I could take advantage of all of Criterion's robustness and hack it into measuring something other than time, such as an ML model's accuracy.

Here is an example of what I'm trying to do. In the following project I'm measuring prediction accuracy through two metrics, sensitivity and specificity, and I want to bench those values instead of the time it takes to compute them.
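
For context, sensitivity is the true-positive rate and specificity the true-negative rate. A minimal sketch of what such an evaluate function computes, assuming boolean labels (evaluate_sketch is a hypothetical stand-in; the real evaluate lives in the repo linked below):

fn evaluate_sketch(actual: &[bool], predicted: &[bool]) -> (f64, f64) {
    // Tally the confusion-matrix cells over paired labels.
    let (mut tp, mut tn, mut fp, mut fn_) = (0u32, 0u32, 0u32, 0u32);
    for (&a, &p) in actual.iter().zip(predicted) {
        match (a, p) {
            (true, true) => tp += 1,
            (false, false) => tn += 1,
            (false, true) => fp += 1,
            (true, false) => fn_ += 1,
        }
    }
    // Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    (
        tp as f64 / (tp + fn_) as f64,
        tn as f64 / (tn + fp) as f64,
    )
}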

Using Custom Measurements - Criterion.rs Documentation, I came up with the following file:

use criterion::{
    Criterion, criterion_group, criterion_main,
    measurement::{Measurement, ValueFormatter},
};
use linfa_nn::{CommonNearestNeighbour, NearestNeighbour};
use shopping::{evaluate, load_data_from_url, predict};

const TEST_RATIO: f32 = 0.4;
const K_NEIGHBORS: usize = 2;
const URL: &str = "https://cdn.cs50.net/ai/2023/x/projects/4/shopping.zip";

/// A Measurement whose value is a unitless metric (an accuracy score)
/// rather than elapsed time; `start`/`end` are effectively unused because
/// the benches below feed values in directly via `iter_custom`.
struct MetricMeasurement;

impl Measurement for MetricMeasurement {
    type Intermediate = f64;
    type Value = f64;

    fn start(&self) -> Self::Intermediate {
        0.0
    }

    fn end(&self, i: Self::Intermediate) -> Self::Value {
        i
    }

    fn add(&self, v1: &Self::Value, v2: &Self::Value) -> Self::Value {
        v1 + v2
    }

    fn zero(&self) -> Self::Value {
        0.0
    }

    fn to_f64(&self, value: &Self::Value) -> f64 {
        *value
    }

    fn formatter(&self) -> &dyn ValueFormatter {
        &MetricFormatter
    }
}

/// Formats the metric as a plain number; no unit or throughput scaling applies.
struct MetricFormatter;

impl ValueFormatter for MetricFormatter {
    fn format_value(&self, value: f64) -> String {
        format!("{value:.4}")
    }

    fn scale_values(&self, _typical_value: f64, _values: &mut [f64]) -> &'static str {
        ""
    }

    fn scale_throughputs(
        &self,
        _typical_value: f64,
        _throughput: &criterion::Throughput,
        _values: &mut [f64],
    ) -> &'static str {
        ""
    }

    fn scale_for_machines(&self, _values: &mut [f64]) -> &'static str {
        ""
    }
}

fn bench_sensitivity(c: &mut Criterion<MetricMeasurement>) {
    let (evidence, labels) =
        load_data_from_url(URL).expect("Failed to load CSV from URL for benchmark");

    let dataset = linfa::Dataset::new(evidence, labels);
    let mut rnd = rand::thread_rng();

    c.bench_function("Model Sensitivity", |b| {
        b.iter_custom(|iters| {
            let mut total = 0.0;

            for _ in 0..iters {
                let (train, test) = dataset
                    .clone()
                    .shuffle(&mut rnd)
                    .split_with_ratio(TEST_RATIO);

                let model = CommonNearestNeighbour::KdTree
                    .from_batch(train.records(), linfa_nn::distance::L2Dist)
                    .expect("NN index");

                let predictions = predict(&*model, train.targets(), test.records(), K_NEIGHBORS);
                let (sensitivity, _specificity) = evaluate(test.targets(), &predictions);
                total += sensitivity;
            }

            // Return the running total: Criterion divides the value from
            // iter_custom by `iters` itself, so averaging here would divide twice.
            total as f64
        })
    });
}

fn bench_specificity(c: &mut Criterion<MetricMeasurement>) {
    let (evidence, labels) =
        load_data_from_url(URL).expect("Failed to load CSV from URL for benchmark");

    let dataset = linfa::Dataset::new(evidence, labels);

    c.bench_function("Model Specificity", |b| {
        b.iter_custom(|iters| {
            let mut total = 0.0;

            let mut rnd = rand::thread_rng();
            for _ in 0..iters {
                let (train, test) = dataset
                    .clone()
                    .shuffle(&mut rnd)
                    .split_with_ratio(TEST_RATIO);

                let model = CommonNearestNeighbour::KdTree
                    .from_batch(train.records(), linfa_nn::distance::L2Dist)
                    .expect("NN index");

                let predictions = predict(&*model, train.targets(), test.records(), K_NEIGHBORS);
                let (_sensitivity, specificity) = evaluate(test.targets(), &predictions);
                assert!(specificity > 0.5);
                total += specificity;
            }

            // As above: return the total; Criterion divides by `iters` itself.
            total as f64
        })
    });
}

criterion_group! {
    name = bench_accuracy;
    config = Criterion::default().with_measurement(MetricMeasurement);
    targets = bench_sensitivity, bench_specificity
}
criterion_main!(bench_accuracy);
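
One setup detail: for Criterion to drive this file, the bench target needs the default libtest harness disabled in Cargo.toml (standard Criterion setup; the target name below is an assumption and must match the file name under benches/):

[[bench]]
name = "accuracy"
harness = false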

The whole repo can be found here: Files · feat/benches · CS50-AI / shopping · GitLab

Now the benches are able to run, but they were reporting a much worse accuracy than repeatedly running the main function; the note on iter_custom below covers the likely culprit.
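
Per Criterion's documentation, the closure passed to iter_custom must return the total measurement across all iters runs, because Criterion divides by the iteration count itself during analysis. Returning a per-iteration average therefore gets divided by iters twice, shrinking the reported value. A minimal sketch of the contract (bench_metric is a hypothetical helper, not part of the project above):

use criterion::{Bencher, measurement::Measurement};

// Works for any f64-valued Measurement: hand back the TOTAL over all
// `iters` runs and let Criterion compute the per-iteration average.
fn bench_metric<M: Measurement<Value = f64>>(b: &mut Bencher<'_, M>) {
    b.iter_custom(|iters| {
        let mut total = 0.0;
        for _ in 0..iters {
            total += 0.9; // stand-in for one measured metric value
        }
        total // not `total / iters as f64` -- that would divide twice
    })
}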

Now, is this a good idea? Has anyone tried anything similar? Should I keep trying to make this work, or is it a waste of time?


๐Ÿท๏ธ Rust_feed