Bad idea? Using Criterion to measure something other than time
Rust · 2025-12-02 · surdeus

Hello there,
This might be a bad idea, but I started using Criterion to benchmark time, and it occurred to me that I could take all of Criterion's statistical robustness and hack it into measuring something other than time, such as an ML model's accuracy.
Here is an example of what I'm trying to do. In the following project I measure prediction accuracy through two metrics, sensitivity and specificity, and I want to bench these values instead of the time it takes to compute them.
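(For reference: sensitivity is the true positive rate, TP / (TP + FN), and specificity is the true negative rate, TN / (TN + FP); both lie in [0, 1].)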
Following the Custom Measurements chapter of the Criterion.rs documentation, I came up with the following file:
```rust
use criterion::{
    Criterion, criterion_group, criterion_main,
    measurement::{Measurement, ValueFormatter},
};
use linfa_nn::{CommonNearestNeighbour, NearestNeighbour};
use shopping::{evaluate, load_data_from_url, predict};

const TEST_RATIO: f32 = 0.4;
const K_NEIGHBORS: usize = 2;
const URL: &str = "https://cdn.cs50.net/ai/2023/x/projects/4/shopping.zip";

/// A unitless measurement: the "value" is whatever number the benchmark
/// closure hands back through `iter_custom`, not a duration.
struct MetricMeasurement;

impl Measurement for MetricMeasurement {
    type Intermediate = f64;
    type Value = f64;

    // With `iter_custom` the closure produces the value directly, so
    // `start`/`end` are effectively unused stubs.
    fn start(&self) -> Self::Intermediate {
        0.0
    }
    fn end(&self, i: Self::Intermediate) -> Self::Value {
        i
    }
    fn add(&self, v1: &Self::Value, v2: &Self::Value) -> Self::Value {
        v1 + v2
    }
    fn zero(&self) -> Self::Value {
        0.0
    }
    fn to_f64(&self, value: &Self::Value) -> f64 {
        *value
    }
    fn formatter(&self) -> &dyn ValueFormatter {
        &MetricFormatter
    }
}

struct MetricFormatter;

impl ValueFormatter for MetricFormatter {
    fn format_value(&self, value: f64) -> String {
        format!("{value:.4}")
    }
    // Accuracy metrics are unitless, so every unit string is empty.
    fn scale_values(&self, _typical_value: f64, _values: &mut [f64]) -> &'static str {
        ""
    }
    fn scale_throughputs(
        &self,
        _typical_value: f64,
        _throughput: &criterion::Throughput,
        _values: &mut [f64],
    ) -> &'static str {
        ""
    }
    fn scale_for_machines(&self, _values: &mut [f64]) -> &'static str {
        ""
    }
}

fn bench_sensitivity(c: &mut Criterion<MetricMeasurement>) {
    let (evidence, labels) =
        load_data_from_url(URL).expect("Failed to load CSV from URL for benchmark");
    let dataset = linfa::Dataset::new(evidence, labels);
    let mut rnd = rand::thread_rng();
    c.bench_function("Model Sensitivity", |b| {
        b.iter_custom(|iters| {
            let mut total = 0.0;
            for _ in 0..iters {
                // `split_with_ratio` puts the first TEST_RATIO of the shuffled
                // samples into the first chunk, which is bound to `train` here.
                let (train, test) = dataset
                    .clone()
                    .shuffle(&mut rnd)
                    .split_with_ratio(TEST_RATIO);
                let model = CommonNearestNeighbour::KdTree
                    .from_batch(train.records(), linfa_nn::distance::L2Dist)
                    .expect("NN index");
                let predictions = predict(&*model, train.targets(), test.records(), K_NEIGHBORS);
                let (sensitivity, _specificity) = evaluate(test.targets(), &predictions);
                total += sensitivity;
            }
            total / iters as f64
        })
    });
}

fn bench_specificity(c: &mut Criterion<MetricMeasurement>) {
    let (evidence, labels) =
        load_data_from_url(URL).expect("Failed to load CSV from URL for benchmark");
    let dataset = linfa::Dataset::new(evidence, labels);
    c.bench_function("Model Specificity", |b| {
        b.iter_custom(|iters| {
            let mut total = 0.0;
            let mut rnd = rand::thread_rng();
            for _ in 0..iters {
                let (train, test) = dataset
                    .clone()
                    .shuffle(&mut rnd)
                    .split_with_ratio(TEST_RATIO);
                let model = CommonNearestNeighbour::KdTree
                    .from_batch(train.records(), linfa_nn::distance::L2Dist)
                    .expect("NN index");
                let predictions = predict(&*model, train.targets(), test.records(), K_NEIGHBORS);
                let (_sensitivity, specificity) = evaluate(test.targets(), &predictions);
                assert!(specificity > 0.5);
                total += specificity;
            }
            total / iters as f64
        })
    });
}

criterion_group! {
    name = bench_accuracy;
    config = Criterion::default().with_measurement(MetricMeasurement);
    targets = bench_sensitivity, bench_specificity
}
criterion_main!(bench_accuracy);
```
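A side note before comparing numbers: the accuracy here depends on the random shuffle, so some run-to-run variation is expected. Seeding the RNG makes the splits reproducible; a minimal sketch, assuming the rand 0.8 API (`StdRng` implements `Rng`, so it drops into the `shuffle` call unchanged):

```rust
use rand::{SeedableRng, rngs::StdRng};

// Hypothetical tweak: a fixed seed instead of `rand::thread_rng()`, so every
// run shuffles the dataset identically and the metrics are comparable.
let mut rnd = StdRng::seed_from_u64(42);
```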
The whole repo can be found here: Files · feat/benches · CS50-AI / shopping · GitLab
The benches run now, but they report a much worse accuracy than I get from repeatedly running the main function.
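My best guess at the cause, though I have not verified it against Criterion's internals: `iter_custom` is expected to return the total measurement accumulated over all `iters` iterations, and Criterion divides by the iteration count itself when reporting per-iteration values. Returning the mean, as above, gets divided by `iters` a second time, which would explain numbers far below the real accuracy. A minimal sketch of the change (the function name `bench_sensitivity_total` is mine; imports and helpers are as in the file above):

```rust
fn bench_sensitivity_total(c: &mut Criterion<MetricMeasurement>) {
    let (evidence, labels) =
        load_data_from_url(URL).expect("Failed to load CSV from URL for benchmark");
    let dataset = linfa::Dataset::new(evidence, labels);
    let mut rnd = rand::thread_rng();
    c.bench_function("Model Sensitivity", |b| {
        b.iter_custom(|iters| {
            let mut total = 0.0;
            for _ in 0..iters {
                let (train, test) = dataset
                    .clone()
                    .shuffle(&mut rnd)
                    .split_with_ratio(TEST_RATIO);
                let model = CommonNearestNeighbour::KdTree
                    .from_batch(train.records(), linfa_nn::distance::L2Dist)
                    .expect("NN index");
                let predictions = predict(&*model, train.targets(), test.records(), K_NEIGHBORS);
                let (sensitivity, _specificity) = evaluate(test.targets(), &predictions);
                total += sensitivity;
            }
            // Return the sum, not the mean: Criterion divides by `iters` itself
            // when it turns sampled totals into per-iteration values.
            total
        })
    });
}
```

If that is right, the same one-line change applies to the specificity bench.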
So, is this a good idea? Has anyone tried anything similar? Should I keep trying to make this work, or is it a waste of time?