Speech-prep: a focused Rust crate for speech audio preprocessing
Rust | 2026-04-05 | surdeus

I just open-sourced speech-prep, a small Rust crate for the front end of speech pipelines: audio format detection, WAV decoding, preprocessing, chunking, and VAD.
https://crates.io/crates/speech-prep
This was extracted from a private codebase and cleaned up into a standalone crate with a tighter public API and a narrower scope. The goal is not to be a general audio toolbox. The goal is to make the common speech-prep path simple, predictable, and easy to integrate into Rust systems.
Current scope:
- format detection for common speech inputs
- WAV decoding
- sample conversion / normalization
- preprocessing utilities
- voice activity detection
- fixtures and examples for validation
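To make the format-detection item concrete, here is a minimal sketch of how a container can be guessed from its leading magic bytes. This is not the crate's API: `SniffedFormat` and `sniff_format` are hypothetical names invented for illustration, and the crate's own detector may use different rules.

```rust
/// Container formats this sketch recognizes.
/// Hypothetical type for illustration, not part of speech-prep.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum SniffedFormat {
    Wav,
    Flac,
    Ogg,
    Mp3,
    Unknown,
}

/// Guess the container from the first bytes of a buffer.
/// Hypothetical helper; a heuristic, not a validator.
pub fn sniff_format(bytes: &[u8]) -> SniffedFormat {
    // WAV: "RIFF", a 4-byte little-endian size, then "WAVE".
    if bytes.len() >= 12 && &bytes[0..4] == b"RIFF" && &bytes[8..12] == b"WAVE" {
        return SniffedFormat::Wav;
    }
    // FLAC streams start with the marker "fLaC".
    if bytes.starts_with(b"fLaC") {
        return SniffedFormat::Flac;
    }
    // Ogg pages start with the capture pattern "OggS".
    if bytes.starts_with(b"OggS") {
        return SniffedFormat::Ogg;
    }
    // MP3: either an ID3v2 tag, or an MPEG frame sync (11 set bits).
    // The sync check is loose and can also match other MPEG audio streams.
    if bytes.starts_with(b"ID3") {
        return SniffedFormat::Mp3;
    }
    if bytes.len() >= 2 && bytes[0] == 0xFF && (bytes[1] & 0xE0) == 0xE0 {
        return SniffedFormat::Mp3;
    }
    SniffedFormat::Unknown
}
```

Sniffing the header rather than trusting the file extension is the usual design choice here, since uploads and network streams often arrive mislabeled or with no name at all.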
A minimal example looks like this:
use std::sync::Arc;

use speech_prep::{NoopVadMetricsCollector, VadConfig, VadDetector, VadMetricsCollector};

fn main() -> Result<(), speech_prep::Error> {
    let metrics: Arc<dyn VadMetricsCollector> = Arc::new(NoopVadMetricsCollector);
    let detector = VadDetector::new(VadConfig::default(), metrics)?;
    let samples = vec![0.0f32; 16_000];
    let _segments = detector.detect(&samples, 16_000)?;
    Ok(())
}
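For readers new to VAD, the general idea behind a detector can be sketched with a naive energy threshold. This is an illustration of the technique only, not how speech-prep's `VadDetector` works internally: `Span`, `energy_vad`, the frame length, and the threshold are all assumptions made for this sketch.

```rust
/// A detected speech span in sample indices (`end` is exclusive).
/// Hypothetical type for illustration, not part of speech-prep.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Naive energy VAD: a frame counts as speech when its RMS exceeds
/// `threshold`; consecutive speech frames are merged into one span.
pub fn energy_vad(samples: &[f32], frame_len: usize, threshold: f32) -> Vec<Span> {
    let mut spans = Vec::new();
    // `chunks(0)` would panic, so treat a zero frame length as "no result".
    if frame_len == 0 {
        return spans;
    }
    // Start index of the span currently being built, if any.
    let mut open: Option<usize> = None;
    for (i, frame) in samples.chunks(frame_len).enumerate() {
        let start = i * frame_len;
        let mean_square = frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32;
        let is_speech = mean_square.sqrt() > threshold;
        match (is_speech, open) {
            // Silence to speech: open a new span.
            (true, None) => open = Some(start),
            // Speech to silence: close the span at this frame's start.
            (false, Some(s)) => {
                spans.push(Span { start: s, end: start });
                open = None;
            }
            _ => {}
        }
    }
    // Close a span that runs to the end of the buffer.
    if let Some(s) = open {
        spans.push(Span { start: s, end: samples.len() });
    }
    spans
}
```

With 16 kHz input, a `frame_len` of 320 corresponds to 20 ms frames. A fixed threshold like this breaks down quickly under background noise, which is one reason to reach for a purpose-built detector instead.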
A few constraints up front:
- decoding is WAV-only today
- the crate is intentionally focused on speech prep, not full media handling
- I'd rather keep it small and solid than broaden it too early
If you work on ASR, alignment, pronunciation scoring, diarization, or other speech systems in Rust, I'd especially like feedback on API shape, naming, and missing primitives.
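Since decoding is WAV-only, audio that arrives from another source (a capture device, a different decoder) has to be turned into samples by the caller. As a rough sketch of that step, assuming the input is raw 16-bit little-endian PCM and the target is `f32` samples like those in the example above, the conversion could look like this. `pcm16le_to_f32` is a hypothetical helper, not a function from the crate.

```rust
/// Convert 16-bit little-endian PCM bytes to f32 samples in [-1.0, 1.0).
/// A trailing odd byte is ignored.
/// Hypothetical helper for illustration, not part of speech-prep.
pub fn pcm16le_to_f32(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(2)
        .map(|pair| {
            let v = i16::from_le_bytes([pair[0], pair[1]]);
            // Dividing by 32768 maps i16::MIN to exactly -1.0.
            f32::from(v) / 32768.0
        })
        .collect()
}
```

This handles a single channel only; interleaved stereo would need a downmix step before being passed to a speech pipeline.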