Speech-prep: a focused Rust crate for speech audio preprocessing

⚓ Rust    📅 2026-04-05    👤 surdeus    👁️ 7

I just open-sourced speech-prep, a small Rust crate for the front end of speech pipelines: audio format detection, WAV decoding, preprocessing, chunking, and VAD.

https://crates.io/crates/speech-prep

This was extracted from a private codebase and cleaned up into a standalone crate with a tighter public API and a narrower scope. The goal is not to be a general audio toolbox. The goal is to make the common speech-prep path simple, predictable, and easy to integrate into Rust systems.

Current scope:

  • format detection for common speech inputs
  • WAV decoding
  • sample conversion / normalization
  • preprocessing utilities
  • voice activity detection
  • fixtures and examples for validation
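To make "sample conversion / normalization" concrete, here is a minimal sketch of what that step typically involves: turning 16-bit PCM into floating-point samples and scaling them to a target peak. This is an illustration only, not speech-prep's API; the function names `pcm16_to_f32` and `peak_normalize` are hypothetical.

```rust
/// Convert signed 16-bit PCM samples to f32 in the range [-1.0, 1.0).
/// Hypothetical helper for illustration; not part of speech-prep's API.
fn pcm16_to_f32(samples: &[i16]) -> Vec<f32> {
    samples.iter().map(|&s| f32::from(s) / 32768.0).collect()
}

/// Scale samples so the largest absolute value equals `target_peak`.
/// Silent input (peak of zero) is returned unchanged to avoid dividing by zero.
/// Hypothetical helper for illustration; not part of speech-prep's API.
fn peak_normalize(samples: &[f32], target_peak: f32) -> Vec<f32> {
    let peak = samples.iter().fold(0.0f32, |acc, &s| acc.max(s.abs()));
    if peak == 0.0 {
        return samples.to_vec();
    }
    let gain = target_peak / peak;
    samples.iter().map(|&s| s * gain).collect()
}
```

Dividing by 32768 rather than 32767 keeps the mapping symmetric around zero and maps `i16::MIN` to exactly -1.0; other conventions exist, and the crate may use a different one.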

A minimal example looks like this:

  use std::sync::Arc;
  use speech_prep::{NoopVadMetricsCollector, VadConfig, VadDetector, VadMetricsCollector};

  fn main() -> Result<(), speech_prep::Error> {
      // The no-op collector fills the metrics slot without recording anything.
      let metrics: Arc<dyn VadMetricsCollector> = Arc::new(NoopVadMetricsCollector);
      let detector = VadDetector::new(VadConfig::default(), metrics)?;

      // 16,000 zero-valued samples (silence) stand in for real decoded audio.
      let samples = vec![0.0f32; 16_000];
      let _segments = detector.detect(&samples, 16_000)?;

      Ok(())
  }
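For readers new to VAD, the sketch below shows the general idea of turning a sample buffer into speech segments using a naive frame-energy threshold. It is a stand-in technique chosen for brevity, not a description of how speech-prep's detector works; `Segment`, `energy_vad`, and the parameters are all hypothetical.

```rust
/// A detected speech region in sample indices (`end` is exclusive).
/// Hypothetical type for illustration; not speech-prep's segment type.
#[derive(Debug, PartialEq)]
struct Segment {
    start: usize,
    end: usize,
}

/// Naive energy-based VAD: split the signal into fixed-size frames, treat a
/// frame as speech when its RMS exceeds `threshold`, and merge consecutive
/// speech frames into one segment.
fn energy_vad(samples: &[f32], frame_len: usize, threshold: f32) -> Vec<Segment> {
    let mut segments = Vec::new();
    if frame_len == 0 {
        return segments;
    }
    // The segment currently being extended, if the previous frame was speech.
    let mut current: Option<Segment> = None;
    for (i, frame) in samples.chunks(frame_len).enumerate() {
        let start = i * frame_len;
        let end = start + frame.len();
        let mean_square = frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32;
        if mean_square.sqrt() > threshold {
            match current.as_mut() {
                Some(seg) => seg.end = end,
                None => current = Some(Segment { start, end }),
            }
        } else if let Some(seg) = current.take() {
            // A quiet frame closes the open segment.
            segments.push(seg);
        }
    }
    // Flush a segment that runs to the end of the buffer.
    if let Some(seg) = current {
        segments.push(seg);
    }
    segments
}
```

Under this sketch, an all-zero buffer like the one in the example above produces no segments, since no frame's RMS exceeds a positive threshold. Production detectors usually add smoothing, hangover time, and spectral features on top of raw energy.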

A few constraints up front:

  • decoding is WAV-only today
  • the crate is intentionally focused on speech prep, not full media handling
  • I’d rather keep it small and solid than broaden it too early
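Because decoding is WAV-only, a caller may want to check the container before handing bytes to a decoder. The sketch below sniffs a few common formats from their leading magic bytes. It is independent of speech-prep's own format detection, whose API is not shown in this post; `SniffedFormat` and `sniff_format` are hypothetical names.

```rust
/// Container formats this sketch can recognise.
/// Hypothetical type for illustration; not speech-prep's.
#[derive(Debug, PartialEq)]
enum SniffedFormat {
    Wav,
    Flac,
    Ogg,
    Unknown,
}

/// Guess the container from the first bytes of a file.
fn sniff_format(bytes: &[u8]) -> SniffedFormat {
    // WAV is a RIFF container: "RIFF", a 4-byte size field, then "WAVE".
    if bytes.starts_with(b"RIFF") && bytes.get(8..12) == Some(&b"WAVE"[..]) {
        SniffedFormat::Wav
    } else if bytes.starts_with(b"fLaC") {
        SniffedFormat::Flac
    } else if bytes.starts_with(b"OggS") {
        SniffedFormat::Ogg
    } else {
        SniffedFormat::Unknown
    }
}
```

A caller could route `SniffedFormat::Wav` to the decoder and reject or transcode everything else up front. Magic bytes only identify the container; they say nothing about whether the payload inside is a supported encoding.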

If you work on ASR, alignment, pronunciation scoring, diarization, or other speech systems in Rust, I’d especially like feedback on API shape, naming, and missing primitives.
