Speech-prep: a focused Rust crate for speech audio preprocessing

⚓ Rust 📅 2026-04-05 👤 surdeus 👁️ 7

Info

This post is auto-generated from RSS feed The Rust Programming Language Forum - Latest topics. Source: Speech-prep: a focused Rust crate for speech audio preprocessing

I just open-sourced speech-prep, a small Rust crate for the front end of speech pipelines: audio format detection, WAV decoding, preprocessing, chunking, and VAD.

https://crates.io/crates/speech-prep

This was extracted from a private codebase and cleaned up into a standalone crate with a tighter public API and a narrower scope. The goal is not to be a general audio toolbox. The goal is to make the common speech-prep path simple, predictable, and easy to integrate into Rust systems.

Current scope:

- format detection for common speech inputs
- WAV decoding
- sample conversion / normalization
- preprocessing utilities
- voice activity detection
- fixtures and examples for validation
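
To make "sample conversion / normalization" concrete, here is a minimal std-only sketch of what that step usually involves: turning 16-bit PCM into `f32` and scaling to a target peak. The names `pcm16_to_f32` and `peak_normalize` are hypothetical and chosen for illustration; they are not speech-prep's API, and the crate may well do this differently.

```rust
/// Convert signed 16-bit PCM samples to f32 in the range [-1.0, 1.0).
/// Hypothetical helper for illustration; not part of speech-prep's API.
fn pcm16_to_f32(samples: &[i16]) -> Vec<f32> {
    // Dividing by 2^15 maps i16::MIN to exactly -1.0.
    samples.iter().map(|&s| f32::from(s) / 32_768.0).collect()
}

/// Scale samples so the largest absolute value equals `target_peak`.
/// Empty or all-zero input is returned unchanged to avoid dividing by zero.
/// Hypothetical helper for illustration; not part of speech-prep's API.
fn peak_normalize(samples: &[f32], target_peak: f32) -> Vec<f32> {
    let peak = samples.iter().fold(0.0f32, |m, &s| m.max(s.abs()));
    if peak == 0.0 {
        return samples.to_vec();
    }
    let gain = target_peak / peak;
    samples.iter().map(|&s| s * gain).collect()
}
```

For example, `pcm16_to_f32(&[0, 16_384, -32_768])` yields `[0.0, 0.5, -1.0]`, and passing that result to `peak_normalize` with a target peak of `0.5` halves every sample.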

A minimal example looks like this:

  use std::sync::Arc;
  use speech_prep::{NoopVadMetricsCollector, VadConfig, VadDetector, VadMetricsCollector};

  fn main() -> Result<(), speech_prep::Error> {
      // No-op collector for callers that do not need VAD metrics.
      let metrics: Arc<dyn VadMetricsCollector> = Arc::new(NoopVadMetricsCollector);
      let detector = VadDetector::new(VadConfig::default(), metrics)?;

      // 16,000 zero-valued samples: one second of silence at 16 kHz.
      let samples = vec![0.0f32; 16_000];
      let _segments = detector.detect(&samples, 16_000)?;

      Ok(())
  }
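
For readers new to VAD, the sketch below shows the general idea of turning samples into speech segments with a simple frame-energy threshold. It is not the algorithm speech-prep uses, and `energy_vad`, its parameters, and its `(start, end)` return shape are assumptions made up for this illustration.

```rust
/// Return half-open (start, end) sample index ranges whose per-frame RMS
/// energy exceeds `threshold`.
/// Hypothetical illustration; NOT speech-prep's detector or its output type.
fn energy_vad(samples: &[f32], frame_len: usize, threshold: f32) -> Vec<(usize, usize)> {
    let mut segments = Vec::new();
    if frame_len == 0 {
        // `chunks(0)` would panic, so treat a zero frame length as "no segments".
        return segments;
    }
    // Start index of the currently open speech segment, if any.
    let mut open: Option<usize> = None;
    for (i, frame) in samples.chunks(frame_len).enumerate() {
        let energy: f32 = frame.iter().map(|s| s * s).sum();
        let rms = (energy / frame.len() as f32).sqrt();
        let start = i * frame_len;
        match (rms > threshold, open) {
            // Energy rose above the threshold: open a segment.
            (true, None) => open = Some(start),
            // Energy fell back below the threshold: close the segment.
            (false, Some(s)) => {
                segments.push((s, start));
                open = None;
            }
            _ => {}
        }
    }
    // Close a segment that runs to the end of the input.
    if let Some(s) = open {
        segments.push((s, samples.len()));
    }
    segments
}
```

With all-zero input like the example above, a detector of this kind returns no segments, since no frame rises above the threshold.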

A few constraints up front:

- decoding is WAV-only today
- the crate is intentionally focused on speech prep, not full media handling
- I’d rather keep it small and solid than broaden it too early
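
Since decoding is WAV-only, callers may want to reject other inputs early. A rough sketch of that check, based on the standard RIFF/WAVE header layout, is shown here. `looks_like_wav` is a hypothetical helper, not the crate's format-detection API, which is the better choice when it is available.

```rust
/// Return true if `bytes` begins with a RIFF/WAVE container header.
/// Hypothetical helper for illustration; not speech-prep's detection API.
fn looks_like_wav(bytes: &[u8]) -> bool {
    // Bytes 0..4 hold "RIFF", bytes 4..8 the little-endian chunk size,
    // and bytes 8..12 the form type "WAVE".
    bytes.len() >= 12 && &bytes[0..4] == b"RIFF" && &bytes[8..12] == b"WAVE"
}
```

This only inspects the first 12 bytes. It does not validate the `fmt ` or `data` chunks, so a file that passes can still fail to decode.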

If you work on ASR, alignment, pronunciation scoring, diarization, or other speech systems in Rust, I’d especially like feedback on API shape, naming, and missing primitives.

