Orpheus SDK
Voice Activity Detection
Coming soon. Voice Activity Detection (VAD) is in development and will be available in a future release of Orpheus.
What it is
Voice Activity Detection identifies, in real time, whether the current audio chunk contains speech. It's the foundation for downstream behaviors like utterance segmentation, silence trimming, and barge-in.
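To make the link between per-chunk decisions and utterance segmentation concrete, here is a minimal sketch. It does not use the Orpheus API, since the VAD output shape has not been published. It assumes a plain sequence of per-chunk speech flags, and the names `segment_utterances`, `chunk_ms`, and `hangover_chunks` are hypothetical, chosen only for illustration.

```python
from typing import Iterable, List, Tuple


def segment_utterances(
    flags: Iterable[bool],
    chunk_ms: int = 10,        # assumed chunk duration; not an Orpheus setting
    hangover_chunks: int = 3,  # assumed silence run that closes an utterance
) -> List[Tuple[int, int]]:
    """Group per-chunk speech flags into (start_ms, end_ms) utterances.

    An utterance opens at the first speech chunk and closes once
    `hangover_chunks` consecutive non-speech chunks have been seen.
    The reported end is the end of the last speech chunk, so trailing
    silence is trimmed.
    """
    segments: List[Tuple[int, int]] = []
    start = None        # index of the first speech chunk in the open utterance
    last_speech = None  # index of the most recent speech chunk
    silence = 0         # consecutive non-speech chunks since last speech

    for i, is_speech in enumerate(flags):
        if is_speech:
            if start is None:
                start = i
            last_speech = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= hangover_chunks:
                segments.append((start * chunk_ms, (last_speech + 1) * chunk_ms))
                start = None
                silence = 0

    # Close an utterance that is still open when the stream ends.
    if start is not None:
        segments.append((start * chunk_ms, (last_speech + 1) * chunk_ms))
    return segments
```

The hangover keeps a short pause inside a sentence from splitting it into two utterances. A larger value tolerates longer pauses at the cost of a slower "finished talking" decision.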
Who it's for
Teams building real-time voice products (voice agents, conversational AI, transcription pipelines) where reliable speech-vs-silence decisions matter. VAD is what makes "the user finished talking" a measurable event instead of a guess.
What we'll publish here
When VAD ships, this page will cover:
- VAD output shape and how to consume it alongside the per-chunk enhancement output.
- Configuration options and how they interact with the noise-cancellation model.
- A complete example matching the format on the Examples page.
- Evaluation guidance. VAD is evaluated by per-frame precision and recall against speech-labeled ground truth, not by word error rate (WER).
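Until the evaluation guidance is published, the per-frame metric named in the last item can be sketched as follows. This is an illustration under stated assumptions, not Orpheus tooling: it assumes predictions and ground-truth labels are already aligned to the same frame grid, and the function name `frame_precision_recall` is hypothetical.

```python
from typing import Sequence, Tuple


def frame_precision_recall(
    predicted: Sequence[bool],
    reference: Sequence[bool],
) -> Tuple[float, float]:
    """Compute per-frame precision and recall for the speech class.

    `predicted` and `reference` hold one flag per frame (True = speech)
    and must be the same length, i.e. aligned to the same frame grid.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)        # speech correctly detected
    fp = sum(1 for p, r in pairs if p and not r)    # non-speech flagged as speech
    fn = sum(1 for p, r in pairs if not p and r)    # speech that was missed

    # Report 0.0 when a denominator is empty instead of dividing by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Low precision means noise is being passed through as speech. Low recall means speech is being dropped, which shows up as clipped words downstream.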
If your roadmap needs VAD before the public release, tell us during onboarding.