Voice Activity Detection

Coming soon. Voice Activity Detection (VAD) is in development and will be available in a future release of Orpheus.

What it is

Voice Activity Detection identifies, in real time, whether the current audio chunk contains speech. It's the foundation for downstream behaviors like utterance segmentation, silence trimming, and barge-in.
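
To show how per-chunk speech decisions feed utterance segmentation, here is a minimal sketch. It is not the Orpheus API: the VAD output shape has not been published yet, so the boolean-per-chunk input, the function name segment_utterances, and the hangover parameter are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def segment_utterances(flags: Iterable[bool], hangover: int = 3) -> List[Tuple[int, int]]:
    """Group per-chunk speech flags into (start, end) chunk-index ranges.

    Hypothetical helper: assumes one boolean per audio chunk. `end` is
    exclusive. An utterance is closed once `hangover` consecutive
    non-speech chunks have been seen, so short pauses do not split it.
    """
    segments: List[Tuple[int, int]] = []
    start = None        # index of the first speech chunk in the open utterance
    last_speech = -1    # index of the most recent speech chunk
    silence = 0         # consecutive non-speech chunks since last_speech

    for i, is_speech in enumerate(flags):
        if is_speech:
            if start is None:
                start = i
            last_speech = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= hangover:
                # Trailing silence is trimmed: the segment ends at the last speech chunk.
                segments.append((start, last_speech + 1))
                start = None
                silence = 0

    if start is not None:
        # The stream ended while an utterance was still open.
        segments.append((start, last_speech + 1))
    return segments


flags = [False, True, True, False, True, False, False, False, True, True]
print(segment_utterances(flags, hangover=3))
```

A larger hangover tolerates longer pauses inside one utterance, at the cost of detecting "the user finished talking" later.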

Who it's for

Teams building real-time voice products (voice agents, conversational AI, transcription pipelines) where reliable speech-vs-silence decisions matter. VAD is what makes "the user finished talking" a measurable event instead of a guess.

What we'll publish here

When VAD ships, this page will cover:

VAD output shape and how to consume it alongside the per-chunk enhancement output.
Configuration options and how they interact with the noise-cancellation model.
A complete example matching the format on the Examples page.
Evaluation guidance. VAD is evaluated by per-frame precision and recall against speech-labeled ground truth, not by word error rate (WER).
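
Per-frame precision and recall can be computed with a few lines of code. The sketch below uses the standard definitions and assumes predictions and ground truth are aligned, equal-length sequences with one boolean per frame. The function name frame_precision_recall and the choice to return 0.0 when a denominator is zero are illustrative assumptions, not part of Orpheus.

```python
from typing import Sequence, Tuple


def frame_precision_recall(predicted: Sequence[bool], truth: Sequence[bool]) -> Tuple[float, float]:
    """Return (precision, recall) for per-frame speech decisions.

    Hypothetical helper: both inputs hold one boolean per frame and must
    cover the same frames in the same order.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same number of frames")

    # True positive: frame flagged as speech and labeled as speech.
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    # False positive: frame flagged as speech but labeled as non-speech.
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    # False negative: frame labeled as speech but not flagged.
    fn = sum(1 for p, t in zip(predicted, truth) if t and not p)

    # Assumption: report 0.0 instead of dividing by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


predicted = [True, True, True, False]
truth = [True, True, False, False]
precision, recall = frame_precision_recall(predicted, truth)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Precision drops when non-speech frames are flagged as speech. Recall drops when speech frames are missed. Both are reported because a detector can trade one for the other.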

If your roadmap needs VAD before the public release, tell us during onboarding.
