Overview

Hermes is Hecttor's speech-enhancement SDK for human listeners. Use it anywhere a person is on the receiving end: voice and video calls, voice interfaces, hearing-assist features, and cleanup of recorded audio.

In code, Hermes is exposed as HumanSpeechEnhancer in every SDK language binding.

What Hermes does

Hermes combines three independent features. Each can be enabled on its own or together with the others.

Noise Cancellation: AI-powered denoising that removes background noise while preserving natural speech quality. Two models are available, tuned for different input audio quality scenarios.
Speech Speed Adjustment: phoneme-aware speed control that compresses silence and slows down speech for improved intelligibility. Useful for fast or accented speakers and for voice interfaces where listener comprehension matters.
Voice Boost: additional voice clarity enhancement, useful in noisy environments or for quiet or distant speakers.

Each functionality is optional, and the configuration you pass at initialization determines which ones run. Anything you leave out is skipped: its model isn't loaded, initialization is faster, and the runtime memory footprint is smaller.
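
To make the "leave it out and it is skipped" rule concrete, here is a minimal Python sketch. Every name in it (HermesConfigSketch, NoiseCancellationSketch, SpeedAdjustmentSketch, enabled_features, the "example-model" string) is hypothetical and chosen for illustration only; none of it is the SDK's actual API, which is covered in Getting Started.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NoiseCancellationSketch:
    """Hypothetical stand-in for a noise-cancellation configuration."""
    model: str                     # placeholder; real model names are shared during onboarding
    blend: Optional[float] = None  # None means "use the model's default"


@dataclass(frozen=True)
class SpeedAdjustmentSketch:
    """Hypothetical stand-in for a speech-speed configuration."""
    intensity: float = 1.0
    slowdown_capacity_ms: int = 500


@dataclass(frozen=True)
class HermesConfigSketch:
    """Hypothetical top-level configuration: every feature is optional."""
    noise_cancellation: Optional[NoiseCancellationSketch] = None
    speed_adjustment: Optional[SpeedAdjustmentSketch] = None
    voice_boost: bool = False  # off by default


def enabled_features(config: HermesConfigSketch) -> List[str]:
    """List the features that would run; omitted ones are simply absent."""
    features = []
    if config.noise_cancellation is not None:
        features.append("noise_cancellation")
    if config.speed_adjustment is not None:
        features.append("speed_adjustment")
    if config.voice_boost:
        features.append("voice_boost")
    return features
```

With this sketch, enabled_features(HermesConfigSketch(voice_boost=True)) lists only Voice Boost, which mirrors the case where no denoising model needs to be loaded. The order of the returned names is arbitrary and says nothing about the SDK's internal processing order.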

Noise Cancellation models

Hermes supports two noise cancellation model families. They use different architectures suited to different input audio conditions, so choose the one that best matches your source material. Available models, default weights, and supported chunk sizes are shared during onboarding.

A blend parameter is available to mix original and enhanced audio: 0.0 is fully original (pass-through) and 1.0 is fully enhanced. Omit it to use the model's default.
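
One way to picture the blend parameter is as a linear mix of the two signals. The sketch below assumes exactly that; the SDK may combine the signals differently internally, and blend_audio is a hypothetical helper, not an SDK function.

```python
from typing import List, Sequence


def blend_audio(original: Sequence[float],
                enhanced: Sequence[float],
                blend: float) -> List[float]:
    """Linear mix: 0.0 returns the original, 1.0 returns the enhanced signal.

    Assumption: a plain per-sample crossfade. This illustrates the meaning of
    the parameter only; it is not taken from the SDK.
    """
    if not 0.0 <= blend <= 1.0:
        raise ValueError("blend must be between 0.0 and 1.0")
    if len(original) != len(enhanced):
        raise ValueError("original and enhanced must have the same length")
    return [(1.0 - blend) * o + blend * e for o, e in zip(original, enhanced)]
```

Under this assumption, a blend of 0.5 gives each output sample the average of the original and enhanced samples.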

Speech Speed Adjustment

Speech Speed Adjustment analyzes speech content in each audio chunk and adjusts playback speed, compressing silence and slowing down speech for improved intelligibility. It requires a 20 ms chunk size, and when enabled, the per-chunk processing call returns variable-length output. Your playback pipeline must handle buffers of varying size (typically via a ring buffer or jitter buffer).
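
The buffering requirement can be sketched as a small FIFO that accepts chunks of any length and hands fixed-size frames to the audio device. This is a minimal illustration, not part of the SDK: PlaybackFifo and samples_per_chunk are hypothetical names, the 16 kHz sample rate in the usage note is an assumption, and a production pipeline would more likely use a preallocated ring buffer with a proper underrun policy.

```python
from collections import deque
from typing import Deque, List, Sequence


def samples_per_chunk(sample_rate_hz: int, chunk_ms: int = 20) -> int:
    """Number of samples in one chunk of the given duration."""
    return sample_rate_hz * chunk_ms // 1000


class PlaybackFifo:
    """Accepts variable-length chunks and emits fixed-size playback frames."""

    def __init__(self, frame_samples: int) -> None:
        self.frame_samples = frame_samples
        self._samples: Deque[float] = deque()

    def push(self, chunk: Sequence[float]) -> None:
        """Append processed output of any length."""
        self._samples.extend(chunk)

    def buffered(self) -> int:
        """Number of samples waiting to be played."""
        return len(self._samples)

    def pop_frame(self) -> List[float]:
        """Return exactly one frame, zero-padded if too little audio is buffered."""
        available = min(self.frame_samples, len(self._samples))
        frame = [self._samples.popleft() for _ in range(available)]
        frame.extend([0.0] * (self.frame_samples - available))
        return frame
```

At an assumed 16 kHz sample rate, samples_per_chunk(16000) is 320, so each 20 ms input chunk carries 320 samples while the output pushed into the FIFO may be longer or shorter. Padding with silence on underrun is one simple policy; holding playback until enough audio is buffered is another.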

Two parameters control behavior:

A speed intensity multiplier (roughly 0.7–1.5, recommended around 1.0): the base multiplier applied to the slowdown speeds the algorithm computes. Lower values deepen the slowdown the algorithm asks for; higher values pull the speed back toward 1.0.
A slowdown capacity in milliseconds (roughly 100–5000, recommended 400–1000): how much flexibility the algorithm has when adapting fast speech. This is an algorithmic parameter and should not be interpreted as direct playback delay; runtime behavior depends on the input audio and speech patterns.
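
Because both ranges are approximate, a caller may prefer to flag unusual values before initialization rather than reject them. The sketch below does this with the figures quoted above; check_speed_params is a hypothetical helper, and how the SDK itself treats out-of-range values is not specified here.

```python
from typing import List


def check_speed_params(intensity: float, slowdown_capacity_ms: int) -> List[str]:
    """Return human-readable warnings for values outside the documented ranges.

    The bounds are the approximate figures from this page, so the result is
    advisory only.
    """
    warnings = []
    if not 0.7 <= intensity <= 1.5:
        warnings.append(
            f"speed intensity {intensity} is outside the rough 0.7-1.5 range"
        )
    if not 100 <= slowdown_capacity_ms <= 5000:
        warnings.append(
            f"slowdown capacity {slowdown_capacity_ms} ms is outside the rough 100-5000 ms range"
        )
    elif not 400 <= slowdown_capacity_ms <= 1000:
        warnings.append(
            f"slowdown capacity {slowdown_capacity_ms} ms is outside the recommended 400-1000 ms range"
        )
    return warnings
```

An empty list means both values sit inside the recommended bands, for example an intensity of 1.0 with a capacity of 500 ms.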

If your playback pipeline has its own latency tracking, the SDK exposes increment- and decrement-delay calls so you can keep the speed controller in sync with the external buffer.

Voice Boost

Voice Boost applies an additional voice clarity enhancement on top of (or independently of) noise cancellation. It's a boolean flag on the Hermes configuration and is off by default; opt in when you want it.

Voice Boost can be combined with Noise Cancellation or Speech Speed Adjustment, or used on its own. When used alone (without a noise-cancellation configuration), the denoising model isn't loaded.

Where to go next

Getting Started: install the SDK and run your first enhancement.
Examples: common configurations for full and partial functionality.
Bandwidth Extension: coming soon.