Silero VAD
The Silero VAD plugin provides AI-powered voice activity detection for barge-in (interruption) detection. It uses the Silero VAD machine learning model running via ONNX Runtime WebAssembly to distinguish real speech from background noise, coughs, and other non-speech sounds.
Installation
```sh
pnpm add @ariontalk/plugin-silero-vad
```

The package exports `SileroVadDetector` and the `SileroVadOptions` type.
Usage with the widget
Register the Silero VAD plugin alongside other barge-in strategies by setting the bargeInPlugins property on the widget element:
```html
<ariontalk-widget id="widget" settings></ariontalk-widget>

<script type="module">
  import '@ariontalk/widget';
  import { EnergyBargeInDetector } from '@ariontalk/core';
  import { SileroVadDetector } from '@ariontalk/plugin-silero-vad';

  document.querySelector('#widget').bargeInPlugins = [
    {
      id: 'energy',
      label: 'Energy',
      tooltip: 'Interrupt by speaking — uses mic energy detection',
      create: () => new EnergyBargeInDetector(),
    },
    {
      id: 'silero-vad',
      label: 'Smart VAD',
      tooltip: 'AI-powered speech detection — more accurate, fewer false triggers',
      create: () => new SileroVadDetector({ onnxWASMBasePath: '/' }),
    },
  ];
</script>
```

Configuration options
Pass a SileroVadOptions object to the constructor to customize detection behavior.
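As a rough illustration of how a partial options object combines with the defaults in the table below, the following sketch merges overrides over those defaults and flags combinations the table advises against. `VadOptionsSketch`, `VAD_DEFAULTS`, `resolveOptions`, and `suggestNegativeThreshold` are hypothetical names; the plugin's real `SileroVadOptions` type and whatever validation it performs may differ.

```typescript
// Hypothetical mirror of the documented option names; the package's real
// SileroVadOptions type may differ or contain more fields.
interface VadOptionsSketch {
  baseAssetPath: string;
  onnxWASMBasePath: string;
  positiveSpeechThreshold: number;
  negativeSpeechThreshold: number;
  minSpeechMs: number;
  redemptionMs: number;
}

// Default values as listed in the options table.
const VAD_DEFAULTS: VadOptionsSketch = {
  baseAssetPath: './',
  onnxWASMBasePath: './',
  positiveSpeechThreshold: 0.7,
  negativeSpeechThreshold: 0.55,
  minSpeechMs: 500,
  redemptionMs: 1400,
};

// Negative threshold about 0.15 below the positive one, rounded to two decimals.
function suggestNegativeThreshold(positive: number): number {
  return Math.max(0, Math.round((positive - 0.15) * 100) / 100);
}

// Merge overrides over the defaults and report combinations the table advises against.
function resolveOptions(overrides: Partial<VadOptionsSketch> = {}): {
  options: VadOptionsSketch;
  warnings: string[];
} {
  const options: VadOptionsSketch = { ...VAD_DEFAULTS, ...overrides };
  const warnings: string[] = [];
  const thresholds: Array<[string, number]> = [
    ['positiveSpeechThreshold', options.positiveSpeechThreshold],
    ['negativeSpeechThreshold', options.negativeSpeechThreshold],
  ];
  for (const [name, value] of thresholds) {
    if (value < 0 || value > 1) warnings.push(`${name} must be between 0 and 1`);
  }
  if (options.negativeSpeechThreshold >= options.positiveSpeechThreshold) {
    warnings.push('negativeSpeechThreshold should be lower than positiveSpeechThreshold');
  }
  if (options.minSpeechMs < 0 || options.redemptionMs < 0) {
    warnings.push('minSpeechMs and redemptionMs must be non-negative');
  }
  return { options, warnings };
}
```

For instance, `resolveOptions({ positiveSpeechThreshold: 0.8, negativeSpeechThreshold: 0.6 })` keeps the remaining defaults and raises no warnings, while swapping the two thresholds produces one warning.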
| Option | Type | Default | Description |
|---|---|---|---|
| `baseAssetPath` | `string` | `'./'` | Base path where `vad.worklet.bundle.min.js` and `silero_vad.onnx` are served. |
| `onnxWASMBasePath` | `string` | `'./'` | Base path where the ONNX Runtime WebAssembly files are served. |
| `positiveSpeechThreshold` | `number` | `0.7` | Probability threshold (0 to 1) for detecting speech. Higher values reduce false positives but are slower to trigger. |
| `negativeSpeechThreshold` | `number` | `0.55` | Probability (0 to 1) below which speech is considered absent. Should be lower than `positiveSpeechThreshold`; Silero recommends about 0.15 lower. |
| `minSpeechMs` | `number` | `500` | Minimum sustained speech duration, in milliseconds, before barge-in triggers. Shorter sounds (coughs, thumps) are discarded. |
| `redemptionMs` | `number` | `1400` | Grace period, in milliseconds, after the probability drops below `negativeSpeechThreshold` before speech is considered ended. Bridges brief pauses mid-sentence. |
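To make the interplay of the four detection options concrete, the sketch below replays per-frame speech probabilities through a simplified hysteresis gate: a segment opens at `positiveSpeechThreshold`, stays open while the probability is at or above `negativeSpeechThreshold`, tolerates dips for up to `redemptionMs`, and fires barge-in once `minSpeechMs` of speech has accumulated. This is an assumed model for illustration only, not the plugin's or @ricky0123/vad-web's actual algorithm; `simulateGate`, the fixed `frameMs`, and the event names are hypothetical.

```typescript
// Simplified, hypothetical model of how the four detection options interact.
// It is NOT the plugin's or @ricky0123/vad-web's actual implementation.
interface GateOptions {
  positiveSpeechThreshold: number;
  negativeSpeechThreshold: number;
  minSpeechMs: number;
  redemptionMs: number;
}

interface GateEvent {
  type: 'barge-in' | 'speech-end' | 'misfire';
  atMs: number; // time at the end of the frame that produced the event
}

// Replays one speech probability per frame (each frameMs long) through the gate.
function simulateGate(probs: number[], frameMs: number, o: GateOptions): GateEvent[] {
  const events: GateEvent[] = [];
  let speaking = false;  // inside a speech segment?
  let triggered = false; // has this segment already fired barge-in?
  let speechMs = 0;      // segment time at or above the negative threshold
  let belowMs = 0;       // consecutive segment time below the negative threshold

  for (let i = 0; i < probs.length; i++) {
    const p = probs[i];
    const now = (i + 1) * frameMs;
    if (!speaking) {
      // A segment only opens once the positive threshold is reached.
      if (p >= o.positiveSpeechThreshold) {
        speaking = true;
        triggered = false;
        speechMs = frameMs;
        belowMs = 0;
      }
    } else if (p < o.negativeSpeechThreshold) {
      belowMs += frameMs;
      if (belowMs >= o.redemptionMs) {
        // The pause outlasted the grace period: close the segment. Segments that
        // never reached minSpeechMs are reported as misfires (e.g. a cough).
        events.push({ type: triggered ? 'speech-end' : 'misfire', atMs: now });
        speaking = false;
      }
    } else {
      // Back at or above the negative threshold within the grace period.
      belowMs = 0;
      speechMs += frameMs;
    }
    if (speaking && !triggered && speechMs >= o.minSpeechMs) {
      triggered = true;
      events.push({ type: 'barge-in', atMs: now });
    }
  }
  return events;
}
```

With the defaults and 100 ms frames, two frames at 0.9 followed by silence (a cough) yield only a `misfire` once the grace period expires, while five consecutive frames at 0.9 fire `barge-in` at 500 ms. A frame at 0.6 keeps an already open segment alive even though it could not open one, which is the point of keeping the negative threshold below the positive one.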
Example with custom options
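```js
// Inside a bargeInPlugins entry
create: () =>
  new SileroVadDetector({
    onnxWASMBasePath: '/assets/wasm/',
    baseAssetPath: '/assets/vad/',
    positiveSpeechThreshold: 0.8, // stricter than the 0.7 default
    negativeSpeechThreshold: 0.6, // kept below the positive threshold
    minSpeechMs: 400,
    redemptionMs: 1000,
  }),
```

Asset serving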
Silero VAD requires several files to be served from your web server:
- ONNX Runtime WASM files: the WebAssembly runtime that executes the ML model. Set their location with `onnxWASMBasePath`.
- VAD worklet and model: the `vad.worklet.bundle.min.js` audio worklet and the `silero_vad.onnx` model file. Set their location with `baseAssetPath`.
The worklet and model file ship with the @ricky0123/vad-web package, and the WASM files with onnxruntime-web; both are dependencies of the plugin. Copy them to a publicly served directory, or configure your bundler to serve them from node_modules.
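If the assets fail to load, a mismatch between a base path and the files' actual location is a likely suspect. Given the same values you pass to `SileroVadDetector`, a helper like the one below lists the URLs the browser should be able to fetch. It assumes the file names from the list above and that each file name is appended directly to its base path (an assumption about the plugin, so the sketch normalizes a trailing slash). The ONNX Runtime `.wasm` file names are passed in because they vary between onnxruntime-web versions; `expectedAssetUrls` and `withTrailingSlash` are hypothetical names.

```typescript
// Hypothetical helper for checking asset paths; not part of the plugin's API.
function withTrailingSlash(base: string): string {
  return base.slice(-1) === '/' ? base : `${base}/`;
}

// Lists the URLs implied by the two base paths. wasmFileNames must match the
// .wasm files shipped by the installed onnxruntime-web version.
function expectedAssetUrls(
  baseAssetPath: string,
  onnxWASMBasePath: string,
  wasmFileNames: string[],
): string[] {
  const vadBase = withTrailingSlash(baseAssetPath);
  const wasmBase = withTrailingSlash(onnxWASMBasePath);
  return [
    `${vadBase}vad.worklet.bundle.min.js`,
    `${vadBase}silero_vad.onnx`,
    ...wasmFileNames.map((name) => `${wasmBase}${name}`),
  ];
}
```

Requesting each resulting URL in the browser (or watching the network panel) shows which base-path option is wrong when one returns a 404.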
The model and WASM files are loaded lazily — they are only downloaded when the user starts a session with Smart VAD selected, not at page load.
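The lazy loading described above follows a common pattern: defer the download until the first call, and let every caller share a single promise. The sketch shows that pattern in isolation; `lazyOnce` and its `load` callback are hypothetical stand-ins, not the plugin's internals.

```typescript
// Hypothetical sketch of the lazy-load pattern: nothing is fetched until the
// first call, and every caller shares one in-flight promise.
function lazyOnce<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (!pending) {
      pending = load().catch((err: unknown) => {
        pending = undefined; // forget the failure so a later call can retry
        throw err;
      });
    }
    return pending;
  };
}
```

Clearing `pending` on failure means a download that failed (for example, on a flaky network) is retried at the next session instead of the rejection being cached for the lifetime of the page.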
Energy vs. Silero VAD comparison
| Aspect | Energy (built-in) | Silero VAD |
|---|---|---|
| Detection method | RMS volume threshold | ML-based speech probability |
| Accuracy | Good — may trigger on loud non-speech sounds | High — distinguishes speech from noise |
| False positives | More likely (keyboard typing, music, etc.) | Rare — trained to detect human speech |
| CPU usage | Minimal (simple math on audio samples) | Moderate (ONNX model inference via WASM) |
| Bundle size | Zero (included in @ariontalk/core) | ~2 MB (ONNX Runtime WASM + Silero model) |
| Setup | None — works out of the box | Requires serving WASM and ONNX assets |
| Dependencies | None | @ricky0123/vad-web, onnxruntime-web |
| Latency to trigger | ~250 ms of sustained volume | ~500 ms of sustained speech (configurable via `minSpeechMs`) |
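The first rows of the table contrast an RMS volume threshold with model-based speech probability. To show why the energy approach reacts to any loud sound, here is a sketch of an RMS gate that fires after roughly 250 ms of sustained volume. `EnergyGateSketch`, its 0.02 threshold, and its frame handling are illustrative assumptions, not the implementation or settings of `EnergyBargeInDetector`.

```typescript
// Hypothetical RMS-threshold gate for comparison with the Silero approach.
// The 0.02 threshold and 250 ms hold time are illustrative assumptions, not the
// settings or implementation of EnergyBargeInDetector.
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) sumSquares += samples[i] * samples[i];
  return Math.sqrt(sumSquares / samples.length);
}

class EnergyGateSketch {
  private loudMs = 0; // how long the signal has stayed above the threshold
  private readonly threshold: number;
  private readonly holdMs: number;

  constructor(threshold = 0.02, holdMs = 250) {
    this.threshold = threshold; // RMS level on the -1..1 float sample scale
    this.holdMs = holdMs;       // sustained loudness required before firing
  }

  /** Feeds one frame of mono samples; returns true only on the frame that fires. */
  push(frame: Float32Array, sampleRate: number): boolean {
    if (rms(frame) < this.threshold) {
      this.loudMs = 0; // a single quiet frame resets the timer
      return false;
    }
    const before = this.loudMs;
    this.loudMs += (frame.length / sampleRate) * 1000;
    return before < this.holdMs && this.loudMs >= this.holdMs;
  }
}
```

Because this gate measures only loudness, 250 ms of keyboard clatter or music passes it just as speech would. That is the false-positive gap the Silero model's speech probability is meant to close, at the cost of the extra CPU and asset setup listed above.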