Silero VAD

The Silero VAD plugin provides AI-powered voice activity detection (VAD) for barge-in (interruption) handling. It runs the Silero VAD machine learning model on ONNX Runtime WebAssembly to distinguish real speech from background noise, coughs, and other non-speech sounds.

Installation

pnpm add @ariontalk/plugin-silero-vad

The package exports SileroVadDetector and the SileroVadOptions type.

Usage with the widget

Register the Silero VAD plugin alongside other barge-in strategies by setting the bargeInPlugins property on the widget element:

<ariontalk-widget id="widget" settings></ariontalk-widget>
<script type="module">
  import '@ariontalk/widget';
  import { EnergyBargeInDetector } from '@ariontalk/core';
  import { SileroVadDetector } from '@ariontalk/plugin-silero-vad';

  document.querySelector('#widget').bargeInPlugins = [
    {
      id: 'energy',
      label: 'Energy',
      tooltip: 'Interrupt by speaking — uses mic energy detection',
      create: () => new EnergyBargeInDetector(),
    },
    {
      id: 'silero-vad',
      label: 'Smart VAD',
      tooltip: 'AI-powered speech detection — more accurate, fewer false triggers',
      create: () => new SileroVadDetector({ onnxWASMBasePath: '/' }),
    },
  ];
</script>

Configuration options

Pass a SileroVadOptions object to the constructor to customize detection behavior.

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| baseAssetPath | string | './' | Base path where vad.worklet.bundle.min.js and silero_vad.onnx are served. |
| onnxWASMBasePath | string | './' | Base path where ONNX Runtime WebAssembly files are served. |
| positiveSpeechThreshold | number | 0.7 | Probability threshold for speech detection (0-1). Higher values reduce false positives but are slower to trigger. |
| negativeSpeechThreshold | number | 0.55 | Probability below which speech is considered absent (0-1). Should be lower than positiveSpeechThreshold (Silero recommends ~0.15 less). |
| minSpeechMs | number | 500 | Minimum sustained speech duration in milliseconds before triggering barge-in. Shorter sounds (coughs, thumps) are discarded. |
| redemptionMs | number | 1400 | Grace period in milliseconds after speech drops below negativeSpeechThreshold before considering speech ended. Bridges brief pauses mid-sentence. |
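
To make the interplay of the two thresholds, minSpeechMs, and redemptionMs concrete, here is a simplified simulation. It is a sketch of the behavior described in the table, not the plugin's source code: the simulateBargeIn function, the TimingOptions interface, and the frameMs parameter are hypothetical names introduced for illustration, and the real detector may count frames differently.

```typescript
// Simplified model of the timing options. NOT the plugin's implementation;
// every name here except the four option names is hypothetical.
interface TimingOptions {
  positiveSpeechThreshold: number;
  negativeSpeechThreshold: number;
  minSpeechMs: number;
  redemptionMs: number;
}

// Defaults copied from the options table.
const DEFAULT_TIMING: TimingOptions = {
  positiveSpeechThreshold: 0.7,
  negativeSpeechThreshold: 0.55,
  minSpeechMs: 500,
  redemptionMs: 1400,
};

// Takes one speech probability per audio frame and returns the time (in ms)
// at which barge-in would fire in this model, or null if it never fires.
function simulateBargeIn(
  probabilities: number[],
  frameMs: number,
  options: TimingOptions = DEFAULT_TIMING,
): number | null {
  let speaking = false; // inside a candidate speech segment?
  let positiveMs = 0;   // time spent at or above the positive threshold
  let silenceMs = 0;    // time spent below the negative threshold

  for (let i = 0; i < probabilities.length; i++) {
    const p = probabilities[i];
    if (p >= options.positiveSpeechThreshold) {
      speaking = true;
      silenceMs = 0;
      positiveMs += frameMs;
      if (positiveMs >= options.minSpeechMs) {
        return (i + 1) * frameMs; // enough sustained speech: trigger
      }
    } else if (speaking && p < options.negativeSpeechThreshold) {
      silenceMs += frameMs;
      if (silenceMs >= options.redemptionMs) {
        // Grace period expired: the segment ended without enough speech.
        speaking = false;
        positiveMs = 0;
        silenceMs = 0;
      }
    }
    // Probabilities between the two thresholds change nothing (hysteresis).
  }
  return null;
}
```

With 100 ms frames and the default options, two loud frames followed by two seconds of silence (a cough) never trigger in this model, while three speech frames, a 200 ms pause, and two more speech frames trigger at 700 ms because the pause is shorter than redemptionMs.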

Example with custom options

create: () => new SileroVadDetector({
  onnxWASMBasePath: '/assets/wasm/',
  baseAssetPath: '/assets/vad/',
  positiveSpeechThreshold: 0.8,
  negativeSpeechThreshold: 0.6,
  minSpeechMs: 400,
  redemptionMs: 1000,
})
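
When tuning the thresholds it is easy to end up with a negative threshold at or above the positive one. The following check is a small sketch of a sanity test you could run on your own values before passing them to the constructor. The checkThresholds helper and the ThresholdOptions interface are hypothetical and not part of the plugin; only the two option names come from the table above.

```typescript
// Hypothetical helper, not exported by @ariontalk/plugin-silero-vad.
interface ThresholdOptions {
  positiveSpeechThreshold: number;
  negativeSpeechThreshold: number;
}

// Returns a list of problems; an empty list means the pair looks sane.
function checkThresholds(options: ThresholdOptions): string[] {
  const problems: string[] = [];
  const pos = options.positiveSpeechThreshold;
  const neg = options.negativeSpeechThreshold;

  if (pos < 0 || pos > 1) {
    problems.push('positiveSpeechThreshold must be within 0-1');
  }
  if (neg < 0 || neg > 1) {
    problems.push('negativeSpeechThreshold must be within 0-1');
  }
  if (neg >= pos) {
    problems.push('negativeSpeechThreshold should be lower than positiveSpeechThreshold');
  }
  return problems;
}
```

The values from the example above (0.8 and 0.6) pass this check with no problems reported.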

Asset serving

Silero VAD requires several files to be served from your web server:

  • ONNX Runtime WASM files — The WebAssembly runtime for running the ML model. Set their location with onnxWASMBasePath.
  • VAD worklet and model — The vad.worklet.bundle.min.js audio worklet and silero_vad.onnx model file. Set their location with baseAssetPath.

These files come from the plugin's dependencies: the worklet and model ship with the @ricky0123/vad-web package, and the WASM runtime files ship with onnxruntime-web. Copy them to a publicly served directory or configure your bundler to serve them from node_modules.
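
If assets fail to load, it can help to write down which URLs the browser is expected to request for a given pair of base paths. The sketch below assumes the base path and file name are joined by plain concatenation (with a trailing slash added when missing), which is an assumption about the plugin's behavior and not something this page confirms. The expectedAssetUrls helper and its types are hypothetical; the two file names come from the options table.

```typescript
// Hypothetical debugging helper; not part of the plugin.
interface AssetPaths {
  baseAssetPath?: string;
  onnxWASMBasePath?: string;
}

interface ExpectedAssets {
  worklet: string;
  model: string;
  wasmDirectory: string;
}

function withTrailingSlash(path: string): string {
  return path.charAt(path.length - 1) === '/' ? path : path + '/';
}

// Assumes simple "base + file name" joining; both options default to './'.
function expectedAssetUrls(paths: AssetPaths): ExpectedAssets {
  const assetBase = withTrailingSlash(
    paths.baseAssetPath === undefined ? './' : paths.baseAssetPath,
  );
  const wasmBase = withTrailingSlash(
    paths.onnxWASMBasePath === undefined ? './' : paths.onnxWASMBasePath,
  );
  return {
    worklet: assetBase + 'vad.worklet.bundle.min.js',
    model: assetBase + 'silero_vad.onnx',
    // The WASM file names depend on the ONNX Runtime version, so only the
    // directory is listed here.
    wasmDirectory: wasmBase,
  };
}
```

Comparing these URLs against the network tab in the browser's developer tools is a quick way to spot a 404 caused by a wrong base path.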

The model and WASM files are loaded lazily — they are only downloaded when the user starts a session with Smart VAD selected, not at page load.

Energy vs. Silero VAD comparison

| Aspect | Energy (built-in) | Silero VAD |
| --- | --- | --- |
| Detection method | RMS volume threshold | ML-based speech probability |
| Accuracy | Good, but may trigger on loud non-speech sounds | High; distinguishes speech from noise |
| False positives | More likely (keyboard typing, music, etc.) | Rare; trained to detect human speech |
| CPU usage | Minimal (simple math on audio samples) | Moderate (ONNX model inference via WASM) |
| Bundle size | Zero (included in @ariontalk/core) | ~2 MB (ONNX Runtime WASM + Silero model) |
| Setup | None; works out of the box | Requires serving WASM and ONNX assets |
| Dependencies | None | @ricky0123/vad-web, onnxruntime-web |
| Latency to trigger | ~250 ms of sustained volume | ~500 ms of sustained speech (configurable) |
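
For a sense of what "RMS volume threshold" means in the Energy column, the sketch below computes the root mean square of a block of audio samples and compares it to a threshold. It illustrates the general technique only and is not the EnergyBargeInDetector implementation; the function names and the 0.1 threshold are hypothetical.

```typescript
// Generic RMS energy check. NOT the code used by EnergyBargeInDetector;
// the names and the default threshold are illustrative assumptions.

// Root mean square of samples, which are expected in the range -1..1.
function rms(samples: ArrayLike<number>): number {
  if (samples.length === 0) return 0;
  let sumOfSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumOfSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumOfSquares / samples.length);
}

// True when the block is loud enough, regardless of what produced the sound.
function exceedsEnergyThreshold(
  samples: ArrayLike<number>,
  threshold: number = 0.1,
): boolean {
  return rms(samples) >= threshold;
}
```

Because the check looks only at loudness, a keyboard clack and a spoken word with the same RMS are indistinguishable to it, which is the source of the false positives listed in the table.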