@ariontalk/core

@ariontalk/core is the headless voice engine that powers ArionTalk. It provides speech recognition, speech synthesis, on-device AI (Gemini Nano), and page-context extraction with no UI dependency. Use it directly when you need full programmatic control or want to build a custom interface.

Installation

```sh
pnpm add @ariontalk/core
```

Types

VoiceEngineInterface

The contract every engine must implement. Both the built-in VoiceEngine (local) and GeminiEngine (cloud) conform to this interface.

```typescript
interface VoiceEngineInterface {
  // Lifecycle
  startSession(lang: string): Promise<void>;
  endSession(): void;
  destroy(): void;

  // Runtime
  switchLanguage(lang: string): void;
  setMuted(muted: boolean): void;

  // Voice settings
  applyVoiceSettings(settings: VoiceSettings): void;
  getVoiceOverrides(): VoiceSettings | null;
  getVoices(): VoiceInfo[];

  // State
  readonly state: VoiceEngineState;
  onStateChange: ((state: VoiceEngineState) => void) | null;

  // Capabilities (engines declare what they support)
  readonly capabilities: EngineCapabilities;
}
```
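A minimal sketch of driving an engine through its lifecycle: attach a listener, start a session, and later end it and detach. It is written against a trimmed local copy of the interface so it runs without the package; `runSession`, `MiniEngine`, and `FakeEngine` are hypothetical names, not package exports. In an app you would pass a `VoiceEngine` instance instead of the fake.

```typescript
// Trimmed local copies of the documented types, so this sketch runs standalone.
type EngineStatus = 'idle' | 'loading' | 'listening' | 'thinking' | 'speaking' | 'error';

interface MiniState {
  status: EngineStatus;
  error: string | null;
}

interface MiniEngine {
  startSession(lang: string): Promise<void>;
  endSession(): void;
  readonly state: MiniState;
  onStateChange: ((state: MiniState) => void) | null;
}

// Hypothetical helper: subscribe, start, and return a stop function that
// ends the session first (so the final 'idle' update is still delivered)
// and then detaches the listener.
function runSession(
  engine: MiniEngine,
  lang: string,
  onState: (state: MiniState) => void,
): { started: Promise<void>; stop: () => void } {
  engine.onStateChange = onState;
  const started = engine.startSession(lang);
  const stop = (): void => {
    engine.endSession();
    engine.onStateChange = null;
  };
  return { started, stop };
}

// Hypothetical fake engine used only to exercise the helper.
class FakeEngine implements MiniEngine {
  state: MiniState = { status: 'idle', error: null };
  onStateChange: ((state: MiniState) => void) | null = null;
  lastLang: string | null = null;

  private emit(status: EngineStatus): void {
    this.state = { ...this.state, status };
    this.onStateChange?.(this.state);
  }

  async startSession(lang: string): Promise<void> {
    this.lastLang = lang;
    this.emit('listening');
  }

  endSession(): void {
    this.emit('idle');
  }
}
```

Awaiting `started` in real code surfaces startup failures (for example, a missing model or denied microphone) as a rejected promise rather than only as an `'error'` status.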

VoiceEngineState

Reactive state snapshot emitted via onStateChange.

```typescript
interface VoiceEngineState {
  status: EngineStatus;
  currentLang: SupportedLang;
  elapsedSeconds: number;
  interimTranscript: string;
  error: string | null;
  downloadProgress: number;
}
```

EngineStatus

Union of all possible engine statuses.

```typescript
type EngineStatus = 'idle' | 'loading' | 'listening' | 'thinking' | 'speaking' | 'error';
```
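Because EngineStatus is a closed union, a UI can map it exhaustively. The sketch below is illustrative: `statusLabel` and the label strings are hypothetical, and the `never` assignment in the default branch makes the compiler reject the function if a new status is ever added to the union without a matching case.

```typescript
type EngineStatus = 'idle' | 'loading' | 'listening' | 'thinking' | 'speaking' | 'error';

// Map each status to a display label (labels are placeholders).
function statusLabel(status: EngineStatus): string {
  switch (status) {
    case 'idle':
      return 'Ready';
    case 'loading':
      return 'Loading';
    case 'listening':
      return 'Listening';
    case 'thinking':
      return 'Thinking';
    case 'speaking':
      return 'Speaking';
    case 'error':
      return 'Something went wrong';
    default: {
      // Compile-time exhaustiveness check: unreachable if every case is handled.
      const unreachable: never = status;
      return unreachable;
    }
  }
}
```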

EngineCapabilities

Declares what an engine supports so the UI can adapt.

```typescript
interface EngineCapabilities {
  supportedLanguages: readonly string[];
  supportsVoiceSelection: boolean;
  supportsRatePitchVolume: boolean;
  supportsBargeInPlugins: boolean;
  supportsOffline: boolean;
  maxSessionDurationSec: number | null; // null = unlimited
  requiresTokenServer: boolean;
}
```
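One way a UI might adapt to EngineCapabilities is to derive a small view model and render controls from it, rather than checking which engine class it holds. `SettingsControls` and `controlsFor` are hypothetical names for this sketch; the interface is copied from above so the block runs on its own.

```typescript
interface EngineCapabilities {
  supportedLanguages: readonly string[];
  supportsVoiceSelection: boolean;
  supportsRatePitchVolume: boolean;
  supportsBargeInPlugins: boolean;
  supportsOffline: boolean;
  maxSessionDurationSec: number | null; // null = unlimited
  requiresTokenServer: boolean;
}

// Hypothetical view model: which settings controls a UI should render.
interface SettingsControls {
  showVoicePicker: boolean;
  showRatePitchVolume: boolean;
  showOfflineBadge: boolean;
  languages: readonly string[];
  sessionLimitNote: string | null; // null when sessions are unlimited
}

function controlsFor(caps: EngineCapabilities): SettingsControls {
  return {
    showVoicePicker: caps.supportsVoiceSelection,
    showRatePitchVolume: caps.supportsRatePitchVolume,
    showOfflineBadge: caps.supportsOffline,
    languages: caps.supportedLanguages,
    sessionLimitNote:
      caps.maxSessionDurationSec === null
        ? null
        : `Session limit: ${caps.maxSessionDurationSec} s`,
  };
}
```

Keying the UI off capabilities keeps it working unchanged when a different engine implementation is swapped in.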

VoiceSettings

User-configurable voice parameters persisted across sessions.

```typescript
interface VoiceSettings {
  voiceId: string | null;
  rate: number;
  pitch: number;
  volume: number;
}
```
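Since VoiceSettings are persisted across sessions, stored values should be validated before they reach the engine. The sketch below assumes JSON storage and borrows its defaults (1, 1, 1) and ranges (rate 0.1 to 10, pitch 0 to 2, volume 0 to 1) from the Web Speech API's SpeechSynthesisUtterance; whether @ariontalk/core uses the same ranges is an assumption, and `parseVoiceSettings` is a hypothetical helper.

```typescript
interface VoiceSettings {
  voiceId: string | null;
  rate: number;
  pitch: number;
  volume: number;
}

// Assumed defaults, taken from SpeechSynthesisUtterance; the package may differ.
const DEFAULTS: VoiceSettings = { voiceId: null, rate: 1, pitch: 1, volume: 1 };

// Clamp a numeric value into [min, max], or fall back if it is not a finite number.
function clamp(value: unknown, min: number, max: number, fallback: number): number {
  return typeof value === 'number' && Number.isFinite(value)
    ? Math.min(max, Math.max(min, value))
    : fallback;
}

// Hypothetical helper: turn a stored JSON string (or null) into safe settings.
function parseVoiceSettings(raw: string | null): VoiceSettings {
  if (raw === null) return { ...DEFAULTS };
  let data: Record<string, unknown>;
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== 'object' || parsed === null) return { ...DEFAULTS };
    data = parsed as Record<string, unknown>;
  } catch {
    return { ...DEFAULTS }; // corrupted storage: start from defaults
  }
  return {
    voiceId: typeof data.voiceId === 'string' ? data.voiceId : null,
    rate: clamp(data.rate, 0.1, 10, DEFAULTS.rate),
    pitch: clamp(data.pitch, 0, 2, DEFAULTS.pitch),
    volume: clamp(data.volume, 0, 1, DEFAULTS.volume),
  };
}
```

In a browser the input would typically come from something like `localStorage.getItem(key)` (the key is up to you), and the result would be passed to `applyVoiceSettings`.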

VoiceInfo

Describes a single voice available for synthesis.

```typescript
interface VoiceInfo {
  id: string;     // Unique identifier (voiceURI for local, name for Gemini)
  name: string;   // Display name
  lang: string;   // BCP-47 language code
  local: boolean; // True for on-device voices
}
```
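A common task with a VoiceInfo list (for example, the result of `getVoices()`) is choosing a voice for the current language. This hypothetical `pickVoice` prefers an exact BCP-47 match, then falls back to the primary subtag (`en` matches `en-GB`), and within each tier prefers on-device voices. It also normalizes `_` to `-`, since some platforms report tags like `en_US`; that normalization is a defensive assumption, not documented package behavior.

```typescript
interface VoiceInfo {
  id: string;
  name: string;
  lang: string;
  local: boolean;
}

// Lowercase and convert underscores so "en_US" and "en-us" compare equal.
const norm = (tag: string): string => tag.toLowerCase().replace(/_/g, '-');

function pickVoice(voices: readonly VoiceInfo[], lang: string): VoiceInfo | null {
  const want = norm(lang);
  const base = want.split('-')[0];
  const exact = voices.filter((v) => norm(v.lang) === want);
  const sameBase = voices.filter((v) => norm(v.lang).split('-')[0] === base);
  // Try exact matches first, then same-language matches; prefer local voices in each tier.
  for (const tier of [exact, sameBase]) {
    const best = tier.find((v) => v.local) ?? tier[0];
    if (best) return best;
  }
  return null;
}
```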

SupportedLang / WellKnownLang

Language code types. WellKnownLang enumerates known codes; SupportedLang also accepts arbitrary BCP-47 strings.

```typescript
type WellKnownLang = 'en' | 'es' | 'ja' | 'fr' | 'de' | 'pt'
  | 'it' | 'zh' | 'ko' | 'hi' | 'ar' | 'ru';
type SupportedLang = WellKnownLang | (string & {});
```
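The `(string & {})` member is a standard TypeScript idiom: a plain `'en' | string` union collapses to `string` and editors lose autocompletion for the literals, whereas `string & {}` keeps the literals visible while still accepting any string. Code that needs to branch on the well-known set can use a type guard; `WELL_KNOWN` and `isWellKnownLang` below are hypothetical, not package exports.

```typescript
type WellKnownLang = 'en' | 'es' | 'ja' | 'fr' | 'de' | 'pt'
  | 'it' | 'zh' | 'ko' | 'hi' | 'ar' | 'ru';
type SupportedLang = WellKnownLang | (string & {});

const WELL_KNOWN: readonly WellKnownLang[] = [
  'en', 'es', 'ja', 'fr', 'de', 'pt', 'it', 'zh', 'ko', 'hi', 'ar', 'ru',
];

// Narrows a SupportedLang to WellKnownLang (exact code match only).
function isWellKnownLang(lang: SupportedLang): lang is WellKnownLang {
  return (WELL_KNOWN as readonly string[]).includes(lang);
}
```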

BargeInDetector

Plugin interface for barge-in (interruption) detection strategies.

```typescript
interface BargeInDetector {
  /** Acquire resources (mic stream, models, etc.). Called once per session. */
  init(): Promise<void>;
  /** Start monitoring. Call onBargeIn once when interruption is detected. */
  startMonitoring(onBargeIn: () => void): void;
  /** Stop monitoring without releasing resources. */
  stopMonitoring(): void;
  /** Release all resources. Called when the engine is destroyed. */
  destroy(): void;
}
```
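To show the shape of the contract, here is a minimal custom detector that fires when `trigger()` is called, for example from a "tap to interrupt" button. `ManualBargeInDetector` and `trigger()` are hypothetical; the interface is copied from above. Clearing the stored callback before invoking it is what enforces the "call onBargeIn once" rule.

```typescript
interface BargeInDetector {
  init(): Promise<void>;
  startMonitoring(onBargeIn: () => void): void;
  stopMonitoring(): void;
  destroy(): void;
}

class ManualBargeInDetector implements BargeInDetector {
  private callback: (() => void) | null = null;

  async init(): Promise<void> {
    // Nothing to acquire; a real detector would open the mic or load a model here.
  }

  startMonitoring(onBargeIn: () => void): void {
    this.callback = onBargeIn;
  }

  stopMonitoring(): void {
    this.callback = null;
  }

  destroy(): void {
    this.stopMonitoring();
  }

  // Fire the pending callback at most once per startMonitoring() call.
  trigger(): void {
    const cb = this.callback;
    this.callback = null; // clear first so repeated triggers are no-ops
    cb?.();
  }
}
```

An instance would be passed to the engine through the `bargeInDetector` constructor option.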

ImageContext

Metadata for images extracted from the host page and sent to the AI model.

```typescript
interface ImageContext {
  blob: Blob;
  alt: string;
  src: string;
}
```

Classes

VoiceEngine

The default, browser-native engine. Uses the Web Speech API for recognition and synthesis and the on-device Prompt API (Gemini Nano) for AI responses.

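```typescript
import { VoiceEngine } from '@ariontalk/core';
const engine = new VoiceEngine(); // without barge-in detection
// With barge-in: new VoiceEngine({ bargeInDetector }), where bargeInDetector is any BargeInDetector.
```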

Constructor options:

```typescript
interface VoiceEngineOptions {
  bargeInDetector?: BargeInDetector;
}
```

Implements VoiceEngineInterface.

Capabilities (default values):

| Capability | Value |
| --- | --- |
| supportedLanguages | ['en', 'es'] |
| supportsVoiceSelection | true |
| supportsRatePitchVolume | true |
| supportsBargeInPlugins | true |
| supportsOffline | true |
| maxSessionDurationSec | null (unlimited) |
| requiresTokenServer | false |

Services

These lower-level services are exported for advanced consumers who want to compose their own engine or extend individual capabilities.

| Service | Description |
| --- | --- |
| SpeechRecognitionService | Wraps the Web Speech Recognition API with start/stop/pause/resume and interim/final result callbacks. |
| SpeechSynthesisService | Queues utterances via the Web Speech Synthesis API with per-utterance voice, rate, pitch, and volume overrides. |
| AISessionService | Manages the on-device Prompt API (Gemini Nano) session: availability check, model download progress, and streaming prompt responses. |
| PageExtractorService | Extracts visible text and image context from the host page for use as AI grounding context. |
| EnergyBargeInDetector | A BargeInDetector implementation that uses microphone audio energy levels to detect when the user speaks during AI playback. |
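Energy-based barge-in detection generally means computing the loudness of short audio frames and flagging speech when it stays above a threshold. The sketch below shows that general technique, not EnergyBargeInDetector's actual implementation: the RMS measure, the 0.05 threshold, the three-frame rule, and the `EnergyGate` class are all illustrative assumptions. In a browser, frames could come from a Web Audio AnalyserNode via `getFloatTimeDomainData`; here they are plain arrays.

```typescript
// Root-mean-square energy of one frame of PCM samples in [-1, 1].
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}

// Require `minFrames` consecutive loud frames so a single click or pop
// is not mistaken for the user speaking.
class EnergyGate {
  private loudFrames = 0;
  private readonly threshold: number;
  private readonly minFrames: number;

  constructor(threshold = 0.05, minFrames = 3) {
    this.threshold = threshold;
    this.minFrames = minFrames;
  }

  /** Returns true only on the frame where speech is first confirmed. */
  push(frame: Float32Array): boolean {
    this.loudFrames = rms(frame) >= this.threshold ? this.loudFrames + 1 : 0;
    return this.loudFrames === this.minFrames;
  }
}
```

Returning true only on the confirming frame fits the BargeInDetector rule that onBargeIn is called once per detection.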

Utilities

isVoiceChatSupported(engine?)

Async function that checks whether all required browser APIs are available.

```typescript
import { isVoiceChatSupported } from '@ariontalk/core';
const localOk = await isVoiceChatSupported(); // checks local engine APIs
const geminiOk = await isVoiceChatSupported('gemini'); // checks Gemini engine APIs
```

Checks performed:

| Engine | Required APIs |
| --- | --- |
| 'local' (default) | SpeechRecognition, speechSynthesis, LanguageModel (Prompt API), including a model availability check |
| 'gemini' | navigator.mediaDevices.getUserMedia, AudioContext |

Returns true if all required APIs are present and functional, false otherwise.
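The 'gemini' row reduces to a few feature checks. The sketch below is an illustration, not the library's code: `BrowserEnv` and `hasGeminiApis` are hypothetical, the environment is passed in so the logic can be exercised outside a browser, and it only tests that the APIs exist, without requesting microphone access or verifying that they actually work.

```typescript
// Minimal shape of the globals the check reads.
interface BrowserEnv {
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
  AudioContext?: unknown;
}

// Presence-only check mirroring the 'gemini' row of the table.
function hasGeminiApis(env: BrowserEnv): boolean {
  return typeof env.navigator?.mediaDevices?.getUserMedia === 'function'
    && typeof env.AudioContext === 'function';
}
```

In a page you could call `hasGeminiApis(globalThis as unknown as BrowserEnv)`; for the complete check, use isVoiceChatSupported('gemini').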

SessionTimer

Tracks elapsed session time in seconds. Accepts a callback that fires every second with the current count. Provides start(), reset(), and the static SessionTimer.format(seconds) helper to produce MM:SS display strings.
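For illustration, here is a standalone sketch of the MM:SS formatting and a timer with the same outward shape (per-second callback, start(), reset()). `formatMmSs` and `SimpleTimer` are hypothetical stand-ins, not SessionTimer's source; in particular, this sketch's reset() both stops the interval and zeroes the count, which may differ from the real class. Minutes are not wrapped into hours, so one hour formats as "60:00".

```typescript
// Whole seconds -> "MM:SS" (negative and fractional inputs are floored to whole seconds >= 0).
function formatMmSs(totalSeconds: number): string {
  const s = Math.max(0, Math.floor(totalSeconds));
  const minutes = Math.floor(s / 60);
  const seconds = s % 60;
  return `${String(minutes).padStart(2, '0')}:${String(seconds).padStart(2, '0')}`;
}

class SimpleTimer {
  private elapsed = 0;
  private handle: ReturnType<typeof setInterval> | null = null;
  private readonly onTick: (seconds: number) => void;

  constructor(onTick: (seconds: number) => void) {
    this.onTick = onTick;
  }

  start(): void {
    if (this.handle !== null) return; // already running
    this.handle = setInterval(() => {
      this.elapsed += 1;
      this.onTick(this.elapsed);
    }, 1000);
  }

  // Stops ticking and returns the count to zero (this sketch's semantics).
  reset(): void {
    if (this.handle !== null) clearInterval(this.handle);
    this.handle = null;
    this.elapsed = 0;
  }

  get seconds(): number {
    return this.elapsed;
  }
}
```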

createLogger(scope) / setLogLevel(level) / LogLevel

Scoped, level-gated console logging.

```typescript
import { createLogger, setLogLevel, LogLevel } from '@ariontalk/core';

setLogLevel(LogLevel.Debug);        // Enable all logs globally
const log = createLogger('my-mod'); // Prefixed as [ariontalk:my-mod]

log.debug('verbose info');
log.info('general info');
log.warn('warning');
log.error('error');
```

LogLevel enum:

```typescript
enum LogLevel {
  Disabled = 'disabled',
  Errors = 'error',
  Warnings = 'warning',
  Info = 'info',
  Debug = 'debug',
}
```

setLogLevel is global: it applies to every logger created with createLogger, including loggers created before the call.
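One simple design that produces this behavior is a single module-level level that each logger reads on every call, instead of copying it when the logger is created. The sketch below is a hypothetical illustration of that design, not the package's source: `Level`, `setLevel`, `makeLogger`, the injectable `sink`, and the initial level are all assumptions, and a plain object stands in for the LogLevel enum with the same string values.

```typescript
// Enum-like object with the same string values as LogLevel.
const Level = {
  Disabled: 'disabled',
  Errors: 'error',
  Warnings: 'warning',
  Info: 'info',
  Debug: 'debug',
} as const;
type Level = (typeof Level)[keyof typeof Level];

// A message is emitted when its rank is <= the current level's rank.
const RANK: Record<Level, number> = { disabled: 0, error: 1, warning: 2, info: 3, debug: 4 };

// Shared setting, read at call time, so changes reach existing loggers too.
let currentLevel: Level = Level.Warnings; // initial level assumed for this sketch

function setLevel(level: Level): void {
  currentLevel = level;
}

type Sink = (line: string) => void;

function makeLogger(scope: string, sink: Sink = console.log) {
  const emit = (level: Exclude<Level, 'disabled'>, msg: string): void => {
    if (RANK[level] <= RANK[currentLevel]) sink(`[ariontalk:${scope}] ${msg}`);
  };
  return {
    debug: (msg: string) => emit('debug', msg),
    info: (msg: string) => emit('info', msg),
    warn: (msg: string) => emit('warning', msg),
    error: (msg: string) => emit('error', msg),
  };
}
```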