Bearer authentication header of the form ‘Bearer <token>’, where <token> is your auth token.
Speech-to-text mode. ‘fast’ optimizes for latency; ‘accurate’ optimizes for transcription accuracy at the cost of roughly 200 ms of additional latency.
Background ambience played to mask synthetic silence between turns. ‘none’ disables it; ‘office’, ‘coffee-shop’, and ‘outdoor’ each enable a quiet background bed.
Audio denoising. ‘noise-cancellation’ (default) handles general noise. ‘noise-and-background-speech-cancellation’ is more aggressive for callers in cars, cafes, or near TVs ($0.005/min surcharge).
Hang up the call after this many milliseconds of caller silence. Range: 10000 (10 s) to 3600000 (1 h). Default: 600000 (10 min).
When true, the agent interjects short filler words like ‘uh-huh’ or ‘mhmm’ during longer caller utterances. Defaults to true.
BCP 47 language tag that drives the agent’s speech recognition and pronunciation. Defaults to ‘en-US’.
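
The fields above can be checked on the client before a request is sent. The following is a minimal sketch, not the API's own schema: the descriptions do not name the fields, so every key used here (stt_mode, ambient_sound, denoising_mode, end_call_after_silence_ms, enable_backchannel, language) is a hypothetical placeholder, and the use of the standard HTTP Authorization header is an assumption. Only the allowed values and the silence range are taken from the descriptions. Unset fields are omitted so that the documented server-side defaults apply.

```python
from typing import Any, Dict, Optional

# Allowed values copied from the field descriptions above.
STT_MODES = {"fast", "accurate"}
AMBIENT_SOUNDS = {"none", "office", "coffee-shop", "outdoor"}
DENOISING_MODES = {
    "noise-cancellation",
    "noise-and-background-speech-cancellation",
}
MIN_SILENCE_MS = 10_000      # 10 s
MAX_SILENCE_MS = 3_600_000   # 1 h


def auth_header(token: str) -> Dict[str, str]:
    """Build a Bearer header. The header name is assumed to be Authorization."""
    if not token:
        raise ValueError("token must be non-empty")
    return {"Authorization": f"Bearer {token}"}


def _require_choice(name: str, value: str, allowed: set) -> None:
    """Raise ValueError if value is not one of the documented options."""
    if value not in allowed:
        raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")


def build_settings(
    *,
    stt_mode: Optional[str] = None,
    ambient_sound: Optional[str] = None,
    denoising_mode: Optional[str] = None,
    end_call_after_silence_ms: Optional[int] = None,
    enable_backchannel: Optional[bool] = None,
    language: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate settings and return only the ones that were set.

    All key names are hypothetical placeholders, not confirmed API fields.
    """
    settings: Dict[str, Any] = {}
    if stt_mode is not None:
        _require_choice("stt_mode", stt_mode, STT_MODES)
        settings["stt_mode"] = stt_mode
    if ambient_sound is not None:
        _require_choice("ambient_sound", ambient_sound, AMBIENT_SOUNDS)
        settings["ambient_sound"] = ambient_sound
    if denoising_mode is not None:
        _require_choice("denoising_mode", denoising_mode, DENOISING_MODES)
        settings["denoising_mode"] = denoising_mode
    if end_call_after_silence_ms is not None:
        if not MIN_SILENCE_MS <= end_call_after_silence_ms <= MAX_SILENCE_MS:
            raise ValueError(
                f"end_call_after_silence_ms must be between {MIN_SILENCE_MS} "
                f"and {MAX_SILENCE_MS}, got {end_call_after_silence_ms}"
            )
        settings["end_call_after_silence_ms"] = end_call_after_silence_ms
    if enable_backchannel is not None:
        settings["enable_backchannel"] = bool(enable_backchannel)
    if language is not None:
        # Passed through unchanged; the tag syntax is not validated here.
        settings["language"] = language
    return settings


if __name__ == "__main__":
    headers = auth_header("YOUR_TOKEN")
    settings = build_settings(
        stt_mode="accurate",
        denoising_mode="noise-and-background-speech-cancellation",
        end_call_after_silence_ms=30_000,
        language="en-US",
    )
    print(headers)
    print(settings)
```

Rejecting out-of-range or misspelled values locally gives a clearer error than a failed request, and it makes cost-bearing choices such as the more aggressive denoising mode explicit at the call site.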