Overview
Soniox provides real-time text-to-speech synthesis using a WebSocket-based streaming API. SonioxTTSService streams text incrementally to the Soniox TTS endpoint and receives audio back as base64-encoded chunks. Multiple concurrent streams (up to 5) are multiplexed over a single WebSocket connection, making it efficient for interactive voice applications.
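To make the chunked, multiplexed audio flow more concrete, the following is a minimal sketch of decoding base64 audio chunks and grouping them by stream. The message shape (a JSON object with stream_id and audio fields) and the helper name collect_audio are assumptions made for illustration, not the Soniox wire format. SonioxTTSService handles this internally, so application code normally never needs it.

```python
import base64
import json
from collections import defaultdict

MAX_STREAMS = 5  # concurrency limit stated in the overview


def collect_audio(messages):
    """Group decoded audio bytes by stream id.

    `messages` is an iterable of JSON strings. The field names
    "stream_id" and "audio" are hypothetical, chosen for this sketch.
    For simplicity every stream seen so far is treated as still open.
    """
    audio_by_stream = defaultdict(bytearray)
    for raw in messages:
        msg = json.loads(raw)
        stream_id = msg["stream_id"]
        # Reject a new stream once the limit of open streams is reached.
        if stream_id not in audio_by_stream and len(audio_by_stream) >= MAX_STREAMS:
            raise RuntimeError(f"more than {MAX_STREAMS} concurrent streams")
        # Each chunk arrives base64-encoded; decode it back to raw bytes.
        audio_by_stream[stream_id].extend(base64.b64decode(msg["audio"]))
    return {sid: bytes(buf) for sid, buf in audio_by_stream.items()}
```

Chunks from different streams can arrive interleaved. Keying the buffers by stream id keeps each stream's audio contiguous and in arrival order.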
Soniox TTS API Reference
Pipecat’s API methods for Soniox TTS integration
Example Implementation
Complete example with Soniox STT and TTS
Soniox Documentation
Official Soniox TTS WebSocket API documentation
Supported Languages
Browse supported languages (60+)
Installation
To use Soniox TTS, install the required dependencies:
Prerequisites
Soniox Account Setup
Before using Soniox TTS, you need:
- Soniox Account: Sign up at Soniox Console
- API Key: Generate an API key from your console dashboard
- Voice Selection: Choose from available voices
Required Environment Variables
SONIOX_API_KEY: Your Soniox API key for authentication
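As a small illustration of reading this variable at startup, the sketch below fails fast with a clear message when the key is missing. The helper name load_soniox_api_key is hypothetical and not part of Pipecat or the Soniox SDK.

```python
import os


def load_soniox_api_key(env=os.environ):
    """Return the Soniox API key, or raise a clear error if it is unset.

    `env` defaults to the process environment; passing a plain dict
    makes the helper easy to exercise in tests.
    """
    api_key = env.get("SONIOX_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "SONIOX_API_KEY is not set; create a key in the Soniox Console."
        )
    return api_key
```

Checking for the key once at startup surfaces a configuration mistake before any WebSocket connection is attempted.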
Configuration
Soniox API key for authentication. Create API keys at Soniox Console.
WebSocket endpoint URL for Soniox TTS.
Output sample rate in Hz. Must be one of {8000, 16000, 24000, 44100, 48000} when using a raw PCM audio format. When None, inherits from the pipeline’s configured sample rate.
Output audio format. Defaults to "pcm_s16le", which matches Pipecat’s downstream audio pipeline.
Controls how incoming text is aggregated before synthesis. SENTENCE (default) buffers text until sentence boundaries, producing more natural speech. TOKEN streams tokens directly for lower latency. Import TextAggregationMode from pipecat.services.tts_service.
Settings
Runtime-configurable settings passed via the settings constructor argument using SonioxTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | tts-rt-v1-preview | TTS model identifier. (Inherited from base settings.) |
| voice | str | Adrian | Voice identifier. (Inherited from base settings.) |
| language | Language \| str | Language.EN | Language for synthesis. (Inherited from base settings.) See supported languages. |
Usage
Basic Setup
With Custom Voice and Model
With Custom Sample Rate
Notes
- WebSocket streaming: Soniox uses a persistent WebSocket connection for streaming text-in and audio-out, enabling low-latency real-time synthesis.
- Concurrent streams: The service supports up to 5 concurrent streams multiplexed over a single WebSocket connection via Pipecat’s audio-context mechanism.
- Sample rates: When using raw PCM audio formats, the sample rate must be one of {8000, 16000, 24000, 44100, 48000}.
- Keepalive: The service automatically sends keepalive messages every 20 seconds to prevent Soniox’s idle timeout (20-30s).
- Text aggregation: Sentence aggregation is enabled by default (text_aggregation_mode=TextAggregationMode.SENTENCE). Buffering until sentence boundaries produces more natural speech. Set text_aggregation_mode=TextAggregationMode.TOKEN to stream tokens directly for lower latency.
- Language support: Soniox supports 60+ languages. See the language documentation for the complete list.
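The difference between the two text aggregation modes can be illustrated with a simplified sketch of sentence buffering. This is not Pipecat's aggregator: the function name aggregate_sentences and the boundary rule (a period, exclamation mark, or question mark followed by whitespace) are assumptions chosen to keep the example short.

```python
import re

# Simplified boundary rule: ., !, or ? followed by whitespace.
# Requiring trailing whitespace avoids splitting numbers such as "3.14".
_SENTENCE_END = re.compile(r"[.!?]\s")


def aggregate_sentences(tokens):
    """Yield complete sentences from an iterable of text tokens."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Emit every complete sentence currently sitting in the buffer.
        while True:
            match = _SENTENCE_END.search(buffer)
            if match is None:
                break
            yield buffer[: match.end()].strip()
            buffer = buffer[match.end():]
    # Flush whatever is left when the token stream ends.
    if buffer.strip():
        yield buffer.strip()
```

In this sketch a sentence is only released once the token after its closing punctuation arrives, which is the latency cost of sentence-level buffering. Token mode corresponds to forwarding each token as soon as it is received, with no buffer at all.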
Event Handlers
Soniox TTS supports the standard service connection events:
| Event | Description |
|---|---|
| on_connected | Connected to Soniox WebSocket |
| on_disconnected | Disconnected from Soniox WebSocket |
| on_connection_error | WebSocket connection error occurred |