Soniox

Overview

Soniox provides real-time text-to-speech synthesis using a WebSocket-based streaming API. SonioxTTSService streams text incrementally to the Soniox TTS endpoint and receives audio back as base64-encoded chunks. Multiple concurrent streams (up to 5) are multiplexed over a single WebSocket connection, making it efficient for interactive voice applications.
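To make the audio path concrete, here is a minimal sketch of reassembling base64-encoded chunks into raw audio bytes. The message shape (a dict with an `audio` field) and the helper name `decode_audio_chunks` are assumptions for illustration, not the documented Soniox wire format. SonioxTTSService performs this step internally, so application code normally never needs it.

```python
import base64


def decode_audio_chunks(messages):
    """Concatenate base64-encoded audio chunks into one bytes object.

    `messages` is an iterable of dicts. The "audio" key is a hypothetical
    field name used only for this sketch.
    """
    pcm = bytearray()
    for message in messages:
        encoded = message.get("audio")
        if encoded:  # skip messages that carry no audio payload
            pcm.extend(base64.b64decode(encoded))
    return bytes(pcm)


# Simulated server messages: two audio chunks and one non-audio message.
chunks = [
    {"audio": base64.b64encode(b"\x01\x00\x02\x00").decode("ascii")},
    {"status": "ok"},
    {"audio": base64.b64encode(b"\x03\x00").decode("ascii")},
]
audio = decode_audio_chunks(chunks)
```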

Soniox TTS API Reference: Pipecat’s API methods for Soniox TTS integration
Example Implementation: Complete example with Soniox STT and TTS
Soniox Documentation: Official Soniox TTS WebSocket API documentation
Supported Languages: Browse supported languages (60+)

Installation

To use Soniox TTS, install the required dependencies:

uv add "pipecat-ai[soniox]"

Prerequisites

Soniox Account Setup

Before using Soniox TTS, you need:

Soniox Account: Sign up at Soniox Console
API Key: Generate an API key from your console dashboard
Voice Selection: Choose from available voices

Required Environment Variables

SONIOX_API_KEY: Your Soniox API key for authentication
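A missing key otherwise surfaces only when the service tries to connect, so it can help to fail fast at startup. The sketch below uses only the standard library; `require_env` is a hypothetical helper, not part of Pipecat.

```python
import os


def require_env(name, environ=None):
    """Return a required environment variable, or raise if it is unset or blank."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the bot")
    return value


# Demonstration with an explicit mapping so the example runs anywhere.
api_key = require_env("SONIOX_API_KEY", {"SONIOX_API_KEY": "example-key"})
```

In a real bot, call `require_env("SONIOX_API_KEY")` without the second argument and pass the result as `api_key`.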

Configuration

api_key (str, required)

Soniox API key for authentication. Create API keys at Soniox Console.

url (str, default: "wss://tts-rt.soniox.com/tts-websocket")

WebSocket endpoint URL for Soniox TTS.

sample_rate (int, default: None)

Output sample rate in Hz. Must be one of {8000, 16000, 24000, 44100, 48000} when using a raw PCM audio format. When None, inherits from the pipeline’s configured sample rate.
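The constraint above can be checked before constructing the service. This is a minimal sketch: the set of rates comes from this page, while `validate_pcm_sample_rate` is a hypothetical helper and not a Pipecat function.

```python
# Rates this page lists as valid for raw PCM output.
SUPPORTED_PCM_SAMPLE_RATES = frozenset({8000, 16000, 24000, 44100, 48000})


def validate_pcm_sample_rate(sample_rate):
    """Return the rate unchanged if valid, or None to defer to the pipeline."""
    if sample_rate is None:
        return None  # the service inherits the pipeline's sample rate
    if sample_rate not in SUPPORTED_PCM_SAMPLE_RATES:
        allowed = ", ".join(str(r) for r in sorted(SUPPORTED_PCM_SAMPLE_RATES))
        raise ValueError(f"Unsupported sample rate {sample_rate}; use one of: {allowed}")
    return sample_rate


rate = validate_pcm_sample_rate(16000)
```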

audio_format (str, default: "pcm_s16le")

Output audio format. Defaults to "pcm_s16le", which matches Pipecat’s downstream audio pipeline.
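The name "pcm_s16le" denotes signed 16-bit little-endian PCM, so every sample occupies two bytes. The sketch below shows what that implies for buffer sizes and sample values. It assumes mono audio by default, and both helper names are hypothetical.

```python
import struct

BYTES_PER_SAMPLE = 2  # signed 16-bit samples


def pcm_s16le_duration_seconds(audio, sample_rate, channels=1):
    """Duration of a raw pcm_s16le buffer in seconds."""
    frame_size = BYTES_PER_SAMPLE * channels
    return len(audio) / (frame_size * sample_rate)


def pcm_s16le_samples(audio):
    """Unpack a raw pcm_s16le buffer into a list of Python ints."""
    count = len(audio) // BYTES_PER_SAMPLE
    return list(struct.unpack(f"<{count}h", audio[: count * BYTES_PER_SAMPLE]))


one_second = bytes(32000)  # 16000 samples of silence, 2 bytes each
duration = pcm_s16le_duration_seconds(one_second, 16000)
samples = pcm_s16le_samples(b"\x01\x00\xff\xff")
```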

text_aggregation_mode (TextAggregationMode, default: TextAggregationMode.SENTENCE)

Controls how incoming text is aggregated before synthesis. SENTENCE (default) buffers text until a sentence boundary, which produces more natural speech. TOKEN streams tokens directly for lower latency. Import TextAggregationMode from pipecat.services.tts_service.
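To illustrate the trade-off between the two modes, here is a simplified sketch of sentence buffering. It is not Pipecat's actual aggregator. The `SentenceBuffer` class and its boundary rule (end punctuation followed by whitespace) are assumptions chosen to keep the example short.

```python
import re

# A sentence ends at ., ! or ? followed by whitespace. Waiting for the
# whitespace avoids splitting inside tokens such as "3.14".
_SENTENCE_END = re.compile(r"[.!?]\s")


class SentenceBuffer:
    """Accumulate streamed tokens and release complete sentences."""

    def __init__(self):
        self._pending = ""

    def push(self, token):
        """Add a token and return any sentences completed so far."""
        self._pending += token
        sentences = []
        while True:
            match = _SENTENCE_END.search(self._pending)
            if match is None:
                break
            end = match.start() + 1  # keep the punctuation mark
            sentences.append(self._pending[:end].strip())
            self._pending = self._pending[match.end():]
        return sentences

    def flush(self):
        """Return whatever text remains when the stream ends."""
        rest, self._pending = self._pending.strip(), ""
        return [rest] if rest else []


buffer = SentenceBuffer()
emitted = []
for token in ["Hello", " there.", " How", " are", " you?", " Fine"]:
    emitted.extend(buffer.push(token))
emitted.extend(buffer.flush())
```

In this sketch nothing is released until the token after a sentence has arrived, which is the latency cost of sentence-level buffering. Token mode would forward each of the six tokens immediately.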

settings (SonioxTTSService.Settings, default: None)

Runtime-configurable settings. See Settings below.

Settings

Runtime-configurable settings passed via the settings constructor argument using SonioxTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.

Parameter	Type	Default	Description
`model`	`str`	`tts-rt-v1-preview`	TTS model identifier. (Inherited from base settings.)
`voice`	`str`	`Adrian`	Voice identifier. (Inherited from base settings.)
`language`	`Language | str`	`Language.EN`	Language for synthesis. (Inherited from base settings.) See supported languages.

Usage

Basic Setup

import os
from pipecat.services.soniox.tts import SonioxTTSService

tts = SonioxTTSService(
    api_key=os.getenv("SONIOX_API_KEY"),
    settings=SonioxTTSService.Settings(
        voice="Maya",
    ),
)

With Custom Voice and Model

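import os
from pipecat.services.soniox.tts import SonioxTTSService

# Set model, voice, and language explicitly instead of relying on defaults
tts = SonioxTTSService(
    api_key=os.getenv("SONIOX_API_KEY"),
    settings=SonioxTTSService.Settings(
        model="tts-rt-v1-preview",
        voice="Adrian",
        language="en",
    ),
)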

With Custom Sample Rate

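import os
from pipecat.services.soniox.tts import SonioxTTSService

# Request 16 kHz output instead of inheriting the pipeline's sample rate
tts = SonioxTTSService(
    api_key=os.getenv("SONIOX_API_KEY"),
    sample_rate=16000,
    settings=SonioxTTSService.Settings(
        voice="Maya",
    ),
)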

Notes

WebSocket streaming: Soniox uses a persistent WebSocket connection for streaming text-in and audio-out, enabling low-latency real-time synthesis.
Concurrent streams: The service supports up to 5 concurrent streams multiplexed over a single WebSocket connection via Pipecat’s audio-context mechanism.
Sample rates: When using raw PCM audio formats, the sample rate must be one of {8000, 16000, 24000, 44100, 48000}.
Keepalive: The service automatically sends a keepalive message every 20 seconds so the connection does not hit Soniox’s idle timeout (20 to 30 seconds).
Text aggregation: Sentence aggregation is enabled by default (text_aggregation_mode=TextAggregationMode.SENTENCE). Buffering until sentence boundaries produces more natural speech. Set text_aggregation_mode=TextAggregationMode.TOKEN to stream tokens directly for lower latency.
Language support: Soniox supports 60+ languages. See the language documentation for the complete list.
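The keepalive behavior described in the notes above follows a common pattern: a background task that sends a small message on a fixed interval until it is cancelled. The sketch below shows that pattern with asyncio. It is not the service's actual implementation, the payload string is a placeholder rather than the documented Soniox message, and `fake_send` stands in for a real WebSocket send.

```python
import asyncio

KEEPALIVE_PAYLOAD = '{"type": "keepalive"}'  # placeholder, not the real wire format


async def keepalive_loop(send, interval=20.0, payload=KEEPALIVE_PAYLOAD):
    """Call `send(payload)` every `interval` seconds until cancelled."""
    while True:
        await asyncio.sleep(interval)
        await send(payload)


async def demo():
    sent = []

    async def fake_send(payload):
        sent.append(payload)  # a real client would write to the WebSocket here

    # A short interval keeps the demonstration fast.
    task = asyncio.create_task(keepalive_loop(fake_send, interval=0.01))
    await asyncio.sleep(0.055)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected when the loop is stopped
    return sent


sent = asyncio.run(demo())
```

Choosing an interval at or below the lower bound of the timeout window (20 seconds against a 20 to 30 second timeout) means a message should normally arrive before the server considers the connection idle.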

Event Handlers

Soniox TTS supports the standard service connection events:

Event	Description
`on_connected`	Connected to Soniox WebSocket
`on_disconnected`	Disconnected from Soniox WebSocket
`on_connection_error`	WebSocket connection error occurred

@tts.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Soniox TTS")
