Overview

xAI provides two text-to-speech services:
  • XAIHttpTTSService: Batch synthesis via HTTP API. Sends complete text and receives the full audio response.
  • XAITTSService: Streaming synthesis via WebSocket. Streams text incrementally and receives audio chunks as they’re synthesized, reducing latency.
Both support multiple languages and audio encoding formats.

xAI TTS API Reference

Complete API reference for all parameters and methods

WebSocket Example

Streaming WebSocket example with interruption handling

HTTP Example

Batch HTTP example

xAI Documentation

Official xAI voice API documentation

Installation

uv add "pipecat-ai[xai]"

Prerequisites

  1. xAI Account: Sign up at xAI
  2. API Key: Generate an API key from your account dashboard (also works with Grok API keys)
Set the following environment variable:
export GROK_API_KEY=your_api_key
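
A minimal sketch of reading the key at startup and failing early when it is missing. The helper name `load_xai_api_key` is hypothetical and not part of Pipecat; only the `GROK_API_KEY` variable name comes from this page.

```python
import os


def load_xai_api_key(env_var: str = "GROK_API_KEY") -> str:
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper for illustration; not part of Pipecat.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before starting the bot")
    return key
```

The returned value can then be passed as `api_key` to either service, instead of calling `os.getenv` inline.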

Configuration

XAIHttpTTSService

  • api_key (str, required): xAI API key for authentication.
  • base_url (str, default: "https://api.x.ai/v1/tts"): xAI TTS endpoint URL. Override for custom or proxied deployments.
  • sample_rate (int, default: None): Output audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
  • encoding (str, default: "pcm"): Output audio encoding format. Supported formats: "pcm", "mp3", "wav", "mulaw", "alaw".
  • aiohttp_session (aiohttp.ClientSession, default: None): Optional shared aiohttp session for HTTP requests. If None, the service creates and manages its own session.
  • settings (XAIHttpTTSService.Settings, default: None): Runtime-configurable settings. See Settings below.

Settings

Runtime-configurable settings passed via the settings constructor argument using XAIHttpTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
Parameter   Type             Default       Description
model       str              None          Model identifier. (Inherited from base settings.)
voice       str              "eve"         Voice identifier. (Inherited from base settings.)
language    Language | str   Language.EN   Language code. (Inherited from base settings.)

XAITTSService

  • api_key (str, required): xAI API key for authentication.
  • base_url (str, default: "wss://api.x.ai/v1/tts"): xAI TTS WebSocket endpoint URL. Override for custom or proxied deployments.
  • sample_rate (int, default: None): Output audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
  • codec (str, default: "pcm"): Output audio codec. Supported codecs: "pcm", "wav", "mulaw", "alaw". Defaults to "pcm" so emitted TTSAudioRawFrame objects need no decoding downstream.
  • settings (XAITTSService.Settings, default: None): Runtime-configurable settings. Uses the same settings structure as XAIHttpTTSService. Changing voice or language settings at runtime reconnects the WebSocket with new query parameters.
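
The reconnect behavior follows from voice and language being carried in the WebSocket URL. Here is a rough sketch of that idea using only the standard library. The query parameter names `voice` and `language` and the helpers `build_tts_url` and `needs_reconnect` are assumptions for illustration, not the confirmed wire format or Pipecat internals.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_tts_url(base_url: str, voice: str, language: str) -> str:
    """Attach voice and language to the endpoint as query parameters.

    Parameter names are assumed for illustration only.
    """
    parts = urlsplit(base_url)
    query = urlencode({"voice": voice, "language": language})
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def needs_reconnect(old_url: str, new_url: str) -> bool:
    """A changed URL means the existing socket cannot be reused."""
    return old_url != new_url
```

Under these assumptions, switching the language from `en` to `fr` yields a different URL, so the connection has to be re-established.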

Supported Languages

xAI TTS supports 20 languages, counting regional variants separately. Use the Language enum from pipecat.transcriptions.language:
  • Arabic (Egyptian, Saudi, UAE): Language.AR, Language.AR_EG, Language.AR_SA, Language.AR_AE
  • Bengali: Language.BN
  • Chinese: Language.ZH
  • English: Language.EN
  • French: Language.FR
  • German: Language.DE
  • Hindi: Language.HI
  • Indonesian: Language.ID
  • Italian: Language.IT
  • Japanese: Language.JA
  • Korean: Language.KO
  • Portuguese (Brazil, Portugal): Language.PT, Language.PT_BR, Language.PT_PT
  • Russian: Language.RU
  • Spanish (Spain, Mexico): Language.ES, Language.ES_ES, Language.ES_MX
  • Turkish: Language.TR
  • Vietnamese: Language.VI
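
When a user locale arrives as a string such as `es-AR`, one reasonable approach is to fall back to the base language if the regional variant is not in the list above. The sketch below uses plain strings so it runs without Pipecat installed. The code values (for example `pt-BR`) are assumed to mirror the `Language` enum values, and `resolve_language` is a hypothetical helper.

```python
# Codes assumed to correspond to the Language members listed above.
SUPPORTED = {
    "ar", "ar-EG", "ar-SA", "ar-AE", "bn", "zh", "en", "fr", "de", "hi",
    "id", "it", "ja", "ko", "pt", "pt-BR", "pt-PT", "ru", "es", "es-ES",
    "es-MX", "tr", "vi",
}


def resolve_language(code: str, default: str = "en") -> str:
    """Return an exact match, else the base language, else the default."""
    if code in SUPPORTED:
        return code
    base = code.split("-")[0]
    return base if base in SUPPORTED else default
```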

Usage

WebSocket Streaming (XAITTSService)

Basic Setup

import os
from pipecat.services.xai.tts import XAITTSService

tts = XAITTSService(
    api_key=os.getenv("GROK_API_KEY"),
    settings=XAITTSService.Settings(
        voice="eve",
    ),
)

With Custom Language

from pipecat.transcriptions.language import Language

tts = XAITTSService(
    api_key=os.getenv("GROK_API_KEY"),
    settings=XAITTSService.Settings(
        voice="eve",
        language=Language.ES,
    ),
)

With Custom Sample Rate and Codec

tts = XAITTSService(
    api_key=os.getenv("GROK_API_KEY"),
    sample_rate=24000,
    codec="wav",
    settings=XAITTSService.Settings(
        voice="eve",
    ),
)

HTTP Batch (XAIHttpTTSService)

Basic Setup

import os
from pipecat.services.xai.tts import XAIHttpTTSService

tts = XAIHttpTTSService(
    api_key=os.getenv("GROK_API_KEY"),
    settings=XAIHttpTTSService.Settings(
        voice="eve",
    ),
)

With Custom Encoding

tts = XAIHttpTTSService(
    api_key=os.getenv("GROK_API_KEY"),
    encoding="mp3",
    settings=XAIHttpTTSService.Settings(
        voice="eve",
    ),
)

With Shared HTTP Session

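import os

import aiohttp
from pipecat.services.xai.tts import XAIHttpTTSService

# Run this inside an async function; the session closes when the block exits.
async with aiohttp.ClientSession() as session:
    tts = XAIHttpTTSService(
        api_key=os.getenv("GROK_API_KEY"),
        aiohttp_session=session,
        settings=XAIHttpTTSService.Settings(
            voice="eve",
        ),
    )
    # Build and run the pipeline here, while the session is still open.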

Updating Settings at Runtime

Settings such as voice and language can be changed mid-conversation using TTSUpdateSettingsFrame. This works for both services:
from pipecat.frames.frames import TTSUpdateSettingsFrame
from pipecat.services.xai.tts import XAITTSSettings
from pipecat.transcriptions.language import Language

# `task` is assumed to be the running PipelineTask for this session.
# Only the fields set on the delta are updated; here, the language.
await task.queue_frame(
    TTSUpdateSettingsFrame(
        delta=XAITTSSettings(
            language=Language.FR,
        )
    )
)
Note: For XAITTSService, changing voice or language settings reconnects the WebSocket with updated query parameters.
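
The update frame carries a delta, so only the fields that are set need to change. The sketch below illustrates that idea with a stand-in dataclass. `SketchSettings`, `apply_delta`, and the convention that `None` means "unchanged" are assumptions for illustration, not Pipecat's actual settings implementation.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class SketchSettings:
    """Stand-in for a TTS settings object (hypothetical)."""
    model: Optional[str] = None
    voice: Optional[str] = None
    language: Optional[str] = None


def apply_delta(current: SketchSettings, delta: SketchSettings):
    """Merge the set fields of delta into current; report which fields changed."""
    updates = {
        f.name: getattr(delta, f.name)
        for f in fields(delta)
        if getattr(delta, f.name) is not None
        and getattr(delta, f.name) != getattr(current, f.name)
    }
    return replace(current, **updates), set(updates)
```

In this sketch, a changed set that includes `voice` or `language` would be the point at which the WebSocket service reconnects; a delta that repeats the current values changes nothing.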

Notes

  • Service choice:
    • Use XAITTSService (WebSocket) for lower-latency streaming synthesis, where audio begins playing before the full utterance finishes.
    • Use XAIHttpTTSService (HTTP) for simpler batch synthesis or when WebSocket connections are not available.
  • Default audio format: Both services default to raw PCM output, which matches Pipecat’s downstream expectations without extra decoding.
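  • Encoding/codec options: When using non-PCM formats (mp3, wav, mulaw, alaw), ensure your audio pipeline can handle the selected format. Note that mp3 is listed only for XAIHttpTTSService; the XAITTSService codec list does not include it.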
  • Session management:
    • XAIHttpTTSService: If you don’t provide an aiohttp_session, the service creates and manages its own session lifecycle automatically.
    • XAITTSService: WebSocket connection is managed automatically; settings changes that affect URL parameters (voice, language) trigger a reconnection.
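
Because raw PCM carries no header, buffer sizes translate directly into playback time, which is handy when sanity-checking audio received from either service. A small sketch of the arithmetic, assuming 16-bit mono samples by default (an assumption; this page does not state the sample width or channel count). `pcm_duration_seconds` is a hypothetical helper.

```python
def pcm_duration_seconds(num_bytes: int, sample_rate: int,
                         sample_width: int = 2, channels: int = 1) -> float:
    """Playback duration of a raw PCM buffer.

    Defaults assume 16-bit mono audio, which is an assumption here.
    """
    bytes_per_second = sample_rate * sample_width * channels
    return num_bytes / bytes_per_second
```

For example, 48,000 bytes at 24,000 Hz under these defaults corresponds to one second of audio.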