## Overview

`MistralSTTService` provides real-time speech recognition using Mistral's Voxtral Realtime API. It uses the Mistral SDK's `RealtimeConnection` to stream audio and receive transcription events over a WebSocket.
Key features include:
- Streaming transcription with interim results
- Automatic language detection
- VAD-driven utterance lifecycle management
- Built-in metrics support
- **Mistral STT API Reference**: Pipecat's API methods for Mistral STT
- **Example Implementation**: Complete example with Mistral STT and TTS
- **Transcription Example**: Transcription-only example
- **Mistral Documentation**: Official Mistral API documentation
## Installation

To use the Mistral STT service, install the required dependencies.

## Prerequisites

Before using `MistralSTTService`, you need:
- Mistral Account: Sign up at Mistral AI
- API Key: Generate an API key from your account dashboard
- Model Access: Ensure you have access to the Voxtral Realtime API
### Required Environment Variables

- `MISTRAL_API_KEY`: Your Mistral API key for authentication
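With the prerequisites in place, a typical setup might look like the following. The `mistral` extras name is an assumption; check the install instructions for your Pipecat version:

```shell
# Install Pipecat with Mistral support (extras name assumed)
pip install "pipecat-ai[mistral]"

# Make the API key available to the service
export MISTRAL_API_KEY=your_api_key_here
```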
## Configuration

Constructor parameters:

- Mistral API key for authentication.
- Custom API endpoint URL. Leave empty for the default Mistral endpoint.
- Audio sample rate in Hz. When `None`, uses the pipeline's configured sample rate.
- Streaming delay for the accuracy/latency tradeoff. Higher values may improve accuracy at the cost of latency.
- P99 latency from speech end to final transcript, in seconds. Override this for your deployment.
- Runtime-configurable settings for the STT service. See Settings below.
## Settings

Runtime-configurable settings are passed via the `settings` constructor argument using `MistralSTTService.Settings(...)`. They can be updated mid-conversation with `STTUpdateSettingsFrame`. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | `str` | `"voxtral-mini-transcribe-realtime-2602"` | Mistral STT model to use. (Inherited from base STT settings.) |
| `language` | `Language \| str` | `None` | Language hint for transcription. (Inherited from base STT settings.) |
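For illustration, the runtime-update behavior can be modeled with a small standalone sketch. The `Settings` class and `apply_update` helper below are illustrative stand-ins, not Pipecat's actual implementation:

```python
from dataclasses import dataclass, replace
from typing import Optional

# Illustrative stand-in for MistralSTTService.Settings (not Pipecat's internals)
@dataclass(frozen=True)
class Settings:
    model: str = "voxtral-mini-transcribe-realtime-2602"
    language: Optional[str] = None  # None -> automatic language detection

def apply_update(current: Settings, update: dict) -> Settings:
    # Merge only the known fields, mimicking an STTUpdateSettingsFrame update
    known = {k: v for k, v in update.items() if k in ("model", "language")}
    return replace(current, **known)

s = apply_update(Settings(), {"language": "fr"})
# s keeps the default model and now carries the "fr" language hint
```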
## Usage

### Basic Setup
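A minimal sketch of constructing the service and placing it in a pipeline. The Mistral module path and the `transport` object are assumptions; adapt them to your transport and Pipecat version:

```python
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.mistral.stt import MistralSTTService  # assumed module path

stt = MistralSTTService(api_key=os.getenv("MISTRAL_API_KEY"))

# `transport` is assumed to be created elsewhere (e.g. a Daily or WebRTC transport)
pipeline = Pipeline([
    transport.input(),   # audio frames in
    stt,                 # emits interim and final transcription frames
    # ... downstream processors (context aggregation, LLM, TTS)
])
```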
### With Custom Settings
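Passing settings at construction time might look like this (the Mistral module path is an assumption; `Language` comes from Pipecat's transcription language enum):

```python
import os

from pipecat.services.mistral.stt import MistralSTTService  # assumed module path
from pipecat.transcriptions.language import Language

stt = MistralSTTService(
    api_key=os.getenv("MISTRAL_API_KEY"),
    settings=MistralSTTService.Settings(
        model="voxtral-mini-transcribe-realtime-2602",
        language=Language.FR,  # omit to enable automatic language detection
    ),
)
```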
## Notes

- **SDK-managed WebSocket**: The service extends `STTService` directly (rather than `WebsocketSTTService`) because the Mistral SDK manages the WebSocket connection internally.
- **Language detection**: When `language` is not specified in settings, the service automatically detects the spoken language and includes it in the transcription frames.
- **VAD integration**: The service works with VAD (Voice Activity Detection) to manage the utterance lifecycle. When the user starts speaking, it begins accumulating interim transcripts. When the user stops, it flushes remaining audio for final transcription.
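The VAD-driven lifecycle in the last note can be sketched as a small state machine. The class and method names here are illustrative, not Pipecat's internals:

```python
# Minimal sketch of the VAD-driven utterance lifecycle described above.
class UtteranceTracker:
    def __init__(self):
        self.speaking = False
        self.interim = []   # interim transcripts for the current utterance
        self.finals = []    # finalized utterances

    def on_user_started_speaking(self):
        # VAD detected speech: start accumulating interim transcripts
        self.speaking = True
        self.interim = []

    def on_interim_transcript(self, text):
        if self.speaking:
            self.interim.append(text)

    def on_user_stopped_speaking(self):
        # VAD detected silence: flush, promoting the last interim to a final
        self.speaking = False
        if self.interim:
            self.finals.append(self.interim[-1])
        self.interim = []

t = UtteranceTracker()
t.on_user_started_speaking()
t.on_interim_transcript("hello")
t.on_interim_transcript("hello world")
t.on_user_stopped_speaking()
# t.finals == ["hello world"]
```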
## Event Handlers

The service supports the standard service connection events:

| Event | Description |
|---|---|
| `on_connected` | Transcription session created |
| `on_disconnected` | Connection closed |
| `on_connection_error` | Transcription error occurred |
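The decorator-based registration Pipecat uses for these events can be sketched with a minimal standalone emitter. The `MiniEmitter` class is illustrative; `MistralSTTService` provides the real implementation:

```python
# Standalone sketch of decorator-based event handler registration.
class MiniEmitter:
    def __init__(self, events):
        self._handlers = {name: [] for name in events}

    def event_handler(self, name):
        # Returns a decorator that registers the function for the named event
        def register(fn):
            self._handlers[name].append(fn)
            return fn
        return register

    def _emit(self, name, *args):
        for fn in self._handlers[name]:
            fn(self, *args)

stt = MiniEmitter(["on_connected", "on_disconnected", "on_connection_error"])
log = []

@stt.event_handler("on_connected")
def on_connected(service):
    log.append("connected")

stt._emit("on_connected")
# log == ["connected"]
```

With the real service, the same pattern applies to a `MistralSTTService` instance, e.g. decorating a handler with `@stt.event_handler("on_connected")`.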