Overview

In conversational applications, there are moments when you don’t want to process user speech, such as during bot introductions or while executing function calls. Pipecat’s user mute strategies let you selectively “mute” user input based on different conversation states.

When to Use Mute Strategies

Common scenarios for muting user input include:
  • During introductions: Prevent the bot from being interrupted during its initial greeting
  • While processing functions: Block input while the bot is retrieving external data
  • During bot speech: Reduce false transcriptions while the bot is speaking
  • For guided conversations: Create more structured interactions with clear turn-taking

How It Works

User mute strategies work by blocking specific user-related frames from flowing through your pipeline. When muted, the following frames are filtered:
  • Voice activity detection (VAD) events
  • Interruption signals
  • Raw audio input frames
  • Transcription frames (both interim and final)
This prevents user speech from being processed during muted periods.
Mute strategies are configured on the LLMUserAggregator via the user_mute_strategies parameter.

Mute Strategies

Pipecat provides several built-in strategies for determining when to mute user input:

FirstSpeechUserMuteStrategy

Mute only during the bot’s first speech utterance. Useful for introductions when you want the bot to complete its greeting before the user can speak.

MuteUntilFirstBotCompleteUserMuteStrategy

Start muted and remain muted until the first bot utterance completes. Ensures the bot’s initial instructions are fully delivered.

FunctionCallUserMuteStrategy

Mute during function calls. Prevents users from speaking while the bot is processing external data requests.

AlwaysUserMuteStrategy

Mute whenever the bot is speaking. Creates a strict turn-taking conversation pattern.
FirstSpeechUserMuteStrategy and MuteUntilFirstBotCompleteUserMuteStrategy should not be used together, as they handle the bot's first speech differently.

Basic Implementation

Import and configure the mute strategies you need:
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.turns.user_mute import AlwaysUserMuteStrategy

# Configure with one or more strategies
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_mute_strategies=[AlwaysUserMuteStrategy()],
    ),
)

Combining Multiple Strategies

Multiple strategies can be combined. They use OR logic—if any strategy indicates the user should be muted, input is suppressed:
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)

from pipecat.turns.user_mute import (
    MuteUntilFirstBotCompleteUserMuteStrategy,
    FunctionCallUserMuteStrategy,
)

user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_mute_strategies=[
            MuteUntilFirstBotCompleteUserMuteStrategy(),  # Mute until first response
            FunctionCallUserMuteStrategy(),               # Mute during function calls
        ],
    ),
)

Building Custom Strategies

Subclass BaseUserMuteStrategy (in pipecat.turns.user_mute) when none of the built-in strategies fit. A strategy only needs to answer one question per frame: should the user be muted right now? Override the async method process_frame(self, frame: Frame) -> bool to update internal state and return the current mute decision.
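As an illustration, here is the decision logic for a hypothetical strategy that keeps the user muted through the bot's first two utterances. To keep the sketch self-contained it matches stand-in frame-type names; a real strategy would subclass BaseUserMuteStrategy, make process_frame async, and use isinstance checks against BotStartedSpeakingFrame and BotStoppedSpeakingFrame:

```python
# Sketch only: the class name, parameter, and string frame names below are
# stand-ins for illustration, not part of the Pipecat API.
class FirstTwoUtterancesMuteLogic:
    def __init__(self, utterances_to_mute: int = 2):
        self._remaining = utterances_to_mute
        self._bot_speaking = False

    def process_frame(self, frame_type: str) -> bool:
        """Return True while the user should still be muted."""
        if frame_type == "BotStartedSpeakingFrame":
            self._bot_speaking = True
        elif frame_type == "BotStoppedSpeakingFrame":
            # Count a completed utterance only if we actually saw it start.
            if self._bot_speaking and self._remaining > 0:
                self._remaining -= 1
            self._bot_speaking = False
        # Muted until the bot has finished its first N utterances.
        return self._remaining > 0
```

The key pattern is that process_frame is called for every frame, so the strategy can update its counters from bot lifecycle frames and return the current mute decision each time.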

Which frames reach a strategy

Each strategy’s process_frame is called for every frame that passes through the user aggregator, except StartFrame, EndFrame, and CancelFrame. This includes:
  • User-direction frames from the input transport and STT: TranscriptionFrame, InterimTranscriptionFrame, UserStartedSpeakingFrame, UserStoppedSpeakingFrame, VADUserStartedSpeakingFrame, VADUserStoppedSpeakingFrame, InputAudioRawFrame, InterruptionFrame
  • Bot and function-calling lifecycle frames from elsewhere in the pipeline: BotStartedSpeakingFrame, BotStoppedSpeakingFrame, FunctionCallsStartedFrame, FunctionCallResultFrame, FunctionCallCancelFrame
Frames that don’t naturally reach the user aggregator (for example LLMTextFrame or TTSTextFrame, which flow downstream from the LLM or TTS) won’t be seen by a strategy directly. To react to those signals, place a companion FrameProcessor where the frames do flow and have it toggle state on your strategy. See Toggling a strategy at runtime below.

Which frames get suppressed when muted

Returning True from your strategy sets the aggregator’s mute state. While muted, only these frame types are actually dropped:
  • InterruptionFrame
  • VADUserStartedSpeakingFrame, VADUserStoppedSpeakingFrame
  • UserStartedSpeakingFrame, UserStoppedSpeakingFrame
  • InputAudioRawFrame
  • InterimTranscriptionFrame, TranscriptionFrame
All other frames continue to flow so the rest of the pipeline keeps functioning.
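The behavior above can be sketched as follows. This is an illustration of the documented filtering rules, not Pipecat's actual implementation: the strategy classes are toy stand-ins, and frames are identified by string names rather than real frame types.

```python
# Simplified sketch of the aggregator's mute filtering: every strategy
# sees each frame, the user is muted when ANY strategy says so (OR logic),
# and while muted only the user-input frame types below are dropped.
from dataclasses import dataclass


@dataclass
class Frame:
    name: str


# The frame types the docs list as suppressed while muted.
SUPPRESSED_WHEN_MUTED = {
    "InterruptionFrame",
    "VADUserStartedSpeakingFrame",
    "VADUserStoppedSpeakingFrame",
    "UserStartedSpeakingFrame",
    "UserStoppedSpeakingFrame",
    "InputAudioRawFrame",
    "InterimTranscriptionFrame",
    "TranscriptionFrame",
}


class AlwaysMutedStrategy:
    """Toy strategy: always reports muted."""

    def process_frame(self, frame: Frame) -> bool:
        return True


class NeverMutedStrategy:
    """Toy strategy: never reports muted."""

    def process_frame(self, frame: Frame) -> bool:
        return False


def filter_frame(frame: Frame, strategies) -> bool:
    """Return True if the frame should continue through the pipeline."""
    muted = any(s.process_frame(frame) for s in strategies)
    return not (muted and frame.name in SUPPRESSED_WHEN_MUTED)
```

Note that a bot lifecycle frame like BotStartedSpeakingFrame passes through even while muted, which is what lets strategies keep observing the conversation state.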

Toggling a strategy at runtime

Strategies are plain Python objects. Anything that holds a reference to one can flip its state between frames, which means a companion processor placed elsewhere in the pipeline can drive the mute decision based on signals the strategy can’t observe directly (LLM text, tool results, external events). This example strategy adds its own enable/disable methods (not part of the base contract) and returns their state from process_frame:
from pipecat.frames.frames import Frame
from pipecat.turns.user_mute import BaseUserMuteStrategy


class ToggleableUserMuteStrategy(BaseUserMuteStrategy):
    def __init__(self):
        super().__init__()
        self._muted = False

    def enable(self):
        self._muted = True

    def disable(self):
        self._muted = False

    async def process_frame(self, frame: Frame) -> bool:
        await super().process_frame(frame)
        return self._muted
A companion processor watches for the trigger and toggles the strategy:
from pipecat.frames.frames import (
    BotStartedSpeakingFrame,
    BotStoppedSpeakingFrame,
    Frame,
    LLMTextFrame,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class DisclaimerGuardProcessor(FrameProcessor):
    def __init__(self, strategy: ToggleableUserMuteStrategy, trigger_phrase: str, **kwargs):
        super().__init__(**kwargs)
        self._strategy = strategy
        self._trigger = trigger_phrase
        # Keep a small sliding window so cross-frame matches work without
        # the buffer growing unbounded if the trigger never appears.
        self._max_buffer = max(len(trigger_phrase) * 4, 512)
        self._buffer = ""
        self._active = False

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if isinstance(frame, BotStartedSpeakingFrame):
            # Start each bot turn with a fresh buffer.
            self._buffer = ""
        elif isinstance(frame, LLMTextFrame) and direction == FrameDirection.DOWNSTREAM:
            self._buffer = (self._buffer + frame.text)[-self._max_buffer :]
            if not self._active and self._trigger in self._buffer:
                self._active = True
                self._strategy.enable()
        elif isinstance(frame, BotStoppedSpeakingFrame) and self._active:
            self._active = False
            self._buffer = ""
            self._strategy.disable()

        await self.push_frame(frame, direction)
Wire them together by passing the same strategy instance to both the aggregator and the processor:
mute_strategy = ToggleableUserMuteStrategy()

user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(user_mute_strategies=[mute_strategy]),
)

disclaimer_guard = DisclaimerGuardProcessor(
    strategy=mute_strategy,
    trigger_phrase="Please read the following disclosure",
)

pipeline = Pipeline([
    transport.input(),
    stt,
    user_aggregator,
    llm,
    disclaimer_guard,   # positioned where LLMTextFrame flows downstream
    tts,
    transport.output(),
    assistant_aggregator,
])

Responding to Mute Events

You can register event handlers to be notified when muting starts or stops. This is particularly useful for providing visual feedback to users:
@user_aggregator.event_handler("on_user_mute_started")
async def on_user_mute_started(aggregator):
    logger.info("User mute started")
    # Send a visual indicator to your client
    # e.g., show a "Bot is speaking" indicator

@user_aggregator.event_handler("on_user_mute_stopped")
async def on_user_mute_stopped(aggregator):
    logger.info("User mute stopped")
    # Update your client UI
    # e.g., show a "You can speak now" indicator
These events fire whenever the mute state changes, allowing you to keep your UI synchronized with the bot’s state.

RTVI Events

When mute strategies activate or deactivate, the server automatically sends RTVI messages (user-mute-started and user-mute-stopped) to the client. These events are purely informational: muting happens server-side, and the client should continue sending audio normally during mute. You can listen for them in the JavaScript client to update your UI:
import { PipecatClient, RTVIEvent } from "@pipecat-ai/client-js";

const pcClient = new PipecatClient({
  callbacks: {
    onUserMuteStarted: () => {
      // Show a visual indicator that the bot is not listening
      // e.g., disable a microphone button or show "Bot is speaking..."
    },
    onUserMuteStopped: () => {
      // Remove the indicator, show the user they can speak
    },
  },
});

// Or using event listeners
pcClient.on(RTVIEvent.UserMuteStarted, () => {
  console.log("Server is ignoring user audio");
});
pcClient.on(RTVIEvent.UserMuteStopped, () => {
  console.log("Server is listening to user audio again");
});

Best Practices

  • Choose strategies wisely: Select the minimal set of strategies needed for your use case
  • Test user experience: Excessive muting can frustrate users; balance control with usability
  • Provide feedback: Use the mute event handlers to show visual cues while the user is muted, improving the experience

Next Steps

User Mute Strategies Reference

Read the complete API reference documentation for all available mute strategies and their behavior.

User Turn Strategies

Learn how to configure turn detection behavior for more control over conversation flow.
Experiment with different muting strategies to find the right balance for your application.