Release Overview
OpenClaw 2026.5.4 is all about interaction and efficiency. With a new realtime voice bridge for Google Meet/Twilio, audio support for Codex, and workspace-scoped caching, this release makes OpenClaw faster to respond and more pleasant to use.
Voice: The Gemini Realtime Bridge
The voice experience for telephony and meetings has been completely overhauled for speed and responsiveness.
- Realtime Audio: Twilio dial-ins now use the Gemini voice bridge with paced streaming.
- No More Lag: Backpressure-aware buffering and barge-in queue clearing ensure that the agent stops talking immediately when you interrupt.
- Snappy Interaction: By bypassing TwiML fallbacks during realtime speech, the delay between user input and agent response is drastically reduced.
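To make the buffering ideas above concrete, here is a minimal sketch of an outbound audio queue with a backpressure signal and barge-in clearing. It is not OpenClaw's actual implementation: the class name `PacedAudioQueue`, its methods, and the `max_frames` limit are all hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import Optional


class PacedAudioQueue:
    """Hypothetical outbound audio buffer (illustration only)."""

    def __init__(self, max_frames: int = 50) -> None:
        self.max_frames = max_frames  # assumed high-water mark
        self._frames = deque()

    def offer(self, frame: bytes) -> bool:
        """Queue a frame; return False to tell the producer to slow down."""
        if len(self._frames) >= self.max_frames:
            return False  # backpressure: buffer is full
        self._frames.append(frame)
        return True

    def next_frame(self) -> Optional[bytes]:
        """Called once per pacing tick by the sender; None means silence."""
        return self._frames.popleft() if self._frames else None

    def barge_in(self) -> int:
        """Drop all pending audio when the caller interrupts."""
        dropped = len(self._frames)
        self._frames.clear()
        return dropped


queue = PacedAudioQueue(max_frames=3)
accepted = [queue.offer(b"frame") for _ in range(4)]  # fourth offer is refused
dropped = queue.barge_in()  # caller starts speaking: discard queued speech
```

The design point is that an interruption empties the queue in one step, so the next pacing tick has nothing left to send and the agent goes quiet instead of draining stale audio.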
Coding with Voice: Codex Audio
Coding agents just got more accessible:
- Transcription Support: Active Codex chat models are now automatically routed to the OpenAI transcription default.
- Metadata Awareness: Codex audio capabilities are now advertised in runtime and manifest metadata.
Performance: Workspace-Scoped Caching
Managing large plugin libraries is now much faster thanks to intelligent caching.
- Metadata Snapshots: Explicit agent refreshes and plugin discovery can now reuse existing snapshots.
- Cold Scan Avoidance: The system avoids repeated scans of the entire plugin directory on hot control-plane paths.
- Startup Gains: Gateway startup continues to benefit from deferred sidecar loading and bundled plugin fast-paths.
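The snapshot reuse described above can be pictured as a cache keyed by workspace. The sketch below is an assumption about the general technique, not OpenClaw's code: `SnapshotCache`, `fake_scan`, and the `force` flag are hypothetical names, and the "scan" is a stand-in for a real directory walk.

```python
from pathlib import Path
from typing import Callable, Dict, List

Snapshot = List[str]  # stand-in for real plugin metadata


class SnapshotCache:
    """Hypothetical per-workspace cache of plugin metadata snapshots."""

    def __init__(self, scan: Callable[[Path], Snapshot]) -> None:
        self._scan = scan
        self._by_workspace: Dict[Path, Snapshot] = {}
        self.cold_scans = 0  # counts how often the slow path runs

    def get(self, workspace: Path, force: bool = False) -> Snapshot:
        """Return the cached snapshot, scanning only on a miss or when forced."""
        if force or workspace not in self._by_workspace:
            self.cold_scans += 1
            self._by_workspace[workspace] = self._scan(workspace)
        return self._by_workspace[workspace]


def fake_scan(workspace: Path) -> Snapshot:
    # Placeholder for an expensive scan of the plugin directory.
    return [f"{workspace.name}/plugin-a", f"{workspace.name}/plugin-b"]


cache = SnapshotCache(fake_scan)
first = cache.get(Path("/work/alpha"))
second = cache.get(Path("/work/alpha"))  # reuses the existing snapshot
other = cache.get(Path("/work/beta"))    # different workspace, separate entry
```

Keying by workspace means a refresh in one workspace never pays for, or invalidates, another workspace's snapshot.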
Control UI Refinements
Clean & Responsive Chat
The dashboard chat experience has been polished for better usability.
- Smart Bubbles: Consecutive duplicate heartbeat messages are now collapsed into a single bubble with a count.
- Better Filtering: A new agent-first filter in the chat session picker helps you find your conversations faster.
- Device Harmony: Chat controls and the composer are now fully responsive across phone, tablet, and desktop widths.
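As an illustration of the heartbeat collapsing described above, the following sketch merges consecutive identical heartbeat messages into one entry with a count. The function name, the `is_heartbeat` predicate, and the sample message text are hypothetical; the real Control UI logic may differ.

```python
from typing import Callable, List, Tuple


def collapse_heartbeats(
    messages: List[str], is_heartbeat: Callable[[str], bool]
) -> List[Tuple[str, int]]:
    """Merge runs of identical heartbeat messages into (text, count) bubbles."""
    bubbles: List[Tuple[str, int]] = []
    for text in messages:
        if bubbles and is_heartbeat(text) and bubbles[-1][0] == text:
            # Same heartbeat as the previous bubble: bump its count.
            bubbles[-1] = (text, bubbles[-1][1] + 1)
        else:
            bubbles.append((text, 1))
    return bubbles


transcript = [
    "heartbeat ok", "heartbeat ok", "heartbeat ok",
    "done", "done",
    "heartbeat ok",
]
bubbles = collapse_heartbeats(transcript, lambda t: t.startswith("heartbeat"))
```

Only consecutive heartbeats are merged, so ordinary messages that happen to repeat stay as separate bubbles and the transcript order is preserved.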
Windows & Networking
Windows users will notice improved reliability when connecting to the local gateway. The system now binds the default loopback listener strictly to 127.0.0.1, preventing IPv6 dual-stack issues from wedging localhost requests.
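The effect of an IPv4-only loopback bind can be seen with a small standalone listener. This is a generic sketch using Python's standard `socket` module, not the gateway's own code; it uses port 0 so the operating system picks a free port.

```python
import socket

# Bind explicitly to the IPv4 loopback address instead of the name
# "localhost", which may resolve to ::1 first on dual-stack hosts.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
server.listen()
host, port = server.getsockname()

# A client that targets 127.0.0.1 reaches the listener without any
# IPv6 resolution step.
with socket.create_connection(("127.0.0.1", port), timeout=2) as client:
    peer_host = client.getpeername()[0]

server.close()
```

Because the listener and the client both name 127.0.0.1 directly, there is no ambiguity about which address family a "localhost" lookup would choose.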
Upgrade Guide
Who should upgrade?
- Operators using OpenClaw via Twilio or Google Meet
- Developers using Codex who want voice-to-text support
- Power users with dozens of installed plugins
- Windows users experiencing 'localhost' connection issues
How to Upgrade
# Update to 2026.5.4
openclaw update
# Verify your gateway status
openclaw status --deep

For the complete list of technical changes and contributors, visit the official release page on GitHub.
FAQ
How much faster is the new voice bridge?
The new Gemini-powered voice bridge for Twilio dial-ins is significantly snappier. It uses paced audio streaming, backpressure-aware buffering, and barge-in queue clearing to eliminate the lag common in older TwiML fallback setups.
What is Codex Audio Transcription?
OpenClaw now routes active Codex chat models to the OpenAI audio transcription default. This means you can now use voice-to-text features even when working with Codex-based coding agents.
What does workspace-scoped caching do?
It allows agent refreshes and plugin discovery to reuse existing plugin metadata snapshots instead of performing slow 'cold' scans of the entire plugin directory. This significantly speeds up operations in environments with many plugins.
What's new in the Control UI chat?
The chat controls and composer are now fully responsive across phone, tablet, and desktop widths. The session picker also gains a new agent-first filter, and consecutive duplicate heartbeat messages are collapsed into a single bubble with a count to keep your transcript clean.
Build with OpenClaw
From realtime voice bridges to intelligent performance caching, OpenClaw 2026.5.4 is built for speed. Join our community to share your workflows and learn from other operators.