🤖 24 AI
Models · Wednesday, April 22, 2026

xAI Speech-to-Text API exits beta: general availability for 25 languages

Editorial illustration: Microphone and audio wave streams converting into transcripts in 25 languages through the Grok API

Why it matters

xAI has announced that its Speech-to-Text (STT) API is moving from beta to general availability. The service supports 25 languages, offers batch and streaming modes, and is available without a waitlist. Together with the Grok Voice Agent, which reached general availability earlier, it completes xAI's voice stack.

xAI Speech-to-Text API moves from beta to general availability

In April 2026, xAI announced in its release notes that its Speech-to-Text (STT) API is leaving beta and reaching general availability (GA). The service transcribes audio to text, supports 25 languages, and offers two operating modes: batch and streaming.

What exactly does the xAI STT API offer?

The documentation sums up the service in one line: “Transcribe audio to text in 25 languages with batch and streaming modes.” Batch mode is intended for complete audio files such as meeting recordings, podcast episodes, or interviews: the whole file is sent to the API, and the transcript is returned once processing finishes.

Streaming mode, on the other hand, processes audio in real time. As the user speaks, partial transcripts are returned with low latency, which is essential for voice assistants, live captioning, and in-app dictation.
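The practical difference between the two modes is how audio reaches the service. The sketch below illustrates this in plain Python and deliberately does not call the xAI API: the release notes do not describe endpoints, parameters, or accepted audio formats, so the 16 kHz, 16-bit mono PCM format, the 100 ms frame size, and the helper names (`bytes_per_chunk`, `batch_payload`, `stream_chunks`) are all hypothetical assumptions. Batch mode corresponds to handing over the whole buffer at once; streaming corresponds to slicing audio into short frames and sending each one as it is captured.

```python
from typing import Iterator

# Hypothetical audio format (an assumption, not taken from xAI documentation):
# 16 kHz sample rate, 16-bit (2-byte) samples, mono.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2
CHANNELS = 1


def bytes_per_chunk(chunk_ms: int) -> int:
    """Number of raw PCM bytes covering `chunk_ms` milliseconds of audio."""
    samples = SAMPLE_RATE_HZ * chunk_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def batch_payload(pcm: bytes) -> bytes:
    """Batch mode: the whole recording goes out in a single request."""
    return pcm


def stream_chunks(pcm: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Streaming mode: yield fixed-duration frames in order, the way a live
    microphone produces them. The final frame may be shorter."""
    size = bytes_per_chunk(chunk_ms)
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]


if __name__ == "__main__":
    # 1.05 seconds of silence as stand-in audio.
    audio = bytes(bytes_per_chunk(1050))
    chunks = list(stream_chunks(audio))
    print(bytes_per_chunk(100))          # 3200 bytes per 100 ms frame
    print(len(chunks), len(chunks[-1]))  # 11 frames, the last one 1600 bytes
```

In a real streaming client, each frame would be written to an open connection as soon as it is recorded, and partial transcripts would arrive while later frames are still being sent; the actual transport and message format should be taken from xAI's own documentation.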

Support for 25 languages makes xAI a multilingual alternative to OpenAI Whisper and Google Cloud Speech-to-Text, although both of those services advertise broader language coverage. The exact list of supported languages is not specified in the published release notes.

What does GA status mean for developers?

The move from beta to GA has several practical consequences. First, the API is available without a waitlist: any xAI user with an API key can start sending requests immediately. Second, GA typically brings firmer service-level guarantees and a lower risk of breaking changes to the API contract.

Third, GA signals that xAI is ready to support production workloads, which matters for developers building commercial voice products. Per-minute pricing for audio processing is not detailed in the release notes, so developers should check current rates in the xAI console.

How does it fit with Grok and the Voice Agent?

The Grok Voice Agent API has been generally available since December 2025. With STT now also in GA, xAI offers a complete voice stack: STT for input (speech recognition), the Grok LLM for reasoning, and the Voice Agent for output (speech synthesis and conversation management).

This integration means developers building voice products can rely on a single provider instead of combining STT from one vendor (e.g., Whisper), an LLM from another, and TTS from a third. The advantages are a single SDK, a single bill, and latency that is managed end to end by one provider rather than accumulated across several.

For xAI, this is strategically important because competing products such as OpenAI’s Realtime API already provide an integrated voice pipeline. STT reaching GA closes that gap and makes xAI a credible option for building production voice assistants.


This article was generated using artificial intelligence from primary sources.