Google: Gemini 3.1 Flash-Lite enters general availability
Gemini 3.1 Flash-Lite has been generally available through the Gemini API since May 7, 2026, as a stable production endpoint. The model is optimized for speed, scale, and cost efficiency, and the preview version will be discontinued on May 25, 2026.
This article was generated using artificial intelligence from primary sources.
Google announced on May 7, 2026, that the model Gemini 3.1 Flash-Lite has moved from preview to general availability (GA) through the Gemini API. The stable endpoint carries the identifier gemini-3.1-flash-lite and can be used in production workloads without the limitations of preview status.
What does the GA version bring?
According to the official changelog, the model is “optimized for speed, scale, and cost efficiency.” It is the lowest-cost endpoint within the Gemini 3.1 generation, aimed at scenarios where development teams send large volumes of requests and where per-call latency is a critical parameter. Typical usage profiles include classification, structured data extraction, lightweight chat applications, and preprocessing of large corpora.
Preview version ends in two weeks
Development teams that have been using gemini-3.1-flash-lite-preview during the preceding weeks must migrate to the stable identifier. Google states that the preview version “enters deprecation on May 11, 2026, and will be shut down on May 25, 2026.” Migration in practice means replacing a single string in the client configuration — model behavior should be consistent between the preview and GA version.
Positioning within the Gemini 3.1 family
Flash-Lite fills the lower end of the pricing ladder within the Gemini 3.1 family, below the standard Flash and Pro variants. GA status means Google assumes formal SLA commitments for API contract and model behavior stability, which is a prerequisite for inclusion in business agreements and regulated sectors.
What does this mean for development teams?
Teams that were waiting for GA before a more serious production rollout now have a stable contract to build on. Those already using the preview have fewer than three weeks until the endpoint is fully shut down and must update their configuration.
Frequently Asked Questions
- What is the exact model identifier?
- The stable endpoint is `gemini-3.1-flash-lite`, available through the standard Gemini API interface.
- How long does the preview version remain active?
- The preview endpoint `gemini-3.1-flash-lite-preview` enters deprecation on May 11, 2026, and is fully shut down on May 25, 2026.
- What is the model designed for?
- Google positions it as a cost-efficient production option for high-traffic applications where speed and per-unit cost are critical.
Sources
Related news
Allen Institute: EMO — MoE language model with natural semantic modularity from data
OpenAI: three new realtime voice models in the API with reasoning and translation
arXiv:2605.03195: Terminus-4B — 4 billion parameters for terminal execution matches Claude Opus and GPT-5.3-Codex on SWE-Bench Pro with ~30% fewer main agent tokens