Anthropic: Claude Opus 4.8 arrives with 1M token context and adaptive thinking
Claude Opus 4.8 is Anthropic's new most capable generally available model, launched on May 28, 2026. It brings a one-million (1M) token context window by default on the Claude API, Amazon Bedrock and Vertex AI, 128k maximum output tokens, and adaptive thinking that triggers reasoning only when needed.
This article was generated using artificial intelligence from primary sources.
On May 28, 2026, Anthropic launched Claude Opus 4.8 (model ID claude-opus-4-8), its most capable generally available (GA — General Availability) model. Alongside the model itself, a set of accompanying API features was released that expand control over the model’s behavior in production applications.
How much context and output does Claude Opus 4.8 offer?
Claude Opus 4.8 supports a 1 million (1M) token context window by default on the Claude API, Amazon Bedrock and Google Vertex AI. On Microsoft Foundry the limit is 200k tokens. The maximum number of output tokens is 128k. The model retains the same set of tools and platform features as the previous Claude Opus 4.7.
The context window denotes the total amount of text — both the query and the response — that the model can hold simultaneously in its “working memory” during a single conversation.
How does Claude Opus 4.8 use adaptive thinking?
With adaptive thinking enabled, Claude Opus 4.8 starts the reasoning process only when a specific turn requires it, reducing the number of wastefully spent thinking tokens compared to Claude Opus 4.7 at the same effort level. The effort parameter on Claude Opus 4.8 now defaults to high across all surfaces, including Claude Code and the Messages API.
This combines higher response quality (high effort) with lower token consumption, because the model does not spend reasoning capacity on simple tasks.
What new API features accompany the launch?
Anthropic simultaneously introduced mid-conversation system messages — messages with the role role: "system" that can be sent after a user turn within the messages array (subject to placement rules). They allow changing instructions during a long session without losing prompt cache hits and require no beta header.
The stop_details field on responses that are refusals is now publicly documented. It returns a category (cyber, bio or null) and a human-readable explanation, so an application can route different classes of refusals to the appropriate next step.
The minimum length of a prompt that can be cached for prompt caching has been lowered to 1,024 tokens, lower than on Claude Opus 4.7, extending the cost-effectiveness of caching to shorter queries. Prompt caching stores the already-processed part of a query so it does not have to be recomputed on every call, saving both time and cost.
How does it behave with sampling and constraints?
Setting the sampling parameters temperature, top_p or top_k to a value other than the default returns a 400 error on Claude Opus 4.8, the same as on Claude Opus 4.7. Anthropic points to the migration guide for details when moving from older versions.
In Claude Code, Max plan users now use fast mode by default on Claude Opus 4.8, and Auto mode for long-running tasks has been extended to more users. Workflows are available in Claude Code as a research preview, allowing the definition and execution of multi-step agentic plans.
What does this mean for development teams?
The combination of million-token context, adaptive thinking and finer control over refusals positions Claude Opus 4.8 as a model for demanding agentic and analytical workflows over large documents. The model also supports high-resolution image input (up to 2576 pixels on the longer edge), Task budgets, computer use and the advisor tool, the same as Claude Opus 4.7.
Frequently Asked Questions
- How large is the Claude Opus 4.8 context window?
- Claude Opus 4.8 supports a 1 million token context window by default on the Claude API, Amazon Bedrock and Google Vertex AI, while on Microsoft Foundry the limit is 200k tokens. The maximum number of output tokens is 128k.
- What is adaptive thinking in Claude Opus 4.8?
- Adaptive thinking triggers the reasoning process only when a specific request requires it, reducing the number of unnecessarily spent thinking tokens compared to Claude Opus 4.7 at the same effort level.
- What are mid-conversation system messages?
- These are messages with role: 'system' that can be sent after a user turn within the messages array. They allow changing instructions during a long session while preserving prompt cache hits, with no need for a beta header.
Sources
Related news
arXiv:2606.19808: SEVRA Saves up to 91 Percent of Tokens Through Selective Verification in Model Reasoning
arXiv:2606.20333: SoftSkill Compresses Skill Documents into 32 Latent Tokens and Boosts LiveMath by 42.1 Points
arXiv:2606.19327: Rubric-Conditioned Self-Distillation Outperforms GRPO in Reasoning Model Training