In-context prompting outperforms LangGraph and CrewAI

In-context prompting is an architectural approach in which an entire procedural workflow is embedded directly in the system prompt instead of being orchestrated through a framework. An arXiv study with 200 conversations per condition reports that this approach outperforms LangGraph, CrewAI, Google ADK, and OpenAI Agents SDK in three domains: travel booking, Zoom technical support, and insurance claims processing.

Simon Dennis, Michael Diamond, Rivaan Patil, Kevin Shabahang, and Hao Guo published the study on arXiv on April 30, 2026, under a provocative title: “In-Context Prompting Obsoletes Agent Orchestration for Procedural Tasks.” They argue that the capabilities of today’s frontier models make external orchestration frameworks redundant for a significant class of procedural, multi-turn tasks.

What makes in-context prompting better than orchestration?

The in-context approach embeds the entire procedural workflow (the list of steps, branching conditions, output format, and escalation rules) directly in the system prompt of a single model. Orchestration frameworks (LangGraph, CrewAI, Google ADK, OpenAI Agents SDK) split the same workflow into a graph of nodes and hand off model calls to separate coordination logic. The authors argue that frontier models now have sufficient “self-orchestration” capability to follow complex procedures within a single call, while graph-based context switching introduces losses that increase failure rates.
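To make the idea concrete, here is a minimal sketch of how a workflow's steps, branches, output format, and escalation rule might be rendered into one system prompt. It is an illustration only, not the prompt format used in the study: the `Step` class, the `build_system_prompt` helper, and the travel-booking steps are all hypothetical names and content invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of a procedural workflow (hypothetical structure)."""
    instruction: str
    # Maps a branching condition to the action to take when it holds.
    branches: dict[str, str] = field(default_factory=dict)


def build_system_prompt(role: str, steps: list[Step],
                        output_format: str, escalation: str) -> str:
    """Render the whole procedure as a single system prompt string."""
    lines = [f"You are {role}. Follow this procedure in order.", "", "Procedure:"]
    for number, step in enumerate(steps, start=1):
        lines.append(f"{number}. {step.instruction}")
        for condition, action in step.branches.items():
            lines.append(f"   - If {condition}: {action}")
    lines += ["", f"Output format: {output_format}", f"Escalation: {escalation}"]
    return "\n".join(lines)


# Invented travel-booking workflow, used only to exercise the helper.
TRAVEL_STEPS = [
    Step("Ask for origin, destination, and travel dates."),
    Step("Propose up to three itineraries.",
         {"no itinerary fits the dates": "offer the nearest alternative dates"}),
    Step("Confirm the chosen itinerary and collect passenger details."),
]

SYSTEM_PROMPT = build_system_prompt(
    role="a travel booking assistant",
    steps=TRAVEL_STEPS,
    output_format="plain text, one question per turn",
    escalation="hand off to a human agent if the user asks for a refund",
)

if __name__ == "__main__":
    print(SYSTEM_PROMPT)
```

The resulting string would be passed as the system message of a single model call, so the model itself tracks which step the conversation is in rather than an external graph doing so.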

Three domains and concrete results

The experiment covered three domains (travel booking, Zoom technical support, and insurance claims processing), each with 200 conversations per condition and LLM-as-judge scoring on five quality criteria. The in-context baseline scored 4.53–5.00 on a 1–5 scale, while LangGraph, the closest orchestrator, trailed at 4.17–4.84. The gap in failure rates was more pronounced: 11.5%, 0.5%, and 5% per domain for in-context versus 24%, 9%, and 17% for orchestration.
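The failure rates above can be translated into conversation counts with simple arithmetic. The sketch below assumes that the percentages are listed in the same order as the domains and that each rate is taken over the 200 conversations of one condition; neither assumption is stated explicitly in the figures, so treat the domain labels as tentative.

```python
CONVERSATIONS_PER_CONDITION = 200

# (in-context %, orchestration %) per domain.
# Assumption: the rates appear in the same order as the domains are listed.
FAILURE_RATES = {
    "travel booking": (11.5, 24.0),
    "Zoom technical support": (0.5, 9.0),
    "insurance claims processing": (5.0, 17.0),
}


def failed_conversations(rate_percent: float,
                         total: int = CONVERSATIONS_PER_CONDITION) -> int:
    """Convert a failure rate in percent into a number of conversations."""
    return round(rate_percent / 100 * total)


def summarize() -> dict[str, tuple[int, int, float]]:
    """Return (in-context failures, orchestration failures, gap in points)."""
    summary = {}
    for domain, (in_context, orchestration) in FAILURE_RATES.items():
        summary[domain] = (
            failed_conversations(in_context),
            failed_conversations(orchestration),
            orchestration - in_context,  # gap in percentage points
        )
    return summary


if __name__ == "__main__":
    for domain, (ic, orch, gap) in summarize().items():
        print(f"{domain}: {ic} vs {orch} failed of "
              f"{CONVERSATIONS_PER_CONDITION} ({gap:.1f} points)")
```

Under these assumptions the rates correspond to 23, 1, and 10 failed conversations for in-context prompting against 48, 18, and 34 for orchestration.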

What this means for building agents

The study does not conclude that orchestration frameworks are universally redundant: they still have a role in tasks requiring parallel flows, external memory, or multiple independent agents. But for structured procedural tasks with clear steps, the paper suggests that architectural simplicity, in the form of one well-written system prompt, is more reliable than a graph of nodes. The implication for the 2026 agent stack is that agent design should start with an attempt to solve the problem through an in-context prompt before reaching for a framework.

Frequently Asked Questions

Which frameworks were compared?

LangGraph, CrewAI, Google ADK, and OpenAI Agents SDK were compared against an in-context baseline that embeds the workflow directly in the system prompt.

What is the range of results?

The in-context approach achieves 4.53–5.00 on a 1–5 scale, while orchestration frameworks remain in the range 4.17–4.84. The differences in failure rate are larger: 11.5%, 0.5%, and 5% per domain for in-context versus 24%, 9%, and 17% for orchestration.
