LangChain: prompt caching for faster agents

LangChain introduced prompt caching in its Deep Agents framework. The technique reuses previously computed context between agent steps, with the goal of reducing latency and cost for multi-step agents.

What is prompt caching, and why are agents wasteful without it?

Prompt caching is a technique in which an LLM system stores the intermediate processing results for a long system prompt or context window, so that each subsequent call in the loop can skip reprocessing content it has already seen. Without caching, every agent step resends and reprocesses the entire context (tools, history, instructions), so latency and cost accumulate with each iteration.

How does Deep Agents apply caching?

LangChain described an approach for its Deep Agents framework in which the shared portion of the context (tool definitions, system instructions, the initial step) is set once and reused across all calls within the same agent session. Author Alex Olsen notes that this optimizes context reuse between agent steps and that the benefit is most pronounced for agents with long loops and stable system prompts.
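The idea of a stable shared portion can be illustrated with a small sketch. This is not the Deep Agents implementation or any LangChain API. The helper names `build_context` and `shared_prefix_len` are hypothetical, and the sketch assumes that a cache can only reuse the leading part of the context that is identical between two consecutive calls.

```python
from typing import List


def build_context(system: str, tools: List[str], history: List[str]) -> List[str]:
    """Hypothetical helper: put the stable parts first, the growing history last.

    Tools are sorted so their order is identical on every call.
    """
    return [system, *sorted(tools), *history]


def shared_prefix_len(prev: List[str], curr: List[str]) -> int:
    """Count the leading segments that are identical in both contexts."""
    count = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        count += 1
    return count


system = "You are a research agent."
tools = ["search(query)", "read_file(path)"]

step1 = build_context(system, tools, ["user: summarize the repo"])
step2 = build_context(
    system, tools, ["user: summarize the repo", "tool: read_file -> README"]
)

# Segments of step 2 that match step 1 and could in principle be reused.
reusable = shared_prefix_len(step1, step2)
new_segments = len(step2) - reusable

# Editing the system prompt changes the very first segment,
# so nothing after it lines up with the previous call.
edited = build_context("You are a coding agent.", tools, ["user: summarize the repo"])
reusable_after_edit = shared_prefix_len(step1, edited)
```

Under these assumptions, only the newly appended history entry differs between the two steps, while a change to the system prompt leaves no reusable prefix. This is one way to see why the benefit is tied to stable system prompts.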

Comparison with the uncached approach

Without caching, every agent step bears the full cost of an LLM call over the whole context, so in multi-step flows the cost and latency of each step grow roughly linearly as the context accumulates. With caching, the full processing cost falls only on the delta, that is, the part of the context that changed since the previous step. Exact percentage savings for Deep Agents have not been publicly released, but comparable systems (e.g., the AWS/Stripe production implementation) report up to 60% lower consumption using the same type of technique.
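The comparison can be made concrete with a toy cost model. The token counts below are invented for illustration and are not measurements from Deep Agents or any provider. The `cached_rate` parameter is an assumption: it stands for the fraction of the normal price charged for tokens served from the cache, and the model ignores cache write overhead and cache expiry.

```python
def uncached_cost(prefix: int, delta: int, steps: int) -> int:
    """Every step reprocesses the whole context: prefix plus all deltas so far."""
    return sum(prefix + delta * k for k in range(1, steps + 1))


def cached_cost(prefix: int, delta: int, steps: int, cached_rate: float = 0.0) -> float:
    """The first step pays in full; later steps pay in full only for the new delta.

    cached_rate is a hypothetical discount factor for reused tokens
    (0.0 means reused tokens are free, 1.0 means no discount at all).
    """
    total = 0.0
    for k in range(1, steps + 1):
        context = prefix + delta * k
        new_tokens = context if k == 1 else delta
        reused_tokens = context - new_tokens
        total += new_tokens + cached_rate * reused_tokens
    return total


# Illustrative numbers: 2000-token stable prefix, 500 new tokens per step, 10 steps.
baseline = uncached_cost(2000, 500, 10)
idealized = cached_cost(2000, 500, 10, cached_rate=0.0)
no_discount = cached_cost(2000, 500, 10, cached_rate=1.0)

print(baseline, idealized, no_discount)
```

With a `cached_rate` of 1.0 the model collapses back to the uncached total, which is a useful sanity check. Real savings depend on how a given provider prices and retains cached tokens, so the idealized figure should be read as an upper bound within this toy model rather than as an expected result.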

The LangChain blog post is aimed at development teams building multi-step agents and looking for ways to reduce operational costs without sacrificing output quality.

Frequently Asked Questions

What is prompt caching and why does it matter for agents?

Prompt caching is a technique that stores the computed intermediate results of a long context window so that each subsequent agent step can skip reprocessing the same content, which reduces latency and cost per step.

Which agents does this technique apply to?

It applies to long-running agents that call tools or check results in a loop, especially in the LangChain Deep Agents framework where the context grows with the number of iterations.
