GitHub: Optimising agentic workflows achieves token savings of 19% to 62%
GitHub instrumented its production agentic workflows and identified three main sources of token waste: unnecessary MCP tool schemas, deterministic data fetched through the model, and misconfigured bash rules. Optimisation achieved savings of 19% to 62% per workflow.
This article was generated using artificial intelligence from primary sources.
The GitHub engineering team published on 7 May 2026 an analysis of their own production agentic workflows with concrete figures on token waste and optimisation measures. The post is a rare example of transparent cost reporting and helps teams building similar systems.
Three main sources of token waste
First, unnecessary MCP tool schemas. The full GitHub MCP server with 40 tools adds 10–15 KB of context per turn, yet most workflows use only a handful of tools. Removing unused tools from the MCP configuration reduced context size per call by 8–12 KB, saving thousands of tokens per run. MCP (Model Context Protocol) is the standard by which tools expose their schemas to the language model.
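The effect of trimming unused tool schemas can be sketched as follows. The tool names and schema shapes below are illustrative, not the actual GitHub MCP server catalogue; the point is that every tool's JSON schema is serialised into the context on each turn, so dropping unused tools shrinks every call.

```python
import json

# Hypothetical MCP tool catalogue (names and schemas are made up for
# illustration; the real server exposes ~40 tools).
full_config = {
    "tools": [
        {"name": "get_issue", "description": "Fetch one issue",
         "inputSchema": {"type": "object", "properties": {"number": {"type": "integer"}}}},
        {"name": "create_pull_request", "description": "Open a PR",
         "inputSchema": {"type": "object", "properties": {"title": {"type": "string"}}}},
        {"name": "list_workflows", "description": "List CI workflows",
         "inputSchema": {"type": "object", "properties": {}}},
    ]
}

# Keep only the tools this workflow actually calls.
NEEDED = {"get_issue"}

trimmed = {"tools": [t for t in full_config["tools"] if t["name"] in NEEDED]}

def schema_bytes(config: dict) -> int:
    """Size of the tool schemas as serialised into the model context."""
    return len(json.dumps(config["tools"]))

# Bytes removed from EVERY turn, not just one call.
saved_per_turn = schema_bytes(full_config) - schema_bytes(trimmed)
```

With the real 40-tool catalogue the same arithmetic yields the 8–12 KB per-call reduction the post reports.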
Second, deterministic data fetching. Many agent steps are plain reads that require no reasoning, such as fetching issue metadata. Moving such fetches into a pre-agent CLI step, before the model is invoked, takes those calls entirely out of the LLM reasoning loop.
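A minimal sketch of the pattern, assuming the GitHub CLI (`gh`) is available; the prompt layout and function names are this sketch's own, not GitHub's published code. The fetch runs before any model call, so the agent never spends a tool-call round trip on it.

```python
import json
import subprocess

def fetch_issue_metadata(issue: int, run=subprocess.run) -> dict:
    """Deterministic read: fetch issue fields with the GitHub CLI.
    The `run` parameter is injectable so the step is testable offline."""
    result = run(
        ["gh", "issue", "view", str(issue), "--json", "title,labels"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

def build_prompt(issue: int, metadata: dict) -> str:
    # The pre-fetched data is injected as plain context, replacing what
    # would otherwise be an in-loop MCP tool call by the agent.
    labels = ", ".join(label["name"] for label in metadata["labels"])
    return (
        f"Triage issue #{issue}.\n"
        f"Title: {metadata['title']}\n"
        f"Labels: {labels}\n"
    )
```

The agent then starts with the metadata already in its prompt instead of discovering it through reasoning steps.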
Third, misconfigured rules. A one-line error in the bash allowlist triggered a 64-step fallback loop in which the workflow manually reconstructed compiler output instead of calling the appropriate tool.
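How a one-line allowlist slip blocks a tool call can be illustrated with a toy matcher; the actual rule format in GitHub's workflows is not published, so the patterns below are assumptions.

```python
import fnmatch

# Illustrative bash allowlist. A single mistyped entry is enough:
# "gcc -c *" was presumably intended, but "gcc-c*" (missing space)
# never matches a real compiler invocation.
ALLOWED = [
    "git status",
    "git diff *",
    "gcc-c*",  # the one-line error
]

def bash_allowed(command: str) -> bool:
    """Return True if the command matches any allowlist pattern."""
    return any(fnmatch.fnmatch(command, pattern) for pattern in ALLOWED)
```

Because `bash_allowed("gcc -c main.c")` is rejected, the agent cannot run the compiler directly and instead drifts into a long fallback loop, reconstructing the compiler's output by other means, which in GitHub's case ran to 64 steps.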
Concrete savings per workflow
Five optimised workflows achieved the following reductions:

- Auto-Triage Issues: 62% (over 109 runs)
- Security Guard: 43%
- Smoke Claude: 59%
- Daily Compiler Quality: 19%
- Community Attribution: 37%

Optimising Auto-Triage alone saved approximately 7.8 million effective tokens over the observation period.
What is the Effective Tokens metric?
GitHub developed the formula ET = m × (1.0 × I + 0.1 × C + 4.0 × O) to normalise costs across different model tiers. Here I denotes input tokens, C cache-read tokens, O output tokens, and m a model multiplier. Output tokens carry 4× weight as the most expensive type, while cache-read tokens carry only 0.1×. The metric allows direct comparison of workflows using different models and different caching patterns, so the team does not need to track dollar cost separately per model.
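The formula translates directly into code. The weights are taken from the post; the sample token counts below are illustrative, not GitHub's figures.

```python
def effective_tokens(input_tokens: int, cache_read: int, output_tokens: int,
                     model_multiplier: float = 1.0) -> float:
    """GitHub's Effective Tokens metric:
    ET = m * (1.0*I + 0.1*C + 4.0*O)."""
    return model_multiplier * (
        1.0 * input_tokens + 0.1 * cache_read + 4.0 * output_tokens
    )

# Illustrative run: heavy cache reads are nearly free, while output
# tokens dominate despite being the smallest raw count.
# 10_000*1.0 + 50_000*0.1 + 2_000*4.0 = 10000 + 5000 + 8000
et = effective_tokens(10_000, 50_000, 2_000)  # 23000.0
```

The weighting makes the trade-off visible at a glance: 50,000 cached tokens cost less in ET terms than 2,000 output tokens.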
Frequently Asked Questions
- What is the Effective Tokens metric?
- The formula ET = m × (1.0 × I + 0.1 × C + 4.0 × O) weights token types by cost: input 1×, cache-read 0.1×, output 4×; m is a model multiplier.
- How heavy are MCP tool schemas really?
- The full GitHub MCP server with 40 tools adds 10–15 KB of context per turn; reducing to only the tools in use saves 8–12 KB and several thousand tokens per run.
- What caused the 64-step fallback loop?
- Due to a one-line misconfiguration of the bash allowlist, one workflow manually reconstructed compiler output instead of calling the tool, resulting in a 64-step fallback loop.