arXiv: LLM tool-calling linearly steerable without fine-tuning

Researchers from UCL, Holistic AI and Imperial College London report that LLMs represent tool selection linearly in their internal activations. Adding a mean-difference vector (the difference between the average activations for two tools) to the activations switches the selected tool with 77-100% accuracy across 12 tested models (270M-27B parameters), without any fine-tuning.

A research team from University College London, Holistic AI and Imperial College London (Zekun Wu, Ze Wang, Seonglae Cho, Yufei Yang, Adriano Koshiyama, Sahan Bulathwela and Maria Perez-Ortiz) published a study on May 11, 2026 showing that LLMs internally represent tool selection linearly and that this selection can be reliably steered without fine-tuning.

What did the researchers discover?

The main finding: tool selection in language models is “linearly readable and steerable” through activation manipulation. By adding the mean-difference vector (the difference between the average activations for two tools) to the model's activations, the researchers switched the selected tool with “77-100% accuracy on name-only single-turn prompts, 93-100% for models 4B+.” The technique requires no additional training.

Which models were tested?

The study covered 12 instruction-tuned models across the Gemma 3, Qwen 3, Qwen 2.5 and Llama 3.1 families, with parameter counts ranging from 270M to 27B. Consistent results across four model families and two orders of magnitude in size suggest the phenomenon is general rather than an artifact of a specific model or training recipe.

What does this reveal about the internal structure of models?

The authors used activation patching and causal analysis and found that the causal effect “concentrates along one direction, the output row of the layer producing the first token of the target tool.” Surprisingly, even base models (before instruction-tuning) already encode the correct tool information: a cosine readout of their activations scores 69-82% on BFCL benchmarks, while the same base models reach only 2-10% when asked to generate the tool call themselves. This suggests that instruction-tuning mainly wires existing representations into the output.
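The article does not spell out how the cosine readout is computed, so the following is only a minimal sketch of one plausible reading: compare an activation vector against one direction per tool and pick the tool with the highest cosine similarity. The tool names, the 4-dimensional vectors and the helper `cosine_readout` are all hypothetical stand-ins; in a real model the directions would come from the model's own weights or activations.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def cosine_readout(activation, tool_directions):
    """Return the tool whose direction has the highest cosine with the activation."""
    return max(tool_directions, key=lambda name: cosine(activation, tool_directions[name]))

# Hypothetical per-tool directions (assumption: one vector per tool in the menu).
tool_directions = {
    "get_weather": [1.0, 0.0, 0.0, 0.0],
    "search_web": [0.0, 1.0, 0.0, 0.0],
    "send_email": [0.0, 0.0, 1.0, 0.0],
}

# Hypothetical activation that leans mostly toward the second direction.
activation = [0.2, 0.9, 0.1, 0.4]
predicted = cosine_readout(activation, tool_directions)
print(predicted)
```

A readout like this never asks the model to generate anything, which is why it can succeed on base models whose generated tool calls are mostly wrong.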

What are the practical implications and limitations?

The technique opens new possibilities for lightweight control of agentic systems: switching tools without retraining, A/B testing different tool-routing policies, and mitigating a model's bias toward certain tools. The limitations are significant, however. The authors stress that the findings hold in single-turn, fixed-menu settings, while multi-turn agentic transfer is “more fragile” and requires further research.

Frequently Asked Questions

What is a mean-difference vector?

A mean-difference vector is the difference between the average activation vectors of two classes (e.g. two tools). It is computed by taking the average activations on examples where the model selects tool A, the average on examples where it selects tool B, and subtracting. Adding this difference to activations during inference can 'nudge' the model toward one tool or the other.
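The computation described above can be sketched in a few lines. This is an illustration on synthetic data, not the authors' code: the 8-dimensional "activations", the cluster centers, the scale `alpha` and the stand-in readout `picks_b` are all assumptions made for the example, and plain Python lists are used only to keep it dependency-free.

```python
import random

def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def mean_difference(acts_a, acts_b):
    """Direction pointing from tool A's mean activation to tool B's."""
    mu_a, mu_b = mean_vector(acts_a), mean_vector(acts_b)
    return [b - a for a, b in zip(mu_a, mu_b)]

def steer(activation, direction, alpha=1.0):
    """Add the scaled direction to one activation vector (alpha is an assumed knob)."""
    return [h + alpha * d for h, d in zip(activation, direction)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Synthetic data: activations clustered around one center per tool.
rng = random.Random(0)
center_a = [1.0, 0.0] + [0.0] * 6
center_b = [0.0, 1.0] + [0.0] * 6

def sample(center, n=50):
    return [[c + rng.gauss(0, 0.1) for c in center] for _ in range(n)]

acts_a, acts_b = sample(center_a), sample(center_b)
v = mean_difference(acts_a, acts_b)
midpoint = mean_vector([mean_vector(acts_a), mean_vector(acts_b)])

def picks_b(h):
    """Stand-in linear readout: True if h lies on tool B's side of the midpoint."""
    return dot([x - m for x, m in zip(h, midpoint)], v) > 0

h = acts_a[0]            # an activation where tool A is selected
steered = steer(h, v)    # the same activation nudged toward tool B
flipped = (not picks_b(h)) and picks_b(steered)
print(flipped)
```

In a real model, the activations would be collected at a chosen layer and the steered vector written back during the forward pass; which layer and what scale work best are details this sketch does not try to capture.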

Why is the linear representation surprising?

Many assumed that tool selection in LLMs results from complex interactions across multiple layers and components. The study shows the causal effect is concentrated “along one direction, the output row of the layer producing the first token of the target tool.” This points to a simpler structure than expected and opens the door to simpler control methods.

Does this hold for multi-turn agentic scenarios?

Not reliably. The authors explicitly warn that the findings hold for single-turn, fixed-menu settings, while multi-turn agentic transfer is “more fragile.” The technique is therefore useful for controlling tool choice in a single step, but reliably steering multiple tool calls across longer agentic trajectories remains an open problem.

Source: arXiv:2605.07990
