UK AISI: Engineering Playbook Opens Frontier Model Evaluation Infrastructure in Five Layers
The Engineering Playbook is open-source documentation, published by the UK AI Security Institute on June 18, 2026, that opens up the institute's internal infrastructure for evaluating frontier AI models. It is structured in five layers (Evaluate, Isolate, Connect, Run, Scale) and builds on the previously released Inspect AI framework, whose Inspect Evals library offers over 200 ready-made evaluations and whose GitHub repository has 240 contributors.
This article was generated using artificial intelligence from primary sources.
The UK AI Security Institute (AISI), the British government agency for AI safety, published the Engineering Playbook on June 18, 2026. The Playbook is open-source documentation of the institute's own infrastructure for evaluating frontier models. Frontier models are the most advanced AI systems, and testing them requires specialized infrastructure for isolating the models, running them and measuring their behavior.
Five Layers of Evaluation
The Playbook is structured in five layers: Evaluate (defining tests), Isolate (security isolation), Connect (connecting to models), Run (execution) and Scale (scaling to larger workloads). The structure covers the entire path from designing a test to the compute infrastructure for open-weight models, so other laboratories and agencies can start from a proven template instead of building their own system from scratch.
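As a rough illustration of how the five layers relate to one another, the sketch below models each layer as a small Python function. It is not taken from the Playbook or from Inspect AI: every name in it (EvalTask, isolate, connect, run_task, scale) is hypothetical, the model is a stub rather than a real API client, and the "isolation" only contains exceptions instead of providing a real sandbox.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalTask:
    """Evaluate layer (hypothetical): a test is a prompt plus an expected answer."""
    prompt: str
    expected: str


def isolate(call: Callable[[], str]) -> str:
    """Isolate layer (hypothetical): stand-in for a sandbox.

    A real system would run the model or its tools in a container or VM;
    this sketch only prevents a failing call from crashing the run.
    """
    try:
        return call()
    except Exception as exc:
        return f"error: {exc}"


def connect(model_name: str) -> Callable[[str], str]:
    """Connect layer (hypothetical): return a callable for the named model."""
    def stub_model(prompt: str) -> str:
        # Toy stand-in for a real model API call.
        return "4" if "2 + 2" in prompt else "unknown"
    return stub_model


def run_task(task: EvalTask, model: Callable[[str], str]) -> bool:
    """Run layer (hypothetical): execute one task in isolation and score it."""
    output = isolate(lambda: model(task.prompt))
    return output.strip() == task.expected


def scale(tasks: List[EvalTask], model: Callable[[str], str], workers: int = 4) -> float:
    """Scale layer (hypothetical): fan tasks out over a thread pool, return accuracy."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda t: run_task(t, model), tasks))
    return sum(results) / len(results)


tasks = [
    EvalTask("What is 2 + 2?", "4"),
    EvalTask("What is the capital of France?", "Paris"),
]
model = connect("stub-model")
accuracy = scale(tasks, model)
print(f"accuracy: {accuracy:.2f}")
```

The point of the layering is that each function can be replaced independently: a real deployment would swap the stub in connect for an actual model client and the try/except in isolate for a proper sandbox without touching the task definitions.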
What It Builds On
The Engineering Playbook builds on Inspect AI, AISI’s evaluation framework that the institute previously released. Through the Inspect Evals library, over 200 ready-made evaluations are available, and the inspect_ai repository on GitHub has 240 contributors. Unlike the closed internal systems of individual laboratories, this stack is public and can be adopted by any organization that tests models.
Who Is Already Using It
METR, an organization known for measuring the autonomous capabilities of models, runs 228 tasks on frontier models using Inspect. The release of the Playbook lowers the barrier to entry for independent safety testing: instead of building costly proprietary infrastructure, researchers get a documented, reproducible and open system. The material is available at engineering-playbook.aisi.org.uk.
Frequently Asked Questions
- What is the UK AISI Engineering Playbook?
- It is open-source documentation of internal infrastructure for evaluating frontier models, structured in five layers: Evaluate, Isolate, Connect, Run and Scale.
- What does the Playbook build on?
- It builds on the previously released Inspect AI framework. Its Inspect Evals library offers over 200 ready-made evaluations, and the inspect_ai repository on GitHub has 240 contributors.
- Who is already using this infrastructure?
- The organization METR runs 228 tasks on frontier models using Inspect.