NIST CAISI Expands Frontier AI National Security Testing to Google DeepMind, Microsoft and xAI
On May 5, 2026, NIST's Center for AI Standards and Innovation (CAISI) signed expanded agreements with Google DeepMind, Microsoft and xAI for pre-deployment and post-deployment testing of frontier models. CAISI has now conducted more than 40 evaluations, including of unreleased state-of-the-art models; testing is routinely performed in classified environments on models with their safeguards removed.
This article was generated using artificial intelligence from primary sources.
NIST's Center for AI Standards and Innovation (CAISI) announced on May 5, 2026, that it has signed expanded collaborative agreements with Google DeepMind, Microsoft and xAI for frontier AI model testing in a national security context. The new agreements build on CAISI's earlier contracts with Anthropic and OpenAI from August 2024, giving the US government formal evaluation arrangements with all five leading frontier labs in the United States.
What do the agreements specifically cover?
The agreements cover pre-deployment evaluations (before a model's public release) and post-deployment research (analysis of models already on the market). To date, CAISI has conducted more than 40 evaluations, including assessments of unreleased state-of-the-art models that labs submit for testing ahead of launch.
The technical framework of the agreements allows labs to deliver models with "reduced or removed safeguards" (e.g., content filters, refusal layers), enabling CAISI to measure a model's true capability limits without interference from safety guardrails. Testing is routinely conducted in classified environments with interagency experts working through the TRAINS (Testing Risks of AI for National Security) Taskforce, a coordination body established in November 2024 to align AI research with national security priorities.
How does the director’s statement shape the strategic context?
Chris Fall, CAISI's director, summarized the purpose of the agreements: "Independent, rigorous measurement science is essential to understanding frontier AI and its national security implications." The quote underscores that CAISI's mandate is measurement-driven rather than policy-driven: the goal is to objectively measure capability thresholds, not to dictate conditions for market access.
The agreements are structured flexibly, with clauses that allow rapid response to future AI advancements without renegotiation. Test results remain classified, but NIST collaborates with labs on voluntary product improvements and on sharing information related to international competitiveness.
Why is this a turning point for frontier AI regulation?
The consolidation of all five leading US frontier labs (Anthropic, OpenAI, Google DeepMind, Microsoft, xAI) under a single government evaluation framework is a structural change. As recently as 18 months ago, government evaluations of AI models were ad hoc and based on voluntary disclosure. CAISI is now becoming the de facto national laboratory for frontier AI assessment.
Practical industry implications: labs must now maintain classified testing pipelines, document capability claims in a manner verifiable through CAISI methodology, and expect pre-release government review for significant capability jumps. For the EU AI Office and the UK AI Safety Institute, this is a reference model: a formal pre-deployment testing obligation with a removed-safeguards testing mechanism that EU AI Act Article 51 (general-purpose AI models with systemic risk) has not yet operationalized at this level of detail.
Frequently Asked Questions
- What is CAISI and which companies does it now cover?
- CAISI (Center for AI Standards and Innovation) is NIST's center that, following the new agreements signed on May 5, 2026, now has evaluation arrangements with all five leading US frontier AI labs: Anthropic, OpenAI, Google DeepMind, Microsoft and xAI.
- How many evaluations has CAISI conducted so far?
- CAISI has conducted more than 40 evaluations of frontier models, including unreleased state-of-the-art models submitted by labs with reduced or removed safeguards. Testing is performed in classified environments through the TRAINS Taskforce.
- What is the difference between pre-deployment and post-deployment testing?
- Pre-deployment testing is conducted before a model's public release to evaluate national security implications, while post-deployment research analyzes models already in the market. Both approaches are covered by the new CAISI agreements.
Related news
LangChain and LangSmith target EU AI Act: compliance tools mapped to Articles 9, 10, 12-15, and 72 ahead of the August 2, 2026 deadline
OpenAI receives FedRAMP Moderate authorization: ChatGPT Enterprise and API open for secure adoption by US federal agencies
arXiv:2604.21571 'Separable Expert': architecture for LLM personalization enabling GDPR right to erasure without retraining