U.S. Government Expands Pre-Release Safety Testing for Advanced AI Models
The U.S. government is taking a more assertive role in ensuring the safety of cutting-edge artificial intelligence systems before they reach the public. A newly announced series of agreements between the Center for AI Standards and Innovation (CAISI) — a division of the National Institute of Standards and Technology (NIST) — and several major AI developers marks a significant step toward systematic, pre-deployment evaluation of frontier models.
New Agreements Expand Testing Framework
CAISI has signed formal agreements with Google DeepMind, Microsoft, and xAI that give the agency access to test and evaluate these companies' most advanced AI models before they are publicly released. The initiative builds on earlier collaborations with Anthropic and OpenAI, which signed similar agreements nearly two years ago, when the agency was still known as the U.S. Artificial Intelligence Safety Institute.

According to a CAISI press release, the agency will "conduct pre-deployment evaluations and targeted research to better assess frontier AI capabilities and advance the state of AI security." The work will include collaboration with the UK AI Safety Institute (AISI), which brings additional international expertise to the evaluation process.
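CAISI has not published the technical details of its evaluation pipeline, but a minimal sketch can make the idea of a pre-deployment gate concrete. Everything below is a hypothetical illustration: the probe names, thresholds, grading shortcut, and model interface are assumptions for the sketch, not CAISI's actual methodology.

```python
# Hypothetical sketch of a pre-deployment capability-evaluation gate.
# Nothing here reflects CAISI's real tooling; probe names, thresholds,
# and the model interface are illustrative assumptions.

from dataclasses import dataclass
from typing import Callable

@dataclass
class ProbeResult:
    name: str
    score: float      # fraction of probe tasks the model completed (0.0 to 1.0)
    threshold: float  # maximum acceptable score before the gate fails

def run_evaluation(model: Callable[[str], str],
                   probes: dict[str, tuple[list[str], float]]) -> list[ProbeResult]:
    """Run each capability probe against the model and record completion rates."""
    results = []
    for name, (prompts, threshold) in probes.items():
        # A real harness would grade full transcripts against expert rubrics;
        # counting non-refusals is a deliberate simplification for this sketch.
        completed = sum(1 for p in prompts if not model(p).startswith("REFUSE"))
        results.append(ProbeResult(name, completed / len(prompts), threshold))
    return results

def release_gate(results: list[ProbeResult]) -> bool:
    """Block release if any dangerous-capability score exceeds its threshold."""
    failures = [r for r in results if r.score > r.threshold]
    for r in failures:
        print(f"GATE FAILED: {r.name} scored {r.score:.2f} (limit {r.threshold:.2f})")
    return not failures

if __name__ == "__main__":
    stub_model = lambda prompt: "REFUSE"  # toy model that refuses everything
    probes = {
        "network-vulnerability-discovery": (["probe-1", "probe-2"], 0.10),
        "autonomous-replication": (["probe-3"], 0.05),
    }
    cleared = release_gate(run_evaluation(stub_model, probes))
    print("Cleared for release:", cleared)
```

The point of the sketch is the gate structure, not the grading: a fixed battery of dangerous-capability probes, per-probe limits, and a binary pass/fail decision that precedes any public release.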
Microsoft, in a blog post announcing its agreement, emphasized that such testing is essential for building trust in advanced AI systems. "As AI capabilities advance, so too must the rigor of the testing and safeguards that underpin them," the company stated.
A Shift Toward Proactive Security
Industry observers see these agreements as a pivotal move toward proactive security for agentic AI — autonomous systems that can act independently. Fritz Jean-Louis, principal cybersecurity advisor at Info-Tech Research Group, noted that the initiative enables government-led testing both before and after deployment.
"This should help strengthen visibility into autonomous behaviors while accelerating the development of standards to mitigate risks," Jean-Louis explained. "By combining early access, continuous evaluation, and cross-sector collaboration, the initiative pushes the industry toward security-by-design for increasingly autonomous AI systems."
However, he acknowledged potential hurdles, particularly around intellectual property. "How would IP be protected under this approach?" he asked. "Regardless, I believe this is a positive step for the industry."
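The continuous evaluation Jean-Louis describes implies that testing does not end at release. A purely illustrative sketch of that idea, reusing the hypothetical harness above and assuming invented interval and drift-tolerance values, might look like this:

```python
# Hypothetical sketch of post-deployment monitoring: periodically re-run
# the same capability probes and alert if scores drift past the baseline
# recorded at release. Interval and tolerance are invented for illustration.

import time

def monitor(model, probes, baseline, interval_s=86_400, tolerance=0.05):
    """Re-run the probes on a schedule and flag drift above the release baseline.

    Reuses run_evaluation() from the pre-deployment sketch above.
    """
    while True:
        for result in run_evaluation(model, probes):
            drift = result.score - baseline[result.name]
            if drift > tolerance:
                print(f"ALERT: {result.name} drifted +{drift:.2f} above baseline")
        time.sleep(interval_s)  # e.g., re-evaluate once a day
```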
Potential Executive Order on the Horizon
The CAISI announcement comes amid reports that the White House is preparing an executive order that would establish a formal vetting system for all new AI models — with particular attention to Anthropic's Mythos model. According to Bloomberg, "the directive is taking shape weeks after Anthropic revealed that its breakthrough Mythos model was adept at finding network vulnerabilities and could pose a global cybersecurity risk."

This would represent a significant escalation in government oversight, moving from voluntary agreements to mandatory pre-release review for the most powerful systems.
Industry Reactions and Implications
Carmi Levy, an independent technology analyst, described the week's announcements as a clear policy shift. He noted that positioning CAISI as the definitive testing ground for frontier models is a direct response to growing concerns about autonomous, unregulated AI capabilities.
Levy added that while the industry has largely self-regulated its safety practices, the emergence of models with sophisticated cybersecurity attack skills — as seen with Mythos — demonstrates the need for independent, government-backed evaluation.
What This Means Moving Forward
- More rigorous testing: AI models will undergo deeper scrutiny before release, potentially catching dangerous capabilities earlier.
- International coordination: Collaboration with the UK AISI points toward a global framework for AI safety.
- Industry compliance: Major developers have now formally committed to submitting their frontier models for pre-deployment review.
The combination of voluntary agreements and prospective executive orders signals that AI safety is transitioning from an afterthought to a foundational requirement. As the technology continues to evolve at breakneck speed, these testing mechanisms may prove crucial in preventing unintended consequences — from cybersecurity breaches to more subtle forms of harm.
For now, CAISI is positioning itself as the central hub for this new era of AI accountability. Whether the industry and regulators can keep pace with the technology remains an open question, but the direction is clear: before the most powerful AI models meet the public, they will first meet the government's scrutiny.