What regulatory gap does the paper address?

The EU AI Act, the NIST AI Risk Management Framework, and the Council of Europe Framework Convention on Artificial Intelligence all require operators of high-risk AI systems to demonstrate safety before deployment, but none specifies what 'acceptable risk' means in quantitative terms or provides a technical method for verifying that a deployed system actually meets such a threshold. The authors identify this gap as a key obstacle to applying the regulations in practice.

What are the RoMA and gRoMA tools?

RoMA and gRoMA are statistical verification tools used in the second phase of the framework. They calculate an auditable upper bound on the actual failure rate of a system without requiring access to its internal model structure, enabling certification even for closed commercial models to which the auditor has no architectural access.

What do the two phases of certification look like?

In the first phase, the competent authority formally establishes an acceptable failure probability denoted δ and an operational input domain denoted ε. In the second phase, RoMA and gRoMA calculate an upper bound on the actual failure rate, which can be compared against the threshold δ. The approach is inspired by aviation safety protocols.

arXiv 'Bounding the Black Box': Statistical Certification for the EU AI Act

Natan Levy and Gadi Perl published a paper on arXiv on April 23, 2026 that addresses a regulatory gap in the EU AI Act, the NIST AI Risk Management Framework, and the Council of Europe Convention. They propose a two-step statistical framework built on the RoMA and gRoMA tools, which calculate an auditable upper bound on a system's failure rate without access to the internal structure of the model.

Researchers Natan Levy and Gadi Perl published a paper on arXiv on April 23, 2026 titled “Bounding the Black Box” (arXiv:2604.21854). It tackles a problem that has troubled both regulators and industry for two years: how to prove that a high-risk AI system is sufficiently safe when no law specifies what “sufficiently safe” means in numbers.

The paper is 11 pages long and arrives at a moment when the EU AI Act is entering operational application, and organizations across the continent must begin conducting conformity assessments for their AI systems without a clear methodological foundation.

What Exactly Is the Regulatory Gap?

The authors frame the problem precisely. Three key regulatory instruments all require that operators of high-risk systems demonstrate safety before deployment: the EU AI Act, the NIST AI Risk Management Framework (AI RMF), and the Council of Europe Framework Convention on Artificial Intelligence and Human Rights, Democracy and the Rule of Law. However, in the authors' words, “none specifies what ‘acceptable risk’ means in quantitative terms, and none provides a technical method for verifying that a deployed system actually meets such a threshold.”

In other words, the regulator demands proof, but specifies neither what needs to be proved nor how to prove it. This creates legal uncertainty for regulated entities and opens the door to “compliance theater” — paper-based risk assessments without any real measure of quality.

What Does the Proposed Two-Step Framework Look Like?

Levy and Perl propose a framework inspired by aviation safety protocols, in which safety is demonstrated not by assumption but by showing that the measured failure rate stays below a predefined threshold.
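What "showing that the failure rate stays below a threshold" demands in practice can be illustrated with a standard zero-failure binomial argument. This is a generic sketch, not necessarily the statistics RoMA or gRoMA use; it assumes n test inputs drawn independently from the operational domain, each failing with the same unknown probability p, and it introduces α (one minus the confidence level) as an assumed parameter that the article does not name.

```latex
% n independent trials, none of which fails; p = true (unknown) failure probability
\Pr(\text{no failures in } n \text{ trials} \mid p) = (1-p)^{n}

% one-sided upper confidence bound at level 1-\alpha: solve (1-p)^{n} = \alpha
p_{\text{upper}} = 1 - \alpha^{1/n}

% certification requires p_{\text{upper}} < \delta, i.e.
n > \frac{\ln \alpha}{\ln(1-\delta)} \approx \frac{\ln(1/\alpha)}{\delta}

% example: \delta = 0.01, \alpha = 0.05
n > \frac{\ln 0.05}{\ln 0.99} \approx 298.1 \quad\Rightarrow\quad n \ge 299
```

With α = 0.05, ln(1/α) ≈ 3.0, so the approximation reduces to the familiar "rule of three" (3/δ = 300 samples for δ = 0.01). Any observed failure pushes the required sample size higher.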

Phase one — political. The competent authority (in the EU context, this would be a national regulatory body or the European AI Office) formally establishes two values: an acceptable failure probability denoted δ (delta) and an operational input domain denoted ε (epsilon). This step is a political and legal decision, not a technical one — whoever has the authority to define “acceptable” sets the threshold.
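As a rough illustration, the two phase-one values could be recorded as an immutable specification that the phase-two audit then consumes. The sketch below is hypothetical: `CertificationSpec` and `in_operational_domain` are illustrative names, and modelling ε as the radius of an L-infinity box around a nominal input is an assumption, since the article only describes ε as an operational input domain.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CertificationSpec:
    """Hypothetical phase-one record: thresholds fixed by the competent authority."""

    delta: float    # acceptable failure probability
    epsilon: float  # operational input domain, assumed here to be an L-infinity radius

    def __post_init__(self) -> None:
        # Reject values that cannot describe a probability or a non-empty domain.
        if not 0.0 < self.delta < 1.0:
            raise ValueError("delta must lie strictly between 0 and 1")
        if self.epsilon <= 0.0:
            raise ValueError("epsilon must be positive")


def in_operational_domain(x: Sequence[float], nominal: Sequence[float],
                          spec: CertificationSpec) -> bool:
    """Return True if x lies within the L-infinity box of radius epsilon around nominal."""
    if len(x) != len(nominal):
        raise ValueError("x and nominal must have the same length")
    return max((abs(a - b) for a, b in zip(x, nominal)), default=0.0) <= spec.epsilon
```

For example, `CertificationSpec(delta=0.01, epsilon=0.1)` fixes a 1% failure budget over inputs perturbed by at most 0.1 per feature. Freezing the dataclass mirrors the division of roles: the auditor reads the thresholds but cannot adjust them after the fact.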

Phase two — technical. The statistical tools RoMA and gRoMA calculate an auditable upper bound on the actual failure rate of the system over the given domain ε. If the upper bound falls below δ, the system passes certification. If it does not, it fails.
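The phase-two decision rule amounts to comparing an upper confidence bound with δ. The sketch below uses an exact one-sided binomial (Clopper-Pearson) bound computed with the Python standard library. This is a generic, distribution-free stand-in chosen for illustration, not the RoMA or gRoMA estimator; the function names and the 95% default confidence level are assumptions.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        log_term = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def failure_rate_upper_bound(failures: int, trials: int,
                             confidence: float = 0.95, tol: float = 1e-12) -> float:
    """Exact one-sided (Clopper-Pearson) upper confidence bound on the failure rate."""
    if trials <= 0 or not 0 <= failures <= trials:
        raise ValueError("need trials > 0 and 0 <= failures <= trials")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    if failures == trials:
        return 1.0
    alpha = 1.0 - confidence
    # binom_cdf(failures, trials, p) falls monotonically from 1 to 0 as p goes from 0 to 1;
    # bisect for the p at which it equals alpha, keeping hi on the conservative side.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binom_cdf(failures, trials, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


def certify(failures: int, trials: int, delta: float,
            confidence: float = 0.95) -> tuple[float, bool]:
    """Return (upper bound, passed): the system passes only if the bound is below delta."""
    bound = failure_rate_upper_bound(failures, trials, confidence)
    return bound, bound < delta
```

For instance, `certify(0, 299, delta=0.01)` gives a bound of about 0.00997 and passes, while 298 failure-free trials give a bound just above 0.01 and fail. Returning the bound alongside the verdict keeps the result auditable: a third party can recompute it from the same counts.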

Why Is the RoMA Approach Particularly Important for Closed Models?

According to the abstract, the key technical characteristic of the RoMA and gRoMA tools is that they work without access to the internal model structure. The auditor does not need weights, gradients, or architectural details: the tools need only the system's inputs and outputs, from which they compute the statistical failure bound.

This is critical for the European market because many of the high-risk systems that fall under the EU AI Act are likely to be built on closed commercial models from providers such as OpenAI, Anthropic, Google, and Mistral. Any certification method that requires access to model weights is impractical for such systems. RoMA enables a third party to conduct meaningful verification even on a black-box system.
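The black-box property means an audit needs nothing beyond the ability to query the system. The sketch below shows only the data-collection side, under stated assumptions: `count_failures` is a hypothetical helper, inputs are sampled uniformly from an L-infinity box of radius ε around a nominal input, and a "failure" is taken to mean that the model's output differs from its output on the nominal input. RoMA and gRoMA apply their own statistical analysis to the model's responses, which this sketch does not reproduce.

```python
import random
from typing import Callable, Sequence


def count_failures(model: Callable[[list[float]], object],
                   nominal: Sequence[float], epsilon: float,
                   trials: int, seed: int = 0) -> tuple[int, int]:
    """Query a black-box model on random inputs near `nominal` and count output changes.

    Only model(x) is called: no weights, gradients, or architecture are used.
    """
    if epsilon <= 0.0 or trials <= 0:
        raise ValueError("epsilon and trials must be positive")
    rng = random.Random(seed)  # fixed seed so an auditor can replay the exact queries
    reference = model(list(nominal))
    failures = 0
    for _ in range(trials):
        # Draw each coordinate independently within [-epsilon, +epsilon] of the nominal input.
        x = [v + rng.uniform(-epsilon, epsilon) for v in nominal]
        if model(x) != reference:
            failures += 1
    return failures, trials
```

With a toy classifier such as `lambda x: int(sum(x) > 0)`, calling `count_failures(toy, [1.0, 1.0], 0.1, 500)` returns `(0, 500)`, because every sampled input sums to between 1.8 and 2.2. The resulting counts are exactly what a statistical upper bound on the failure rate would consume.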

What Does This Mean for Regulated Entities and Regulators?

For organizations developing or integrating high-risk AI systems (in healthcare, finance, HR processes, or critical infrastructure), the paper offers a concrete technical template for their own compliance assessments until regulators publish guidelines of their own. The approach also strengthens their negotiating position with vendors: they can ask model providers to supply statistical evidence computed in the RoMA style rather than generic “model card” statements.

For regulatory bodies, the paper provides a methodological starting point that is publicly available as an arXiv preprint and technically specific enough to be incorporated into secondary legislation. The abstract does not cite concrete p-value thresholds or case studies, so the full text should be read before any implementation, but the direction is clear: quantitative certification of AI safety is no longer a theoretical question but an operational one.

