What are the attention operators in ONNX v1.22.0?

They are native operators that directly describe attention mechanisms in transformer architectures and LLMs, enabling hardware runtimes to implement specialized optimizations for long sequences and streaming applications.

How does ONNX v1.22.0 improve supply-chain security?

Every release now includes SLSA Level 2 cryptographic attestations about code provenance and an embedded Software Bill of Materials listing all dependencies, versions, and licenses.

What does WebAssembly support in ONNX enable?

ONNX models can now be inspected and validated directly in the browser without a local installation, thanks to integration with Pyodide.

ONNX v1.22.0: Attention Operators for Generative AI

The LF AI & Data Foundation has released ONNX v1.22.0 with native attention operators for transformer architectures and LLMs, WebAssembly support for in-browser model inspection, and SLSA Level 2 cryptographic attestations. Twenty-seven contributors participated, 16 of them for the first time.

On June 30, 2026, the LF AI & Data Foundation released ONNX v1.22.0 — a new version of the open standard for exchanging AI models between frameworks and hardware runtimes. This release brings three key additions: native support for attention operators, WebAssembly integration for in-browser model inspection, and enhanced supply-chain security.

Native Attention Operators for Modern LLMs

The most important technical change in ONNX v1.22.0 is the introduction of native attention operators: primitive operators that directly describe the attention mechanisms used in transformer architectures and LLMs. Until now, attention layers in ONNX models were expressed as compositions of lower-level operators (matrix multiplication, softmax, and reshape operations), which made it harder for hardware runtimes to apply specialized optimizations.
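To make "compositions of lower-level operators" concrete, here is a minimal pure-Python sketch of scaled dot-product attention written as the generic steps a decomposed graph would contain: a matrix multiplication, a scaling step, a softmax, and a second matrix multiplication. It assumes the standard scaled dot-product formulation; the function names are illustrative and are not ONNX API.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    """Apply a numerically stable softmax to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(value - peak) for value in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def decomposed_attention(q, k, v):
    """Compute softmax(Q K^T / sqrt(d)) V as separate generic steps."""
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]            # transpose K
    scores = matmul(q, k_t)                          # generic MatMul
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = softmax_rows(scaled)                   # generic Softmax
    return matmul(weights, v)                        # generic MatMul

# With all-zero queries every key scores the same, so each output row
# is the plain average of the rows of V.
q = [[0.0, 0.0], [0.0, 0.0]]
k = [[1.0, 2.0], [3.0, 4.0]]
v = [[1.0, 0.0], [3.0, 4.0]]
out = decomposed_attention(q, k, v)
```

A runtime that sees only these separate steps has no direct signal that together they form one attention layer, which is the information a single native operator carries.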

With the new operators, hardware vendors can implement kernels tailored to attention mechanisms, directly improving throughput for long sequences and streaming applications. This is especially relevant for LLMs with context windows of hundreds of thousands of tokens, where attention computation dominates total inference cost.
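A back-of-the-envelope calculation shows why long contexts make attention so expensive. The sketch below counts only the two matrix products of one attention head under full (dense) attention, and the head dimension of 128 is an assumed value for illustration, not a figure from the release.

```python
def attention_matmul_flops(seq_len, head_dim):
    """Approximate FLOPs for one dense attention head.

    Q.K^T and weights.V each cost about 2 * seq_len^2 * head_dim
    floating-point operations (one multiply and one add per term).
    """
    return 4 * seq_len * seq_len * head_dim

# Assumed head dimension, chosen only to make the growth visible.
HEAD_DIM = 128

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens: {attention_matmul_flops(n, HEAD_DIM):.3e} FLOPs per head")
```

The cost grows with the square of the sequence length, so a tenfold longer context costs roughly a hundred times more in these two products.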

Why Does Interoperability Matter for Generative AI?

ONNX exists so that a model trained in one framework can run in another — without retraining or manual conversion. A PyTorch model becomes an ONNX artifact, and runtimes can then execute it on CPU (Intel OpenVINO), GPU (NVIDIA TensorRT), mobile chips (Qualcomm QNN), or specialized accelerators — all without changing the model.

Up to this version, attention layers had no native representation in ONNX. That was a structural gap between how modern LLMs are described internally and how the ONNX schema could express them. Without native attention operators, a runtime could not reliably recognize the pattern and exploit a specialized hardware path; it often had to process attention as a series of generic operations.
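A toy dispatcher illustrates why recognizing attention from generic operations is fragile. Everything here is hypothetical: the graph is a flat list of operator type names, the kernel names are invented, and real runtimes use far more elaborate graph matching. The sketch only shows that a single Attention node is unambiguous, while a decomposed sequence is matched only if it has exactly the expected shape.

```python
# Hypothetical decomposed form this toy runtime knows how to fuse.
FUSED_PATTERN = ("MatMul", "Mul", "Softmax", "MatMul")

def plan_kernels(op_types):
    """Map a linear list of operator types to (made-up) kernel names."""
    plan = []
    i = 0
    while i < len(op_types):
        window = tuple(op_types[i:i + len(FUSED_PATTERN)])
        if op_types[i] == "Attention":
            # Native operator: the intent is explicit in the graph.
            plan.append("fused_attention_kernel")
            i += 1
        elif window == FUSED_PATTERN:
            # Decomposed form that happens to match the known pattern.
            plan.append("fused_attention_kernel")
            i += len(FUSED_PATTERN)
        else:
            # Anything else falls back to one generic kernel per operator.
            plan.append(f"generic_{op_types[i].lower()}")
            i += 1
    return plan

native = plan_kernels(["Attention"])
matched = plan_kernels(["MatMul", "Mul", "Softmax", "MatMul"])
# Same math, but scaled with Div instead of Mul: the pattern is missed.
missed = plan_kernels(["MatMul", "Div", "Softmax", "MatMul"])
```

In this sketch the Div variant ends up on four generic kernels even though it computes the same thing, whereas the native node always reaches the fused path.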

ONNX v1.22.0 closes that gap. Modern transformer architectures are now first-class citizens of the ONNX ecosystem, meaning frameworks such as PyTorch and TensorFlow can express LLMs in ONNX format without losing information about key computational patterns.

WebAssembly and Supply-Chain Security

Version 1.22.0 introduces WebAssembly support via Pyodide integration. ONNX models can now be inspected and validated directly in the browser, without a local Python or ONNX library installation. Tools for model graph inspection, shape-inference verification, and operator compatibility checking are available to anyone with a URL — no development environment setup required.

On the security side, every ONNX release from this version onward carries SLSA Level 2 cryptographic attestations about code provenance — a reproducible and verifiable record of where and how the artifact was built. In addition, each package now includes an embedded SBOM (Software Bill of Materials) listing all dependencies, versions, and licenses. This is a direct response to growing regulatory and business requirements for supply-chain transparency in open-source AI tooling.
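One thing a provenance attestation makes possible is checking that a downloaded artifact matches the digest recorded at build time. The sketch below shows only that comparison, using Python's standard hashlib. The statement layout follows the in-toto subject and digest convention that SLSA provenance builds on, but the file name and contents are made-up placeholders, and verifying the attestation's signature, which is what actually establishes trust, is deliberately left out.

```python
import hashlib

def matches_provenance(artifact_bytes, statement):
    """Return True if the artifact's SHA-256 appears among the statement's subjects."""
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    return any(
        subject.get("digest", {}).get("sha256") == digest
        for subject in statement.get("subject", [])
    )

# Placeholder artifact and a hand-built statement, for illustration only.
artifact = b"placeholder package contents"
statement = {
    "subject": [
        {
            "name": "example-package.whl",  # hypothetical file name
            "digest": {"sha256": hashlib.sha256(artifact).hexdigest()},
        }
    ]
}

ok = matches_provenance(artifact, statement)
tampered = matches_provenance(artifact + b"!", statement)
print(ok, tampered)
```

A single changed byte produces a different digest, so the tampered copy no longer matches the recorded subject.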

The modernized build system ensures reproducible builds across all three platforms: Linux, macOS, and Windows. For teams that automate CI/CD pipelines with ONNX conversions, build reproducibility means predictable results without dependency on build environment state.

Community, Bug Fixes, and Roadmap

ONNX v1.22.0 brought together 27 contributors, 16 of whom were contributing for the first time. The release improves the shape-inference helper functions and the version converter that frameworks use to convert models between ONNX opset versions. Alongside the new attention operators, several key operators received correctness fixes across a wider range of inputs, reducing discrepancies between the specification and actual behavior on edge cases.

The roadmap for future versions lists planned support for probabilistic and Bayesian inference, extended quantization, and further improvements to shape inference.

ONNX v1.22.0 is available from the GitHub repository at github.com/onnx/onnx and through standard package managers.
