PyTorch: ExecuTorch comes to Arm Cortex-A, Cortex-M and Ethos-U85 NPU for edge AI inference
ExecuTorch on Arm is a new PyTorch Foundation initiative, announced on May 12, 2026, that extends the ExecuTorch runtime to Arm Cortex-A and Cortex-M CPUs and Ethos-U NPU accelerators. The OPT-125M transformer runs on Raspberry Pi 5 and MobileNetV2 on an Ethos-U85 with 256 MAC units, and the Arm Education repository provides hands-on labs for edge AI deployment.
This article was generated using artificial intelligence from primary sources.
The PyTorch Foundation officially extended the ExecuTorch runtime to Arm Cortex-A, Cortex-M, and Ethos-U NPU platforms on May 12, 2026. The initiative brings efficient PyTorch inference to resource-constrained edge devices, such as wearables, smart cameras, and IoT systems, while preserving the PyTorch developer experience.
Which Arm hardware does ExecuTorch now support?
The CPU side covers Cortex-A boards such as Raspberry Pi 5 as well as Cortex-M microcontrollers. On Cortex-A, ExecuTorch uses KleidiAI microkernels and Neon architectural features through the XNNPACK backend, which significantly reduces latency compared with PyTorch eager mode. The PyTorch documentation warns that without XNNPACK delegation the runtime may have higher latency than eager mode, even though it keeps a smaller footprint.
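Delegation is decided ahead of time: during export, a partitioner for the chosen backend (ExecuTorch ships one for XNNPACK) marks the parts of the graph the backend can execute, and everything else runs on ExecuTorch's own CPU kernels. The sketch below shows that idea in a deliberately simplified form, assuming a linear chain of operators and a hypothetical set of supported operator names. It is not the XNNPACK partitioner's actual logic, which works on the exported graph and its data dependencies.

```python
from dataclasses import dataclass

# Hypothetical operator names the backend accepts (assumption for illustration).
SUPPORTED_BY_BACKEND = {"conv2d", "relu", "add", "linear", "avg_pool2d"}


@dataclass
class Partition:
    target: str      # "delegate" (backend) or "cpu" (ExecuTorch CPU kernels)
    ops: list[str]


def partition_ops(ops: list[str], supported: set[str]) -> list[Partition]:
    """Group consecutive operators by whether the backend can run them."""
    partitions: list[Partition] = []
    for op in ops:
        target = "delegate" if op in supported else "cpu"
        if partitions and partitions[-1].target == target:
            partitions[-1].ops.append(op)  # extend the current run
        else:
            partitions.append(Partition(target, [op]))  # start a new run
    return partitions


if __name__ == "__main__":
    graph = ["conv2d", "relu", "local_response_norm", "conv2d", "add", "relu"]
    for p in partition_ops(graph, SUPPORTED_BY_BACKEND):
        print(p.target, p.ops)
    # delegate ['conv2d', 'relu']
    # cpu ['local_response_norm']
    # delegate ['conv2d', 'add', 'relu']
```

Grouping consecutive supported operators matters because every hand-off between a delegate and the CPU kernels has a cost, so fewer, larger partitions are generally preferable.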
The NPU side targets Ethos-U accelerators, which are typically paired with a Cortex-A or Cortex-M host CPU. An example hardware configuration, which ExecuTorch treats as a primary deployment target, is the Ethos-U85 with 256 MAC (multiply-accumulate) units.
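The MAC count gives a quick upper bound on NPU throughput: each MAC unit performs one multiply and one add per cycle, so peak operations per second are 2 × MACs × clock frequency. The sketch below applies that arithmetic to the 256-MAC configuration. The 1 GHz clock is an assumption for illustration (the real clock depends on the SoC), and 300 million MACs is roughly the commonly cited compute cost of MobileNetV2 at 224×224 input. Real latency will be higher, because utilization is never perfect and memory transfers add time.

```python
# Back-of-the-envelope NPU throughput estimate (illustrative, not a benchmark).

ASSUMED_CLOCK_HZ = 1e9  # assumption: 1 GHz NPU clock; the real value depends on the SoC


def peak_ops_per_second(mac_units: int, clock_hz: float) -> float:
    """Theoretical peak: each MAC unit does one multiply and one add per cycle."""
    return mac_units * 2 * clock_hz


def ideal_latency_s(model_macs: float, mac_units: int, clock_hz: float,
                    utilization: float = 1.0) -> float:
    """Lower-bound compute time if `utilization` of the MAC array is busy every cycle."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return model_macs / (mac_units * clock_hz * utilization)


if __name__ == "__main__":
    peak = peak_ops_per_second(256, ASSUMED_CLOCK_HZ)
    print(f"peak: {peak / 1e9:.0f} GOPS")  # peak: 512 GOPS
    t = ideal_latency_s(300e6, 256, ASSUMED_CLOCK_HZ)
    print(f"ideal MobileNetV2-sized inference: {t * 1e3:.2f} ms")  # 1.17 ms
```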
Which models does the tutorial cover?
The documentation demonstrates three scenarios. The OPT-125M transformer model runs on Raspberry Pi 5 via the XNNPACK CPU backend. An INT8-quantized MobileNetV2 is deployed to an Ethos-U NPU using the EthosUQuantizer and EthosUCompileSpec APIs. A MobileNetV2 variant with an added LRN (local response normalization) layer shows how ExecuTorch handles operators that the target backend does not support.
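INT8 deployment relies on mapping floating-point values to 8-bit integers through a scale and a zero point, q = clamp(round(x / scale) + zero_point, -128, 127), and mapping back with x ≈ (q - zero_point) × scale. The sketch below implements this generic affine scheme in plain Python to show where the precision loss comes from. It is not the EthosUQuantizer itself, whose actual configuration (for example, how weights versus activations are quantized and how ranges are calibrated) may differ.

```python
# Generic affine (asymmetric) int8 quantization; illustrative only.

QMIN, QMAX = -128, 127


def choose_qparams(x_min: float, x_max: float) -> tuple[float, int]:
    """Pick scale and zero point so [x_min, x_max] maps onto [QMIN, QMAX].

    The range is widened to include 0.0 so that zero is exactly representable.
    """
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (QMAX - QMIN) or 1.0  # guard against an all-zero range
    zero_point = round(QMIN - x_min / scale)
    return scale, max(QMIN, min(QMAX, zero_point))


def quantize(values: list[float], scale: float, zero_point: int) -> list[int]:
    # Python's round() uses round-half-to-even; hardware rounding modes may differ.
    return [max(QMIN, min(QMAX, round(v / scale) + zero_point)) for v in values]


def dequantize(q: list[int], scale: float, zero_point: int) -> list[float]:
    return [(v - zero_point) * scale for v in q]


if __name__ == "__main__":
    x = [-1.0, 0.0, 0.5, 2.0]
    scale, zp = choose_qparams(min(x), max(x))
    q = quantize(x, scale, zp)
    print(scale, zp, q)
    print(dequantize(q, scale, zp))  # each value lies within scale / 2 of the original
```

For values inside the calibrated range, the round-trip error is at most half a quantization step; values outside it are clamped, which is why calibration data matters.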
What do developers get in the Arm Education repository?
The arm-education/executorch_on_arm_labs GitHub repository contains hands-on labs for CPU inference on Raspberry Pi 5, hardware-accelerated inference with an Ethos-U NPU, and Model Explorer Adapters (developed by Arm) for visualizing .pte files, backend partitioning, TOSA representations, and VGF files for the Arm ML SDK for Vulkan. The lab content was prepared by Marcelo Rovai (UNIFEI University, Edge AI Foundation) and reviewed by academics from IIIT Bangalore.
Position in the edge AI ecosystem
For Ethos-U targets, the deployment pipeline takes a PyTorch model through quantization, a TOSA (Tensor Operator Set Architecture) representation, and Vela optimization to produce a .pte artifact. Because the computation graph is static, execution is predictable on constrained systems. ExecuTorch thus positions the PyTorch ecosystem as a serious contender for edge inference, a space previously dominated by TensorFlow Lite Micro and ONNX Runtime.
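One concrete benefit of a static graph is that tensor sizes and lifetimes are known before the program runs, so memory can be planned ahead of time instead of allocated dynamically; ExecuTorch performs such memory planning during export. The sketch below illustrates the idea with a simple greedy first-fit planner over a hypothetical four-tensor chain, assuming sizes in bytes and lifetimes given as inclusive operator indices. It is a simplified illustration, not ExecuTorch's planner.

```python
from dataclasses import dataclass


@dataclass
class TensorSpec:
    name: str
    size: int       # bytes
    first_use: int  # index of the operator that produces the tensor
    last_use: int   # index of the last operator that reads it (inclusive)


def lifetimes_overlap(a: TensorSpec, b: TensorSpec) -> bool:
    return not (a.last_use < b.first_use or b.last_use < a.first_use)


def plan_memory(tensors: list[TensorSpec]) -> tuple[dict[str, int], int]:
    """Greedy first-fit: place larger tensors first, each at the lowest offset
    that does not collide with an already placed tensor alive at the same time."""
    placed: list[tuple[TensorSpec, int]] = []
    for t in sorted(tensors, key=lambda s: s.size, reverse=True):
        # Byte ranges already taken by tensors whose lifetimes overlap t's.
        busy = sorted((off, off + p.size) for p, off in placed if lifetimes_overlap(p, t))
        offset = 0
        for start, end in busy:
            if offset + t.size <= start:
                break  # t fits in the gap before this range
            offset = max(offset, end)
        placed.append((t, offset))
    offsets = {t.name: off for t, off in placed}
    arena_size = max((off + t.size for t, off in placed), default=0)
    return offsets, arena_size


if __name__ == "__main__":
    chain = [
        TensorSpec("input", 100, 0, 1),
        TensorSpec("act1", 200, 1, 2),
        TensorSpec("act2", 200, 2, 3),
        TensorSpec("output", 100, 3, 3),
    ]
    offsets, arena = plan_memory(chain)
    print(offsets, arena)  # arena is 400 bytes versus 600 if every tensor had its own buffer
```

Because every offset is fixed before execution, the runtime needs a single pre-sized buffer and no allocator calls during inference, which is what makes memory use predictable.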
Frequently Asked Questions
- Which models can be run via ExecuTorch on Arm hardware?
- The documentation demonstrates the OPT-125M transformer model on Raspberry Pi 5 (Cortex-A) and a quantized (INT8) MobileNetV2 on an Ethos-U NPU; in general, a model must be a PyTorch model that can be exported to ExecuTorch's lightweight .pte format.
- What does XNNPACK backend delegation provide?
- The XNNPACK backend uses KleidiAI microkernels and Neon architectural features and delivers significant latency reduction on a Cortex-A CPU; without XNNPACK delegation, ExecuTorch may have higher latency than PyTorch eager mode even though it retains a smaller runtime footprint.