🔴 📦 Open Source · Wednesday, May 6, 2026 · 2 min read

Allen Institute: MolmoAct 2 is the first open-source robotics foundation model to outperform GPT-5 and Gemini 2.5 Pro

Editorial illustration: dual-arm Franka robot with an open box in a laboratory, symbolizing the open-source MolmoAct 2 foundation model

MolmoAct 2 is an open-source robotics foundation model released on May 5 by the Allen Institute for AI. The model scores 63.8/100 on embodied-reasoning benchmarks, outperforms GPT-5 and Gemini 2.5 Pro, accelerates inference 37× (from 6.7 seconds to 180 milliseconds per action), and is the first base model with built-in bimanual capabilities.

🤖

This article was generated using artificial intelligence from primary sources.

Allen Institute for AI (AI2) released MolmoAct 2 on May 5, 2026 — the first open-source robotics foundation model to outperform closed systems from labs such as Physical Intelligence, as well as the frontier models GPT-5 and Gemini 2.5 Pro, on embodied-reasoning benchmarks.

A robotics foundation model is a large base model trained on a combination of visual and action data, enabling a robot to execute diverse physical tasks from natural language without task-specific training.

What are the three key changes in MolmoAct 2?

The first change is raw performance: the model achieves 63.8/100 on embodied-reasoning benchmarks, placing it ahead of GPT-5 and Gemini 2.5 Pro. The second is a dramatic speedup — by optimizing the KV-cache bridge between the vision model and action expert, inference is accelerated 37×, from 6.7 seconds to 180 milliseconds per action. The third is built-in bimanuality — coordinated two-arm manipulation without per-task fine-tuning, making MolmoAct 2 the first base model of its kind.
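The reported latency numbers are easy to sanity-check. A quick back-of-the-envelope in Python, using only the figures quoted above, converts them into control rates and recovers the 37× speedup:

```python
# Back-of-the-envelope: what the reported latencies mean for control rate.
old_latency_s = 6.7    # seconds per action before the KV-cache optimization
new_latency_s = 0.180  # seconds per action after it

old_rate_hz = 1 / old_latency_s  # ≈ 0.15 actions per second
new_rate_hz = 1 / new_latency_s  # ≈ 5.6 actions per second
speedup = old_latency_s / new_latency_s

print(f"{old_rate_hz:.2f} Hz -> {new_rate_hz:.2f} Hz (speedup ≈ {speedup:.0f}×)")
```

At 180 ms per action the model can issue roughly five and a half actions per second, fast enough for closed-loop manipulation, whereas 6.7 s per action is far below any practical control rate.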

The model is built on the Molmo 2-ER base, trained on approximately 3 million additional embodied-reasoning examples.
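The release does not detail the KV-cache bridge itself, but the general technique can be sketched in a few lines: encode the visual observation once, cache the resulting key/value features, and reuse them across action-decoding steps instead of re-running the vision model each time. The following is a toy illustration of that idea, not MolmoAct 2's actual code; every name in it is made up.

```python
class ToyActionDecoder:
    """Toy illustration (not MolmoAct 2's code) of a KV-cache bridge:
    the vision encoder runs once per observation, and its key/value
    features are cached and reused by every action-decoding step."""

    def __init__(self):
        self.encoder_calls = 0
        self._kv_cache = None

    def _encode_observation(self, obs):
        # Stand-in for the expensive vision-model forward pass.
        self.encoder_calls += 1
        return [hash((obs, i)) % 1000 for i in range(4)]  # fake K/V features

    def decode_action(self, obs, step, use_cache=True):
        if use_cache:
            if self._kv_cache is None:
                self._kv_cache = self._encode_observation(obs)
            kv = self._kv_cache
        else:
            kv = self._encode_observation(obs)  # re-encodes on every step
        # Stand-in for the lightweight action-expert step.
        return sum(kv) + step

# Without the cache: one encoder pass per action step.
naive = ToyActionDecoder()
for step in range(8):
    naive.decode_action("frame_0", step, use_cache=False)

# With the cache: a single encoder pass shared by all eight steps.
cached = ToyActionDecoder()
for step in range(8):
    cached.decode_action("frame_0", step, use_cache=True)

print(naive.encoder_calls, cached.encoder_calls)  # 8 vs 1
```

In the toy version the saving is simply "one encoder pass instead of eight"; in a real vision-language-action model, where the vision backbone dominates the per-step cost, eliminating the repeated passes is what makes order-of-magnitude latency drops plausible.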

What do the benchmark results look like in practice?

On the LIBERO benchmark, a standard academic test for robot learning, MolmoAct 2 achieves a 97.2% success rate. On real-world tasks with a Franka robot arm, the success rate is 87.1%, while on the new MolmoBot household benchmark (a suite of household tasks) it reaches 20.6% — twice the score of the second-place model.

The gap between LIBERO and MolmoBot shows how challenging messy real-world household conditions remain: even a model that solves 97% of academic tasks succeeds in only one fifth of real household scenarios.

What does AI2 release alongside the model?

In addition to model weights, AI2 releases the YAM Dataset with over 720 hours of bimanual demonstrations — 30 times more than the original MolmoAct dataset — as well as complete training code and a reference hardware setup that other labs can replicate.

All artifacts — weights, dataset, code, and hardware specifications — are publicly available. This makes MolmoAct 2 the first serious open answer to closed robotics foundation models, giving researchers, universities, and smaller companies a foundation to build their own applications without licensing restrictions.

Frequently Asked Questions

What is a robotics foundation model?
A robotics foundation model is a large base model trained on visual and action data that enables robots to perform tasks from natural language instructions without fine-tuning for each new task.
What are bimanual capabilities in robotics?
Bimanual capabilities mean the robot coordinates two arms on a single task — for example, holding a container with one arm while pouring with the other. MolmoAct 2 is the first base model to do this without per-task training.
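As a concrete, and entirely hypothetical, illustration of the hold-and-pour example: at its simplest, two-arm coordination means emitting paired commands on every control tick, one for each arm, so neither arm ever acts out of sync. This sketch is not MolmoAct 2's planner; all names and poses are invented.

```python
def plan_bimanual(hold_pose, pour_waypoints):
    """Toy sketch (not MolmoAct 2's planner): pair a fixed 'hold' pose
    for one arm with each waypoint of the other arm's trajectory, so
    both arms receive a command on every control tick."""
    return [(hold_pose, wp) for wp in pour_waypoints]

# Left arm holds the container steady; right arm tilts through a pour arc.
schedule = plan_bimanual(
    hold_pose=(0.40, 0.00, 0.30),
    pour_waypoints=[(0.40, 0.20, 0.50), (0.40, 0.20, 0.45), (0.40, 0.20, 0.40)],
)
for left_cmd, right_cmd in schedule:
    pass  # in a real system: send (left_cmd, right_cmd) to both arms this tick

print(len(schedule))  # 3 ticks, one paired command per tick
```

What makes built-in bimanuality hard in practice is that the two trajectories are coupled, so spilling, collisions, and force exchange all depend on both arms at once, which is why a base model handling it without per-task training is notable.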
What is the YAM Dataset?
The YAM Dataset is a new public collection of over 720 hours of bimanual robot demonstrations released by AI2 alongside the model — 30 times more demonstrations than the original MolmoAct dataset.