
Allen Institute: Open-Source MolmoMotion Predicts 3D Motion from Video and Sets SOTA in Robotics

Editorial illustration: predicting 3D object trajectories for robotic manipulation

The Allen Institute for AI has released MolmoMotion, a fully open-source model that predicts 3D object trajectories from video and natural language instructions such as 'rotate the bowl'. The model sets a new state of the art on PointMotionBench with an average displacement of 0.109 m, versus 0.134 m for the previous best, and raises pick-and-place task success in robotics from 56% to 76.3%, a gain of 20.3 percentage points. It was trained on the MolmoMotion-1M dataset of 1.16 million videos with 3D trajectories and action descriptions.


This article was generated using artificial intelligence from primary sources.

The Allen Institute for AI (AI2) has released MolmoMotion, a fully open-source model that predicts how objects will move in 3D space from video and language instructions.

Predicting 3D Trajectories from Video and Language

MolmoMotion takes a video and a natural language instruction, for example “rotate the bowl”, and predicts the resulting 3D object trajectories. It comes in two variants: an autoregressive (AR) model for deterministic paths and a flow-matching (FM) model for situations with uncertainty. Flow matching models the distribution of possible outcomes rather than a single trajectory, which is useful when the motion is ambiguous.
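To make the idea of modelling a distribution of outcomes concrete, here is a minimal toy sketch of flow-matching-style sampling. It is not MolmoMotion's implementation: the two end positions in `MODES`, the function names, and the closed-form velocity field (which stands in for a trained network) are all illustrative assumptions. A noise sample is pushed along a velocity field from t = 0 to t = 1, and different noise draws end up at different plausible endpoints.

```python
import math
import random

# Two hypothetical, equally likely 3D end positions (metres) for one tracked point,
# e.g. a point on a bowl rim after a clockwise or a counter-clockwise rotation.
MODES = [(0.10, 0.00, 0.05), (-0.10, 0.00, 0.05)]


def velocity(x, t):
    """Closed-form marginal velocity for the path x_t = (1 - t) * x0 + t * x1,
    with x0 ~ N(0, I) and x1 drawn uniformly from MODES.
    In a trained model, a neural network would take the place of this function."""
    # Posterior weight of each mode given x: x_t | x1 ~ N(t * x1, (1 - t)^2 I).
    logits = [
        -sum((xi - t * mi) ** 2 for xi, mi in zip(x, m)) / (2.0 * (1.0 - t) ** 2)
        for m in MODES
    ]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(lg - top) for lg in logits]
    total = sum(weights)
    # The conditional velocity toward mode m is (m - x) / (1 - t);
    # average it over the posterior weights.
    return tuple(
        sum(w / total * (m[d] - x[d]) / (1.0 - t) for w, m in zip(weights, MODES))
        for d in range(3)
    )


def sample_endpoint(rng, steps=50):
    """Draw Gaussian noise, then integrate dx/dt = velocity(x, t) with Euler steps."""
    x = tuple(rng.gauss(0.0, 1.0) for _ in range(3))
    dt = 1.0 / steps
    for i in range(steps):  # t runs over 0, dt, ..., 1 - dt, so 1 - t never hits zero
        v = velocity(x, i * dt)
        x = tuple(xi + dt * vi for xi, vi in zip(x, v))
    return x


rng = random.Random(0)
endpoints = [sample_endpoint(rng) for _ in range(5)]
```

Repeated calls to `sample_endpoint` tend to land near one or the other of the two hypothetical end positions, whereas a single deterministic prediction has to commit to one path.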

State-of-the-Art Results and Robotics Gains

On the PointMotionBench benchmark, MolmoMotion-AR achieves an average displacement of 0.109 m, versus 0.134 m for the previous record holder, ObjectForesight. Lower is better here, since a smaller displacement means a more precise prediction. In robotics, the model raises pick-and-place task success from 56% to 76.3%, a gain of 20.3 percentage points. It was trained on the MolmoMotion-1M dataset of 1.16 million videos with 3D point trajectories and action descriptions, covering 736 motion types.
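As a rough illustration of what an average displacement figure measures, the sketch below computes the mean Euclidean distance between predicted and ground-truth 3D points. The article does not say how PointMotionBench aggregates over points, timesteps, and videos, so this common definition is an assumption, and the function name and sample trajectories are made up. The last lines simply redo the arithmetic on the reported figures.

```python
import math


def average_displacement(pred, truth):
    """Mean Euclidean distance (metres) between matching 3D points of two trajectories."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(math.dist(p, q) for p, q in zip(pred, truth)) / len(pred)


# Made-up trajectories for one tracked point over three timesteps, in metres.
truth = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (0.1, 0.03, 0.04), (0.2, 0.0, 0.1)]
error = average_displacement(pred, truth)  # (0 + 0.05 + 0.1) / 3, about 0.05 m

# Figures reported in the article.
previous_m, molmomotion_m = 0.134, 0.109
relative_reduction = (previous_m - molmomotion_m) / previous_m  # about 0.187
success_gain_points = 76.3 - 56.0  # about 20.3 percentage points
```

On the reported numbers, the drop from 0.134 m to 0.109 m is a relative reduction of roughly 18.7% in displacement error.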

Why Does Full Openness Matter?

MolmoMotion was released fully open, including model weights, training code, and datasets. For robotics and research, this means teams can reproduce the results and build on them without licensing barriers, accelerating progress in a field where high-quality 3D motion data is scarce.

Frequently Asked Questions

What does MolmoMotion do?
It predicts 3D object trajectories from video and natural language instructions, and it is fully open-source (weights, code, datasets).

How much does it improve robotics?
Pick-and-place task success rises from 56% to 76.3%, a gain of 20.3 percentage points over the baseline.