World model

A learned internal representation of an environment's dynamics that an AI system uses to predict future states and plan actions without constant real-world trial and error.

A world model is a machine learning system that builds an internal representation of an environment and predicts how that environment changes over time in response to actions. Instead of learning solely from real-world attempts, an agent can “imagine” future states inside the model and use them to plan before acting.
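The loop described above (learn a dynamics model, imagine outcomes, then act) can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the one-dimensional environment, the linear model fitted by least squares (real systems typically use neural networks), the exhaustive search over short action sequences (real planners usually sample or use a learned policy), and the hypothetical names `true_step`, `fit_model`, `plan`, and `run_agent`.

```python
import itertools
import random

ACTIONS = (-1, 0, 1)  # hypothetical action set: step left, stay, step right


def true_step(x, a):
    """Real environment (unknown to the agent): position shifts by the action."""
    return x + a


def fit_model(transitions):
    """Least-squares fit of x_next ~ w_x * x + w_a * a from (x, a, x_next) tuples."""
    sxx = sum(x * x for x, a, y in transitions)
    sxa = sum(x * a for x, a, y in transitions)
    saa = sum(a * a for x, a, y in transitions)
    sxy = sum(x * y for x, a, y in transitions)
    say = sum(a * y for x, a, y in transitions)
    det = sxx * saa - sxa * sxa  # determinant of the 2x2 normal equations
    w_x = (sxy * saa - say * sxa) / det
    w_a = (say * sxx - sxy * sxa) / det
    return w_x, w_a


def imagined_cost(model, x, goal, seq):
    """Roll an action sequence forward inside the model only; sum distance to goal."""
    w_x, w_a = model
    cost = 0.0
    for a in seq:
        x = w_x * x + w_a * a  # predicted next state, no real-world step taken
        cost += abs(x - goal)
    return cost


def plan(model, x, goal, horizon=5):
    """Pick the first action of the imagined sequence with the lowest cost."""
    best = min(
        itertools.product(ACTIONS, repeat=horizon),
        key=lambda seq: imagined_cost(model, x, goal, seq),
    )
    return best[0]


def run_agent(start, goal, steps=10, seed=0):
    rng = random.Random(seed)
    # 1. Collect random experience from the real environment.
    data = []
    for _ in range(200):
        x = rng.uniform(-10, 10)
        a = rng.choice(ACTIONS)
        data.append((x, a, true_step(x, a)))
    # 2. Learn the world model from that experience.
    model = fit_model(data)
    # 3. Act: replan in imagination before every real step.
    x = start
    for _ in range(steps):
        x = true_step(x, plan(model, x, goal))
    return model, x


if __name__ == "__main__":
    model, final = run_agent(start=0, goal=4)
    print("learned weights:", model, "final position:", final)
```

The agent touches the real environment only to gather the initial data and to execute one chosen action per step; every candidate plan is evaluated inside the learned model.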

Unlike large language models, which operate over text tokens, world models often process sensory inputs such as pixels, depth maps, and lidar, and learn environment dynamics, including physics, collisions, and cause-and-effect relationships. Common approaches include model-based reinforcement learning, latent-space predictive architectures such as JEPA (Joint-Embedding Predictive Architecture), and video-generative models such as Google DeepMind’s Genie 3, which generates interactive environments in real time.

As of 2025–2026, world models are seen as a key step toward embodied, agentic AI and robotics, and as a possible path toward artificial general intelligence (AGI), because they let agents train on an effectively unlimited curriculum of rich simulated environments.
