World models are different from video-generation models. A world model simulates the dynamics of a real-world environment, letting agents predict how the world will evolve in response to their actions; a video-generation model simply synthesizes realistic video sequences.
Google is also planning to turn its multimodal foundation model, Gemini 2.5 Pro, into a world model capable of simulating aspects of the human brain. It's worth noting that in December, DeepMind revealed Genie 2, a model that can generate an "endless" variety of playable worlds. The following month, it was reported that Google was forming a new team to work on AI models that simulate the real world.
Others are working on world models as well – most notably, AI pioneer Fei-Fei Li, who last year launched World Labs, a startup that has built its own AI system for generating video game-like 3D scenes from a single image.
Veo 3, which is still in public preview, can create video along with accompanying audio, ranging from speech to soundtracks. And while Veo 3 produces realistic movement by simulating real-world physics, it isn't quite a world model yet. For now, it is better suited to cinematic storytelling in games.
The model remains a "passive output" generative system; to become a world model, it would need to shift toward an active, interactive, and predictive simulator. And the real challenge in video game production isn't just impressive visuals; it's maintaining consistent, controllable simulations.