
2025-09-04
Google Unveils New AI World Model Genie 3: Capable of Generating 3D Virtual Worlds with Several Minutes of Interaction
Industry Trends · Generative AI · News · AI Agent
Illustration: Diverse Interactive Scenarios Generated by Google’s AI World Model Genie 3 (Source: Google DeepMind)

Author: Jeremy Huang

On August 5, Google DeepMind unveiled its latest AI world model, Genie 3. The model generates a 3D virtual environment from user instructions and allows several minutes of interaction within it. Google stated that this technology marks a major milestone toward Artificial General Intelligence (AGI), as it can provide simulation environments for training AI agents and robots.

What is a World Model?
According to NVIDIA, a world model is a generative AI model capable of understanding the rules governing the real world, including physical and spatial properties. These models generate videos using input such as text, images, video, and movement, and learn to represent and predict dynamics like motion, force, and spatial relationships from sensory data, enabling them to understand the physical characteristics of real-world environments. According to The Verge, Google’s Genie model allows users to generate a virtual world—similar to a video game environment where they can freely move—simply by entering a command. However, unlike worlds built from manually crafted 3D objects, these worlds are generated in real time using AI technology. In December 2024, Google had already introduced Genie 2, a world model capable of generating interactive virtual worlds from a single image.

Key Features of Google Genie 3

Diverse Scene Generation Capabilities
Genie 3 can simulate real-world physical properties such as water and lighting effects, as well as vibrant natural environments, including animal behaviors and plant growth processes. It can also create animated and fictional scenarios, constructing fantastical settings and expressive characters. Users can explore a wide range of environments, crossing geographical and temporal boundaries to experience both realistic and imaginary scenes.

Improved Resolution and Expanded Applications
Genie 3 upgrades the resolution from Genie 2’s 360p to 720p, delivering clearer, more detailed visuals for a more immersive experience. Its scope has also expanded from a focus on 3D environments to general-purpose scenarios, making it suitable for a broader variety of interactive worlds and contexts.

Real-Time Interaction and Longer Duration
Genie 3 supports real-time interaction, enabling users to navigate in smooth 24 FPS graphics and offering richer text-based interaction through promptable world events. Beyond navigation, this feature allows users to alter the generated world via text commands, such as changing weather conditions or adding new objects and characters. It also expands counterfactual scenarios, allowing AI agents to simulate “what if” situations for handling unexpected events. Compared to Genie 2’s 10–20 seconds of interaction, Genie 3 environments can last several minutes, offering more complete and varied exploration opportunities.

High Controllability and Instant Response
During auto-regressive generation, Genie 3 continually references previously generated trajectories, enabling it to recall and recreate relevant details even if a user revisits the same location a minute later. At the same time, the system performs multiple computations per second in response to new inputs, significantly improving the flexibility and responsiveness of interaction.

Genie 3 in Action
In technical demos shared on Google’s blog and official videos, Genie 3 can simulate realistic natural environments such as deserts and forest lakesides, recreate ancient Japanese streets, and even render game-like scenes featuring anthropomorphic animal characters. The demonstrations also show its ability to simulate real-world physical interactions, including skiing, painting walls with a roller, and piloting a helicopter from a first-person perspective.

A Key Step Toward AGI
Google DeepMind is using Genie 3 to advance research on embodied agents, testing its suitability for training them in the future. The team demonstrated this with SIMA, a generalist embodied agent for 3D virtual environments, which completed various assigned tasks and interacted within worlds generated by Genie 3. Genie 3 is unaware of an agent’s ultimate goal but simulates upcoming situations based on the agent’s actions, enabling the agent to learn how to accomplish objectives in dynamic environments. Its ability to maintain environmental consistency over time allows agents to execute longer action sequences and achieve more complex goals. Google believes that world models’ capacity to understand and simulate environments—allowing agents to predict both environmental changes and the consequences of their own actions—is a critical foundation for achieving AGI.

According to The Guardian, Google noted that Genie 3’s world model could train embodied agents and autonomous vehicles in accurately recreated real-world settings, such as warehouses. Professor Subramanian Ramamoorthy, Chair of Robot Learning and Autonomy at the University of Edinburgh, emphasized that world models are essential for developing robots capable of flexible decision-making, as they need to anticipate the consequences of different actions to select the optimal one. Google’s research last year also pointed out that while large language models (LLMs) excel at tasks like planning, they are less adept at taking actions on behalf of humans—making world models crucial in bridging this gap. Andrew Rogoyski, from the University of Surrey’s Institute for People-Centred AI, added that world models give “disembodied” AI a form of embodiment in virtual environments, enabling exploration and skill acquisition. While AI can already be trained on vast internet datasets, interacting with realistic or highly realistic worlds could make them more powerful and intelligent.

Limitations of Genie 3
Despite its breakthroughs, Genie 3 still has limitations. The range of actions that can be performed directly by embodied agents is currently limited; although promptable world events enable diverse environmental changes, they are not always executed by the agent itself. Accurately modeling complex interactions between multiple independent agents in the same environment remains an ongoing research challenge. Genie 3 cannot yet perfectly replicate real-world locations with geographic accuracy. Text rendering often lacks clarity and legibility unless explicitly provided in the input description. Finally, interaction duration is still limited to a few continuous minutes, rather than extending to hours-long sessions.

The Future of World Models
Google acknowledges that Genie 3’s open-ended and real-time capabilities bring unprecedented potential but also introduce new safety concerns. To maximize benefits while mitigating risks, Google DeepMind has worked closely with its Responsible Development & Innovation Team. The company has released Genie 3 as a limited research preview, granting early access to selected academics and creators to collect valuable feedback and cross-disciplinary insights, gradually building an understanding of potential risks and mitigation strategies.

Google sees Genie 3 as a pivotal moment in the evolution of world models, with the potential to impact AI research and generative media across many domains. Beyond training embodied agents such as robots and automated systems, Genie 3 could serve as a powerful educational and training tool, helping students learn, enabling experts to gain experience, and providing a platform to evaluate agent performance and explore weaknesses.

Disclaimer
The information in this article regarding Google DeepMind’s Genie 3 world model, including its features, applications, and technical details, is based on publicly available sources and media reports and is provided for general informational purposes only. Opinions, statements, and data attributed to Google DeepMind or third parties belong to their respective owners and do not represent the views of this article’s author. This content should not be considered investment advice, a basis for business decisions, or legal or professional guidance. Readers should independently verify information and assess risks before taking any action. The author and information sources assume no responsibility for any direct or indirect losses resulting from the use of or reliance on this content.

Reference

Genie 3: A new frontier for world models - Google DeepMind

What are World Foundation Models? | NVIDIA Glossary

Google’s new AI model creates video game worlds in real time | The Verge

Google says its new ‘world model’ could train AI robots in virtual warehouses | Artificial intelligence (AI) | The Guardian