AI generates "Black Myth"-style gameplay with one click: Tencent launches game video model GameGen-O; industry insiders call it the ChatGPT moment for game studios

36kr
09-14

What? Could big models soon be generating AAA masterpieces like "Black Myth: Wukong"?!

Just watch this demo, and "Journey to the West" comes to life:

With the BGM layered on, it really does have that feel (doge).

This is GameGen-O, a Transformer model recently released by Tencent, designed specifically for generating open-world video games.

Simply put, the model can simulate various game engine features: generating game characters, dynamic environments, complex actions, and more.

Interactive control is also supported: users can steer the game content through text, operation signals, and video prompts.

As soon as the news dropped, 𝕏 (formerly Twitter) was flooded with reactions, with netizens lining up to rave:

The co-founder and CTO of game studio Azra Games put it bluntly:

GameGen-O will be the ChatGPT moment for game studios.

“Game studios have their ChatGPT moment”

Specifically, the project comes from Tencent's LightSpeed Studios (the studio behind Peace Elite), in collaboration with the Hong Kong University of Science and Technology and the University of Science and Technology of China.

It appears their goal is to use AI models to take over parts of the game development pipeline, starting with what has been shown so far: game character creation, environment generation, action generation, event generation, and various forms of interactive control.

Let’s preview them one by one below~

Now you can use GameGen-O to generate all kinds of characters with a single click: cowboys, astronauts, magicians, guards...

Budget too tight to shoot on location? No problem, there's a plan B: generating dynamic environments directly.

Want to show off some cool moves to your teammates? Action generation from all kinds of perspectives is easily handled too.

And an essential part of any game: throwing the occasional "tiny bit" (read: tons) of difficulty at players, such as tsunamis, tornadoes, and fires (doge).

At the same time, GameGen-O supports open-domain generation, meaning there are no restrictions on style, environment, or scene.

Finally, interaction works through text, operation signals, and video prompts: left, right, onward toward the dawn...

Everyone knows how expensive game development is; now even ordinary players could use GameGen-O to make games.

An AI architect netizen even asserted:

Labeling data with GPT-4o

To build the model, the team reportedly did two main things:

Built a proprietary dataset, OGameData, using GPT-4o to annotate the data

Trained the model in two stages

Specifically, the team first proposed a dataset construction pipeline.

The team collected 32,000 raw videos from the Internet, drawn from hundreds of open-world games, ranging from a few minutes to several hours in length and spanning genres including role-playing, first-person shooters, racing, and action-puzzle games.

Human experts then reviewed and screened these videos, yielding approximately 15,000 usable ones.

Next, the screened videos were cut into clips using scene detection, and these clips were strictly sorted and filtered based on aesthetics, optical flow, and semantic content.

The team then used GPT-4o to meticulously annotate more than 4,000 hours of high-quality video clips, with resolutions ranging from 720p to 4K.
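The sort-and-filter step above can be sketched roughly like this. It is a minimal illustration only: the scoring helpers, thresholds, and field names here are all hypothetical, since the paper's actual filtering criteria and values are not public.

```python
from dataclasses import dataclass

@dataclass
class Clip:
    path: str
    aesthetic: float      # hypothetical aesthetic score in [0, 1]
    optical_flow: float   # hypothetical motion-magnitude score
    semantic: float       # hypothetical semantic-relevance score

def filter_clips(clips, a_min=0.5, f_min=0.2, s_min=0.6):
    """Keep only clips that pass all three quality gates,
    then sort the survivors by aesthetic score (best first)."""
    kept = [c for c in clips
            if c.aesthetic >= a_min
            and c.optical_flow >= f_min
            and c.semantic >= s_min]
    return sorted(kept, key=lambda c: c.aesthetic, reverse=True)

clips = [
    Clip("a.mp4", 0.9, 0.5, 0.8),   # passes all gates
    Clip("b.mp4", 0.3, 0.5, 0.8),   # fails aesthetics
    Clip("c.mp4", 0.7, 0.1, 0.9),   # fails optical flow (near-static)
]
print([c.path for c in filter_clips(clips)])  # -> ['a.mp4']
```

In practice the gates would run on real signals (e.g. a scene-detection library for cuts, an optical-flow estimator for motion), but the keep-if-above-threshold-then-rank structure is the same.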

To achieve interactive controllability, the team selected the highest-quality clips from the annotated dataset and performed decoupled labeling on them.

This labeling is designed to describe changes in a clip's content state, making the training dataset more refined and interactive.

Regarding this collaboration between human experts and GPT-4o, some netizens commented:

This is a form of recursive self-improvement. (Human experts ensure annotation accuracy and help GPT-4o improve through a feedback loop.)

With data preparation complete, the team trained GameGen-O in two stages: foundation pre-training plus instruction tuning.

In the pre-training stage, GameGen-O uses a 2+1D VAE (variational autoencoder; e.g., Magvit-v2) to compress video clips.

To adapt the VAE to the gaming domain, the team made domain-specific adjustments to the VAE decoder.

The team adopted a mixed training strategy with different frame rates and resolutions to enhance the generalization ability across frame rates and resolutions.
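To get a feel for what a 2+1D VAE does to a clip, here is a back-of-the-envelope shape calculation. The 4x temporal and 8x spatial downsampling factors and the latent channel count below are common choices for Magvit-v2-style video tokenizers, not confirmed numbers for GameGen-O.

```python
def latent_shape(frames, height, width,
                 t_down=4, s_down=8, latent_channels=16):
    """Shape of the latent a 2+1D video VAE would produce:
    the temporal and spatial axes are compressed independently
    (hence "2+1D": 2 spatial dims + 1 temporal dim)."""
    return (frames // t_down,
            latent_channels,
            height // s_down,
            width // s_down)

# A 64-frame 720p clip...
shape = latent_shape(64, 720, 1280)
print(shape)  # -> (16, 16, 90, 160)

# ...leaves the transformer a far smaller grid to attend over:
t, c, h, w = shape
print(t * h * w)  # -> 230400 latent positions
```

This compression is what makes transformer training on long, high-resolution game footage tractable at all; the mixed frame-rate/resolution strategy then just varies `frames`, `height`, and `width` across batches.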

In addition, the overall architecture of the model follows the principles of the Latte and OpenSora V1.2 frameworks.

Thanks to a masked attention mechanism, GameGen-O gains dual capabilities: text-to-video generation and video continuation.
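The intuition behind that dual capability can be shown with a simple frame mask: mark a prefix of frames as "given context" and the rest as "to generate". An empty prefix corresponds to pure text-to-video; a non-empty prefix corresponds to continuing an existing video. This is a schematic reconstruction of the idea, not the model's actual masking implementation.

```python
def frame_mask(total_frames, given_frames):
    """Per-frame mask: 1 = frame is provided as context,
    0 = frame must be generated by the model.
    given_frames == 0 -> text-to-video (generate everything);
    given_frames > 0  -> video continuation (extend the prefix)."""
    assert 0 <= given_frames <= total_frames
    return [1] * given_frames + [0] * (total_frames - given_frames)

print(frame_mask(8, 0))  # -> [0, 0, 0, 0, 0, 0, 0, 0]
print(frame_mask(8, 3))  # -> [1, 1, 1, 0, 0, 0, 0, 0]
```

Training with both mask patterns in the mix is what lets a single model cover both tasks without separate heads.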

The team said:

This training method, combined with the OGameData dataset, enables the model to generate open-domain video game content stably and with high quality, and lays the foundation for subsequent interactive control capabilities.

After this, the pre-trained model is frozen and fine-tuned via a trainable InstructNet, which lets the model generate subsequent frames from multimodal structured instructions.

InstructNet is mainly designed to accept various multimodal inputs, including structured text, operation signals, and video prompts.

During InstructNet tuning, the current clip's content serves as a condition, so the model learns a mapping from the current clip to future clips under multimodal control signals.

As a result, at inference time GameGen-O lets the user continuously generate and steer the next clip based on the current one.
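At inference time this amounts to a simple autoregressive loop over clips: each generated clip becomes the condition for the next step. The sketch below uses a stand-in `generate_next_clip` function; every name and signature here is hypothetical, since the real code has not been released.

```python
def generate_next_clip(current_clip, instruction):
    """Stand-in for the frozen backbone + InstructNet:
    produce the next clip conditioned on the current clip and a
    multimodal instruction (represented here as a plain string)."""
    return f"{current_clip}->[{instruction}]"

def play(initial_clip, instructions):
    """Chain generations: each new clip becomes the condition for
    the next step, so the user steers the game turn by turn."""
    clip, history = initial_clip, []
    for instr in instructions:
        clip = generate_next_clip(clip, instr)
        history.append(clip)
    return history

print(play("start", ["move left", "move right"]))
# -> ['start->[move left]', 'start->[move left]->[move right]']
```

The key property the loop illustrates is statefulness: the instruction at step N acts on the output of step N-1, which is what makes the result feel like playing a game rather than generating disconnected video clips.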

Currently, GameGen-O has an official GitHub repository, but the code has not been uploaded yet.

Interested readers can bookmark it in the meantime~

Project homepage: https://gamegen-o.github.io/

GitHub official repository: https://github.com/GameGen-O/GameGen-O/

Reference Links:

[1]https://x.com/_akhaliq/status/1834590455226339492

[2]https://x.com/8teapi/status/1834615421728948581?s=46

This article comes from the WeChat public account "Quantum Bit" (QbitAI) and is republished by 36Kr with authorization.
