OpenAI fully opens up GPT-4o's image generation capabilities, and this time free users are the first to get on board!
Available today in ChatGPT and Sora to all Plus, Pro, Team, and Free users.
Overnight, test results flooded social media. The most striking was its text rendering.
For example, 4o can reproduce the requested text exactly and put it exactly where it is asked to go.
In one example, a man holds up the words "a few" in his right hand and "words" in his left.
It can also keep the text accurate while changing the character's pose, like consecutive frames from a TV series.
Compare the two images closely: the man's reflection on the whiteboard in the first image matches the second one as well.
Last night, OpenAI suddenly announced a small livestream for the launch, and this time Sam Altman showed up (backstory: he skipped the GPT-4.5 launch because he was taking care of his child).
The livestream showed off a range of uses, including meme making, text rendering, multi-turn generation, and instruction following.
For instance, they snapped a selfie on the spot and instantly turned it into anime style.
Along the way, they also made an official meme, asking for the words "feel the agi" to be added to the image. (And yes, while generating it, the model knew to switch the lowercase letters to uppercase, which suited the image better.)
Now, open ChatGPT and try out these capabilities.
In practice, generation is fast (roughly ten seconds per image), but regular users get only three generations per day.
The API is expected to roll out gradually over the coming weeks.
This round focuses on beauty and practicality
This is finally a step toward a truly integrated multimodal model.
According to OpenAI, 4o, as a multimodal model, has now filled in an important missing piece of the puzzle: image generation.
The emphasis is on both beauty and practicality.
Without further ado, let’s take a look at the specific performance of its capability upgrade.
Major upgrades in capabilities
First, OpenAI says 4o can now accurately blend symbols and text into images.
For example, given just a piece of text, it can generate a beautifully designed menu:
It also supports refining an image's content and style step by step over multiple rounds of conversation.
For example, starting from an original photo of a cat, it can build up a game character step by step:
It also pays close attention to detail. OpenAI says 4o can handle 10 to 20 different objects in a single image, whereas other models generally struggle with 5 to 8.
4o also does well at generating photorealistic images.
There is even a real-life version of “copycat” (doge):
Netizens put it to the test
After seeing the official demos, netizens quickly ran a wave of their own tests:
First up was a classic meme image, and the result really doesn't feel out of place, haha.
Even reproducing the same writing style is no problem:
One More Thing
Speaking of which, the past two days have been especially busy, with DeepSeek, OpenAI, and Google practically going head to head on the same stage.
Notably, OpenAI suddenly announced a small release at 11 p.m. last night (Beijing time), just after DeepSeek had published the official technical report for DeepSeek-v3-0324.
Could this release have been forced out by DS? (doge)
This article is from the WeChat official account "QbitAI" (量子位), written by Yishui, and is republished by 36Kr with authorization.