At 3:00 AM Beijing time, the live stream started on time, and OpenAI released ChatGPT Images 2.0.
According to the introduction, "ChatGPT Images 2.0 is the next step in evolution: a state-of-the-art model capable of handling complex visual tasks and generating accurate, ready-to-use visual content."
Perhaps for this reason, OpenAI's official blog post comes in two versions (an image mode and a classic mode), with the image-mode content generated entirely by the model!
Blog address: https://openai.com/index/introducing-chatgpt-images-2-0/
In its blog post, OpenAI stated: "Images are a language, not decoration. Good images, like good sentences, are selected, organized, and presented. They can explain mechanisms, create atmosphere, validate ideas, or construct arguments."
ChatGPT Images 2.0 has achieved a qualitative leap in instruction following, accurate object placement and association, high-density text rendering, and support for a wide range of aspect ratios. Its command of composition and visual aesthetics makes the output no longer feel "AI-generated" but rather "intentionally designed."
Furthermore, it performs accurately in multilingual environments and can use expanded visual and world knowledge to fill in details for you, thus providing smarter images with fewer prompts.
To tackle the most complex tasks, Images 2.0 introduces "thinking ability" for the first time. When selecting the thinking or pro model in ChatGPT, Images 2.0 can connect to the internet to obtain real-time information, generate multiple different images from a single prompt, and review its own output. With "thinking," the model can take on more work between ideas and images, especially when accuracy, timeliness, consistency, and visual uniformity are critical.
By combining the intelligence of OpenAI's reasoning models with a deep understanding of the visual world, this model elevates image generation from "rendering" to "strategic design," evolving from a tool into a visual system that helps people turn ideas into results that can be understood, shared, taught, and built upon.
This capability is available to all ChatGPT, Codex, and API users starting today.
Higher precision and control
Images 2.0 brings unprecedented detail and fidelity to image creation. It not only enables the creation of more complex images but also effectively brings them to life, strictly adhering to instructions, preserving key details, and rendering fine elements that were previously prone to distortion: small text, icons, UI elements, high-density compositions, and subtle style constraints. The API supports resolutions up to 2K. The results are no longer "close enough" but "ready to use."
Notice that the screenshot below was actually generated by Images 2.0!
Stronger multilingual capabilities
Previous image generation models have been more stable in English and Latin alphabet languages, but have lower accuracy in other languages, especially complex or dense text.
Images 2.0 breaks through this limitation, significantly enhancing multilingual understanding, especially in text rendering for Japanese, Korean, Chinese, Hindi, and Bengali. It not only correctly generates non-English text but also ensures natural and fluent language expression.
This means more than just translating labels; it means making language itself a part of the design, achieving visual and linguistic unity from posters and explanatory diagrams to illustrations and comics. This gives the model greater global applicability, allowing users to create visual content in real-world language environments.
During the live stream, Chen Boyuan, a member of the OpenAI image research team, presented a case study, providing the prompt: "Make an artisan marketing poster for a fictional OpenAI bakery. The poster should be in Japanese."
The resulting poster perfectly matched the prompt and was accurate in detail.
"It excels at following very detailed instructions, so if you have a very specific brand language, design aesthetics—all those things that are essential to creative work—you can use ChatGPT to create and refine your ideas to get the results you want," Chen Boyuan said.
A more mature style of expression and realism
Images 2.0 offers significantly improved fidelity across a variety of visual styles. It excels at capturing the key features of photographs, including the subtle imperfections that enhance realism, while also rendering cinematic, pixel art, and comic book visuals with greater consistency in texture, lighting, composition, and detail.
Therefore, the model output is closer to the specified style than a mere imitation. This is especially valuable for game prototyping, storyboard creation, marketing ideas, and asset creation for specific media or genres.
Flexible aspect ratio
The new model offers greater flexibility in output format, supporting various aspect ratios from 3:1 to 1:3, and can be directly adapted to different scenarios such as banners, presentations, posters, mobile interfaces, bookmarks, and social media graphics. You can specify the aspect ratio in the prompts, or regenerate existing images to the new size using preset options.
Below are two examples of unconventional aspect ratios:
A stronger understanding of the real world
Images 2.0 incorporates knowledge up to December 2025, further enhancing the relevance and contextual accuracy of the generated results. This is particularly crucial for illustrative diagrams, educational graphics, and visual summaries, where accuracy and clarity are just as important as aesthetics.
Its intelligence also shows in end-to-end task processing: integrating information, writing content, and laying it out with a clear structure, sensible white space, and good visual flow.
Visual Thinking Partner
After enabling the thinking model in ChatGPT, the system performs deeper understanding and execution in the background. It can retrieve information online, transform uploaded materials into clear visual descriptions, and infer the structure of images before generation.
In this mode, Images 2.0 acts more like a visual thinking partner, helping you transform initial concepts into complete products and significantly reducing workload.
It also supports generating multiple different images at once, a first for ChatGPT image generation. This makes workflows such as multi-page comics, whole-house design schemes, series of posters, or multilingual, multi-size social media content efficient and feasible.
You don't need to generate each image individually and then manually stitch them together; with just one request, you can get up to eight outputs that are consistent in terms of characters and elements and have continuity.
Image generation in Codex
Images capabilities have been integrated into Codex, enabling visual creation, iteration, and delivery to be completed within the same workspace, expanding its applications in areas such as design, marketing, product, sales, and learning.
For example, you can quickly generate multiple UI designs and prototypes, compare solutions, and directly translate the best design into a product or web experience without leaving Codex. It's available through a ChatGPT subscription; no additional API key is required.
Embedding image capabilities into products via API
Developers and enterprises can integrate these capabilities into their own products through the gpt-image-2 API, adding high-quality image generation and editing capabilities to their existing workflows.
With enhanced text rendering, multilingual generation, instruction compliance capabilities, and support for more output formats and aspect ratios, the API makes it easier to build image workflows for real-world business scenarios, such as localized advertising, infographics, explanatory diagrams, educational content, design tools, creative platforms, and web page generation products.
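As a rough illustration of what such an integration might look like, here is a minimal sketch assuming gpt-image-2 is exposed through the same `images.generate` endpoint as OpenAI's existing image models; the model name, size values, and aspect-ratio mapping below are assumptions based on this post, not confirmed API documentation.

```python
# Hypothetical sketch: assumes gpt-image-2 follows the existing OpenAI
# Images API shape (client.images.generate). Model name and pixel sizes
# are illustrative assumptions, not taken from official docs.
try:
    from openai import OpenAI  # official SDK; optional at import time
except ImportError:
    OpenAI = None

# Illustrative pixel sizes for a few of the aspect ratios the post
# mentions (3:1 through 1:3); the API's accepted sizes may differ.
ASPECT_SIZES = {
    "1:1": "1024x1024",
    "3:1": "1536x512",
    "1:3": "512x1536",
}

def banner_size(ratio: str) -> str:
    """Return an illustrative pixel size for a named aspect ratio."""
    return ASPECT_SIZES[ratio]

def generate_banner(prompt: str, ratio: str = "3:1"):
    """Request a single wide-format image (assumed model name)."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    return client.images.generate(
        model="gpt-image-2",       # assumed, per the blog post
        prompt=prompt,
        size=banner_size(ratio),
        n=1,
    )

if __name__ == "__main__":
    print(banner_size("3:1"))
```

A localized-advertising workflow could call `generate_banner` once per target language and size preset, which is the kind of batch use case the post describes.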
Limitations
OpenAI also noted the model's limitations in its blog post: while Images 2.0 is a significant advance, it is still not perfect. For tasks requiring full physical-world modeling (such as origami tutorials, Rubik's Cubes, and other complex structures), or precise details on hidden, tilted, or mirrored surfaces, the model may still fall short.
Extremely high density or repetitive details (such as fine sand) can also present challenges. Manual proofreading is still recommended for labels and diagrams, especially when precise arrows or component markings are involved.
These are all important directions for future improvements.
In the API, outputs exceeding 2K are currently in the testing phase and may be unstable.
Pricing and Availability
ChatGPT Images 2.0 is available to all ChatGPT and Codex users starting today. Advanced outputs with "thinking" capabilities are available to ChatGPT Plus, Pro, and Business users.
The gpt-image-2 model is available in the API, and the price varies depending on image quality and resolution.
OpenAI has also published a large number of case studies on its official website, which interested readers can explore.
We also ran some simple tests, such as generating page 2 of a Chinese gaokao (college entrance examination) mathematics paper, which looked alright:
In actual testing, we can see on the page that ChatGPT Images 2.0 typically goes through several steps to generate an image: creation → draft → first draft → scene setup → detail refinement → finishing touches → final polishing → final fine-tuning.
Next, we had it generate a traditional Chinese cursive calligraphy rendering of Li Bai's poem "Jiang Jin Jiu," with a 3:1 width-to-height ratio, containing the poem's full text and signed "ChatGPT Images 2.0."
However, it is clear that the model did not generate a complete version, and it is obviously not cursive script.
Finally, here's an illustrated page explaining the "lightning five-consecutive whips" technique:
It's quite interesting.
Overall, we feel that ChatGPT Images 2.0 is indeed much more powerful than the current Nano Banana 2; let's see how Google responds.
Have you tried ChatGPT Images 2.0? What did you think?
This article is from the WeChat public account "Machine Heart" (ID: almosthuman2014) , authored by Panda and Youli, and published with authorization from 36Kr.





