GPT Image 1.5 tops the benchmarks but underperforms in real-world testing and has been heavily criticized; Altman's outlook looks bleak.

12-17

OpenAI played its trump card late at night: the brand-new GPT Image 1.5, which topped two leaderboards and outscored Google Nano Banana Pro. Hands-on testing by users, however, drew a barrage of criticism.

Just as Google was about to release Gemini 3.0 Flash, OpenAI countered with a bombshell announcement.

Just now, OpenAI dropped its "Christmas surprise": the official debut of its new flagship image model, ChatGPT Images.

This time, OpenAI has pushed its image generation capabilities to the limit:

Precise control: Instruction following has improved greatly, so edits change exactly what you point at.

Detail preservation: Image details are kept intact, and textures are refined.

Rapid generation: Generation is a full 4 times faster than the previous generation.

Moreover, starting today, all free ChatGPT users can get started, and developers can also directly call the GPT Image 1.5 API.

On LMArena, it seemed invincible:

Text-to-image: It topped the chart with 1264 Elo points, surpassing Google Nano Banana Pro (NBP).

Image editing: chatgpt-image-latest narrowly beat NBP by 3 points to take first place, while GPT Image 1.5 followed in 4th place.

On Artificial Analysis, it also took first place in two categories.
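To put the 3-point margin in perspective, here is a minimal sketch that assumes the leaderboard scores behave like standard Elo ratings (logistic curve, scale 400). That is an assumption: the article does not describe how the leaderboard computes its scores, and the ratings 1003 and 1000 below are hypothetical placeholders, since only the gap matters.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Hypothetical ratings: only the 3-point gap comes from the article.
lead_3 = expected_win_rate(1003, 1000)
# A 100-point gap, for comparison (also hypothetical).
lead_100 = expected_win_rate(1100, 1000)

print(f"3-point lead:   {lead_3:.4f}")
print(f"100-point lead: {lead_100:.4f}")
```

Under that assumption, a 3-point lead corresponds to an expected head-to-head win rate of about 50.4%, which is close to a coin flip.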

However, the reversal came too quickly.

Hands-on tests by users suggest that this is another typical case of "high scores but low ability".

In Yuchen Jin's comparison, GPT Image 1.5's image generation quality is roughly on par with Google NBP, but it falls far behind in intelligence.

With handwritten notes in particular, GPT Image 1.5's output looks decent, but the content is completely wrong.

Left: ChatGPT Images; Right: Nano Banana Pro

This discrepancy sparked heated complaints within the community.

Some have bluntly stated: "Google Nano Banana Pro remains king."

Some even criticized it, saying, "This could be yet another embarrassing and meaningless release from OpenAI."

First, GPT-5.2 drew widespread negative reviews online, and then GPT Image 1.5 failed to beat Google's "banana" in a real-world test.

It seems that OpenAI has completely lost this tough battle at the end of the year to Google...

GPT Image 1.5 debuts, an epic evolution.

Let's get back to the main topic.

According to the official blog, ChatGPT now follows image editing instructions noticeably more faithfully than before.

Even for minor details, it changes only the parts that need to be changed, while keeping elements such as lighting, composition, and character appearance consistent across the input, the output, and subsequent edits.

A single image from OpenAI researcher Boyuan Chen confirms that the GPT Image, codenamed "Hazelnut," has been released.

In this way, we can obtain the results that truly meet our intentions.

Whether it's useful photo editing, trying on clothes and hairstyles, or style filters and concept transformations that preserve the essence of the original image, ChatGPT can handle it all.

This update transforms ChatGPT into a portable creative studio: a place where you can work diligently and unleash your creativity.

Here comes Altman for Christmas...

Detailed editing, multiple rounds of photo editing.

GPT Image 1.5 excels at various "operations," including adding, deleting, combining, merging, and replacing.

Therefore, it can change the image without losing its original "feel".

Given just the two people and their dog, GPT Image 1.5 accurately captured how bored they look while being forced to "work" at a birthday party.

Prompt: Make a 2000s film-style photo, composite these two men and the dog into it, and capture them looking bored at a kid's birthday party.

Then, starting from this image, ChatGPT can apply a series of edits, such as adding a bunch of naughty kids to the background.

Prompt: Add a bunch of unruly kids to the background, the kind who throw things around and yell, make a mess of things.

Next, the AI transformed the man on the left into an anime character and the puppy into a plush toy, accurately completing the editing.

Prompt: Change the man on the left to a hand-drawn retro Japanese anime style, the dog to a plush toy, and leave the man on the right and the background alone.

Then put all of them in custom OpenAI sweaters. (PS: Mark Chen's neck looks a bit strange.)

Prompt: Put all of them in OpenAI sweaters, and they'll look like this.

Even more amazing is that ChatGPT can change the background with one click, instantly turning a birthday party into an OpenAI live stream.

Prompt: Now remove the two men, leaving only the dog, and then put it into an OpenAI live stream, something that looks similar to the attached image.

Taking skateboarding as another example, let ChatGPT generate a skateboarding scene of Los Angeles, in the style of late 1990s documentary street photography.

Prompt: Los Angeles landscape skateboarding shot, in the following style: late 90s documentary street photography, shot on 35mm color film, Leica M-style rangefinder camera with 35mm lens, Kodak Portra 400 color panel, natural daylight, soft contrast, soft and realistic colors, embedded film grain, slight edge softening, observational candid composition, no HDR, no modern digital sharpening, no cinematic lighting.

Using this picture as a reference, I changed the skateboarder's clothes to "red" instantly.

Prompt: Change the skateboarder's clothes to red and his hat to yellow. The speed limit sign should say 15, and that truck should be a fire truck.

Not lively enough? A group of people came from the left, an eagle came from the right, and an airship was added in the air. You could have anything you wanted.

Prompt: On the left, a group of onlookers; on the right, an eagle perched on the road; and in the distance, an airship flying overhead.

Next, ChatGPT prints this image directly onto a T-shirt.

Prompt: Get a T-shirt that hangs on a clothesline and print the entire image I just mentioned on it, a full-coverage print.

Finally, ChatGPT can also get the skateboarder to wear this T-shirt.

Prompt: Could you please give that T-shirt that was hanging on the clothesline to that skateboarder?

From the two demos above, it's easy to see ChatGPT's ability to precisely edit images with the support of GPT Image 1.5.

It allows you to make precise changes wherever you point to it, and it maintains consistency even after multiple rounds of editing.

Exceptional creativity, mastery of detail

Editing is fundamental, but GPT Image 1.5’s creativity shines especially bright during its “major transformation.”

It can bring ideas to life by changing and adding elements—such as text and typography—while retaining important details.

These transformations work for both simple concepts and complex ideas. And with the new ChatGPT Images feature, you can get started right away using preset styles and ideas, without needing prompts.

For example, upload a photo of the two of you to create a ChatGPT movie-style poster.

Prompt: Use these two photos to create an old-school Hollywood Golden Age style movie poster for the film *Codex*. Change the costumes as you like, as long as they fit the era. Rename the actors Wojciech Zaremba (left) and Greg Brockman (right). Directed by Sam Altman, produced by Fidji Simo. The production company should be listed as: A Feel the AGI Pictures Production.

As you can see, the generated creative image immediately exudes a powerful aura. Moreover, the details of the text in the instructions are perfectly reproduced in the image.

Next, turn Altman into an 80s fitness instructor, with voluminous hair, a headband, and wristbands.

Prompt: Transform me into a fitness instructor in the iconic 80s VHS videotape style, preserving my original facial structure and expressions. Apply authentic 80s photo and video effects directly to my face: soft glow, slight blur, a touch of noise, subtle color bleeding, and those faint scan lines that affect skin tone and edge details. The styling should be vibrant 80s workout attire, with a terry cloth headband, wristbands, and neon-colored sportswear. The hair should be styled in the 80s-style voluminous way, following its natural length and texture. Bright, retro makeup can be added if it complements the overall look. Use soft, pastel studio lighting, combined with a slightly degraded VHS aesthetic, to create an analog signal atmosphere for both face and body. The visuals should depict me leading a group in aerobic exercise. Add relevant text on the screen.

It must be said that ChatGPT has captured the essence of the 1980s.

ChatGPT can even generate a "glam doll" style portrait with a single click.

Prompt: Create a highly stylized 3D floating head, portraying the protagonist as spoiled, charming, and utterly indifferent: half-closed eyes, raised eyebrows, and a slight upturn at the corners of the mouth, exuding the classic "sassy babe" vibe. The skin should have a smooth, glossy, gel-like texture, with strong highlights on the cheekbones and nose, capturing soft studio lighting. The eyeshadow should be holographically polarized, grading from purple to blue with clear reflective points. The hair should be thick, smooth, and glossy, styled into sculpted waves or a sleek updo, reflecting light like polished acrylic. Add a small chrome nose ring (stud or ring) with a brushed metallic sheen. The head should float on a pure white, neutral background, tilted at a 15-degree angle, like a high-end product rendering. Use bright, diffused studio lighting, avoiding hard shadows, emphasizing gloss, a plastic-like feel, and subsurface scattering, achieving realistic depth. The emotion should be spoiled, fashionable, and detachedly cool. The camera position should be a close-up portrait, looking directly at the lens, with an 85mm focal length. The material should be ultra-smooth, high-gloss, cartoonish-style plastic-like skin, lips, and hair.

Altman instantly transformed into a "sharp and sassy girl"!

Even more interestingly, the character (Altman) can also be turned into a hanging ornament.

Prompt: Transform me into a pendant sculpted from glossy molded glass, with a high-gloss lacquer finish that reflects light from every angle. The touch should be incredibly smooth and cool, its weight conveying both fragility and timelessness. The coating should have a glossy enamel quality, somewhere between ceramic and candy icing—highly reflective, allowing tiny highlights to bloom like soft sparks on the curved surfaces. Use delicate metal embellishments to outline the contours and textures—dots, lines, and gold and silver filigree patterns that shimmer in motion. These should be done with fine glitter or microbeads to create a handcrafted feel; the shimmer should be layered, not flat, allowing light to seem to dance on raised details. The entire piece should exude a vintage, kitschy luxury: vibrant blocks of color clashing with shimmering accents, both playful and deliberate. The curved areas receiving light should have a subtle iridescent sheen—tinged with gold, rose, and pearl. The surface texture should look edible, like glazed candy or melted candy poured into a mold. Suspended by delicate gold rings and fine cords, this ornament should hover with a gentle drama, both festive and sculptural. It should be an iconic yet humorous piece, a statement item that sparkles under Christmas tree lights or studio lights—embodying camp elegance and handcrafted nostalgia.

It's a Santa Claus themed one.

Let's take Altman's iconic pink outfit from his WWDC speech at Apple and see how it looks on him now.

Prompt: Use the uploaded image as the main reference. Transform it into a minimalist 90s American fashion advertisement. Completely preserve the subject's facial features, proportions, pose, and expression. Retain the original color of the double-collar polo shirt. Style: Clean, understated, high-end fashion magazine style. Outfit: Double-layered polo look (one polo layered over another), classic cut, neutral or slightly soft colors. Scene: Seamless studio backdrop, simple composition. Lighting: Soft, even studio lighting with soft shadows; natural skin tone. Mood: Confident, relaxed, and timeless. Brand: GPT-Shirt. Photography Style: Medium format film quality, subtle grain, restrained contrast.

As for aura, it depends on the person themselves.

Some of the most popular "Barbie dress-up" games among girls can now be played on ChatGPT.

Prompt: Put him in a 2000s dress-up game interface, and make the whole environment pink. Make sure the sunglasses are in the outfit too.

Altman's wardrobe perfectly matches his character. Even the sunglasses requested in the prompt appear in the picture.

ChatGPT can also turn Altman into a classic: "Girl with a Pearl Earring".

Prompt: Put me in the painting "Girl with a Pearl Earring".

He can also endorse beverage commercials.

Prompt: Use the face of the man in the attachment to create a retro soda ad for a new beverage called "SOTA" (a new type of soda). The slogan should be: "Nothing artificial about it." The style should faithfully recreate that era.

After this round of demonstrations, ChatGPT Images shows even more imaginative creativity. Compared with GPT-4o's image generation, which was best known for its Ghibli-style output, the new image model is far more innovative.

More responsive to human commands, perfectly reproducing the 6x6 grid.

Compared to the initial version of GPT Image, version 1.5 has a stronger ability to follow commands.

This makes more precise editing and more complex original compositions possible, while the relationships between elements are preserved as expected.

Let's jump straight to hellish difficulty: a 6x6 grid with 36 elements. GPT Image 1.5 renders it accurately, with not a single element missing.

Prompt:

Create a 6-column, 6-row grid chart with the following content:

Row 1: Greek letter β, beach ball, lemon, robot, fish tank, frog
Row 2: Mantis, luxury watch, bathtub, sunglasses, colorful butterfly, envelope
Row 3: Stamps, picture frames, steaming dumplings, the word "miracle," skis, the letter Z
Row 4: Toilet, Metro token, Mute icon, Perfume, Dragonfly, Skateboard helmet
Row 5: Bluetooth icon, number 13, green heart, Rubik's Cube, Canada Goose, soldier's helmet
Row 6: White dog, life jacket, knot, keyboard, tissue box, number 14

Left: New model; Right: Old model

Clear text rendering, direct output programming

The new model has taken another step forward in text rendering, capable of handling denser text with smaller font sizes.

The following image illustrates GPT-5.2 and ChatGPT's striking Markdown rendering capabilities.

Prompt:

The calorie infographic below is incredibly detailed.

Prompt:

ChatGPT can even render complex programming interfaces.

Prompt:

Further improvements

The new model also features improvements in other dimensions, making the output more direct and usable.

For example, it can draw many faces well, and they look more natural.

ChatGPT Image generates a picture of London in the 1970s. The difference between the new (left) and old (right) versions is obvious.

Version 1.5 is more detailed and realistic in terms of facial features.

Prompt: Create a street scene of Chelsea, London in the 1970s, with photorealistic, full-focus, and incredibly detailed imagery. The street should be packed with people, and there should be a bus with an advertisement for "ImageGen 1.5," along with the OpenAI logo and subtitle "Create what you imagine." The overall style should be hyper-realistic amateur photography, like a snapshot taken with an iPhone…

For example, regarding the "grand scene" of a huge crowd, the new version (bottom left) is more realistic and natural, while the old version (bottom right) looks outdated at first glance.

Prompt: A massive scene of tens of thousands of people at the Golden Gate Bridge. Every face in the crowd is clearly visible.

A diver plays the piano underwater; the new version (left) is more realistic and has a more human touch.

Prompt: A diver plays the piano underwater, with mermaids watching. A hyper-realistic amateur photographic style.

Let it generate a photo with glare. In the comparison below, you can immediately see that the effect on the right looks fake.

Prompt: Create an image that includes a printed vintage photograph. The photograph should show a young Asian man and a young white man in a bar, both wearing Santa hats, one of them holding a drink. The printed photograph should show reflections from a camera flash. It should also have a visible thin white border and be slightly tilted.

To reach new heights

To evaluate performance, OpenAI re-ran many of the examples from the ChatGPT Images 1.0 release.

In various cases, the new model showed significant improvements, although the results are still not perfect. While this version represents meaningful progress, there is still considerable room for improvement in future iterations.

For example, the new (left) version shows cross-sections of marine life at different depths in a Japanese anime style, but the style is clearly not as consistent with the "Japanese anime style" as the old version (right).

Prompt: Create a poster of deep-sea creatures, showcasing different depths. Use a vertical cross-section of the ocean, with a highly detailed and aesthetically pleasing Japanese anime style.

The new version (left) also shows a clear misunderstanding of the dark fantasy anime style compared to the old version (right):

Prompt: Draw me a portrait, in a dark fantasy anime style.

OpenAI admits that its ability to generate certain art styles has regressed compared to previous versions.

The solution is to try using the preset filters in the "Image" function; that should help. Additionally, the previous version of ChatGPT Images has now been made into a custom GPT, so you can use the older version directly.

Another major limitation is that the new model cannot reliably edit large group photos (above), and facial features are easily distorted after processing (below).

Prompt: Could you put them all in T-shirts with "OpenAI" printed on them, and make everyone smile?

With a large number of people, it becomes difficult for the new model to accurately maintain the facial features of each individual during image editing.

Another major limitation is multilingual text rendering, which presents numerous problems.

Chinese already comes out badly, not to mention other non-English languages like Arabic and Hebrew.

Prompt: Could you draw a diagram listing some common phrases for ordering food in Chinese?

API: 20% cheaper

GPT Image 1.5 in the API offers all the same improvements as ChatGPT Images.

For example, it maintains greater consistency in brand logos and key visuals across multiple edits, making it ideal for marketing and branding efforts such as graphic and logo design, as well as for e-commerce teams to generate a complete product gallery (different variations, scenes, and angles) from a single source image.
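As a rough illustration of the product-gallery idea, the sketch below enumerates one edit prompt per combination of variation, scene, and angle. The function name, the prompt wording, and the sample values are all hypothetical and not taken from OpenAI's documentation.

```python
from itertools import product


def gallery_prompts(product_name, variations, scenes, angles):
    """Build one edit prompt per (variation, scene, angle) combination."""
    return [
        f"Using the source image, show the {product_name} in {variation}, "
        f"{scene}, {angle}. Keep the logo and key visuals unchanged."
        for variation, scene, angle in product(variations, scenes, angles)
    ]


# Hypothetical sample values for illustration only.
prompts = gallery_prompts(
    "sneaker",
    variations=["a white colorway", "a black colorway"],
    scenes=["on a studio backdrop", "on a city street"],
    angles=["front view", "side view", "top view"],
)

print(len(prompts))
print(prompts[0])
```

Each prompt could then be sent together with the single source image to an image editing endpoint, for example `client.images.edit(...)` in the OpenAI Python SDK; the exact model identifier to pass is not given in this article and would need to be checked against the API documentation.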

Compared to GPT Image 1, GPT Image 1.5 is now 20% cheaper for both image input and output, so you can generate and iterate more images with the same budget.
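A quick worked example of what "20% cheaper" means for a fixed budget. The unit price and budget below are hypothetical normalized values; only the 20% reduction comes from the article.

```python
def images_per_budget(budget: float, price_per_image: float) -> float:
    """Number of images a budget buys at a given per-image price."""
    return budget / price_per_image


old_price = 1.0                       # hypothetical normalized price
new_price = old_price * (1 - 0.20)    # 20% cheaper, per the article
budget = 100.0                        # hypothetical budget in the same units

old_count = images_per_budget(budget, old_price)
new_count = images_per_budget(budget, new_price)
extra = new_count / old_count - 1

print(f"{extra:.0%} more images for the same budget")
```

A 20% price cut therefore stretches the same budget to 1 / 0.8 = 1.25 times as many images, that is, 25% more rather than 20% more.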

Currently, businesses and startups across various industries, including creative tools, e-commerce, and marketing software, are already using GPT Image 1.5.

Altman personally sounded the emergency alarm.

This update is also a strong response to Google.

Just last month, Altman urgently issued a "code red" because Google Gemini was seizing market share.

At the time, Google had just released its new flagship model, Gemini 3, along with the image generation tool Nano Banana Pro, which topped the LMArena leaderboard in multiple benchmark tests.

Faced with Google's relentless pressure, OpenAI has quickly accelerated its pace: about five days ago, it released GPT-5.2; now, it has launched an upgraded image model.

This update from OpenAI is clearly a direct challenge to Google's Nano Banana Pro, which has an excellent reputation among developers.

OpenAI's competitors extend well beyond Google.

In August of this year, Qwen-Image already supported the generation of readable text in both Chinese and English; Black Forest Labs also released the open-source image model FLUX.2, demonstrating its impressive capabilities.

This battle over AI image models has clearly entered a heated phase, with a clear objective: to win the enterprise market.

Altman emphasized the numerous improvements and new editing features of the new model.

OpenAI's CEO of Applications, Fidji Simo, subtly drew a contrast with Google's Nano Banana Pro, stating that this release marks a shift from text-only to dynamic AI experiences.

She believes that ChatGPT Images and other features can shorten the distance between "what you think" and "what you get".

Human thought extends far beyond words. In fact, the most captivating inspirations often begin as a picture, a melody, a movement, or a pattern in our minds. If AI is to help us unleash our full potential, it must use the ways we are accustomed to—to understand, to express, and to communicate.

Fidji Simo revealed that, in addition to the image generator, OpenAI is also comprehensively upgrading the visual experience of ChatGPT:

For the past few months, I've been talking about the evolution of ChatGPT: it's transforming from a passive, text-based product into a more intuitive, intelligent assistant that's more closely tied to the tasks you want to accomplish.
The shift from plain text to multimodal and dynamic UIs (user interfaces) is a key part of this transformation, and I am very excited to see these developments.

Fidji Simo revealed that in the future, users will see more visual information and clearer sources when searching for answers. For example, in scenarios such as unit conversions or checking match scores, diagrams will be more intuitive than text.

However, netizens who have experienced GPT Image 1.5 and Nano Banana Pro bluntly stated that OpenAI has "run out of ideas" this time:

A meme featuring a frog's head mocking Altman has begun circulating:

The image of Sad Frog or Pepe the Frog (left) closely resembles OpenAI's teaser image (right) – the same background, the same text, the same gaze, and similar clothing.

This does seem to be a jab at Altman.

But the harshest comments came from users who replied directly under OpenAI's tweet to rebut it:

OpenAI is completely finished.

Under the same prompts, Nano Banana generates more realistic and natural images compared to GPT Image 1.5, which is extremely advantageous for e-commerce creative materials.

In the image below, the top two images were generated by GPT Image 1.5, and the bottom two images were generated by Nano Banana Pro.

The caption reads: "A 53-year-old white German man in a bedroom, a typical Italian-style bedroom, with boxes and books piled on shelves, a desk in the background with an iMac and papers scattered around, wearing a gray hoodie (with a simple logo), a wedding ring, and a subtle red bracelet on his wrist, looking directly at the camera in a natural, candid user-generated content style."

However, some users commented that the "Musk and Altman Christmas photo" generated by GPT Image 1.5 was so realistic that there was not a single flaw in it.

Considering that Gemini 3.0 Flash is coming soon, the new Nano Banana image generation function may be faster and cheaper. It is unknown whether OpenAI's Image 1.5 is a "futile struggle".

It is certain that OpenAI's emergency alerts are not going to stop anytime soon.

Reference: HYJ

https://x.com/OpenAI/status/2000990989629161873

https://openai.com/index/new-chatgpt-images-is-here/

This article is from the WeChat official account "New Zhiyuan", author: New Zhiyuan, and published with authorization from 36Kr.
