An open-source GPT-4o is here: this 17B Chinese-developed model rivals 4o in image generation quality and can be used commercially

36kr
04-15

Recently, GPT-4o went viral: its dramatic improvement in image generation and editing made everyone want to try it. Although OpenAI later announced that free users can use it too, slow generation and a limited number of attempts still frustrate people without a ChatGPT subscription.

So, besides GPT-4o, are there other options? Just check the text-to-image model arena on Artificial Analysis.

In this arena, we found that the model recently ranked second, the 17B-parameter HiDream-I1, scores very close to GPT-4o.

The AI benchmarking and analysis platform Artificial Analysis announced via tweet that HiDream-I1 has become the new SOTA for open-source text-to-image models. This platform uses an arena mode to evaluate models by simultaneously presenting two images generated by different models and having humans choose the one that best matches the prompt.
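To make the arena mechanism concrete, here is a minimal sketch of how pairwise human votes can be turned into a leaderboard. It assumes a standard Elo update. The article does not say which rating formula Artificial Analysis actually uses, and the model names, starting rating, and K-factor below are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A's image is preferred over B's under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, a_wins: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one human vote."""
    delta = k * ((1.0 if a_wins else 0.0) - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def run_arena(votes, initial: float = 1000.0, k: float = 32.0) -> dict:
    """Replay (winner, loser) votes in order and return the final ratings."""
    ratings = {}
    for winner, loser in votes:
        r_win = ratings.setdefault(winner, initial)
        r_lose = ratings.setdefault(loser, initial)
        ratings[winner], ratings[loser] = update_elo(r_win, r_lose, True, k)
    return ratings


# Hypothetical votes: each tuple is (preferred model, other model).
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
]
ratings = run_arena(votes)
leaderboard = sorted(ratings, key=ratings.get, reverse=True)
print(leaderboard)
```

Each vote moves rating points from the loser to the winner, so a model that keeps winning head-to-head comparisons rises regardless of which opponents it happens to be paired with.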

Notably, this model topped the Artificial Analysis arena leaderboard within 24 hours of its launch, and it is the first generative AI model developed in China to do so.

In side-by-side comparison images, HiDream-I1's output appears to be on par with GPT-4o, and even better than FLUX1.1 [pro], the model that previously "dethroned Midjourney". Importantly, of these three models, only HiDream-I1 is open source, and under a license that permits commercial use (MIT).

HiDream-I1 Model: https://huggingface.co/HiDream-ai/HiDream-I1-Full

HiDream-I1 Code: https://github.com/HiDream-ai/HiDream-I1

Moreover, Zhixiang Future, the Chinese company that open-sourced this model, has just announced that its upcoming open-source model HiDream-E1 will support interactive image editing, letting you change an image's style or content as you can with GPT-4o. Together, the two models achieve a "speak and it happens" effect similar to GPT-4o's image generation and editing, filling the "open-source GPT-4o" gap.

HiDream-E1's image editing effect, with the model to be open-sourced soon.


Prompt: A 3D wolf wearing a musician's tailcoat. Standing upright on two legs like a human, holding a guitar, surrounded by amplifiers and a stage exuding an artistic and elegant atmosphere.

Moreover, this sense of realism and delicacy stems from the model's understanding of objective laws. The image below shows that HiDream-I1 has a relatively precise grasp of them. Whether in the placement of objects, character poses, or the light and shadow of the environment, HiDream-I1's output conforms to the natural laws of the real world. Flux, on the other hand, has certain limitations here: especially with dynamic scenes and complex physical interactions, its output falls short and often shows situations that do not comply with physical laws.

[Image]

Prompt: A 3D cat wearing a musician's tailcoat, standing on two legs, holding a violin, surrounded by rotating musical notes and a grand piano, exuding an artistic and elegant atmosphere, with spotlights illuminating the scene, creating a dramatic and exquisite environment.

Even with complex prompts, the images generated by HiDream-I1 retain these characteristics. This demonstrates the model's text comprehension and prompt-adherence capabilities.

[Image]

HiDream-I1 generated image. Prompt: On the walls of a medieval castle, an armor-clad warrior faces the camera, the flickering flames behind him outlining his rugged facial features. Sparks scatter in the wind across his rusty chainmail, his right hand unconsciously gripping the sword hilt at his waist, the deep brown cloak billowing violently in the heat waves. Burning arrows continue to fall on the distant tower, the orange-red firelight forming a strong contrast with the indigo night sky, illuminating the moss-covered crevices of the wall and the old scar on his brow.

These visual effects have been confirmed in various benchmark test data:

  • First, on HPSv2.1, HiDream-I1 generates images in various styles that are more in line with human aesthetics.
  • Second, on GenEval and DPG-Bench: the former verifies how well generated images match their text prompts by detecting and classifying objects and colors, while the latter focuses on detecting multiple objects, detailed attributes, and complex relationships in generated images (suitable for evaluating long and complex prompts). On these two benchmarks, HiDream-I1 also achieved the best results. This shows that HiDream-I1 has strong instruction-following capabilities.

Zhixiang's R&D personnel revealed that the company's next model, HiDream-E1, will be open-sourced soon, and that related benchmark data will be released in the near future. We look forward to the editing experience this model will bring.
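The detection-based check that the benchmark list attributes to GenEval can be illustrated with a simplified sketch. This is not the GenEval implementation: the `Detection` and `Requirement` types, the "at least N matches" rule, and the sample data are all hypothetical, and a real pipeline would get its detections from an object detector and color classifier run on the generated image.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Detection:
    """One object found in a generated image (hypothetical detector output)."""
    label: str
    color: Optional[str] = None


@dataclass(frozen=True)
class Requirement:
    """One object the prompt asks for, with an optional color and count."""
    label: str
    count: int = 1
    color: Optional[str] = None


def satisfies(detections: List[Detection], requirements: List[Requirement]) -> bool:
    """True if every required object appears often enough with the right color."""
    for req in requirements:
        matches = [
            d for d in detections
            if d.label == req.label and (req.color is None or d.color == req.color)
        ]
        if len(matches) < req.count:
            return False
    return True


def alignment_score(samples: List[Tuple[List[Detection], List[Requirement]]]) -> float:
    """Fraction of (detections, requirements) pairs that pass the check."""
    passed = [satisfies(dets, reqs) for dets, reqs in samples]
    return sum(passed) / len(passed)


# Hypothetical prompt: "a red apple and two dogs"
reqs = [Requirement("apple", color="red"), Requirement("dog", count=2)]
good = [Detection("apple", "red"), Detection("dog"), Detection("dog")]
bad = [Detection("apple", "green"), Detection("dog")]
print(alignment_score([(good, reqs), (bad, reqs)]))
```

The second image fails on two counts (wrong apple color, only one dog), which is the kind of object- and color-level mismatch the article says this benchmark is meant to catch.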

    This article is from the WeChat official account "Machine Heart" (ID: almosthuman2014), author: Machine Heart, published with authorization from 36Kr.
