[Guide] FLUX.1 Kontext is a next-generation model that unifies instant text-based image editing and text-to-image generation. It accepts both text and image prompts, delivers strong character consistency, and runs up to 8 times faster than GPT-Image-1.
When generating and editing images with AI, what do you do if you want to build a complete story template, but the protagonist "changes face" faster than you can turn the page?
Don't panic: the brand-new image model FLUX.1 Kontext is here. It supports in-context image generation, accepts both text and image prompts, and can seamlessly extract and modify visual concepts to produce new, coherent images.
Paper address: https://bfl.ai/announcements/flux-1-kontext
FLUX.1 Kontext is a series of generative flow matching models that can generate and edit images. Unlike existing text-to-image models, the FLUX.1 Kontext series supports in-context image generation.
Consistent and Context-Aware Text and Image Generation and Editing
Your Image, Your Text, Your World
FLUX.1 Kontext marks an important extension of classic text-to-image models by merging instant text-image editing with text-to-image generation.
As a multimodal flow model, it combines state-of-the-art character consistency, context understanding, and local editing capabilities, while possessing powerful text-to-image synthesis abilities.
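For readers unfamiliar with flow matching, the training objective can be summarized as regressing a velocity field along a straight path from noise to data. The snippet below is a minimal, generic sketch of such a loss, not the authors' code; it assumes a hypothetical `model(x_t, t, cond)` interface that predicts the velocity at an interpolated point.

```python
import torch

def flow_matching_loss(model, x1, cond):
    """Minimal (rectified) flow matching loss sketch.

    x1:   clean latents, shape (B, C, H, W)
    cond: conditioning (e.g. text / context-image embeddings)
    The model is assumed to predict the velocity v = x1 - x0
    at the interpolated point x_t = (1 - t) * x0 + t * x1.
    """
    b = x1.shape[0]
    x0 = torch.randn_like(x1)                       # pure noise sample
    t = torch.rand(b, device=x1.device).view(b, 1, 1, 1)
    x_t = (1.0 - t) * x0 + t * x1                   # linear interpolation path
    target_v = x1 - x0                              # constant velocity along the path
    pred_v = model(x_t, t.flatten(), cond)          # network predicts the velocity
    return torch.mean((pred_v - target_v) ** 2)
```

Conventions for the time variable and the direction of the velocity vary between papers; the sketch only illustrates the general shape of the objective.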
Implementation Details
Starting from a checkpoint of a pure text-to-image model, the model is jointly fine-tuned on image-to-image generation and text-to-image generation tasks.
Although the approach naturally supports multiple input images, the current release focuses on a single image as the conditioning input.
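A common way to condition a diffusion transformer on a reference image is to append the encoded context-image tokens to the target-image token sequence so that joint self-attention sees both. The sketch below only illustrates that general idea with made-up names (`TokenConcatConditioning`, a small stack of standard encoder layers); it is not the actual FLUX.1 Kontext architecture.

```python
import torch
import torch.nn as nn

class TokenConcatConditioning(nn.Module):
    """Illustrative only: condition on a single context image by
    concatenating its tokens with the noisy target tokens along the
    sequence dimension, then running joint self-attention blocks."""

    def __init__(self, dim: int = 1024, depth: int = 2, heads: int = 16):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, heads, batch_first=True)
             for _ in range(depth)]
        )

    def forward(self, target_tokens, context_tokens):
        # target_tokens:  (B, N_tgt, dim) noisy latent tokens being denoised
        # context_tokens: (B, N_ctx, dim) tokens of the clean reference image
        n_tgt = target_tokens.shape[1]
        x = torch.cat([target_tokens, context_tokens], dim=1)  # joint sequence
        for blk in self.blocks:
            x = blk(x)                      # attention sees both token groups
        return x[:, :n_tgt]                 # only the target tokens are denoised
```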
FLUX.1 Kontext [pro] is first trained with a flow-matching objective and then with latent adversarial diffusion distillation (LADD). Applying the guidance-distillation technique proposed by Meng et al. to a 12-billion-parameter diffusion Transformer then yields FLUX.1 Kontext [dev].
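Guidance distillation in the spirit of Meng et al. trains a student that receives the guidance scale as an extra input and matches the teacher's classifier-free-guided prediction. The sketch below is a generic illustration with assumed interfaces (`teacher(x_t, t, cond)`, `student(x_t, t, cond, w)`), not the released training code.

```python
import torch

def guidance_distillation_loss(student, teacher, x_t, t, cond, uncond,
                               w_range=(1.0, 8.0)):
    """Distill classifier-free guidance into a single student forward pass.

    teacher(x_t, t, cond) returns a velocity/noise prediction.
    The student additionally takes the guidance scale w as input.
    """
    b = x_t.shape[0]
    # Sample a random guidance scale per example.
    w = torch.empty(b, device=x_t.device).uniform_(*w_range)
    with torch.no_grad():
        v_cond = teacher(x_t, t, cond)        # conditional prediction
        v_uncond = teacher(x_t, t, uncond)    # unconditional prediction
        target = v_uncond + w.view(-1, 1, 1, 1) * (v_cond - v_uncond)  # CFG output
    pred = student(x_t, t, cond, w)           # one pass, guidance scale as input
    return torch.mean((pred - target) ** 2)
```

The payoff of this distillation is that sampling needs one model evaluation per step instead of the two required by classifier-free guidance.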
To improve the editing performance of FLUX.1 Kontext [dev], training focuses exclusively on image-to-image generation; pure text-to-image generation is not trained.
To prevent the generation of non-consensual intimate images (NCII) and child sexual exploitation material (CSEM), safety training mechanisms are introduced, including classifier-based screening and adversarial training.
Researchers use FSDP2 combined with mixed-precision training: all-gather operations use bfloat16, while reduce-scatter operations for gradients use float32 to improve numerical stability.
Selective activation checkpointing is also used to reduce peak memory usage.
To increase throughput, Flash Attention is adopted, and compilation optimizations are applied per Transformer block; a sketch of these infrastructure choices follows below.
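The snippet below is a minimal sketch of how these choices could be wired together, assuming a recent PyTorch with the FSDP2 entry points (`fully_shard`, `MixedPrecisionPolicy`), a hypothetical model exposing a `.blocks` list of transformer blocks, and attention inside those blocks already using FlashAttention-style kernels (e.g. `scaled_dot_product_attention`). It is not the released training code.

```python
import torch
from torch.distributed.fsdp import fully_shard, MixedPrecisionPolicy  # FSDP2 API in recent PyTorch
from torch.utils.checkpoint import checkpoint

def apply_training_infra(model: torch.nn.Module) -> torch.nn.Module:
    """Illustrative wiring of the infrastructure choices described above.
    Assumes torch.distributed is already initialized and that `model.blocks`
    is a list of transformer blocks (a hypothetical attribute)."""
    # All-gather parameters in bfloat16, reduce-scatter gradients in float32.
    mp_policy = MixedPrecisionPolicy(param_dtype=torch.bfloat16,
                                     reduce_dtype=torch.float32)

    for i, block in enumerate(model.blocks):
        fwd = torch.compile(block.forward)      # per-block ("regional") compilation
        if i % 2 == 0:                          # selective activation checkpointing
            fwd = _checkpointed(fwd)
        block.forward = fwd
        fully_shard(block, mp_policy=mp_policy)  # shard each block separately

    fully_shard(model, mp_policy=mp_policy)      # root shard for remaining params
    return model

def _checkpointed(fn):
    def wrapped(*args, **kwargs):
        return checkpoint(fn, *args, use_reentrant=False, **kwargs)
    return wrapped
```

Checkpointing only alternating blocks is an arbitrary selection policy chosen here for illustration; the report only states that the checkpointing is selective.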
Example on a photograph: (a) the input image, showing a complete outfit; (b) the extracted dress on a white background, in a product-photography style; (c) a close-up of the dress fabric, highlighting its texture and pattern details.
In practice, FLUX.1 Kontext still has some limitations; for example, excessive multi-round editing can introduce visual artifacts and degrade image quality.
Nevertheless, when different models are used for iterative editing from the same initial image with identical editing prompts (top: FLUX.1 Kontext; middle: gpt-image-1; bottom: Runway Gen4), FLUX.1 Kontext preserves facial features noticeably better than the other models.
The release of FLUX.1 Kontext and KontextBench provides a solid foundation and comprehensive evaluation framework for unified research in image generation and editing, promising to drive continuous progress in this field.
References:
https://bfl.ai/announcements/flux-1-kontext
https://cdn.sanity.io/files/gsvmb6gz/production/880b072208997108f87e5d2729d8a8be481310b5.pdf
This article is from the WeChat public account "New Intelligence", editor: Ding Hui, published with authorization from 36Kr.




