[Guide] FLUX.1 Kontext is a next-generation model that unifies instant text-based image editing and text-to-image generation. It accepts both text and image prompts, delivers strong character consistency, and runs up to 8 times faster than GPT-Image-1.
When generating and editing images with AI, what do you do if you want to build a complete story template, but the protagonist "changes face" faster than you can turn the page?
Don't panic: the brand-new image model FLUX.1 Kontext is here. It supports in-context image generation, accepts both text and image prompts, and can seamlessly extract and modify visual concepts to produce new, coherent images.
Paper address: https://bfl.ai/announcements/flux-1-kontext
FLUX.1 Kontext is a series of generative flow matching models that can generate and edit images. Unlike existing text-to-image models, the FLUX.1 Kontext series supports in-context image generation.
Consistent and Context-Aware Text and Image Generation and Editing
Your Image, Your Text, Your World
FLUX.1 Kontext marks an important extension of classic text-to-image models by merging instant text-image editing with text-to-image generation.
As a multimodal flow model, it combines state-of-the-art character consistency, context understanding, and local editing capabilities, while possessing powerful text-to-image synthesis abilities.
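For readers unfamiliar with flow matching, the training objective can be summarized as regressing a velocity field along a straight path from noise to data. The snippet below is a minimal, generic sketch of such a loss, not the authors' code; it assumes a hypothetical `model(x_t, t, cond)` interface that predicts the velocity at an interpolated point.

```python
import torch

def flow_matching_loss(model, x1, cond):
    """Minimal (rectified) flow matching loss sketch.

    x1:   clean latents, shape (B, C, H, W)
    cond: conditioning (e.g. text / context-image embeddings)
    The model is assumed to predict the velocity v = x1 - x0
    at the interpolated point x_t = (1 - t) * x0 + t * x1.
    """
    b = x1.shape[0]
    x0 = torch.randn_like(x1)                       # pure noise sample
    t = torch.rand(b, device=x1.device).view(b, 1, 1, 1)
    x_t = (1.0 - t) * x0 + t * x1                   # linear interpolation path
    target_v = x1 - x0                              # constant velocity along the path
    pred_v = model(x_t, t.flatten(), cond)          # network predicts the velocity
    return torch.mean((pred_v - target_v) ** 2)
```

Conventions for the time variable and the direction of the velocity vary between papers; the sketch only illustrates the general shape of the objective.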
Implementation Details
Starting from a checkpoint of a pure text-to-image model, the model is jointly fine-tuned on image-to-image generation and text-to-image generation tasks.
Although the approach naturally supports multiple input images, the current release focuses on a single image as the conditioning input.
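A common way to condition a diffusion transformer on a reference image is to append the encoded context-image tokens to the target-image token sequence so that joint self-attention sees both. The sketch below only illustrates that general idea with made-up names (`TokenConcatConditioning`, a small stack of standard encoder layers); it is not the actual FLUX.1 Kontext architecture.

```python
import torch
import torch.nn as nn

class TokenConcatConditioning(nn.Module):
    """Illustrative only: condition on a single context image by
    concatenating its tokens with the noisy target tokens along the
    sequence dimension, then running joint self-attention blocks."""

    def __init__(self, dim: int = 1024, depth: int = 2, heads: int = 16):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, heads, batch_first=True)
             for _ in range(depth)]
        )

    def forward(self, target_tokens, context_tokens):
        # target_tokens:  (B, N_tgt, dim) noisy latent tokens being denoised
        # context_tokens: (B, N_ctx, dim) tokens of the clean reference image
        n_tgt = target_tokens.shape[1]
        x = torch.cat([target_tokens, context_tokens], dim=1)  # joint sequence
        for blk in self.blocks:
            x = blk(x)                      # attention sees both token groups
        return x[:, :n_tgt]                 # only the target tokens are denoised
```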
FLUX.1 Kontext [pro] is first trained with a flow-matching objective and then with latent adversarial diffusion distillation (LADD). Applying the guidance-distillation technique proposed by Meng et al. to a 12-billion-parameter diffusion Transformer then yields FLUX.1 Kontext [dev].
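Guidance distillation in the spirit of Meng et al. trains a student that receives the guidance scale as an extra input and matches the teacher's classifier-free-guided prediction. The sketch below is a generic illustration with assumed interfaces (`teacher(x_t, t, cond)`, `student(x_t, t, cond, w)`), not the released training code.

```python
import torch

def guidance_distillation_loss(student, teacher, x_t, t, cond, uncond,
                               w_range=(1.0, 8.0)):
    """Distill classifier-free guidance into a single student forward pass.

    teacher(x_t, t, cond) returns a velocity/noise prediction.
    The student additionally takes the guidance scale w as input.
    """
    b = x_t.shape[0]
    # Sample a random guidance scale per example.
    w = torch.empty(b, device=x_t.device).uniform_(*w_range)
    with torch.no_grad():
        v_cond = teacher(x_t, t, cond)        # conditional prediction
        v_uncond = teacher(x_t, t, uncond)    # unconditional prediction
        target = v_uncond + w.view(-1, 1, 1, 1) * (v_cond - v_uncond)  # CFG output
    pred = student(x_t, t, cond, w)           # one pass, guidance scale as input
    return torch.mean((pred - target) ** 2)
```

The payoff of this distillation is that sampling needs one model evaluation per step instead of the two required by classifier-free guidance.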
To improve the editing performance of FLUX.1 Kontext [dev], training focuses exclusively on image-to-image generation; pure text-to-image generation is not trained.
To prevent the generation of non-consensual intimate images (NCII) and child sexual exploitation material (CSEM), safety training mechanisms are introduced, including classifier-based screening and adversarial training.
Researchers use FSDP2 combined with mixed-precision training: all-gather operations use bfloat16, while reduce-scatter operations for gradients use float32 to improve numerical stability.
Selective activation checkpointing is also used to reduce peak memory usage.
To increase throughput, Flash Attention is adopted, and compilation optimizations are applied per Transformer block; a sketch of these infrastructure choices follows below.
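The snippet below is a minimal sketch of how these choices could be wired together, assuming a recent PyTorch with the FSDP2 entry points (`fully_shard`, `MixedPrecisionPolicy`), a hypothetical model exposing a `.blocks` list of transformer blocks, and attention inside those blocks already using FlashAttention-style kernels (e.g. `scaled_dot_product_attention`). It is not the released training code.

```python
import torch
from torch.distributed.fsdp import fully_shard, MixedPrecisionPolicy  # FSDP2 API in recent PyTorch
from torch.utils.checkpoint import checkpoint

def apply_training_infra(model: torch.nn.Module) -> torch.nn.Module:
    """Illustrative wiring of the infrastructure choices described above.
    Assumes torch.distributed is already initialized and that `model.blocks`
    is a list of transformer blocks (a hypothetical attribute)."""
    # All-gather parameters in bfloat16, reduce-scatter gradients in float32.
    mp_policy = MixedPrecisionPolicy(param_dtype=torch.bfloat16,
                                     reduce_dtype=torch.float32)

    for i, block in enumerate(model.blocks):
        fwd = torch.compile(block.forward)      # per-block ("regional") compilation
        if i % 2 == 0:                          # selective activation checkpointing
            fwd = _checkpointed(fwd)
        block.forward = fwd
        fully_shard(block, mp_policy=mp_policy)  # shard each block separately

    fully_shard(model, mp_policy=mp_policy)      # root shard for remaining params
    return model

def _checkpointed(fn):
    def wrapped(*args, **kwargs):
        return checkpoint(fn, *args, use_reentrant=False, **kwargs)
    return wrapped
```

Checkpointing only alternating blocks is an arbitrary selection policy chosen here for illustration; the report only states that the checkpointing is selective.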
Example on a photograph: (a) the input image, showing a complete outfit; (b) the extracted dress on a white background, in a product-photography style; (c) a close-up of the dress fabric, highlighting its texture and pattern details.
In practice, FLUX.1 Kontext still has some limitations; for example, excessive multi-round editing can introduce visual artifacts and degrade image quality.
Nevertheless, when different models are used for iterative editing from the same initial image with identical editing prompts (top: FLUX.1 Kontext; middle: gpt-image-1; bottom: Runway Gen4), FLUX.1 Kontext preserves facial features noticeably better than the other models.
The release of FLUX.1 Kontext and KontextBench provides a solid foundation and comprehensive evaluation framework for unified research in image generation and editing, promising to drive continuous progress in this field.
References:
https://bfl.ai/announcements/flux-1-kontext
https://cdn.sanity.io/files/gsvmb6gz/production/880b072208997108f87e5d2729d8a8be481310b5.pdf
This article is from the WeChat public account "New Intelligence", editor: Ding Hui, published with authorization from 36Kr.




