flux-2-klein-4b-base-edit

FLUX-2

Flux 2 [klein] 4B from Black Forest Labs enables precise image-to-image editing using natural-language instructions and hex color control.

Avg Run Time: 10.000s

Model Slug: flux-2-klein-4b-base-edit

Pricing: $0.001 per megapixel of output.

API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
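
A minimal sketch of this step in Python, assuming a hypothetical endpoint URL, header name, and payload shape; the actual values are defined in the Eachlabs API reference:

```python
import requests

API_KEY = "YOUR_EACHLABS_API_KEY"

# The URL, header name, and response fields below are placeholders, not the
# documented Eachlabs API -- check the API reference for the real ones.
resp = requests.post(
    "https://api.eachlabs.ai/v1/prediction",         # placeholder endpoint
    headers={"X-API-Key": API_KEY},                   # placeholder auth header
    json={
        "model": "flux-2-klein-4b-base-edit",         # model slug from this page
        "input": {
            "prompt": "Recolor the jacket to deep teal (#0F5E68)",
            "image_url": "https://example.com/source.png",
        },
    },
    timeout=30,
)
resp.raise_for_status()
prediction_id = resp.json()["id"]  # assumption: the response carries a prediction ID
```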

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
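
A matching polling loop, again as a sketch; the result URL, status field, and terminal status values are assumptions rather than documented API details:

```python
import time
import requests

def wait_for_result(prediction_id: str, api_key: str, interval: float = 2.0) -> dict:
    """Poll the prediction endpoint until a terminal status is returned."""
    url = f"https://api.eachlabs.ai/v1/prediction/{prediction_id}"  # placeholder endpoint
    while True:
        resp = requests.get(url, headers={"X-API-Key": api_key}, timeout=30)
        resp.raise_for_status()
        body = resp.json()
        status = body.get("status")          # assumption: a "status" field is returned
        if status == "success":
            return body
        if status in ("failed", "canceled"):
            raise RuntimeError(f"Prediction ended with status: {status}")
        time.sleep(interval)                 # wait before checking again
```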

Readme

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview

Flux 2 [klein] 4B Base is a compact image generation and editing model developed by Black Forest Labs, designed to deliver professional-grade visual quality while maintaining accessibility for consumer hardware. The model unifies text-to-image generation, single-image editing, and multi-reference image editing capabilities within a single 4-billion parameter architecture, eliminating the need for separate specialized models. Built on a rectified flow transformer foundation with a Qwen3-based text encoder, this model represents a significant advancement in efficient visual intelligence by achieving state-of-the-art quality with inference times under one second on high-end consumer GPUs. The 4B Base variant is fully open under the Apache 2.0 license, making it suitable for both commercial and research applications, and distinguishes itself through its ability to handle complex editing tasks while running on modest hardware configurations.

The model family was engineered specifically to address the latency and resource constraints that have historically limited real-time creative workflows. Unlike distilled variants optimized purely for speed, the Base version preserves the complete training signal without distillation, providing maximum flexibility for fine-tuning, LoRA training, and custom pipeline development. This makes it particularly valuable for users who prioritize output quality and customization over raw inference speed, while still maintaining practical performance characteristics suitable for production environments.

Technical Specifications

  • Architecture: Rectified flow transformer with a Qwen3-based text encoder
  • Parameters: 4 billion
  • Resolution: Supports 1024x1024 and higher resolutions; capable of generating 4MP images
  • Input/Output formats: Text prompts for generation; image inputs for editing; multi-reference image inputs supported
  • Inference steps: 25-50 per generation or edit (configurable)
  • VRAM requirements: Approximately 12-13GB for standard operation on consumer GPUs such as the RTX 3090/4070
  • Quantization support: FP8 (up to 1.6x faster, 40% less VRAM), NVFP4 (up to 2.7x faster, 55% less VRAM)
  • Generation speed: Sub-second inference on an RTX 5090 at standard resolutions (as low as 1.2 seconds for 4MP images); practical performance on 8-12GB VRAM setups
  • License: Apache 2.0 (fully open for commercial use)
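
For local use, the step count and precision settings above map onto a standard pipeline configuration. A minimal sketch assuming a diffusers-style pipeline; the repository ID is a placeholder, and the exact pipeline class and editing parameters for Flux 2 [klein] may differ:

```python
import torch
from diffusers import DiffusionPipeline

# Placeholder repo ID -- substitute the official Flux 2 [klein] 4B weights.
pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-4b",
    torch_dtype=torch.bfloat16,   # FP8/NVFP4 builds reduce VRAM further
)
pipe.to("cuda")

# 25-50 steps trades generation time against quality on the Base variant.
image = pipe(
    prompt="A product photo of a ceramic mug on a walnut desk, soft window light",
    num_inference_steps=28,
).images[0]
image.save("output.png")
```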

Key Considerations

  • The Base variant uses 25-50 inference steps, providing superior quality compared to distilled models but requiring more computational resources and longer generation times
  • VRAM management is critical; the model requires approximately 12-13GB VRAM for comfortable operation, making it suitable for mid-range consumer GPUs but not entry-level hardware
  • Multi-reference editing performance is significantly better on the 9B models; the 4B variant handles single-image edits more reliably than complex multi-image compositions
  • Prompt engineering should be precise and detailed for optimal results, particularly when specifying editing instructions or color values (an example follows this list)
  • The model demonstrates high resilience against violative inputs, having undergone third-party safety evaluation prior to release
  • For production deployments requiring maximum speed, consider the distilled variants; the Base version prioritizes quality and customization flexibility
  • Fine-tuning and LoRA training are viable options with the Base variant due to preserved training signal, but require appropriate GPU resources
  • Character consistency and spatial logic are strong points, making the model suitable for character-focused creative work
  • Iterative refinement is recommended for complex editing tasks; multiple renders may be necessary for achieving specific multi-image edit results
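
As an illustration of the prompt-precision point above (a hypothetical instruction, not taken from the model's documentation), an effective edit prompt pairs a spatial reference with an explicit hex value:

"Recolor the car in the left third of the frame to matte teal (#0F5E68); keep the reflections, background, and lighting unchanged."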

Tips & Tricks

  • Use FP8 quantization to reduce VRAM requirements by approximately 40% while maintaining quality, enabling operation on lower-end consumer GPUs
  • Structure editing prompts with specific spatial references and descriptive language to improve consistency in single-image edits
  • For character work, leverage the model's native character consistency capabilities by providing clear reference images and detailed character descriptions
  • When performing multi-reference editing, start with simpler compositions and gradually increase complexity rather than attempting complex blends immediately
  • Experiment with step counts between 25 and 50 to find the optimal balance between quality and generation time for your specific use case
  • Use hex color control in editing prompts to achieve precise color modifications without affecting other image elements
  • For production workflows, implement caching strategies to avoid redundant generations when iterating on similar prompts (see the sketch after this list)
  • Combine the model with LoRA fine-tuning for domain-specific applications, leveraging the preserved training signal in the Base variant
  • Test prompts on smaller batches first to validate output quality before committing to large-scale generation runs
  • When using the model for commercial applications, ensure compliance with the Apache 2.0 license requirements
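
One way to realize the caching tip above, sketched in Python; the cache key fields and on-disk layout are illustrative assumptions rather than part of any Eachlabs SDK:

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path("prediction_cache")
CACHE_DIR.mkdir(exist_ok=True)

def cache_key(model: str, inputs: dict) -> str:
    """Derive a stable key from the model slug and its inputs."""
    payload = json.dumps({"model": model, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def get_or_run(model: str, inputs: dict, run_prediction) -> dict:
    """Return a cached result for a repeated request; otherwise run and store it."""
    path = CACHE_DIR / f"{cache_key(model, inputs)}.json"
    if path.exists():
        return json.loads(path.read_text())
    result = run_prediction(model, inputs)   # e.g. create + poll as in the API & SDK section
    path.write_text(json.dumps(result))
    return result
```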

Capabilities

  • Photorealistic image generation from natural language text descriptions with high output diversity
  • Precise image-to-image editing using natural language instructions and hex color control
  • Multi-reference image editing, allowing users to blend concepts and iterate on complex compositions
  • Unified architecture supporting text-to-image, single-image editing, and multi-reference editing without model switching
  • Sub-second inference speed on modern consumer hardware, enabling interactive creative workflows
  • Strong character consistency and spatial logic for character-focused applications
  • Fine-tuning and LoRA training capabilities due to undistilled architecture preserving complete training signal
  • High resilience against violative inputs, demonstrated through third-party safety evaluation
  • Flexible inference step configuration (25-50 steps) allowing quality-speed trade-off optimization
  • Quantization support enabling operation on lower-VRAM hardware without significant quality degradation
  • Professional-grade output quality that matches or exceeds larger competing models while using significantly less computational resources

What Can I Use It For?

  • Interactive creative workflows requiring real-time image generation and editing capabilities
  • Character design and iteration for games, animation, and digital art projects
  • Product visualization and mockup generation for e-commerce and design applications
  • Background modification and scene composition for photography and digital art
  • Rapid prototyping of visual concepts for design and creative industries
  • Local development and edge deployment scenarios where cloud connectivity is unavailable or undesirable
  • Fine-tuning for domain-specific applications such as architectural visualization, fashion design, or medical imaging
  • Educational and research applications exploring image generation and editing techniques
  • Production deployments requiring cost-effective visual content generation at scale
  • Iterative design workflows where multiple refinements and variations are needed quickly

Things to Be Aware Of

  • The 4B Base model sometimes produces slightly over-processed or "overcooked" results compared to distilled variants, particularly when using maximum step counts
  • Multi-image editing results can be inconsistent on the 4B model and typically require multiple renders and careful prompt engineering to achieve the desired outcome
  • The model performs significantly better on single-image edits than complex multi-reference compositions; users should manage expectations accordingly
  • VRAM requirements of 12-13GB limit accessibility to users with mid-range or higher consumer GPUs; entry-level hardware may struggle
  • Generation speed, while sub-second on high-end cards like RTX 5090, increases noticeably on lower-tier consumer GPUs
  • The model demonstrates high character consistency and spatial logic, which users consistently report as a major strength in community discussions
  • Users report that the model delivers professional-grade quality suitable for production use despite its compact 4B parameter size
  • The Apache 2.0 license provides commercial freedom, which users appreciate for business applications
  • Community feedback indicates the model represents excellent value for users seeking to run capable image generation locally without cloud dependencies
  • Users note that prompt precision significantly impacts editing quality, particularly for color control and spatial modifications

Limitations

  • Multi-reference image editing performance is notably weaker than the 9B variants; complex compositions with multiple reference images may produce inconsistent results requiring multiple iterations
  • The 4B model is not optimal for commercial text-to-image workflows where maximum quality is the primary concern, as larger models may deliver superior results in some scenarios
  • VRAM requirements of approximately 12-13GB restrict deployment to mid-range consumer hardware and above, limiting accessibility for users with entry-level GPUs