flux-2-klein-4b-edit

FLUX-2

FLUX.2 [klein] 4B Base from Black Forest Labs provides image-to-image editing with precise natural-language controls and hex color–based adjustments.

Avg Run Time: 7.000s

Model Slug: flux-2-klein-4b-edit

Pricing: $0.001 per megapixel of output.

API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
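
A minimal sketch in Python follows. The endpoint URL, the X-API-Key header, and the payload field names are assumptions based on this page, not the verified schema; copy the exact values from the playground's generated code.

```python
import requests

# Assumed endpoint, auth header, and field names -- illustrative only.
API_URL = "https://api.eachlabs.ai/v1/prediction/"
API_KEY = "your-api-key"

def create_prediction(prompt: str, image: str) -> str:
    """Submit an edit request and return its prediction ID."""
    payload = {
        "model": "flux-2-klein-4b-edit",
        "input": {"prompt": prompt, "image": image},
    }
    resp = requests.post(API_URL, json=payload, headers={"X-API-Key": API_KEY})
    resp.raise_for_status()
    return resp.json()["predictionID"]  # response field name assumed

prediction_id = create_prediction(
    "Change the jacket to deep red leather",
    "https://example.com/input.jpg",
)
```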

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready: each request returns the current status, so check repeatedly until you receive a success status.
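
Continuing the sketch above, a polling helper might look like this; the status strings ("success", "error") and response fields are guesses to verify against the official API reference.

```python
import time

import requests

API_URL = "https://api.eachlabs.ai/v1/prediction/"  # assumed, as above
API_KEY = "your-api-key"

def get_result(prediction_id: str, interval: float = 1.0, timeout: float = 120.0):
    """Poll until the prediction reaches a terminal status, then return its output."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        resp = requests.get(API_URL + prediction_id, headers={"X-API-Key": API_KEY})
        resp.raise_for_status()
        data = resp.json()
        status = data.get("status")
        if status == "success":
            return data["output"]  # e.g. the URL of the edited image
        if status == "error":
            raise RuntimeError(f"prediction failed: {data}")
        time.sleep(interval)  # back off between checks
    raise TimeoutError("prediction did not finish within the timeout")
```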

Readme

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview

FLUX.2 [klein] 4B is a lightweight yet powerful image-to-image editing model developed by Black Forest Labs. It is a 4 billion parameter rectified flow transformer that unifies text-to-image generation and image editing capabilities in a single, compact architecture. The model is the fastest offering in the FLUX family, designed specifically for interactive workflows, real-time previews, and latency-critical applications where speed is essential without sacrificing quality.

The model uses a latent flow matching architecture rather than a traditional diffusion approach, learning direct paths between noise and clean images for improved efficiency and consistency.

What distinguishes FLUX.2 [klein] is its ability to handle complex visual tasks that are typically challenging for smaller models, including accurate text rendering in images, proper spatial reasoning with realistic lighting and shadows, and high-resolution editing up to 4 megapixels while preserving detail and coherence. The model is fully open source under the Apache 2.0 license, enabling commercial use without licensing fees, and runs on consumer-grade GPUs such as the RTX 3090 or 4070.
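
To make the "direct paths" idea concrete, the generic rectified-flow / flow-matching objective can be written as follows. This is the standard formulation, not necessarily Black Forest Labs' exact training recipe:

```latex
% x_0 is Gaussian noise, x_1 the clean latent image; t is sampled in [0, 1].
% The straight-line interpolant and its constant velocity target:
x_t = (1 - t)\,x_0 + t\,x_1
\mathcal{L} = \mathbb{E}_{t,\,x_0,\,x_1}\left\| v_\theta(x_t, t) - (x_1 - x_0) \right\|^2
```

Sampling then integrates the learned velocity field from noise at t = 0 to an image at t = 1, which is why far fewer steps suffice than in classic diffusion samplers.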

Technical Specifications

  • Architecture: rectified flow transformer paired with a vision-language model (based on Mistral 3) and a specialized FLUX.2 VAE autoencoder
  • Parameters: 4 billion
  • Model variants: 4B Base (undistilled, for fine-tuning) and 4B Distilled (4-step, for production speed)
  • Resolution: image generation and editing up to 4 megapixels
  • Input/output formats: text prompts (up to 10,000 characters), image inputs for editing, and multiple reference images for multi-reference editing
  • Inference speed: approximately 1.2 seconds on an RTX 5090 for the Distilled variant; approximately 17 seconds for the Base variant
  • VRAM requirements: 8.4GB for the Distilled model, 9.2GB for the Base model
  • Generation steps: Distilled model optimized for 4 steps; configurable from 1 to 50 to trade quality against speed
  • CFG scale: range 1 to 20, default 3.5, controlling prompt adherence
  • Pricing: $0.012 per image for editing operations, flat-rate regardless of image size

Key Considerations

  • The distilled 4B variant is optimized for speed and production deployments, while the Base variant is better suited for fine-tuning and custom pipelines requiring maximum flexibility
  • CFG scale controls how closely the model follows your prompt; higher values (closer to 20) enforce stricter adherence to descriptions while lower values allow more creative interpretation
  • The model supports negative prompts (optional) to specify what should not appear in the image, with both positive and negative prompts supporting up to 10,000 characters
  • Acceleration settings control the speed vs quality tradeoff, with options for none, low, medium, or high acceleration (default is high)
  • For optimal results with text rendering in images, provide clear and specific descriptions of typography, layout, and text content desired
  • Multi-reference editing requires providing multiple reference images for context-aware composition and consistency across generations
  • Output dimensions can be optionally set or left empty to match input image dimensions
  • The model uses a seed parameter for reproducibility; setting seed to -1 generates random results (see the request sketch after this list)
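
To make these knobs concrete, a hypothetical input block is shown below; every field name (negative_prompt, cfg_scale, acceleration, seed, width/height) is an assumption for illustration, so copy the exact schema from the playground's generated code:

```python
# Hypothetical input block for flux-2-klein-4b-edit; field names are
# illustrative assumptions, not the verified schema.
inputs = {
    "prompt": "Turn the wall behind the desk into exposed brick",
    "image": "https://example.com/office.jpg",
    "negative_prompt": "blur, artifacts, watermark",  # optional, up to 10,000 chars
    "cfg_scale": 3.5,        # 1-20; higher = stricter prompt adherence
    "acceleration": "high",  # none | low | medium | high (default: high)
    "seed": -1,              # -1 = random; fix a value for reproducibility
    # width/height omitted so the output matches the input image dimensions
}
```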

Tips & Tricks

  • Start with the default CFG scale of 3.5 and adjust upward (toward 20) if the model is not following your prompt closely enough, or downward if results feel too constrained
  • For text rendering in images, be explicit about font style, size, and layout requirements; the model handles complex typography better than most smaller models
  • Use the 4-step distilled variant for interactive applications where low-latency responses are critical; use the Base variant when you need to fine-tune the model for specific domains or styles
  • Leverage multi-reference image inputs to maintain consistent character appearances, product designs, or brand aesthetics across multiple edited images
  • For high-resolution editing up to 4 megapixels, the model preserves geometry and texture during edits rather than hallucinating new details, making it suitable for detailed product photography and professional imagery
  • Experiment with acceleration settings to find your optimal balance; higher acceleration means faster results but may impact quality
  • Use negative prompts to exclude unwanted elements; for example, if editing a portrait, you might specify negative prompts to avoid artifacts or unwanted style changes
  • For iterative editing workflows, the model's low-latency inference allows rapid refinement cycles and real-time previews (a simple refinement loop is sketched after this list)
  • Structure prompts with specific descriptive language about lighting, materials, and spatial relationships to leverage the model's strong understanding of physics-based rendering
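
Building on the API sketches above, an iterative refinement loop might look like this; create_prediction and get_result are the hypothetical helpers from the API & SDK section, not an official SDK:

```python
# Iterative refinement: feed each output back in as the next input image.
image_url = "https://example.com/product.jpg"
refinements = [
    "Replace the background with a seamless white studio backdrop",
    "Soften the shadows under the product",
    "Add subtle warm rim lighting from the left",
]

for step, prompt in enumerate(refinements, start=1):
    prediction_id = create_prediction(prompt=prompt, image=image_url)
    result = get_result(prediction_id)
    # If the API returns a list of URLs, take the first one.
    image_url = result[0] if isinstance(result, list) else result
    print(f"step {step}: {image_url}")
```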

Capabilities

  • Text-to-image generation with accurate, readable text rendering in complex layouts, infographics, and user interface mockups
  • Image-to-image editing with natural language descriptions for style transforms, content modification, and effect application
  • Multi-reference editing supporting multiple input images for context-aware composition and consistent character or product rendering (see the request sketch after this list)
  • High-resolution editing up to 4 megapixels while maintaining detail and coherence
  • Spatial reasoning with realistic lighting, proper shadow placement, and correct perspective relationships
  • Semantic editing capabilities including object replacement, removal, and style transformation
  • Iterative editing support enabling rapid refinement cycles
  • Low-latency inference for interactive workflows and real-time applications
  • Reference-to-image generation for maintaining visual consistency across multiple outputs
  • Built-in prompt enhancer tool to automatically improve prompts for better results
  • Flexible output sizing with optional dimension specification
  • Reproducible results through seed control
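
For multi-reference editing specifically, the request plausibly takes a list of image URLs; as before, the images field name is an assumed placeholder rather than the verified schema:

```python
# Hypothetical multi-reference request: several reference images steer the
# edit toward a consistent character/product. "images" is an assumed field.
inputs = {
    "prompt": "Place this character in the armchair from the second photo, "
              "keeping the outfit from the third photo",
    "images": [
        "https://example.com/character.jpg",
        "https://example.com/armchair.jpg",
        "https://example.com/outfit.jpg",
    ],
    "seed": 42,  # fixed seed keeps reruns reproducible
}
```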

What Can I Use It For?

  • Professional product photography editing and variation generation for e-commerce applications
  • User interface and mockup generation with accurate text rendering for design workflows
  • Character consistency maintenance across multiple scenes for creative projects and storytelling
  • Brand asset generation with consistent styling across product variations
  • Real-time interactive design tools requiring near-instant response times
  • Infographic and data visualization creation with readable text and proper layout
  • Style transfer and aesthetic transformation of existing images
  • Object removal and replacement in photographs while maintaining spatial coherence
  • Fine-tuning for specialized domains through the undistilled Base model variant
  • Edge deployment and local development on consumer-grade hardware
  • Production deployments where latency is a critical constraint
  • Iterative design refinement with rapid preview cycles

Things to Be Aware Of

  • The distilled 4B variant achieves its speed through optimization for 4-step inference; while more steps can improve quality, they increase generation time
  • The model runs efficiently on consumer GPUs but requires approximately 8.4GB VRAM for the distilled variant and 9.2GB for the Base variant
  • While the model excels at text rendering compared to other small models, extremely complex or stylized typography may still require careful prompt engineering
  • The rectified flow architecture differs from traditional diffusion models, which may require different prompt engineering approaches for users familiar with other image generation models
  • Multi-reference editing works best when reference images are clearly related to the desired output; ambiguous or conflicting references may produce inconsistent results
  • The model's spatial reasoning is strong, but highly unusual or physically impossible scenarios may still produce unexpected results
  • Acceleration settings provide a tradeoff between speed and quality; maximum acceleration prioritizes speed over output refinement
  • The model supports up to 10,000 characters in prompts, but extremely long or complex prompts may not always improve results; clarity and specificity are more important than length
  • Output quality scales with input image resolution; editing at maximum 4 megapixels requires sufficient computational resources
  • The Apache 2.0 open source license enables commercial use, but users should verify compliance with their specific use case requirements

Limitations

  • As a 4 billion parameter model, it may not match the quality or capability range of larger foundation models for extremely complex or highly specialized visual tasks
  • The model is optimized for speed, which means it may not achieve the same level of detail refinement as slower, larger models in scenarios where inference time is not a constraint
  • While the model handles text rendering well for its size, it may still struggle with extremely small text, highly stylized fonts, or text in non-Latin scripts compared to larger specialized models