flux-2-klein-9b-base-text-to-image

FLUX-2

FLUX.2 [klein] 9B Base from Black Forest Labs delivers text-to-image generation with enhanced realism, sharper text rendering, and built-in native editing capabilities.

Avg Run Time: 7.000s

Model Slug: flux-2-klein-9b-base-text-to-image

Pricing: $0.002 per megapixel of output. For example, a 1024x1024 image is roughly 1.05 megapixels, so a single generation costs about $0.002.

API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
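Below is a minimal Python sketch of this step. The endpoint URL, the X-API-Key header, and the payload and response field names are assumptions for illustration only and are not confirmed by this page; only the model slug comes from above. Check the Eachlabs API reference for the exact schema.

```python
# Hypothetical sketch: URL, header name, and field names are placeholders.
import requests

API_KEY = "YOUR_API_KEY"                 # assumed header-based auth
BASE_URL = "https://api.eachlabs.ai/v1"  # placeholder base URL

payload = {
    "model": "flux-2-klein-9b-base-text-to-image",  # model slug from this page
    "input": {                                      # assumed input field names
        "prompt": "a lighthouse on a rocky coast at dusk, photorealistic",
    },
}

resp = requests.post(
    f"{BASE_URL}/prediction/",
    headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
prediction_id = resp.json()["predictionID"]  # assumed response field name
print("Prediction created:", prediction_id)
```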

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
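A polling loop for this step might look like the sketch below. The endpoint path, status values, and response structure are assumptions based on the description above, not a confirmed API schema.

```python
# Hypothetical polling sketch; adjust URL, headers, and status names to the real API.
import time
import requests

def wait_for_result(prediction_id: str, api_key: str, poll_interval: float = 2.0) -> dict:
    url = f"https://api.eachlabs.ai/v1/prediction/{prediction_id}"  # placeholder URL
    while True:
        resp = requests.get(url, headers={"X-API-Key": api_key}, timeout=30)
        resp.raise_for_status()
        data = resp.json()
        status = data.get("status")
        if status == "success":            # assumed terminal status name
            return data                    # expected to contain the output image URL
        if status in ("error", "failed"):  # assumed failure status names
            raise RuntimeError(f"Prediction failed: {data}")
        time.sleep(poll_interval)          # keep checking until the result is ready
```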

Readme

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview

FLUX.2 [klein] 9B Base is a compact rectified flow transformer developed by Black Forest Labs, released in January 2026. This model represents a significant advancement in efficient image generation, combining a 9 billion parameter flow model with an 8 billion parameter Qwen3 text embedder to deliver high-quality text-to-image generation in a unified architecture. The 9B Base variant is specifically designed for users who prioritize maximum quality and customization flexibility over speed, using 25-50 step sampling schedules that enable detailed, high-resolution outputs with extensive control over generation parameters.

The model unifies text-to-image generation and image editing within a single architecture, eliminating the need for separate specialized models. FLUX.2 [klein] 9B Base sits on the Pareto frontier for quality versus latency and VRAM usage, matching or exceeding the performance of models five times its size while keeping resource requirements practical for high-end consumer hardware. The FLUX.2 [klein] family also includes step- and guidance-distilled variants optimized for speed; the Base variant instead retains the full 25-50 step sampling schedule, giving researchers and advanced users greater flexibility for fine-tuning and custom pipeline development.

Technical Specifications

Architecture: Rectified flow transformer with unified generation and editing support
Parameters: 9 billion parameter flow model plus an 8 billion parameter Qwen3 text embedder
Resolution: 1024x1024 for text-to-image generation, with support for higher resolutions
Input/output formats: Text prompts for generation, image inputs for editing tasks, outputs in standard image formats
Sampling steps: 25-50 steps per generation or edit (configurable)
VRAM requirements: 24GB or more recommended for optimal performance
Inference time: Several seconds per image due to the extended sampling schedule
Text encoder: Qwen3 8B (qwen38b_fp8mixed.safetensors)
Quantization support: FP8 quantization available for reduced VRAM usage
License: FLUX Non-Commercial License (FLUX NCL)

Key Considerations

  • The 9B Base model is optimized for maximum quality and customization rather than speed, making it suitable for production workflows where generation time is less critical than output fidelity
  • Requires 24GB or more of VRAM, limiting deployment to high-end consumer GPUs such as the RTX 4090 or to professional GPU systems
  • The Non-Commercial license restricts use to non-commercial applications, research, and personal projects
  • Longer sampling schedules (25-50 steps) provide extensive control over generation parameters but result in slower inference compared to distilled variants
  • Best suited for users who need fine-tuning capabilities, custom pipeline development, and maximum flexibility in generation parameters
  • CFG scale of 5.0 is recommended for optimal balance between prompt adherence and creative freedom
  • Correct text encoder selection is critical to avoid shape mismatch errors during inference
  • The model excels at producing sharp, detailed outputs with high visual fidelity when given adequate sampling steps

Tips & Tricks

  • Use CFG scale of 5.0 as a baseline for text-to-image generation, adjusting upward for stricter prompt adherence or downward for more creative variation
  • Experiment with sampling steps between 25 and 50 to find the optimal balance between quality and generation time for your specific use case
  • For maximum detail in high-resolution outputs, maintain sampling steps toward the higher end of the range (40-50 steps)
  • Structure prompts with specific descriptive terms and artistic styles to leverage the model's strong prompt fidelity capabilities
  • When using image editing features, provide clear reference images and detailed editing instructions in your prompts for best results
  • Ensure the correct Qwen3 8B text encoder is loaded before inference to prevent technical errors
  • For iterative refinement, adjust CFG scale and sampling steps between generations rather than modifying prompts alone (see the sweep sketch after this list)
  • Take advantage of the unified architecture by combining text-to-image and editing workflows within the same pipeline without model switching
  • When fine-tuning or developing custom pipelines, the Base variant's longer sampling schedules provide superior control compared to distilled alternatives
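
The sketch below illustrates the step/CFG sweep suggested above: it enumerates a small grid over the recommended 25-50 step range and the 5.0 CFG baseline. The field names "steps" and "guidance", and the submit_prediction placeholder, are hypothetical; map them to whatever input schema the model actually exposes.

```python
# Illustrative parameter-sweep sketch; field names are placeholders, not a confirmed schema.
from itertools import product

prompt = "a vintage travel poster of Istanbul, bold typography, flat colors"

step_options = [25, 40, 50]   # recommended sampling-step range
cfg_options = [4.0, 5.0, 6.0] # around the suggested 5.0 CFG baseline

for steps, cfg in product(step_options, cfg_options):
    inputs = {
        "prompt": prompt,
        "steps": steps,     # hypothetical field name for sampling steps
        "guidance": cfg,    # hypothetical field name for CFG scale
    }
    # submit_prediction(inputs) stands in for the create/poll calls shown earlier
    print(f"would submit: steps={steps}, cfg={cfg}")
```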

Capabilities

  • Generates photorealistic images from text descriptions with exceptional prompt fidelity and visual accuracy
  • Produces sharp text rendering within generated images, addressing a common weakness in many image generation models
  • Supports unified text-to-image generation and image editing within a single model architecture
  • Enables multi-reference image editing, allowing generation based on multiple reference images simultaneously
  • Delivers high diversity in outputs, producing varied results from similar prompts
  • Provides extensive customization through adjustable sampling steps and CFG parameters
  • Supports fine-tuning and custom pipeline development through the Base model variant
  • Generates images at 1024x1024 resolution with capability for higher resolutions
  • Maintains quality consistency across different prompt types and artistic styles
  • Offers flexibility for research applications and experimental workflows

What Can I Use It For?

  • Professional digital art and illustration creation with precise control over visual elements
  • High-quality concept art generation for game development, film production, and creative industries
  • Detailed product visualization and marketing imagery for e-commerce applications
  • Research and experimentation in generative AI and computer vision applications
  • Custom pipeline development for specialized image generation workflows
  • Fine-tuning for domain-specific image generation tasks in academic and research settings
  • Educational projects exploring advanced image generation techniques and flow-based models
  • Artistic exploration and creative experimentation with photorealistic image synthesis
  • Detailed illustration work requiring sharp text rendering and high visual fidelity
  • Multi-reference image editing for complex visual composition tasks

Things to Be Aware Of

  • The model demonstrates strong performance on anatomy and hand rendering compared to some competitors, though users report occasional inconsistencies in complex anatomical scenarios
  • Generation times are significantly longer than distilled variants due to 25-50 step sampling schedules, typically requiring several seconds per image
  • The Non-Commercial license restricts commercial deployment, limiting business applications to licensed variants
  • VRAM requirements of 24GB or more restrict deployment to high-end consumer hardware, making it inaccessible for users with lower-specification GPUs
  • The 9B Base model can produce sharper or more appealing results in some cases compared to the 9B Distilled variant despite using more steps, indicating quality benefits from extended sampling
  • Multi-reference image editing capabilities are more robust in the 9B models compared to smaller 4B variants, though users report occasional inconsistencies requiring multiple renders
  • The model shows strong performance on the Pareto frontier for quality versus latency and VRAM usage, outperforming competing models like Z-Image while supporting unified generation and editing
  • Users report that careful prompt engineering and parameter tuning are necessary to achieve optimal results, particularly for complex or detailed image requirements
  • The model integrates well with ComfyUI and other image generation frameworks, with proper text encoder configuration being critical for successful deployment

Limitations

  • Non-Commercial license restricts use to non-commercial applications, research, and personal projects, excluding commercial deployment without additional licensing
  • Requires 24GB or more VRAM, limiting accessibility to users with high-end consumer GPUs or professional hardware
  • Longer inference times due to 25-50 step sampling schedules make it less suitable for real-time or interactive applications compared to distilled variants