tencent-flux-1-srpo-image-to-image

FLUX-TENCENT

FLUX.1 SRPO Image-to-Image [dev] is a 12 billion parameter flow transformer fine-tuned to transform input images into enhanced outputs with superior realism and aesthetics. It preserves the core content of the original image while improving details, lighting, and overall visual quality.

Avg Run Time: 6.000s

Model Slug: tencent-flux-1-srpo-image-to-image

API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
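A minimal sketch of the create step, using only the Python standard library. The endpoint URL, the `X-API-Key` header name, and the request body fields (`model`, `input`, etc.) are assumptions based on common prediction APIs — substitute the exact values from the official API reference.

```python
# Sketch: create a prediction for tencent-flux-1-srpo-image-to-image.
# Endpoint, header name, and body fields are assumptions, not confirmed API details.
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder: your Eachlabs API key

def build_request(image_url, prompt):
    """Assemble an assumed request body: model slug plus the model inputs."""
    return {
        "model": "tencent-flux-1-srpo-image-to-image",
        "input": {
            "image": image_url,   # input image to transform
            "prompt": prompt,     # desired enhancement description
        },
    }

def create_prediction(body):
    """POST the body; the response is expected to contain a prediction ID."""
    req = urllib.request.Request(
        "https://api.eachlabs.ai/v1/prediction/",  # assumed endpoint path
        data=json.dumps(body).encode(),
        headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (performs a network call):
#   body = build_request("https://example.com/input.png",
#                        "portrait, natural lighting, photorealistic skin")
#   prediction = create_prediction(body)
```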

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
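The polling loop can be sketched as below. The result endpoint path and the `status`/`output` response fields are assumptions; the status strings handled here (`queued`, `processing`, `success`) are illustrative and should be checked against the actual API reference.

```python
# Sketch: poll a prediction until it succeeds, fails, or times out.
# URL pattern and response fields are assumptions, not confirmed API details.
import json
import time
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder: your Eachlabs API key

def classify_status(status):
    """Map an assumed status string to an action: keep waiting, stop, or fail."""
    if status in ("queued", "processing", "running"):
        return "wait"
    if status == "success":
        return "done"
    return "error"

def get_result(prediction_id, interval=2.0, timeout=120.0):
    """Repeatedly fetch the prediction until it reaches a terminal state."""
    deadline = time.monotonic() + timeout
    url = f"https://api.eachlabs.ai/v1/prediction/{prediction_id}"  # assumed
    while time.monotonic() < deadline:
        req = urllib.request.Request(url, headers={"X-API-Key": API_KEY})
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        action = classify_status(data.get("status", ""))
        if action == "done":
            return data                      # expected to hold the output URL(s)
        if action == "error":
            raise RuntimeError(f"prediction failed: {data}")
        time.sleep(interval)                 # back off before polling again
    raise TimeoutError("prediction did not finish before the timeout")
```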

Readme

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview

The tencent-flux-1-srpo-image-to-image model, also known as FLUX.1 SRPO Image-to-Image, is a 12 billion parameter flow transformer developed by Tencent’s Hunyuan team. It is designed to transform input images into highly realistic and aesthetically enhanced outputs, focusing on preserving the core content while significantly improving details, lighting, and overall visual quality. The model is a fine-tuned version of the popular FLUX.1 Dev base, leveraging advanced optimization techniques to address common shortcomings in AI-generated imagery.

Key features include the use of Direct-Align and Semantic Relative Preference Optimization (SRPO) methods. Direct-Align injects controlled noise into images to stabilize early-stage optimization and reduce overfitting, while SRPO enables dynamic, text-conditioned reward signals for online adjustment of aesthetic standards. These innovations allow the model to achieve a dramatic increase in photorealism and user-perceived quality, with human evaluations reporting up to a 3x improvement in realism compared to the baseline FLUX.1 Dev. The model is particularly noted for its ability to generate lifelike portraits, detailed characters, and immersive landscapes, making it a powerful tool for both creative and professional image enhancement tasks.

Technical Specifications

  • Architecture: Flow Transformer (fine-tuned FLUX.1 Dev)
  • Parameters: 12 billion
  • Resolution: Supports high-resolution outputs; specific benchmarks indicate strong performance at standard diffusion model resolutions (e.g., 512x512 and higher)
  • Input/Output formats: Standard image formats (e.g., PNG, JPEG); supports image-to-image workflows with masking
  • Performance metrics:
      • Human-evaluated realism improved 3x over the FLUX.1 Dev baseline
      • 38.9% “excellent” rate in human evaluation (HPDv2 benchmark)
      • Training time reduced by a factor of 75 compared to traditional methods (e.g., DanceGRPO)
      • Processes 32 images per batch in under 10 minutes on 32 H20 GPUs

Key Considerations

  • The model excels at removing the “AI look” and producing outputs that are nearly indistinguishable from real photographs.
  • Direct-Align and SRPO techniques require careful prompt engineering to fully leverage their benefits; prompts should clearly specify desired attributes (e.g., “realistic lighting,” “natural skin texture”).
  • Direct-Align mitigates over-optimization at late diffusion steps by interpolating between noise and the target image, yielding more stable results.
  • For best results, use positive and negative prompt augmentation to guide the model toward desired aesthetics.
  • Batch processing is highly efficient, but resource requirements are significant at high resolutions and large batch sizes.
  • Quality vs speed: The model is faster than previous versions, but higher realism may require slightly longer inference times depending on hardware and settings.

Tips & Tricks

  • Use clear, descriptive prompts that specify both the subject and desired style or realism level (e.g., “portrait of a woman, natural lighting, photorealistic skin”).
  • Employ positive and negative prompt augmentation: add terms for desired features and explicitly negate unwanted artifacts (e.g., “no digital artifacts, no excessive smoothing”).
  • For iterative refinement, start with a conservative prompt and gradually introduce more specific attributes based on output review.
  • Adjust noise injection parameters if available to fine-tune the balance between detail preservation and creative transformation.
  • Use masking workflows for targeted enhancement of specific image regions, such as faces or backgrounds.
  • Leverage inversion-based regularization to maintain consistency across different timesteps and avoid bias accumulation in outputs.
  • For batch processing, monitor GPU memory usage and adjust batch size or resolution as needed to prevent resource bottlenecks.
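The positive/negative prompt augmentation tip above can be sketched as a small helper. The field names `prompt` and `negative_prompt` are assumptions — use whatever keys the model's input schema actually defines.

```python
# Sketch: combine a base prompt with positive terms and collect negative
# terms into a request fragment. Field names are assumptions.
def augment_prompt(base, positives, negatives):
    """Build prompt fields from a base prompt plus augmentation terms."""
    return {
        "prompt": ", ".join([base] + list(positives)),
        "negative_prompt": ", ".join(negatives),
    }

# Example:
#   augment_prompt("portrait of a woman",
#                  ["natural lighting", "photorealistic skin"],
#                  ["digital artifacts", "excessive smoothing"])
```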

Capabilities

  • Generates highly photorealistic images with superior detail, lighting, and texture fidelity.
  • Preserves core content and structure of the original image while enhancing visual quality.
  • Supports both portrait and landscape generation, with strong performance on faces, skin, hair, and complex scenes.
  • Offers robust online adjustment of aesthetic standards via text-conditioned reward signals (SRPO).
  • Efficient training and inference, with significant reductions in computational overhead compared to traditional diffusion models.
  • Flexible integration with masking and region-specific enhancement workflows.

What Can I Use It For?

  • Professional photo enhancement and retouching, including restoration of old or low-quality images.
  • Creative art projects, such as generating lifelike character portraits, fantasy scenes, or digital concept art.
  • Business use cases in advertising, marketing, and e-commerce, where product images require photorealistic enhancement.
  • Personal projects, including social media content creation and hobbyist digital art.
  • Industry-specific applications such as film pre-visualization, gaming asset generation, and digital fashion design.
  • Academic and research projects focused on image realism, generative modeling, and AI-driven visual effects.

Things to Be Aware Of

  • Some experimental features, such as advanced masking and region-specific editing, may require additional setup or fine-tuning.
  • Users have reported that the model is particularly effective at reducing the “AI look,” but may still produce occasional artifacts in challenging scenarios (e.g., extreme lighting, unusual compositions).
  • Performance is hardware-dependent; high-resolution outputs and large batch sizes require substantial GPU resources.
  • Consistency across outputs is generally strong, but iterative refinement may be needed for highly specific or nuanced results.
  • Positive user feedback highlights the model’s realism, detail retention, and speed improvements over previous versions.
  • Some users note that while the model excels at photorealism, it may be less suited for highly stylized or abstract image generation.
  • Negative feedback patterns include occasional over-smoothing or loss of fine detail in certain edge cases, particularly when prompts are vague or conflicting.

Limitations

  • High resource requirements for optimal performance, especially at large resolutions and batch sizes.
  • May not be ideal for generating highly stylized, abstract, or non-photorealistic images.
  • Occasional artifacts or over-smoothing can occur in edge cases or with poorly structured prompts.

Pricing

Pricing Type: Dynamic

Charged at $0.025 per image generation.

Pricing Rules

  • Parameter: num_images
  • Rule Type: Per Unit
  • Base Price: $0.025

Example: num_images: 1 × $0.025 = $0.025
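The per-unit rule above amounts to a simple multiplication; a one-line estimator, assuming the listed $0.025 rate:

```python
# Estimate the cost of a run under the per-unit rule: num_images × $0.025.
def estimate_cost(num_images, unit_price=0.025):
    return round(num_images * unit_price, 4)

# Example: estimate_cost(1) follows the listed rule, 1 × $0.025 = $0.025.
```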