ILLUSION-DIFFUSION

Illusion Diffusion creates artistic and surreal visuals using advanced diffusion algorithms

Avg Run Time: 9.000s

Model Slug: illusion-diffusion-hq

Playground

Enter an image URL or choose a file from your computer, adjust the Advanced Controls if needed, then preview and download your result.
The total cost depends on how long the model runs. It costs $0.001080 per second. Based on an average runtime of 9 seconds, each run costs about $0.009720. With a $1 budget, you can run the model around 102 times.

API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
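
Here's a minimal sketch in Python using requests. The base URL, header name, version string, and payload field names below are illustrative assumptions rather than confirmed values; check the Eachlabs API reference for the exact schema.

```python
import requests

API_KEY = "YOUR_API_KEY"  # your Eachlabs API key
BASE_URL = "https://api.eachlabs.ai/v1"  # assumed base URL; confirm in the docs

# Assumed payload shape: model slug, version, and the model's named inputs.
payload = {
    "model": "illusion-diffusion-hq",
    "version": "0.0.1",
    "input": {
        "image": "https://example.com/input.png",
        "prompt": "surreal layered portrait",
    },
}

response = requests.post(
    f"{BASE_URL}/prediction/",
    json=payload,
    headers={"X-API-Key": API_KEY},
)
response.raise_for_status()
prediction_id = response.json()["predictionID"]  # response field name may differ
print("Prediction ID:", prediction_id)
```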

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API is polling-based, so you'll need to check repeatedly until you receive a success status.
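
Continuing the sketch above, a simple polling loop might look like the following; again, the endpoint path and the status/output field names are assumptions to verify against the API reference.

```python
import time
import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.eachlabs.ai/v1"  # assumed base URL; confirm in the docs

def wait_for_result(prediction_id: str, poll_interval: float = 2.0) -> dict:
    """Poll until the prediction reports success, or raise on failure."""
    while True:
        resp = requests.get(
            f"{BASE_URL}/prediction/{prediction_id}",
            headers={"X-API-Key": API_KEY},
        )
        resp.raise_for_status()
        result = resp.json()
        status = result.get("status")
        if status == "success":
            return result  # contains the output URL(s)
        if status in ("error", "failed"):
            raise RuntimeError(f"Prediction failed: {result}")
        time.sleep(poll_interval)  # wait before the next check
```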

Readme

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview

illusion-diffusion-hq — Image-to-Image AI Model

illusion-diffusion-hq from Stability transforms a single composite image into layered, manipulable RGBA components using advanced diffusion algorithms, enabling surreal artistic edits and 2.5D animations without manual layer separation. Developed as part of the illusion-diffusion family, the model decomposes complex visuals such as anime characters into semantic body parts with pixel-perfect transparency and hidden geometry, handling intricate layer stratification such as interleaving hair strands. For users searching for Stability image-to-image tools, illusion-diffusion-hq delivers high-fidelity reconstructions at 1024x1024 resolution, with outputs ready for real-time animation and professional applications.

Technical Specifications

What Sets illusion-diffusion-hq Apart

illusion-diffusion-hq stands out in the image-to-image AI model landscape through its two-stage latent-diffusion training with a Body Part Consistency Module, which enforces global geometric coherence across predicted layers during denoising. This enables seamless handling of occluded regions and cross-layer interactions, delivering complete RGBA reconstructions that maintain the original SDXL latent-space distribution for stable fine-tuning.

  • Layered RGBA Decomposition: Generates independent semantic layers (e.g., hair, face, body) with transparency from a single input image at 1024x1024 resolution; this allows dynamic 2.5D puppet animations with physics-based motion like spring hair dynamics, far beyond flat image edits.
  • Body Part Consistency Module: Inserts part-dimensional attention in the U-Net to couple layers via occlusion boundaries; users gain improved completeness and coherence for applications like talking-head VTubing, preserving anime line work during real-time facial deformations.
  • Pseudo-Depth Inference: Predicts drawing order and hidden geometry in roughly 10 seconds on an RTX 4090; facilitates parallax effects and re-appearance of occluded parts in animations, differentiating it from standard diffusion models that lack multi-layer awareness.
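
As a conceptual illustration only (this page does not publish the model's actual architecture), part-dimensional attention can be pictured as ordinary attention applied across the layer axis at each spatial position, so every predicted layer can see what the others contain at the same pixel. The module below is a hypothetical PyTorch sketch of that idea, not the model's real implementation:

```python
import torch
import torch.nn as nn

class PartAttention(nn.Module):
    """Hypothetical sketch: exchange features across the part/layer axis
    at each spatial location to keep predicted RGBA layers coherent."""

    def __init__(self, channels: int = 64, num_heads: int = 4):
        super().__init__()
        # channels must be divisible by num_heads
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, parts, channels, height, width)
        b, p, c, h, w = x.shape
        # Fold every spatial position into the batch so attention runs
        # over the part dimension at each pixel.
        tokens = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, p, c)
        mixed, _ = self.attn(tokens, tokens, tokens)
        return mixed.reshape(b, h, w, p, c).permute(0, 3, 4, 1, 2)

# Example: 2 images, 4 part layers, 64 channels, 32x32 latents.
x = torch.randn(2, 4, 64, 32, 32)
print(PartAttention()(x).shape)  # torch.Size([2, 4, 64, 32, 32])
```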

A full decomposition takes roughly 74 seconds per 1024x1024 image on high-end GPUs, producing RGBA outputs optimized for Live2D-style workflows and real-time physics integration.
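
To show how such layered outputs are typically consumed downstream, here is a small sketch that recomposes RGBA layer files back into a single image with Pillow. The filenames and layer order are hypothetical, made up for illustration:

```python
from PIL import Image

# Hypothetical layer files produced by the model, listed back to front.
layer_files = ["body.png", "face.png", "hair.png"]

# Layers are assumed to share the model's 1024x1024 output size.
canvas = Image.new("RGBA", (1024, 1024), (0, 0, 0, 0))
for path in layer_files:
    layer = Image.open(path).convert("RGBA")
    canvas = Image.alpha_composite(canvas, layer)  # respects per-pixel alpha

canvas.save("recomposed.png")
```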

Key Considerations

  • Use clear, structured prompts to achieve desired artistic effects and maintain control over image composition
  • Reference images or QR codes can be used to guide structure and style, improving consistency and creative direction
  • Start with lower resolutions (1K or 2K) for drafts, then upscale to 4K for final outputs to optimize speed and resource usage
  • Batch generation is possible, but iterative refinement with small batches (3-5 images) yields better results
  • Experiment with aspect ratios and style tags to match the intended use case (e.g., square for social media, ultrawide for banners)
  • Prompt engineering is crucial; separating subjects, styles, and instructions leads to more predictable results
  • Avoid overly complex prompts that may confuse the model or reduce output quality

Tips & Tricks

How to Use illusion-diffusion-hq on Eachlabs

Access illusion-diffusion-hq on Eachlabs via the Playground for instant testing, the API for scalable integrations, or the SDK for custom apps. Upload a single composite image, add optional prompts for style guidance, and select 1024x1024 resolution to generate layered RGBA outputs with transparency and depth, ready for animation pipelines. Eachlabs delivers fast inference with high-fidelity results optimized for professional workflows.


Capabilities

  • Generates high-resolution, artistic, and surreal images with impressive detail and clarity
  • Supports advanced control mechanisms via ControlNet, enabling guided generation with reference images or QR codes
  • Excels at blending multiple visual styles and introducing imaginative elements into realistic scenes
  • Produces outputs suitable for professional design, marketing, and creative projects
  • Versatile in aspect ratios and formats, adaptable to various use cases from social media to large-scale prints
  • Delivers consistent quality across diverse prompts, especially when best practices are followed

What Can I Use It For?

Use Cases for illusion-diffusion-hq

Anime Creators and VTubers: Feed a single anime illustration into illusion-diffusion-hq to extract stratified RGBA layers for puppet animations; add physics to hair and clothing for parallax depth, creating engaging Live2D models without tedious manual masking. This Stability image-to-image capability powers real-time talking-head systems that track expressions while retaining 2D aesthetic details.

Game Developers Building AI Image Editors: Developers seeking an illusion-diffusion-hq API can decompose character sprites into editable parts; for instance, input a composite hero image and prompt "separate hair, armor, and background with transparency," yielding manipulable layers for procedural animations in engines like Unity.

Digital Artists for Surreal Edits: Artists use the model's layer consistency to reimagine photos with surreal elements; upload a portrait and generate "interleave ethereal hair strands over face with glowing depth," producing high-res composites ready for animation, ideal for experimental visuals in e-commerce or NFTs.

Animation Studios: Production teams automate preprocessing for 2.5D assets; the pseudo-depth feature ensures accurate occlusion handling, enabling fluid character rigging from static art in minutes rather than hours of artist labor.

Things to Be Aware Of

  • Some experimental features (e.g., QR code integration) may behave unpredictably depending on prompt complexity
  • Users report occasional quirks with color rendering and style blending, especially in highly abstract prompts
  • Performance is generally strong for 1K-2K images; 4K generation requires more memory and may be slower on consumer hardware
  • Consistency improves with structured prompts and reference images; vague or ambiguous prompts can lead to unexpected results
  • Positive feedback highlights the model's ability to produce visually stunning, imaginative artwork with minimal effort
  • Common concerns include occasional artifacts in highly detailed scenes and the need for prompt refinement to achieve optimal results
  • Resource requirements are moderate for standard resolutions but increase significantly for batch generation or 4K outputs

Limitations

  • May produce artifacts or inconsistent results with overly complex or ambiguous prompts
  • 4K image generation can be resource-intensive and slower on non-specialized hardware
  • Not optimal for strictly photorealistic outputs; excels more in artistic and surreal domains than in pure realism

Pricing

Pricing Detail

This model runs at a cost of $0.001080 per second.

The average execution time is 9 seconds, but this may vary depending on your input data.

The average cost per run is $0.009720.

Pricing Type: Execution Time

Cost Per Second means the total cost is calculated based on how long the model runs. Instead of paying a fixed fee per run, you are charged for every second the model is actively processing. This pricing method provides flexibility, especially for models with variable execution times, because you only pay for the actual time used.
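
For example, the per-run figures above follow directly from the per-second rate; a quick sketch of the arithmetic:

```python
COST_PER_SECOND = 0.001080  # USD, from the pricing details above

def run_cost(runtime_seconds: float) -> float:
    """Total cost of one run at the per-second rate."""
    return COST_PER_SECOND * runtime_seconds

avg_cost = run_cost(9)                 # $0.009720 at the 9 s average runtime
runs_per_dollar = int(1.0 / avg_cost)  # ~102 full runs on a $1 budget
print(f"${avg_cost:.6f} per run, about {runs_per_dollar} runs per $1")
```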