Z-IMAGE
Generates images from text and reference images using Tongyi-MAI's ultra-fast 6B-parameter Z-Image Turbo model, delivering high-quality visual results.
Avg Run Time: 10.000s
Model Slug: z-image-turbo-image-to-image
Release Date: December 8, 2025
API & SDK
Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
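As a sketch, the create-prediction call might look like the following Python. The endpoint URL and the field names (`model`, `input`, `prompt`, `image_url`) are illustrative assumptions, not the provider's documented schema:

```python
import json

# Hypothetical endpoint -- substitute the provider's actual prediction URL.
API_URL = "https://api.example.com/v1/predictions"

def build_create_request(api_key: str, prompt: str, image_url: str) -> dict:
    """Assemble the POST request for a new z-image-turbo-image-to-image prediction."""
    return {
        "url": API_URL,
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": "z-image-turbo-image-to-image",
            "input": {"prompt": prompt, "image_url": image_url},
        }),
    }

# Sending it would use an HTTP client, e.g. the `requests` package:
#   import requests
#   req = build_create_request("YOUR_API_KEY", "a red bicycle", "https://example.com/ref.png")
#   resp = requests.post(req["url"], headers=req["headers"], data=req["body"])
#   prediction_id = resp.json()["id"]   # assumed response shape
```

The returned prediction ID is what you pass to the polling step below.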
Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
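The polling loop can be sketched as a small helper. The status values (`success`, `failed`) and the idea of a GET-by-ID endpoint are assumptions to adapt to the actual API; `fetch` is injected so the loop itself stays transport-agnostic:

```python
import time

def poll_prediction(fetch, prediction_id: str,
                    interval_s: float = 1.0, timeout_s: float = 120.0) -> dict:
    """Repeatedly fetch the prediction until its status is terminal.

    `fetch` is any callable mapping a prediction ID to a status dict, e.g. a
    wrapper around GET <prediction endpoint>/{id} (endpoint shape assumed).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = fetch(prediction_id)
        # "success"/"failed" are assumed terminal statuses -- check the API docs.
        if result.get("status") in ("success", "failed"):
            return result
        time.sleep(interval_s)  # back off between polls
    raise TimeoutError(f"prediction {prediction_id} not ready after {timeout_s}s")
```

Injecting `fetch` also makes the loop easy to exercise with a stub that returns "processing" a few times before "success".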
Readme
Overview
Z-Image-Turbo is a distilled version of the original Z-Image model developed by Alibaba's Tongyi-MAI lab. It is a lightweight 6B-parameter image generation model designed for ultra-fast text-to-image synthesis, achieving high-quality photorealistic results in as few as 9 sampling steps. Its speed and efficiency make it suitable for real-time workflows and local deployment on consumer hardware.
Key features include photorealistic image generation with refined lighting, clean textures, and strong composition, alongside accurate bilingual text rendering in English and Chinese. It incorporates advanced world knowledge and semantic reasoning for handling complex prompts and culturally grounded concepts. What sets it apart is the Scalable Single-Stream Multi-Modal Diffusion Transformer (S3-DiT) architecture, which processes text embeddings and noisy image latents as a single unified token sequence, enabling dense cross-modal interactions and superior performance at a compact scale compared to larger models.
This architecture, combined with a unique training strategy leveraging real-world data streams, allows Z-Image-Turbo to outperform previous state-of-the-art open-source models in speed and cost-effectiveness while maintaining competitive quality, as validated in benchmarks like Alibaba AI Arena.
Technical Specifications
- Architecture: Scalable Single-Stream Multi-Modal Diffusion Transformer (S3-DiT)
- Parameters: 6B
- Resolution: Not explicitly specified; supports high-fidelity photorealistic outputs
- Input/Output formats: Text prompts (plus optional reference images) to images; supports bilingual text rendering; available in FP8 and GGUF quantized variants as well as AIO and BF16 checkpoints
- Performance metrics: Generates 100 images in 279 seconds (4:39); ~9 seconds per image for 9-step inference on a 24GB GPU; sub-second latency potential on high-end GPUs; faster than competitors such as Flux.2 Dev (19:12 for 100 images) and Ovis-Image (8:28)
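For context, the quoted batch figures work out as follows. (The per-image batch time is much lower than the ~9 s single-image figure because batching amortizes per-run overhead.)

```python
# Convert the quoted batch times (100 images each) to per-image seconds and speedups.
def mmss_to_seconds(mm: int, ss: int) -> int:
    return mm * 60 + ss

z_image   = mmss_to_seconds(4, 39)    # 279 s
ovis      = mmss_to_seconds(8, 28)    # 508 s
flux2_dev = mmss_to_seconds(19, 12)   # 1152 s

per_image       = z_image / 100        # 2.79 s/image in batch mode
speedup_vs_ovis = ovis / z_image       # ~1.8x -- "nearly twice as fast"
speedup_vs_flux = flux2_dev / z_image  # ~4.1x
```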
Key Considerations
- Use minimal sampling steps (e.g., 9) for maximum speed, but increase to 20+ for higher detail in complex scenes
- Optimize VRAM usage with quantized versions like FP8 or GGUF to fit on 16-24GB consumer GPUs
- Balance quality and speed: lower steps prioritize rapidity but may reduce fine details compared to larger models
- Prompt with clear, descriptive language emphasizing style, lighting, and composition for best photorealism
- Avoid overly abstract or highly intricate prompts initially, as the model's distillation favors straightforward semantic understanding
- Test on local hardware to account for variability in inference time based on GPU and optimizations
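As a rough sanity check for the quantization advice above, the weight memory of a 6B-parameter model at common precisions can be estimated with back-of-envelope arithmetic. These figures cover weights only; real VRAM usage is higher (activations, text encoder, VAE, framework overhead), which is consistent with the near-24GB unoptimized usage reported below.

```python
# Approximate weight memory for a 6B-parameter model at common precisions.
PARAMS = 6e9
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "gguf_q4": 0.5}  # approximate

weights_gb = {fmt: PARAMS * b / 1e9 for fmt, b in BYTES_PER_PARAM.items()}
# bf16 ~ 12 GB, fp8 ~ 6 GB, 4-bit GGUF ~ 3 GB of weights alone
```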
Tips & Tricks
- Optimal parameter settings: 9-16 sampling steps, CFG scale 3-7, use BF16 or FP8 for speed on consumer GPUs
- Prompt structuring advice: Start with subject description, add style qualifiers (e.g., "photorealistic, sharp lighting"), specify bilingual text needs explicitly
- Achieve photorealism: Include terms like "clean textures, balanced composition, refined lighting" in prompts
- Iterative refinement: Generate initial low-step outputs, then upscale or refine with higher steps using the same seed
- Advanced techniques: Leverage S3-DiT for complex scenes by chaining prompts with semantic details (e.g., "culturally accurate Chinese festival scene with English signage"); experiment with GGUF workflows for macOS compatibility, resolving float8 conversion errors via custom nodes
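The parameter tips above can be collected into a single request input. The field names (`num_inference_steps`, `guidance_scale`, `seed`) are hypothetical and should be checked against the model's actual input schema:

```python
# Hypothetical input payload builder reflecting the tips above.
# Field names are assumptions, not the model's documented schema.
def make_input(prompt: str, fast: bool = True, seed=None) -> dict:
    return {
        "prompt": prompt,
        # 9-16 steps for speed; go past 20 when refining detail.
        "num_inference_steps": 9 if fast else 24,
        # CFG scale in the suggested 3-7 range.
        "guidance_scale": 4.0 if fast else 6.0,
        # Reuse the same seed across passes for iterative refinement.
        **({"seed": seed} if seed is not None else {}),
    }

draft  = make_input("photorealistic street market, refined lighting", seed=42)
refine = make_input("photorealistic street market, refined lighting", fast=False, seed=42)
```

Keeping the seed fixed while raising the step count mirrors the iterative-refinement tip: the draft and refine passes target the same composition at different detail levels.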
Capabilities
- Generates high-quality photorealistic images with excellent detail preservation at ultra-low latency
- Accurate bilingual text rendering in posters, graphics, and small fonts with proper alignment and typography
- Strong semantic reasoning and world knowledge for logical, culturally grounded outputs
- Versatile across styles, from realistic scenes to creative compositions, matching larger models in fidelity
- Efficient local inference on 16-24GB GPUs, enabling real-time generation
- Superior speed in benchmarks, nearly twice as fast as the next-fastest competitor in batch processing
What Can I Use It For?
- Rapid prototyping of visual concepts in creative workflows, as noted in reviews for quick iterations
- Generating photorealistic product visuals or marketing graphics with bilingual text support
- Local offline image creation for personal projects, highlighted in user tests on consumer GPUs
- Real-time applications like dynamic content generation, praised for subsecond latency potential
- High-volume batch processing, demonstrated in benchmarks producing 100 images in under 5 minutes
Things to Be Aware Of
- Runs efficiently on 24GB GPUs such as a mobile 5090; unoptimized VRAM usage approaches 24GB, while quantized versions bring it down to roughly 16GB
- Output quality closely matches leading models like Flux.2 Dev, but at far higher speed
- Common macOS issues include KSampler float8 conversion errors, resolvable with GGUF custom nodes
- Consistent high aesthetic quality in benchmarks, especially photorealism, but may lack ultra-fine details of massive models
- Positive feedback on speed and local runnability: "one of the fastest offline models I've seen" and "fantastic overall"
- Variability in generation time (e.g., 9 seconds to slightly longer for complex prompts) based on hardware and optimizations
Limitations
- The distilled design prioritizes speed over maximum detail, so it may underperform larger models on hyper-intricate scenes or the finest artistic detail
- Higher VRAM usage than expected without quantization (up to 24GB); may require optimizations for lower-end hardware
- Experimental quantized variants (FP8, GGUF) can encounter platform-specific errors like float8 issues on macOS