Z Image | Turbo | Image to Image | Lora


Creates images from text and reference images with custom LoRA support, powered by Tongyi-MAI’s ultra-fast 6B Z-Image Turbo model for rapid, high-quality generation.

Avg Run Time: 10.000s

Model Slug: z-image-turbo-image-to-image-lora

Release Date: December 8, 2025

Playground

Your request will cost $0.009 per megapixel of output. For example, a 1024x1024 image is about 1.05 megapixels, or roughly $0.0094 per image.

API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
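
Below is a minimal sketch in Python using requests. Everything except the model slug is an assumption for illustration (base URL, header name, payload and response field names); consult the Eachlabs API reference for the exact schema.

```python
import requests

API_KEY = "YOUR_EACHLABS_API_KEY"        # assumed header-based auth
BASE_URL = "https://api.eachlabs.ai/v1"  # assumed base URL

# Illustrative payload: "model" is the slug from this page; the input
# field names (prompt, image_url, lora_url) are hypothetical.
payload = {
    "model": "z-image-turbo-image-to-image-lora",
    "input": {
        "prompt": "a noir movie poster of a lighthouse in a storm",
        "image_url": "https://example.com/reference.png",     # reference image
        "lora_url": "https://example.com/style.safetensors",  # optional LoRA
    },
}

resp = requests.post(
    f"{BASE_URL}/prediction/",
    json=payload,
    headers={"X-API-Key": API_KEY},
)
resp.raise_for_status()
prediction_id = resp.json()["predictionID"]  # response field name is an assumption
```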

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
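
Continuing the sketch above (BASE_URL, API_KEY, and prediction_id as defined there), a simple polling loop; the status strings and response fields are again assumptions:

```python
import time

import requests

def wait_for_result(prediction_id, poll_every=2.0, timeout=120.0):
    """Poll the prediction endpoint until it succeeds or `timeout` seconds pass."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        r = requests.get(
            f"{BASE_URL}/prediction/{prediction_id}",
            headers={"X-API-Key": API_KEY},
        )
        r.raise_for_status()
        body = r.json()
        if body.get("status") == "success":    # status strings are assumptions
            return body["output"]              # e.g. URL(s) of the generated image
        if body.get("status") in ("error", "failed"):
            raise RuntimeError(f"Prediction failed: {body}")
        time.sleep(poll_every)  # avg run time is ~10s, so 2s polls are reasonable
    raise TimeoutError("Prediction did not finish in time")

print(wait_for_result(prediction_id))
```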

Readme

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview

Z-Image-Turbo is a lightweight, ultra-fast image generation model developed by Tongyi-MAI, a lab within Alibaba. It is a distilled version of the original Z-Image model, optimized for rapid inference while matching or exceeding leading open-source models such as Flux.1 Dev in both quality and speed.

The model leverages a unique Scalable Single-Stream Multi-Modal Diffusion Transformer (S3-DiT) architecture, which enables dense cross-modal interactions at every layer, allowing superior results from a compact 6-billion-parameter design. Key features include text-to-image generation in as few as 9 steps, sub-second latency on high-end GPUs, and compatibility with consumer hardware down to 16GB of VRAM. It handles natural-language prompts effectively and is positioned for both real-time and batch workloads, with an image-to-image editing variant on the way.

What sets Z-Image-Turbo apart is its efficiency: it outperforms larger models in benchmarks for speed and cost-effectiveness, generating batches of 100 images in under 5 minutes, nearly twice as fast as competitors. Its community popularity is evident from high download counts and enthusiastic discussions highlighting its ability to run locally and its open license.

Technical Specifications

  • Architecture: Scalable Single-Stream Multi-Modal Diffusion Transformer (S3-DiT)
  • Parameters: 6 billion
  • Resolution: Up to 2048x2048 (tested most thoroughly at lower resolutions such as 1024x1024 for quality comparisons)
  • Input/Output formats: Text prompts for image generation; supports LoRA integration; image-to-image editing forthcoming
  • Performance metrics: Generates images in 9 steps or fewer; batch of 100 images in 279 seconds (4:39, ~2.8s per image); sub-second inference on enterprise H800 GPUs; fits on consumer cards with 16-24GB of VRAM

Key Considerations

  • Use optimized samplers beyond basic Euler, which underperforms on quality
  • Factor in VRAM usage, which can approach 24GB even on consumer cards without tuning
  • Balance step count (e.g., 9 steps) against quality, as fewer steps favor speed over detail; see the sketch after this list
  • Natural-language prompts work well, but adding style, camera, and lens details improves precision
  • Run optimized inference code to minimize memory overhead and maximize local efficiency
  • Test on target hardware (16GB+ VRAM), as performance scales with GPU tier from the RTX 3080 up to the 5090
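
For local experimentation with the step and precision trade-offs above, here is a minimal sketch. It assumes the weights are published as Tongyi-MAI/Z-Image-Turbo on the Hugging Face Hub and that diffusers' generic DiffusionPipeline can resolve the right pipeline class for them; both are assumptions, and DiffSynth-Studio is the tool the community actually cites for Z-Image workflows.

```python
import torch
from diffusers import DiffusionPipeline

# Assumed Hub repo id; BF16 for best quality, quantized variants (FP8/GGUF) save VRAM.
pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # in case the repo ships a custom pipeline class
).to("cuda")

image = pipe(
    prompt="rainy neon street at night, cinematic, 35mm lens, shallow depth of field",
    num_inference_steps=9,  # the Turbo sweet spot; fewer steps trade detail for speed
    height=1024,
    width=1024,
).images[0]
image.save("z_image_turbo.png")
```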

Tips & Tricks

  • Optimal parameter settings: 9 inference steps; FP8 or GGUF quantized models for speed on mid-range hardware (8-24GB VRAM); BF16 for highest quality on desktops
  • Prompt structuring: Start with natural sentences, then add specifics like "movie poster style" or "Runescape screenshot" for targeted outputs; a prompt enhancer can yield Flux-like results
  • Achieving specific results: For realism, use tested workflows; batch-test settings such as samplers at resolutions up to 1024x1024
  • Iterative refinement: Compare aesthetic quality across settings (one community benchmark tested 140+ combinations) and keep the winners, such as advanced samplers; refine prompts incrementally for the speed-quality balance
  • Advanced techniques: Integrate LoRA support via tools like DiffSynth-Studio (see the sketch after this list); run GGUF workflows on Apple Silicon or laptops for cross-platform use; benchmark locally with consistent prompts
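
A sketch of the LoRA integration mentioned above, continuing the pipeline from the previous section. load_lora_weights is diffusers' standard LoRA entry point; whether Z-Image pipelines expose it is an assumption (DiffSynth-Studio is the community's cited route), and the LoRA path is hypothetical.

```python
# Continuing `pipe` from the previous sketch; the LoRA file path is hypothetical.
pipe.load_lora_weights("loras/my-style.safetensors")

image = pipe(
    prompt="portrait in my-style, 85mm lens, soft window light",
    num_inference_steps=9,
).images[0]
image.save("z_image_turbo_lora.png")
```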

Capabilities

  • Excels at ultra-fast text-to-image generation, outperforming Flux.1 Dev in most areas at faster generation times
  • Handles diverse styles effectively from natural-language prompts: movie posters, game screenshots (e.g., Runescape), and more
  • Strong batch processing: 100 images in ~4.5 minutes, ideal for large-scale workloads
  • Versatile on consumer hardware: runs locally and offline in 16-24GB of VRAM, sub-second on enterprise GPUs
  • High quality for the speed: matches leading models on independent benchmarks; LoRA compatibility for customization
  • Adaptable to real-world data thanks to an efficient training infrastructure

What Can I Use It For?

  • Rapid prototyping of visual concepts, like movie posters and game assets, as shown in hands-on tests
  • Local offline generation for creators needing speed without cloud dependency, per video demos on desktops and laptops
  • Batch image creation for developers, evidenced by benchmark superiority in time and cost
  • Realism-focused workflows, where it serves as a go-to over slower models in community guides
  • Custom model fine-tuning with LoRA, supported in community tools and GitHub discussions

Things to Be Aware Of

  • Performs best locally on 16GB+ VRAM GPUs like RTX 3080/4070/5090 or Apple M4 Pro; scales to 8GB with quantization
  • Output quality is strong for the speed but not always as photorealistic as top specialized models; reviewers praise the trade-off
  • Handles natural prompts well, with enhanced results from detailed inputs; consistent across hardware benchmarks
  • Upcoming image-edit variant anticipated for expanded use, building excitement in communities
  • Positive feedback on speed (9-second generations), open license, and efficiency; users highlight entertainment value and practicality
  • Expect some tuning to keep VRAM in check (closer to 24GB untuned); GGUF/AIO formats help low-end setups

Limitations

  • Quality is traded for extreme speed; it does not match ultra-high-fidelity models like NanoBanana in fine detail
  • VRAM usage is higher than advertised (up to 24GB) without optimization; image-to-image editing is not yet released
  • Best at standard resolutions; higher ones like 2048x2048 are possible but less tested for the quality-speed balance