Eachlabs | AI Workflows for app builders
alibaba-wan-2.7-text-to-image

WAN-2.7

Alibaba Wan 2.7 Text to Image is the latest generation of Alibaba's Wan image generation model, delivering significant improvements in prompt fidelity, compositional accuracy, and visual detail over previous Wan versions. It produces photorealistic and artistic images across diverse styles from natural language descriptions with stronger semantic understanding. Ideal for marketing asset generation, product concept visualization, and creative workflows requiring precise, high-quality text-to-image output.

Avg Run Time: 25.000s

Model Slug: alibaba-wan-2-7-text-to-image

0.03/Per image pricing

API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
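
A minimal Python sketch of the create step, using only the standard library. The endpoint URL, the `X-API-Key` header name, and the payload field names (`model`, `input`, `prompt`) are assumptions for illustration and are not confirmed by this page; take the real values from the each::labs API reference. The request is built but not sent, so its shape can be inspected first.

```python
import json
import urllib.request

# Hypothetical values: the real endpoint, header name, and payload schema
# are not given on this page and must come from the API reference.
API_URL = "https://api.example.com/v1/prediction"
MODEL_SLUG = "alibaba-wan-2-7-text-to-image"  # slug as listed on this page


def build_create_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a prediction."""
    payload = {"model": MODEL_SLUG, "input": {"prompt": prompt}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


req = build_create_request("YOUR_API_KEY", "A red bicycle leaning on a brick wall")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen(req)` would send it; the response is then expected to carry the prediction ID described above.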

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
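
The polling loop can be sketched as below. This is an illustration under assumptions: the `status` and `output` field names and the `"error"`/`"failed"` values are hypothetical, and only the `"success"` status is taken from the description above. The HTTP call is injected as a `fetch` function so the loop can be run offline with a stand-in.

```python
import time
from typing import Callable, Dict


def wait_for_result(
    fetch: Callable[[str], Dict],
    prediction_id: str,
    interval_s: float = 1.0,
    max_attempts: int = 60,
) -> Dict:
    """Call fetch(prediction_id) until it reports success, fails, or attempts run out."""
    for _ in range(max_attempts):
        result = fetch(prediction_id)
        status = result.get("status")
        if status == "success":
            return result
        if status in ("error", "failed"):  # assumed failure values
            raise RuntimeError(f"prediction {prediction_id} failed: {result}")
        time.sleep(interval_s)
    raise TimeoutError(f"prediction {prediction_id} not ready after {max_attempts} attempts")


# Stand-in for a real HTTP GET so the sketch runs without network access.
_responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "success", "output": "https://example.com/image.png"},
])


def fake_fetch(prediction_id: str) -> Dict:
    return next(_responses)


final = wait_for_result(fake_fetch, "demo-id", interval_s=0.0)
print(final["output"])
```

With the listed average run time of about 25 seconds, a one-second interval and a 60-attempt cap leave reasonable headroom; both values are illustrative defaults, not documented settings.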

Readme

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview

Alibaba | Wan 2.7 | Text to Image is a unified model from Alibaba Tongyi Lab that excels in generating and editing high-quality images from text prompts, solving the challenge of creating detailed, customizable visuals without standardized "AI faces." Part of the Wan family, this April 2026 release introduces advanced avatar customization and superior text rendering, setting it apart with bone-level facial adjustments and print-quality output for complex elements like tables and formulas. Users benefit from its end-to-end architecture that ensures precise prompt adherence and multilingual support across 12 languages. On each::labs, access the Alibaba | Wan 2.7 | Text to Image API for seamless integration into creative workflows, delivering richly detailed images up to 4K resolution in the Pro variant.

Technical Specifications

  • Resolution Support: Up to 2K (2048×2048) for standard version; 4K (4096×4096) for Image Pro variant, with flexible aspect ratios and custom dimensions.
  • Input Formats: Text prompts up to 5,000 characters; optional up to 9 reference images for editing, style transfer, or fusion.
  • Output Formats: High-quality PNG/JPG images; supports num_outputs of 1-4 (or 1-12 in image set mode).
  • Text Handling: Up to 3,000 tokens input, print-grade rendering for ultra-long texts, tables, math formulas, and multilingual layouts equivalent to an A4 page.
  • Special Modes: Thinking mode for enhanced reasoning (default for text-to-image); image_set_mode for coherent sets.
  • Architecture: Unified generation and understanding with shared latent space semantic mapping.
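
The limits in the list above can be checked client-side before a request is sent. The following is a sketch, not the API's own validation: `num_outputs` and `image_set_mode` are named in the list, while `variant`, `width`, `height`, and `reference_images` are hypothetical parameter names, and treating 2048 and 4096 as per-side maxima is an assumption.

```python
from typing import List, Optional

# Limits as stated in the specification list above.
MAX_PROMPT_CHARS = 5000
MAX_REFERENCE_IMAGES = 9
MAX_SIDE_PX = {"standard": 2048, "pro": 4096}


def validate_inputs(
    prompt: str,
    width: int,
    height: int,
    variant: str = "standard",  # hypothetical parameter name
    num_outputs: int = 1,
    image_set_mode: bool = False,
    reference_images: Optional[List[str]] = None,  # hypothetical parameter name
) -> List[str]:
    """Return a list of problems; an empty list means the inputs fit the listed limits."""
    if variant not in MAX_SIDE_PX:
        return [f"unknown variant: {variant!r}"]

    problems: List[str] = []
    if not prompt.strip():
        problems.append("prompt is empty")
    if len(prompt) > MAX_PROMPT_CHARS:
        problems.append(f"prompt has {len(prompt)} characters (limit {MAX_PROMPT_CHARS})")

    max_side = MAX_SIDE_PX[variant]
    if not (0 < width <= max_side and 0 < height <= max_side):
        problems.append(f"{width}x{height} is outside the {variant} limit of {max_side}px per side")

    max_outputs = 12 if image_set_mode else 4
    if not 1 <= num_outputs <= max_outputs:
        problems.append(f"num_outputs must be between 1 and {max_outputs}")

    if len(reference_images or []) > MAX_REFERENCE_IMAGES:
        problems.append(f"at most {MAX_REFERENCE_IMAGES} reference images are allowed")
    return problems


print(validate_inputs("A lighthouse at dusk", 2048, 2048))
print(validate_inputs("A lighthouse at dusk", 4096, 4096))
```

The first call returns an empty list; the second reports one problem, because 4096×4096 exceeds the standard variant's 2K cap.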

Key Considerations

Before using Alibaba | Wan 2.7 | Text to Image, note that the Pro variant offers 4K output but trades speed for quality compared with the standard 2K model. The model needs detailed prompts for best results, especially when reference images (up to 9) are supplied, so it suits professional editing and precise customization work such as avatars or documents better than simple generations or rapid prototyping. On each::labs, the Alibaba | Wan 2.7 | Text to Image API runs in the cloud, so no local GPU is required. Thinking mode increases generation time; disable it for faster iterations.

Tips & Tricks

Optimize prompts for Alibaba | Wan 2.7 | Text to Image with the following practices:

  • Avatars: specify bone structure, eye shape, and facial details, e.g., "A portrait of a young woman with high cheekbones, almond-shaped green eyes, and subtle freckles across the nose."
  • Palette: reference colors and their proportions, e.g., "Generate a landscape in the exact color scheme of a sunset photo, with 40% orange, 30% purple hues."
  • Complex scenes: enable thinking mode to improve reasoning and prompt adherence.
  • Text-heavy outputs: include layout instructions, e.g., "Create an A4 page with a table of sales data, math formulas below, in Japanese and English."
  • Multi-image fusion: combine up to 9 reference images, e.g., "Fuse styles from three fashion photos into a new outfit design." Use image set mode when a coherent series of outputs is needed.
  • Iteration: test with shorter prompts first, then refine before scaling toward the 5,000-character limit.
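
The palette tip above can be kept consistent across prompts with a small string helper. This is only a prompt-formatting sketch; `palette_clause` is a hypothetical helper name, and how closely the model honors the stated proportions is not something this code can guarantee.

```python
from typing import Dict


def palette_clause(shares: Dict[str, int]) -> str:
    """Format color shares such as {"orange": 40, "purple": 30} as prompt text."""
    if any(pct <= 0 for pct in shares.values()):
        raise ValueError("each color share must be a positive percentage")
    if sum(shares.values()) > 100:
        raise ValueError("color shares add up to more than 100%")
    # Dicts preserve insertion order, so colors appear in the order given.
    return ", ".join(f"{pct}% {name}" for name, pct in shares.items())


clause = palette_clause({"orange": 40, "purple": 30})
prompt = (
    "Generate a landscape in the exact color scheme of a sunset photo, "
    f"with {clause} hues."
)
print(prompt)
```

Rejecting shares that exceed 100% catches a common prompt-writing slip before a paid generation is requested.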

Capabilities

  • Generates high-quality text-to-image outputs up to 4K in Pro mode with flexible resolutions like 1K, 2K, or custom sizes.
  • Advanced avatar customization at bone level, adjusting structure, eyes, and features for unique, non-standardized faces.
  • Superior text rendering: print-quality long texts, tables, formulas, and infographics across 12 languages up to 3K tokens.
  • Image editing with up to 9 references for style transfer, element swapping, or multi-image fusion.
  • Palette extraction: One-click color scheme matching from references, adjustable proportions.
  • Thinking mode enhances prompt interpretation for better quality and composition stability.
  • Image set generation: Coherent outputs of 1-12 images in set mode.
  • Unified architecture for seamless generation-to-editing transitions.

What Can I Use It For?

For designers: Create custom avatars by fine-tuning bone structures—"Portrait of executive with sharp jawline, piercing blue eyes, professional attire"—leveraging bone-level adjustments for branding assets.

For marketers: Generate infographics with precise text and palettes: "A4 poster with sales table, pie chart formula, brand colors extracted from logo image"—ideal for multilingual campaigns supporting 12 languages.

For developers: Integrate via each::labs Alibaba | Wan 2.7 | Text to Image API to edit product visuals, fusing up to 9 references: "Restyle smartphone mockups by blending three angle photos with futuristic glow."

For creators: Produce coherent image sets in thinking mode: "Series of 6 fantasy landscapes evolving from dawn to dusk, matching reference color scheme"—perfect for storyboarding with stable composition.

Things to Be Aware Of

Alibaba | Wan 2.7 | Text to Image performs best with detailed prompts; vague inputs may lead to less precise compositions despite thinking mode. Edge cases like ultra-complex multi-element fusions with 9 images can increase processing time significantly. Common mistakes include overloading prompts beyond 5,000 characters or ignoring reference image quality, resulting in suboptimal blends. Resource needs are cloud-friendly, but local runs await open weights. Multilingual text shines in supported languages, but rare dialects may falter. Always preview single outputs before batching to catch minor alignment issues.

Limitations

The standard version of Alibaba | Wan 2.7 | Text to Image caps at 2K resolution; 4K is exclusive to Pro. The model has no native video output and focuses solely on images, despite the video capabilities elsewhere in the Wan family. There is no built-in animation or motion, and editing is limited to static fusions. Results may be inconsistent for hyper-detailed physics or rare subjects without strong references. Open weights are pending, so the model is currently available via cloud/API only. Duration and audio options are not supported, since the model is image-only.

Pricing

Pricing Type: Dynamic

0.03/Per image pricing
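
As a worked example of per-image pricing, the cost of a batch can be estimated as follows. The 0.03 rate is the figure listed on this page; the currency is not stated, and because the pricing type is Dynamic the rate may change, so treat the result as an estimate. `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Listed per-image rate; currency is not stated on this page.
PRICE_PER_IMAGE = Decimal("0.03")


def estimate_cost(num_images: int) -> Decimal:
    """Estimated cost of generating num_images at the listed per-image rate."""
    if num_images < 0:
        raise ValueError("num_images must not be negative")
    return PRICE_PER_IMAGE * num_images


print(estimate_cost(4))   # 0.12
print(estimate_cost(12))  # 0.36
```

`Decimal` is used instead of `float` so that small per-image amounts add up exactly.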

FREQUENTLY ASKED QUESTIONS

Dev questions, real answers.

Alibaba Wan 2.7 Text to Image is the latest generation of Alibaba's Wan text-to-image model, delivering improved prompt adherence, higher visual fidelity, and stronger compositional accuracy over previous Wan versions. It generates photorealistic and artistic images across diverse styles from natural language descriptions.

Alibaba Wan 2.7 Text to Image is accessible via the eachlabs unified API. Submit a text prompt with optional style or resolution parameters, and the model returns a generated image. Billing is pay-as-you-go through eachlabs; no Alibaba account is required.

Wan 2.7 introduces improvements in prompt fidelity, image sharpness, and compositional control over Wan v2.6. It handles more complex scene descriptions with greater accuracy. For new production workflows requiring the best Wan output quality, Wan 2.7 is the recommended choice; Wan v2.6 remains a stable fallback for existing integrations.