WAN-2.7
Wan 2.7 Image-to-Video generates high-quality videos from a single image with optional last-frame control, offering guided motion, audio synchronization, and intelligent prompt enhancement.
Avg Run Time: 200s
Model Slug: alibaba-wan-2-7-image-to-video
Release Date: April 3, 2026
API & SDK
Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
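Below is a minimal sketch of the create step in Python. The base URL, endpoint path, X-API-Key header, and payload field names are all assumptions for illustration; confirm the exact request shape in the each::labs API reference before use.

```python
import requests

API_KEY = "YOUR_EACHLABS_API_KEY"        # placeholder; use your real key
BASE_URL = "https://api.eachlabs.ai/v1"  # assumed base URL

# Assumed payload shape: model slug plus the model's inputs.
payload = {
    "model": "alibaba-wan-2-7-image-to-video",
    "input": {
        "image_url": "https://example.com/keyframe.jpg",  # hypothetical input field
        "prompt": "slow camera push-in, ambient city sounds",
    },
}

resp = requests.post(
    f"{BASE_URL}/prediction/",      # assumed endpoint path
    json=payload,
    headers={"X-API-Key": API_KEY},  # assumed auth header
    timeout=30,
)
resp.raise_for_status()
prediction_id = resp.json()["predictionID"]  # assumed response field
print("Prediction ID:", prediction_id)
```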
Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
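A matching polling sketch follows, under the same assumptions as above; the terminal status names ("success" and "error") are also guesses, so verify them against the actual API responses.

```python
import time
import requests

API_KEY = "YOUR_EACHLABS_API_KEY"        # placeholder; same key as above
BASE_URL = "https://api.eachlabs.ai/v1"  # assumed base URL

def wait_for_result(prediction_id: str, interval: float = 5.0, timeout: float = 600.0) -> dict:
    """Poll the prediction endpoint until a terminal status or timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = requests.get(
            f"{BASE_URL}/prediction/{prediction_id}",  # assumed endpoint path
            headers={"X-API-Key": API_KEY},            # assumed auth header
            timeout=30,
        )
        resp.raise_for_status()
        data = resp.json()
        status = data.get("status")
        if status == "success":   # assumed terminal status name
            return data           # expected to contain the output video URL
        if status == "error":     # assumed failure status name
            raise RuntimeError(f"Prediction failed: {data}")
        time.sleep(interval)
    raise TimeoutError(f"Prediction {prediction_id} did not finish within {timeout}s")

result = wait_for_result(prediction_id="YOUR_PREDICTION_ID")
```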
Readme
Overview
Alibaba | Wan 2.7 | Image to Video transforms a single static image into dynamic, high-quality video clips, solving the challenge of adding realistic motion and audio to visuals without complex editing tools. Developed by Alibaba Tongyi Lab as part of the advanced Wan 2.7 family, this model stands out with its support for guided motion control, including first/last frame options, and embedded audio synchronization for cinematic results. Users provide an input image and optional prompts to generate videos of 15-30 seconds at 1080p or higher resolutions, making it ideal for creators who want professional-grade output directly on each::labs (eachlabs.ai). The Alibaba | Wan 2.7 | Image to Video API enables seamless integration for developers building multimodal applications.
Technical Specifications
- Resolution Support: Native 1080p HD, up to 4K cinematic fidelity in advanced modes (e.g., 2048×2048 or 4096×4096 for related image tasks).
- Max Duration: 15-30 seconds per generation, extending beyond previous Wan models' 5-10 second limits.
- Aspect Ratios: Flexible, including custom dimensions like 1920×1080; supports widescreen cinematic formats.
- Input Formats: Single image input with text prompt (up to 5,000 characters); optional multi-reference images (up to 9) for control.
- Output Formats: Video with native audio; MP4 or similar standard video files.
- Processing Time: Efficient rendering via Diffusion Transformer architecture with T5 encoder and MoE routing; near-instant scaling on cloud infrastructure.
- Architecture: Video diffusion model with synchronous audio-visual Flow Matching for enhanced speed and quality.
Key Considerations
Before using Alibaba | Wan 2.7 | Image to Video on each::labs (eachlabs.ai), ensure your input image is high-resolution for optimal motion transfer. This model excels in scenarios requiring precise frame control, such as extending static shots into narrated scenes, where basic text-to-video alternatives fall short. Processing favors cloud deployment due to high compute needs, balancing cost with output quality; expect credits-based pricing starting around $10 for substantial usage. Developers integrating the Alibaba | Wan 2.7 | Image to Video API should account for prompt length limits and enable thinking mode for complex edits. It is best suited to professional workflows where audio sync and duration matter more than ultra-short clips.
Tips & Tricks
Optimize prompts for Alibaba | Wan 2.7 | Image to Video by specifying motion direction, speed, and audio cues explicitly, leveraging its contextual command processing. Use "first frame: [describe input image], last frame: [target pose], smooth camera pan right with ambient forest sounds" to guide transitions precisely. Enable thinking mode for better reasoning on intricate scenes, and experiment with multi-image references (up to 9) for style-consistent animations. Set seed values for reproducible results during iteration. For longer videos, break prompts into sequential generations with endpoint anchors.
Example prompts:
- "Animate this portrait with gentle head turn left, smiling expression, soft orchestral background music rising to crescendo."
- "Convert landscape photo to flying drone shot over mountains at sunset, wind sounds and eagle calls synchronized."
- "Image to video: character walks forward from static pose, rain falling, thunder audio effects building tension."
Combine with each::labs (eachlabs.ai) workflows for rapid prototyping.
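To make these tips concrete, here is a hypothetical input dictionary combining first/last frame cues, an audio description, and a fixed seed. Every field name is an assumption for illustration, not the confirmed schema; check the model's input reference for the real parameters.

```python
# Hypothetical input illustrating the tips above; all field names are assumptions.
video_input = {
    "image_url": "https://example.com/forest-keyframe.jpg",
    "prompt": (
        "first frame: hiker pauses at the treeline, "
        "last frame: hiker steps into sunlight, "
        "smooth camera pan right with ambient forest sounds"
    ),
    "resolution": "1080P",  # 720P also supported (see Pricing)
    "duration": 15,         # seconds, within the model's 15-30s range
    "seed": 42,             # fixed seed for reproducible iteration
}
```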
Capabilities
- Generates high-quality videos from a single input image with realistic motion dynamics up to 15-30 seconds.
- Supports first/last frame control for precise guided motion and endpoint anchoring.
- Includes native audio synchronization with embedded scene acoustics like ambient sounds or music.
- Handles multi-reference inputs (up to 9 images) for 9-grid multi-scene video composition.
- Offers instruction-based editing via Diffusion Transformer for text-driven adjustments.
- Delivers 1080p to 4K resolutions with flexible aspect ratios and custom dimensions.
- Features subject and voice cloning integration for consistent character animation.
- Supports contextual prompt enhancement with T5 encoder for complex commands.
What Can I Use It For?
Content Creators: Filmmakers can animate storyboards by inputting a keyframe image with prompts like "first frame: hero stands ready, last frame: draws sword dramatically, epic orchestral score swells," producing 15-second clips with synced audio for quick edits.
Marketers: Agencies generate product demo videos from a static photo, using multi-reference grids: "Pan around smartphone from top view, highlight features with voiceover narration," ideal for social media ads with native sound.
Developers: Build interactive apps via the Alibaba | Wan 2.7 | Image to Video API on each::labs (eachlabs.ai), feeding user-uploaded images into prompts like "animate avatar with custom gesture sequence and speech audio" to power personalized virtual assistants.
Designers: Animate UI mockups with "transition static wireframe to interactive prototype, subtle click sounds and hover effects," leveraging instruction editing for precise motion control in presentation reels.
Things to Be Aware Of
Alibaba | Wan 2.7 | Image to Video performs best with clear, high-contrast input images; blurry sources lead to motion artifacts. Complex physics simulations, like rapid object interactions, may show trails compared to specialized models. Users often overlook prompt specificity—vague descriptions yield generic motion. Resource-intensive for local runs; rely on cloud via each::labs (eachlabs.ai) to avoid GPU overload. Steeper learning curve for multi-frame control, so test short clips first. Audio sync shines in ambient scenes but requires descriptive cues for dialogue-heavy outputs.
Limitations
Alibaba | Wan 2.7 | Image to Video caps at 15-30 seconds, unsuitable for full-length productions. Resolution tops out at 1080p standard, with 4K limited to pro modes or image tasks. It struggles with hyper-realistic physics in fast-action scenes, producing occasional artifacts. No open weights have been released yet; access is cloud-only via APIs such as each::labs (eachlabs.ai). Input is restricted to a maximum of 9 reference images, and prompts longer than 5,000 characters are unsupported.
Pricing
Pricing Type: Dynamic
Current Pricing: 1080P at $0.15/sec (default)
Pricing Rules

| Condition | Pricing |
|---|---|
| resolution matches "720P" | $0.10/sec (720P) |
| otherwise (default) | $0.15/sec (1080P) |
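For example, a 20-second clip rendered at 1080P costs 20 × $0.15 = $3.00, while the same clip at 720P costs 20 × $0.10 = $2.00.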