Eachlabs | AI Workflows for app builders

WAN-2.7

Wan 2.7 Video Edit applies instruction-based edits, reference-image-guided edits, or style transfer to existing videos. It supports 720P/1080P output, preserves or regenerates audio, and handles 2-10s input videos.

Avg Run Time: ~300s

Model Slug: alibaba-wan-2-7-video-edit

Release Date: April 3, 2026


API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
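As a sketch, the request body and POST call might look like the following. The base URL, endpoint path, header name, and parameter names (`model`, `input`, `video_url`, `prompt`, `resolution`) are illustrative assumptions, not confirmed API details; consult the Eachlabs API reference for the authoritative schema.

```python
import json
import urllib.request

API_BASE = "https://api.eachlabs.ai/v1"  # assumed base URL
API_KEY = "YOUR_API_KEY"


def build_prediction_payload(video_url: str, prompt: str, resolution: str = "1080P") -> dict:
    """Assemble the request body for a Wan 2.7 Video Edit prediction.

    The field names here are assumptions about the request schema.
    """
    return {
        "model": "alibaba-wan-2-7-video-edit",
        "input": {
            "video_url": video_url,
            "prompt": prompt,
            "resolution": resolution,
        },
    }


def create_prediction(payload: dict) -> dict:
    """POST the payload and return the parsed JSON response (including a prediction ID)."""
    req = urllib.request.Request(
        f"{API_BASE}/prediction/",  # assumed endpoint path
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "X-API-Key": API_KEY},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Separating payload construction from the network call keeps the schema easy to inspect and test before spending credits on a real request.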

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready: check repeatedly at a reasonable interval until the response reports a success (or error) status.
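The polling loop can be sketched as below. The status values ("success", "error") and response shape are assumptions; `fetch_status` stands in for a thin wrapper over the real GET endpoint, and injecting it keeps the loop testable without network access.

```python
import time


def wait_for_result(prediction_id: str, fetch_status, interval: float = 5.0,
                    timeout: float = 600.0) -> dict:
    """Poll until the prediction reports a terminal status.

    `fetch_status` is any callable mapping a prediction ID to a status dict,
    e.g. a wrapper over the prediction-result endpoint. Status names are
    assumptions about the API's response schema.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = fetch_status(prediction_id)
        status = result.get("status")
        if status == "success":
            return result
        if status == "error":
            raise RuntimeError(result.get("error", "prediction failed"))
        time.sleep(interval)  # avoid hammering the endpoint
    raise TimeoutError(f"prediction {prediction_id} not ready after {timeout}s")
```

Given the ~300s average run time reported above, a 5-10 second interval with a timeout of several minutes is a sensible starting point.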

Readme

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview

Alibaba | Wan 2.7 | Video Edit transforms existing videos through instruction-based edits, reference image guidance, or style transfers, solving the challenge of precise video manipulation without full regeneration. Part of Alibaba's advanced Wan 2.7 family, this video-to-video model excels in temporal feature transfer, preserving motion dynamics, camera work, and visual effects from source videos. Its standout capability lies in supporting native 1080p output for 2-10 second inputs with multi-reference handling up to 5 simultaneous inputs, enabling complex multi-subject compositions. Available via the Alibaba | Wan 2.7 | Video Edit API on platforms like each::labs, it streamlines workflows for creators needing high-fidelity Alibaba video-to-video edits. Ideal for professional video refinement, it maintains audio synchronization and handles real human references seamlessly.

Technical Specifications

  • Resolution Support: Native 1080p across all editing modes, with flexible aspect ratios.
  • Max Duration: 2-10 seconds for reference-to-video (R2V) editing; supports 2-15s for related generation modes.
  • Input Formats: Video inputs with optional joint image, video, and audio references (up to 5 simultaneous for multi-subject control); text instructions for edits.
  • Output Formats: High-quality video with preserved or regenerated native audio; supports first/last frame control.
  • Processing Time: Serverless deployment optimized for efficient editing; exact times vary by complexity and references.
  • Architecture: Built on Wan model family with temporal feature transfer for motion preservation and multi-reference consistency.

Key Considerations

Before using Alibaba | Wan 2.7 | Video Edit, ensure input videos are 2-10 seconds to match the optimal performance window. It requires clear text instructions or reference media for best results, with up to 5 references enhancing multi-subject accuracy. It is best suited to targeted edits such as style transfers or object modifications rather than full recreations, and performs strongest in scenarios that demand motion fidelity. On each::labs, leverage the Alibaba | Wan 2.7 | Video Edit API for scalable Alibaba video-to-video tasks. Consider cost tradeoffs: efficient for short clips, but costs may increase with multiple references. No local deployment is available yet; cloud access via API is standard.
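The constraints above (2-10s clips, at most 5 simultaneous references) can be checked client-side before submitting a job. This helper is an illustrative sketch, not part of the API:

```python
MIN_DURATION_S = 2.0   # documented lower bound for reference-to-video editing
MAX_DURATION_S = 10.0  # documented upper bound
MAX_REFERENCES = 5     # maximum simultaneous image/video/audio references


def validate_edit_request(duration_s: float, references: list) -> list:
    """Return a list of human-readable problems; an empty list means the request looks OK."""
    problems = []
    if not (MIN_DURATION_S <= duration_s <= MAX_DURATION_S):
        problems.append(
            f"video duration {duration_s:.1f}s is outside the supported "
            f"{MIN_DURATION_S:.0f}-{MAX_DURATION_S:.0f}s window"
        )
    if len(references) > MAX_REFERENCES:
        problems.append(
            f"{len(references)} references given; at most {MAX_REFERENCES} are supported"
        )
    return problems
```

Failing fast on these checks avoids wasted API calls on inputs the model will reject or handle poorly.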

Tips & Tricks

Optimize prompts for Alibaba | Wan 2.7 | Video Edit by being specific about temporal changes, like "replace the background with a sunset while keeping the subject's walking motion identical." Use multi-references strategically: combine image for subject appearance, video for motion, and audio for voice sync. Enable first/last frame control for seamless transitions in edits. For style transfer, reference high-quality sources to maintain 1080p fidelity.

Example prompts:

  • "Edit the video to change the man's shirt to red, preserve original walking path and camera pan."
  • "Apply cyberpunk style to this cityscape video, transfer neon lighting effects temporally."
  • "Replace actor's face with reference image, sync lip movements to original audio."

Workflow tip: Test with single references first, then scale to 5 for complex scenes on each::labs.

Capabilities

  • Instruction-based video editing via natural language prompts for object replacement, scene alteration, or enhancements.
  • Reference-based edits supporting up to 5 simultaneous image/video/audio inputs for multi-subject consistency.
  • Temporal feature transfer to preserve motion dynamics, camera movements, and effects from source videos.
  • Native 1080p output for 2-10s inputs, with audio preservation or regeneration.
  • Style transfer applying visual aesthetics from references while maintaining original timing.
  • Real human image/video references as first frames or subjects, ensuring natural appearance and motion.
  • Joint subject+voice control via mixed media inputs for synchronized edits.

What Can I Use It For?

Content Creators: Refine raw footage by instruction-based object swaps, e.g., "remove the logo from the product demo video, keep hand movements natural." Leverages temporal transfer for seamless pro results.

Marketers: Perform style transfers on promo clips, like "apply luxury gold tones from reference image to car ad video." Multi-reference support ensures brand consistency across subjects.

Video Designers: Edit social media reels with face swaps using real human references: "replace presenter's face with actor image, sync to original speech audio." Preserves 1080p quality for platforms.

Developers: Integrate via Alibaba | Wan 2.7 | Video Edit API for app-based Alibaba video-to-video tools, automating multi-subject scene edits with 5-reference inputs for dynamic content generation.

Things to Be Aware Of

Complex multi-reference setups (up to 5 inputs) may introduce minor inconsistencies in highly dynamic scenes. Edge cases like rapid motion or low-light inputs can affect temporal transfer precision. Common mistakes include vague prompts lacking temporal details, leading to altered motions—always specify preservation. Resource needs scale with references; test on each::labs for API quotas. Audio sync works best with clear source voice; noisy inputs may require regeneration. Avoid overlong videos beyond 10s to prevent quality drops.

Limitations

Alibaba | Wan 2.7 | Video Edit caps at 2-10s for reference editing, unsuitable for longer formats. Multi-subject handling is strong up to 5 references but may falter in overcrowded compositions. No confirmed 4K video support yet, sticking to 1080p. Open weights pending; cloud-only access currently. Fails on extreme deformations or non-human subjects without strong references. Input videos must be short to avoid processing issues.

Pricing

Pricing Type: Dynamic

Current Pricing

1080P pricing: $0.15/sec (default)

Pricing Rules

  • If resolution matches "720P": 720P pricing, $0.10/sec
  • Otherwise (Rule 2, active): 1080P pricing, $0.15/sec (default)
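Per the rules above, cost scales linearly with output duration at a per-second rate chosen by resolution. A minimal sketch of the resulting cost calculation:

```python
RATE_PER_SECOND = {"720P": 0.10, "1080P": 0.15}  # USD/sec, from the pricing rules


def estimate_cost(duration_s: float, resolution: str = "1080P") -> float:
    """Estimated charge in USD for an edit of the given duration and resolution."""
    return round(duration_s * RATE_PER_SECOND[resolution], 4)
```

For example, a maximum-length 10s clip costs $1.50 at 1080P or $1.00 at 720P.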