Alibaba Wan 2.7 · Video Edit
Wan 2.7 Video Edit applies instruction-based edits, reference image-based edits, or style transfer to existing videos. Supports 720P/1080P, preserves or regenerates audio, and handles 2-10s input videos.
- Runtime (p50): 5m
- Estimated price: From $0.1
Overview
Alibaba Wan 2.7 Video Edit transforms existing videos through instruction-based edits, reference-image guidance, or style transfer, so a clip can be changed precisely without regenerating it from scratch. Part of Alibaba's Wan 2.7 family, this video-to-video model uses temporal feature transfer to preserve the motion dynamics, camera work, and visual effects of the source video. It produces native 1080p output for 2-10 second inputs and accepts up to 5 simultaneous references, which enables multi-subject compositions. The model is available through the Wan 2.7 Video Edit API on platforms such as each::labs. It is best suited to refining existing footage: it keeps audio synchronized and accepts real human references.
Capabilities
- Instruction-based video editing via natural language prompts for object replacement, scene alteration, or enhancements.
- Reference-based edits supporting up to 5 simultaneous image/video/audio inputs for multi-subject consistency.
- Temporal feature transfer to preserve motion dynamics, camera movements, and effects from source videos.
- Native 1080p output for 2-10s inputs, with audio preservation or regeneration.
- Style transfer applying visual aesthetics from references while maintaining original timing.
- Real human image/video references as first frames or subjects, ensuring natural appearance and motion.
- Joint subject+voice control via mixed media inputs for synchronized edits.
Use cases
Content Creators: Refine raw footage with instruction-based object edits, e.g., "remove the logo from the product demo video, keep hand movements natural." Temporal feature transfer keeps the untouched motion intact.
Marketers: Apply style transfer to promo clips, e.g., "apply luxury gold tones from reference image to car ad video." Multi-reference support helps keep branding consistent across subjects.
Video Designers: Edit social media reels with face swaps using real human references, e.g., "replace presenter's face with actor image, sync to original speech audio." Output stays at 1080p.
Developers: Integrate the Wan 2.7 Video Edit API into apps to automate multi-subject scene edits with up to 5 reference inputs.
Tips & tricks
Write prompts for Wan 2.7 Video Edit that state both what changes and what must stay the same over time, like "replace the background with a sunset while keeping the subject's walking motion identical." Use multiple references deliberately: an image for subject appearance, a video for motion, and audio for voice sync. Enable first/last frame control when the edit needs a clean transition at the start or end of the clip. For style transfer, use high-quality reference sources so the 1080p output stays sharp.
Example prompts:
- "Edit the video to change the man's shirt to red, preserve original walking path and camera pan."
- "Apply cyberpunk style to this cityscape video, transfer neon lighting effects temporally."
- "Replace actor's face with reference image, sync lip movements to original audio."
Workflow tip: Test with a single reference first, then scale up to 5 for complex scenes on each::labs.
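The example prompts above share one pattern: a change, followed by an explicit list of what to preserve. A minimal sketch of that pattern as a helper follows. The function name `build_edit_prompt` and its parameters are hypothetical and are not part of any Wan or each::labs API; the helper only assembles a text string.

```python
from __future__ import annotations


def build_edit_prompt(change: str, preserve: list[str] | None = None) -> str:
    """Join an edit instruction with an explicit preservation clause.

    Hypothetical helper: it only formats text and calls no API.
    """
    change = change.strip().rstrip(".")
    if not change:
        raise ValueError("change must describe the edit")

    # Drop empty entries so the clause never contains stray commas.
    kept = [item.strip() for item in (preserve or []) if item.strip()]
    if not kept:
        return change + "."

    if len(kept) == 1:
        keep_clause = kept[0]
    else:
        keep_clause = ", ".join(kept[:-1]) + " and " + kept[-1]
    return f"{change}, preserve {keep_clause}."


prompt = build_edit_prompt(
    "Edit the video to change the man's shirt to red",
    ["original walking path", "camera pan"],
)
print(prompt)
```

Keeping the preservation list as a separate argument makes it harder to forget, which is the failure mode that vague prompts tend to produce.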
Technical spec
- Resolution Support: Native 1080p across all editing modes, with flexible aspect ratios.
- Input Duration: 2-10 seconds for reference-to-video (R2V) editing; related generation modes support 2-15s.
- Input Formats: Video inputs with optional joint image, video, and audio references (up to 5 simultaneous for multi-subject control); text instructions for edits.
- Output Formats: High-quality video with preserved or regenerated native audio; supports first/last frame control.
- Processing Time: Serverless deployment with a listed median (p50) runtime of 5m; actual times vary with edit complexity and the number of references.
- Architecture: Built on the Wan model family, with temporal feature transfer for motion preservation and multi-reference consistency.
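The limits listed above (2-10 second inputs, up to 5 references, 720P or 1080P) can be checked locally before a job is submitted. The sketch below is one way to do that. The `EditRequest` class, its field names, and the `validate` function are assumptions made for illustration; they do not reflect the actual request schema of the Wan 2.7 Video Edit API.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Limits taken from the spec above.
MIN_SECONDS = 2.0
MAX_SECONDS = 10.0
MAX_REFERENCES = 5
RESOLUTIONS = {"720P", "1080P"}
REFERENCE_KINDS = {"image", "video", "audio"}


@dataclass
class EditRequest:
    """Hypothetical request shape; field names are not from the real API."""
    prompt: str
    video_seconds: float
    resolution: str = "1080P"
    # Each reference is a (kind, url) pair, e.g. ("image", "https://...").
    references: list[tuple[str, str]] = field(default_factory=list)


def validate(req: EditRequest) -> list[str]:
    """Return a list of human-readable problems; empty means the request fits."""
    problems = []
    if not MIN_SECONDS <= req.video_seconds <= MAX_SECONDS:
        problems.append(
            f"input is {req.video_seconds}s; expected {MIN_SECONDS}-{MAX_SECONDS}s"
        )
    if req.resolution.upper() not in RESOLUTIONS:
        problems.append(f"unsupported resolution {req.resolution!r}")
    if len(req.references) > MAX_REFERENCES:
        problems.append(
            f"{len(req.references)} references given; at most {MAX_REFERENCES}"
        )
    for kind, _url in req.references:
        if kind not in REFERENCE_KINDS:
            problems.append(f"unknown reference kind {kind!r}")
    if not req.prompt.strip() and not req.references:
        problems.append("provide a text instruction or at least one reference")
    return problems


request = EditRequest(
    prompt="Apply cyberpunk style to this cityscape video.",
    video_seconds=12.0,
    resolution="1080P",
)
for problem in validate(request):
    print("rejected:", problem)
```

Returning every problem at once, rather than raising on the first, lets a caller fix a request in a single pass.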
Things to be aware of
Complex multi-reference setups (up to 5 inputs) may introduce minor inconsistencies in highly dynamic scenes. Rapid motion and low-light footage can reduce the precision of temporal transfer. The most common mistake is a vague prompt with no temporal detail, which can cause motion to change unintentionally, so always state what should be preserved. Resource needs grow with the number of references; check API quotas on each::labs before scaling up. Audio sync works best with a clear source voice, and noisy inputs may need regenerated audio. Keep inputs at or under 10s to avoid quality drops.
Key considerations
Before using Wan 2.7 Video Edit, make sure input videos are 2-10 seconds long, the range the model is designed for. It needs a clear text instruction or reference media, and up to 5 references can improve multi-subject accuracy. It is best for targeted edits such as style transfer or object modification rather than full recreation, and it is strongest where motion fidelity matters. On each::labs, the Wan 2.7 Video Edit API supports scaling these video-to-video tasks. Consider the cost tradeoff: short clips are inexpensive, but cost may rise with multiple references. There is no local deployment yet; access is through the cloud API.
Limitations
Wan 2.7 Video Edit is limited to 2-10s inputs for reference editing, which makes it unsuitable for longer formats. Multi-subject handling is strong up to 5 references but may falter in overcrowded compositions. There is no confirmed 4K support yet; output tops out at 1080p. Open weights are pending, so access is currently cloud-only. Results can fail on extreme deformations, or on non-human subjects when no strong reference is supplied.
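For footage longer than the cap, one possible workaround is to split it into windows that each fit the 2-10s range and edit them separately. The sketch below only plans the time windows. The function `plan_windows` is hypothetical, and nothing in the source says whether separately edited segments stay visually consistent across their boundaries, so treat this as an assumption to test.

```python
import math


def plan_windows(total_seconds, max_len=10.0, min_len=2.0):
    """Split a duration into equal windows, each within [min_len, max_len].

    Hypothetical planning helper: returns (start, end) pairs in seconds.
    """
    if total_seconds < min_len:
        raise ValueError(f"clip is shorter than the {min_len}s minimum")

    # Use the fewest windows that keep every window at or under max_len.
    count = math.ceil(total_seconds / max_len)
    length = total_seconds / count
    return [
        (round(i * length, 3), round((i + 1) * length, 3))
        for i in range(count)
    ]


for start, end in plan_windows(25.0):
    print(f"{start:>7.3f}s -> {end:>7.3f}s")
```

Equal-length windows avoid a short leftover segment at the end that could fall below the 2s minimum.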