Alibaba Wan 2.7 · Reference to Video

Video · wan-2.7 · by Alibaba

Wan 2.7 Reference-to-Video generates videos with consistent character and object appearance from a reference image, supporting single or multi-shot scenes and optional motion guidance from video references.

Runtime (p50): 8 min
Estimated price: from $0.10
Call the API
prediction.sh
# Create a prediction; assumes EACHLABS_API_KEY is exported in the environment.
curl -X POST \
  -H "X-API-Key: $EACHLABS_API_KEY" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "alibaba-wan-2-7-reference-to-video",
    "version": "0.0.1",
    "input": {
        "ratio": "16:9",
        "prompt": "A brave young child rides on the back of a giant bird flying high above the ocean at sunset. The bird’s massive wings flap powerfully through the glowing sky as waves crash against cliffs below. The camera follows in a cinematic aerial shot, sweeping around them as they glide through clouds and golden light. The child holds tightly to the bird’s feathers, hair and clothes moving in the wind. Epic fantasy adventure mood, realistic motion, dramatic scale, highly detailed, cinematic atmosphere.",
        "duration": 5,
        "shot_type": "single",
        "resolution": "1080P",
        "prompt_extend": true,
        "reference_image": "https://storage.googleapis.com/magicpoint/inputs/alibaba-wan-2-7-reference-to-video-input.png"
    },
    "webhook_url": ""
}' \
  https://api.eachlabs.ai/v1/prediction/
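
The request body above can also be generated from shell variables instead of being edited by hand. The sketch below is hypothetical: `build_wan_payload` and `json_escape` are illustrative names, not part of any each::labs SDK. It assumes that the field names and default values in the example payload are the ones the endpoint expects, and it escapes only backslashes and double quotes, so prompts containing newlines or other control characters are out of scope.

```sh
#!/bin/sh
# Hypothetical helpers: field names and defaults copied from the example request.

# Escape backslashes first, then double quotes, so the value is a valid JSON string.
# Newlines and other control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Usage: build_wan_payload PROMPT IMAGE_URL [DURATION] [RATIO] [RESOLUTION]
# DURATION must be an integer number of seconds (it is inserted unquoted).
build_wan_payload() {
  prompt=$(json_escape "$1")
  image=$(json_escape "$2")
  duration=${3:-5}
  ratio=${4:-16:9}
  resolution=${5:-1080P}
  printf '{"model":"alibaba-wan-2-7-reference-to-video","version":"0.0.1","input":{"ratio":"%s","prompt":"%s","duration":%s,"shot_type":"single","resolution":"%s","prompt_extend":true,"reference_image":"%s"},"webhook_url":""}\n' \
    "$ratio" "$prompt" "$duration" "$resolution" "$image"
}
```

Under those assumptions, the output can replace the inline JSON in the request above, for example with `--data "$(build_wan_payload "$PROMPT" "$IMAGE_URL" 10)"`.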
Documentation
  • Overview

    Alibaba | Wan 2.7 | Reference to Video generates high-quality videos in which characters and objects keep the appearance shown in a single reference image, which makes it well suited to preserving subject fidelity across multi-shot scenes. Developed by Alibaba Tongyi Lab as part of the Wan 2.7 family, this reference-to-video model supports multi-reference inputs such as 9-grid scenes and first/last frame control, producing cinematic clips of up to 15-30 seconds. Beyond basic generation, it offers instruction-based editing and optional motion guidance from video references, addressing a key challenge for creators and filmmakers: keeping an animated character consistent from shot to shot. The Alibaba | Wan 2.7 | Reference to Video API is available on platforms such as each::labs.

  • Capabilities
    • Generates videos with consistent characters/objects from a single reference image across multi-shot scenes.
    • Supports 9-grid multi-reference inputs for complex spatial arrangements and scene building.
    • Provides first/last frame control for precise narrative structuring.
    • Includes native lip-sync and audio generation synchronized with visuals.
    • Enables instruction-based video editing, such as style transfer or element swaps using text prompts.
    • Handles image-to-video generation, with optional motion guidance from a reference video, for clips of up to 15-30 seconds.
    • Offers subject and voice cloning for personalized avatar animations.
    • Supports resolutions from native 1080p up to 4K (4K in advanced modes), plus a thinking mode for enhanced quality.
  • Use cases

    For content creators: Produce YouTube intros with a consistent host avatar using 9-grid references: "Animate the reference character delivering a script across three shots, from close-up to wide angle, with lip-sync."

    For marketers: Create product demo videos maintaining brand mascot fidelity: "From reference image, show mascot interacting with products in a 20-second sequence, guided by demo motion video."

    For designers: Develop animated storyboards with first/last frame control: "Transition character from static pose A to action pose B in a multi-shot scene under studio lighting."

    For developers: Integrate the Alibaba | Wan 2.7 | Reference to Video API into app prototypes, cloning user-uploaded faces with custom voices: "Generate personalized tutorial video from user photo and script." Across these use cases, the model's multimodal editing delivers efficient, high-fidelity outputs on each::labs.

  • Tips & tricks

    Optimize prompts for Alibaba | Wan 2.7 | Reference to Video by stating "maintain subject consistency from reference image", which plays to its core strength in character fidelity. For complex scenes, use multi-reference grids (up to 9 images) together with first/last frame control, for example: "Generate a 15-second clip where the character from the reference image walks from frame A to frame B under dramatic lighting." Enable thinking mode for text-to-video elements to improve reasoning and output quality, especially with long prompts (up to 5,000 characters). For motion guidance, pair a short reference video with an instruction such as "Apply the walking motion from the video reference to the static character image and add lip-synced dialogue." Workflow tip: start with a plain image-to-video generation, then refine it through instruction-based edits. For reproducibility in professional pipelines on each::labs, test and then fix seeds where the interface exposes them.
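
    The two numeric limits in these tips (prompts of up to 5,000 characters, at most 9 reference images) can be checked locally before spending a generation. A minimal sketch, assuming those limits apply to the each::labs endpoint exactly as stated on this page; `preflight_check` is a hypothetical helper name.

```sh
#!/bin/sh
# Limits taken from the tips above; treat them as assumptions to verify
# against the current model schema.
MAX_PROMPT_CHARS=5000
MAX_REFERENCE_IMAGES=9

# Usage: preflight_check PROMPT [REF_URL ...]
# Prints a reason to stderr and returns 1 if a limit is exceeded; returns 0 otherwise.
preflight_check() {
  prompt=$1
  shift
  # ${#prompt} counts characters in bash; some shells (e.g. dash) count bytes.
  if [ "${#prompt}" -gt "$MAX_PROMPT_CHARS" ]; then
    echo "prompt too long: ${#prompt} > $MAX_PROMPT_CHARS" >&2
    return 1
  fi
  # The remaining arguments are the reference image URLs.
  if [ "$#" -gt "$MAX_REFERENCE_IMAGES" ]; then
    echo "too many reference images: $# > $MAX_REFERENCE_IMAGES" >&2
    return 1
  fi
  return 0
}
```

    Running the check in a script before the curl call means a rejected input costs nothing; the exact limits should still be confirmed against the live API.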

  • Technical spec
    • Resolution Support: Native 1080p HD, with capabilities up to 4K cinematic fidelity in advanced modes (e.g., Wan 2.7 Pro variants).
    • Max Duration: 15-30 seconds per generation, up from the 5-10 second limit of Wan 2.6.
    • Aspect Ratios: Flexible, including standard ratios such as 16:9 (e.g., 1920x1080) and custom dimensions.
    • Input/Output Formats: Accepts reference images (up to 9 in multi-reference grids), optional video for motion guidance, text prompts; outputs MP4 videos with native audio.
    • Processing Time: Median (p50) runtime of about 8 minutes on each::labs. The Diffusion Transformer architecture, with a T5 encoder and MoE routing, keeps GPU demands manageable for cloud deployment.
    • Architecture: Multimodal Diffusion Transformer for contextual command processing and synchronous audio-visual flow matching.
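
    The example request passes an aspect ratio string ("ratio": "16:9"), while the spec quotes pixel dimensions such as 1920x1080, so it can help to reduce dimensions to a ratio before building a request. A small sketch using Euclid's greatest-common-divisor algorithm; `aspect_ratio` is a hypothetical helper, and whether the endpoint accepts arbitrary reduced ratios is an assumption to verify.

```sh
#!/bin/sh
# gcd A B: greatest common divisor via Euclid's algorithm (integer arithmetic).
gcd() {
  a=$1 b=$2
  while [ "$b" -ne 0 ]; do
    t=$((a % b))
    a=$b
    b=$t
  done
  echo "$a"
}

# aspect_ratio WIDTH HEIGHT: print the reduced W:H string, e.g. 16:9.
aspect_ratio() {
  g=$(gcd "$1" "$2")
  echo "$(($1 / g)):$(($2 / g))"
}
```

    For example, `aspect_ratio 1920 1080` prints 16:9 and `aspect_ratio 1080 1920` prints 9:16.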
  • Things to be aware of

    Alibaba | Wan 2.7 | Reference to Video has a steeper learning curve than simpler generators because of advanced features such as instruction editing and multi-grid inputs, so writing effective prompts takes practice. Edge cases include complex physics simulations, where motion trails may look less refined than in specialized models. A common mistake is overloading prompts without a clear reference hierarchy, which leads to inconsistent outputs; always put subject-consistency directives first. Resource needs scale with duration and resolution, and 4K Pro modes consume more credits on each::labs. Test short clips first to avoid wasted generations in multi-shot workflows.

  • Key considerations

    Before using Alibaba | Wan 2.7 | Reference to Video, make sure you have high-quality reference images: subject consistency depends on them, and multi-shot scenes rely on clear inputs such as 9-grid references. The model excels where characters must persist across shots, such as short films or ads, compared with alternatives that lack first/last frame control. Processing through the Alibaba | Wan 2.7 | Reference to Video API balances speed and quality, and Pro variants offer 4K at higher compute cost. Cloud platforms such as each::labs provide straightforward integration, with credit-based pricing starting at around $10 for 100 credits. The model is best suited to teams focused on instruction-based edits rather than raw physics simulation.
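
    Pricing of around $10 per 100 credits works out to about 10 cents per credit. A sketch that converts a credit count to dollars with integer (cent) arithmetic, assuming that flat rate; `credits_to_usd` is a hypothetical helper, and the number of credits a given generation consumes (which depends on duration, resolution, and mode) is not stated on this page.

```sh
#!/bin/sh
# Assumption from the text above: $10 buys 100 credits, i.e. 10 cents per credit.
# Real pricing may differ by plan, resolution, and duration.
CENTS_PER_CREDIT=10

# credits_to_usd CREDITS: print the dollar cost, computed in whole cents.
credits_to_usd() {
  cents=$(( $1 * CENTS_PER_CREDIT ))
  printf '$%d.%02d\n' $((cents / 100)) $((cents % 100))
}
```

    Under that assumption, `credits_to_usd 100` prints $10.00 and `credits_to_usd 1` prints $0.10.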

  • Limitations

    Alibaba | Wan 2.7 | Reference to Video caps output at 15-30 seconds per generation, so it is unsuitable for full-length videos. Physics handling lags behind top competitors in dynamic scenes, with occasional motion artifacts. There are no open weights yet: access is cloud-only through APIs such as each::labs, pending a Q2 2026 open-weight release. Input is limited to 9 reference images, and complex edits may require multiple iterations. Audio sync excels at lip-sync but falters with heavy accents or non-frontal faces.
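
    One workaround for the per-generation cap is to plan a longer piece as several clips, each generated from the same reference image, and join them afterwards in an editor. The sketch below only does the planning arithmetic; `plan_segments` is a hypothetical helper, and the maximum clip length is passed in as an argument because the page gives the cap as a 15-30 second range rather than a single number.

```sh
#!/bin/sh
# plan_segments TOTAL_SECONDS MAX_SECONDS
# Print one clip duration per line so that the clips add up to TOTAL_SECONDS
# and none exceeds MAX_SECONDS (the assumed per-generation cap).
plan_segments() {
  remaining=$1
  max=$2
  while [ "$remaining" -gt 0 ]; do
    if [ "$remaining" -gt "$max" ]; then
      clip=$max
    else
      clip=$remaining
    fi
    echo "$clip"
    remaining=$((remaining - clip))
  done
}
```

    For example, `plan_segments 70 15` prints 15, 15, 15, 15 and 10 on separate lines. A very short final clip may fall below the model's minimum duration, which this page does not state (the example request uses 5 seconds), so rounding the total to a friendlier length may be needed.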
