Wan 2.5 Preview · Text to Video

Video · wan-2.5 · by Alibaba

Wan 2.5 Preview is a model designed to generate realistic videos directly from text. It transforms short descriptions into cinematic visuals with natural motion, smooth camera work, and high-quality output. The “Preview” version is optimized for quick tests and experiments, making it easy to visualize ideas before moving into full production.

Runtime (p50): 3m
Estimated price: From $0.05
Call the API
prediction.sh
curl -X POST \
  -H "X-API-Key: $EACHLABS_API_KEY" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "wan-2-5-preview-text-to-video",
    "version": "0.0.1",
    "input": {
        "prompt": "Hyperspeed POV shot of a motorcycle ride, the rider’s hands gripping the handlebars clearly visible. Dodging explosions while weaving through smoke, rubble, and blasts, the camera races forward as the chaotic environment blurs in rapid motion all around.",
        "aspect_ratio": "16:9",
        "resolution": "720p",
        "duration": "5",
        "negative_prompt": "low resolution, error, worst quality, low quality, defects",
        "enable_prompt_expansion": true
    },
    "webhook_url": ""
}' \
  https://api.eachlabs.ai/v1/prediction/
Documentation (8 sections)
  • Overview

    wan-2-5-preview-text-to-video — Text to Video AI Model

    Developed by Alibaba as part of the wan-2.5 family, wan-2-5-preview-text-to-video is a cutting-edge text-to-video AI model that generates realistic 5-10 second videos with synchronized native audio directly from text prompts. This preview version stands out by producing lip-synced speech, matching background music, and environmental sound effects in a single pass, eliminating the need for post-production audio editing common in other text-to-video AI models. Ideal for creators seeking quick, high-quality cinematic visuals with sound, it supports resolutions up to 1080p at 30 fps in MP4 format, making it perfect for social media content like TikTok videos or YouTube shorts.

    Alibaba's wan-2-5-preview-text-to-video transforms simple descriptions into dynamic clips with smooth camera movements and stable object tracking, offering unmatched efficiency for prototyping video ideas.

  • Capabilities
    • Native Audio Generation: Wan 2.5 can generate synchronized audio, including dialogues, ambient sounds, and background music.
    • Style Adaptation: Seamlessly adapts across cinematic, anime, and illustration styles.
    • High-Quality Outputs: Produces videos with clear details and smooth motion.
    • Versatility: Suitable for storytelling, advertising, creative projects, and more.
    • Technical Strengths: Offers strong prompt adherence and visual reasoning capabilities.
  • Use cases

    Use Cases for wan-2-5-preview-text-to-video

    Content creators can use wan-2-5-preview-text-to-video's native audio sync to prototype TikTok videos. A prompt like "A barista pours steaming espresso into a white cup with cafe chatter and soft jazz in the background, slow-motion close-up" yields a 10s 1080p clip with synchronized ambient sound, so no extra audio work is needed. This leverages its strength in environmental effects for engaging short-form content.

    Marketers building e-commerce campaigns benefit from its high-resolution outputs and sound integration, generating product demos such as dynamic unboxings with narrated instructions and matching music, framed in 9:16 portrait format for Instagram Reels.

    Developers integrating Alibaba text-to-video capabilities via API can animate static assets into promotional videos, using image references plus text for consistent branding with auto-generated voiceovers, ideal for app-based video tools targeting social platforms.

    Filmmakers experimenting with previews find value in its 10s duration and smooth motion controls, creating storyboards with synchronized dialogue and effects to visualize scenes before full production.

  • Tips & tricks

    How to Use wan-2-5-preview-text-to-video on Eachlabs

    Access wan-2-5-preview-text-to-video through Eachlabs' Playground for instant testing with text prompts, optional audio references, and settings for 5-10s duration, 480p-1080p resolution, and aspect ratios like 16:9 or 9:16. Integrate via API or SDK for production apps; MP4 outputs with native synced audio are returned in minutes, which suits scaling text-to-video workflows.
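
The settings above map onto the `input` fields of the request shown under "Call the API". The following is a minimal shell sketch of building that request body from arguments. It assumes the same model name, version, and field names as that example; `json_escape` and `build_payload` are hypothetical helper names, the escaping covers only backslashes and double quotes in a single-line prompt, and whether the fields omitted here (`negative_prompt`, `enable_prompt_expansion`) are optional is an assumption.

```sh
#!/bin/sh
# Hypothetical helpers; field names mirror the request shown under "Call the API".

# Escape backslashes and double quotes so the prompt can sit inside a JSON string.
# Assumption: the prompt is a single line with no control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# build_payload PROMPT DURATION RESOLUTION ASPECT_RATIO
# Prints a JSON request body on stdout; makes no network call.
build_payload() {
  prompt=$(json_escape "$1")
  printf '{"model":"wan-2-5-preview-text-to-video","version":"0.0.1","input":{"prompt":"%s","duration":"%s","resolution":"%s","aspect_ratio":"%s"},"webhook_url":""}' \
    "$prompt" "$2" "$3" "$4"
}

# Usage (sends a real, billable request, so it is left commented out):
# build_payload "A barista pours espresso, slow-motion close-up" 10 1080p 9:16 |
#   curl -X POST \
#     -H "X-API-Key: $EACHLABS_API_KEY" \
#     -H "Content-Type: application/json" \
#     --data @- \
#     https://api.eachlabs.ai/v1/prediction/
```

Piping the body to `curl --data @-` keeps the prompt out of the command line, which avoids a second layer of shell quoting.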

  • Technical spec

    What Sets wan-2-5-preview-text-to-video Apart

    The wan-2-5-preview-text-to-video model differentiates itself in the competitive text-to-video landscape through native audio synchronization, generating lip-synced speech and sound effects alongside visuals in 480p, 720p, or 1080p resolutions for 5s or 10s durations at 30 fps. This enables users to create complete audio-visual content without additional editing tools, saving time for marketers who need clips with professional sound.

    Unlike many competitors limited to silent videos or shorter clips, it supports multimodal inputs such as text and audio references while maintaining advanced camera control for flicker-free motion, which suits developers integrating the wan-2-5-preview-text-to-video API into apps. Support for 16:9, 9:16, and 1:1 aspect ratios gives platform-appropriate outputs for Instagram or YouTube.

    Its lenient content policies and multilingual support further empower bold, global storytelling, with outputs in MP4 (H.264) ready for immediate use.

    • Native audio-video sync: Produces synchronized speech, music, and effects; streamlines workflows for quick social media video generation.
    • Extended 10s duration in 1080p: Exceeds typical 5-8s limits; supports richer narratives for storytelling content.
    • Smooth cinematography: Features stable tracking and transitions; delivers professional-grade results without jitter.
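
The spec above names a small set of durations, resolutions, and aspect ratios, so a request can be checked locally before it is sent. The following is a minimal sketch under that assumption. `validate_settings` is a hypothetical helper, and the accepted values are taken from this page rather than from a confirmed API schema, so the service may accept or reject values differently.

```sh
#!/bin/sh
# validate_settings DURATION RESOLUTION ASPECT_RATIO
# Returns 0 when every value is one listed in the spec on this page;
# otherwise prints the offending field to stderr and returns 1.
# Assumption: the allowed sets below come from the page text, not an API schema.
validate_settings() {
  case "$1" in
    5|10) ;;
    *) echo "unsupported duration: $1" >&2; return 1 ;;
  esac
  case "$2" in
    480p|720p|1080p) ;;
    *) echo "unsupported resolution: $2" >&2; return 1 ;;
  esac
  case "$3" in
    16:9|9:16|1:1) ;;
    *) echo "unsupported aspect_ratio: $3" >&2; return 1 ;;
  esac
  return 0
}

# Example: validate_settings 10 1080p 9:16 && echo "settings look valid"
```

Failing fast on the client side avoids paying for a request that would be rejected or silently adjusted.
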
  • Things to be aware of
    • Experimental Features: The "Preview" version is optimized for quick tests and may have limitations compared to full versions.
    • Known Quirks: Some users report occasional inconsistencies in audio-visual synchronization.
    • Performance Considerations: Resource requirements can vary based on video complexity.
    • Consistency Factors: Outputs may vary slightly in quality depending on prompt clarity and complexity.
    • Positive Feedback Themes: Users appreciate the model's ability to generate high-quality visuals and synchronized audio.
  • Key considerations
    • Prompt Accuracy: Ensure that prompts are clear and specific to achieve desired results.
    • Style Adaptation: Wan 2.5 can adapt across various styles, but consistency may vary depending on the complexity of the prompt.
    • Resource Efficiency: The model is optimized for efficient output, but resource requirements can vary based on the complexity of the video generated.
    • Quality vs Speed Trade-offs: Higher quality outputs may require more processing time.
    • Prompt Engineering Tips: Use detailed descriptions and specify desired styles or genres for better results.
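
One way to apply the prompt tips above is to assemble each prompt from named parts so that style and audio cues are never forgotten. The sketch below is illustrative only. `compose_prompt` is a hypothetical helper, and the ordering (subject, style, camera, audio) is an assumption rather than a documented requirement of the model.

```sh
#!/bin/sh
# compose_prompt SUBJECT [STYLE] [CAMERA] [AUDIO]
# Joins the non-empty parts into a single comma-separated prompt line.
# Assumption: this part ordering is a convention chosen for illustration.
compose_prompt() {
  prompt=$1
  if [ -n "${2:-}" ]; then prompt="$prompt, $2 style"; fi
  if [ -n "${3:-}" ]; then prompt="$prompt, $3"; fi
  if [ -n "${4:-}" ]; then prompt="$prompt, with $4"; fi
  printf '%s\n' "$prompt"
}

# Example:
# compose_prompt "A barista pours steaming espresso into a white cup" \
#   cinematic "slow-motion close-up" "cafe chatter and soft jazz"
```

Keeping the parts separate also makes it easier to vary one element, such as the style, while holding the rest of the prompt fixed when comparing outputs.
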
  • Limitations
    • Video Duration: Limited to generating videos up to 10 seconds in length.
    • Technical Constraints: May require significant computational resources for complex video generation tasks.
    • Style Consistency: While adaptable across styles, maintaining consistency can be challenging with very complex or abstract prompts.

FAQ

About Wan 2.5 Preview · Text to Video

What is Wan 2.5 Preview text-to-video and how does it compare to Wan v2.6?

Wan 2.5 Preview text-to-video is Alibaba's early-access next-generation model that offers developers first access to Wan 2.5 capabilities ahead of official release. It aims to improve on Wan v2.6 with advances in text-to-video scene generation, motion quality, and prompt coherence. As a preview, the model may be updated as the release is finalized.