WAN-2.5
Wan 2.5 Preview is a model that generates short, cinematic videos from a single input image. It preserves the details of the original image while adding camera movements and atmosphere to bring the scene to life. This allows a still photo to be transformed into a film-like moving sequence. The “Preview” version is optimized for quick tests and concept exploration, making it ideal for prototyping and creative experimentation.
Avg Run Time: 385.000s
Model Slug: wan-2-5-preview-image-to-video
Playground
Input
Enter a URL or choose a file from your computer.
(Max 50MB)
Output
Example Result
Preview and download your result.
API & SDK
Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
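As a sketch of this step, the helper below builds the request payload and POSTs it with Python's standard library. The endpoint URL, the `X-API-Key` header, and the exact field names (`model`, `input`, `predictionID`) are assumptions for illustration; check the Eachlabs API reference for the real values.

```python
import json
import urllib.request

# Hypothetical endpoint and key -- replace with the values from your Eachlabs account.
API_URL = "https://api.eachlabs.ai/v1/prediction"
API_KEY = "YOUR_API_KEY"

def build_prediction_request(image_url: str, prompt: str,
                             resolution: str = "720p", duration: int = 5) -> dict:
    """Assemble the model inputs for wan-2-5-preview-image-to-video."""
    return {
        "model": "wan-2-5-preview-image-to-video",
        "input": {
            "image_url": image_url,   # source still image
            "prompt": prompt,         # text description of motion and atmosphere
            "resolution": resolution, # "480p", "720p", or "1080p"
            "duration": duration,     # 5 or 10 seconds
        },
    }

def create_prediction(payload: dict) -> str:
    """POST the payload and return the prediction ID from the response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": API_KEY},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["predictionID"]
```

Keeping payload construction separate from the network call makes the inputs easy to validate (or unit-test) before spending a generation credit.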
Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
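The polling loop can be sketched generically as below. The `status` values (`"success"`, `"error"`) and the idea of passing in a `fetch_status` callable are assumptions about the response shape; adapt them to the actual API response.

```python
import time

def poll_until_done(fetch_status, interval: float = 5.0, timeout: float = 600.0) -> dict:
    """Repeatedly call fetch_status() until it reports success, error, or timeout.

    fetch_status is any zero-argument callable that returns the prediction's
    current state as a dict, e.g. a GET on the prediction endpoint by ID.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = fetch_status()
        status = result.get("status")
        if status == "success":
            return result            # done: result holds the video URL
        if status == "error":
            raise RuntimeError(result.get("message", "prediction failed"))
        time.sleep(interval)         # still processing; wait before the next check
    raise TimeoutError("prediction did not finish within the timeout")
```

Given the ~385 s average run time listed above, a 5-10 second interval with a generous timeout is a reasonable starting point.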
Readme
Overview
wan-2-5-preview-image-to-video — Image-to-Video AI Model
Developed by Alibaba as part of the wan-2.5 family, wan-2-5-preview-image-to-video transforms a single input image and text prompt into cinematic short videos with native audio synchronization, ideal for creators seeking quick prototyping of dynamic scenes from static photos. This preview version excels in preserving original image details while adding fluid camera movements and atmospheric sound, supporting resolutions up to 1080P and durations of 5s or 10s at 30 fps in MP4 format. Perfect for Alibaba image-to-video applications, it enables rapid concept exploration without complex setups, making it a go-to for image-to-video AI model users building engaging content efficiently.
Technical Specifications
What Sets wan-2-5-preview-image-to-video Apart
The wan-2-5-preview-image-to-video stands out with its audio-video sync capability, generating videos with synchronized sound from text prompts, images, and optional audio inputs—enabling realistic dubbing that elevates still images to production-ready clips. Unlike single-shot competitors, it maintains high fidelity to the input image's structure during complex camera motions like pans and zooms, ensuring no distortion in product or character details for professional outputs. It supports flexible resolutions (480P, 720P, 1080P) and fixed durations (5s, 10s), with aspect ratios adapted from the input image for seamless image-to-video AI model workflows.
- Native Audio Sync: Produces videos with automatic dubbing or custom audio files, syncing sound perfectly to motion—ideal for creators needing voiced narratives without post-production.
- Image Fidelity Preservation: Retains exact details, lighting, and reflections from the source photo during motion generation—perfect for e-commerce product animations.
- Optimized Preview Specs: Delivers 30 fps MP4 outputs, available for download within a 24-hour access window, balancing speed and quality for iterative testing.
Key Considerations
- The model excels at generating cinematic camera movements and atmospheric effects but may introduce minor artifacts if the input image is low quality or highly complex
- For best results, use high-resolution, well-lit images with clear subject separation
- Avoid input images with excessive noise, compression artifacts, or ambiguous foreground/background separation
- The Preview version prioritizes speed over maximum quality; for final production, further refinement may be necessary
- Prompt engineering can influence the style and mood of the generated video; descriptive prompts yield more controlled results
- Iterative testing is recommended to fine-tune motion dynamics and visual effects
- Be mindful of GPU memory requirements, especially when processing high-resolution images
Tips & Tricks
How to Use wan-2-5-preview-image-to-video on Eachlabs
Access wan-2-5-preview-image-to-video seamlessly on Eachlabs via the Playground for instant testing with image uploads, text prompts, audio files, resolution (480P-1080P), and duration (5s/10s) settings, or integrate through the API/SDK for scalable apps. Outputs deliver high-fidelity 30 fps MP4 videos with synced audio, ready for download within 24 hours—empowering fast iteration on Eachlabs.
Capabilities
- Generates short, cinematic video sequences from a single input image
- Preserves core details and composition of the original image while adding realistic motion
- Supports a wide range of camera movements and atmospheric effects
- Produces outputs suitable for concept visualization, storyboarding, and creative prototyping
- Adapts well to various artistic styles and subject matter, from landscapes to portraits
- Delivers fast generation times, enabling rapid iteration and experimentation
What Can I Use It For?
Use Cases for wan-2-5-preview-image-to-video
Content creators can upload a portrait photo with the prompt "slow pan across the face with subtle smile emerging, soft ambient music fading in" to generate a 10-second cinematic intro with lip-sync ready audio, streamlining social media teasers. Marketers building Alibaba image-to-video campaigns feed product images into wan-2-5-preview-image-to-video for dynamic demos, like turning a static shoe photo into a rotating 5-second clip with footstep sounds, boosting e-commerce engagement without video shoots.
Developers integrating wan-2-5-preview-image-to-video API create apps for real estate, animating property stills into walkthrough previews with environmental audio, maintaining architectural accuracy across 1080P outputs. Designers prototyping brand stories use it for style-consistent shorts, inputting a logo image and prompt for atmospheric motion clips that preserve visual identity in advertising prototypes.
Things to Be Aware Of
- Some users report occasional artifacts or unnatural motion in highly detailed or complex scenes
- The Preview version may not fully capture subtle lighting nuances compared to production-grade models
- Generation speed is optimized, but output quality may require post-processing for professional use
- GPU acceleration is recommended for best performance; CPU-only processing may be significantly slower
- Consistency between frames is generally strong, but edge cases with ambiguous input images can result in flickering or jitter
- Positive feedback highlights the model’s ease of use and impressive cinematic effects from simple inputs
- Negative feedback centers on limitations in video length and occasional loss of fine image details
Limitations
- Limited to short video sequences (5 or 10 seconds); not suitable for long-form video generation
- May struggle with highly complex scenes or images with ambiguous subject/background separation
- Output quality, while strong for prototyping, may require additional refinement for final production use
Pricing
Pricing Type: Dynamic
Applies when the selected resolution is 720p (the default). Pricing is calculated at $0.10 per second of output duration.
Current Pricing
Pricing Rules
| Condition | Pricing |
|---|---|
| resolution matches "480p" | $0.05 per second of output duration. |
| resolution matches "1080p" | $0.15 per second of output duration. |
| resolution matches "720p" (default, active) | $0.10 per second of output duration. |
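The rules above reduce to a simple per-second rate lookup. This small estimator is a sketch based only on the table; the function name and error handling are illustrative, not part of any official SDK.

```python
# Per-second rates taken from the pricing table above.
RATES_PER_SECOND = {"480p": 0.05, "720p": 0.10, "1080p": 0.15}

def estimate_cost(resolution: str, duration_seconds: int) -> float:
    """Estimated charge for one generation at the given resolution and duration."""
    try:
        rate = RATES_PER_SECOND[resolution]
    except KeyError:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    return round(rate * duration_seconds, 2)
```

For example, a 10-second clip at 1080p costs $0.15 x 10 = $1.50, while a 5-second test at 480p costs $0.25.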
Related AI Models
You can seamlessly integrate advanced AI capabilities into your applications without the hassle of managing complex infrastructure.
