
VIDU-1.5

Vidu 1.5 builds visually stable, realistic video scenes from multiple reference photos.

Avg Run Time: ~50 seconds

Model Slug: vidu-1-5-reference-to-video


API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
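
Below is a minimal Python sketch of this request. The base URL, the X-API-Key header, the payload fields, and the response field name are all assumptions based on common prediction-API conventions; confirm the exact endpoint and input schema against the Eachlabs API reference and the model's input form.

```python
import requests

API_KEY = "YOUR_EACHLABS_API_KEY"        # assumption: auth via an X-API-Key header
BASE_URL = "https://api.eachlabs.ai/v1"  # assumption: verify against the API reference

# Assumed input fields for vidu-1-5-reference-to-video; the real parameter
# names come from the model's input schema, not from this sketch.
payload = {
    "model": "vidu-1-5-reference-to-video",
    "input": {
        "prompt": "sunset cityscape, gentle camera pan, cinematic lighting",
        "reference_images": [
            "https://example.com/ref-1.jpg",
            "https://example.com/ref-2.jpg",
        ],
        "resolution": "720p",
        "duration": 4,
    },
}

resp = requests.post(f"{BASE_URL}/prediction/", json=payload,
                     headers={"X-API-Key": API_KEY})
resp.raise_for_status()
prediction_id = resp.json()["predictionID"]  # assumption: response field name
print("created prediction:", prediction_id)
```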

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API does not push results to you, so you'll need to check repeatedly until you receive a success status.
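
Continuing the sketch above, a simple polling loop might look like the following. The status strings and the output field are assumptions; check the API reference for the exact response shape. Since the average run time is around 50 seconds, a 5-second poll interval is a reasonable starting point.

```python
import time
import requests

# API_KEY, BASE_URL, and prediction_id carry over from the previous sketch.

def wait_for_result(prediction_id: str,
                    poll_interval: float = 5.0,
                    timeout: float = 600.0) -> str:
    """Poll the (assumed) GET /prediction/{id} endpoint until a terminal status."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        resp = requests.get(f"{BASE_URL}/prediction/{prediction_id}",
                            headers={"X-API-Key": API_KEY})
        resp.raise_for_status()
        data = resp.json()
        if data.get("status") == "success":
            return data["output"]        # assumption: URL of the generated video
        if data.get("status") == "error":
            raise RuntimeError(f"prediction failed: {data}")
        time.sleep(poll_interval)
    raise TimeoutError("prediction did not finish within the timeout")

video_url = wait_for_result(prediction_id)
print("video ready:", video_url)
```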

Readme

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview

Vidu 1.5 is an advanced AI video generation model designed to create visually stable and realistic video scenes from multiple reference photos. Developed as part of a new generation of image-to-video models, Vidu 1.5 leverages sophisticated deep learning techniques to synthesize smooth, coherent video sequences that maintain high visual fidelity and consistency across frames. The model is particularly noted for its ability to interpret and blend multiple reference images, enabling the generation of dynamic scenes that closely match the visual style and content of the provided inputs.

Key features of Vidu 1.5 include support for both text-to-video and image-to-video workflows, smooth camera motion control, and the ability to produce videos with high-resolution output. The model is engineered to minimize visual artifacts and flickering, which are common challenges in video synthesis from static images. Its unique approach to reference-based video generation makes it especially valuable for creators seeking to animate still photos or generate short video clips that require visual continuity and realism.

Vidu 1.5 stands out due to its focus on stability and realism, offering creators a tool that bridges the gap between static image generation and fully dynamic video content. Its architecture is optimized for both creative flexibility and technical robustness, making it suitable for a wide range of professional, creative, and personal applications.

Technical Specifications

  • Architecture: Advanced diffusion-based video synthesis model (specific architecture details not publicly disclosed)
  • Parameters: Not officially specified
  • Resolution: Supports up to ultra-HD (4K) video output; common outputs include 1080p and 4K
  • Input/Output formats: Accepts multiple reference images (JPG/PNG, minimum 300px width/height, up to 10MB each); outputs video files (MP4, MOV, or similar standard formats). A validation sketch for these input limits follows this list
  • Performance metrics: Notable for high visual stability, smooth transitions, and minimal flicker; typical video length up to 10 seconds per generation
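
To catch bad inputs before upload, the short sketch below validates a local reference image against the limits listed above (JPG/PNG, at least 300px on each side, at most 10MB). The helper is illustrative and not part of any Eachlabs SDK.

```python
from pathlib import Path
from PIL import Image  # pip install pillow

MAX_BYTES = 10 * 1024 * 1024  # 10MB cap per reference image
MIN_SIDE = 300                # minimum width and height in pixels

def check_reference_image(path: str) -> None:
    """Raise ValueError if an image violates the documented input limits."""
    p = Path(path)
    if p.stat().st_size > MAX_BYTES:
        raise ValueError(f"{path}: exceeds the 10MB limit")
    with Image.open(p) as img:
        if img.format not in ("JPEG", "PNG"):
            raise ValueError(f"{path}: must be JPG or PNG, got {img.format}")
        width, height = img.size
        if width < MIN_SIDE or height < MIN_SIDE:
            raise ValueError(f"{path}: {width}x{height} is below the 300px minimum")

check_reference_image("ref-1.jpg")
```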

Key Considerations

  • Multiple high-quality reference images improve output realism and scene coherence
  • Prompt specificity is crucial; detailed prompts yield more controlled and predictable results
  • Camera motion parameters (pan, tilt, dolly) can be adjusted for dynamic effects but may require experimentation for best results
  • Higher resolutions and longer video durations increase computational requirements and generation time
  • Iterative refinement (regenerating with adjusted prompts or references) is often necessary for optimal results
  • Avoid low-resolution or poorly composed reference images to prevent artifacts or unstable outputs
  • Balancing quality and speed: higher quality settings may significantly increase generation time

Tips & Tricks

  • Use at least 2-3 reference images from different angles or with varied lighting for richer scene dynamics
  • Structure prompts with clear scene descriptions, desired actions, and visual style cues (e.g., "sunset cityscape, gentle camera pan, cinematic lighting")
  • For smoother transitions, ensure reference images are visually consistent (similar backgrounds, color palettes)
  • Adjust camera motion settings incrementally; small changes can have a significant impact on perceived movement
  • If initial outputs show flicker or instability, try replacing or reordering reference images and refining the prompt
  • Use iterative generation: review outputs, tweak parameters, and regenerate to converge on the desired result
  • For advanced effects, experiment with combining text prompts and reference images to guide both content and style, as sketched below
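
As one way to put the last two tips into practice, the sketch below pairs a structured prompt with an ordered list of visually consistent reference images in a single request payload. The field names are the same assumptions used in the create-prediction sketch above, not confirmed API parameters.

```python
# Hypothetical payload combining a structured text prompt with ordered
# reference images; reordering or swapping entries in the list is one
# practical way to fight flicker between generations.
payload = {
    "model": "vidu-1-5-reference-to-video",
    "input": {
        "prompt": ("sunset cityscape, gentle camera pan, cinematic lighting, "
                   "warm color palette, consistent skyline"),
        "reference_images": [
            "https://example.com/skyline-wide.jpg",     # establishes the scene
            "https://example.com/skyline-closeup.jpg",  # detail from a new angle
            "https://example.com/skyline-dusk.jpg",     # same palette, varied light
        ],
        "resolution": "720p",
        "duration": 4,
    },
}
```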

Capabilities

  • Generates visually stable, realistic video scenes from multiple reference photos
  • Supports both text-to-video and image-to-video workflows
  • Enables smooth camera motion effects (pan, tilt, dolly) within generated videos
  • Produces high-resolution outputs suitable for professional and creative use
  • Maintains strong visual coherence and minimizes flicker across frames
  • Adaptable to a wide range of visual styles and subject matter based on input references

What Can I Use It For?

  • Professional video content creation, such as marketing clips, explainer videos, and product showcases
  • Creative projects including animated photo stories, digital art, and short films
  • Business applications like social media content, advertising, and promotional materials
  • Personal projects such as animating family photos, travel memories, or event highlights
  • Industry-specific uses in fashion, real estate, and entertainment for dynamic visual presentations

Things to Be Aware Of

  • Some users report that experimental features, such as advanced camera motion, may produce unpredictable results and require manual tuning
  • Known quirks include occasional flicker or instability when reference images are too dissimilar or low quality
  • Performance benchmarks indicate longer videos and higher resolutions demand significant computational resources and may increase wait times
  • Consistency across frames is generally strong, but edge cases (complex backgrounds, rapid scene changes) may challenge stability
  • Positive feedback highlights the model's realism, ease of use, and ability to animate static images effectively
  • Common concerns include occasional artifacts, limited video duration per generation, and the need for prompt refinement to achieve specific outcomes

Limitations

  • Maximum video duration per generation is typically limited to around 10 seconds
  • Output quality is highly dependent on the quality and consistency of reference images; poor inputs can lead to artifacts or instability
  • May not be optimal for highly complex scenes, rapid action sequences, or scenarios requiring precise frame-by-frame control

Pricing

Pricing Type: Dynamic

Conditions

Sequence   Resolution   Duration   Price
1          360p         4s         $0.40
2          720p         4s         $1.00
3          1080p        4s         $2.00
8          720p         8s         $2.00