Eachlabs | AI Workflows for app builders

VIDU-1.5

Vidu 1.5 builds visually stable, realistic video scenes from multiple reference photos.

Avg Run Time: 50 seconds

Model Slug: vidu-1-5-reference-to-video

Playground

Input

Enter a URL or choose a file from your computer.


Advanced Controls

Output

Example Result

Preview and download your result.


API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
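Under stated assumptions, the request can be sketched with only the Python standard library. The endpoint URL, `X-API-Key` header name, and input field names below are placeholders for illustration, not confirmed details of the Eachlabs API:

```python
import json
import urllib.request

API_URL = "https://api.eachlabs.ai/v1/prediction/"  # hypothetical endpoint


def build_prediction_request(api_key: str, image_urls: list[str],
                             prompt: str) -> urllib.request.Request:
    """Assemble a POST request for a new vidu-1-5-reference-to-video
    prediction. Field names are illustrative, not authoritative."""
    payload = {
        "model": "vidu-1-5-reference-to-video",
        "input": {
            "image_urls": image_urls,  # 3-7 reference photos
            "prompt": prompt,
            "duration": 4,             # seconds; 4-8 supported
            "resolution": "1080p",
        },
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Sending it (commented out to avoid a live call):
# with urllib.request.urlopen(build_prediction_request(key, urls, text)) as r:
#     prediction_id = json.load(r)["id"]
```

The prediction ID in the response is what you pass when checking the result.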

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. Results are delivered asynchronously, so you'll need to repeat the request until the response reports a success (or error) status.
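A minimal polling loop might look like the sketch below. The "status" field and its "success"/"error" values are assumptions (check the API reference for the exact names), and the `fetch` callable stands in for a GET on the prediction endpoint:

```python
import time
from typing import Callable


def poll_until_done(fetch: Callable[[], dict],
                    interval_s: float = 2.0,
                    timeout_s: float = 300.0) -> dict:
    """Call `fetch` repeatedly until it reports a terminal status.

    `fetch` should return the decoded JSON of a GET on the prediction
    endpoint; the status values here are illustrative.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = fetch()
        if result.get("status") in ("success", "error"):
            return result
        time.sleep(interval_s)
    raise TimeoutError("prediction did not finish in time")
```

Passing the HTTP call in as a callable keeps the retry logic testable without network access.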

Readme

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview

vidu-1-5-reference-to-video — Image-to-Video AI Model

Developed by Vidu as part of the vidu-1.5 family, vidu-1-5-reference-to-video transforms multiple reference photos into visually stable, realistic video scenes, solving the challenge of maintaining character and object consistency across motion. This image-to-video AI model excels by accepting 3–7 reference images from different angles to generate short videos of 4–8 seconds at up to 1080p resolution, ensuring expressiveness and clean motion without manual editing. Ideal for creators seeking Vidu image-to-video tools that preserve identity in dynamic sequences, vidu-1-5-reference-to-video delivers production-ready outputs directly from uploads and prompts.

Technical Specifications

What Sets vidu-1-5-reference-to-video Apart

vidu-1-5-reference-to-video stands out in the image-to-video AI model landscape by supporting 3–7 reference images for precise subject consistency, giving it an edge over single-image models in multi-angle scenarios. This lets users create videos in which characters or objects retain exact details such as facial features and poses across frames, reducing the artifacts common in competing models.

Unlike basic text-to-video generators, it automatically handles prompts alongside references to produce 4–8 second clips at 1080p with smooth motion and temporal stability. Developers integrating vidu-1-5-reference-to-video API benefit from fast processing for apps needing reliable reference-based animation, such as virtual try-ons or product demos.

  • Multi-reference input (3–7 images): Locks in subject identity from varied angles, enabling consistent videos for complex scenes like rotating product views.
  • High-res output (up to 1080p, 4–8s duration): Delivers crisp, stable footage ideal for social media or ads without post-production scaling.
  • Automatic prompt integration: Combines text guidance with images for expressive motion, streamlining workflows for Vidu image-to-video applications.

Key Considerations

  • Multiple high-quality reference images improve output realism and scene coherence
  • Prompt specificity is crucial; detailed prompts yield more controlled and predictable results
  • Camera motion parameters (pan, tilt, dolly) can be adjusted for dynamic effects but may require experimentation for best results
  • Higher resolutions and longer video durations increase computational requirements and generation time
  • Iterative refinement (regenerating with adjusted prompts or references) is often necessary for optimal results
  • Avoid low-resolution or poorly composed reference images to prevent artifacts or unstable outputs
  • Balancing quality and speed: higher quality settings may significantly increase generation time

Tips & Tricks

How to Use vidu-1-5-reference-to-video on Eachlabs

Access vidu-1-5-reference-to-video seamlessly on Eachlabs via the Playground for instant testing—upload 3–7 reference images, add a descriptive prompt, select aspect ratio and duration (4–8s), and generate 1080p videos with preserved consistency. Integrate through the API or SDK for production apps, specifying image URLs, text prompts, and output formats like MP4 for high-quality, motion-stable results in your workflows.
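The constraints above (3–7 reference images, 4–8 s clips) can be checked client-side before submitting a job; a minimal sketch, with assumed field names:

```python
def make_inputs(image_urls: list[str], prompt: str,
                duration_s: int = 4, aspect_ratio: str = "16:9") -> dict:
    """Assemble and sanity-check inputs for vidu-1-5-reference-to-video.

    Field names are assumptions; the limits mirror the model card
    (3-7 reference images, 4-8 second clips).
    """
    if not 3 <= len(image_urls) <= 7:
        raise ValueError("provide between 3 and 7 reference images")
    if not 4 <= duration_s <= 8:
        raise ValueError("duration must be between 4 and 8 seconds")
    return {
        "image_urls": image_urls,
        "prompt": prompt,
        "duration": duration_s,
        "aspect_ratio": aspect_ratio,
    }
```

Validating early avoids paying for a generation that the service would reject or render poorly.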


Capabilities

  • Generates visually stable, realistic video scenes from multiple reference photos
  • Supports both text-to-video and image-to-video workflows
  • Enables smooth camera motion effects (pan, tilt, dolly) within generated videos
  • Produces high-resolution outputs suitable for professional and creative use
  • Maintains strong visual coherence and minimizes flicker across frames
  • Adaptable to a wide range of visual styles and subject matter based on input references

What Can I Use It For?

Use Cases for vidu-1-5-reference-to-video

Content creators can upload 3–5 photos of a character from different angles plus a prompt like "the character walks confidently through a bustling city street at dusk, camera following from behind" to generate a consistent 1080p video clip, perfect for short films or TikTok sketches without reshoots.

Marketers building e-commerce visuals feed product reference images from multiple views into vidu-1-5-reference-to-video to animate 360-degree rotations or lifestyle integrations, creating engaging image-to-video AI model assets that boost conversion rates.

Developers embedding vidu-1-5-reference-to-video API in apps for personalized avatars use reference selfies to produce talking-head videos with preserved facial details, ideal for virtual assistants or social platforms needing quick, consistent animations.

Designers prototyping UI animations supply app screenshot references and motion prompts to output smooth transition videos, accelerating feedback loops in mobile app development with stable, high-fidelity results.

Things to Be Aware Of

  • Some users report experimental features, such as advanced camera motion, may produce unpredictable results and require manual tuning
  • Known quirks include occasional flicker or instability when reference images are too dissimilar or low quality
  • Performance benchmarks indicate longer videos and higher resolutions demand significant computational resources and may increase wait times
  • Consistency across frames is generally strong, but edge cases (complex backgrounds, rapid scene changes) may challenge stability
  • Positive feedback highlights the model's realism, ease of use, and ability to animate static images effectively
  • Common concerns include occasional artifacts, limited video duration per generation, and the need for prompt refinement to achieve specific outcomes

Limitations

  • Maximum video duration per generation is short; this model produces clips of roughly 4–8 seconds
  • Output quality is highly dependent on the quality and consistency of reference images; poor inputs can lead to artifacts or instability
  • May not be optimal for highly complex scenes, rapid action sequences, or scenarios requiring precise frame-by-frame control

Pricing

Pricing Type: Dynamic

Default configuration: 720p, 4 s

Conditions

Sequence  Resolution  Duration  Price
1         360p        4 s       $0.40
2         720p        4 s       $1.00
3         1080p       4 s       $2.00
8         720p        8 s       $2.00
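The listed tiers can be expressed as a simple lookup; this is a sketch derived only from the table above, and pricing for combinations not listed there may differ:

```python
# (resolution, duration in seconds) -> price in USD, from the pricing table
PRICES = {
    ("360p", 4): 0.40,
    ("720p", 4): 1.00,
    ("1080p", 4): 2.00,
    ("720p", 8): 2.00,
}


def price_for(resolution: str, duration_s: int) -> float:
    """Return the listed price, or raise for an unlisted combination."""
    try:
        return PRICES[(resolution, duration_s)]
    except KeyError:
        raise ValueError(
            f"no listed price for {resolution}/{duration_s}s") from None
```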