
VIDU-1.5

Vidu 1.5 builds visually stable, realistic video scenes from multiple reference photos.

Avg Run Time: ~50 seconds

Model Slug: vidu-1-5-reference-to-video


API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
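
Below is a minimal Python sketch of this request. The base URL, the X-API-Key header, the payload fields, and the response field name are all assumptions based on common prediction-API conventions; confirm the exact endpoint and input schema against the Eachlabs API reference and the model's input form.

```python
import requests

API_KEY = "YOUR_EACHLABS_API_KEY"        # assumption: auth via an X-API-Key header
BASE_URL = "https://api.eachlabs.ai/v1"  # assumption: verify against the API reference

# Assumed input fields for vidu-1-5-reference-to-video; the real parameter
# names come from the model's input schema, not from this sketch.
payload = {
    "model": "vidu-1-5-reference-to-video",
    "input": {
        "prompt": "sunset cityscape, gentle camera pan, cinematic lighting",
        "reference_images": [
            "https://example.com/ref-1.jpg",
            "https://example.com/ref-2.jpg",
        ],
        "resolution": "720p",
        "duration": 4,
    },
}

resp = requests.post(f"{BASE_URL}/prediction/", json=payload,
                     headers={"X-API-Key": API_KEY})
resp.raise_for_status()
prediction_id = resp.json()["predictionID"]  # assumption: response field name
print("created prediction:", prediction_id)
```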

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API does not push results to you, so you'll need to check repeatedly until you receive a success status.
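
Continuing the sketch above, a simple polling loop might look like the following. The status strings and the output field are assumptions; check the API reference for the exact response shape. Since the average run time is around 50 seconds, a 5-second poll interval is a reasonable starting point.

```python
import time
import requests

# API_KEY, BASE_URL, and prediction_id carry over from the previous sketch.

def wait_for_result(prediction_id: str,
                    poll_interval: float = 5.0,
                    timeout: float = 600.0) -> str:
    """Poll the (assumed) GET /prediction/{id} endpoint until a terminal status."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        resp = requests.get(f"{BASE_URL}/prediction/{prediction_id}",
                            headers={"X-API-Key": API_KEY})
        resp.raise_for_status()
        data = resp.json()
        if data.get("status") == "success":
            return data["output"]        # assumption: URL of the generated video
        if data.get("status") == "error":
            raise RuntimeError(f"prediction failed: {data}")
        time.sleep(poll_interval)
    raise TimeoutError("prediction did not finish within the timeout")

video_url = wait_for_result(prediction_id)
print("video ready:", video_url)
```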

Readme

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview

Vidu 1.5 is an advanced AI video generation model designed to create visually stable and realistic video scenes from multiple reference photos. Developed as part of a new generation of image-to-video models, Vidu 1.5 leverages sophisticated deep learning techniques to synthesize smooth, coherent video sequences that maintain high visual fidelity and consistency across frames. The model is particularly noted for its ability to interpret and blend multiple reference images, enabling the generation of dynamic scenes that closely match the visual style and content of the provided inputs.

Key features of Vidu 1.5 include support for both text-to-video and image-to-video workflows, smooth camera motion control, and the ability to produce videos with high-resolution output. The model is engineered to minimize visual artifacts and flickering, which are common challenges in video synthesis from static images. Its unique approach to reference-based video generation makes it especially valuable for creators seeking to animate still photos or generate short video clips that require visual continuity and realism.

Vidu 1.5 stands out due to its focus on stability and realism, offering creators a tool that bridges the gap between static image generation and fully dynamic video content. Its architecture is optimized for both creative flexibility and technical robustness, making it suitable for a wide range of professional, creative, and personal applications.

Technical Specifications

  • Architecture: Advanced diffusion-based video synthesis model (specific architecture details not publicly disclosed)
  • Parameters: Not officially specified
  • Resolution: Supports up to ultra-HD (4K) video output; common outputs include 1080p and 4K
  • Input/Output formats: Accepts multiple reference images (JPG/PNG, minimum 300px width/height, up to 10MB each); outputs video files (MP4, MOV, or similar standard formats). A validation sketch for these input limits follows this list
  • Performance metrics: Notable for high visual stability, smooth transitions, and minimal flicker; typical video length up to 10 seconds per generation
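
To catch bad inputs before upload, the short sketch below validates a local reference image against the limits listed above (JPG/PNG, at least 300px on each side, at most 10MB). The helper is illustrative and not part of any Eachlabs SDK.

```python
from pathlib import Path
from PIL import Image  # pip install pillow

MAX_BYTES = 10 * 1024 * 1024  # 10MB cap per reference image
MIN_SIDE = 300                # minimum width and height in pixels

def check_reference_image(path: str) -> None:
    """Raise ValueError if an image violates the documented input limits."""
    p = Path(path)
    if p.stat().st_size > MAX_BYTES:
        raise ValueError(f"{path}: exceeds the 10MB limit")
    with Image.open(p) as img:
        if img.format not in ("JPEG", "PNG"):
            raise ValueError(f"{path}: must be JPG or PNG, got {img.format}")
        width, height = img.size
        if width < MIN_SIDE or height < MIN_SIDE:
            raise ValueError(f"{path}: {width}x{height} is below the 300px minimum")

check_reference_image("ref-1.jpg")
```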

Key Considerations

  • Multiple high-quality reference images improve output realism and scene coherence
  • Prompt specificity is crucial; detailed prompts yield more controlled and predictable results
  • Camera motion parameters (pan, tilt, dolly) can be adjusted for dynamic effects but may require experimentation for best results
  • Higher resolutions and longer video durations increase computational requirements and generation time
  • Iterative refinement (regenerating with adjusted prompts or references) is often necessary for optimal results
  • Avoid low-resolution or poorly composed reference images to prevent artifacts or unstable outputs
  • Balancing quality and speed: higher quality settings may significantly increase generation time

Tips & Tricks

  • Use at least 2-3 reference images from different angles or with varied lighting for richer scene dynamics
  • Structure prompts with clear scene descriptions, desired actions, and visual style cues (e.g., "sunset cityscape, gentle camera pan, cinematic lighting")
  • For smoother transitions, ensure reference images are visually consistent (similar backgrounds, color palettes)
  • Adjust camera motion settings incrementally; small changes can have a significant impact on perceived movement
  • If initial outputs show flicker or instability, try replacing or reordering reference images and refining the prompt
  • Use iterative generation: review outputs, tweak parameters, and regenerate to converge on the desired result
  • For advanced effects, experiment with combining text prompts and reference images to guide both content and style, as sketched below
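
As one way to put the last two tips into practice, the sketch below pairs a structured prompt with an ordered list of visually consistent reference images in a single request payload. The field names are the same assumptions used in the create-prediction sketch above, not confirmed API parameters.

```python
# Hypothetical payload combining a structured text prompt with ordered
# reference images; reordering or swapping entries in the list is one
# practical way to fight flicker between generations.
payload = {
    "model": "vidu-1-5-reference-to-video",
    "input": {
        "prompt": ("sunset cityscape, gentle camera pan, cinematic lighting, "
                   "warm color palette, consistent skyline"),
        "reference_images": [
            "https://example.com/skyline-wide.jpg",     # establishes the scene
            "https://example.com/skyline-closeup.jpg",  # detail from a new angle
            "https://example.com/skyline-dusk.jpg",     # same palette, varied light
        ],
        "resolution": "720p",
        "duration": 4,
    },
}
```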

Capabilities

  • Generates visually stable, realistic video scenes from multiple reference photos
  • Supports both text-to-video and image-to-video workflows
  • Enables smooth camera motion effects (pan, tilt, dolly) within generated videos
  • Produces high-resolution outputs suitable for professional and creative use
  • Maintains strong visual coherence and minimizes flicker across frames
  • Adaptable to a wide range of visual styles and subject matter based on input references

What Can I Use It For?

  • Professional video content creation, such as marketing clips, explainer videos, and product showcases
  • Creative projects including animated photo stories, digital art, and short films
  • Business applications like social media content, advertising, and promotional materials
  • Personal projects such as animating family photos, travel memories, or event highlights
  • Industry-specific uses in fashion, real estate, and entertainment for dynamic visual presentations

Things to Be Aware Of

  • Some users report that experimental features, such as advanced camera motion, may produce unpredictable results and require manual tuning
  • Known quirks include occasional flicker or instability when reference images are too dissimilar or low quality
  • Performance benchmarks indicate longer videos and higher resolutions demand significant computational resources and may increase wait times
  • Consistency across frames is generally strong, but edge cases (complex backgrounds, rapid scene changes) may challenge stability
  • Positive feedback highlights the model's realism, ease of use, and ability to animate static images effectively
  • Common concerns include occasional artifacts, limited video duration per generation, and the need for prompt refinement to achieve specific outcomes

Limitations

  • Maximum video duration per generation is typically limited to around 10 seconds
  • Output quality is highly dependent on the quality and consistency of reference images; poor inputs can lead to artifacts or instability
  • May not be optimal for highly complex scenes, rapid action sequences, or scenarios requiring precise frame-by-frame control

Pricing

Pricing Type: Dynamic

Conditions

Sequence   Resolution   Duration   Price
1          360p         4s         $0.40
2          720p         4s         $1.00
3          1080p        4s         $2.00
8          720p         8s         $2.00