VIDU-2.0
Vidu 2.0 Reference to Video generates realistic motion by combining multiple reference photos into a seamless video.
Avg Run Time: 40s
Model Slug: vidu-2-0-reference-to-video
Playground
Input
Provide reference images as URLs or files from your computer; multiple upload slots are available.
Accepted formats: png, jpeg, jpg, webp (max 50MB per image)
Output
Example Result
Preview and download your result.
API & SDK
Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
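The request described above can be sketched in Python using only the standard library. The endpoint URL, header names, and input field names below are assumptions for illustration, not the documented API; substitute the values from your account.

```python
import json
import urllib.request

API_URL = "https://api.example.com/v1/predictions"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"  # placeholder; use your real key

def build_payload(prompt, reference_images, duration=4):
    """Assemble the model inputs for vidu-2-0-reference-to-video.
    Field names here are illustrative, not the documented schema."""
    return {
        "model": "vidu-2-0-reference-to-video",
        "input": {
            "prompt": prompt,
            "reference_images": reference_images,  # list of image URLs
            "duration": duration,  # seconds; typical clips run 2-8s
        },
    }

def create_prediction(payload):
    """POST the payload and return the prediction ID from the response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["id"]

# Build the request body without sending it (no network needed here).
payload = build_payload(
    "smooth push-in on subject, subtle smile",
    ["https://example.com/ref1.png"],
)
```

Keeping payload construction separate from the HTTP call makes the inputs easy to inspect and test before spending a run.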
Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
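The polling loop can be sketched as below. The status values (`succeeded`, `failed`) and the idea of a per-ID GET endpoint are assumptions for illustration; `fetch_status` stands in for whatever GET call your client makes.

```python
import time

def poll_until_done(fetch_status, prediction_id,
                    interval=2.0, timeout=120.0, sleep=time.sleep):
    """Repeatedly check a prediction until it finishes or times out.

    `fetch_status` is any callable taking the prediction ID and returning
    the prediction dict (e.g. a GET against /predictions/{id}; that path
    is an assumption, not the documented endpoint). The injectable `sleep`
    keeps the loop testable without real waiting.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = fetch_status(prediction_id)
        status = result.get("status")
        if status == "succeeded":
            return result
        if status == "failed":
            raise RuntimeError(result.get("error", "prediction failed"))
        sleep(interval)  # still processing; wait before the next check
    raise TimeoutError(f"prediction {prediction_id} not ready in {timeout}s")
```

A modest interval (a few seconds) is usually enough given the ~40s average run time; polling faster only burns requests.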
Readme
Overview
Vidu 2.0 Reference to Video is an advanced AI image-to-video generation model designed to synthesize realistic motion by combining multiple reference photos into seamless, cinematic video clips. Developed as a next-generation solution for creators and professionals, the model focuses on high-fidelity motion, consistent identity preservation, and smooth camera dynamics. It is engineered to address the growing demand for controllable, high-quality short video generation from static images or photo sets.
Key features include the ability to maintain micro-expressions, stable character identity, and physically plausible motion across frames. The model leverages a sophisticated architecture that integrates reference image analysis, motion synthesis, and advanced video rendering pipelines. Its unique strengths lie in its ability to produce polished, short-form videos (typically 2–8 seconds) with refined camera grammar, minimal prompt drift, and faithful adherence to creative intent. Vidu 2.0 stands out for its focus on cinematic quality, making it particularly suitable for character-driven and product-centric video content.
The underlying technology combines state-of-the-art generative models for motion transfer and video synthesis, with specialized modules for reference consistency and camera movement. This results in outputs that are not only visually compelling but also technically robust, offering creators a reliable tool for rapid ideation and high-quality video production.
Technical Specifications
- Architecture: Advanced generative video synthesis model with reference image analysis and motion transfer modules (specific architecture details not publicly disclosed)
- Parameters: Not publicly specified
- Resolution: Supports up to 1080p-class video outputs
- Input/Output formats: Input via reference images (JPG/PNG), output as short video clips (MP4 or similar standard video formats)
- Performance metrics: High fidelity in micro-expressions, stable identity across frames, smooth camera motion, typical clip length 2–8 seconds, optimized for short polished outputs
Key Considerations
- Reference image quality and diversity significantly impact output realism and consistency
- Best results are achieved with high-resolution, well-lit reference photos that clearly depict the subject’s features and intended motion cues
- Prompt engineering is crucial: detailed scene descriptions and camera instructions yield more predictable and cinematic results
- There is a trade-off between output quality and generation speed; higher fidelity settings may increase processing time
- Consistency across frames is a key strength, but extreme pose or lighting changes between reference images can introduce artifacts
- Iterative refinement (adjusting prompts or reference sets) is often necessary for optimal results
Tips & Tricks
- Use multiple reference images from similar angles and lighting conditions to maximize identity consistency
- Structure prompts to specify desired camera movements (e.g., “smooth push-in,” “tracking shot”) and emotional cues (e.g., “subtle smile,” “gentle blink”)
- For cinematic effects, include environmental details and lighting instructions in the prompt
- Start with short clip durations (2–4 seconds) to test settings before generating longer sequences
- Adjust prompt specificity to control for motion intensity and scene complexity; overly vague prompts may result in generic outputs
- If artifacts appear, try refining the reference image set or simplifying the scene description
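The prompt-structuring tips above can be combined mechanically. The helper below is a hypothetical convenience, not part of any SDK; it just joins the camera, expression, setting, and lighting cues into one ordered prompt string.

```python
def build_prompt(subject, camera=None, emotion=None,
                 environment=None, lighting=None):
    """Compose a structured prompt from optional cue categories.
    Purely illustrative; the model accepts free-form text."""
    parts = [subject]
    if camera:
        parts.append(f"camera: {camera}")
    if emotion:
        parts.append(f"expression: {emotion}")
    if environment:
        parts.append(f"setting: {environment}")
    if lighting:
        parts.append(f"lighting: {lighting}")
    return ", ".join(parts)

prompt = build_prompt(
    "portrait of the referenced character",
    camera="smooth push-in",
    emotion="subtle smile, gentle blink",
    lighting="warm golden-hour light",
)
```

Keeping each cue in its own slot makes it easy to vary one factor (say, camera movement) per iteration while holding the rest constant.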
Capabilities
- Generates realistic, high-fidelity motion from static reference photos
- Maintains consistent character identity and style across all video frames
- Supports advanced camera grammar, including smooth push-ins, pull-backs, and tracking shots
- Faithfully adheres to detailed prompts, minimizing semantic drift
- Produces polished, cinematic short videos suitable for professional and creative applications
- Handles both character-driven and product-focused scenes with high detail retention
What Can I Use It For?
- Professional product showcase videos where consistent branding and motion are critical
- Character animation for marketing, entertainment, or social media content
- Rapid prototyping of cinematic scenes for storyboarding and pre-visualization
- Creative projects such as animated portraits, music video snippets, or digital art exhibitions
- Industry applications in advertising, e-commerce, and virtual influencer content
- Personal projects including animated family photos, cosplay showcases, and fan art videos
Things to Be Aware Of
- According to user feedback, experimental features such as advanced camera motion or complex multi-character scenes may yield variable results
- Users have reported that reference consistency is generally strong, but extreme changes in input images can cause identity drift or motion artifacts
- Performance is optimized for short clips; generating longer videos may require more memory and can introduce temporal inconsistencies
- High-quality outputs may demand significant computational resources, especially at maximum resolution settings
- Community feedback highlights the model’s strength in micro-expression fidelity and cinematic motion, with positive reviews for creative control and ease of use
- Common concerns include occasional prompt drift in highly complex scenes and the need for iterative prompt refinement to achieve desired results
Limitations
- Limited to short video clips (typically up to 8–10 seconds); not suitable for long-form video generation
- May struggle with highly dynamic scenes, extreme pose changes, or inconsistent reference images
- Requires careful prompt engineering and high-quality input images for optimal results; generic or low-quality inputs can reduce output fidelity
Pricing
Pricing Detail
This model runs at a cost of $0.005 per execution.
Pricing Type: Fixed
The cost is the same for every run, regardless of input size or how long the run takes. There are no variables affecting the price: it is a set, fixed amount per execution, as the name suggests. This makes budgeting simple and predictable, because you pay the same fee every time you execute the model.
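Because pricing is fixed per execution, batch costs are straight multiplication. A one-line sketch:

```python
COST_PER_RUN = 0.005  # USD, fixed per execution

def batch_cost(runs):
    """Total cost in USD for a given number of executions."""
    return runs * COST_PER_RUN

# e.g. 1,000 test generations cost 1000 * $0.005 = $5.00
thousand_runs = batch_cost(1000)
```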