VEO3.1
Provide reference images of a character, product, or style you admire to veo3-1-reference-to-video and replicate their visual language and motion structure in your new videos.
Avg Run Time: 100.000s
Model Slug: veo3-1-reference-to-video
Release Date: October 15, 2025
Playground
Try the model interactively in the Playground: provide your inputs, run a generation, then preview and download the result.
API & SDK
Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
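A minimal sketch of this step in Python, assuming a hypothetical endpoint URL, auth header name, and payload shape; the exact names live in the Eachlabs API reference:

```python
import requests

# Hypothetical endpoint and auth header; check the Eachlabs API reference
# for the real values.
EACHLABS_API = "https://api.eachlabs.ai/v1/prediction"
headers = {"X-API-Key": "YOUR_API_KEY"}  # assumed header name

payload = {
    "model": "veo3-1-reference-to-video",
    "input": {
        "prompt": "A chef plates a dessert in a sunlit kitchen, slow dolly-in",
        # Up to four reference images (field name assumed)
        "reference_images": ["https://example.com/chef-front.jpg"],
    },
}

resp = requests.post(EACHLABS_API, json=payload, headers=headers, timeout=30)
resp.raise_for_status()
prediction_id = resp.json()["id"]  # assumed response field
print("prediction id:", prediction_id)
```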
Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
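A matching long-polling loop, under the same assumptions about endpoint path, status values, and response fields:

```python
import time
import requests

headers = {"X-API-Key": "YOUR_API_KEY"}  # assumed header name, as above
prediction_id = "PREDICTION_ID_FROM_CREATE_STEP"
poll_url = f"https://api.eachlabs.ai/v1/prediction/{prediction_id}"  # assumed path

while True:
    result = requests.get(poll_url, headers=headers, timeout=30).json()
    status = result.get("status")
    if status == "success":                        # assumed terminal status value
        print("video url:", result.get("output"))  # assumed output field
        break
    if status in ("failed", "error", "canceled"):
        raise RuntimeError(f"prediction ended with status {status!r}")
    time.sleep(3)  # brief back-off between polls
```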
Readme
Overview
veo3.1-reference-to-video — Image-to-Video AI Model
Developed by Google as part of the veo3.1 family, veo3.1-reference-to-video is an image-to-video AI model that transforms reference images into expressive, cinematic videos while maintaining precise visual consistency. Instead of generating videos from text alone, this model anchors video generation to your source images—whether character photos, product shots, or style references—ensuring that every frame preserves the identity, appearance, and visual characteristics you provide. This solves a critical problem for creators and developers: maintaining character consistency and brand alignment across AI-generated video content without manual frame-by-frame editing.
The model uses an advanced diffusion-transformer architecture to understand both your reference images and natural language prompts with high semantic accuracy. You provide up to four reference images showing your character, object, or desired visual style, write a text prompt describing the scene and action you want, and veo3.1-reference-to-video generates a video that stays true to your references while executing your creative direction. This makes it ideal for serialized storytelling, branded content creation, and any workflow where visual consistency across multiple videos is non-negotiable.
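For illustration, here is what a full four-reference input might look like as a dictionary; the field names (`prompt`, `reference_images`, `aspect_ratio`) are assumptions for the sketch, not confirmed API names:

```python
# Hypothetical input block: four reference images anchoring one character,
# plus a prompt that carries the creative direction.
video_input = {
    "prompt": (
        "The character walks through a rain-soaked neon street at night, "
        "handheld tracking shot, cinematic teal-and-orange grade"
    ),
    "reference_images": [  # up to four, per the model's limit
        "https://example.com/character-front.jpg",
        "https://example.com/character-profile.jpg",
        "https://example.com/character-back.jpg",
        "https://example.com/style-board.jpg",
    ],
    "aspect_ratio": "16:9",  # also supports 9:16, 1:1, 4:3
}
```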
Technical Specifications
What Sets veo3.1-reference-to-video Apart
Multi-Image Reference Anchoring: Unlike single-image reference models, veo3.1-reference-to-video accepts up to four reference images simultaneously, allowing you to provide character sheets, multiple angles, or style guides in one generation. This multi-reference approach dramatically improves character identity persistence and visual coherence across complex scenes and camera movements.
4K Resolution Output with 60fps Support: veo3.1-reference-to-video generates videos at true 4K resolution (3840×2160) at up to 60 frames per second, surpassing competitors capped at 1080p. This production-grade quality eliminates the need for post-processing upscaling and delivers cinema-ready footage suitable for professional broadcast and high-end creative work.
Native Vertical Video Generation: The model natively supports 9:16 vertical aspect ratio alongside 16:9 landscape, 1:1 square, and 4:3 standard formats. Creators building content for TikTok, YouTube Shorts, Instagram Reels, and Snapchat can generate full-screen vertical videos without cropping or quality loss—a critical advantage for social media-first workflows.
Enhanced Character Expressiveness and Audio Sync: veo3.1-reference-to-video generates videos with improved character expressions, dynamic movements, and synchronized audio including dialogue, sound effects, and ambient soundscapes. The model maintains consistency in character, object, and background details while allowing you to blend various visual elements into a cohesive output.
Technical Specifications: Maximum output duration is 8 seconds per generation, with support for MP4 input and output formats. The model supports both 720p and 1080p input resolution and delivers outputs up to 4K. Processing is handled asynchronously through the API, with results ready for immediate integration into production pipelines.
Key Considerations
- Prepare clear, concise prompts that describe subject, action, camera, style, and environment for best results (see the prompt-assembly sketch after this list).
- Use up to four high-quality reference images to guide character or object consistency; poor-quality or inconsistent references can degrade output quality.
- Review generated clips for subject fidelity, motion, framing, lighting, and audio alignment; iterate on prompts and references as needed.
- Be aware of the trade-off between generation speed and output quality; this model is optimized for quick prototypes, so complex or highly detailed scenes may require multiple iterations.
- Frame-specific control (e.g., constraining first/last frames) can help achieve desired transitions, but may not be supported in all implementations.
- Audio generation increases computational cost and may affect pricing in some deployment scenarios.
- Safety filters are applied to both input images and generated content to prevent misuse.
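To make the first consideration concrete, here is one way to assemble a prompt from those five ingredients; the structure is a convention for this sketch, not an API requirement:

```python
def build_prompt(subject: str, action: str, camera: str,
                 style: str, environment: str) -> str:
    """Join the five prompt ingredients into one concise sentence."""
    return f"{subject} {action}, {camera}, {style}, {environment}."

prompt = build_prompt(
    subject="A vintage red motorcycle",
    action="accelerates down a coastal road",
    camera="low-angle tracking shot",
    style="film-grain 35mm look with warm backlight",
    environment="cliffs and ocean spray at golden hour",
)
print(prompt)
```

Keeping each ingredient short keeps the overall prompt concise while still covering subject, action, camera, style, and environment.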
Tips & Tricks
How to Use veo3.1-reference-to-video on Eachlabs
Access veo3.1-reference-to-video through Eachlabs via the interactive Playground or through the API for production integration. Provide up to four reference images, write your video prompt describing the scene and action, and specify your desired output parameters including aspect ratio (16:9, 9:16, 1:1, or 4:3) and resolution up to 4K. The model returns MP4 video files with synchronized audio, ready for immediate use in creative projects, social media platforms, or downstream editing workflows.
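Putting the pieces together, a compact end-to-end sketch for a native 9:16 generation, reusing the hypothetical endpoint, field names, and status values from the snippets above:

```python
import time
import requests

API = "https://api.eachlabs.ai/v1/prediction"  # assumed endpoint, as above
headers = {"X-API-Key": "YOUR_API_KEY"}        # assumed header name

payload = {
    "model": "veo3-1-reference-to-video",
    "input": {
        "prompt": "A barista pours latte art in close-up, soft window light",
        "reference_images": ["https://example.com/barista.jpg"],
        "aspect_ratio": "9:16",  # native vertical output
        "resolution": "4k",      # assumed parameter name
    },
}

prediction_id = requests.post(API, json=payload, headers=headers).json()["id"]

while True:
    result = requests.get(f"{API}/{prediction_id}", headers=headers).json()
    status = result.get("status")
    if status == "success":      # assumed terminal status value
        break
    if status in ("failed", "error"):
        raise RuntimeError(f"generation failed: {result}")
    time.sleep(3)

# Download the finished MP4 for editing or publishing.
with open("barista_vertical.mp4", "wb") as f:
    f.write(requests.get(result["output"]).content)  # assumed output field
```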
Capabilities
- Generates high-fidelity, realistic videos from text prompts and reference images, with smooth transitions between keyframes.
- Preserves subject appearance and artistic style across frames using reference images.
- Supports native, synchronized audio generation for a more immersive output.
- Offers control over cinematic elements such as camera motion, lighting, and ambiance via prompt engineering.
- Enables rapid prototyping and iterative refinement, making it suitable for both creative and technical workflows.
- Delivers strong scene coherence and character continuity, even in multi-shot sequences.
- Suitable for generating landscape (16:9), portrait (9:16), square (1:1), and standard (4:3) aspect ratio videos.
What Can I Use It For?
Use Cases for veo3.1-reference-to-video
Serialized Character Animation for Content Creators: YouTube creators and animation studios can upload character reference sheets and generate multiple scenes featuring the same character with consistent appearance, expressions, and movement style. A creator might prompt: "The character walks through a bustling Tokyo street at sunset, looking amazed, with soft golden hour lighting and busy pedestrians in the background." The model preserves the character's identity across every frame while executing the scene direction, enabling rapid production of episodic content without hiring voice actors or animators for each scene.
Brand-Aligned Product Videos for E-Commerce: Marketing teams building an AI video generator for product launches can feed product photography plus a text prompt like "Place this luxury watch on a marble surface with dramatic studio lighting, rotating slowly to show all angles" and receive photorealistic product videos. veo3.1-reference-to-video maintains exact product appearance while varying lighting, backgrounds, and camera angles—eliminating expensive studio shoots and enabling rapid A/B testing of product presentation styles.
Vertical Social Media Content at Scale: Social media managers and influencers can generate native 9:16 vertical videos for TikTok and YouTube Shorts using reference images of themselves or branded characters. The native vertical support means no cropping, no quality loss, and no awkward framing—just full-screen, high-resolution content optimized for each platform's aspect ratio.
Developers Building Custom Video Workflows: Developers integrating an image-to-video API into creative applications can leverage veo3.1-reference-to-video's multi-image reference system and 4K output to build tools for game studios, advertising agencies, and film production companies. The model's ability to accept multiple reference images and maintain visual consistency across complex scenes makes it suitable for professional-grade video editing and composition workflows that demand pixel-perfect consistency.
Things to Be Aware Of
- User feedback highlights the model’s strength in visual realism and scene coherence, especially compared to earlier versions and some competitors.
- The model is praised for its ability to handle multi-scene transitions and maintain character consistency, which is valuable for narrative projects.
- Some users note that while the model is fast for prototyping, achieving highly detailed or complex scenes may require multiple iterations and careful prompt engineering.
- Audio generation, while impressive, can significantly increase the cost per second of video in some deployment scenarios.
- There is a learning curve to effective prompt and reference image selection; suboptimal inputs can lead to inconsistent or lower-quality outputs.
- The model applies safety filters to inputs and outputs, which may restrict certain types of content.
- Community discussions suggest that the model’s performance is best for short to medium-length clips; very long or highly dynamic scenes may challenge its coherence.
- Positive reviews often mention the ease of integrating the model into iterative creative workflows, but some users desire even finer control over motion and timing.
Limitations
- Output duration is typically limited to short clips (commonly 4–8 seconds, with some extensions possible), which may not suit all narrative or commercial needs.
- Highly complex or fast-paced scenes can sometimes result in less coherent motion or artifacts, requiring manual refinement.
- The model’s performance and quality depend heavily on the quality and relevance of reference images and the precision of the text prompt.
- Native audio generation, while advanced, may not always perfectly match the desired mood or pacing of the visual content.
- As with most generative models, there is a risk of unintended biases or artifacts in the output, necessitating careful review before final use.