VEO3.1
When your footage isn't long enough, use veo3-1-extend-video to seamlessly extend the duration without breaking the scene's context or narrative flow.
Avg Run Time: 100.000s
Model Slug: veo3-1-extend-video
Release Date: December 16, 2025
Playground
Input
Enter a URL or choose a file from your computer.
(Max 50MB)
Output
Example Result
Preview and download your result.
API & SDK
Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
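As a minimal sketch of the request described above, the snippet below builds (but does not send) a POST request using only the Python standard library. The endpoint URL, the `Authorization: Bearer` scheme, and the `model`/`input` body fields are assumptions made for illustration, so check the provider's API reference for the real names. Only the model slug `veo3-1-extend-video` comes from this page.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL from the provider's API reference.
API_URL = "https://api.example.com/v1/predictions"
MODEL_SLUG = "veo3-1-extend-video"  # model slug listed on this page


def build_create_request(api_key: str, inputs: dict) -> urllib.request.Request:
    """Build a POST request that would create a prediction (nothing is sent)."""
    # Assumed body shape: {"model": ..., "input": {...}}
    body = json.dumps({"model": MODEL_SLUG, "input": inputs}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the prediction ID would then be read from the JSON response, under whatever field name the real API uses.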
Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
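The repeated check described above could be sketched as a small loop. The status-fetching call is injected as a function so the loop stays independent of any particular HTTP client; the `status` field and its `"success"` and `"failed"` values are assumptions, not confirmed names from the API.

```python
import time
from typing import Callable


def wait_for_result(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status() until it reports success or failure, or time runs out."""
    waited = 0.0
    while True:
        result = fetch_status()
        status = result.get("status")  # assumed field name
        if status == "success":
            return result
        if status == "failed":  # assumed failure value
            raise RuntimeError(f"prediction failed: {result}")
        if waited >= timeout_s:
            raise TimeoutError("prediction did not finish in time")
        sleep(interval_s)
        waited += interval_s
```

With an average run time of about 100 seconds, a polling interval of a few seconds and a timeout of several minutes is a reasonable starting point.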
Readme
Overview
Veo3.1-extend-video is a specialized video extension capability developed by Google DeepMind as part of the Veo 3.1 model family, available through Google Vertex AI and related APIs. It takes an existing Veo-generated video clip, typically up to 8 seconds long, and extends it by appending new segments based on a text prompt, preserving visual quality, motion continuity, scene consistency, and style across the output. The model supports extensions up to 30 seconds total duration through chaining multiple 4-8 second segments, with each new part conditioned on the last second of the previous clip for seamless transitions.
Key features include natural motion continuation, support for 720p or 1080p resolutions in 16:9 or 9:16 aspect ratios, optional audio generation with spatial sound and dialogue synchronization, and the ability to reference up to three images for enhanced character and style consistency. What makes it unique is its context-aware scene extension using alignment embeddings and pixel-space similarity, which minimizes flicker, drift, or style changes, outperforming prior versions like Veo 3.0 in continuity, realism, and prompt adherence. This enables streamlined workflows for longer video production without manual editing or re-renders.
The underlying technology leverages advanced video-to-video generation with last-frame conditioning, enabling high-fidelity extensions that maintain lighting, backgrounds, facial structures, and motion dynamics. It is optimized for cinematic and production use, with safety filters applied to inputs and outputs.
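To make the chaining arithmetic from the overview concrete, here is a small planning sketch. It assumes whole-second durations and uses the figures stated above (4-8 second segments, 30 seconds total); the function name and the greedy splitting strategy are illustrative choices, not part of the model's API.

```python
def plan_segments(initial_s: int, target_s: int,
                  min_seg: int = 4, max_seg: int = 8,
                  max_total: int = 30) -> list[int]:
    """Split the extra duration into chained segments of min_seg..max_seg seconds."""
    if target_s > max_total:
        raise ValueError(f"target exceeds the {max_total}s total limit")
    remaining = target_s - initial_s
    if remaining <= 0:
        return []  # clip is already long enough
    if remaining < min_seg:
        raise ValueError(f"cannot extend by less than {min_seg}s")
    segments = []
    while remaining > 0:
        seg = min(max_seg, remaining)
        # Shorten this segment if the leftover would fall below the minimum.
        if 0 < remaining - seg < min_seg:
            seg = remaining - min_seg
        segments.append(seg)
        remaining -= seg
    return segments


print(plan_segments(8, 30))  # [8, 8, 6]
```

Each entry in the returned list corresponds to one extension request, with every request after the first taking the previous output as its input video.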
Technical Specifications
- Architecture: Video-to-video generation with last-frame conditioning and alignment embeddings
- Parameters: Not publicly disclosed
- Resolution: 720p or 1080p (extensions typically 720p)
- Input/Output formats: MP4, MOV, WEBM, M4V, GIF; input videos up to 8s, Veo-generated only; output merged video with extensions of 4-8s per segment (up to 30s total via chaining)
- Performance metrics: Over 95% visual continuity with reference images; supports prompt lengths up to 1200 tokens; extension cost approximately $0.20 per second (audio off) or $0.40 (audio on)
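The per-second pricing above lends itself to a quick estimate. The sketch below multiplies the approximate rates quoted in this section by the number of extended seconds; it is a rough calculator under those stated rates, not a billing reference, and the function name is illustrative.

```python
from decimal import Decimal

# Approximate rates quoted in the specifications above (USD per extended second).
RATE_AUDIO_OFF = Decimal("0.20")
RATE_AUDIO_ON = Decimal("0.40")


def estimate_cost(extension_seconds: int, audio: bool) -> Decimal:
    """Estimate the cost of extending a clip by the given number of seconds."""
    if extension_seconds < 0:
        raise ValueError("extension_seconds must be non-negative")
    rate = RATE_AUDIO_ON if audio else RATE_AUDIO_OFF
    return rate * extension_seconds


# Extending an 8s clip to the 30s total adds 22 seconds.
print(estimate_cost(22, audio=False))  # 4.40
print(estimate_cost(22, audio=True))   # 8.80
```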
Key Considerations
- Input videos should be Veo-generated; non-Veo videos may lose audio or fail to extend
- Use prompts that specify action, style, camera motion, and ambiance for best continuity
- Limit reference images to 1-3 to avoid memory limits and ensure stability
- Chain extensions by using the last second of the prior output as input for each new segment, building cumulative durations up to the 30s total
- Balance quality and speed by selecting 720p for faster generation versus 1080p for higher fidelity
- Enable auto_fix for prompts that might fail safety or validation checks
- Test short extensions first to refine prompts before full chains
Tips & Tricks
- Optimal parameter settings: Set duration to 7s and resolution to 720p for initial tests; set aspect_ratio to auto; set generate_audio to true for synchronized sound
- Prompt structuring advice: "Continue the scene naturally, maintaining the same style, motion, and lighting. Camera pans right as the character walks forward in a moody forest atmosphere."
- How to achieve specific results: For character consistency, include 1-3 reference images of key subjects; specify camera motions like "zoom in slowly" for dynamic extensions
- Iterative refinement strategies: Generate a short extension, review for drift, then re-extend with adjusted prompt emphasizing prior elements
- Advanced techniques: Chain multiple extensions referencing the final frame each time; combine with first-and-last-frame control for precise scene transitions, e.g., prompt "Transition from walking to jumping while matching exact lighting from input end frame."
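The settings suggested in the tips above can be collected into one validated input dictionary. This is a sketch under assumptions: the key names (`video`, `prompt`, `duration`, `resolution`, `aspect_ratio`, `generate_audio`, `reference_images`) are guesses based on the wording in this section, and the limits are the ones stated on this page.

```python
ALLOWED_RESOLUTIONS = {"720p", "1080p"}
ALLOWED_ASPECT_RATIOS = {"auto", "16:9", "9:16"}


def build_extension_input(video_url: str, prompt: str, duration: int = 7,
                          resolution: str = "720p", aspect_ratio: str = "auto",
                          generate_audio: bool = True,
                          reference_images: tuple = ()) -> dict:
    """Validate settings against the limits on this page and return an input dict."""
    if not 4 <= duration <= 8:
        raise ValueError("duration must be between 4 and 8 seconds per segment")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ALLOWED_ASPECT_RATIOS)}")
    if len(reference_images) > 3:
        raise ValueError("use at most 3 reference images")
    # All key names below are assumed, not confirmed by the API reference.
    return {
        "video": video_url,
        "prompt": prompt,
        "duration": duration,
        "resolution": resolution,
        "aspect_ratio": aspect_ratio,
        "generate_audio": generate_audio,
        "reference_images": list(reference_images),
    }
```

Starting from the defaults (7 seconds, 720p) matches the advice to test short, fast extensions before committing to a full chain at 1080p.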
Capabilities
- Seamlessly extends Veo videos up to 30s with natural motion and scene continuity
- Maintains consistent characters, environments, lighting, and style across extensions using reference images
- Generates synchronized spatial audio, ambient sounds, and dialogue without abrupt changes
- Supports multi-aspect ratios (16:9, 9:16) and high resolutions up to 1080p
- Delivers cinematic realism with over 95% visual continuity and minimal flicker
- Handles complex prompts for style transitions, camera movements, and ambiance control
- Versatile for chaining segments to create longer videos, up to the 30s total
What Can I Use It For?
- Cinematic storytelling and scene continuations in film production workflows
- Branded content and advertising videos requiring seamless extensions
- Social media series with consistent visual flow across episodes
- Teaching and tutorial videos maintaining subject continuity
- Creative projects like animated sequences extended from initial clips, as shared in developer scripts
- Production studios streamlining shot extensions without manual stitching
Things to Be Aware Of
- Model restricted to extending only Veo-generated videos for best audio and continuity results
- Extensions typically 4-8s per segment; longer videos require chaining with last-second reference
- High consistency in motion dynamics and subjects via pixel-space similarity
- Users report smooth first/last-frame transitions and improved realism over Veo 3.0
- Resource-intensive at 1080p or with audio enabled; prompts over 1200 tokens may destabilize generation
- Safety filters block unsafe content in inputs and generations
- Positive feedback on audio synchronization and no-style-drift in community tests
Limitations
- Limited to Veo-generated input videos; other sources may not extend properly or retain audio
- Short segment durations (4-8s) necessitate chaining for longer outputs, increasing complexity
- Potential memory limits with more than 3 reference images or very long prompts
