SEEDANCE-2.0
An advanced video generation model producing cinematic visuals with native audio, realistic physics, and precise camera control, supporting text, image, audio, and video inputs.
Avg Run Time: 120.000s
Model Slug: bytedance-seedance-2-0-reference-to-video-fast
API & SDK
Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
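The create step can be sketched with Python's standard library. The endpoint URL, auth header, and input field names below are assumptions for illustration only; consult the each::labs API reference for the exact schema.

```python
import json
import urllib.request

# Hypothetical endpoint and field names; check the each::labs API
# reference for the exact URL, auth header, and input schema.
API_URL = "https://api.eachlabs.ai/v1/prediction"
MODEL_SLUG = "bytedance-seedance-2-0-reference-to-video-fast"

def build_create_request(api_key: str, prompt: str,
                         image_urls: list[str]) -> urllib.request.Request:
    """Assemble the POST request that creates a new prediction."""
    payload = {
        "model": MODEL_SLUG,
        "input": {
            "prompt": prompt,
            # The model accepts at most 9 reference images.
            "reference_images": image_urls[:9],
        },
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

# Sending it (requires a valid key) returns a JSON body containing the
# prediction ID used in the next step:
# with urllib.request.urlopen(build_create_request(key, prompt, urls)) as resp:
#     prediction_id = json.load(resp)["id"]  # "id" is a hypothetical field name
```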
Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
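The polling loop might look like the sketch below. The result endpoint and the status values (`success`, `failed`) are assumptions; verify them against the each::labs API reference.

```python
import json
import time
import urllib.request

# Hypothetical result endpoint; check the each::labs API reference.
RESULT_URL = "https://api.eachlabs.ai/v1/prediction/{prediction_id}"

def is_terminal(status: str) -> bool:
    """True once the prediction has finished (successfully or not).
    The status strings here are assumed, not documented."""
    return status in ("success", "failed")

def poll_prediction(api_key: str, prediction_id: str,
                    interval_s: float = 2.0, timeout_s: float = 600.0) -> dict:
    """Poll the result endpoint until a terminal status or timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        req = urllib.request.Request(
            RESULT_URL.format(prediction_id=prediction_id),
            headers={"X-API-Key": api_key},
        )
        with urllib.request.urlopen(req) as resp:
            result = json.load(resp)
        if is_terminal(result.get("status", "")):
            return result
        time.sleep(interval_s)  # wait between checks
    raise TimeoutError(f"prediction {prediction_id} not ready after {timeout_s}s")
```

Given the 120s average run time, a 2-second interval with a generous timeout keeps request volume low without adding much latency.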
Readme
Overview
Bytedance | Seedance 2.0 | Reference to Video | Fast is a speed-optimized variant of ByteDance's flagship Seedance 2.0 video generation model, built for rapid image-to-video transformation with precise reference control. Developed by ByteDance's Seed AI research division, it converts static images into dynamic cinematic videos with native audio, realistic physics, and stable motion. Its primary differentiator is multimodal reference support (up to 9 images and 1 short video clip of 15 seconds or less), which lets creators lock in character consistency, scene style, and camera paths for production-ready output without extensive editing. As the fast mode of the Seedance family, it prioritizes generation speed while still delivering 1080p or 2K video with synchronized sound, making it well suited to quick prototyping for storytelling and visual content creation on platforms such as each::labs.
Technical Specifications
- Resolution Support: Up to 1080p standard, with 2K output capabilities for high-quality renders.
- Max Duration: Up to 60 seconds for narrative videos, supporting multi-shot sequences.
- Aspect Ratios: Multiple formats including standard widescreen and vertical for social media.
- Input Formats: Text prompts, up to 9 images, 1 video clip (≤15 seconds), audio references for multimodal control.
- Output Formats: Cinematic videos with native audio integration, including dialogue, ambient sounds, and music.
- Processing Time: Fast mode optimized for quicker generation compared to full narrative renders, leveraging efficient diffusion-based architecture.
- Architecture: Large-scale diffusion model with timeline prompting for motion stability and physics simulation.
Key Considerations
Before using Bytedance | Seedance 2.0 | Reference to Video | Fast, ensure access via a platform like each::labs to avoid the regional restrictions common with ByteDance APIs. It shines in scenarios requiring quick image-to-video animation with consistent references, outperforming alternatives in native audio sync and motion realism for short clips. Clear reference images or clips are needed for best results; vague inputs reduce fidelity. Cost-effectiveness favors this fast variant for iterative workflows, trading some quality headroom for speed relative to longer, resource-heavy generations. It is ideal for creators who prioritize rapid prototyping over ultra-long videos.
Tips & Tricks
Optimize prompts for Bytedance | Seedance 2.0 | Reference to Video | Fast by using timeline prompting to dictate scene changes, like "0-10s: slow pan over landscape, 10-20s: character walks forward with dialogue." Combine up to 9 reference images for character and style consistency, uploading a primary subject image first to anchor motion. Specify camera controls explicitly, such as "steady dolly zoom on face with realistic lip-sync," to leverage its physics engine. For audio, include descriptors like "tense orchestral score with echoing footsteps" to enhance native generation. Test short durations initially to refine before scaling to 60 seconds.
Example prompts:
- "Animate this portrait: woman in red dress dancing in rainy street, native jazz music, smooth 360 spin, 1080p."
- "From reference image: cyberpunk cityscape evolves to neon chase scene, car zoom-by with engine roar, 20s duration."
- "Reference video clip: extend bird flight with wind sounds, add dialogue 'Fly higher!', multi-angle cuts."
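Timeline prompts of the "0-10s: ..." form described above are easy to assemble programmatically. This small helper is a sketch; the segment format mirrors the examples in this section, not a documented API:

```python
# Builds a timeline prompt of the "0-10s: ..." form used in the tips above.
def timeline_prompt(segments: list[tuple[int, int, str]]) -> str:
    """Join (start, end, description) segments into one prompt string."""
    return ", ".join(f"{start}-{end}s: {desc}" for start, end, desc in segments)

prompt = timeline_prompt([
    (0, 10, "slow pan over landscape"),
    (10, 20, "character walks forward with dialogue"),
])
# → "0-10s: slow pan over landscape, 10-20s: character walks forward with dialogue"
```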
Capabilities
- Generates image-to-video from up to 9 images and 1 short video reference, preserving subject identity and style across frames.
- Native audio-video synthesis including dialogue, ambient sounds, music, and lip-sync in multiple languages.
- Realistic physics simulation for natural motion, objects, and interactions, without blurring during fast movement.
- Timeline prompting for multi-shot narratives with automatic camera angle changes and editing rhythm.
- High-resolution output: 1080p standard, up to 2K, supporting diverse aspect ratios.
- Multimodal inputs: text, image, audio, video for precise creative control as an "AI Director."
- Stable frame consistency over 60-second clips, reducing flicker via sectional generation.
What Can I Use It For?
Content Creators: Animate static artwork into promotional reels using image references for character consistency. Example: "Bring this fantasy character sketch to life: sword fight in castle, clashing metal sounds, dynamic tracking shot, 15s."
Marketers: Transform product photos into engaging ads with native audio. Example: "Reference product image: smartphone spins in futuristic interface, upbeat electronic track, reveal features via text overlay, vertical format."
Designers: Prototype motion graphics from mood boards with multi-image inputs. Example: "9 reference images of abstract shapes: morph into logo animation, ambient synth music, smooth transitions over 30s."
Developers: Test Bytedance | Seedance 2.0 | Reference to Video | Fast API on each::labs for app integrations, generating demo videos from user uploads with timeline prompts for scripted sequences.
Things to Be Aware Of
Bytedance | Seedance 2.0 | Reference to Video | Fast may struggle with highly complex scenes lacking strong references, leading to minor inconsistencies in long motions. Users often overlook precise timestamping in prompts, causing unintended pacing issues—always structure timelines explicitly. Edge cases like extreme deformations or rapid subject changes can introduce subtle flickering despite stability improvements. High-quality inputs are crucial; low-res references amplify artifacts. Resource needs scale with duration, so fast mode suits iterative testing on each::labs to manage compute efficiently.
Limitations
Bytedance | Seedance 2.0 | Reference to Video | Fast is capped at 60-second videos and may not handle ultra-long narratives without quality drops. Regional API locks limit direct access, requiring platforms like each::labs. Complex multi-character interactions or abstract concepts perform less reliably without multiple precise references. Output remains diffusion-based, so photorealism in edge lighting or occluded motions trails specialized tools. No confirmed support for outputs beyond 2K resolution currently.
Pricing
Pricing Type: Dynamic
Current Pricing: 720p resolution at $0.2419 per second of output duration.
Pricing Rules

| Condition | Pricing |
|---|---|
| resolution matches "720p" (Active) | $0.2419 per second of output duration |
| resolution matches "480p" | $0.1076 per second of output duration |
| Rule 3: resolution not specified | Default fallback to the 720p rate |
