Eachlabs | AI Workflows for app builders
bytedance-seedance-2.0-reference-to-video

SEEDANCE-2.0

An advanced video generation model delivering cinematic visuals with native audio, realistic physics, and director-level camera control, supporting text, image, audio, and video inputs.

Avg Run Time: 200.000s

Model Slug: bytedance-seedance-2-0-reference-to-video

Playground

Input

Enter a URL or choose a file from your computer.


Output

Example Result

Preview and download your result.


API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
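The create step can be sketched with the Python standard library. Note that the endpoint URL, version string, header name, and payload field names below are illustrative assumptions, not the exact each::labs schema; check the API reference for the real contract.

```python
import json
import urllib.request

# Hypothetical endpoint -- substitute the real each::labs URL.
API_URL = "https://api.eachlabs.ai/v1/prediction"

def build_prediction_request(api_key, prompt, image_urls=None):
    """Assemble the POST request that creates a new prediction."""
    payload = {
        "model": "bytedance-seedance-2-0-reference-to-video",
        "input": {
            "prompt": prompt,
            "images": image_urls or [],
        },
    }
    headers = {
        "Content-Type": "application/json",
        "X-API-Key": api_key,  # header name is an assumption
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )

req = build_prediction_request(
    "YOUR_API_KEY",
    "[Image1] knight in armor charges into battle, camera dolly zoom",
    image_urls=["https://example.com/knight.png"],
)
print(req.get_method())  # POST
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) returns a response containing the prediction ID used in the next step.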

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API is polling-based, so you'll need to repeatedly check until you receive a success status.
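A minimal polling loop, with the HTTP call stubbed out so the pattern is visible. In real use, `get_status` would GET the prediction endpoint with your prediction ID and return the parsed JSON; the `status` values shown are assumptions.

```python
import time

def poll_prediction(get_status, interval=2.0, max_attempts=100):
    """Repeatedly check a prediction until it reaches a terminal status.

    `get_status` is any callable returning a dict with a "status" key.
    """
    for _ in range(max_attempts):
        result = get_status()
        if result.get("status") in ("success", "failed"):
            return result
        time.sleep(interval)
    raise TimeoutError("prediction did not finish in time")

# Stub standing in for the real HTTP call: ready on the third check.
responses = iter(
    [{"status": "processing"}] * 2
    + [{"status": "success", "output": "video.mp4"}]
)
final = poll_prediction(lambda: next(responses), interval=0.0)
print(final["status"])  # success
```

With a ~200s average run time, a 2-5 second polling interval keeps request volume modest without adding noticeable latency.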

Readme

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview

Bytedance | Seedance 2.0 | Reference to Video transforms static images, videos, audio, and text into cinematic videos with native audio synchronization and precise motion control. Developed by ByteDance as part of the Seedance family, this multimodal model excels at image-to-video generation, preserving subject identity, composition, and style while adding realistic physics and director-level camera movements. Its standout differentiator is support for up to 12 mixed reference files (images, videos, and audio) in a single generation, enabling Hollywood-grade outputs that outperform single-input competitors. Available via APIs such as each::labs, Bytedance | Seedance 2.0 | Reference to Video lets creators produce 1080p clips up to 60 seconds long with lip-synced dialogue and sound effects, streamlining workflows from storyboard to final edit.

Technical Specifications

  • Resolution: Up to 1080p (full HD)
  • Max Duration: Up to 60 seconds (varies by endpoint; CapCut rollout starts at 15 seconds)
  • Aspect Ratios: Multiple ratios supported, including six standard formats
  • Inputs: Text prompts, up to 9 images, 3 video clips, 3 audio files (total up to 12 references); reference via [Image1], [Video1], etc. in prompts
  • Outputs: Video with native synchronized audio (dialogue, effects, ambient); MP4 format typical
  • Processing Time: Varies by provider; fast endpoints available for quicker inference
  • Architecture: Unified multimodal model handling text, image, video, audio inputs in one pass

These specs make Bytedance | Seedance 2.0 | Reference to Video ideal for high-fidelity image-to-video tasks on platforms like each::labs.
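The input limits above can be checked client-side before submitting a job, avoiding a wasted API call. This validator is a hypothetical helper written against the documented caps (9 images, 3 videos, 3 audio files, 12 total), not part of any SDK:

```python
# Documented per-type reference limits for Seedance 2.0.
LIMITS = {"images": 9, "videos": 3, "audios": 3}
MAX_TOTAL = 12

def validate_references(images=(), videos=(), audios=()):
    """Raise ValueError if the reference files exceed the model's caps."""
    counts = {"images": len(images), "videos": len(videos), "audios": len(audios)}
    for kind, count in counts.items():
        if count > LIMITS[kind]:
            raise ValueError(f"too many {kind}: {count} > {LIMITS[kind]}")
    total = sum(counts.values())
    if total > MAX_TOTAL:
        raise ValueError(f"too many references: {total} > {MAX_TOTAL}")
    return counts

print(validate_references(images=["a.png"] * 4, videos=["clip.mp4"]))
# {'images': 4, 'videos': 1, 'audios': 0}
```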

Key Considerations

Before using Bytedance | Seedance 2.0 | Reference to Video, ensure access via API providers like each::labs, as regional restrictions apply in some areas. It shines in scenarios needing multimodal references for consistency, outperforming text-only models for complex scenes with character or style matching. Prerequisites include clear reference files and detailed prompts; high API costs may factor into production use. Opt for this over alternatives when native audio sync and multi-image control are critical, balancing cost against superior physics and motion realism.

Tips & Tricks

For optimal results with Bytedance | Seedance 2.0 | Reference to Video, use specific references in prompts like "[Image1] of a dancer in studio lighting, performs a spin with smooth camera pan." Include dialogue in double quotes for lip-synced audio: "A chef chops vegetables, saying 'Fresh ingredients make the best meals,' with knife sounds syncing to motion." Leverage multi-references for consistency—provide up to 4 images for character/style and an end-frame image via last_image parameter to control scene closure. Optimize by starting with fast endpoints for previews, then full for finals; timeline prompts enable multi-shot sequences like "0-5s: wide shot [Image1], 5-10s: close-up zoom." Test spatial details early, as the model excels at multi-subject interactions and physics like collisions.
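The timeline pattern above is easy to generate programmatically when sequences get long. A small sketch (the function name is ours, not an API; the model just receives the resulting string as the prompt):

```python
def timeline_prompt(shots):
    """Join (start, end, description) tuples into a multi-shot prompt.

    Mirrors the timeline pattern from the tips above, e.g.
    "0-5s: wide shot [Image1], 5-10s: close-up zoom".
    """
    return ", ".join(f"{start}-{end}s: {desc}" for start, end, desc in shots)

prompt = timeline_prompt([
    (0, 5, "wide shot [Image1]"),
    (5, 10, "close-up zoom"),
])
print(prompt)  # 0-5s: wide shot [Image1], 5-10s: close-up zoom
```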

Capabilities

  • Generates cinematic 1080p videos from mixed inputs: text, images, video clips, audio with native sync
  • Multi-reference support: Up to 9 images, 3 videos, 3 audios for precise style, motion, rhythm control
  • Image-faithful animation: Preserves subject identity, lighting, composition from reference images
  • Start/end frame control: Optional first/last images for exact scene composition
  • Realistic physics and motion: Handles dancing, sports, object interactions accurately
  • Dialogue and audio generation: Lip-sync via quoted speech in prompts, timed effects
  • Multi-shot storyboards: Timeline prompting for camera angle changes
  • Video editing/extension: Modify or continue reference videos while keeping consistency

What Can I Use It For?

Content Creators: Animate storyboards with multi-image references for consistent characters. Example: "[Image1] knight in armor, [Image2] castle background, charges into battle with sword clash sounds, camera dolly zoom." Produces 1080p clip with synced audio.

Marketers: Turn product shots into demos using image-to-video with end-frame control. Prompt: "[Image1] smartphone on table, rotates 360 degrees while narrator says 'Sleek design meets power,' ambient whoosh effects." Ideal for overviews.

Designers: Extend concept art videos with physics-accurate motion. Example: Provide reference video of fabric flow, prompt "Continue with wind gusts, realistic folds and ripples."

Fitness Trainers: Generate tutorials from pose images and audio rhythm. Prompt: "[Image1] yoga pose sequence, instructor voices 'Inhale, stretch,' with breathing sync and mat creaks." These use cases span any professional who needs quick, controllable cinematic output via the each::labs API.

Things to Be Aware Of

Bytedance | Seedance 2.0 | Reference to Video may struggle with real faces in references due to safety restrictions blocking such generations. Complex prompts with too many elements can lead to minor inconsistencies in long clips; test short durations first. Edge cases like extreme deformations or rapid multi-object interactions might show artifacts, despite strong physics. Users often overlook referencing files correctly (e.g., [Image1]), causing ignored inputs—always label explicitly. High resource demands suit API use on each::labs but may slow local setups; regional access limits beta features.

Limitations

Bytedance | Seedance 2.0 | Reference to Video blocks generation from real-face images and videos for safety, and invisibly watermarks all outputs. Inputs cap at 12 files, with durations up to 60s (shorter in some rollouts, such as 15s on CapCut). Unauthorized IP generation is not supported, and complex multi-shot prompts may not always transition seamlessly. Quality dips on hyper-detailed textures or unusual angles without strong references.

Pricing

Pricing Type: Dynamic

Current Pricing

720p resolution: $0.3024 per second based on output duration.

Pricing Rules

  • resolution matches "720p" (Active): $0.3024 per second based on output duration.
  • resolution matches "480p": $0.1345 per second based on output duration.
  • Default fallback: 720p rate when resolution is not specified.
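Since pricing is per second of output, cost scales linearly with clip duration. A back-of-envelope estimator using the rates listed above (a hypothetical helper, not an official calculator):

```python
# Per-second rates from the pricing rules; 720p is also the
# fallback when no resolution is specified.
RATES = {"720p": 0.3024, "480p": 0.1345}

def estimate_cost(duration_seconds, resolution=None):
    """Estimate USD cost for a clip of the given duration."""
    rate = RATES.get(resolution, RATES["720p"])
    return round(rate * duration_seconds, 4)

print(estimate_cost(10, "720p"))  # 3.024
print(estimate_cost(60, "480p"))  # 8.07
```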