Eachlabs | AI Workflows for app builders
bytedance-seedance-2.0-reference-to-video

SEEDANCE-2.0

An advanced video generation model delivering cinematic visuals with native audio, realistic physics, and director-level camera control, supporting text, image, audio, and video inputs.

Avg Run Time: 200.000s

Model Slug: bytedance-seedance-2-0-reference-to-video

Playground

Input

Enter a URL or choose a file from your computer.


Output

Example Result

Preview and download your result.


API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
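The create step can be sketched with the Python standard library. Note that the endpoint URL, version string, header name, and payload field names below are illustrative assumptions, not the exact each::labs schema; check the API reference for the real contract.

```python
import json
import urllib.request

# Hypothetical endpoint -- substitute the real each::labs URL.
API_URL = "https://api.eachlabs.ai/v1/prediction"

def build_prediction_request(api_key, prompt, image_urls=None):
    """Assemble the POST request that creates a new prediction."""
    payload = {
        "model": "bytedance-seedance-2-0-reference-to-video",
        "input": {
            "prompt": prompt,
            "images": image_urls or [],
        },
    }
    headers = {
        "Content-Type": "application/json",
        "X-API-Key": api_key,  # header name is an assumption
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )

req = build_prediction_request(
    "YOUR_API_KEY",
    "[Image1] knight in armor charges into battle, camera dolly zoom",
    image_urls=["https://example.com/knight.png"],
)
print(req.get_method())  # POST
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) returns a response containing the prediction ID used in the next step.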

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API is polling-based, so you'll need to repeatedly check until you receive a success status.
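A minimal polling loop, with the HTTP call stubbed out so the pattern is visible. In real use, `get_status` would GET the prediction endpoint with your prediction ID and return the parsed JSON; the `status` values shown are assumptions.

```python
import time

def poll_prediction(get_status, interval=2.0, max_attempts=100):
    """Repeatedly check a prediction until it reaches a terminal status.

    `get_status` is any callable returning a dict with a "status" key.
    """
    for _ in range(max_attempts):
        result = get_status()
        if result.get("status") in ("success", "failed"):
            return result
        time.sleep(interval)
    raise TimeoutError("prediction did not finish in time")

# Stub standing in for the real HTTP call: ready on the third check.
responses = iter(
    [{"status": "processing"}] * 2
    + [{"status": "success", "output": "video.mp4"}]
)
final = poll_prediction(lambda: next(responses), interval=0.0)
print(final["status"])  # success
```

With a ~200s average run time, a 2-5 second polling interval keeps request volume modest without adding noticeable latency.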

Readme

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview

Bytedance | Seedance 2.0 | Reference to Video transforms static images, videos, audio, and text into cinematic videos with native audio synchronization and precise motion control. Developed by ByteDance as part of the Seedance family, this multimodal model excels at image-to-video generation, preserving subject identity, composition, and style while adding realistic physics and director-level camera movements. Its standout differentiator is support for up to 12 mixed reference files (images, videos, and audio) in a single generation, enabling Hollywood-grade outputs that outperform single-input competitors. Available via APIs such as each::labs, Bytedance | Seedance 2.0 | Reference to Video lets creators produce 1080p clips up to 60 seconds long with lip-synced dialogue and sound effects, streamlining workflows from storyboard to final edit.

Technical Specifications

  • Resolution: Up to 1080p (full HD)
  • Max Duration: Up to 60 seconds (varies by endpoint; CapCut rollout starts at 15 seconds)
  • Aspect Ratios: Multiple ratios supported, including six standard formats
  • Inputs: Text prompts, up to 9 images, 3 video clips, 3 audio files (total up to 12 references); reference via [Image1], [Video1], etc. in prompts
  • Outputs: Video with native synchronized audio (dialogue, effects, ambient); MP4 format typical
  • Processing Time: Varies by provider; fast endpoints available for quicker inference
  • Architecture: Unified multimodal model handling text, image, video, audio inputs in one pass

These specs make Bytedance | Seedance 2.0 | Reference to Video ideal for high-fidelity image-to-video tasks on platforms like each::labs.
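The input limits above can be checked client-side before submitting a job, avoiding a wasted API call. This validator is a hypothetical helper written against the documented caps (9 images, 3 videos, 3 audio files, 12 total), not part of any SDK:

```python
# Documented per-type reference limits for Seedance 2.0.
LIMITS = {"images": 9, "videos": 3, "audios": 3}
MAX_TOTAL = 12

def validate_references(images=(), videos=(), audios=()):
    """Raise ValueError if the reference files exceed the model's caps."""
    counts = {"images": len(images), "videos": len(videos), "audios": len(audios)}
    for kind, count in counts.items():
        if count > LIMITS[kind]:
            raise ValueError(f"too many {kind}: {count} > {LIMITS[kind]}")
    total = sum(counts.values())
    if total > MAX_TOTAL:
        raise ValueError(f"too many references: {total} > {MAX_TOTAL}")
    return counts

print(validate_references(images=["a.png"] * 4, videos=["clip.mp4"]))
# {'images': 4, 'videos': 1, 'audios': 0}
```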

Key Considerations

Before using Bytedance | Seedance 2.0 | Reference to Video, ensure access via API providers like each::labs, as regional restrictions apply in some areas. It shines in scenarios needing multimodal references for consistency, outperforming text-only models for complex scenes with character or style matching. Prerequisites include clear reference files and detailed prompts; high API costs may factor into production use. Opt for this over alternatives when native audio sync and multi-image control are critical, balancing cost against superior physics and motion realism.

Tips & Tricks

For optimal results with Bytedance | Seedance 2.0 | Reference to Video, use specific references in prompts like "[Image1] of a dancer in studio lighting, performs a spin with smooth camera pan." Include dialogue in double quotes for lip-synced audio: "A chef chops vegetables, saying 'Fresh ingredients make the best meals,' with knife sounds syncing to motion." Leverage multi-references for consistency—provide up to 4 images for character/style and an end-frame image via last_image parameter to control scene closure. Optimize by starting with fast endpoints for previews, then full for finals; timeline prompts enable multi-shot sequences like "0-5s: wide shot [Image1], 5-10s: close-up zoom." Test spatial details early, as the model excels at multi-subject interactions and physics like collisions.
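The timeline pattern above is easy to generate programmatically when sequences get long. A small sketch (the function name is ours, not an API; the model just receives the resulting string as the prompt):

```python
def timeline_prompt(shots):
    """Join (start, end, description) tuples into a multi-shot prompt.

    Mirrors the timeline pattern from the tips above, e.g.
    "0-5s: wide shot [Image1], 5-10s: close-up zoom".
    """
    return ", ".join(f"{start}-{end}s: {desc}" for start, end, desc in shots)

prompt = timeline_prompt([
    (0, 5, "wide shot [Image1]"),
    (5, 10, "close-up zoom"),
])
print(prompt)  # 0-5s: wide shot [Image1], 5-10s: close-up zoom
```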

Capabilities

  • Generates cinematic 1080p videos from mixed inputs: text, images, video clips, audio with native sync
  • Multi-reference support: Up to 9 images, 3 videos, 3 audios for precise style, motion, rhythm control
  • Image-faithful animation: Preserves subject identity, lighting, composition from reference images
  • Start/end frame control: Optional first/last images for exact scene composition
  • Realistic physics and motion: Handles dancing, sports, object interactions accurately
  • Dialogue and audio generation: Lip-sync via quoted speech in prompts, timed effects
  • Multi-shot storyboards: Timeline prompting for camera angle changes
  • Video editing/extension: Modify or continue reference videos while keeping consistency

What Can I Use It For?

Content Creators: Animate storyboards with multi-image references for consistent characters. Example: "[Image1] knight in armor, [Image2] castle background, charges into battle with sword clash sounds, camera dolly zoom." Produces 1080p clip with synced audio.

Marketers: Turn product shots into demos using image-to-video with end-frame control. Prompt: "[Image1] smartphone on table, rotates 360 degrees while narrator says 'Sleek design meets power,' ambient whoosh effects." Ideal for overviews.

Designers: Extend concept art videos with physics-accurate motion. Example: Provide reference video of fabric flow, prompt "Continue with wind gusts, realistic folds and ripples."

Fitness Trainers: Generate tutorials from pose images and audio rhythm. Prompt: "[Image1] yoga pose sequence, instructor voices 'Inhale, stretch,' with breathing sync and mat creaks." These use cases span any professional who needs quick, controllable cinematic output via the each::labs API.

Things to Be Aware Of

Bytedance | Seedance 2.0 | Reference to Video may struggle with real faces in references due to safety restrictions blocking such generations. Complex prompts with too many elements can lead to minor inconsistencies in long clips; test short durations first. Edge cases like extreme deformations or rapid multi-object interactions might show artifacts, despite strong physics. Users often overlook referencing files correctly (e.g., [Image1]), causing ignored inputs—always label explicitly. High resource demands suit API use on each::labs but may slow local setups; regional access limits beta features.

Limitations

Bytedance | Seedance 2.0 | Reference to Video blocks generation from real-face images and videos for safety, and invisibly watermarks all outputs. Inputs cap at 12 files, with durations up to 60s (shorter in some rollouts, such as 15s on CapCut). Unauthorized IP generation is not supported, and complex multi-shot prompts may not always transition seamlessly. Quality dips on hyper-detailed textures or unusual angles without strong references.

Pricing

Pricing Type: Dynamic

Current Pricing

720p resolution: $0.3024 per second based on output duration.

Pricing Rules

  • resolution matches "720p" (Active): $0.3024 per second based on output duration.
  • resolution matches "480p": $0.1345 per second based on output duration.
  • Default fallback: 720p rate when resolution is not specified.
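Since pricing is per second of output, cost scales linearly with clip duration. A back-of-envelope estimator using the rates listed above (a hypothetical helper, not an official calculator):

```python
# Per-second rates from the pricing rules; 720p is also the
# fallback when no resolution is specified.
RATES = {"720p": 0.3024, "480p": 0.1345}

def estimate_cost(duration_seconds, resolution=None):
    """Estimate USD cost for a clip of the given duration."""
    rate = RATES.get(resolution, RATES["720p"])
    return round(rate * duration_seconds, 4)

print(estimate_cost(10, "720p"))  # 3.024
print(estimate_cost(60, "480p"))  # 8.07
```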