Example input

prompt: "Ultra realistic modern cinematic reinterpretation of a 1920s gothic silent horror scene, tall thin vampire-like silhouette slowly climbing staircase, exaggerated shadow stretching along wall, high contrast black and white lighting, dramatic expressionist set design with distorted angles, slow cinematic push in, subtle film grain texture, eerie atmosphere,"
duration: "10"
resolution: "720p"
aspect_ratio: "16:9"
generate_audio: true

Bytedance Seedance 2.0 Text to Video · Fast

Video · seedance-2.0 · by Bytedance

A cutting-edge video generation model delivering cinematic visuals with native audio, realistic physics, and director-level camera control, supporting text, image, audio, and video inputs.

API reference

Runtime (p50): 2m
Estimated price: From $0.1129

Call the API

prediction.sh

curl -X POST \
  -H "X-API-Key: $EACHLABS_API_KEY" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "bytedance-seedance-2-0-text-to-video-fast",
    "version": "0.0.1",
    "input": {
        "prompt": "Ultra realistic modern cinematic reinterpretation of a 1920s gothic silent horror scene, tall thin vampire-like silhouette slowly climbing staircase, exaggerated shadow stretching along wall, high contrast black and white lighting, dramatic expressionist set design with distorted angles, slow cinematic push in, subtle film grain texture, eerie atmosphere,",
        "duration": "10",
        "resolution": "720p",
        "aspect_ratio": "16:9",
        "generate_audio": true
    },
    "webhook_url": ""
}' \
  https://api.eachlabs.ai/v1/prediction/
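
The same request can be sketched in Python with only the standard library. This is a minimal sketch that mirrors the curl call above: the endpoint, headers, and body fields are copied from it, while the function names `build_prediction_request` and `create_prediction` are illustrative. This page does not show the response body, so the sketch returns the decoded JSON without assuming any field names.

```python
import json
import os
import urllib.request

API_URL = "https://api.eachlabs.ai/v1/prediction/"  # endpoint from the curl example


def build_prediction_request(prompt: str, api_key: str, *, duration: str = "10",
                             resolution: str = "720p", aspect_ratio: str = "16:9",
                             generate_audio: bool = True,
                             webhook_url: str = "") -> urllib.request.Request:
    """Build (but do not send) the POST request shown in prediction.sh."""
    body = {
        "model": "bytedance-seedance-2-0-text-to-video-fast",
        "version": "0.0.1",
        "input": {
            "prompt": prompt,
            "duration": duration,  # sent as a string, as in the example input
            "resolution": resolution,
            "aspect_ratio": aspect_ratio,
            "generate_audio": generate_audio,
        },
        "webhook_url": webhook_url,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def create_prediction(prompt: str, **options) -> dict:
    """Send the request and return the decoded JSON response unchanged."""
    request = build_prediction_request(prompt, os.environ["EACHLABS_API_KEY"], **options)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect or log the payload before any credits are spent.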

Documentation

Bytedance | Seedance 2.0 | Text to Video | Fast Overview

Bytedance | Seedance 2.0 | Text to Video | Fast is ByteDance's flagship AI video generation model. It turns text prompts, images, audio, and video clips into cinematic video with native, synchronized audio. As part of the Seedance family, it is aimed at creators and developers who need realistic, controllable video quickly. Its main differentiator is the combination of timeline prompting and multimodal inputs, which gives precise control over motion, physics, and scene sequences in a single generation pass.

This fast variant outputs up to 1080p in 16:9, 9:16, and 1:1 aspect ratios and supports clips of up to 60 seconds, with realistic physics and director-level camera movements. It is available through the API on platforms such as each::labs and is well suited to iterative content-creation workflows.
Capabilities
- Text-to-video generation with cinematic visuals, multi-subject interactions, and emotional tone control
- Image-to-video animation, preserving input style while adding natural motion; supports start/end frames
- Multimodal inputs: up to 9 images, 3 videos, and 3 audio files for synchronized output in one pass
- Timeline prompting for precise temporal control over pacing, actions, and sequences
- Native audio generation with lip-sync for quoted dialogue and rhythm matching
- Realistic physics simulation for sports, dancing, collisions, and object interactions
- Director-level camera control: movements, angles, multi-shot editing
- Multi-image references (four or more) for consistent characters, styles, and scenes
Use Cases for Bytedance | Seedance 2.0 | Text to Video | Fast

Content Creators: Generate TikTok-ready vertical videos using timeline prompting for precise dance sequences. Example: "Dancer in spotlight [Image1], 0-3s: slow spin, 3-6s: fast jumps to [Audio1] beat, 9:16."

Marketers: Create product overviews with realistic physics demos from reference images. Example: "Smartphone falls and bounces realistically [Image1 start], [Image2 end intact], cinematic 16:9."

Developers: Prototype app visuals through the Bytedance | Seedance 2.0 | Text to Video | Fast API, testing UI animations with multi-reference inputs for consistency.

Designers: Animate storyboards using video clips and audio for client pitches, leveraging native sync. Example: "Storyboard [Video1] evolves with character dialogue 'Welcome aboard,' lip-synced."
Tips and Tricks

For Bytedance | Seedance 2.0 | Text to Video | Fast, use timeline prompting to specify actions at timestamps, e.g., "0-2s: character enters frame left, 2-5s: jumps over obstacle." Reference multimodal inputs explicitly: "[Image1] of a dancer in red dress performs [Audio1] rhythm." Optimize by starting with simple text-to-video, then adding images for style consistency or end frames for precise conclusions.
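
The timeline format in the tip above can be assembled programmatically. The helper below is a minimal sketch, not part of any SDK: `timeline_prompt` is a hypothetical name, and the "start-end s: action" layout is copied from the examples on this page rather than from a formal grammar.

```python
def timeline_prompt(scene: str, segments: list[tuple[float, float, str]]) -> str:
    """Join a scene description and timed actions into one prompt string."""
    parts = []
    previous_end = 0.0
    for start, end, action in segments:
        # Reject segments that run backwards or overlap the previous one.
        if start < previous_end or end <= start:
            raise ValueError(f"segment {start}-{end}s overlaps or is empty")
        parts.append(f"{start:g}-{end:g}s: {action}")
        previous_end = end
    timeline = ", ".join(parts)
    return f"{scene}, {timeline}" if scene else timeline


print(timeline_prompt("", [(0, 2, "character enters frame left"),
                           (2, 5, "jumps over obstacle")]))
# prints: 0-2s: character enters frame left, 2-5s: jumps over obstacle
```

Checking for overlapping segments before sending catches pacing mistakes that would otherwise only show up in the rendered clip.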

Example prompts:
- "A chef chops vegetables rapidly, steam rising, cinematic close-up, 16:9, realistic physics."
- "[Image1] mountain landscape at dawn transitions to hiker climbing, timeline: 0s static, 3s motion starts, with wind audio [Audio1]."
- "The athlete sprints 'Go faster!'" (generates lip-synced dialogue and voice)

Iterate by testing short durations first; combine up to 12 files for complex scenes to maintain character consistency.
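
Reference tags such as [Image1] and [Audio1] can be checked against the stated limits before a request is sent. The sketch below only inspects the prompt text; how the files themselves are attached is not shown on this page. The per-type caps (9 images, 3 videos, 3 audio files) and the 12-file total come from this document, while the function names and the assumption that tags are numbered from 1 are illustrative.

```python
import re
from collections import defaultdict

# Per-type caps and the overall cap as stated in this document.
MAX_PER_TYPE = {"Image": 9, "Video": 3, "Audio": 3}
MAX_TOTAL_FILES = 12

# Matches [Image1], [Video2], and annotated forms such as [Image1 start].
TAG_PATTERN = re.compile(r"\[(Image|Video|Audio)(\d+)\b[^\]]*\]")


def referenced_inputs(prompt: str) -> dict[str, set[int]]:
    """Return the distinct reference numbers used in the prompt, per input type."""
    found: dict[str, set[int]] = defaultdict(set)
    for kind, number in TAG_PATTERN.findall(prompt):
        found[kind].add(int(number))
    return dict(found)


def check_reference_limits(prompt: str) -> list[str]:
    """Return a list of problems; an empty list means the tags fit the limits."""
    refs = referenced_inputs(prompt)
    problems = []
    for kind, numbers in refs.items():
        highest = max(numbers)
        if highest > MAX_PER_TYPE[kind]:
            problems.append(f"[{kind}{highest}] exceeds the limit of {MAX_PER_TYPE[kind]}")
    total = sum(len(numbers) for numbers in refs.values())
    if total > MAX_TOTAL_FILES:
        problems.append(f"{total} referenced files exceed the total of {MAX_TOTAL_FILES}")
    return problems
```

For example, `check_reference_limits("[Image1] dancer moves to [Audio1]")` returns an empty list, while a prompt that mentions `[Video4]` is flagged.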
Technical Specifications
- Resolution: Up to 1080p (full HD)
- Max Duration: Up to 60 seconds
- Aspect Ratios: 16:9 (landscape), 9:16 (vertical), 1:1 (square)
- Input Types: Text prompts, up to 9 images, 3 video clips, and 3 audio files, capped at 12 files in total and referenced in prompts as [Image1], [Video1], and so on
- Output Format: Video with native synchronized audio
- Processing Time: Listed p50 runtime of about 2 minutes, which suits iterative work better than models that take 8-12 minutes per clip
- Architecture: Unified multimodal model handling text, image, audio, video inputs in one pass
These specs make Bytedance | Seedance 2.0 | Text to Video | Fast versatile for various platforms.
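
The limits listed above can be checked client-side before a request is sent. The following is a minimal sketch under stated assumptions: `validate_input` is a hypothetical helper, the field names come from the example input on this page, and the only resolution rule applied is the documented 1080p ceiling, since the full set of accepted resolution values is not listed here.

```python
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}  # listed in the specifications
MAX_DURATION_SECONDS = 60                         # "Up to 60 seconds"
MAX_RESOLUTION_LINES = 1080                       # "Up to 1080p"


def validate_input(params: dict) -> list[str]:
    """Return human-readable problems with an input object; empty if none."""
    problems = []
    if not str(params.get("prompt", "")).strip():
        problems.append("prompt must be a non-empty string")

    if params.get("aspect_ratio") not in ALLOWED_ASPECT_RATIOS:
        problems.append(f"aspect_ratio must be one of {sorted(ALLOWED_ASPECT_RATIOS)}")

    # The example input passes duration as a string such as "10".
    try:
        duration = int(params.get("duration", ""))
    except (TypeError, ValueError):
        problems.append("duration must be a whole number of seconds")
    else:
        if not 0 < duration <= MAX_DURATION_SECONDS:
            problems.append(f"duration must be between 1 and {MAX_DURATION_SECONDS} seconds")

    # Assumption: resolutions are written like "720p"; only the ceiling is documented.
    resolution = str(params.get("resolution", ""))
    if not (resolution.endswith("p") and resolution[:-1].isdecimal()):
        problems.append('resolution must look like "720p"')
    elif int(resolution[:-1]) > MAX_RESOLUTION_LINES:
        problems.append(f"resolution must not exceed {MAX_RESOLUTION_LINES}p")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with a payload at once.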
Things to Be Aware Of

With Bytedance | Seedance 2.0 | Text to Video | Fast, real faces in image or video inputs may be restricted during initial rollouts. Complex multi-subject scenes can lose consistency without multiple references. A common mistake is a vague prompt with no timeline specifics, which leads to poor pacing; always reference inputs explicitly. Monitor credits under heavy API use, since multimodal generations consume more resources. Outputs include invisible watermarks for identification.

Edge cases like extreme lighting or rapid actions test physics limits; preview short clips first.
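
One way to preview short clips first is to derive a cheaper request from the full one. This is a minimal sketch: `preview_variant` is a hypothetical helper, the payload layout follows the curl example on this page, and the default of a 5-second, 720p preview is an assumption, since the accepted duration values are not listed here.

```python
import copy


def preview_variant(payload: dict, *, duration: str = "5", resolution: str = "720p") -> dict:
    """Return a copy of a request payload with shorter, lower-resolution settings."""
    preview = copy.deepcopy(payload)  # leave the full-length request untouched
    preview["input"]["duration"] = duration
    preview["input"]["resolution"] = resolution
    return preview
```

Because the original payload is deep-copied, the full-length request can be submitted unchanged once the preview looks right.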
Key Considerations

Before using Bytedance | Seedance 2.0 | Text to Video | Fast, confirm that you have access through an API provider such as each::labs, since the model is subject to regional restrictions and beta rollouts in some areas. It suits work that needs quick iteration, such as prototyping ideas or short-form social content, better than slower models aimed at long-form video. For high-volume tasks, weigh cost against its speed advantage, and include timeline details in prompts for the best results. Multimodal inputs must be referenced clearly in the prompt to get full controllability.

Best for users with basic image or video editing knowledge. No special hardware is required, only API credits.
Limitations

Bytedance | Seedance 2.0 | Text to Video | Fast is subject to regional access locks and beta constraints that limit availability. Some versions restrict real-face inputs to prevent misuse. The 60-second maximum clip length and the 12-file input cap rule out long-form or highly complex scenes. The model may falter on highly abstract or unprecedented physics without strong references. API costs can add up with frequent iteration.