SEEDANCE-2.0
An advanced video model delivering cinematic visuals with native audio, realistic physics, and precise camera control, supporting text, image, audio, and video inputs.
Avg Run Time: 150s
Model Slug: bytedance-seedance-2-0-image-to-video-fast
Playground
Input
Output
Example Result
Preview and download your result.
API & SDK
Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
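The create-then-poll flow above can be sketched with a small polling helper. This is a minimal sketch, not the official SDK: the status values (`"success"`, `"failed"`) and response fields are assumptions, so check the actual API reference for the real schema. The helper takes a `fetch_status` callable so you can wire in any HTTP client.

```python
import time

def poll_prediction(fetch_status, interval=2.0, timeout=600.0):
    """Poll until the prediction reaches a terminal status.

    fetch_status: a callable that GETs the prediction endpoint with your
    prediction ID and returns the decoded JSON body, e.g.
    {"status": "processing"} or {"status": "success", "output": "<video URL>"}.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = fetch_status()
        # Stop on any terminal status; otherwise wait and re-check.
        if result.get("status") in ("success", "failed"):
            return result
        time.sleep(interval)
    raise TimeoutError("prediction did not complete within the timeout")
```

In practice `fetch_status` would wrap an HTTP GET against the prediction endpoint with your API key header (header name depends on the provider).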
Readme
Overview
Bytedance | Seedance 2.0 | Image to Video | Fast Overview
Bytedance | Seedance 2.0 | Image to Video | Fast is ByteDance's speed-optimized endpoint for converting static images into dynamic video content with synchronized audio and cinematic motion. This model solves the creator's dilemma between quality and iteration speed by delivering production-ready video output without sacrificing core motion quality. Built on a unified multimodal architecture, Bytedance | Seedance 2.0 | Image to Video | Fast accepts images alongside text prompts, audio references, and video clips to generate coherent, audio-synced video in a single pass. The Fast tier prioritizes rapid turnaround for high-throughput creative pipelines while maintaining the character consistency and realistic physics that define the Seedance 2.0 family.
Technical Specifications
- Maximum clip duration: 15 seconds
- Maximum resolution: 1080p
- Supported aspect ratios: 16:9, 9:16, 4:3, 3:4, 21:9, 1:1
- Input formats: Images (up to 9 references), video clips (up to 3), audio files (up to 3), plus text prompts
- Output: Video with native audio co-generation in a single render pass
- Architecture: Unified quad-modal system (text, image, audio, video inputs) with binding logic for precise asset control
- Processing tier: Fast tier optimized for lower latency and cost compared to standard quality tier
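The input limits listed above lend themselves to a client-side pre-check before submitting a request. The sketch below hard-codes the published limits (9 images, 3 videos, 3 audio files, 15 s, six aspect ratios); the function name and parameters are illustrative, not part of the API.

```python
# Published input limits for the Fast tier (from the spec above).
MAX_IMAGES, MAX_VIDEOS, MAX_AUDIO = 9, 3, 3
MAX_DURATION_S = 15
ASPECT_RATIOS = {"16:9", "9:16", "4:3", "3:4", "21:9", "1:1"}

def validate_inputs(images=(), videos=(), audio=(), duration_s=5, aspect_ratio="16:9"):
    """Return a list of limit violations; an empty list means the request looks valid."""
    errors = []
    if len(images) > MAX_IMAGES:
        errors.append(f"too many image references: {len(images)} > {MAX_IMAGES}")
    if len(videos) > MAX_VIDEOS:
        errors.append(f"too many video references: {len(videos)} > {MAX_VIDEOS}")
    if len(audio) > MAX_AUDIO:
        errors.append(f"too many audio references: {len(audio)} > {MAX_AUDIO}")
    if duration_s > MAX_DURATION_S:
        errors.append(f"duration {duration_s}s exceeds the {MAX_DURATION_S}s cap")
    if aspect_ratio not in ASPECT_RATIOS:
        errors.append(f"unsupported aspect ratio: {aspect_ratio}")
    return errors
```

Running this before the POST avoids paying the round-trip for a request the API would reject anyway.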
Key Considerations
Bytedance | Seedance 2.0 | Image to Video | Fast is purpose-built for creators prioritizing speed and iteration over maximum resolution. The 15-second maximum duration suits short-form content, social media clips, and rapid prototyping workflows rather than long-form video production. This model excels when you need to test creative concepts quickly or generate high-throughput content for marketing campaigns. The Fast tier trades some visual polish for reduced latency, making it ideal for workflows where turnaround time matters more than cinematic perfection. Regional availability and API access may be limited depending on your location.
Tips & Tricks
Leverage the @ symbol syntax to bind specific uploaded assets to your text prompt; this "binding logic" tells the model exactly which part of the prompt is governed by which image, video, or audio file. When using multiple image references, order them hierarchically in your reference cluster to establish visual consistency across generated frames. For motion-heavy content such as dancing or sports, provide a video reference that demonstrates the desired movement pattern; Seedance 2.0 excels at motion transfer while preserving character identity. Use camera-direction keywords in your prompt, such as "push-in," "pan," "orbit," or "tracking shot," to control cinematic framing. Example prompts: "A woman in a red dress [Image1] dancing to upbeat music [Audio1]" or "Product showcase [Image1] with smooth camera pan and professional lighting."
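A bound-asset request might be assembled as a payload like the one below. This is a hypothetical sketch only: the field names (`prompt`, `images`, `audio`, `resolution`, `duration`) and URLs are assumptions for illustration, so consult the actual API reference for the real request schema.

```python
# Hypothetical request body pairing prompt references with uploaded assets.
# The [Image1]/[Audio1] tokens in the prompt refer to list positions below.
payload = {
    "prompt": "A woman in a red dress [Image1] dancing to upbeat music [Audio1]",
    "images": ["https://example.com/red-dress.jpg"],   # up to 9 references
    "audio": ["https://example.com/upbeat-track.mp3"], # up to 3 references
    "resolution": "720p",
    "duration": 10,  # seconds, max 15
}
```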
Capabilities
- Native audio-video co-generation with lip-sync and contextual sound effects in a single pass
- Identity locking and motion transfer simultaneously—maintain character facial features and clothing while applying new movement patterns
- Multi-shot storyboarding with seamless cuts and transitions from a single prompt
- Reference-based character consistency across multiple generated clips
- Cinematic camera control including push-in, pan, orbit, and tracking shots via natural language
- Multimodal input binding—combine up to 9 images, 3 videos, and 3 audio files with precise asset control
- Realistic physics rendering for complex interactions including sports, dancing, and object collisions
- Beat-aware audio synchronization for music-driven content
What Can I Use It For?
Content Creator Rapid Prototyping: Creators can test multiple video concepts from early sketches or storyboard images before committing to full production. Use a reference image of your scene concept with a prompt like "cinematic establishing shot of a modern office with natural lighting and subtle camera movement" to validate visual direction in seconds.
Marketing and Product Demos: Marketers generate product overview videos and business demonstrations with consistent branding by uploading product images and logos as references. The Fast tier enables rapid iteration across multiple product angles: "360-degree product reveal of [Image1] with professional lighting and smooth rotation."
Fitness and Educational Content: Instructors create tutorial videos by animating reference images of exercise positions or instructional diagrams. Example: "Fitness trainer [Image1] performing a squat exercise with slow, controlled motion and clear form demonstration."
Social Media Content Pipelines: High-volume creators leverage the Fast tier to generate multiple short-form clips for platforms like TikTok and Instagram Reels, using character reference images to maintain visual consistency across a content series.
Things to Be Aware Of
The 15-second maximum duration requires planning for longer narratives—consider generating multiple clips and composing them in post-production. Motion-heavy content like sports or dancing benefits from video references; without them, the model may produce less dynamic results. The Fast tier prioritizes speed over visual refinement, so expect slightly lower detail fidelity compared to the standard quality tier. Character consistency improves significantly when you provide facial reference images; generic prompts alone may produce variable results across generations. Be aware that generated content includes an invisible watermark for identification purposes.
Limitations
Bytedance | Seedance 2.0 | Image to Video | Fast cannot exceed 1080p resolution, limiting use cases requiring 4K output. The 15-second clip length restricts long-form storytelling and requires segmentation for extended narratives. Regional restrictions and limited beta access may prevent availability in certain geographic areas. The model performs best with clear, well-lit reference images; low-quality or ambiguous source images may produce inconsistent results. Complex physics interactions involving multiple objects or extreme motion may still face challenges despite improvements over earlier versions.
Pricing
Pricing Type: Dynamic
720p resolution: $0.2419 per second based on output duration.
Current Pricing
Pricing Rules
| Condition | Pricing |
|---|---|
| resolution matches "720p" (Active) | $0.2419 per second of output duration |
| resolution matches "480p" | $0.1076 per second of output duration |
| Default fallback | 720p rate applies when resolution is not specified |
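Since pricing is per second of output, cost is simply rate times duration. A minimal estimator, with the rates hard-coded from the table above (rates may change, so treat these constants as a snapshot):

```python
# Per-second output rates in USD, taken from the pricing table.
RATES = {"720p": 0.2419, "480p": 0.1076}

def estimate_cost(duration_s, resolution="720p"):
    """Estimate output cost; unknown/unspecified resolutions fall back to the 720p rate."""
    rate = RATES.get(resolution, RATES["720p"])
    return round(rate * duration_s, 4)
```

For example, a maximum-length 15-second clip at 720p comes to about $3.63, while a 10-second 480p clip is about $1.08.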
Related AI Models
You can seamlessly integrate advanced AI capabilities into your applications without the hassle of managing complex infrastructure.
