bytedance-seedance-2.0-image-to-video

SEEDANCE-2.0

A next-generation video model delivering cinematic visuals with native audio, realistic physics, and precise camera control, supporting text, image, audio, and video inputs.

Avg Run Time: 200s

Model Slug: bytedance-seedance-2-0-image-to-video

Playground

Input

Enter a URL or choose a file from your computer.

Output

Example Result

Preview and download your result.


API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
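A minimal sketch of assembling that request. The endpoint URL, the `X-API-Key` header name, and the body schema (`model` plus `input`) are assumptions for illustration; confirm them against the official Eachlabs API reference before use:

```python
import json

# Hypothetical endpoint and header name -- confirm against the
# official Eachlabs API reference before use.
EACHLABS_API = "https://api.eachlabs.ai/v1/prediction"

def build_prediction_request(api_key: str, model_slug: str, inputs: dict) -> dict:
    """Assemble URL, headers, and JSON body for a create-prediction POST."""
    return {
        "url": EACHLABS_API,
        "headers": {
            "X-API-Key": api_key,  # assumed header name
            "Content-Type": "application/json",
        },
        "body": json.dumps({"model": model_slug, "input": inputs}),
    }

req = build_prediction_request(
    "YOUR_API_KEY",
    "bytedance-seedance-2-0-image-to-video",
    {
        "image_url": "https://example.com/first-frame.jpg",  # hypothetical input name
        "prompt": "@Image1 as host waves at the camera, push-in shot",
    },
)
```

The returned dict can be sent with any HTTP client (e.g. `requests.post(req["url"], headers=req["headers"], data=req["body"])`); the response should contain the prediction ID used in the next step.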

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
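The polling loop can be sketched as below. `get_status` stands in for whatever HTTP GET fetches the prediction JSON by ID, and the `success`/`error` status names are assumptions based on the description above:

```python
import time

def poll_prediction(get_status, prediction_id: str,
                    interval: float = 2.0, timeout: float = 600.0):
    """Repeatedly fetch the prediction until it reaches a terminal status.

    `get_status` is any callable that returns the prediction JSON for an ID
    (typically an HTTP GET against the prediction endpoint).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = get_status(prediction_id)
        if result.get("status") in ("success", "error"):
            return result
        time.sleep(interval)  # back off between checks
    raise TimeoutError(f"prediction {prediction_id} not ready after {timeout}s")
```

With the model's average run time around 200 seconds, a polling interval of a few seconds and a generous timeout are reasonable defaults.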

Readme

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview

Bytedance | Seedance 2.0 | Image to Video transforms static images into dynamic, cinematic videos with native audio synchronization, realistic physics, and precise motion control. Developed by ByteDance's Seed research team as part of the Seedance family, this flagship model excels in multimodal workflows, accepting images alongside text, video, and audio inputs for superior reference handling.

Its primary differentiator is the ability to combine up to 9 images, 3 video clips, and 3 audio files in a single generation pass, enabling role-based asset tagging like "@Image1 as main character" for unmatched consistency in identity locking and motion transfer. Creators gain directorial control over complex scenes, from character animations to beat-synced performances, making Bytedance | Seedance 2.0 | Image to Video ideal for professional video production on each::labs.

Released in early 2026, it supports image-to-video animation up to 1080p, powering applications in marketing, tutorials, and storytelling where visual fidelity and audio alignment are critical.

Technical Specifications

  • Resolution Support: Up to 1080p (standard), with cinematic 2K quality in select tiers.
  • Max Duration: 4-15 seconds per clip, with multi-shot storyboarding and extension capabilities; some reports note up to 60 seconds.
  • Aspect Ratios: 16:9, 9:16, 4:3, 3:4, 21:9, 1:1.
  • Input Formats: Images (up to 9), video clips (up to 3), audio files (up to 3), text prompts; references tagged as [Image1], [Video1], etc.
  • Output Formats: Video with native synchronized audio in one pass; includes invisible watermark.
  • Processing Tiers: Standard for cinematic quality, Fast for speed-optimized generation.
  • Architecture: Unified multimodal audio-video system with binding logic and reference clusters for asset control.

Average processing time varies by tier, with Fast options suited for rapid iteration.
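The per-type input caps above can be enforced client-side before submission. This illustrative helper (not part of any official SDK) maps assets to the @-style reference tags the model expects:

```python
def build_reference_tags(images=(), videos=(), audios=()):
    """Map assets to @-style reference tags, enforcing the documented
    per-generation limits: 9 images, 3 video clips, 3 audio files."""
    groups = {"Image": (list(images), 9),
              "Video": (list(videos), 3),
              "Audio": (list(audios), 3)}
    tags = {}
    for kind, (assets, cap) in groups.items():
        if len(assets) > cap:
            raise ValueError(
                f"at most {cap} {kind.lower()} assets allowed, got {len(assets)}")
        for i, asset in enumerate(assets, start=1):
            tags[f"@{kind}{i}"] = asset  # e.g. "@Image1", "@Video2"
    return tags
```

Validating locally avoids a round trip that would fail server-side once the 9/3/3 limits are exceeded.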

Key Considerations

Before using Bytedance | Seedance 2.0 | Image to Video on each::labs, ensure inputs are high-quality images for optimal animation, as the model preserves input style while adding motion. It shines in scenarios needing multimodal references, like consistent character videos, over pure text-to-video alternatives.

Prerequisites include clear prompt tagging for references (e.g., @Image1) and awareness of regional access limits in some ecosystems. Cost-performance tradeoffs favor Fast tier for quick prototypes versus Standard for production-grade output with audio sync.

Best for creators prioritizing physics realism and camera control, but test short clips first due to duration caps.

Tips & Tricks

For Bytedance | Seedance 2.0 | Image to Video, use role-based tagging in prompts: "@Image1 as dancer performs a spin with realistic physics." Reference multiple assets hierarchically in a "Reference Cluster" to lock identity and transfer motion from videos.

Optimize parameters by specifying camera moves like "push-in shot" or "orbit pan" for cinematic control, and enclose dialogue in quotes for lip-synced audio: "The chef says, 'Perfect timing,' as ingredients mix." Start with Fast tier for iterations, then refine in Standard.

Workflow tip: Animate a single image as the first frame, add an end-frame image for controlled transitions, and include audio for beat-aware sync. Example prompts:

  • "@Image1 as athlete jumps over hurdle, @Video2 motion reference, energetic music sync."
  • "Animate @Image3 portrait speaking: 'Welcome to our product,' with smooth head turns."
  • "@Image4 landscape at sunset, camera tracks right with wind physics and ambient sounds."

These leverage the model's multimodal strengths for consistent, professional results.
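As an illustration of the role-based tagging pattern, a small helper can assemble the binding prefix before the shot description; only the @tag convention comes from the model's documentation, the rest of the grammar is flexible:

```python
def compose_prompt(action: str, roles: dict) -> str:
    """Prefix a shot description with role-based asset bindings,
    e.g. {"@Image1": "dancer"} -> "@Image1 as dancer: <action>"."""
    bindings = ", ".join(f"{tag} as {role}" for tag, role in roles.items())
    return f"{bindings}: {action}" if bindings else action

prompt = compose_prompt(
    "performs a spin with realistic physics, push-in shot",
    {"@Image1": "dancer", "@Video2": "motion reference"},
)
```

Keeping role bindings in one place makes it easy to reuse the same assets across a batch of shots while varying only the action text.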

Capabilities

  • Animates static images into videos, using them as first frame with optional end-frame control.
  • Multimodal inputs: Up to 9 images, 3 videos, 3 audios, referenced via @tags or [Image1] for binding.
  • Native audio generation and sync, including lip movements for quoted dialogue and beat-aware music alignment.
  • Identity locking and motion transfer: Preserves facial features, clothing across frames using reference clusters.
  • Realistic physics for interactions like sports, dancing, collisions.
  • Cinematic camera control: Push-in, pan, orbit, tracking shots via prompt keywords.
  • Multi-shot storyboarding and clip extension for longer narratives.
  • Character consistency frame-to-frame and across generations.

What Can I Use It For?

Content Creators: Animate character sketches into talking-head videos. Example: "@Image1 as host explains recipe, lip-sync to 'Stir gently,' with kitchen physics." Leverages identity locking for consistent branding.

Marketers: Generate product demos from photos. Example: "@Image2 product on table rotates 360 degrees, camera orbits, adds 'Now available' voiceover." Uses motion transfer for engaging visuals.

Developers: Prototype app interfaces with motion. Example: "@Image3 UI screen transitions via swipe gesture from @Video4 reference, subtle sound effects." Fast tier speeds API iterations via each::labs.

Designers: Create fitness tutorials from pose images. Example: "@Image5 athlete in starting pose jumps rope, realistic physics and upbeat audio sync." Ensures frame-to-frame consistency.

These scenarios highlight Bytedance | Seedance 2.0 | Image to Video's strengths in multimodal precision and audio-visual coherence.

Things to Be Aware Of

Bytedance | Seedance 2.0 | Image to Video may struggle with highly complex multi-subject interactions beyond the provided references, leading to minor inconsistencies in crowded scenes. A common mistake is a vague prompt without @tagging, which causes assets to be ignored; always bind each asset explicitly.

Edge cases like extreme deformations or abstract art inputs can reduce physics accuracy; test with realistic images first. Outputs carry invisible watermarks for traceability, visible in detection tools.

Resource needs scale with Standard tier; use Fast for low-latency previews. Regional beta limits may affect direct access outside platforms like each::labs.

Limitations

Bytedance | Seedance 2.0 | Image to Video caps clips at 15 seconds natively (extensions can push runs longer, but 60-second output is not guaranteed in all cases), and output tops out at 1080p, below some 4K-capable competitors. It cannot handle unlimited references: the hard limits are 9 images, 3 videos, and 3 audio files.

Performance dips in scenes that lack strong reference assets or clear prompts; abstract or low-quality inputs yield less coherent motion. Strict input binding is required: loosely written prompts cause multimodal inputs to be ignored.

Regional locks and high API costs limit casual use.

---

Pricing

Pricing Type: Dynamic

720p resolution: $0.3024 per second based on output duration.


Pricing Rules

  • resolution matches "720p" (active): $0.3024 per second based on output duration.
  • resolution matches "480p": $0.1345 per second based on output duration.
  • Default fallback: the 720p rate applies when resolution is not specified.
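Given the dynamic per-second rates above, output cost can be estimated client-side. This sketch hard-codes the two listed rates and mirrors the default-fallback rule:

```python
def estimate_cost(duration_s: float, resolution: str = "720p") -> float:
    """Estimate output cost from the per-second rates; unlisted resolutions
    fall back to the 720p rate, matching the default pricing rule."""
    rates = {"720p": 0.3024, "480p": 0.1345}  # USD per second of output
    return round(rates.get(resolution, rates["720p"]) * duration_s, 4)

print(estimate_cost(10))          # 10-second clip at 720p
print(estimate_cost(10, "480p"))  # 10-second clip at 480p
```

For example, a 10-second 720p clip comes to $3.024 and the same clip at 480p to $1.345; rates may change, so treat the constants as a snapshot of the table above.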