How do I use Seedance v1.5 Pro Text to Video via API?

Seedance v1.5 Pro Text to Video is available through the eachlabs unified API. Provide a text prompt describing the scene, motion, or style; the model returns a generated video. eachlabs manages authentication and billing on a pay-as-you-go basis.

What is Seedance v1.5 Pro Text to Video best suited for?

Seedance v1.5 Pro Text to Video is best suited for social media content creators, marketers, and developers building video generation features. It excels in rapid video prototyping and content production for platforms where short-form, visually engaging video is the primary format.

Example input

prompt: "Ultra-photorealistic studio scene of a professional presenter in a clean modern setting, softly lit with cinematic lighting, facing the camera and speaking clearly with natural lip sync, subtle head and facial movement, smooth camera motion, and no on-screen text or logos, saying: “Welcome. This system transforms ideas into powerful visual experiences.”"
aspect_ratio: "16:9"
resolution: "720p"
duration: "5"
generate_audio: true

Seedance V1.5 Pro · Text to Video

Video · seedance-v1.5 · by Bytedance

Seedance v1.5 Pro Text to Video generates high-quality videos with synchronized audio from text prompts, combining smooth motion, cinematic visuals, and immersive sound in a single generation pipeline.


API reference

Runtime (p50): -
Estimated price: From $1.20

Call the API

prediction.sh

# Assumes EACHLABS_API_KEY is exported in your shell.
curl -X POST \
  -H "X-API-Key: $EACHLABS_API_KEY" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "seedance-v1-5-pro-text-to-video",
    "version": "0.0.1",
    "input": {
        "prompt": "Ultra-photorealistic studio scene of a professional presenter in a clean modern setting, softly lit with cinematic lighting, facing the camera and speaking clearly with natural lip sync, subtle head and facial movement, smooth camera motion, and no on-screen text or logos, saying: “Welcome. This system transforms ideas into powerful visual experiences.”",
        "aspect_ratio": "16:9",
        "resolution": "720p",
        "duration": "5",
        "generate_audio": true
    },
    "webhook_url": ""
}' \
  https://api.eachlabs.ai/v1/prediction/
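The inline `--data '...'` body above breaks as soon as a prompt contains an apostrophe, and hand-editing JSON makes unescaped double quotes easy to miss. Below is a minimal bash sketch that builds the same body from variables and then posts it. The helper names `json_escape`, `build_payload`, and `create_prediction` are hypothetical (not part of any eachlabs SDK), and the fields simply mirror the example above rather than a complete schema.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of any eachlabs SDK): build the prediction
# body from shell variables so quotes or apostrophes in a prompt cannot
# break the JSON or the shell quoting.

# Escape backslashes, double quotes, and common control characters for JSON.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\r'/\\r}
  s=${s//$'\t'/\\t}
  printf '%s' "$s"
}

# Usage: build_payload PROMPT ASPECT_RATIO RESOLUTION DURATION [GENERATE_AUDIO]
# Field names mirror the curl example; GENERATE_AUDIO defaults to true.
build_payload() {
  local prompt aspect=$2 resolution=$3 duration=$4 audio=${5:-true}
  prompt=$(json_escape "$1")
  printf '{"model": "seedance-v1-5-pro-text-to-video", "version": "0.0.1", '
  printf '"input": {"prompt": "%s", "aspect_ratio": "%s", ' "$prompt" "$aspect"
  printf '"resolution": "%s", "duration": "%s", ' "$resolution" "$duration"
  printf '"generate_audio": %s}, "webhook_url": ""}\n' "$audio"
}

# POST the body to the same endpoint as the curl example.
# Aborts with a message if EACHLABS_API_KEY is unset or empty.
create_prediction() {
  : "${EACHLABS_API_KEY:?set EACHLABS_API_KEY first}"
  curl -sS -X POST \
    -H "X-API-Key: $EACHLABS_API_KEY" \
    -H "Content-Type: application/json" \
    --data "$(build_payload "$@")" \
    https://api.eachlabs.ai/v1/prediction/
}
```

For example, `create_prediction "A narrator says: \"Let's begin.\"" "16:9" "720p" "5" true` sends a prompt containing both quote types without any manual escaping. The prompt is passed to `printf` as an argument rather than as part of the format string, so `%` characters in it are also safe.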

Documentation

Overview
seedance-v1.5-pro-text-to-video: Text to Video AI Model

Developed by Bytedance as part of the seedance-v1.5 family, seedance-v1.5-pro-text-to-video generates high-quality videos with natively synchronized audio directly from text prompts, so dialogue, sound effects, and ambient noise do not need to be added in a separate post-production pass.

It produces cinematic visuals with professional camera moves such as dolly zooms and tracking shots, all in a single pipeline. It supports 5-10 second clips at up to 1080p resolution, which makes it a good fit for creators who need fast, immersive short-form video with synchronized audio.
Capabilities
- Generates high-fidelity 1080p videos with native synchronized audio, including speech, sound effects, and music
- Precise multilingual lip-sync across 6+ languages with natural pronunciation, emotional expression, and minimal artifacts
- Advanced cinematic camera control: dolly zooms, long takes, smooth transitions, and dynamic motion
- Strong prompt adherence for complex, multi-layered instructions involving visuals, audio, and narrative pacing
- High audio quality: clear voices, spatial reverb, and controlled expressiveness without exaggerated emotion
- The model family covers both text-to-video (T2V) and image-to-video (I2V), with automatic duration adaptation and a reported 10x+ speedup from inference optimizations
- Strong benchmark results for motion vividness, aesthetic quality, and audio-video temporal synchronization
Use cases

Content creators producing TikTok or Reels videos can write detailed prompts for fast-paced clips with multi-shot transitions and native audio, such as "A barista pours espresso into a cup with steam rising, cafe chatter and espresso machine hiss, tracking shot from side angle", and get 9:16 videos with synchronized sound that are ready to post.
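For the vertical TikTok/Reels case above, the request differs from the API reference example mainly in `aspect_ratio` and the prompt. A bash sketch that stores such a body in a variable using a quoted heredoc, so the shell expands nothing inside it; the `720p` resolution and `"5"` duration are assumptions carried over from the example input, not recommendations.

```bash
#!/usr/bin/env bash
# Build a 9:16 request body for a short vertical clip. The quoted 'EOF'
# delimiter stops the shell from expanding $, backticks, or backslashes.
# resolution and duration are assumed values copied from the example input.
BODY=$(cat <<'EOF'
{
  "model": "seedance-v1-5-pro-text-to-video",
  "version": "0.0.1",
  "input": {
    "prompt": "A barista pours espresso into a cup with steam rising, cafe chatter and espresso machine hiss, tracking shot from side angle",
    "aspect_ratio": "9:16",
    "resolution": "720p",
    "duration": "5",
    "generate_audio": true
  },
  "webhook_url": ""
}
EOF
)
```

The body can then be sent with the same headers as the API reference example, passing `--data "$BODY"` to curl in place of the inline JSON.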

Marketers producing ads benefit from precise product movement and camera control. With the image-to-video variant, a product image plus a prompt such as "Zoom in on smartphone displaying app demo, executive voiceover in Spanish with lip-sync, professional lighting" can yield a multilingual commercial without a studio shoot.

Developers integrating the seedance-v1.5-pro-text-to-video API into apps can use its complex instruction following for script visualization, generating animatics with dialogue and effects from storyboards to speed up pre-production for films or games.
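One way to turn a storyboard into animatic requests is to keep one shot description per line and emit one request body per shot. A bash sketch under that assumption: the `storyboard_payloads` function and the one-shot-per-line input format are hypothetical, and each body reuses the fields from the API reference example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read one shot description per line on stdin and print
# one JSON request body per shot (JSON Lines). Blank lines are skipped and
# Windows line endings are tolerated. Other fields mirror the curl example.

# Escape backslashes, double quotes, and tabs (lines contain no newlines).
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  s=${s//$'\t'/\\t}
  printf '%s' "$s"
}

# Usage: storyboard_payloads [ASPECT_RATIO] < shots.txt
storyboard_payloads() {
  local aspect=${1:-16:9} line
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line%$'\r'}
    if [[ -z ${line//[[:space:]]/} ]]; then
      continue
    fi
    printf '{"model": "seedance-v1-5-pro-text-to-video", "version": "0.0.1", '
    printf '"input": {"prompt": "%s", "aspect_ratio": "%s", ' "$(json_escape "$line")" "$aspect"
    printf '"resolution": "720p", "duration": "5", "generate_audio": true}, '
    printf '"webhook_url": ""}\n'
  done
}
```

For instance, `storyboard_payloads 16:9 < shots.txt` prints one body per shot, and each line can be posted separately. Keeping shots as separate requests matches the short clip lengths the model is optimized for.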

Filmmakers testing short scenes leverage multilingual lip-sync for diverse casts, turning text scripts into 1080p clips with natural dialects and cinematic framing, accelerating narrative prototyping.
Tips & tricks
How to Use seedance-v1.5-pro-text-to-video on Eachlabs

Access seedance-v1.5-pro-text-to-video on Eachlabs through the Playground for quick testing, the API for production integrations, or the SDK for custom apps. Write text prompts that describe the scene, actions, camera moves, and audio; image references apply to the image-to-video variant. Select the duration (5-12 s), resolution (up to 1080p), and aspect ratio; the output is an MP4 with embedded synchronized audio, typically generated in under a minute.
---
Technical spec
What Sets seedance-v1.5-pro-text-to-video Apart

seedance-v1.5-pro-text-to-video stands out with its native joint audio-video generation using a dual-branch Diffusion-Transformer architecture, processing visuals and sound simultaneously for millisecond-level lip-sync and environmental audio alignment. This enables creators to produce ready-to-use videos without separate dubbing or syncing, saving hours in post-production.

Unlike many competing models, it supports multilingual and dialect-specific lip-sync, adjusting mouth movements and expressions to match the phonetic patterns of languages beyond English. This lets users generate localized content for global markets without separate dubbing, which can substantially reduce localization costs.
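Because the spoken language is specified in the prompt (as the key considerations recommend) rather than through a dedicated request field, a localized batch can be sketched by varying only the language instruction. A bash sketch; the `localized_prompts` name and the sentence template are illustrative assumptions, and output quality for a given language depends on the languages the model actually supports.

```bash
#!/usr/bin/env bash
# Illustrative only: print one prompt per target language by appending a
# spoken-language instruction to a shared scene description. The language
# is stated in the prompt text; no language parameter is assumed.
# Usage: localized_prompts SCENE LANGUAGE...
localized_prompts() {
  local scene=$1 lang
  shift
  for lang in "$@"; do
    printf '%s The presenter speaks in %s with natural lip-sync.\n' "$scene" "$lang"
  done
}
```

For example, `localized_prompts "A presenter in a bright studio greets the viewer." English Spanish Japanese` prints three prompts, each of which can be sent as the `prompt` field of a request like the one in the API reference.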

It handles complex multi-subject prompts and 15+ cinematic camera techniques, such as crane shots and orbits, while maintaining subject consistency. This gives precise control over dynamic scenes in professional ads or narratives, where other models often lose coherence.
- Resolutions: 480p, 720p, 1080p with aspect ratios like 16:9, 9:16, 21:9
- Duration: 5-10 seconds (up to 12 s in some modes); a 5 s 1080p clip takes about 41 s to generate
- Inputs: Text prompts up to 5,000 characters; optional image references for image-to-video
- Outputs: MP4 with embedded synchronized audio (voice, SFX, music)
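The limits above can be checked client-side before spending a request. A bash sketch, assuming the documented values are hard limits: 480p/720p/1080p resolutions, prompts of at most 5,000 characters, and durations of 5-12 seconds (the widest range quoted on this page; the `-1` auto-length value mentioned under key considerations is rejected by this sketch). Aspect ratios are only checked for a `W:H` shape because the listed ratios are examples, not an exhaustive set. `validate_request` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check. Returns 0 when all parameters pass;
# otherwise prints the first problem to stderr and returns 1.
# Usage: validate_request PROMPT ASPECT_RATIO RESOLUTION DURATION
validate_request() {
  local prompt=$1 aspect=$2 resolution=$3 duration=$4
  if [[ -z $prompt ]]; then
    echo "prompt is empty" >&2; return 1
  fi
  # ${#prompt} counts characters in a UTF-8 locale, bytes in the C locale.
  if (( ${#prompt} > 5000 )); then
    echo "prompt exceeds 5,000 characters" >&2; return 1
  fi
  case $resolution in
    480p|720p|1080p) ;;
    *) echo "unsupported resolution: $resolution" >&2; return 1 ;;
  esac
  if [[ ! $aspect =~ ^[0-9]+:[0-9]+$ ]]; then
    echo "aspect ratio must look like W:H, got: $aspect" >&2; return 1
  fi
  # Assumed range 5-12 s; 10# forces base 10 so values like "08" parse.
  if [[ ! $duration =~ ^[0-9]+$ ]] || (( 10#$duration < 5 || 10#$duration > 12 )); then
    echo "duration must be an integer from 5 to 12, got: $duration" >&2; return 1
  fi
  return 0
}
```

Calling `validate_request "$prompt" 16:9 720p 5 && curl ...` skips the request when a parameter is out of range. If the service's real limits differ, only the constants in this function need to change.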
Things to be aware of
- Excels in audio-visual sync for long dialogues and rapid lip movements, outperforming stitched pipelines per benchmarks
- Users note top-tier natural voices, reduced mechanical artifacts, and realistic spatial audio, especially in Chinese dialects
- Strong cinematic understanding supports dramatic storytelling with controlled emotional tone, keeping performances stable enough for professional use
- Resource-efficient: inference optimizations give roughly 10x speedups, which makes it practical for real-world workflows
- Strong in prompt following and visuals, competitive in I2V tasks
- Motion stability improves but may waver in extreme high-intensity scenarios, per evaluations
- Community feedback points to reliable deployment and benchmark-leading performance
Key considerations
- Prioritize prompts with clear dialogue, camera instructions, and audio elements for best synchronization and adherence
- Use high-quality, detailed prompts specifying style, motion, emotions, and languages to leverage multilingual lip-sync strengths
- Avoid extremely high-intensity motion scenarios where stability may degrade
- Use acceleration features to balance quality and speed, but test several iterations for complex narratives
- Where the endpoint supports it, set the duration to -1 so the model chooses a length that fits the narrative's rhythm and completeness
- Common pitfalls include over-specifying conflicting elements; refine prompts iteratively to maintain coherence
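The considerations above amount to stating scene, camera, audio, and dialogue explicitly in the prompt. A bash sketch of a prompt template that does this; the `compose_prompt` name, the part order, and the "Camera:", "Audio:", and "Dialogue:" labels are assumptions, not a documented prompt format.

```bash
#!/usr/bin/env bash
# Hypothetical prompt template: join scene, camera, audio, and dialogue into
# one prompt. Empty parts are skipped, and a trailing period on the first
# three parts is dropped so sentences are not double-terminated.
# Usage: compose_prompt SCENE CAMERA AUDIO DIALOGUE
compose_prompt() {
  local out=""
  [[ -n $1 ]] && out+="${1%.}. "
  [[ -n $2 ]] && out+="Camera: ${2%.}. "
  [[ -n $3 ]] && out+="Audio: ${3%.}. "
  [[ -n $4 ]] && out+="Dialogue: \"$4\" "
  printf '%s\n' "${out% }"
}
```

Keeping each element in its own argument makes it easier to change one aspect at a time while refining a prompt, which helps avoid the conflicting over-specification described above.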
Limitations
- Motion stability can degrade in extremely high-intensity or complex action scenarios
- Less precise character consistency across multiple shots compared to models with reference image support
- Primarily optimized for short clips (4-12 seconds); longer formats may require extending clips beyond a single generation

Related models

- ByteDance Seedance 2.0 Mini · Text to Video (Bytedance)
- Heygen · Avatar V (HeyGEN)
- Kling o3 Standard · Text to Video (Kling)
- Google Gemini Omni Flash · Text to Video (Google)

FAQ

About Seedance V1.5 Pro · Text to Video


What is Seedance v1.5 Pro Text to Video?

Seedance v1.5 Pro Text to Video is an AI video generation model by ByteDance that creates short video clips from natural language prompts. Built on ByteDance's short-form video expertise, it generates high-quality, dynamic scenes with consistent visual style from descriptive text input.
