Example input

seed: 1
prompt: "A cinematic shot of a motorcyclist riding along a coastal road during sunset. The sky is filled with vibrant orange, pink, and purple hues, reflecting on the ocean beside the road. The camera smoothly tracks from the side, capturing the silhouette of the rider against the glowing horizon. The motorcycle moves steadily, wind flowing through the rider’s clothes and helmet reflections catching the sunlight. Waves softly crashing in the background. Warm golden light, dramatic shadows, lens flare, ultra realistic, 4K, shallow depth of field, cinematic color grading, smooth motion, powerful and emotional atmosphere."
quality: "720p"
duration: "10"
resolution: "720p"
aspect_ratio: "16:9"
thinking_type: "auto"
generate_audio_switch: true

PixVerse V6 Text to Video API

Name: PixVerse V6 Text to Video
Brand: PixVerse

PixVerse V6 transforms prompts into high-quality videos with synchronized audio, supporting multiple aspect ratios, single or multi-clip storytelling, and enhanced prompt understanding for more accurate and dynamic results.

API reference

Runtime (p50): 2m
Estimated price: $0.005 / credit

Call the API

prediction.sh

# Submit a text-to-video prediction; EACHLABS_API_KEY must be set in the environment.
curl -X POST \
  -H "X-API-Key: $EACHLABS_API_KEY" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "pixverse-v6-text-to-video",
    "version": "0.0.1",
    "input": {
        "seed": 1,
        "prompt": "A cinematic shot of a motorcyclist riding along a coastal road during sunset. The sky is filled with vibrant orange, pink, and purple hues, reflecting on the ocean beside the road. The camera smoothly tracks from the side, capturing the silhouette of the rider against the glowing horizon. The motorcycle moves steadily, wind flowing through the rider’s clothes and helmet reflections catching the sunlight. Waves softly crashing in the background. Warm golden light, dramatic shadows, lens flare, ultra realistic, 4K, shallow depth of field, cinematic color grading, smooth motion, powerful and emotional atmosphere.",
        "quality": "720p",
        "duration": "10",
        "resolution": "720p",
        "aspect_ratio": "16:9",
        "thinking_type": "auto",
        "generate_audio_switch": true
    },
    "webhook_url": ""
}' \
  https://api.eachlabs.ai/v1/prediction/
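The example call embeds the prompt inside a single-quoted JSON string, so a prompt containing a straight apostrophe or a double quote would break the command or the JSON. The following is a minimal sketch, assuming bash. It builds the same request body with the prompt JSON-escaped and pipes it to curl through --data @- (read the body from stdin). The helper names json_escape, build_payload, and submit_prediction are hypothetical. The escaper covers only backslash, double quote, newline, carriage return, and tab.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): send the same request as prediction.sh,
# but escape the prompt so quotes, apostrophes and newlines are safe.

# Escape a string for use inside a JSON string literal.
# Handles backslash, double quote, newline, CR and tab only.
json_escape() {
  local s=$1 bs='\' q='"'
  s=${s//"$bs"/"$bs$bs"}   # backslash -> \\ (must run first)
  s=${s//"$q"/"$bs$q"}     # double quote -> \"
  s=${s//$'\n'/"${bs}n"}   # newline -> \n
  s=${s//$'\r'/"${bs}r"}   # carriage return -> \r
  s=${s//$'\t'/"${bs}t"}   # tab -> \t
  printf '%s' "$s"
}

# Print the example request body with a caller-supplied prompt,
# duration (seconds, sent as a string) and aspect ratio.
build_payload() {
  local prompt
  prompt=$(json_escape "$1")
  printf '{"model":"pixverse-v6-text-to-video","version":"0.0.1","input":{"seed":1,"prompt":"%s","quality":"720p","duration":"%s","resolution":"720p","aspect_ratio":"%s","thinking_type":"auto","generate_audio_switch":true},"webhook_url":""}\n' \
    "$prompt" "${2:-10}" "${3:-16:9}"
}

# POST the body to the prediction endpoint; curl reads it from stdin.
submit_prediction() {
  build_payload "$@" | curl -sS -X POST \
    -H "X-API-Key: ${EACHLABS_API_KEY:?set EACHLABS_API_KEY first}" \
    -H "Content-Type: application/json" \
    --data @- \
    https://api.eachlabs.ai/v1/prediction/
}
```

Usage: submit_prediction "The rider's silhouette at dusk, camera tracks from the side" 10 16:9. Escaping in the shell avoids a dependency on an external JSON tool, but a tool such as jq is the more robust choice if it is available.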

Documentation

PixVerse | V6 | Text to Video Overview

PixVerse | V6 | Text to Video turns detailed text prompts into videos of up to 1080p resolution and 15 seconds in length, making professional-grade short-form content possible without complex editing. It was developed by PixVerse, a Singapore-based AI video platform founded in 2023. The model stands out for three features: multi-shot storytelling in a single generation pass, natively synchronized audio, and facial emotions that stay consistent across frames. Its main differentiator is a seamless 15-second 1080p video produced in one generation, which eliminates the clip-stitching artifacts common in earlier models. It suits creators who need cinematic output with precise camera control and realistic physics. On each::labs, it powers marketing clips, social media reels, and product demos generated directly from text (or from images, in image-to-video mode). Access it via the PixVerse | V6 | Text to Video API for streamlined integration.
Capabilities
- Generates 15-second 1080p multi-shot videos from a single text prompt with seamless transitions.
- Native audio synchronization including dialogue, background music, and sound effects.
- Precise camera controls: dolly tracking, pans, zooms, and reveal shots with reliable execution.
- Cross-frame facial emotion consistency for characters across scenes.
- Realistic physics simulation for fabrics, fluids, collisions, and object interactions.
- Supports text-to-video and image-to-video modes with strong subject fidelity.
- Extended aspect ratios including 16:9, 9:16, 21:9 for diverse formats.
- Multilingual text overlays and enhanced prompt reasoning for complex narratives.
Use Cases for PixVerse | V6 | Text to Video

Marketers creating product ads: Leverage multi-shot storytelling and native audio for a 10-second demo: "Sleek smartphone rotates on pedestal, camera circles 360, user smiles excitedly unboxing, triumphant music swells." Cross-frame emotion consistency keeps the user's reaction coherent across the reveal shots.

Content creators for social reels: Use 9:16 vertical format with physics accuracy: "Dancer flips through urban street, fabric flows realistically, crowd cheers with synced SFX, fast pans." Perfect for TikTok engagement.

Designers prototyping visuals: Image-to-video mode animates sketches: "Static logo design morphs into animated brand intro, dolly out to full scene, orchestral BGM." Maintains fidelity for quick iterations.

Developers integrating via API: Build apps with PixVerse | V6 | Text to Video API for dynamic trailers: "Epic fantasy hero battles dragon, cross-frame rage to victory, 16:9 cinematic." Scales for personalized user content on each::labs.
Tips and Tricks

For best results with PixVerse | V6 | Text to Video, write prompts that include specific camera actions, emotions, and physics details, which lets the model's precise controls take effect. Use negative prompts to avoid artifacts, a feature carried over from earlier PixVerse versions. Note that the example input on this page does not include a negative-prompt field. To save time and credits, start drafts at 720p and select 1080p only for final renders. For multi-shot flow, structure prompts as "scene 1: [description], camera dolly in; scene 2: [action with emotion continuity]".
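The scene-by-scene prompt structure from the tip above can be assembled mechanically from a list of scene descriptions. The following is a small sketch, assuming bash. multi_shot_prompt is a hypothetical helper. The "; " separator follows the pattern shown in the tip and is not a documented parsing rule of the model.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join scene descriptions into a
# "scene 1: ...; scene 2: ..." multi-shot prompt.
multi_shot_prompt() {
  local i=1 out="" scene
  for scene in "$@"; do
    if [ -n "$out" ]; then
      out+="; "            # separator between scenes, as in the tip
    fi
    out+="scene $i: $scene"
    i=$((i + 1))
  done
  printf '%s' "$out"
}
```

Keeping each scene as a separate argument makes it easy to reorder shots or swap a single scene while iterating on a draft.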

Example prompts:
- "A confident entrepreneur pitches in a modern office, dolly zoom on smiling face, cross-frame excitement building to product reveal, upbeat BGM, 16:9."
- "Serene ocean waves crash on rocks at sunset, slow pan right with fluid physics, seagulls calling overhead, 9:16 vertical."
- "Cartoon fox chases butterfly through forest, jumping collisions realistic, joyful expressions consistent, multi-clip narrative, 1:1."
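In image-to-video mode, supply reference images to keep characters consistent. Choose aspect ratios for your target social platforms early, and set them through the aspect_ratio input (as in the example call) rather than relying only on a ratio written into the prompt text.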
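The following is a sketch of choosing the aspect_ratio value by destination, assuming bash. The platform names and the mapping are illustrative conventions, not API values. They follow the use cases above: 9:16 vertical for TikTok-style reels and 16:9 for cinematic video. aspect_for_platform is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a destination to one of the supported
# aspect_ratio values. The mapping is a convention, not part of the API.
aspect_for_platform() {
  local p
  p=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')   # case-insensitive
  case $p in
    tiktok|reels|shorts) echo "9:16" ;;    # vertical social video
    youtube|web|cinematic) echo "16:9" ;;  # landscape
    square) echo "1:1" ;;
    ultrawide) echo "21:9" ;;
    *) echo "unknown platform: $1" >&2; return 1 ;;
  esac
}
```

The returned value can be passed as the aspect ratio when building the request body. Keeping the ratio out of the prompt text avoids mismatches between what the prompt says and what the request sets.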
Technical Specifications
- Resolution: 360p to 1080p, supporting high-definition single-pass generation.
- Max Duration: 15 seconds for multi-shot videos; standard options include 5, 8, and 10 seconds.
- Aspect Ratios: 16:9, 9:16, 1:1, 3:4, 4:3, 21:9, and others for platform-optimized outputs.
- Input Formats: Text prompts for text-to-video; images for image-to-video with subject fidelity.
- Output: MP4 videos with native audio (dialogue, BGM, SFX), multilingual text overlays.
- Processing Time: Varies by resolution and duration. 1080p clips can generate in under a minute on optimized platforms, though the median (p50) runtime listed for this endpoint is about 2 minutes.
- Release Date: March 30, 2026.
These specs enable enterprise-grade quality with enhanced semantic understanding and physics simulation.
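Every generation costs credits, so it can help to reject out-of-range settings before sending a request. The following is a minimal sketch, assuming bash. It checks only the ranges stated above: whole-second durations up to 15 seconds and resolutions from 360p to 1080p. The exact values the endpoint accepts are not listed here, so treat check_spec (a hypothetical helper) as a pre-flight filter rather than a validator.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check against the stated spec ranges.
# Returns 0 if the values look valid, 1 otherwise (message on stderr).
check_spec() {
  local duration=$1 resolution=$2 height
  case $duration in
    ''|*[!0-9]*) echo "duration must be whole seconds: $duration" >&2; return 1 ;;
  esac
  if [ "$duration" -lt 1 ] || [ "$duration" -gt 15 ]; then
    echo "duration must be 1-15 seconds: $duration" >&2; return 1
  fi
  height=${resolution%p}            # "720p" -> "720"
  case $height in
    ''|*[!0-9]*) echo "resolution must look like 720p: $resolution" >&2; return 1 ;;
  esac
  if [ "$height" -lt 360 ] || [ "$height" -gt 1080 ]; then
    echo "resolution must be 360p-1080p: $resolution" >&2; return 1
  fi
}
```

Usage: check_spec 10 720p && submit the request. The standard durations are 5, 8, and 10 seconds, so values between those may still be rejected by the API even if they pass this check.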
Things to Be Aware Of

PixVerse | V6 | Text to Video may struggle with highly complex multi-character interactions that go beyond the multi-character handling refined in V4.5, leading to minor inconsistencies. Edge cases such as extreme weather physics or rapid cuts can still introduce subtle artifacts despite these improvements. Users often overlook negative prompts, which lets unwanted elements slip in, so always specify exclusions. 1080p output costs more credits and processing time, so monitor your quotas on each::labs. Vague prompts yield generic output; detailed scene breakdowns are essential for precise camera and emotion control. Test on shorter durations first and refine before committing to full 15-second renders.
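The draft-first advice can be captured as two presets. The following is a sketch, assuming bash. Draft renders use 720p, a 5-second duration, and audio turned off (the Limitations section notes that audio add-ons increase cost). Final renders use 1080p and a chosen duration, with audio turned on. The "1080p" value for quality and resolution is an assumption based on the 1080p maximum in the specifications, because the example input only shows "720p". render_settings is a hypothetical helper that prints the matching fields of the "input" object.

```shell
#!/usr/bin/env bash
# Hypothetical presets: print the render-related fields of the "input"
# object for a draft or a final render. "1080p" is an assumed value.
render_settings() {
  local stage=$1 duration=${2:-10}
  case $stage in
    draft)  # cheap and fast: 720p, 5 s, no audio
      printf '"quality":"720p","resolution":"720p","duration":"5","generate_audio_switch":false' ;;
    final)  # full quality: 1080p, caller-chosen duration, audio on
      printf '"quality":"1080p","resolution":"1080p","duration":"%s","generate_audio_switch":true' "$duration" ;;
    *) echo "stage must be draft or final" >&2; return 1 ;;
  esac
}
```

If a draft is meant to check audio sync, switch audio on in the draft preset too, at the cost of a pricier draft.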
Key Considerations

Before using PixVerse | V6 | Text to Video, make sure your prompts are detailed enough to drive multi-shot narratives and camera movements. The model excels at short-form content such as ads and reels, but credit cost can scale with resolution; for example, a 5-second 1080p clip costs 75 credits. It is best for work that needs fast turnaround, consistent character emotions, and synchronized audio rather than long edited pieces; choose an alternative for videos longer than 15 seconds. On each::labs, factor API rate limits into high-volume production. The main prerequisite is a clear creative vision, since the model responds to descriptive input rather than vague ideas. To control cost, draft at lower resolutions first.
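At the $0.005/credit price listed for this endpoint, the 75-credit figure works out to 75 × $0.005 = $0.375 per 5-second 1080p clip. The following is a small calculator sketch, assuming bash and awk. estimate_usd is a hypothetical helper. It does not model how credits scale with resolution, duration, or audio, so you supply the per-clip credit count yourself.

```shell
#!/usr/bin/env bash
# Hypothetical cost estimate: credits per clip x number of clips x price.
PRICE_PER_CREDIT=0.005   # USD, as listed on this page

estimate_usd() {
  local credits_per_clip=$1 clips=${2:-1}
  awk -v c="$credits_per_clip" -v n="$clips" -v p="$PRICE_PER_CREDIT" \
    'BEGIN { printf "%.3f\n", c * n * p }'
}
```

For example, estimate_usd 75 4 estimates a batch of four 5-second 1080p clips.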
Limitations

PixVerse | V6 | Text to Video caps output at 15 seconds, so longer narratives require stitching clips together externally. Audio add-ons increase cost, and drafts rendered below 720p show noticeably lower quality. Complex physics in crowded scenes may not match fully manual VFX. Output is limited to the supported aspect ratios; custom ratios are unavailable. Generation is not real-time; the separate R1 model handles real-time generation. In image-to-video mode, input images must align with the prompt for the best fidelity.
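For narratives longer than 15 seconds, one option is to generate several clips and join them. The following is a sketch, assuming bash and ffmpeg's concat demuxer. It computes how many generations of at most 15 seconds a target length needs, then joins finished clips without re-encoding. The helper names are hypothetical and file names are assumed to contain no single quotes. Stream copy (-c copy) works only when all clips share codecs and encoding settings. Seams between separately generated clips may be visible, which is what single-pass generation avoids.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for splitting a long piece into <=15 s generations
# and joining the results with ffmpeg's concat demuxer.
MAX_CLIP_SECONDS=15

# Number of generations needed for a total length (ceiling division).
clips_needed() {
  echo $(( ($1 + MAX_CLIP_SECONDS - 1) / MAX_CLIP_SECONDS ))
}

# Print a concat list. Paths are made absolute because ffmpeg resolves
# relative entries against the list file's own directory.
concat_list() {
  local f
  for f in "$@"; do
    case $f in
      /*) ;;              # already absolute
      *) f=$PWD/$f ;;     # relative to the current directory
    esac
    printf "file '%s'\n" "$f"
  done
}

# Join clips into OUT without re-encoding: stitch_clips OUT CLIP...
stitch_clips() {
  local out=$1 list status
  shift
  list=$(mktemp) || return 1
  concat_list "$@" > "$list"
  ffmpeg -f concat -safe 0 -i "$list" -c copy "$out"
  status=$?
  rm -f "$list"
  return "$status"
}
```

Usage: a 40-second piece needs clips_needed 40 generations (three), which can then be joined with stitch_clips story.mp4 part1.mp4 part2.mp4 part3.mp4.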