LTX
LTX-Video is the first DiT-based video generation model that can generate high-quality videos in real time
Avg Run Time: 21.000s
Model Slug: ltx-video
Playground
Input
Enter a URL or upload a file from your computer. Accepted formats: image/jpeg, image/png, image/jpg, image/webp (max 50 MB).
Output
Example Result
Preview and download your result.
API & SDK
Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
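As a minimal sketch, building that POST request in Python could look like the following. The endpoint path, auth header name, and payload shape here are assumptions for illustration; check the Eachlabs API reference for the exact values.

```python
import json
import urllib.request

# NOTE: the base URL and auth header name are assumptions for
# illustration -- consult the Eachlabs API reference for exact values.
API_BASE = "https://api.eachlabs.ai/v1"

def build_prediction_request(api_key: str, prompt: str, **inputs) -> urllib.request.Request:
    """Build a POST request that creates a new ltx-video prediction."""
    body = json.dumps({
        "model": "ltx-video",                # the model slug from this page
        "input": {"prompt": prompt, **inputs},
    }).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/predictions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,            # assumed header name
        },
        method="POST",
    )

# Sending the request returns JSON containing the prediction ID, e.g.:
# resp = urllib.request.urlopen(build_prediction_request(key, "A barista..."))
```

The prediction ID in the response is what you pass to the result-polling step.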
Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready: repeat the request until it returns a success status.
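The polling loop itself can be sketched as below, where `fetch_status` stands in for whatever GET call your client makes for a prediction ID. Statuses other than "success" are assumed names, not confirmed by this page; check the API reference.

```python
import time

def poll_prediction(fetch_status, prediction_id, interval_s=2.0, timeout_s=300.0):
    """Repeatedly fetch a prediction until it reaches a terminal status.

    `fetch_status` is any callable returning the prediction JSON as a
    dict, e.g. a wrapper around GET /predictions/{id}. The "failed" and
    "canceled" statuses are assumptions for illustration.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = fetch_status(prediction_id)
        if result.get("status") in ("success", "failed", "canceled"):
            return result
        time.sleep(interval_s)  # wait between checks
    raise TimeoutError(f"prediction {prediction_id} did not finish in {timeout_s}s")
```

Passing `fetch_status` as a callable keeps the loop independent of any particular HTTP client, and makes it easy to test without network access.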
Readme
Overview
ltx-video — Text to Video AI Model
Developed by Lightricks as part of the ltx family, ltx-video is a pioneering DiT-based text-to-video model that generates high-quality videos with synchronized audio in real time, eliminating the need for separate audio post-production. This first-of-its-kind model produces videos of up to 20 seconds at 1080p by default, or at up to 4K resolution, complete with lip-synced dialogue, ambient sound, and music, all from a single text prompt or reference image.
Unlike traditional silent video generators, ltx-video uses an asymmetric dual-stream architecture with bidirectional cross-attention so that audio and visuals align naturally, letting creators and developers build fast text-to-video workflows such as marketing clips or app prototypes.
Technical Specifications
What Sets ltx-video Apart
ltx-video stands out in the competitive text-to-video landscape through its native synchronized audio-video generation: a 19-billion-parameter model (14B for video, 5B for audio) creates clips with accurate lip sync and sound timing in a single pass, so users can produce voiced scenes without manual editing.
This enables rapid prototyping for AI video generator API integrations, cutting production time from hours to seconds on consumer GPUs like the NVIDIA RTX 4090.
- Real-time 4K output with 50 fps support: Generates cinematic videos up to 20 seconds at true 4K/50 fps, outperforming smaller models in detail and motion consistency for professional workflows.
- Image-to-video with style preservation: Accepts a reference image and preserves its lighting, composition, and style while adding motion and audio, ideal for extending static assets into dynamic content.
- Quantized efficiency for consumer hardware: FP8 quantization reduces size by 30% and doubles speed without quality loss, supporting high-res generation on 32GB+ VRAM setups.
- Advanced controls like depth, pose, and LoRAs: IC-LoRAs for precise motion guidance and fine-tuning in under an hour, enabling custom styles or video extensions.
Key Considerations
Resource Requirements: Higher values for parameters like steps or larger target sizes (e.g., 1024px) require more computational resources. Adjust these values based on available capacity.
Aspect Ratio and Target Size: Selecting mismatched aspect ratios and target sizes may lead to visual distortions or cropping issues.
Seed Value: Using a fixed seed ensures repeatability. Changing the seed generates diverse outputs for experimentation.
Tips & Tricks
How to Use ltx-video on Eachlabs
Access ltx-video seamlessly through Eachlabs Playground for instant testing with text prompts, optional reference images, duration up to 20 seconds, and resolution settings from 1080p to 4K. Integrate via API or SDK for production apps, specifying parameters like frame rate (15-50 fps), audio sync, and LoRA controls to output MP4 videos with embedded synchronized sound in seconds.
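For instance, an input payload for a short clip could look like the sketch below. The parameter names are illustrative, derived from the options described above rather than from the model's confirmed schema; verify them against the input schema shown in the Playground.

```python
# Illustrative input payload -- parameter names are assumptions based on
# the options described above, not the model's confirmed schema.
example_inputs = {
    "prompt": ("A barista pouring espresso into a white cup in slow "
               "motion, with soft cafe jazz playing"),
    "duration": 10,         # seconds; the model supports up to 20
    "fps": 25,              # frame rate within the documented 15-50 range
    "resolution": "1080p",  # 1080p by default, up to 4K
    "seed": 42,             # fixed seed -> repeatable output
    "negative_prompt": "low resolution, noisy",
}
```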
Capabilities
High-Quality Video Generation with LTX-Video: Create visually appealing videos based on detailed prompts.
Customizability: Tailor outputs to specific artistic needs using an extensive range of inputs.
Consistency Across Frames: Ensures smooth transitions and coherence within video sequences.
What Can I Use It For?
Use Cases for ltx-video
Content creators producing social media reels can input a text prompt like "A barista pouring espresso into a white cup in slow motion, with steaming foam rising and soft cafe jazz playing," generating a 10-second 1080p clip with synced pouring sounds and music in seconds—perfect for quick viral content without audio libraries.
Marketers building text-to-video AI campaigns feed product images plus prompts for dynamic ads, such as placing a smartphone on a futuristic desk with glowing effects and explanatory voiceover, maintaining brand style while adding motion and narration for e-commerce promotions.
Developers integrating ltx-video API into apps use image-to-video mode with pose or depth controls to create avatar videos from photos, ensuring consistent identity and lip-synced speech for personalized user experiences in virtual assistants or training tools.
Filmmakers extending footage leverage sequence conditioning and LoRAs for seamless video prolongation, inputting initial frames to generate longer scenes with matching motion, lighting, and ambient audio, streamlining pre-visualization for studio pipelines.
Things to Be Aware Of
Experiment with Seeds:
- Generate diverse results by modifying the seed value while keeping other inputs constant.
Adjusting Cfg and Steps:
- Balance between creativity and fidelity by experimenting with cfg values and steps.
Combining Prompts and Images:
- Use both textual prompts and reference images for more nuanced outputs.
Negative Prompts:
- Improve output quality by excluding undesirable features like "low resolution" or "noisy."
Optimizing Aspect Ratio and Target Size:
- For cinematic content, combine 16:9 with 1024px. For social media stories, use 9:16 and 512px.
Limitations
Processing Time for LTX-Video: Higher parameter values can significantly increase the time required to generate outputs. Plan accordingly for complex projects.
Detail Complexity: Excessively complex prompts may overwhelm the model, leading to inconsistent results.
Aspect Ratio Compatibility: Outputs may appear distorted if aspect ratio and target size are mismatched.
Output Format: MP4
Pricing
Pricing Detail
This model runs at a cost of $0.001080 per second.
The average execution time is 21 seconds, but this may vary depending on your input data.
The average cost per run is $0.022680
Pricing Type: Execution Time
Cost Per Second means the total cost is calculated based on how long the model runs. Instead of paying a fixed fee per run, you are charged for every second the model is actively processing. This pricing method provides flexibility, especially for models with variable execution times, because you only pay for the actual time used.
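The arithmetic behind the figures above is simply the per-second rate multiplied by execution time, as this short sketch shows:

```python
COST_PER_SECOND = 0.001080  # rate quoted above, in USD
AVG_RUN_TIME_S = 21         # average execution time quoted above

def estimate_cost(run_time_s: float) -> float:
    """Estimated charge in USD for one run under execution-time pricing."""
    return round(COST_PER_SECOND * run_time_s, 6)

# A 21-second average run therefore costs about $0.022680.
```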
Related AI Models
You can seamlessly integrate advanced AI capabilities into your applications without the hassle of managing complex infrastructure.
