lightricks/ltx
Lightricks' open video model family, focused on accessibility and speed in video creation.
ltx by Lightricks — AI Model Family
The ltx family from Lightricks is an open-source collection of AI models for video generation, with a strong emphasis on synchronized audio-video output for accessible, rapid content creation. Developed by Lightricks, the company behind apps like Facetune and Videoleap, ltx models tackle the challenge of producing high-quality multimodal video efficiently, letting creators generate dynamic clips from text, images, or control signals without complex setups. The family centers on the flagship LTX-2 model and related variants like LTX-Video, and spans text-to-video, image-to-video, and conditioned generation modes such as pose-to-video, depth-to-video, and canny-to-video, all unified under a single Diffusion Transformer (DiT) architecture with 19 billion parameters.
LTX-2 stands as the core of the family, supporting seamless audiovisual generation in one framework, while LTX-Video provides foundational text-to-video capabilities. These models prioritize practical video workflows, from quick prototypes to professional edits, making them ideal for individual creators, marketers, and teams scaling video production.
ltx Capabilities and Use Cases
The ltx family excels in versatile video generation modes, all featuring native synchronized audio, expressive lip sync, natural motion, and efficient performance. Key models include LTX-2 for multimodal inputs and LTX-Video focused on text-to-video, with distilled variants for faster inference.
- Text-to-Video (LTX-2 and LTX-Video): Generate complete videos with audio from descriptive prompts. Supports up to 4K resolution at 50fps in Fast, Pro, and Ultra modes. Use case: marketing teams creating social media ads. Example prompt: "A bustling city street at dusk with neon lights reflecting on wet pavement, people walking with umbrellas, accompanied by upbeat electronic music and distant traffic sounds."
- Image-to-Video (LTX-2): Animate static images into videos with synced audio, preserving details like lip sync for talking heads. Ideal for product demos or storyboarding extensions. Distilled versions optimize speed without quality loss.
- Pose-to-Video, Depth-to-Video, and Canny-to-Video (LTX-2): Conditioned generation using pose guidance, depth maps, or edge detection for precise control over motion, structure, and spatial awareness. Perfect for character animation or scene reconstruction.
These models integrate flexibly into pipelines: for instance, start with text-to-video in LTX-Video to create a base clip, then use LTX-2's image-to-video to extend it with keyframes or poses for refined motion. Technical constraints include width and height divisible by 32 and frame counts of the form 8n + 1 (one more than a multiple of 8); the models are also fully trainable, so custom LoRAs enable tailored adaptations. Workflows run in ComfyUI via dedicated nodes or in Hugging Face Diffusers for programmatic access, supporting creative applications like personalized videos, campaign assets, and training content.
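The dimension constraints above (width/height divisible by 32, frame counts of the form 8n + 1) can be checked, or snapped to the nearest valid values, with a small helper before a generation call. This is an illustrative sketch, not part of any LTX SDK; the function name is made up for the example.

```python
def snap_ltx_dims(width: int, height: int, num_frames: int) -> tuple[int, int, int]:
    """Round inputs to the nearest values LTX accepts:
    width and height divisible by 32, frame count of the form 8n + 1."""
    w = max(32, round(width / 32) * 32)
    h = max(32, round(height / 32) * 32)
    # frames must satisfy f % 8 == 1; snap to the nearest 8n + 1 (minimum 9)
    f = max(9, round((num_frames - 1) / 8) * 8 + 1)
    return w, h, f

# A 720p request at 120 frames snaps to a valid configuration:
print(snap_ltx_dims(1280, 720, 120))  # -> (1280, 704, 121)
```

Validating (or snapping) dimensions up front avoids cryptic shape errors deep inside the pipeline, which is why most ComfyUI node packs perform a similar check on their inputs.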
What Makes ltx Stand Out
ltx distinguishes itself through its unified DiT architecture, which generates synchronized video and audio in a single pass, ensuring temporal coherence and reducing artifacts common in separate modality models. With 19 billion parameters, LTX-2 delivers cinematic quality—expressive lip sync, dynamic natural motion, and high fidelity up to 4K/50fps—while maintaining speed via distilled variants for real-time use.
Key strengths include exceptional controllability, with pose, depth, and Canny edge inputs for structural consistency; multimodal flexibility across text, image, and control inputs; and efficiency suited to practical deployment. Unlike fragmented toolchains, ltx supports seamless pipeline creation, such as combining image-to-video with LoRAs for stylized effects like "squish" animations. It is fully open-source under a community license, trainable for customization, and optimized for consistent motion and audio.
This family suits content creators, indie filmmakers, marketers, and enterprises needing scalable video production, from solo artists prototyping ideas to teams generating personalized campaigns or explainer videos with version control and rapid iterations.
Access ltx Models via each::labs API
each::labs is the premier platform for integrating the full ltx family, offering unified API access to LTX-2, LTX-Video, and all variants through a single endpoint. Deploy text-to-video, image-to-video, or conditioned modes in your apps, with a Playground for interactive testing and SDKs for Python and JavaScript workflows.
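As a rough sketch of what a programmatic call might look like, the snippet below builds a JSON payload for a text-to-video request. The field names and model identifier here are assumptions for illustration only; consult the each::labs API documentation for the actual schema and endpoint.

```python
import json

def build_ltx_request(prompt: str, width: int = 1280, height: int = 704,
                      num_frames: int = 121, mode: str = "text-to-video") -> str:
    """Build a request body for an LTX generation call.
    NOTE: field names and the model id are hypothetical, not the
    documented each::labs schema."""
    # basic sanity checks mirroring the published LTX constraints
    assert width % 32 == 0 and height % 32 == 0, "width/height must be divisible by 32"
    assert num_frames % 8 == 1, "frame count must be of the form 8n + 1"
    payload = {
        "model": "ltx-2",          # assumed model identifier
        "mode": mode,
        "prompt": prompt,
        "width": width,
        "height": height,
        "num_frames": num_frames,
    }
    return json.dumps(payload)

body = build_ltx_request("A bustling city street at dusk with neon lights")
print(body)
```

Keeping the LTX dimension checks in the request builder means invalid inputs fail locally, before a paid API call is made.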
Build production-grade pipelines on eachlabs.ai, leveraging ltx's speed and quality without infrastructure hassles. Sign up to explore the full ltx model family on each::labs.