MOCHI
Mochi 1 preview is an open, state-of-the-art video generation model that shows high-fidelity motion and strong prompt adherence in preliminary evaluations.
Avg Run Time: 261.000s
Model Slug: mochi-1
Playground
API & SDK
Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
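As a sketch, the create step can look like the snippet below. The endpoint URL, the `X-API-Key` header, the `model`/`input` body layout, and the `id` response field are illustrative assumptions, not the confirmed Eachlabs schema; check the API reference for the exact request shape.

```python
import json
import urllib.request

API_URL = "https://api.eachlabs.ai/v1/prediction"  # assumed endpoint


def build_prediction_request(prompt, fps=30, num_frames=162, seed=None):
    """Assemble the JSON body for a create-prediction call.

    The parameter keys here are illustrative assumptions.
    """
    inputs = {"prompt": prompt, "fps": fps, "num_frames": num_frames}
    if seed is not None:
        inputs["seed"] = seed
    return {"model": "mochi-1", "input": inputs}


def create_prediction(api_key, prompt, **params):
    """POST the request and return the new prediction's ID (stdlib only)."""
    body = json.dumps(build_prediction_request(prompt, **params)).encode()
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "X-API-Key": api_key,  # header name is an assumption
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["id"]  # response field name is an assumption
```

Keeping the body builder separate from the HTTP call makes the request shape easy to inspect and test without touching the network.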
Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
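The polling loop can be sketched as below. The `status` vocabulary (`"success"`, `"error"`) is an assumption about the API; `fetch_status` is any callable that takes a prediction ID and returns the prediction JSON as a dict, so the loop itself stays independent of the HTTP client.

```python
import time


def wait_for_result(prediction_id, fetch_status, poll_interval=5.0,
                    max_polls=120):
    """Poll until a prediction finishes or the poll budget runs out.

    The "success"/"error" status values are assumptions about the
    API's vocabulary -- check the Eachlabs API reference.
    """
    for _ in range(max_polls):
        data = fetch_status(prediction_id)
        status = data.get("status")
        if status == "success":
            return data  # expected to carry the MP4 output URL
        if status == "error":
            raise RuntimeError(f"prediction {prediction_id} failed: {data}")
        time.sleep(poll_interval)
    raise TimeoutError(f"prediction {prediction_id} did not finish in time")
```

In practice `fetch_status` would wrap an authenticated GET against the prediction-result endpoint; injecting it also makes the loop trivial to test with a stub.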
Readme
Overview
mochi-1 — Text to Video AI Model
Developed by Genmo as part of the mochi family, mochi-1 is a state-of-the-art open-source text-to-video AI model that generates high-fidelity short videos with exceptional prompt adherence and smooth motion dynamics. This 10-billion-parameter diffusion model, powered by the Asymmetric Diffusion Transformer (AsymmDiT) architecture, excels in producing photorealistic clips up to 5.4 seconds at 30 fps and 480p resolution (640x480), making it ideal for creators seeking precise Genmo text-to-video outputs without proprietary limitations. Whether you're exploring text-to-video AI model options or building custom workflows, mochi-1 stands out for its customizable fine-tuning on personal videos, bridging open-source accessibility with professional-grade results.
Technical Specifications
What Sets mochi-1 Apart
mochi-1 differentiates itself through its AsymmDiT architecture and AsymmVAE compression (128:1 ratio), enabling efficient high-fidelity video synthesis that outperforms many closed models in motion consistency during bold camera moves like pans and orbits. This allows developers and creators to generate steady, shape-preserving videos quickly via command line or Gradio UI, ideal for mochi-1 API integrations in production pipelines. Its intuitive fine-tuning process supports training on user videos, delivering tailored outputs for unique styles that generic models can't match. Additionally, mochi-1 produces 162-frame clips at 480p with strong prompt precision, supporting artistic filters and nuanced motion for expressive storytelling in advertising or concept art.
- AsymmDiT for top-tier efficiency: Handles complex dynamics like steady camera tracks, enabling seamless text-to-video AI model generation without shaking or distortion.
- Custom fine-tuning: Train on your own videos for personalized high-fidelity results, perfect for bespoke creative experiments.
- Prompt adherence and compression: Ensures accurate, fast outputs at 30 fps up to 5.4 seconds, with Apache 2.0 openness for full customization.
Key Considerations
- Frame Count Limitations: Mochi-1 supports a range of 30-170 frames. Exceeding these limits may result in errors or degraded performance.
- Frame Rate (FPS): Set between 10-60 FPS for smooth playback. Higher FPS values require additional computational power.
- Guidance Scale: Ranges from 1 to 10, controlling the adherence to the textual prompt. Extreme values may reduce output quality.
- Prompt Strength: Adjustable between 0 and 1; controls the influence of an image-based prompt relative to the text prompt.
- Seed Consistency: The seed value determines output reproducibility. Keep it consistent for identical results across runs.
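A quick client-side check against the ranges above can catch bad inputs before a paid run. This is a minimal sketch; the parameter names are illustrative and may differ from the hosted API's keys.

```python
def validate_inputs(num_frames, fps, guidance_scale, prompt_strength):
    """Raise ValueError if any value falls outside the documented range."""
    checks = [
        ("num_frames", num_frames, 30, 170),
        ("fps", fps, 10, 60),
        ("guidance_scale", guidance_scale, 1, 10),
        ("prompt_strength", prompt_strength, 0.0, 1.0),
    ]
    for name, value, lo, hi in checks:
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} is outside [{lo}, {hi}]")
```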
Tips & Tricks
How to Use mochi-1 on Eachlabs
Access mochi-1 seamlessly through Eachlabs' Playground for instant text-to-video generation—enter a detailed prompt, adjust styles or motion via sliders, and generate 480p clips up to 5.4 seconds at 30 fps. Integrate via the mochi-1 API or SDK for scalable apps, supporting fine-tuning inputs and outputs in MP4 format with high prompt fidelity. Eachlabs delivers fast, customizable access to this Genmo powerhouse without setup hassles.
Capabilities
- Text-to-Video: Mochi-1 converts descriptive text into high-quality video clips.
- Customizable Parameters: Provides extensive control over frame count, prompt strength, FPS, and more.
- Reproducibility: Seed control enables consistent outputs for the same configuration.
- Dynamic Visuals: Smooth transitions and coherent sequences.
What Can I Use It For?
Use Cases for mochi-1
Content creators can leverage mochi-1's artistic filters and motion details to animate static images into dynamic sequences, such as converting a product photo into a looping ad clip with smooth pans—saving hours on manual editing for social media campaigns.
Developers building Genmo text-to-video apps fine-tune mochi-1 on branded footage to generate consistent video assets, like "a sleek smartphone orbiting on a futuristic desk with neon glows," ensuring prompt-aligned outputs for e-commerce demos without external tools.
Marketers use its camera motion handling for professional shorts, inputting prompts like "a dancing peacock in a neon jungle with steady zoom-in," to craft vibrant, watermark-free visuals for ads that maintain object integrity across frames.
Designers experiment with its 480p 5.4-second clips for storyboarding, fine-tuning on reference videos to produce nuanced, expressive animations tailored to concept art needs in advertising workflows.
Things to Be Aware Of
- Creative Storytelling: Use vivid and imaginative prompts to craft compelling narratives.
- Dynamic Compositions: Experiment with various FPS and frame counts to suit different styles.
- Prompt Strength Balance: Adjust the image and text prompt strengths for hybrid inspirations.
- Reproducibility: Use a fixed seed to iterate on a consistent baseline.
Limitations
- Prompt Sensitivity: Ambiguous or overly complex prompts may result in inconsistent outputs.
- Balance Challenge: Finding the ideal parameter configuration may require multiple iterations.
- Output Consistency: While seeds ensure reproducibility, varying parameter combinations may lead to unexpected results.
Output Format: MP4
Pricing
Pricing Detail
This model runs at a cost of $0.001677 per second.
The average execution time is 261 seconds, but this may vary depending on your input data.
The average cost per run is $0.437827.
Pricing Type: Execution Time
Cost Per Second means the total cost is calculated based on how long the model runs. Instead of paying a fixed fee per run, you are charged for every second the model is actively processing. This pricing method provides flexibility, especially for models with variable execution times, because you only pay for the actual time used.
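Estimating a run's cost is a single multiplication. Note that the listed per-second rate is rounded, so the product below differs from the quoted average of $0.437827 by a fraction of a cent.

```python
COST_PER_SECOND = 0.001677  # USD, listed (rounded) rate for mochi-1


def estimated_cost(run_seconds):
    """Estimated charge for one run under execution-time pricing."""
    return run_seconds * COST_PER_SECOND


# At the 261 s average: 261 * 0.001677 = ~$0.4377 per run.
```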
Related AI Models
You can seamlessly integrate advanced AI capabilities into your applications without the hassle of managing complex infrastructure.
