I2V
Images turn into clear, consistent videos with Minimax Hailuo I2V-01, delivering stable and reliable results.
Official Partner
Avg Run Time: 240.000s
Model Slug: minimax-i2v-01
Playground
Input
Enter an image URL or choose a file from your computer (max 50 MB).
API & SDK
Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
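A minimal sketch of the create request, using only the Python standard library. The endpoint URL, the `X-API-Key` header name, and the JSON layout (`model`, `input`) are assumptions made for illustration and should be checked against the API reference. The input names `first_frame_image`, `prompt`, and `prompt_optimizer` come from the Tips & Tricks section of this page.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the URL from the official API reference.
API_URL = "https://api.example.com/v1/prediction/"


def build_create_request(api_key, image_url, prompt, prompt_optimizer=True):
    """Build (but do not send) a POST request that creates a prediction."""
    payload = {
        "model": "minimax-i2v-01",  # model slug listed on this page
        "input": {
            "first_frame_image": image_url,
            "prompt": prompt,
            "prompt_optimizer": prompt_optimizer,
        },
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,  # header name is an assumption
        },
        method="POST",
    )


def create_prediction(request):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Building the request separately from sending it makes it easy to inspect the payload before spending a paid execution.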
Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The result is not pushed to you, so repeat the request at intervals until the response reports a success status.
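The polling loop can be sketched as follows. `fetch_status` is a hypothetical callable that performs the GET request and returns the parsed response as a dict; the `status` key and the `"success"` and `"error"` values are assumptions about the response shape. The default timeout of 600 seconds is likewise an assumption, chosen to leave room above the 240-second average run time.

```python
import time


def wait_for_result(fetch_status, prediction_id, interval=5.0, timeout=600.0,
                    sleep=time.sleep):
    """Call fetch_status(prediction_id) until it reports success or failure.

    fetch_status is any callable returning a dict with a "status" key.
    The "success" and "error" values are assumed, not confirmed by the docs.
    """
    waited = 0.0
    while True:
        result = fetch_status(prediction_id)
        status = result.get("status")
        if status == "success":
            return result
        if status == "error":
            raise RuntimeError(f"prediction {prediction_id} failed: {result}")
        if waited >= timeout:
            raise TimeoutError(
                f"prediction {prediction_id} not ready after {timeout}s"
            )
        sleep(interval)
        waited += interval
```

Passing `sleep` as a parameter keeps the loop testable without real delays.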
Readme
Overview
Minimax Hailuo I2V-01 is a text-and-image-driven video generation model. It synthesizes short video clips based on an input prompt and a reference image, with the goal of creating realistic or stylized animated scenes. The process starts from a static image, which is animated according to the semantic intent of the text prompt. This allows for dynamic, visually engaging outputs that match creative or narrative goals defined by the user.
Technical Specifications
Input Combination: Combines a static image with a text prompt to guide motion and scene dynamics
Motion Interpretation: Visual motion is inferred based on both the prompt and visual elements of the input image
Visual Consistency: Preserves key features of the input image, including composition, characters, colors, and style
Temporal Coherence: Generates smooth and stable frame transitions across the video
Rendering Style: Naturalistic or stylized depending on the input image and the phrasing of the prompt
Image Size Handling: Automatically resizes and centers images to match expected dimensions while preserving key content
Key Considerations
Image Content Quality
The better the image clarity and composition, the more stable and visually appealing the output becomes.
Prompt-Image Alignment
Minimax Hailuo I2V-01 performs best when the image and prompt are semantically aligned. For example, an image of a person should be paired with prompts that describe their motion, mood, or interaction with the environment.
Motion Simplicity
Avoid overly complex or contradictory motion descriptions. Stick to short descriptions like "walking through a field" or "looking up at the sky" for optimal motion rendering.
Repetitive Elements
Do not use redundant phrasing (e.g., “a boy running running running fast fast fast”), as this causes unstable animations.
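One way to catch this before submitting is a small local check for words repeated back to back. This is an illustrative helper, not part of the model or its API, and the function name is hypothetical.

```python
import re


def repeated_words(prompt):
    """Return words that appear twice or more in a row, in order of appearance."""
    words = re.findall(r"[a-z']+", prompt.lower())
    repeats = []
    for previous, current in zip(words, words[1:]):
        if previous == current and current not in repeats:
            repeats.append(current)
    return repeats


print(repeated_words("a boy running running running fast fast fast"))
# prints ['running', 'fast']
```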
Use of prompt_optimizer
Enable it when using short prompts or general phrases. Disable it if the prompt is already detailed or custom-stylized.
Legal Information for Minimax Hailuo I2V-01
By using Minimax Hailuo I2V-01, you agree to:
Minimax: Privacy Policy
Minimax: Terms of Service
Tips & Tricks
prompt
- Recommended word count: 10–20 words
- Include verbs and motion cues: such as "runs", "spins", "jumps", "waves", "floats", etc.
- Avoid abstract concepts: Use specific and visualizable scenarios like "a girl dances in the rain" rather than "freedom in nature".
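The word-count and motion-cue tips above can be turned into a rough pre-flight check. This is a sketch under stated assumptions: the helper name is hypothetical, and the motion-cue set contains only the five example verbs from the list, so a prompt with a different verb (such as "dances") is flagged until you extend the set.

```python
# Example verbs from the tip above; extend this set for real use.
MOTION_CUES = {"runs", "spins", "jumps", "waves", "floats"}


def check_prompt(prompt, min_words=10, max_words=20):
    """Return a list of warnings for a prompt; an empty list means no issues found."""
    warnings = []
    words = prompt.lower().split()
    if not min_words <= len(words) <= max_words:
        warnings.append(
            f"{len(words)} words; recommended range is {min_words}-{max_words}"
        )
    stripped = {word.strip(".,!?\"'") for word in words}
    if not MOTION_CUES & stripped:
        warnings.append("no motion cue from the example list found")
    return warnings
```

For example, `check_prompt("freedom in nature")` returns two warnings, one for length and one for the missing motion cue.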
first_frame_image
- Use clear and focused images as the starting point.
- Background should be consistent with the prompt if the animation includes environmental motion (e.g., wind, walking).
- Recommended format: .png or .jpg
- Ideal resolution: around 512x512 or 768x768. Avoid extreme crops.
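The image recommendations above can be checked locally before upload. In this sketch the helper name, the 512-pixel lower bound, and the 2:1 aspect-ratio limit for an "extreme crop" are all assumptions, since the page gives no exact thresholds. The caller supplies the width and height, so no image library is required.

```python
import os

# .jpeg is treated as the same format as .jpg.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def check_first_frame(filename, width, height, max_aspect_ratio=2.0):
    """Return warnings about the format, size, and shape of a first-frame image."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    warnings = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        warnings.append(f"format {extension or '(none)'} is not .png or .jpg")
    if min(width, height) < 512:  # assumed lower bound
        warnings.append("shorter side is below 512 px")
    if max(width, height) / min(width, height) > max_aspect_ratio:
        warnings.append("aspect ratio suggests an extreme crop")
    return warnings
```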
prompt_optimizer
- True (enabled): Use with generic prompts like "a man walking in a forest". Minimax Hailuo I2V-01 automatically enhances and expands the motion semantics for better animation.
- False (disabled): Use with precise, manually tuned prompts to preserve the exact input phrasing and structure.
Guideline:
- If unsure, start with prompt_optimizer = true and compare with false to see which aligns best with your use case.
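To run that comparison, it helps to generate two inputs that differ only in the flag. The helper below is a hypothetical sketch; it only builds the input dicts, and how they are submitted depends on the API. Each variant is a separate paid execution.

```python
def optimizer_variants(image_url, prompt):
    """Return two input dicts that differ only in prompt_optimizer."""
    base = {"first_frame_image": image_url, "prompt": prompt}
    return [{**base, "prompt_optimizer": flag} for flag in (True, False)]


variants = optimizer_variants(
    "https://example.com/portrait.png",  # placeholder URL
    "a woman blinking and looking around as her hair flows in the wind",
)
```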
Capabilities
Generates short, coherent video clips based on static images and text.
Can animate natural movements such as walking, waving, looking around, etc.
Preserves visual identity and style of the input image throughout the video.
Capable of generating realistic, stylized, or cinematic mini-scenes depending on the input.
What Can I Use It For?
Creating animated visual stories from illustrations or portraits
Generating character motion samples for creative content
Making short cinematic loops from digital artworks
Adding motion to still AI-generated portraits or scenes
Enhancing storytelling in digital or multimedia presentations
Things to Be Aware Of
Upload a landscape photo and use a prompt like:
"a deer slowly walking through the misty forest"
Upload a stylized portrait and use a prompt like:
"a woman blinking and looking around as her hair flows in the wind"
Try prompt variations with movement direction:
"walking toward the camera", "looking to the left", "turning around slowly"
Limitations
Cannot handle long or multi-scene narratives.
Sound, dialogue, or multi-character interaction is not supported.
May produce blurry motion if the prompt is vague or the input image is unclear.
Does not support video input or frame-by-frame animation control.
Limited understanding of complex physics or abstract choreography.
Output Format: MP4
Pricing
Pricing Detail
This model runs at a cost of $0.43 per execution.
Pricing Type: Fixed
The price is a set amount per execution and does not vary with how long the run takes. This makes budgeting simple and predictable because you pay the same fee every time you execute the model.
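Because the price is fixed, a budget estimate is a single multiplication. The sketch below uses `Decimal` to avoid floating-point rounding in currency amounts; the function name is illustrative.

```python
from decimal import Decimal

PRICE_PER_RUN = Decimal("0.43")  # USD per execution, from the pricing detail above


def total_cost(runs):
    """Return the total cost in USD for a number of executions."""
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return PRICE_PER_RUN * runs


print(total_cost(100))  # prints 43.00
```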
