
WAN-V2.2

Transforms static images into dynamic short videos with natural movement and sharp detail.

Avg Run Time: 70.000s

Model Slug: wan-v2-2-a14b-image-to-video

Playground

Input

Enter a URL or choose a file from your computer.

Advanced Controls

Output

Example Result

Preview and download your result.


API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
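As a rough illustration, the sketch below submits a prediction with Python's requests library. The base URL, the X-API-Key header, and the request and response field names are assumptions here, so confirm them against the Eachlabs API reference before relying on them.

```python
import requests

# The base URL, API-key header, and request/response field names below are
# assumptions for illustration; confirm them against the Eachlabs API reference.
BASE_URL = "https://api.eachlabs.ai/v1"
API_KEY = "YOUR_EACHLABS_API_KEY"

def create_prediction(image_url: str, prompt: str = "",
                      resolution: str = "720p", duration: str = "5") -> str:
    """Submit one image-to-video job and return its prediction ID."""
    payload = {
        "model": "wan-v2-2-a14b-image-to-video",
        "input": {
            "image": image_url,        # assumed input field names
            "prompt": prompt,          # optional motion description
            "resolution": resolution,  # "720p" or "1080p"
            "duration": duration,      # "5", "10", or "15" seconds
        },
    }
    resp = requests.post(f"{BASE_URL}/prediction/", json=payload,
                         headers={"X-API-Key": API_KEY})
    resp.raise_for_status()
    return resp.json()["predictionID"]  # assumed response field name
```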

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
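Continuing the sketch above, a simple polling helper might look like the following; the result endpoint path, status strings, and output field are again assumptions rather than the documented schema.

```python
import time
import requests

def wait_for_result(prediction_id: str, api_key: str,
                    interval: float = 3.0, timeout: float = 600.0) -> dict:
    """Poll the prediction until it succeeds, fails, or the timeout elapses."""
    url = f"https://api.eachlabs.ai/v1/prediction/{prediction_id}"  # assumed path
    deadline = time.time() + timeout
    while time.time() < deadline:
        resp = requests.get(url, headers={"X-API-Key": api_key})
        resp.raise_for_status()
        data = resp.json()
        status = data.get("status")
        if status == "success":
            return data  # e.g. data["output"] holding the video URL (assumed)
        if status in ("error", "failed", "canceled"):
            raise RuntimeError(f"Prediction ended with status: {status}")
        time.sleep(interval)  # video generation typically takes a minute or more
    raise TimeoutError("Prediction did not finish before the timeout")
```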

Readme

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview

wan-v2.2-a14b-image-to-video — Image-to-Video AI Model

Developed by Alibaba as part of the wan-v2.2 family, wan-v2.2-a14b-image-to-video transforms static images into dynamic short videos with natural movement and sharp detail. This image-to-video AI model solves a critical creative challenge: converting single product photos, portraits, or concept art into compelling video content without manual frame-by-frame animation or expensive video production workflows. The model excels at generating cinematic-quality motion from a single image input, making it ideal for creators and developers building AI video generation tools that demand both speed and visual fidelity.

The wan-v2.2 architecture introduces a Mixture-of-Experts (MoE) design that splits the diffusion denoising process across specialized pathways, dramatically increasing model capacity without raising inference costs. This efficiency gain means faster processing on consumer hardware while maintaining the aesthetic precision that distinguishes professional video output from generic AI synthesis.
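The routing idea is easier to see in a toy sketch. The snippet below is not the wan-v2.2 implementation; it only illustrates, under the assumption of a simple timestep-based switch, how two expert denoisers can share one sampling loop while only one expert's weights are active per step.

```python
# Toy sketch of timestep-routed Mixture-of-Experts denoising. This is NOT the
# wan-v2.2 code; it only shows how two experts can share one sampling loop
# while a single expert's weights are active at each step.

def high_noise_expert(latent, t):
    # Assumed specialist for early, high-noise steps (global layout and motion).
    return [x * 0.9 for x in latent]

def low_noise_expert(latent, t):
    # Assumed specialist for late, low-noise steps (detail refinement).
    return [x * 0.99 for x in latent]

def denoise(latent, num_steps=50, switch_step=25):
    """Run a toy reverse-diffusion loop, routing each step to one expert."""
    for t in reversed(range(num_steps)):
        expert = high_noise_expert if t >= switch_step else low_noise_expert
        latent = expert(latent, t)  # capacity doubles, per-step cost does not
    return latent

print(denoise([1.0, 0.5, -0.3]))
```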

Technical Specifications

What Sets wan-v2.2-a14b-image-to-video Apart

Aesthetic-Driven Cinematic Generation: Unlike generic image-to-video models, wan-v2.2-a14b-image-to-video harnesses meticulously labeled aesthetic data covering lighting, composition, contrast, and color tone. This enables precise control over cinematic-style motion, allowing users to generate videos that maintain professional visual consistency rather than producing jarring or unnatural movement artifacts.

Massive Training Scale: The model was trained on 65% more images and 83% more videos than its predecessor, delivering superior motion, semantic, and aesthetic generalization. This extensive training translates to better handling of diverse image types—from product photography to fine art—without quality degradation across different visual styles.

Consumer GPU Accessibility: The compact TI2V-5B variant runs at 720p/24 fps on consumer-grade GPUs such as the RTX 4090, with a 16×16×4 compression ratio VAE. This democratizes high-quality image-to-video generation, enabling developers and creators to deploy the model locally without expensive cloud infrastructure, while the full A14B checkpoint delivers enhanced quality for production workflows.

Technical Specifications:

  • Resolution: 720p and 1080p output support
  • Duration: 5s, 10s, or 15s video generation
  • Input: Static images with optional text prompts for motion direction
  • Output: MP4 video with audio synchronization capability

Key Considerations

  • High VRAM requirement: The 14B model needs at least 20GB of GPU memory, making it suitable only for high-end hardware.
  • Generation time: Video synthesis can take over an hour on powerful GPUs, so plan for longer processing times compared to smaller models.
  • Quality vs. speed: The 14B model offers higher quality but is slower; a 5B variant is faster and less resource-intensive but produces slightly lower quality output.
  • Prompt engineering: Describing desired motion and camera movements in the prompt can influence results, but precise control is not guaranteed.
  • Best practices: Keep ComfyUI (or your chosen interface) updated, ensure all required model files are correctly installed, and use high-quality source images for best results.
  • Common pitfalls: Users report inconsistent motion, occasional artifacts, and limited control over specific camera angles or object movements.
  • Iterative refinement: Multiple generations with adjusted prompts or parameters may be needed to achieve desired results.

Tips & Tricks

How to Use wan-v2.2-a14b-image-to-video on Eachlabs

Access wan-v2.2-a14b-image-to-video through Eachlabs via the Playground for interactive testing or the API for production integration. Provide a static image and optional text prompt describing desired motion, select your output resolution (720p or 1080p) and duration (5s, 10s, or 15s), and the model generates synchronized video output ready for immediate use. The API supports batch processing, making it efficient for high-volume content creation workflows.
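For batch work, one straightforward pattern is to loop over source images and reuse the create/poll helpers sketched in the API section above; the image URLs, prompt, and output field below are placeholders.

```python
# Simple batch loop reusing the create_prediction / wait_for_result helpers
# sketched in the API section above; URLs, prompt, and fields are placeholders.
images = [
    "https://example.com/frames/hero-shot.png",
    "https://example.com/frames/product-angle.png",
]

for url in images:
    pred_id = create_prediction(image_url=url,
                                prompt="slow cinematic push-in",
                                resolution="1080p", duration="10")
    result = wait_for_result(pred_id, api_key=API_KEY)
    print(url, "->", result.get("output"))  # assumed field with the video URL
```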

Capabilities

  • Transforms static images into short, dynamic videos with natural-looking motion and sharp detail.
  • Supports both image-to-video and text-to-video generation, offering flexibility in creative workflows.
  • Delivers fluid camera movements and object motion, though with some variability in control.
  • Maintains good temporal consistency and reduces flickering compared to baseline models.
  • Open-source and customizable, suitable for research and professional applications.
  • Efficient MoE architecture allows for high parameter counts without proportionally increasing inference cost.
  • Integrates well with existing AI video tools and pipelines for enhanced video editing and generation.

What Can I Use It For?

Use Cases for wan-v2.2-a14b-image-to-video

E-Commerce Product Visualization: Marketing teams can feed product photos plus a text prompt like "rotate the watch slowly under studio lighting, showing the dial and band details" to generate short product videos for social media and storefronts. This eliminates expensive product photography sessions and enables rapid A/B testing of different visual angles and lighting scenarios without reshooting.

Content Creator Workflow Acceleration: Designers and video editors working on short-form content can use wan-v2.2-a14b-image-to-video to convert static concept art, storyboard frames, or mood boards into animated sequences. The model's cinematic motion generation preserves artistic intent while automating the tedious keyframing process, reducing production time from hours to minutes.

Developers Building AI Video Platforms: Engineers integrating image-to-video capabilities into their applications benefit from the model's efficient MoE architecture and consumer GPU compatibility. Developers can deploy wan-v2.2-a14b-image-to-video as a core feature in AI image editor APIs or video generation platforms without requiring enterprise-grade GPU clusters, lowering infrastructure costs while maintaining professional output quality.

Portrait and Character Animation: Content creators can transform headshots or character illustrations into subtle animated videos with natural head movement and expression shifts. A prompt like "gentle head turn left, soft smile, natural eye movement" generates realistic motion that brings static portraits to life for streaming, presentations, or social media content.

Things to Be Aware Of

  • The model is resource-intensive, requiring high-end GPUs and significant VRAM for the 14B variant.
  • Generation times are long compared to smaller models, which may limit real-time or batch applications.
  • Motion and camera control are influenced by prompts but not fully deterministic; results can vary.
  • Output frame rate is typically 16 fps, which is lower than some commercial alternatives.
  • The model is open-source, offering flexibility but also requiring more setup and maintenance than turnkey solutions.
  • Users report that the model handles complex scenes and textures well, but may struggle with very fine details or highly specific motions.
  • Positive feedback highlights the natural-looking motion and sharp detail in outputs, especially compared to earlier models.
  • Some users note occasional artifacts or inconsistencies, particularly in longer generations or with less optimal prompts.
  • The community values the model’s adaptability and the ability to integrate it into custom workflows.

Limitations

  • High computational and memory requirements limit accessibility for users without powerful hardware.
  • Limited control over precise motion and camera angles; results can be somewhat unpredictable.
  • Output is generally restricted to short clips at 720p resolution, with a frame rate lower than some commercial alternatives.
  • The model may produce artifacts or inconsistencies, especially in complex or ambiguous scenes.
  • Not optimized for real-time or interactive applications due to long generation times.
  • While open-source and flexible, it requires technical expertise to deploy and tune effectively.