
WAN-V2.2

Wan 2.2 A14B Text to Video Turbo transforms plain text descriptions into dynamic short videos. It creates realistic motion and cinematic visuals directly from text prompts.

Avg Run Time: 60.000s

Model Slug: wan-v2-2-a14b-text-to-video-turbo


API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
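As a concrete sketch of this step in Python, the snippet below assumes a JSON body carrying the model slug plus an `input` object, authenticated with an `X-API-Key` header; the endpoint path and response field names are assumptions, so verify them against the Eachlabs API reference.

```python
import requests

API_KEY = "your-api-key"  # from your Eachlabs dashboard
BASE_URL = "https://api.eachlabs.ai/v1"  # assumed base URL

# Create a prediction: POST the model slug and its inputs.
resp = requests.post(
    f"{BASE_URL}/prediction/",
    headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
    json={
        "model": "wan-v2-2-a14b-text-to-video-turbo",
        "input": {
            "prompt": "A cartoon fox dancing in a forest clearing at dusk",
            "resolution": "720p",
        },
    },
)
resp.raise_for_status()
prediction_id = resp.json()["predictionID"]  # response field name is an assumption
print("created prediction:", prediction_id)
```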

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
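Continuing from the snippet above, a matching polling loop might look like this; the status strings and the `output` field are assumptions inferred from the description, not confirmed API names.

```python
import time

# Poll until the prediction settles. Long-polling here means
# re-checking the endpoint until a terminal status comes back.
while True:
    result = requests.get(
        f"{BASE_URL}/prediction/{prediction_id}",
        headers={"X-API-Key": API_KEY},
    ).json()
    status = result.get("status")
    if status == "success":
        print("video URL:", result.get("output"))  # assumed output field
        break
    if status == "error":
        raise RuntimeError(result.get("error", "prediction failed"))
    time.sleep(2)  # back off between checks to avoid hammering the API
```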

Readme

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview

wan-v2-2-a14b-text-to-video-turbo — Text to Video AI Model

Developed by Alibaba as part of the wan-v2.2 family, wan-v2-2-a14b-text-to-video-turbo transforms plain text prompts into dynamic short videos with realistic motion and cinematic visuals, letting creators produce high-quality clips without complex setups. This turbo variant of the A14B model (14B active parameters) stands out for its optimized speed and efficiency, delivering film-grade output for rapid prototyping in text-to-video workflows. It suits developers who need fast text-to-video inference for short-form content such as social media reels and ads.

Technical Specifications

What Sets wan-v2-2-a14b-text-to-video-turbo Apart

wan-v2-2-a14b-text-to-video-turbo excels at generating short videos from text with fine-grained motion control via AdaIN and cross-attention mechanisms, producing precise actions and environments while maintaining cinematic quality. It supports multi-format subjects, including real people, cartoons, and animals in portrait or full-body views, at resolutions up to 1024x1024, and posts strong fidelity scores (e.g., FID 15.66).

  • Turbo-optimized A14B architecture with fp8_scaled weights keeps peak VRAM utilization manageable (roughly 83-89% on a single GPU) and reaches generation times as low as 138 seconds with 4-step LoRA acceleration, making wan-v2-2-a14b-text-to-video-turbo API integrations practical in resource-constrained environments.
  • Lightning LoRA support cuts sampling to 4 steps with minimal quality loss, enabling fast, iterative text-to-video workflows.
  • High-resolution output (512x512 to 1024x1024) and multi-resolution adaptability for diverse scenarios, from mobile clips to professional edits.

Key Considerations

  • Memory optimization is crucial for consumer-hardware deployment, requiring careful use of model offloading and dtype conversion options (see the sketch after this list)
  • The model performs best with detailed, descriptive prompts that specify visual elements, motion, and scene composition
  • Generation time varies significantly based on hardware configuration, with single consumer GPUs requiring approximately 9 minutes for 5-second 720P videos
  • Multi-GPU setups can dramatically reduce inference time through distributed processing techniques
  • Prompt extension features are available but may be disabled for faster inference when not needed
  • The model benefits from warm-up phases before achieving optimal performance metrics
  • FlashAttention3 optimization is specifically available for Hopper architecture GPUs
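To illustrate the offloading and dtype options above for self-hosted runs, here is a minimal sketch using a Diffusers-style pipeline; `WanPipeline` ships in recent Diffusers releases, but the checkpoint id, frame count, and step count below are assumptions to adapt to your setup.

```python
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

# bf16 halves weight memory versus fp32; the checkpoint id is an assumption.
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.2-T2V-A14B-Diffusers", torch_dtype=torch.bfloat16
)
# Offload idle submodules to CPU so peak VRAM stays within consumer limits.
pipe.enable_model_cpu_offload()

frames = pipe(
    prompt="A cartoon fox dancing in a forest clearing at dusk",
    num_frames=81,           # roughly 5 s at 16 fps; adjust to taste
    num_inference_steps=40,  # drop to 4 when a Lightning LoRA is loaded
).frames[0]
export_to_video(frames, "fox.mp4", fps=16)
```

The same pattern extends to multi-GPU setups via Diffusers' device-mapping options, though distributed parameters need per-cluster tuning as noted above.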

Tips & Tricks

How to Use wan-v2-2-a14b-text-to-video-turbo on Eachlabs

Access wan-v2-2-a14b-text-to-video-turbo seamlessly on Eachlabs via the Playground for instant text-to-video testing, API for scalable integrations, or SDK for custom apps. Provide a detailed text prompt, optional resolution (up to 1024x1024), and duration settings; the model outputs high-quality MP4 videos with realistic motion in turbo timeframes. Eachlabs delivers optimized inference with fp8 or bf16 variants for your workflow needs.
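To make the input shape concrete, a Playground or API call might pass something like the payload below; the parameter names mirror the description above but are assumptions rather than a confirmed schema.

```python
# Hypothetical input payload for wan-v2-2-a14b-text-to-video-turbo;
# verify exact parameter names against the model's input schema.
payload = {
    "prompt": (
        "A lone sailboat gliding across a glassy lake at sunrise, "
        "slow aerial pull-back, soft golden light, gentle ripples"
    ),
    "resolution": "720p",  # lower tiers are cheaper; see Pricing Rules
    "duration": 5,         # seconds; see Limitations for the current cap
}
```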

---

Capabilities

  • Generates high-definition 720P videos at professional 24fps frame rates
  • Supports both text-to-video and image-to-video generation in a unified framework
  • Produces realistic motion and cinematic visuals from textual descriptions
  • Handles complex scene compositions with multiple objects and characters
  • Maintains temporal consistency across video frames
  • Supports various aspect ratios and resolution configurations
  • Achieves superior performance compared to leading commercial models on benchmark evaluations
  • Enables efficient deployment on consumer-grade hardware through optimization techniques
  • Provides flexible inference options for different computational budgets
  • Supports distributed processing for enterprise-scale applications

What Can I Use It For?

Use Cases for wan-v2-2-a14b-text-to-video-turbo

Content creators can use wan-v2-2-a14b-text-to-video-turbo to produce short musical performance videos from text prompts describing scenes with synchronized expressions and body movements, streamlining production for YouTube Shorts or TikTok series. For example, input a prompt like "A cartoon fox dancing energetically in a forest clearing at dusk, with dynamic camera pans and rustling leaves" to generate a coherent short clip with natural motion.

Marketers building Alibaba text-to-video campaigns can leverage its multi-format support to animate product visuals across styles, such as turning a static shoe description into a full-body runway-walk video while maintaining brand consistency, without manual animation. Developers integrating the wan-v2-2-a14b-text-to-video-turbo API into apps can build custom video tools for e-commerce, generating personalized promo clips, such as dynamic unboxings, from user text in about a minute.

Filmmakers experiment with its enhanced motion control for pre-visualization, crafting multi-shot sequences with precise environmental actions from text, ideal for storyboarding complex narratives efficiently.

Things to Be Aware Of

  • The model requires significant computational resources, with 80GB VRAM recommended for optimal single-GPU performance
  • Generation times can be substantial on consumer hardware, requiring patience for high-quality outputs
  • Memory optimization techniques may impact generation quality and should be tested for specific use cases
  • The model performs best with well-structured, detailed prompts rather than simple or vague descriptions
  • Multi-GPU setups require careful configuration of distributed processing parameters
  • Performance varies significantly across different GPU architectures and memory configurations
  • The model may exhibit inconsistencies in complex scenes with multiple moving elements
  • Users report excellent results for cinematic and artistic content generation
  • Community feedback indicates strong performance for creative applications
  • Some users note learning curve requirements for optimal prompt engineering

Limitations

  • Requires substantial computational resources with minimum 80GB VRAM for optimal performance without memory optimization techniques
  • Limited to 5-second video duration, which may not be sufficient for longer-form content applications
  • Generation times on consumer hardware can be prohibitively long for real-time or interactive applications

Pricing

Pricing Type: Dynamic

720p: $0.10 per video

Pricing Rules

Resolution  Price per video
720p        $0.10
580p        $0.075
480p        $0.05