
WAN-2.1

Wan is a video generation model developed by Tongyi Lab of Alibaba Group that generates 5-second videos at 480p resolution.

Avg Run Time: 25.000s

Model Slug: wan-2-1-1-3b


Each execution costs $0.2200. With $1 you can run this model about 4 times.

API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
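A minimal sketch of the request in Python, using only the standard library. The endpoint URL, header name, version field, and response key below are illustrative assumptions, not confirmed API names; check the Eachlabs API reference for the exact request shape.

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"
CREATE_URL = "https://api.eachlabs.ai/v1/prediction/"  # assumed endpoint

def build_request(prompt: str) -> dict:
    """Assemble the JSON body for a wan-2-1-1-3b prediction request."""
    return {
        "model": "wan-2-1-1-3b",
        "input": {"prompt": prompt},
    }

def create_prediction(prompt: str) -> str:
    """POST the request body and return the prediction ID from the response."""
    body = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        CREATE_URL,
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": API_KEY},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["predictionID"]  # assumed response key
```

The returned ID is what you pass to the result endpoint in the next step.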

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
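The polling loop itself is independent of the transport, so it can be sketched with the status fetcher passed in as a callable. The `"success"`/`"error"` status strings are assumptions based on the description above.

```python
import time

def poll_until_done(fetch_status, interval_s=2.0, timeout_s=120.0):
    """Repeatedly call fetch_status() until it reports a terminal status.

    fetch_status is any callable returning a dict with a "status" key,
    e.g. a GET on the prediction endpoint with your prediction ID.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = fetch_status()
        if result.get("status") in ("success", "error"):  # assumed status values
            return result
        time.sleep(interval_s)  # back off between checks
    raise TimeoutError("prediction did not finish in time")
```

With an average run time around 25 seconds, a 2-second interval and a timeout comfortably above the average are reasonable defaults.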

Readme

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview

wan-2.1-1.3b — Text to Video AI Model

Developed by Alibaba's Tongyi Lab, wan-2.1-1.3b is a text-to-video AI model designed to generate short-form video content from natural language prompts. The model transforms text descriptions into 5-second videos at 480p resolution, enabling creators and developers to produce video content without requiring traditional filming or editing workflows. As part of the wan-2.1 family, this 1.3 billion parameter model balances generation quality with computational efficiency, making it accessible for integration into applications and creative workflows that demand fast text-to-video generation.

The primary strength of wan-2.1-1.3b lies in its efficiency-to-quality ratio. Unlike larger text-to-video models that demand substantial computational resources, wan-2.1-1.3b delivers consistent video generation with a smaller parameter footprint, reducing latency and infrastructure costs for developers building AI video generation features into their platforms.

Technical Specifications

What Sets wan-2.1-1.3b Apart

Optimized for Efficiency: The 1.3 billion parameter architecture of wan-2.1-1.3b is engineered for rapid inference without sacrificing output coherence. This makes it ideal for applications requiring real-time or near-real-time text-to-video generation, where larger models would introduce unacceptable latency.

Standardized Output Format: wan-2.1-1.3b generates videos at 480p resolution with a fixed 5-second duration, providing predictable output specifications for developers integrating the model into production systems. This consistency simplifies pipeline design and quality assurance for teams building AI video generation APIs or creative tools.

Text-Driven Video Creation: The model accepts natural language prompts as input, allowing users to describe scenes, actions, and visual styles without requiring reference images or complex parameter tuning. This accessibility makes wan-2.1-1.3b suitable for non-technical creators exploring AI-assisted video production.

Technical Specifications: Maximum output duration is 5 seconds at 480p resolution. The model processes text prompts and generates video frames sequentially, with generation speed dependent on available hardware resources. wan-2.1-1.3b is compatible with standard video formats and integrates with the Eachlabs platform for both interactive playground testing and programmatic API access.

Key Considerations

  • Higher frame counts improve fluidity but require more processing.
  • More sample steps lead to higher quality but increase generation time.
  • Changing aspect ratio affects the composition and framing of the video.
  • Sample shift modifies motion dynamics; careful tuning is recommended.
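The trade-offs above suggest keeping two parameter presets: a fast draft for iterating on prompts and a slower, higher-quality pass for the final render. The field names (`frame_num`, `sample_steps`, `aspect_ratio`, `sample_shift`) are assumptions based on the controls described here, not confirmed API names.

```python
# Illustrative parameter presets; field names are assumed, not official.
fast_draft = {
    "prompt": "A steaming coffee cup on a sunlit desk",
    "frame_num": 49,        # fewer frames: faster, less fluid motion
    "sample_steps": 20,     # fewer steps: quicker, lower quality
    "aspect_ratio": "16:9",
    "sample_shift": 3.0,    # motion dynamics; tune carefully
}

final_render = dict(
    fast_draft,
    frame_num=81,           # more frames: smoother motion, more processing
    sample_steps=40,        # more steps: higher quality, longer generation
)
```

Iterating on the prompt with the draft preset and switching presets only at the end keeps total generation time down.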

Tips & Tricks

How to Use wan-2.1-1.3b on Eachlabs

Access wan-2.1-1.3b through Eachlabs via the interactive Playground or programmatic API. Provide a text prompt describing your desired video content, and the model generates a 5-second 480p video output. The Playground interface allows real-time experimentation with different prompts and parameters, while the API enables seamless integration into production applications. Outputs are delivered in standard video formats ready for immediate use or further editing.

Capabilities

  • Wan 2.1-1.3B generates short AI-driven videos from textual descriptions.
  • Supports adjustable resolution, aspect ratio, and motion settings.
  • Provides control over video dynamics through sampling and guiding parameters.

What Can I Use It For?

Use Cases for wan-2.1-1.3b

Social Media Content Creators: Creators producing short-form content for TikTok, Instagram Reels, or YouTube Shorts can use wan-2.1-1.3b to rapidly prototype video ideas from text descriptions. A creator might input a prompt like "A minimalist desk workspace with a steaming coffee cup, soft morning light through a window, subtle keyboard typing sounds" to generate a 5-second establishing shot for a productivity-focused video without filming.

E-Commerce Product Visualization: Marketing teams can generate product demonstration videos by describing scenarios in text. Rather than scheduling studio shoots, a team can prompt wan-2.1-1.3b with descriptions like "A sleek wireless headphone rotating slowly on a white surface with soft studio lighting" to create quick product preview videos for listings and advertisements.

Developers Building Video Generation Features: Software engineers integrating text-to-video capabilities into their applications benefit from wan-2.1-1.3b's efficient inference and predictable output specifications. The model's smaller parameter count reduces server costs compared to larger alternatives, making it economically viable for platforms offering AI video generation to end users.

Storyboard and Concept Visualization: Filmmakers and animators can use wan-2.1-1.3b to quickly visualize narrative concepts or scene descriptions before committing to full production. The model enables rapid iteration on visual ideas, helping teams validate creative directions early in the development process.

Things to Be Aware Of

  • Social Media Content: Create quick video clips for TikTok, Instagram, and YouTube Shorts.
  • Concept Visualization: Turn written ideas into moving images for storytelling or marketing.
  • Artistic Experiments: Explore creative possibilities with AI-generated motion.

Limitations

  • Variability in Output: Small changes in parameters can lead to significantly different results.
  • Motion Artifacts: Some animations may appear unnatural, requiring careful tuning of sample shift and frame count.

Output Format: MP4

Pricing

Pricing Detail

This model runs at a cost of $0.22 per execution.

Pricing Type: Fixed

The cost is a set, fixed amount per run: it does not vary with your inputs or with how long the generation takes. This makes budgeting simple and predictable because you pay the same fee every time you execute the model.
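Because the price is fixed per execution, budgeting reduces to a single division, as in this small sketch:

```python
# Fixed-price budgeting: every run costs the same, so affordable runs
# is just the budget divided by the per-run cost, rounded down.
COST_PER_RUN = 0.22  # USD, from the pricing detail above

def runs_within_budget(budget_usd: float) -> int:
    """Number of whole executions affordable on a given budget."""
    return int(budget_usd // COST_PER_RUN)
```

For a $1 budget this gives 4 runs, matching the estimate quoted above.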