Runway | Act-Two
Runway Act-Two turns performance videos into realistic character animations by transferring gestures and expressions.
Avg Run Time: 200.000s
Model Slug: runway-act-two
Category: Image to Video
Input
- Driving performance video: upload a file (max 50MB) or enter a URL
- Character reference (image or video): upload a file (max 50MB) or enter a URL
Output
The animated result video, available to preview and download.
Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
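The exact host, endpoint path, and field names depend on the platform serving the model and are not given on this page, so the Python sketch below is a minimal example under assumed conventions: a generic REST prediction endpoint, a bearer-token API key read from an environment variable, and hypothetical `driving_video` / `character_reference` input fields alongside the `runway-act-two` model slug.

```python
import os
import requests

# Hypothetical endpoint and schema; substitute the host and field names
# documented by your provider's API reference.
API_URL = "https://api.example.com/v1/predictions"
API_KEY = os.environ["EXAMPLE_API_KEY"]  # assumed environment variable

payload = {
    "model": "runway-act-two",  # model slug from this page
    "input": {
        # Driving performance video and character reference; either can be
        # a URL or a previously uploaded file (max 50MB each).
        "driving_video": "https://example.com/performance.mp4",
        "character_reference": "https://example.com/character.png",
    },
}

resp = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()
prediction_id = resp.json()["id"]  # assumed response field
print("Prediction created:", prediction_id)
```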
Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
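Continuing the same assumptions as the creation sketch (generic endpoint, hypothetical field and status names), a simple polling loop might look like this:

```python
import time
import requests

# Poll the (hypothetical) prediction endpoint until a terminal status is
# returned; prediction_id and API_KEY come from the creation sketch above.
POLL_URL = f"https://api.example.com/v1/predictions/{prediction_id}"
headers = {"Authorization": f"Bearer {API_KEY}"}

while True:
    result = requests.get(POLL_URL, headers=headers, timeout=30).json()
    status = result.get("status")  # assumed status field
    if status == "success":
        print("Output video:", result.get("output"))  # assumed output field
        break
    if status in ("failed", "canceled"):
        raise RuntimeError(f"Prediction ended with status: {status}")
    # With an average run time of roughly 200s, a few seconds between polls
    # is a reasonable default; back off further for batch workloads.
    time.sleep(5)
```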
Overview
Runway Act-Two is an advanced AI model developed by Runway that transforms short performance videos into highly realistic character animations. It transfers gestures, facial expressions, and body movements from a source video (the "driving performance") to a target character provided as an image or video. The model is part of Runway's next-generation suite of AI video tools and is positioned as an evolution of their earlier Act models, offering significant improvements in expressive fidelity and animation realism.
Key features of Act-Two include full-body performance transfer, detailed facial and hand gesture mapping, and the ability to animate both static images and reference videos as target characters. The model is engineered to democratize high-fidelity animation, making it accessible for creators, animators, and professionals who need to generate animated character performances without traditional motion capture setups. Act-Two is notable for its ability to add plausible environmental motion when animating from a single image, helping to avoid unnatural "floating" effects.
The underlying technology leverages advanced deep learning architectures for pose estimation, facial expression transfer, and motion synthesis. Act-Two is tightly integrated with Runway's Gen-4 video toolset, supporting a range of aspect ratios and resolutions, and is accessible via API for automated workflows. Its unique combination of expressive fidelity, flexibility in character input, and ease of use distinguishes it from other AI animation tools on the market.
Technical Specifications
- Architecture: Advanced deep learning model for pose, gesture, and expression transfer (specific architecture details not publicly disclosed)
- Parameters: Not publicly specified
- Resolution: Supports 1280×720 (16:9), 720×1280 (9:16), 960×960 (1:1), and other preset aspect ratios (collected in the sketch after this list)
- Input/Output formats: Inputs are a short performance video clip (the driving performance) and a character reference (image or video); outputs are animated video sequences at 24 FPS, auto-cropped to match the selected aspect ratio
- Performance metrics: Optimized for short clips (3–30 seconds); best results when source and target are similarly framed and oriented
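For quick reference, the preset dimensions and fixed 24 FPS output rate listed above can be captured in a small helper; this is illustrative only, and the preset labels are informal names rather than API parameters.

```python
# Preset output dimensions from the specification list above (width, height).
PRESETS = {
    "16:9": (1280, 720),
    "9:16": (720, 1280),
    "1:1": (960, 960),
}

def estimated_frames(duration_seconds: float, fps: int = 24) -> int:
    """Rough frame count for an output clip at the fixed 24 FPS rate."""
    return int(round(duration_seconds * fps))

print(PRESETS["16:9"], estimated_frames(12.5))  # (1280, 720) 300
```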
Key Considerations
- Ensure the driving performance and character reference face the same general direction and occupy similar screen space for optimal results
- The model is optimized for short clips (minimum 3 seconds, typically under 30 seconds); longer sequences may require chunking or traditional motion capture (a pre-flight duration check is sketched after this list)
- Inputs with extreme perspective mismatches, low resolution, or distant subjects can degrade output quality
- Highly complex scenes (multiple actors, heavy occlusion, ultra-stylized references) may introduce artifacts such as jitter or incorrect hand poses
- Manual cleanup or hybrid workflows (e.g., light rotoscoping) may be necessary for professional-grade results
- Content moderation is enforced; flagged or non-compliant content may be rejected or result in account restrictions
- Quality and speed trade-off: higher fidelity may require more processing time, especially for high-resolution outputs
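A minimal pre-flight check can catch the most common input problems noted above (clip length outside the 3 to 30 second range, files over the 50MB upload cap) before a prediction is submitted. The sketch assumes ffprobe is installed locally; it is not part of the Act-Two API.

```python
import json
import os
import subprocess

def preflight_check(path: str) -> None:
    """Sanity-check a driving video against the constraints noted above:
    roughly 3-30 seconds long and no more than 50MB."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if size_mb > 50:
        raise ValueError(f"{path} is {size_mb:.1f}MB; uploads are capped at 50MB")

    # Read the clip duration with ffprobe (assumed to be on PATH).
    probe = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json", "-show_format", path],
        capture_output=True, text=True, check=True,
    )
    duration = float(json.loads(probe.stdout)["format"]["duration"])
    if not 3 <= duration <= 30:
        raise ValueError(
            f"{path} is {duration:.1f}s; Act-Two works best on 3-30 second clips"
        )
```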
Tips & Tricks
- Use well-lit, high-resolution driving performance videos with clear, unobstructed gestures and expressions for best transfer fidelity
- Align the character reference and driving video in terms of pose, orientation, and scale to minimize artifacts
- For image-based character references, use images with neutral backgrounds and clear facial features to improve animation quality
- Adjust gesture influence settings when animating from images to fine-tune the expressiveness of the output
- Break longer scenes into shorter segments and process them individually to maintain consistency and quality (see the segmentation sketch after this list)
- Experiment with different character references to achieve varied stylistic results; subtle changes in the reference can significantly affect the animation
- Review outputs for hand and facial artifacts, especially in scenes with rapid or complex movements, and plan for manual correction if needed
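As a sketch of the segmentation tip above (assuming the ffmpeg CLI is installed), a longer performance can be cut into chunks that each stay inside the model's preferred duration range. Stream copy keeps the cut fast, though cut points snap to keyframes rather than exact timestamps; each segment can then be submitted as its own prediction and the results joined afterwards.

```python
import subprocess

def split_into_segments(path: str, total_seconds: float,
                        segment_seconds: float = 25.0) -> list[str]:
    """Cut a longer performance video into consecutive chunks (25s each by
    default) so every piece stays within the preferred 3-30 second range."""
    outputs = []
    start, index = 0.0, 0
    while start < total_seconds:
        out = f"segment_{index:02d}.mp4"
        subprocess.run(
            ["ffmpeg", "-y", "-ss", str(start), "-t", str(segment_seconds),
             "-i", path, "-c", "copy", out],
            check=True,
        )
        outputs.append(out)
        start += segment_seconds
        index += 1
    return outputs
```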
Capabilities
- Transfers full-body, facial, and hand gestures from a driving video to a character reference with high expressive fidelity
- Animates both static images and video references as target characters
- Adds plausible environmental motion to image-based characters to avoid static or floating effects
- Supports multiple aspect ratios and resolutions suitable for social media, film, and professional workflows
- Delivers high-quality, realistic character animations suitable for prototyping, short-form content, and creative projects
- Flexible input options and API integration enable automated and scalable animation pipelines
What Can I Use It For?
- Rapid prototyping of animated character performances for film, TV, and advertising
- Creating animated avatars or digital doubles for virtual production and live streaming
- Generating expressive character animations for video games, VR/AR experiences, and interactive media
- Producing short-form animated content for social media, marketing, and brand storytelling
- Enabling artists and creators to animate illustrations or concept art without traditional rigging or motion capture
- Academic research and experimentation in AI-driven animation and performance transfer
- Personal creative projects, such as animating portraits or bringing static characters to life
Things to Be Aware Of
- Some users report that the model excels with solo performances but may struggle with multi-person scenes or heavy occlusion
- Artifacts such as jitter, incorrect hand poses, or expression mismatches can occur in challenging inputs or with highly stylized references
- The model is not a full replacement for traditional motion capture in high-end, precision-critical workflows (e.g., feature films with multiple interacting actors)
- Resource requirements are moderate; processing time increases with resolution and clip length
- Consistency across long sequences may require careful planning and post-processing
- Positive feedback highlights the model’s ease of use, expressive fidelity, and ability to animate from a single image
- Some concerns include occasional moderation rejections, need for manual cleanup, and limitations with complex or long-duration scenes
Limitations
- Optimized for short clips (3–30 seconds); not suitable for long-form or feature-length animation without segmentation
- May produce artifacts or reduced quality with complex scenes, multiple actors, or highly stylized references
- Not a full substitute for traditional motion capture in scenarios requiring sub-millimeter accuracy or precise physical interactions