Runway | Act-Two
Runway Act-Two turns performance videos into realistic character animations by transferring gestures and expressions.
Avg Run Time: 200.000s
Model Slug: runway-act-two
Category: Image to Video
Input
- Driving performance video: upload a file (max 50MB) or enter a URL
- Character reference (image or video): upload a file (max 50MB) or enter a URL
Output
The animated result video, available to preview and download.
Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
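The exact host, endpoint path, and field names depend on the platform serving the model and are not given on this page, so the Python sketch below is a minimal example under assumed conventions: a generic REST prediction endpoint, a bearer-token API key read from an environment variable, and hypothetical `driving_video` / `character_reference` input fields alongside the `runway-act-two` model slug.

```python
import os
import requests

# Hypothetical endpoint and schema; substitute the host and field names
# documented by your provider's API reference.
API_URL = "https://api.example.com/v1/predictions"
API_KEY = os.environ["EXAMPLE_API_KEY"]  # assumed environment variable

payload = {
    "model": "runway-act-two",  # model slug from this page
    "input": {
        # Driving performance video and character reference; either can be
        # a URL or a previously uploaded file (max 50MB each).
        "driving_video": "https://example.com/performance.mp4",
        "character_reference": "https://example.com/character.png",
    },
}

resp = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()
prediction_id = resp.json()["id"]  # assumed response field
print("Prediction created:", prediction_id)
```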
Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
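Continuing the same assumptions as the creation sketch (generic endpoint, hypothetical field and status names), a simple polling loop might look like this:

```python
import time
import requests

# Poll the (hypothetical) prediction endpoint until a terminal status is
# returned; prediction_id and API_KEY come from the creation sketch above.
POLL_URL = f"https://api.example.com/v1/predictions/{prediction_id}"
headers = {"Authorization": f"Bearer {API_KEY}"}

while True:
    result = requests.get(POLL_URL, headers=headers, timeout=30).json()
    status = result.get("status")  # assumed status field
    if status == "success":
        print("Output video:", result.get("output"))  # assumed output field
        break
    if status in ("failed", "canceled"):
        raise RuntimeError(f"Prediction ended with status: {status}")
    # With an average run time of roughly 200s, a few seconds between polls
    # is a reasonable default; back off further for batch workloads.
    time.sleep(5)
```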
Overview
Runway Act-Two is an advanced AI model developed by Runway that transforms short performance videos into highly realistic character animations. It transfers gestures, facial expressions, and body movements from a source video (the "driving performance") to a target character provided as an image or video. The model is part of Runway's next-generation suite of AI video tools and is positioned as an evolution of their earlier Act models, offering significant improvements in expressive fidelity and animation realism.
Key features of Act-Two include full-body performance transfer, detailed facial and hand gesture mapping, and the ability to animate both static images and reference videos as target characters. The model is engineered to democratize high-fidelity animation, making it accessible for creators, animators, and professionals who need to generate animated character performances without traditional motion capture setups. Act-Two is notable for its ability to add plausible environmental motion when animating from a single image, helping to avoid unnatural "floating" effects.
The underlying technology leverages advanced deep learning architectures for pose estimation, facial expression transfer, and motion synthesis. Act-Two is tightly integrated with Runway's Gen-4 video toolset, supporting a range of aspect ratios and resolutions, and is accessible via API for automated workflows. Its unique combination of expressive fidelity, flexibility in character input, and ease of use distinguishes it from other AI animation tools on the market.
Technical Specifications
- Architecture: Advanced deep learning model for pose, gesture, and expression transfer (specific architecture details not publicly disclosed)
- Parameters: Not publicly specified
- Resolution: Supports 1280×720 (16:9), 720×1280 (9:16), 960×960 (1:1), and other preset aspect ratios (collected in the sketch after this list)
- Input/Output formats: Inputs are a short performance video clip (the driving performance) and a character reference (image or video); outputs are animated video sequences at 24 FPS, auto-cropped to match the selected aspect ratio
- Performance metrics: Optimized for short clips (3–30 seconds); best results when source and target are similarly framed and oriented
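For quick reference, the preset dimensions and fixed 24 FPS output rate listed above can be captured in a small helper; this is illustrative only, and the preset labels are informal names rather than API parameters.

```python
# Preset output dimensions from the specification list above (width, height).
PRESETS = {
    "16:9": (1280, 720),
    "9:16": (720, 1280),
    "1:1": (960, 960),
}

def estimated_frames(duration_seconds: float, fps: int = 24) -> int:
    """Rough frame count for an output clip at the fixed 24 FPS rate."""
    return int(round(duration_seconds * fps))

print(PRESETS["16:9"], estimated_frames(12.5))  # (1280, 720) 300
```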
Key Considerations
- Ensure the driving performance and character reference face the same general direction and occupy similar screen space for optimal results
- The model is optimized for short clips (minimum 3 seconds, typically under 30 seconds); longer sequences may require chunking or traditional motion capture (a pre-flight duration check is sketched after this list)
- Inputs with extreme perspective mismatches, low resolution, or distant subjects can degrade output quality
- Highly complex scenes (multiple actors, heavy occlusion, ultra-stylized references) may introduce artifacts such as jitter or incorrect hand poses
- Manual cleanup or hybrid workflows (e.g., light rotoscoping) may be necessary for professional-grade results
- Content moderation is enforced; flagged or non-compliant content may be rejected or result in account restrictions
- Quality and speed trade-off: higher fidelity may require more processing time, especially for high-resolution outputs
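A minimal pre-flight check can catch the most common input problems noted above (clip length outside the 3 to 30 second range, files over the 50MB upload cap) before a prediction is submitted. The sketch assumes ffprobe is installed locally; it is not part of the Act-Two API.

```python
import json
import os
import subprocess

def preflight_check(path: str) -> None:
    """Sanity-check a driving video against the constraints noted above:
    roughly 3-30 seconds long and no more than 50MB."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if size_mb > 50:
        raise ValueError(f"{path} is {size_mb:.1f}MB; uploads are capped at 50MB")

    # Read the clip duration with ffprobe (assumed to be on PATH).
    probe = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json", "-show_format", path],
        capture_output=True, text=True, check=True,
    )
    duration = float(json.loads(probe.stdout)["format"]["duration"])
    if not 3 <= duration <= 30:
        raise ValueError(
            f"{path} is {duration:.1f}s; Act-Two works best on 3-30 second clips"
        )
```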
Tips & Tricks
- Use well-lit, high-resolution driving performance videos with clear, unobstructed gestures and expressions for best transfer fidelity
- Align the character reference and driving video in terms of pose, orientation, and scale to minimize artifacts
- For image-based character references, use images with neutral backgrounds and clear facial features to improve animation quality
- Adjust gesture influence settings when animating from images to fine-tune the expressiveness of the output
- Break longer scenes into shorter segments and process them individually to maintain consistency and quality (see the segmentation sketch after this list)
- Experiment with different character references to achieve varied stylistic results; subtle changes in the reference can significantly affect the animation
- Review outputs for hand and facial artifacts, especially in scenes with rapid or complex movements, and plan for manual correction if needed
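As a sketch of the segmentation tip above (assuming the ffmpeg CLI is installed), a longer performance can be cut into chunks that each stay inside the model's preferred duration range. Stream copy keeps the cut fast, though cut points snap to keyframes rather than exact timestamps; each segment can then be submitted as its own prediction and the results joined afterwards.

```python
import subprocess

def split_into_segments(path: str, total_seconds: float,
                        segment_seconds: float = 25.0) -> list[str]:
    """Cut a longer performance video into consecutive chunks (25s each by
    default) so every piece stays within the preferred 3-30 second range."""
    outputs = []
    start, index = 0.0, 0
    while start < total_seconds:
        out = f"segment_{index:02d}.mp4"
        subprocess.run(
            ["ffmpeg", "-y", "-ss", str(start), "-t", str(segment_seconds),
             "-i", path, "-c", "copy", out],
            check=True,
        )
        outputs.append(out)
        start += segment_seconds
        index += 1
    return outputs
```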
Capabilities
- Transfers full-body, facial, and hand gestures from a driving video to a character reference with high expressive fidelity
- Animates both static images and video references as target characters
- Adds plausible environmental motion to image-based characters to avoid static or floating effects
- Supports multiple aspect ratios and resolutions suitable for social media, film, and professional workflows
- Delivers high-quality, realistic character animations suitable for prototyping, short-form content, and creative projects
- Flexible input options and API integration enable automated and scalable animation pipelines
What Can I Use It For?
- Rapid prototyping of animated character performances for film, TV, and advertising
- Creating animated avatars or digital doubles for virtual production and live streaming
- Generating expressive character animations for video games, VR/AR experiences, and interactive media
- Producing short-form animated content for social media, marketing, and brand storytelling
- Enabling artists and creators to animate illustrations or concept art without traditional rigging or motion capture
- Academic research and experimentation in AI-driven animation and performance transfer
- Personal creative projects, such as animating portraits or bringing static characters to life
Things to Be Aware Of
- Some users report that the model excels with solo performances but may struggle with multi-person scenes or heavy occlusion
- Artifacts such as jitter, incorrect hand poses, or expression mismatches can occur in challenging inputs or with highly stylized references
- The model is not a full replacement for traditional motion capture in high-end, precision-critical workflows (e.g., feature films with multiple interacting actors)
- Resource requirements are moderate; processing time increases with resolution and clip length
- Consistency across long sequences may require careful planning and post-processing
- Positive feedback highlights the model’s ease of use, expressive fidelity, and ability to animate from a single image
- Some concerns include occasional moderation rejections, need for manual cleanup, and limitations with complex or long-duration scenes
Limitations
- Optimized for short clips (3–30 seconds); not suitable for long-form or feature-length animation without segmentation
- May produce artifacts or reduced quality with complex scenes, multiple actors, or highly stylized references
- Not a full substitute for traditional motion capture in scenarios requiring sub-millimeter accuracy or precise physical interactions