alibaba-happyhorse-1.0-video-edit

HAPPYHORSE-1.0

Enables advanced video editing through natural language instructions, supporting local or global modifications with up to 5 reference images while preserving the original motion dynamics.

Avg Run Time: 130s

Model Slug: alibaba-happyhorse-1-0-video-edit


API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
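
A minimal sketch of this step in Python using the `requests` library. The endpoint path, auth header name, payload fields, and response field are assumptions for illustration, not the documented schema; consult the each::labs API reference for the exact values.

```python
import requests

API_KEY = "YOUR_API_KEY"                  # your each::labs API key
BASE_URL = "https://api.eachlabs.ai/v1"   # assumed base URL

payload = {
    "model": "alibaba-happyhorse-1-0-video-edit",  # model slug from above
    "input": {
        # Field names below are illustrative, not the documented schema.
        "video_url": "https://example.com/source-clip.mp4",
        "prompt": "Change the studio to a beach setting, keep all motion.",
        "reference_images": ["https://example.com/ref1.png"],
        "resolution": "1080P",
    },
}

response = requests.post(
    f"{BASE_URL}/prediction/",       # assumed endpoint path
    json=payload,
    headers={"X-API-Key": API_KEY},  # assumed auth header name
)
response.raise_for_status()
prediction_id = response.json()["predictionID"]  # assumed response field
print("Prediction created:", prediction_id)
```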

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
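
A matching polling sketch, continuing from the creation example above. Again, the endpoint path, header name, and response fields are assumptions; adjust to the actual API reference.

```python
import time
import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.eachlabs.ai/v1"   # assumed base URL
prediction_id = "ID_FROM_CREATE_STEP"

while True:
    result = requests.get(
        f"{BASE_URL}/prediction/{prediction_id}",  # assumed endpoint path
        headers={"X-API-Key": API_KEY},            # assumed auth header
    ).json()
    status = result.get("status")                  # assumed response field
    if status == "success":
        print("Edited video URL:", result.get("output"))
        break
    if status in ("failed", "error"):
        raise RuntimeError(f"Prediction failed: {result}")
    time.sleep(2)  # wait briefly between checks, per the long-polling model
```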

Readme

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview

Alibaba | HappyHorse 1.0 | Video Edit transforms existing videos through natural language instructions, enabling precise local or global modifications while maintaining the original motion dynamics. Developed by Alibaba as part of the HappyHorse 1.0 family, this video-to-video model uses a unified 15B-parameter Transformer architecture to produce high-quality edits at up to 1080p resolution. Its standout feature is single-pass joint audio-video processing, which keeps lip-sync and motion intact during edits and distinguishes it within Alibaba's video-to-video lineup. The model supports up to 5 reference images for guided changes, making it well suited to creators who need efficient video refinement without losing temporal consistency, and to professional workflows on platforms like each::labs.

Technical Specifications

  • Resolution Support: Up to 1080p in 16:9, 9:16, and 1:1 aspect ratios
  • Max Duration: 5-10 second clips
  • Input/Output Formats: Text prompts, reference images (up to 5), input videos; outputs edited videos with native audio
  • Processing Time: Approximately 38 seconds for a 1080p clip on a single H100 GPU (8-step denoising)
  • Architecture: 15B-parameter, 40-layer self-attention Transformer; unified pipeline for text/image-to-video with joint audio generation
  • Language Support: Multilingual lip-sync in English, Chinese, Japanese, Korean, German, French

Key Considerations

Before using Alibaba | HappyHorse 1.0 | Video Edit, ensure access to high-end GPUs like the H100 for optimal speed; consumer-grade variants are still in development. The model excels at human-centric edits with preserved motion, making it preferable to multi-stage pipelines for quick iterations on short clips. On each::labs, integrate via the Alibaba | HappyHorse 1.0 | Video Edit API for seamless workflows. Weigh the cost tradeoffs: fast inference suits production, but output caps at 1080p, so balance quality and efficiency against longer or higher-resolution alternatives. Prerequisites include clear input videos and descriptive prompts for best results.

Tips & Tricks

Optimize prompts for Alibaba | HappyHorse 1.0 | Video Edit by specifying edit scope (local/global), reference images, and motion preservation explicitly. Use detailed descriptions of changes while referencing original elements to maintain dynamics. For lip-sync edits, include dialogue in prompts with target language.

  • Start prompts with action: "Edit the background to a sunset while keeping the dancer's exact movements."
  • Leverage references: Provide 2-3 images for style transfer, e.g., "Replace clothing with formal attire from reference image 1, preserve all motion."
  • Control intensity: Add "subtle change" or "dramatic edit" to fine-tune modifications without disrupting audio sync.

Chain generations on each::labs for iterative refinement, testing on short clips first to tune prompts efficiently. Avoid vague terms; specificity boosts output fidelity in this unified pipeline.
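
As a concrete illustration of these tips, here is a minimal sketch of how an edit request's inputs might be structured; the field names are assumptions for illustration, not the model's documented schema.

```python
# Illustrative only: field names are assumed, not the documented schema.
edit_input = {
    "video_url": "https://example.com/dancer.mp4",
    # Action-first prompt stating scope, intensity, and what to preserve.
    "prompt": (
        "Edit the background to a sunset while keeping the dancer's "
        "exact movements. Subtle change; preserve audio sync."
    ),
    # 2-3 references for style or appearance transfer (up to 5 supported).
    "reference_images": [
        "https://example.com/sunset-style.png",
        "https://example.com/formal-attire.png",
    ],
}
```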

Capabilities

  • Performs natural language-driven video edits, supporting local (e.g., object replacement) and global (e.g., style transfer) modifications
  • Preserves original video motion dynamics during edits for temporal consistency
  • Integrates up to 5 reference images for guided changes like appearance or environment swaps
  • Generates native audio in single pass with multilingual lip-sync across 6 languages
  • Handles text-to-video and image-to-video in unified pipeline, adaptable for edit tasks
  • Supports 1080p output with cinematic quality, especially in human-motion scenarios
  • Offers precise camera movement and motion intensity controls post-edit
  • Excels in delicate facial performance, realistic body motion, and speech coordination

What Can I Use It For?

Content Creators: Edit raw footage for social media by swapping backgrounds while preserving performer motion. Example prompt: "Change the studio to a beach setting using reference image 1, keep lip-sync and dance exact."

Marketers: Customize ad videos globally for branding, adding product placements without re-recording. Example: "Replace the actor's shirt with our logo apparel from references 2-3, maintain speech and gestures."

Designers: Refine prototype demos with style transfers for client previews. Example: "Edit video to futuristic aesthetic via reference images, preserve mechanical motion dynamics."

Developers: Prototype via Alibaba | HappyHorse 1.0 | Video Edit API on each::labs for app-integrated edits, like local face swaps in user videos with perfect audio sync.

Things to Be Aware Of

Alibaba | HappyHorse 1.0 | Video Edit may struggle with complex multi-object scenes where motion preservation conflicts with heavy edits, leading to minor artifacts. Users often overlook reference image quality: low-resolution inputs degrade outputs. Edge cases like rapid motion or occluded faces can weaken lip-sync accuracy. Common mistakes include overly ambiguous prompts, which cause unintended global changes. Resource needs: an H100 is recommended; inference is slower on lesser hardware. Test iteratively on each::labs to catch sync drift early.

Limitations

Alibaba | HappyHorse 1.0 | Video Edit caps at 1080p and 5-10 second clips, making it unsuitable for 4K or long-form content. There is no confirmed support for durations beyond 10 seconds or non-standard aspect ratios. The model lacks classifier-free guidance, which may limit fine control in ambiguous edits. Audio generation, while native, may falter in non-human-centric scenarios. Vendor-reported specs are unverified independently, and consumer GPU support is pending.

Pricing

Pricing Type: Dynamic

Current Pricing

720P pricing: $0.14/sec (input + output video billed)

Pricing Rules

Condition | Pricing
resolution matches "720P" (Active) | 720P: $0.14/sec (input + output video billed)
Rule 2 (default) | 1080P: $0.24/sec (input + output video billed)
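
Given these rates, and assuming both input and output video seconds are billed at the matched per-second rate (as the rules above state), a minimal cost-estimation sketch:

```python
# Rough cost estimator from the rates above; assumes input and output
# video seconds are both billed at the same per-second rate.
RATES_USD_PER_SEC = {"720P": 0.14, "1080P": 0.24}

def estimate_cost(input_seconds: float, output_seconds: float,
                  resolution: str = "1080P") -> float:
    return (input_seconds + output_seconds) * RATES_USD_PER_SEC[resolution]

# Example: a 10-second source clip edited into a 10-second 1080P output
print(f"${estimate_cost(10, 10, '1080P'):.2f}")  # -> $4.80
```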