What can I build with FFmpeg API Compose on each::labs?

FFmpeg API Compose on each::labs helps developers and product teams automate video production at scale, from stitching clips to layering overlays and syncing audio with visuals. It fits pipelines that generate personalized videos, social cuts, or templated content where every render combines different media sources.

How is FFmpeg API Compose different from a simple video merger?

A basic merger joins clips end to end, while FFmpeg API Compose positions each track independently on a timeline with its own timestamp, duration, and keyframe transformations. The result is layered, synchronized video rather than a single concatenation, and you get both the rendered file and a thumbnail for previews.
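The contrast can be sketched in a few lines of Python. This is a minimal illustration, not the service's implementation: it assumes `timestamp` and `duration` are expressed in milliseconds (the sample request uses 5000 without naming a unit) and that the rendered length equals the latest keyframe end across all tracks. The `example.com` URLs are placeholders.

```python
def concat_length_ms(durations):
    """A simple merger plays clips back to back, so their lengths add up."""
    return sum(durations)


def timeline_length_ms(tracks):
    """On a timeline every keyframe has its own start, so the overall
    length is the latest end point across all tracks (assumed behavior)."""
    ends = [
        keyframe["timestamp"] + keyframe["duration"]
        for track in tracks
        for keyframe in track["keyframes"]
    ]
    return max(ends, default=0)


# Two layered tracks that start together, as in the sample request.
tracks = [
    {"id": "Video1", "type": "video", "keyframes": [
        {"url": "https://example.com/a.mp4", "timestamp": 0, "duration": 5000}]},
    {"id": "Audio1", "type": "audio", "keyframes": [
        {"url": "https://example.com/a.mp3", "timestamp": 0, "duration": 5000}]},
]

merged = concat_length_ms([5000, 5000])   # clips placed one after another
layered = timeline_length_ms(tracks)      # clips placed on top of each other
```

Under these assumptions the same two 5-second assets give a 10-second result when merged end to end and a 5-second result when layered.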

ffmpeg-api-compose

Video · ffmpeg · by Ffmpeg Api

FFmpeg API Compose merges video, audio, and image tracks into one timeline-based clip, ideal for developers automating video composition workflows.

API reference

Runtime (p50): 1m
Estimated price: $0.0002 / sec
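
As a rough worked estimate, assuming the per-second price is charged on processing time (the page does not say whether runtime or output length is billed), a median 60-second run would cost about 60 × $0.0002 = $0.012. The helper below is a hypothetical sketch of that arithmetic; it uses `Decimal` to avoid floating-point rounding.

```python
from decimal import Decimal

PRICE_PER_SECOND_USD = Decimal("0.0002")  # the listed estimate


def estimate_cost_usd(billed_seconds):
    """Multiply the listed per-second estimate by the billed seconds.
    Which seconds are billed is an assumption, not documented here."""
    return PRICE_PER_SECOND_USD * Decimal(str(billed_seconds))


p50_cost = estimate_cost_usd(60)  # one run at the listed p50 runtime of 1m
```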

Call the API

prediction.sh

curl -X POST \
  -H "X-API-Key: $EACHLABS_API_KEY" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ffmpeg-api-compose",
    "version": "0.0.1",
    "input": {
        "tracks": [
            {
                "id": "Video1",
                "type": "video",
                "keyframes": [
                    {
                        "url": "https://cdn-us.eachlabs.ai/defaults/e3fd3b0157af4773801f1ed8b90b5fb3.mp4",
                        "duration": 5000,
                        "timestamp": 0
                    }
                ]
            },
            {
                "id": "Audio1",
                "type": "audio",
                "keyframes": [
                    {
                        "url": "https://cdn-us.eachlabs.ai/defaults/9cf1d78015d4492db8576b5536a799cc.mp3",
                        "duration": 5000,
                        "timestamp": 0
                    }
                ]
            }
        ]
    },
    "webhook_url": ""
}' \
  https://api.eachlabs.ai/v1/prediction/
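
The same request can be assembled in Python. The sketch below mirrors the curl call above using only the standard library; the helper names `build_payload` and `build_request` are illustrative, and the shape of the response is not shown on this page, so the example stops before sending.

```python
import json
import os
import urllib.request

API_URL = "https://api.eachlabs.ai/v1/prediction/"


def build_payload(tracks, webhook_url=""):
    """Wrap a list of tracks in the envelope used by the curl example."""
    return {
        "model": "ffmpeg-api-compose",
        "version": "0.0.1",
        "input": {"tracks": tracks},
        "webhook_url": webhook_url,
    }


def build_request(payload, api_key):
    """Create the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


tracks = [
    {"id": "Video1", "type": "video", "keyframes": [{
        "url": "https://cdn-us.eachlabs.ai/defaults/e3fd3b0157af4773801f1ed8b90b5fb3.mp4",
        "duration": 5000, "timestamp": 0}]},
    {"id": "Audio1", "type": "audio", "keyframes": [{
        "url": "https://cdn-us.eachlabs.ai/defaults/9cf1d78015d4492db8576b5536a799cc.mp3",
        "duration": 5000, "timestamp": 0}]},
]

request = build_request(build_payload(tracks), os.environ.get("EACHLABS_API_KEY", ""))
# To send it: urllib.request.urlopen(request)  (needs a valid key and network access)
```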

Documentation (8 sections)

ffmpeg-api-compose Overview

The ffmpeg-api-compose model from Ffmpeg Api is a timeline-based video composition API that programmatically merges video, audio, and image tracks into a single rendered clip. Built on the proven ffmpeg toolchain, it focuses on deterministic, scriptable video-to-video assembly rather than generative synthesis. This makes ffmpeg-api-compose ideal for developers who need to automate complex editing tasks such as layering multiple assets, trimming, positioning overlays, and syncing sound without opening a traditional NLE. Within the Ffmpeg Api video-to-video family, its primary differentiator is precise track-level control through a simple HTTP API, enabling repeatable, infrastructure-ready workflows that integrate cleanly into backends, batch jobs, and microservices at scale.
Capabilities
- Multi-track composition: Merge multiple video, audio, and image assets into a single render controlled by one timeline.
- Precise trimming and sequencing: Set in/out points for each clip, concatenate segments, and control the exact playback order.
- Layered overlays: Place logos, watermarks, or static graphics on top of video with custom positioning and duration.
- Audio management: Replace, mix, or mute audio tracks, adjust levels, and align sound with specific video segments.
- Canvas control: Define the output resolution and aspect ratio, including letterboxing, cropping, or scaling to fit.
- Transition support: Use ffmpeg filters to apply simple transitions such as fades, crossfades, and opacity ramps between clips.
- Programmatic automation: Drive large-scale batch rendering or dynamic, per-user compositions via the ffmpeg-api-compose API.
- Deterministic results: Given the same inputs and parameters, the model produces consistent outputs suitable for production pipelines.
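
Sequencing, for instance, comes down to giving each clip a start time equal to the end of the previous one. The sketch below is an assumption-laden illustration: it presumes that one track may hold several keyframes, that each plays at its own `timestamp`, and that values are in milliseconds. `sequence_keyframes` is a hypothetical helper and the `example.com` URLs are placeholders.

```python
def sequence_keyframes(clips, start_ms=0):
    """Lay clips end to end.

    clips is a list of (url, duration_ms) pairs; each keyframe starts
    where the previous one ends.
    """
    keyframes = []
    cursor = start_ms
    for url, duration_ms in clips:
        keyframes.append({"url": url, "timestamp": cursor, "duration": duration_ms})
        cursor += duration_ms
    return keyframes


video_track = {
    "id": "Video1",
    "type": "video",
    "keyframes": sequence_keyframes([
        ("https://example.com/intro.mp4", 3000),
        ("https://example.com/main.mp4", 20000),
        ("https://example.com/outro.mp4", 4000),
    ]),
}
```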
Use Cases for ffmpeg-api-compose

Content creators can use ffmpeg-api-compose to automatically assemble episode intros and outros by concatenating a main recording with branded bumpers and a music bed, leveraging its precise trimming and sequencing. A typical setup: "Combine this main MP4 with this intro and outro, normalize audio, and export at 1080p 30fps."

Marketers can generate personalized ad variations at scale by swapping end cards, logos, or offer graphics via layered overlays, while keeping a consistent master video. For example: "Render a vertical 15-second ad, overlay a region-specific price image during the last 5 seconds."

Developers can integrate Ffmpeg Api video-to-video workflows into their backends, stitching user-generated clips, background music, and captions into a single video for social sharing, fully automated.

Designers and product teams can prototype interface demos by composing screen recordings, zoomed-in crops, and annotation layers into one explanatory clip without manual editing.
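
The region-specific ad example maps naturally onto an extra track whose single keyframe starts late in the timeline. The following is a hedged sketch: the `"image"` type name is inferred from the description of video, audio, and image tracks rather than shown in the sample request, millisecond units are assumed, and `tail_overlay_track` is a hypothetical helper.

```python
def tail_overlay_track(track_id, image_url, total_ms, overlay_ms):
    """Build a track that shows an image for the last overlay_ms of the video."""
    if not 0 < overlay_ms <= total_ms:
        raise ValueError("overlay must be positive and fit inside the video")
    return {
        "id": track_id,
        "type": "image",  # assumed type name
        "keyframes": [{
            "url": image_url,
            "timestamp": total_ms - overlay_ms,
            "duration": overlay_ms,
        }],
    }


# A 15-second ad with a price card over the final 5 seconds.
price_card = tail_overlay_track(
    "Price1", "https://example.com/price-de.png", total_ms=15000, overlay_ms=5000
)
```

Swapping only `image_url` per region keeps the master video track unchanged across variations.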
Tips and Tricks

To get the most from ffmpeg-api-compose, treat it like a programmable timeline. Define the canvas resolution and aspect ratio first, then map each track's position, duration, and z-order relative to that canvas. When calling the ffmpeg-api-compose API, use stable, directly reachable URLs for every asset, and normalize input frame rates where possible to avoid sync issues. Start with a simple composition (one background video, one overlay image, one audio track) before scaling up to multi-layer edits.

Example request concepts (expressed as prompts for clarity):
- "Compose a 15-second 1080x1920 clip using this vertical video as background, mute its audio, and overlay a PNG logo at the top-right for the full duration."
- "Create a 30-second 1920x1080 montage by concatenating three MP4 clips, adding 0.5-second crossfades, and mixing a background music track at -12 dB."
- "Generate a square 1080x1080 video from a 16:9 source by cropping center, then place caption text as a burned-in subtitle band at the bottom."
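
Before sending a composition, a small local check can catch the most common payload mistakes. The sketch below validates only the fields that appear in the sample request; the rules themselves (unique ids, absolute http(s) URLs, non-negative timestamps, positive durations, the `"image"` type name) are assumptions about sensible input, not documented API constraints.

```python
from urllib.parse import urlparse

KNOWN_TYPES = {"video", "audio", "image"}  # "image" is assumed from the description


def find_track_problems(tracks):
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    seen_ids = set()
    for track in tracks:
        track_id = track.get("id")
        if track_id in seen_ids:
            problems.append(f"{track_id}: duplicate track id")
        seen_ids.add(track_id)
        if track.get("type") not in KNOWN_TYPES:
            problems.append(f"{track_id}: unknown track type")
        for index, keyframe in enumerate(track.get("keyframes", [])):
            where = f"{track_id}[{index}]"
            parsed = urlparse(keyframe.get("url", ""))
            if parsed.scheme not in ("http", "https") or not parsed.netloc:
                problems.append(f"{where}: url must be absolute http(s)")
            if keyframe.get("timestamp", -1) < 0:
                problems.append(f"{where}: timestamp must be >= 0")
            if keyframe.get("duration", 0) <= 0:
                problems.append(f"{where}: duration must be > 0")
    return problems


# The suggested starting point: one background video, one overlay, one audio track.
starter_tracks = [
    {"id": "Background", "type": "video", "keyframes": [
        {"url": "https://example.com/bg.mp4", "timestamp": 0, "duration": 15000}]},
    {"id": "Logo", "type": "image", "keyframes": [
        {"url": "https://example.com/logo.png", "timestamp": 0, "duration": 15000}]},
    {"id": "Music", "type": "audio", "keyframes": [
        {"url": "https://example.com/music.mp3", "timestamp": 0, "duration": 15000}]},
]
problems = find_track_problems(starter_tracks)
```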
Technical Specifications
- Engine: Backed by the open‑source ffmpeg multimedia framework for composition and rendering.
- Typical Resolution Support: Designed to handle common HD and vertical formats (such as 1920x1080, 1080x1920, 1280x720), with exact limits determined by the underlying ffmpeg deployment and infrastructure.
- Aspect Ratios: Supports standard ratios including 16:9, 9:16, and 1:1, plus custom ratios where inputs and canvas dimensions are explicitly defined.
- Max Duration: Practical limits are governed by server resources and timeout settings; the API is best suited for short to medium-length social and marketing clips.
- Input Formats: Common video (e.g., MP4, MOV, WebM), image (e.g., PNG, JPEG), and audio (e.g., MP3, WAV) formats typically supported by ffmpeg.
- Output Formats: Usually exported as MP4/H.264 or similar web‑friendly codecs, with profiles configurable via ffmpeg parameters.
- Processing Time: Dependent on clip length, resolution, and filter complexity; most short compositions complete in seconds to a few minutes on typical cloud hardware.
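
When planning how a source will sit on a canvas with a different aspect ratio, the letterbox arithmetic can be worked out ahead of time. The sketch below is plain geometry for planning purposes; it does not correspond to any documented request parameter, and `fit_inside` is a hypothetical helper. Dimensions are rounded down to even numbers, a common requirement for H.264 encoding.

```python
def fit_inside(src_w, src_h, canvas_w, canvas_h):
    """Scale a source to fit inside a canvas while keeping its aspect ratio.

    Returns (width, height, pad_x, pad_y), where the pads are the bar
    sizes on the left and top edges.
    """
    if src_w * canvas_h >= src_h * canvas_w:
        # Source is relatively wider than the canvas: width is the limit.
        width = canvas_w
        height = src_h * canvas_w // src_w
    else:
        # Source is relatively taller than the canvas: height is the limit.
        height = canvas_h
        width = src_w * canvas_h // src_h
    width -= width % 2    # keep dimensions even
    height -= height % 2
    return width, height, (canvas_w - width) // 2, (canvas_h - height) // 2


# A 16:9 source placed on a 9:16 vertical canvas.
landscape_on_vertical = fit_inside(1920, 1080, 1080, 1920)
```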
Things to Be Aware Of

Because ffmpeg-api-compose relies on underlying ffmpeg filters, misconfigured parameters can lead to out-of-sync audio, unexpected cropping, or black bars. Mixed frame rates or variable frame rate (VFR) sources may require pre-processing for best results. Very high resolutions or long durations can stress CPU resources and increase render times, so consider capping output size for web delivery. Network-accessible media URLs must be stable and reachable from the API environment, or jobs may fail. Users sometimes forget to explicitly set output codecs and bitrates, which can result in larger-than-expected files or incompatible formats for target platforms.
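
One way to pre-process a variable frame rate source is to re-encode it at a constant rate with the ffmpeg command-line tool before uploading it. This step runs locally and is separate from the hosted API. The sketch below only builds the command; the file names and the 30 fps target are placeholder assumptions.

```python
import shlex


def cfr_command(src, dst, fps=30):
    """Build an ffmpeg command that resamples video to a constant frame rate.

    The fps filter duplicates or drops frames as needed; audio is copied
    without re-encoding.
    """
    return ["ffmpeg", "-i", src, "-vf", f"fps={fps}", "-c:a", "copy", dst]


command = cfr_command("input.mp4", "input_cfr.mp4")
printable = shlex.join(command)  # for logging or pasting into a shell
# To run it: subprocess.run(command, check=True)  (requires ffmpeg on PATH)
```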
Key Considerations

Because ffmpeg-api-compose is built on ffmpeg rather than a generative model, quality depends entirely on the source media and your composition parameters, not on AI inference. Users should be comfortable specifying timelines, start/end offsets, and layer ordering via JSON or similar structured payloads. The model is best for deterministic edits like concatenation, overlays, and audio replacement, while creative generation is better handled upstream by other models. For production workloads, consider encoding presets and resolution tradeoffs to balance quality, bandwidth, and render time. Access typically requires basic API authentication, adequate storage for media assets, and a workflow that can manage asynchronous job completion or polling.
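
For workflows that poll rather than use a webhook, a generic wait loop with a timeout is usually enough. The sketch below is deliberately API-agnostic: `fetch_status` and `is_done` are hypothetical callables you supply, because the status endpoint, response schema, and status values are not documented on this page.

```python
import time


def wait_for_result(fetch_status, is_done, interval_s=2.0, timeout_s=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until is_done(result) is true or the timeout passes.

    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        result = fetch_status()
        if is_done(result):
            return result
        if clock() >= deadline:
            raise TimeoutError("prediction did not finish in time")
        sleep(interval_s)


# Usage idea (names are placeholders):
# result = wait_for_result(lambda: get_prediction(prediction_id),
#                          lambda r: r.get("status") in ("success", "error"))
```

Given the listed p50 runtime of about a minute, a polling interval of a few seconds and a timeout of several minutes is a reasonable starting point.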
Limitations

ffmpeg-api-compose does not generate new visual content from text; it only composes existing media assets. It is not a creative AI like text-to-video or image generation models, and it cannot infer edits from high-level natural language alone—you must describe timelines and layers structurally. Handling extremely long-form content or 4K+ multi-track timelines may be constrained by server resources and processing time. Some advanced ffmpeg filters or exotic codecs might not be exposed through the API surface, so specialized workflows can require custom ffmpeg setups outside the managed ffmpeg-api-compose API.

Related models

4 models

PixVerse V6 Extend (Pixverse)

PixVerse Modify (Pixverse)

Infinitalk · Video to Video (infinitetalk)

PixVerse Lip Sync v2 (Pixverse)

FAQ

About ffmpeg-api-compose

What is FFmpeg API Compose?

FFmpeg API Compose is a video composition model that assembles multiple media tracks into a single output. You provide video, audio, and image sources, each placed on a timeline with its own start timestamp and duration. It is designed for programmatic, repeatable video assembly.
