# MM Audio

MMAudio generates synchronized audio given video and/or text inputs.

## API Information

- **Model Slug:** mmaudio
- **Branded URL:** https://www.eachlabs.ai/meta/mm-audio/mmaudio
- **Provider:** Meta
- **Category:** Video to Video
- **Output Type:** video
- **Status:** active
- **Version:** 0.0.1
- **Base Cost:** Per-second pricing based on provider predict_time. Rate: $0.00108/sec from GPU tier.
- **Estimated Processing Time:** 5 seconds
- **Last Updated:** 2026-04-06
- **Interactive Demo:** https://www.eachlabs.ai/ai-models/mmaudio

## Pricing

- **Charge Type:** dynamic
- **Pricing Details:** Per-second pricing based on provider predict_time. Rate: $0.00108/sec from GPU tier.

### Pricing Rules

| Condition | Pricing |
| --- | --- |
| Rule 1 | Per-second pricing based on provider predict_time. Rate: $0.00108/sec from GPU tier. |

## Input Schema

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| prompt | string | No | - | - | Text prompt for generated audio |
| negative_prompt | string | No | music | - | Negative prompt to avoid certain sounds |
| video | string | Yes | - | video/mp4 | Optional video file for video-to-audio generation |
| duration | number | No | 8 | - | Duration of output in seconds |
| num_steps | integer | No | 25 | - | Number of inference steps |
| cfg_strength | number | No | 4.5 | - | Guidance strength (CFG) |
| seed | integer | No | - | - | Seed |

## Example Request

```bash
curl -X POST https://api.eachlabs.ai/v1/prediction/ \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mmaudio",
    "input": {
      "video": "https://storage.googleapis.com/magicpoint/inputs/mmaudio-input.mp4"
    }
  }'
```

## Output Schema

Response returned by `GET /v1/prediction/{id}` when the job completes:

```json
{
  "status": "success",
  "predictionID": "string",
  "output": "string (URL of generated video)",
  "metrics": {
    "predict_time": "number (seconds)"
  }
}
```

## Polling

```bash
curl https://api.eachlabs.ai/v1/prediction/{PREDICTION_ID} \
  -H "X-API-Key: YOUR_API_KEY"
```

| Status | Meaning |
|--------|---------|
| `processing` | Still running; poll again |
| `success` | Done; read `output` |
| `error` | Failed; read `message` / `details` |

## Webhook (alternative to polling)

Pass `"webhook_url": "https://your.host/path"` in the create request. Eachlabs POSTs this payload when the job ends:

```json
{
  "exec_id": "prediction-uuid",
  "status": "succeeded",
  "output": "https://...",
  "error": ""
}
```

`status` is `"succeeded"` or `"failed"`. `exec_id` equals the `predictionID` from create. Return 2xx within 30 seconds.

## Errors

Error body: `{ "status": "error", "message": "...", "details": "..." }`

| Code | Meaning |
|------|---------|
| `400` | Invalid input |
| `401` | Missing / invalid `X-API-Key` |
| `404` | Unknown model or prediction id |
| `429` | Rate limit: 100 creates / min, 10 concurrent per key |
| `5xx` | Retry with backoff |

## Overview

**mmaudio: Video-to-Audio AI Model**

Developed by Meta as part of the mm-audio family, mmaudio is a multimodal audio generation model that creates synchronized audio from video and/or text inputs. Rather than treating video-to-audio and text-to-audio as separate tasks, mmaudio handles both within a single architecture, so developers can generate temporally aligned audio for video content, narration, ambient soundscapes, and interactive applications.

The core problem mmaudio addresses is keeping generated audio synchronized with visual content. Whether you start from existing video footage or from a text description, temporal misalignment between audio and visual events breaks immersion and reduces perceived quality. mmaudio handles this through fine-grained temporal synchronization, aligning each audio event with its corresponding visual moment, which matters for video-to-audio applications where even millisecond-level drift becomes noticeable.

Because one model covers both input types, there is no need to manage multiple models or APIs for different inputs, which suits developers building video audio synthesis tools that need flexibility across use cases.

## Usage Notes

- API Base URL: `https://api.eachlabs.ai/v1`
- Authentication: send `X-API-Key: YOUR_API_KEY`. Generate a key from the Eachlabs dashboard at https://www.eachlabs.ai/dashboard/api-keys.
- File-typed parameters (`*_url`, `image_url`, `video_url`, `audio_url`, etc.) accept publicly reachable HTTPS URLs only. Upload your asset first (GCS / S3 / your CDN) and pass the resulting URL. Data URIs and localhost URLs are rejected.
- For structured parameters (arrays / objects), send real JSON values, not stringified payloads.
- Monetary values are reported in USD; per-token / per-megapixel rates may be billed in micro-cents internally.
- Prefer `webhook_url` over polling for long-running predictions; see the Webhook (alternative to polling) section.
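The create-then-poll flow from the Example Request and Polling sections can be sketched in Python as follows. This is a minimal illustration, not an official client: the helper names (`http_send`, `build_payload`, `create_prediction`, `wait_for_result`, `estimate_cost`) are hypothetical, the polling interval and attempt limit are arbitrary choices, and the sketch assumes the create response carries a `predictionID` field, as the Webhook section implies. The HTTP call is passed in as a function so the polling logic can be exercised without network access.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional

API_BASE = "https://api.eachlabs.ai/v1"
RATE_USD_PER_SEC = 0.00108  # rate quoted in the Pricing section

# Hypothetical transport signature: (method, url, json_body) -> parsed JSON.
Send = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def http_send(api_key: str) -> Send:
    """Return a transport that sends the X-API-Key header on every request."""
    def send(method: str, url: str,
             body: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        data = json.dumps(body).encode("utf-8") if body is not None else None
        req = urllib.request.Request(
            url, data=data, method=method,
            headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        )
        # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.load(resp)
    return send


def build_payload(video_url: str, **options: Any) -> Dict[str, Any]:
    """Build the create-request body; options map to Input Schema parameters."""
    if not video_url.startswith("https://"):
        raise ValueError("video must be a publicly reachable HTTPS URL")
    return {"model": "mmaudio", "input": {"video": video_url, **options}}


def create_prediction(send: Send, payload: Dict[str, Any]) -> str:
    """POST the job; assumes the response includes a 'predictionID' field."""
    return send("POST", f"{API_BASE}/prediction/", payload)["predictionID"]


def wait_for_result(send: Send, prediction_id: str, interval: float = 2.0,
                    max_polls: int = 60,
                    sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll until status is 'success' or 'error'; other statuses keep polling."""
    url = f"{API_BASE}/prediction/{prediction_id}"
    for _ in range(max_polls):
        result = send("GET", url, None)
        status = result.get("status")
        if status == "success":
            return result
        if status == "error":
            raise RuntimeError(f"{result.get('message')}: {result.get('details')}")
        sleep(interval)
    raise TimeoutError(f"prediction {prediction_id} still running after {max_polls} polls")


def estimate_cost(predict_time_seconds: float) -> float:
    """Approximate charge in USD from metrics.predict_time."""
    return predict_time_seconds * RATE_USD_PER_SEC
```

A possible call sequence is `send = http_send(api_key)`, then `pid = create_prediction(send, build_payload(video_url, duration=8))`, then `result = wait_for_result(send, pid)`. The generated file URL is in `result["output"]`, and `estimate_cost(result["metrics"]["predict_time"])` gives a rough USD figure. Handling for `429` and `5xx` responses (retry with backoff) is left out to keep the sketch short.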