Minimax Music v2

Audio · minimax-music · by Minimax

MiniMax Music 2.0 transforms text prompts into high-fidelity, diverse musical compositions, blending AI composition, sound design, and arrangement to deliver studio-quality tracks in about a minute.

Runtime (p50): 1m
Estimated price: $0.03
Call the API
prediction.sh
curl -X POST \
  -H "X-API-Key: $EACHLABS_API_KEY" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "minimax-music-v2",
    "version": "0.0.1",
    "input": {
        "prompt": "indie acoustic, wistful, reflective mood, quiet neighborhood streets, autumn chill, soft guitar echoes, warm lamplight through windows",
        "lyrics_prompt": "[verse]\nFallen leaves gather by my worn-out shoes\nThe dusk hums low, like it is humming the blues\nBreath in the cold, memories stirring slow\nFootsteps echo places I used to know\n\n[pre-chorus]\nA quiet heart, but the world still turns\nEvery window glows, while my longing burns\n\n[chorus]\nI push open the café door, warmth meets my skin\nSoft chatter, soft music, a world within\nSteam rising gently, like thoughts I can not phrase\nI sit by the window, lost in a foggy haze",
        "audio_setting": {
            "bitrate": 32000,
            "format": "mp3",
            "sample_rate": 16000
        }
    },
    "webhook_url": ""
}' \
  https://api.eachlabs.ai/v1/prediction/
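
The same request can be sketched in Python using only the standard library. The endpoint, headers, and body fields are copied from the curl example above; the helper names `build_payload` and `create_prediction` are hypothetical, and the assumption that the response body is JSON is not confirmed by this page.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.eachlabs.ai/v1/prediction/"


def build_payload(prompt, lyrics_prompt, bitrate=32000,
                  audio_format="mp3", sample_rate=16000):
    """Build the JSON body with the same fields as the curl example."""
    return {
        "model": "minimax-music-v2",
        "version": "0.0.1",
        "input": {
            "prompt": prompt,
            "lyrics_prompt": lyrics_prompt,
            "audio_setting": {
                "bitrate": bitrate,
                "format": audio_format,
                "sample_rate": sample_rate,
            },
        },
        "webhook_url": "",
    }


def create_prediction(payload, api_key):
    """POST the payload and decode the reply.

    Assumption: the service answers with a JSON body; its exact
    shape is not documented on this page.
    """
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (performs a real network call, so it is left commented out):
#   import os
#   body = build_payload("indie acoustic, wistful", "[verse]\nFallen leaves ...")
#   print(create_prediction(body, os.environ["EACHLABS_API_KEY"]))
```

Building the body as a dictionary and serializing it with `json.dumps` avoids hand-escaping newlines and quotes in the lyrics, which is the fragile part of the inline shell version.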
Documentation (8 sections)
  • Overview

    minimax-music-v2 — Text-to-Audio AI Model

    MiniMax Music 2.0, available as minimax-music-v2, turns text prompts into studio-quality tracks with vocals, lyrics, and full instrumentation, without requiring a studio or production expertise. Developed by MiniMax as part of the minimax-music family, this text-to-audio model generates songs across genres such as pop, indie, electronic, and folk, with durations of up to 5 minutes. It produces complete compositions from simple prompts or detailed lyrics, with an emphasis on vocal nuance and coherent song structure.

  • Capabilities
    • Generates full-length, studio-quality music tracks from text prompts, including both instrumental and vocal compositions
    • Supports a wide range of genres, from pop, rock, and jazz to electronic, classical, and traditional music
    • Can synthesize natural-sounding vocals in multiple languages, aligning melody and rhythm to provided lyrics
    • Offers advanced voice controls, including emotion, pitch, speed, and vocal effects (e.g., echo, robotic, lo-fi)
    • Delivers rapid generation with low latency, suitable for real-time creative workflows
    • Adapts to diverse creative needs, from background music to complete songs with custom lyrics
    • Maintains high audio fidelity and professional arrangement quality across outputs
  • Use cases

    Use Cases for minimax-music-v2

    Songwriters use minimax-music-v2 to prototype demos quickly: they supply lyrics tagged with [Intro], [Verse 1], and [Chorus] plus a style prompt such as "upbeat indie folk with acoustic guitar and harmonious vocals" and get back a polished 3-minute track ready for refinement. Content creators producing videos or podcasts use it to generate custom background music, either instrumental or full songs, relying on its separation of vocals from accompaniment to avoid muddiness that would otherwise need post-production.

    Marketers crafting brand jingles supply scenario prompts such as "energetic electronic theme for tech startup with synth leads and driving bass" and get an original sonic identity in under 2 minutes via the minimax-music-v2 API. Developers building AI music generator apps integrate its structure controls to serve musicians who need quick, high-quality output in diverse styles.
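
A minimal sketch of how tagged lyrics might be assembled before being sent as `lyrics_prompt`. The layout (a bracketed tag on its own line, lyric lines below it, a blank line between sections) mirrors the request example on this page; the helper name `build_lyrics` is hypothetical, and which tag names the model recognizes is not specified here.

```python
def build_lyrics(sections):
    """Join (tag, lines) pairs into one lyrics string.

    Each section becomes "[tag]" followed by its lines, and sections
    are separated by a blank line, as in the page's request example.
    """
    blocks = []
    for tag, lines in sections:
        blocks.append("[" + tag + "]\n" + "\n".join(lines))
    return "\n\n".join(blocks)


lyrics = build_lyrics([
    ("verse", [
        "Fallen leaves gather by my worn-out shoes",
        "The dusk hums low, like it is humming the blues",
    ]),
    ("chorus", [
        "I push open the café door, warmth meets my skin",
    ]),
])
print(lyrics)
```

Keeping sections as a list of pairs makes it easy to reorder or repeat a chorus without editing one long escaped string.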

  • Tips & tricks

    How to Use minimax-music-v2 on Eachlabs

    Access minimax-music-v2 seamlessly on Eachlabs via the Playground for instant testing with text prompts, lyrics, and style tags, or through the API and SDK for scalable integrations. Provide a music description, optional structured lyrics, and parameters like duration up to 5 minutes to receive high-fidelity audio files in supported formats, with outputs featuring crisp vocals and instrumentation in 1-2 minutes.

  • Technical spec

    What Sets minimax-music-v2 Apart

    minimax-music-v2 stands out in the text-to-audio AI model landscape through its paragraph-level precision control, enabling detailed song structures with tags like [Verse], [Chorus], and [Bridge] for coherent, professional arrangements that most competitors lack. This capability allows creators to craft full songs with exact pacing and transitions, generating tracks in 1-2 minutes that rival human productions. Unlike generic music AIs, it supports over 100 instruments with studio-grade mixing that separates vocals from accompaniment, reducing muddiness in complex arrangements for crisp, high-fidelity output in multiple audio formats.

    • Advanced vocal synthesis: Delivers smooth pitch transitions, natural vibrato, and resonance shifts for expressive, lifelike singing in 40+ languages, enabling authentic global tracks without manual editing.
    • Style-aware mixing: Automatically adapts to genres like rock or jazz, reproducing power, distortion, or warm tones with professional spatiality and dynamic range, which suits API integrations.
    • Customizable duration and structure: Handles prompts from brief ideas to full lyrics up to 5 minutes, with generation times short enough for high-volume text-to-song workflows.
  • Things to be aware of
    • Some users report that highly complex or ambiguous prompts may produce less coherent or musically focused results
    • The model’s vocal synthesis is generally praised for naturalness, but may occasionally sound synthetic or lack emotional nuance in certain languages or genres
    • Performance benchmarks indicate fast generation times, but resource requirements may increase with longer or higher-quality tracks
    • Consistency across multiple generations can vary; iterative refinement is often necessary for optimal results
    • Positive feedback highlights the model’s versatility, ease of use, and ability to quickly generate professional-sounding music
    • Common concerns include occasional artifacts in vocal tracks, limited fine-grained control over arrangement details, and the need for post-processing in some cases
    • Experimental features, such as advanced voice cloning or multi-language support, may be subject to ongoing updates and improvements
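
As a rough illustration of how track length and quality settings scale one resource, output size, the sketch below estimates the size of a constant-bitrate audio stream. It assumes the `bitrate` value in `audio_setting` is in bits per second and that encoding is constant bitrate; container headers and metadata are ignored, and the function name is hypothetical.

```python
def estimated_size_bytes(bitrate_bps, duration_seconds):
    """Approximate stream size: bits per second x seconds / 8 bits per byte."""
    return bitrate_bps * duration_seconds // 8


# The request example on this page uses 32000; a 5-minute track is 300 seconds.
print(estimated_size_bytes(32000, 5 * 60))   # 1200000 bytes, about 1.2 MB
print(estimated_size_bytes(128000, 5 * 60))  # 4800000 bytes, about 4.8 MB
```

Size grows linearly with both duration and bitrate, so quadrupling the bitrate quadruples the file size for the same track length.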
  • Key considerations
    • The quality of the generated music is highly dependent on the specificity and clarity of the input prompt; detailed prompts yield more targeted results
    • For best results, provide both a descriptive prompt and lyrics if vocal output is desired
    • Adjusting parameters such as sample rate and bitrate can impact both quality and generation speed
    • Overly vague or conflicting prompts may result in less coherent or generic outputs
    • Iterative refinement—regenerating with adjusted prompts—can significantly improve final results
    • Prompt engineering is crucial: specifying genre, mood, tempo, and instrumentation leads to more predictable outcomes
    • There is a trade-off between generation speed and output complexity; higher quality or longer tracks may take slightly longer to generate
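
One way to keep prompts specific is to build them from named descriptors. The sketch below joins genre, mood, tempo, and instrumentation into the comma-separated style seen in the request example on this page. The helper `build_style_prompt` is hypothetical, and how the model weighs each descriptor is not documented here.

```python
def build_style_prompt(genre, mood=None, tempo=None, instruments=()):
    """Join the non-empty descriptors into one comma-separated prompt."""
    parts = [genre, mood, tempo, *instruments]
    # Skip descriptors that are missing or blank.
    return ", ".join(part.strip() for part in parts if part and part.strip())


prompt = build_style_prompt(
    "indie acoustic",
    mood="wistful",
    tempo="slow tempo",
    instruments=["soft guitar echoes"],
)
print(prompt)  # indie acoustic, wistful, slow tempo, soft guitar echoes
```

Naming each descriptor makes iterative refinement easier: a single field, such as tempo, can be changed between generations while the rest of the prompt stays fixed.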
  • Limitations
    • The model may struggle with highly intricate musical structures or unconventional genres not well represented in its training data
    • Fine control over specific arrangement elements (e.g., precise instrument placement, advanced mixing) is limited compared to manual production
    • Not optimal for scenarios requiring human-level emotional depth or nuanced vocal performance in all languages and styles

FAQ

About Minimax Music v2

What is MiniMax Music v2?

MiniMax Music v2 is an AI music generation model by MiniMax that creates high-quality audio compositions from text prompts. It generates complete music tracks with vocal and instrumental elements across multiple genres, with improved audio fidelity and stylistic range over the previous version.