EACHLABS
ACE-Step 1.5 is a diffusion and language model–based text-to-music system that generates music with vocals from natural-language prompts and optional custom lyrics. It supports Chain-of-Thought reasoning for higher quality, multi-output batches, multilingual vocals, and automatic detection of BPM, musical key, and time signature. Use markers such as [verse], [chorus], [bridge], [inst], or [instrumental] to structure songs. It outputs FLAC audio with a user-defined duration and is billed per output second, with thinking mode charged at double the rate.
Avg Run Time: 30.000s
Model Slug: ace-step-1-5-text-to-music
API & SDK
Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
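A minimal sketch of the create step using only the Python standard library. The endpoint URL, header name, and field names (`X-API-Key`, `model`, `input`, `predictionID`) are assumptions for illustration; consult the each::labs API reference for the exact schema.

```python
import json
import urllib.request

API_URL = "https://api.eachlabs.ai/v1/prediction/"  # hypothetical endpoint

def build_request(prompt: str, duration: int, thinking: bool = True) -> dict:
    """Assemble the JSON body for a new prediction (field names are assumptions)."""
    return {
        "model": "ace-step-1-5-text-to-music",  # model slug from this page
        "input": {
            "prompt": prompt,
            "duration": duration,   # seconds of output audio
            "thinking": thinking,   # Chain-of-Thought mode (billed at 2x)
        },
    }

def create_prediction(api_key: str, body: dict) -> str:
    """POST the request; returns the prediction ID used to poll for the result."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["predictionID"]
```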
Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
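A long-polling loop might look like the sketch below. Again, the URL, header, and `status` values (`success`, `error`) are assumptions; substitute the real ones from the each::labs documentation.

```python
import json
import time
import urllib.request

RESULT_URL = "https://api.eachlabs.ai/v1/prediction/{id}"  # hypothetical endpoint

def is_done(status: str) -> bool:
    """True once the prediction has reached a terminal state (assumed names)."""
    return status in ("success", "error")

def poll_result(api_key: str, prediction_id: str,
                interval: float = 2.0, timeout: float = 300.0) -> dict:
    """Repeatedly GET the prediction until it finishes or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        req = urllib.request.Request(
            RESULT_URL.format(id=prediction_id),
            headers={"X-API-Key": api_key},
        )
        with urllib.request.urlopen(req) as resp:
            result = json.loads(resp.read())
        if is_done(result.get("status", "")):
            return result
        time.sleep(interval)  # back off between polls
    raise TimeoutError(f"prediction {prediction_id} not ready after {timeout}s")
```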
Readme
ACE-Step 1.5 | Text to Music Overview
ACE-Step 1.5 | Text to Music revolutionizes music creation by transforming natural-language prompts into full songs with vocals, complete with custom lyrics and structured sections. Hosted on each::labs, part of the eachlabs family, this diffusion and language model-based system stands out with its Chain-of-Thought reasoning, enabling higher-quality outputs through step-by-step musical composition logic. Users can generate professional-grade tracks in FLAC format, supporting multilingual vocals and automatic detection of BPM, key, and time signature.
Ideal for creators seeking instant music without instruments or studios, ACE-Step 1.5 | Text to Music handles everything from verses to choruses using simple markers like [verse] or [chorus]. It offers multi-output batches for efficiency and user-defined durations, billed per output second—making it a cost-effective choice for each::labs music-generation workflows. Whether prototyping ideas or producing final tracks, this model delivers coherent, structured music from text alone.
Technical Specifications
- Model Type: Diffusion + language model for text-to-music generation
- Output Format: High-fidelity FLAC audio files
- Max Duration: User-defined; longer tracks simply cost more, since billing is per second of output
- Audio Features: Automatic BPM, musical key, and time signature detection; multilingual vocals
- Batch Support: Multi-output generation for efficiency
- Reasoning Mode: Chain-of-Thought for enhanced quality (billed at double rate)
- Input: Natural-language prompts with optional custom lyrics and structure markers
- Processing Time: Varies by duration and complexity; typically seconds to minutes per track
- API Access: Available via each::labs ACE-Step 1.5 | Text to Music API
Key Considerations
Before using ACE-Step 1.5 | Text to Music, note that it is billed per output second, and Chain-of-Thought mode doubles costs for superior results: enable it for final productions, but consider basic mode for drafts. There are no prerequisites beyond an each::labs account; prompts work best in English, but vocals can be multilingual. Choose this model over simpler generators when you need structured songs with vocals and automatic analysis of musical elements.
Performance shines in creative workflows but may vary with prompt complexity. Test short durations first to optimize costs in each::labs music-generation pipelines. Ideal for users valuing vocal coherence and song structure without manual editing.
Tips & Tricks
Master ACE-Step 1.5 | Text to Music with precise prompt engineering: use markers like [verse], [chorus], and [bridge], plus [inst] or [instrumental] for instrumental sections, to structure songs explicitly. Include genre, mood, tempo hints, and custom lyrics for best results. Enable Chain-of-Thought for complex tracks to leverage step-by-step reasoning.
Optimize parameters by specifying duration upfront and starting with batches of 2-4 outputs. For multilingual vocals, prefix with language, e.g., "French pop ballad." Example prompts:
- "[intro] Soft piano [verse] Heartbreak in the rain, she left me alone [chorus] I'll never love again, 90 BPM pop ballad with female vocals"
- "[instrumental] Epic orchestral build-up to [drop] heavy EDM synths, 128 BPM, no lyrics"
- "[verse 1] Waking up early [chorus] Coffee and dreams, upbeat folk with male vocals, detect key"
Iterate by refining based on auto-detected BPM and key outputs.
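The marker convention above can be automated with a small helper that joins (marker, lyrics) pairs into one prompt string. This helper is our own illustration, not part of the each::labs API:

```python
def build_prompt(sections: list[tuple[str, str]], style: str) -> str:
    """Join (marker, text) pairs into one structured prompt string.

    Markers follow the tags this model recognizes: verse, chorus, bridge,
    inst, instrumental (plus intro/outro, as in the examples above).
    """
    parts = [f"[{marker}] {text}".strip() for marker, text in sections]
    return " ".join(parts + [style])

# Reassembles the first example prompt from the list above:
prompt = build_prompt(
    [("intro", "Soft piano"),
     ("verse", "Heartbreak in the rain, she left me alone"),
     ("chorus", "I'll never love again")],
    "90 BPM pop ballad with female vocals",
)
```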
Capabilities
- Generates complete songs with vocals from text prompts and optional custom lyrics
- Supports song structure via markers: [verse], [chorus], [bridge], [inst], [instrumental]
- Chain-of-Thought reasoning for improved musical coherence and quality
- Automatic detection of BPM, musical key, and time signature
- Multilingual vocal generation for global music creation
- Multi-output batches to produce variations efficiently
- User-defined track durations with FLAC output for professional use
- Accessible via each::labs music-generation API for seamless integration
What Can I Use It For?
Use Cases for ACE-Step 1.5 | Text to Music
Content Creators: Produce custom background tracks for YouTube videos. Example: "[intro] Calm ambient [verse] Exploring new worlds [chorus] Adventure calls, cinematic orchestral with soft vocals, 80 BPM"—leveraging auto-BPM detection for perfect sync.
Marketers: Generate branded jingles quickly. Example: "[chorus] each::labs AI magic, upbeat electronic pop 120 BPM, male rap vocals"—using structure markers for catchy hooks in ads.
Music Producers: Prototype song ideas with vocals. Enable Chain-of-Thought for "[bridge] Emotional guitar solo [outro] Fade with echoes, indie rock 100 BPM"—iterating batches for refinements.
Developers: Integrate into apps via ACE-Step 1.5 | Text to Music API. Example prompt for user-generated multilingual tracks: "Spanish flamenco [verse] Noche de pasión, detect key"—powering dynamic soundtracks.
Things to Be Aware Of
ACE-Step 1.5 | Text to Music excels with clear, structured prompts but may produce inconsistent vocals from overly abstract descriptions. Edge cases such as extreme genres (e.g., avant-garde noise) or very long durations (over 5 minutes) increase processing time and variability. A common mistake is omitting markers, which leads to unstructured outputs; always specify [verse] and [chorus].
Resource needs on your side are low, since generation runs in the each::labs cloud, but Chain-of-Thought mode doubles the per-second cost. Monitor spending carefully in iterative workflows.
Limitations
ACE-Step 1.5 | Text to Music cannot import existing audio or MIDI for remixing; it's purely text-driven. Vocals may lack perfect pitch accuracy in complex polyphony, and outputs cap at user-defined durations without infinite loops. Rare prompt ambiguities cause genre drifts. No real-time generation—processing takes time. Multilingual support varies by language prominence.
Pricing

Pricing Type: Dynamic

Current Pricing: Thinking-enabled (default): $0.0006 per output second (2x rate), multiplied by num_outputs

Pricing Rules

| Condition | Pricing |
|---|---|
| thinking matches "false" | Non-thinking mode: $0.0003 per output second, multiplied by num_outputs |
| Default (fallback) (Active) | Thinking-enabled (default): $0.0006 per output second (2x rate), multiplied by num_outputs |
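The billing rule is simple arithmetic: per-second rate times duration times number of outputs. A worked example, using the rates from the table above (the function itself is just an illustration):

```python
def estimate_cost(duration_s: float, num_outputs: int = 1,
                  thinking: bool = True) -> float:
    """Estimated cost in USD: per-second rate x duration x number of outputs."""
    rate = 0.0006 if thinking else 0.0003  # thinking mode is billed at 2x
    return rate * duration_s * num_outputs

# A 60-second track with 2 outputs and thinking enabled:
# 0.0006 * 60 * 2 = 0.072 USD
```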