ACE-Step 1.5 · Text to Music
ACE-Step 1.5 is a diffusion and language model-based text-to-music system that generates music with vocals from natural-language prompts and optional custom lyrics. It supports Chain-of-Thought reasoning (thinking mode) for higher quality, multi-output batches, multilingual vocals, and automatic detection of BPM, musical key, and time signature. Use markers such as [verse], [chorus], and [bridge] to structure songs, and [inst] or [instrumental] for instrumental sections. It outputs FLAC audio with a user-defined duration and is billed per output second, with thinking mode charged at double the rate.
- Runtime (p50): 30s
- Estimated price: Usage-based
Overview
ACE-Step 1.5 | Text to Music turns natural-language prompts into full songs with vocals, custom lyrics, and structured sections. Hosted on each::labs as part of the eachlabs family, this diffusion and language model-based system uses Chain-of-Thought reasoning to work through a composition step by step, which improves output quality. Tracks are generated in FLAC format, with multilingual vocals and automatic detection of BPM, key, and time signature.
Aimed at creators who want music without instruments or a studio, ACE-Step 1.5 | Text to Music lays out verses, choruses, and other sections from simple markers such as [verse] or [chorus]. It supports multi-output batches and user-defined durations and is billed per output second, so cost scales with the amount of audio generated in each::labs music-generation workflows. Whether you are prototyping ideas or producing final tracks, the model delivers coherent, structured music from text alone.
Capabilities
- Generates complete songs with vocals from text prompts and optional custom lyrics
- Supports song structure via markers: [verse], [chorus], [bridge], [inst], [instrumental]
- Chain-of-Thought reasoning for improved musical coherence and quality
- Automatic detection of BPM, musical key, and time signature
- Multilingual vocal generation for global music creation
- Multi-output batches to produce variations efficiently
- User-defined track durations with FLAC output for professional use
- Accessible via each::labs music-generation API for seamless integration
Use cases
Content Creators: Produce custom background tracks for YouTube videos. Example: "[intro] Calm ambient [verse] Exploring new worlds [chorus] Adventure calls, cinematic orchestral with soft vocals, 80 BPM". The automatically detected BPM helps sync the track to video edits.
Marketers: Generate branded jingles quickly. Example: "[chorus] each::labs AI magic, upbeat electronic pop 120 BPM, male rap vocals". Structure markers keep the hook front and center in ads.
Music Producers: Prototype song ideas with vocals. Enable Chain-of-Thought for a prompt such as "[bridge] Emotional guitar solo [outro] Fade with echoes, indie rock 100 BPM", then iterate over batches to refine the result.
Developers: Integrate into apps via the ACE-Step 1.5 | Text to Music API. Example prompt for user-generated multilingual tracks: "Spanish flamenco [verse] Noche de pasión, detect key". This can power dynamic soundtracks.
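For developers, the inputs described on this page (prompt, optional lyrics, duration, batch size, Chain-of-Thought toggle) can be gathered into one validated request object before calling the API. The sketch below is only an illustration: the field names (`prompt`, `lyrics`, `duration_seconds`, `batch_size`, `thinking`) are assumptions, not the documented each::labs request schema, and no network call is made.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class MusicRequest:
    """Hypothetical request shape; field names are illustrative, not the real API schema."""
    prompt: str
    lyrics: Optional[str] = None   # optional custom lyrics with structure markers
    duration_seconds: int = 30     # user-defined output length
    batch_size: int = 1            # number of variations to generate
    thinking: bool = False         # Chain-of-Thought mode (billed at double rate)

    def to_payload(self) -> dict:
        """Validate the fields and return a plain dict ready to serialize as JSON."""
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.duration_seconds <= 0:
            raise ValueError("duration_seconds must be positive")
        if self.batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        # Drop unset optional fields so the payload carries only what was chosen.
        return {key: value for key, value in asdict(self).items() if value is not None}


request = MusicRequest(
    prompt="Spanish flamenco",
    lyrics="[verse] Noche de pasión",
    duration_seconds=45,
    batch_size=2,
)
payload = request.to_payload()
print(payload)
```

Map the dictionary keys to whatever names the actual API reference specifies before sending it.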
Tips & tricks
Get the most from ACE-Step 1.5 | Text to Music with precise prompts. Structure songs explicitly with markers such as [verse], [chorus], and [bridge], and use [inst] or [instrumental] for instrumental sections. Include genre, mood, tempo hints, and custom lyrics for best results. Enable Chain-of-Thought for complex tracks to take advantage of step-by-step reasoning.
Specify the duration up front and start with batches of 2-4 outputs. For multilingual vocals, lead with the language and style, e.g., "French pop ballad." Example prompts:
- "[intro] Soft piano [verse] Heartbreak in the rain, she left me alone [chorus] I'll never love again, 90 BPM pop ballad with female vocals"
- "[instrumental] Epic orchestral build-up to [drop] heavy EDM synths, 128 BPM, no lyrics"
- "[verse 1] Waking up early [chorus] Coffee and dreams, upbeat folk with male vocals, detect key"
Iterate by refining your prompt based on the BPM and key detected for each output.
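The prompt pattern used in the examples above (marked sections followed by a style description) can be assembled programmatically. This is a minimal sketch; `compose_prompt` is a hypothetical helper, not part of any each::labs SDK, and it only builds the string.

```python
def compose_prompt(sections, style):
    """Join (marker, text) pairs into one prompt, then append the style description."""
    parts = []
    for marker, text in sections:
        # Normalize "Verse", "[verse]" or " verse " to the same marker name.
        name = marker.strip().strip("[]").lower()
        parts.append(f"[{name}] {text.strip()}")
    if not parts:
        return style.strip()
    return f"{' '.join(parts)}, {style.strip()}"


prompt = compose_prompt(
    [
        ("intro", "Soft piano"),
        ("verse", "Heartbreak in the rain, she left me alone"),
        ("chorus", "I'll never love again"),
    ],
    "90 BPM pop ballad with female vocals",
)
print(prompt)
```

The printed string matches the first example prompt above, so sections can be reordered or swapped while keeping the style description constant.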
Technical spec
- Model Type: Diffusion + language model for text-to-music generation
- Output Format: High-fidelity FLAC audio files
- Duration: User-defined; no fixed maximum is stated, and cost scales with length because billing is per second of output
- Audio Features: Automatic BPM, musical key, and time signature detection; multilingual vocals
- Batch Support: Multi-output generation for efficiency
- Reasoning Mode: Chain-of-Thought for enhanced quality (billed at double rate)
- Input: Natural-language prompts with optional custom lyrics and structure markers
- Processing Time: Varies by duration and complexity; typically seconds to minutes per track (listed p50 runtime: 30s)
- API Access: Available via each::labs ACE-Step 1.5 | Text to Music API
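Because output is FLAC and billing is per output second, it can be useful to confirm the actual length of a returned file. The sketch below reads the duration from the STREAMINFO block at the start of a FLAC stream using only the standard library. It relies on the FLAC format itself, not on anything specific to each::labs, and the demo header is synthetic rather than a real model output.

```python
def flac_duration_seconds(data: bytes) -> float:
    """Return the duration encoded in a FLAC stream's STREAMINFO block.

    Only the first 42 bytes are needed: the 4-byte "fLaC" marker, a 4-byte
    metadata block header, and the 34-byte STREAMINFO block.
    """
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream: missing 'fLaC' marker")
    if len(data) < 42:
        raise ValueError("need at least 42 bytes to read STREAMINFO")
    block_type = data[4] & 0x7F                  # low 7 bits; the top bit is the last-block flag
    block_length = int.from_bytes(data[5:8], "big")
    if block_type != 0 or block_length < 34:
        raise ValueError("first metadata block is not a valid STREAMINFO")
    info = data[8:42]
    # Bytes 10-17 pack: sample rate (20 bits), channels - 1 (3 bits),
    # bits per sample - 1 (5 bits), total samples (36 bits).
    packed = int.from_bytes(info[10:18], "big")
    sample_rate = packed >> 44
    total_samples = packed & ((1 << 36) - 1)
    if sample_rate == 0 or total_samples == 0:
        raise ValueError("stream does not declare its length")
    return total_samples / sample_rate


# Synthetic header for demonstration: 48 kHz, stereo, 16-bit, 30 seconds.
packed = (48000 << 44) | (1 << 41) | (15 << 36) | (48000 * 30)
info = (4096).to_bytes(2, "big") * 2 + bytes(6) + packed.to_bytes(8, "big") + bytes(16)
header = b"fLaC" + bytes([0x80]) + (34).to_bytes(3, "big") + info
print(flac_duration_seconds(header))  # 30.0
```

For a downloaded file, pass the first 42 bytes read in binary mode to the same function.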
Things to be aware of
ACE-Step 1.5 | Text to Music works best with clear, structured prompts; vocals may be inconsistent when descriptions are overly abstract. Edge cases such as extreme genres (e.g., avant-garde noise) or very long durations (>5 minutes) increase processing time and variability. A common mistake is omitting markers, which leads to unstructured output, so always specify sections such as [verse] and [chorus].
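The marker advice above can be turned into a quick pre-flight check on a prompt before spending credits. This is a sketch under assumptions: `check_structure` is a hypothetical helper, and the set of known markers is simply the five listed on this page. Other markers used in the example prompts, such as [intro] or [drop], are flagged as unlisted rather than rejected.

```python
import re

# Markers explicitly listed on this page; others may still work.
KNOWN_MARKERS = {"verse", "chorus", "bridge", "inst", "instrumental"}
MARKER_RE = re.compile(r"\[([^\[\]]+)\]")


def find_markers(prompt):
    """Return marker names in order, lowercased, with trailing numbers removed."""
    names = []
    for raw in MARKER_RE.findall(prompt):
        # "Verse 1" -> "verse"
        names.append(re.sub(r"\s*\d+$", "", raw.strip().lower()))
    return names


def check_structure(prompt):
    """Return a list of warnings; an empty list means nothing was flagged."""
    markers = find_markers(prompt)
    warnings = []
    if not markers:
        warnings.append("no structure markers found; output may be unstructured")
    unlisted = sorted(set(markers) - KNOWN_MARKERS)
    if unlisted:
        warnings.append("markers not listed on this page: " + ", ".join(unlisted))
    return warnings


print(check_structure("upbeat folk with male vocals"))
print(check_structure("[verse 1] Waking up early [chorus] Coffee and dreams"))
```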
Generation runs in the cloud, so local resource needs are low, and Chain-of-Thought mode does not require high-end hardware on your side. Monitor costs for iterative workflows on each::labs.
Key considerations
ACE-Step 1.5 | Text to Music is billed per output second, and Chain-of-Thought mode doubles the cost in exchange for higher quality. That makes it a good fit for final productions, while basic (non-thinking) mode is the better choice for drafts. There are no prerequisites beyond an each::labs account. Prompts work best in English, though vocals can be generated in multiple languages. Choose this model over simpler generators when you need structured songs with vocals and automatic analysis of musical elements.
Performance shines in creative workflows but may vary with prompt complexity. Test short durations first to optimize costs in each::labs music-generation pipelines. Ideal for users valuing vocal coherence and song structure without manual editing.
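The billing rule (per output second, doubled in Chain-of-Thought mode) makes it easy to compare a round of short drafts with a final render. The sketch below is an estimate only. It assumes each output in a batch is billed for its full duration, and `rate_per_second` is a placeholder you must supply, since this page lists no per-second price.

```python
def estimate_cost(duration_seconds, outputs, thinking, rate_per_second):
    """Estimate the charge: seconds x outputs x rate, doubled in thinking mode.

    rate_per_second is a caller-supplied placeholder, not a published price.
    """
    if duration_seconds <= 0 or outputs < 1 or rate_per_second < 0:
        raise ValueError("duration and outputs must be positive, rate non-negative")
    multiplier = 2 if thinking else 1
    return duration_seconds * outputs * rate_per_second * multiplier


RATE = 1.0  # one arbitrary billing unit per output second

# Four 15-second drafts in basic mode vs. one 3-minute final with Chain-of-Thought.
draft_round = estimate_cost(15, outputs=4, thinking=False, rate_per_second=RATE)
final_render = estimate_cost(180, outputs=1, thinking=True, rate_per_second=RATE)
print(draft_round, final_render)  # 60.0 360.0
```

Under these assumptions a whole round of short drafts costs a fraction of one long Chain-of-Thought render, which is the reasoning behind testing short durations first.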
Limitations
ACE-Step 1.5 | Text to Music cannot import existing audio or MIDI for remixing; it is purely text-driven. Vocals may lack pitch accuracy in complex polyphony, and each output ends at the user-defined duration; there is no endless or looping generation. Ambiguous prompts can occasionally cause genre drift. There is no real-time generation, so every request takes time to process. The quality of multilingual support varies with how prominent the language is.
---