Example input

bpm: 30
shift: 3
lyrics: "[verse 1] Where ideas spark and the future begins, Eachlabs is the place where creation wins. From text to image, from sound to light, We turn imagination into something bright. [chorus] Eachlabs, the power, the speed, the way, From a single thought to a product in a day. Build it, shape it, let it rise, Bring your vision to life before your eyes. [verse 2] Models in motion, connected as one, APIs flowing till the work is done. Creators and builders, side by side, Turning bold dreams into tools with pride."
prompt: "Energetic R&B track with smooth groove, punchy bass, modern drums, and emotional yet powerful vocals."
duration: 60
thinking: true
num_outputs: 1
infer_method: "ode"
lm_cfg_scale: 1
guidance_scale: 7
lm_temperature: 0.85
vocal_language: "unknown"
lm_negative_prompt: "NO USER INPUT"
num_inference_steps: 8
use_constrained_decoding: true

ACE-Step 1.5 · Text to Music

Audio · eachlabs · by eachlabs

ACE-Step 1.5 is a text-to-music system based on a diffusion model and a language model. It generates music with vocals from natural-language prompts and optional custom lyrics. It supports Chain-of-Thought reasoning (thinking mode) for higher quality, multi-output batches, multilingual vocals, and automatic detection of BPM, musical key, and time signature. Use markers such as [verse], [chorus], and [bridge] to structure songs, and [inst] or [instrumental] for instrumental sections. Output is FLAC audio of a user-defined duration, billed per second of output; thinking mode is charged at double the rate.

Runtime (p50): 30s
Estimated price: Usage-based

Call the API

prediction.sh

curl -X POST \
  -H "X-API-Key: $EACHLABS_API_KEY" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ace-step-1-5-text-to-music",
    "version": "0.0.1",
    "input": {
        "bpm": 30,
        "shift": 3,
        "lyrics": "[verse 1]\nWhere ideas spark and the future begins,\nEachlabs is the place where creation wins.\nFrom text to image, from sound to light,\nWe turn imagination into something bright.\n\n[chorus]\nEachlabs, the power, the speed, the way,\nFrom a single thought to a product in a day.\nBuild it, shape it, let it rise,\nBring your vision to life before your eyes.\n\n[verse 2]\nModels in motion, connected as one,\nAPIs flowing till the work is done.\nCreators and builders, side by side,\nTurning bold dreams into tools with pride.",
        "prompt": "Energetic R&B track with smooth groove, punchy bass, modern drums, and emotional yet powerful vocals.",
        "duration": 60,
        "thinking": true,
        "num_outputs": 1,
        "infer_method": "ode",
        "lm_cfg_scale": 1,
        "guidance_scale": 7,
        "lm_temperature": 0.85,
        "vocal_language": "unknown",
        "lm_negative_prompt": "NO USER INPUT",
        "num_inference_steps": 8,
        "use_constrained_decoding": true
    },
    "webhook_url": ""
}' \
  https://api.eachlabs.ai/v1/prediction/
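
The "lyrics" value in the request above is a JSON string, so line breaks have to be written as \n, and any double quotes or backslashes have to be escaped. The following is a minimal bash sketch of that escaping, assuming the lyrics start out as ordinary multi-line text. The helper name json_escape is hypothetical, not part of the eachlabs API. It handles only backslash, double quote, newline, carriage return, and tab; other control characters are left as-is, so a real JSON tool such as jq is safer for arbitrary input.

```shell
# Hypothetical helper (not part of the eachlabs API): escape a string so it
# can be placed inside a JSON string literal. Handles backslash, double
# quote, newline, carriage return and tab only.
json_escape() {
  local s=$1
  local bs='\' q='"'
  # Escape backslashes first so the escapes added below are not doubled.
  s=${s//"$bs"/"$bs$bs"}
  s=${s//"$q"/"$bs$q"}
  s=${s//$'\n'/"${bs}n"}
  s=${s//$'\r'/"${bs}r"}
  s=${s//$'\t'/"${bs}t"}
  printf '%s' "$s"
}

# Demo: multi-line lyrics become the single-line form used in the request.
lyrics=$'[verse 1]\nWhere ideas spark and the future begins,\nEachlabs is the place where creation wins.'
printf '"lyrics": "%s"\n' "$(json_escape "$lyrics")"
```

With lyrics kept in a file, lyrics_json=$(json_escape "$(cat lyrics.txt)") produces a value that can go between the quotes of "lyrics" in the --data body.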

Documentation

ACE-Step 1.5 | Text to Music Overview

ACE-Step 1.5 | Text to Music turns natural-language prompts into full songs with vocals, including custom lyrics and structured sections. Hosted on each::labs, this system combines a diffusion model with a language model and offers Chain-of-Thought reasoning, which works through the composition step by step to produce higher-quality output. Tracks are delivered as FLAC files, with multilingual vocals and automatic detection of BPM, key, and time signature.

ACE-Step 1.5 | Text to Music suits creators who need music without instruments or a studio. Song structure, from verses to choruses, is controlled with simple markers such as [verse] and [chorus]. A single request can return several outputs, and the duration is user-defined; billing is per second of output, so cost scales with track length and the number of outputs. The model works both for prototyping ideas and for producing finished tracks from text alone.
Capabilities
- Generates complete songs with vocals from text prompts and optional custom lyrics
- Supports song structure via markers: [verse], [chorus], [bridge], [inst], [instrumental]
- Chain-of-Thought reasoning for improved musical coherence and quality
- Automatic detection of BPM, musical key, and time signature
- Multilingual vocal generation
- Multi-output batches to produce variations efficiently
- User-defined track durations with FLAC output for professional use
- Accessible through the each::labs API
Use Cases for ACE-Step 1.5 | Text to Music

Content Creators: Produce custom background tracks for YouTube videos. Example: "[intro] Calm ambient [verse] Exploring new worlds [chorus] Adventure calls, cinematic orchestral with soft vocals, 80 BPM". Automatic BPM detection picks up the stated tempo, which helps the track match the video's pacing.

Marketers: Generate branded jingles quickly. Example: "[chorus] each::labs AI magic, upbeat electronic pop 120 BPM, male rap vocals". The [chorus] marker gives the ad a catchy, repeatable hook.

Music Producers: Prototype song ideas with vocals. With Chain-of-Thought enabled, try "[bridge] Emotional guitar solo [outro] Fade with echoes, indie rock 100 BPM", and generate several outputs per batch to compare and refine variations.

Developers: Integrate the model into apps through the ACE-Step 1.5 | Text to Music API. Example prompt for user-generated multilingual tracks: "Spanish flamenco [verse] Noche de pasión, detect key". Requests like this can generate soundtracks on demand.
Tips and Tricks

Structure songs explicitly with markers: [verse], [chorus], and [bridge] for song sections, and [inst] or [instrumental] for instrumental passages. Include genre, mood, tempo hints, and custom lyrics for best results. For complex tracks, enable Chain-of-Thought (the thinking parameter) to use step-by-step reasoning.
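
In the example request, markers are plain text inside the lyrics. Each label appears in square brackets on its own line, and a blank line separates sections. The bash sketch below assembles lyrics in that layout. The function name section is hypothetical. The blank-line convention is copied from the example input and is not stated anywhere as a requirement.

```shell
# Hypothetical helper: print one lyrics section in the layout used by the
# example request: "[label]" on its own line, the lyric lines, a blank line.
section() {
  local label=$1
  shift
  printf '[%s]\n' "$label"
  # Instrumental sections may have no lyric lines at all.
  if (( $# > 0 )); then
    printf '%s\n' "$@"
  fi
  printf '\n'
}

# Command substitution drops the trailing blank line after the last section.
lyrics=$(
  section "verse 1" \
    "Where ideas spark and the future begins," \
    "Eachlabs is the place where creation wins."
  section "inst"
  section "chorus" \
    "Build it, shape it, let it rise,"
)
printf '%s\n' "$lyrics"
```

The resulting string still has to be JSON-escaped before it is placed in the "lyrics" field of a request body.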

Specify the duration up front, and start with batches of 2-4 outputs. For multilingual vocals, name the language at the start of the prompt, e.g., "French pop ballad". Example prompts:
- "[intro] Soft piano [verse] Heartbreak in the rain, she left me alone [chorus] I'll never love again, 90 BPM pop ballad with female vocals"
- "[instrumental] Epic orchestral build-up to [drop] heavy EDM synths, 128 BPM, no lyrics"
- "[verse 1] Waking up early [chorus] Coffee and dreams, upbeat folk with male vocals, detect key"
Iterate by refining prompts based on the BPM and key the model detects.
Technical Specifications
- Model Type: Diffusion + language model for text-to-music generation
- Output Format: High-fidelity FLAC audio files
- Duration: User-defined (no maximum is stated on this page); billed per second of output
- Audio Features: Automatic BPM, musical key, and time signature detection; multilingual vocals
- Batch Support: Multi-output generation for efficiency
- Reasoning Mode: Chain-of-Thought for enhanced quality (billed at double rate)
- Input: Natural-language prompts with optional custom lyrics and structure markers
- Processing Time: Varies with duration and complexity; typically seconds to minutes per track (p50 runtime: 30s)
- API Access: each::labs API, model "ace-step-1-5-text-to-music"
Things to Be Aware Of

ACE-Step 1.5 | Text to Music works best with clear, structured prompts and may produce inconsistent vocals from overly abstract descriptions. Edge cases, such as extreme genres (e.g., avant-garde noise) or very long durations (over 5 minutes), increase processing time and variability. A common mistake is omitting structure markers, which leads to unstructured output. Always mark sections such as [verse] and [chorus].
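
Omitting markers is a common cause of unstructured output, so a local check can catch the problem before a billed request is sent. The bash sketch below assumes the documented markers [verse], [chorus], [bridge], [inst], and [instrumental], optionally followed by a number as in [verse 1]. The function name is hypothetical. Other labels that appear in the example prompts, such as [intro] or [outro], are deliberately not counted.

```shell
# Hypothetical pre-flight check: exit status 0 if the lyrics contain at least
# one documented structure marker, e.g. [verse], [Chorus 2], [instrumental].
has_structure_markers() {
  printf '%s\n' "$1" |
    grep -Eiq '\[(verse|chorus|bridge|inst|instrumental)( [0-9]+)?]'
}

lyrics=$'[verse 1]\nWhere ideas spark and the future begins,'
if ! has_structure_markers "$lyrics"; then
  echo "warning: no [verse]/[chorus]-style markers found in lyrics" >&2
fi
```

The match is case-insensitive (grep -i), so [Chorus] passes too. Whether the model itself treats marker case the same way is not stated.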

Generation runs in the cloud on each::labs, so no local hardware is needed, even for the more demanding Chain-of-Thought mode. Monitor costs in iterative workflows, since thinking mode is billed at double rate.
Key Considerations

Before using ACE-Step 1.5 | Text to Music, note that it is billed per second of output and that Chain-of-Thought mode doubles the cost in exchange for higher quality. That makes thinking mode a good fit for final productions, while basic mode is cheaper for drafts. The only prerequisite is an each::labs account. Prompts work best in English, though vocals can be generated in multiple languages. Choose this model over simpler generators when you need structured songs with vocals and automatic analysis of musical elements.
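
Billing is per second of output, and thinking mode doubles the rate. The bash sketch below makes a rough estimate of billable seconds. It assumes that every output in a batch is billed for its full duration and that the doubling applies to every output second. This page does not list a per-second price (only "Usage-based"), so multiply the result by the rate shown for your each::labs account. The function name is hypothetical, and it handles whole-second durations only.

```shell
# Hypothetical estimate of billable output seconds for one request.
# Assumes: each output is billed for its full duration, and
# thinking mode counts every output second twice (double rate).
billed_output_seconds() {
  local duration=$1 num_outputs=$2 thinking=$3
  local seconds=$(( duration * num_outputs ))
  if [ "$thinking" = "true" ]; then
    seconds=$(( seconds * 2 ))
  fi
  printf '%s\n' "$seconds"
}

# Example input on this page: duration 60, num_outputs 1, thinking true.
billed_output_seconds 60 1 true    # prints 120
# A cheaper draft: 15 seconds, one output, thinking off.
billed_output_seconds 15 1 false   # prints 15
```

Under these assumptions, a batch of four 30-second drafts without thinking costs the same number of billable seconds as the single 60-second example with thinking enabled.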

Results can vary with prompt complexity. Test short durations first to keep costs down while iterating in each::labs music-generation pipelines. The model suits users who value coherent vocals and song structure without manual editing.
Limitations

ACE-Step 1.5 | Text to Music cannot take existing audio or MIDI as input for remixing; it is purely text-driven. Vocal pitch accuracy may suffer in complex polyphonic passages. Each output ends at the user-defined duration, and continuous or endlessly looping generation is not supported. Ambiguous prompts occasionally cause the genre to drift. Generation is not real-time and takes time to process. Multilingual quality varies by language, with more widely represented languages generally handled better.
---