Eachlabs | AI Workflows for app builders
ace-step-1.5-text-to-music

EACHLABS

ACE-Step 1.5 is a diffusion and language model–based text-to-music system that generates music with vocals from natural-language prompts and optional custom lyrics. It supports Chain-of-Thought reasoning for higher quality, multi-output batches, multilingual vocals, and automatic detection of BPM, musical key, and time signature. Songs can be structured with markers such as [verse], [chorus], and [bridge], plus [inst] or [instrumental] for instrumental passages. The model outputs FLAC audio of user-defined duration and is billed per output second, with thinking mode charged at double the rate.

Avg Run Time: 30.000s

Model Slug: ace-step-1-5-text-to-music


API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
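As a sketch of that request body, the helper below assembles the JSON payload for a create-prediction call. The endpoint URL, header name, and input field names (`duration`, `num_outputs`, `thinking`, `lyrics`) are assumptions based on the parameters described on this page; check the each::labs API reference for the exact shape.

```python
# Hypothetical endpoint and auth header -- verify against the each::labs API docs.
API_URL = "https://api.eachlabs.ai/v1/prediction/"  # assumed base URL
HEADERS = {"X-API-Key": "YOUR_API_KEY"}             # assumed header name

def build_prediction_request(prompt, lyrics=None, duration=60,
                             num_outputs=1, thinking=True):
    """Assemble the JSON body for a create-prediction call.

    Field names mirror the parameters described on this page but are
    illustrative; the real API may name them differently.
    """
    inputs = {
        "prompt": prompt,            # natural-language prompt with structure markers
        "duration": duration,        # output length in seconds (billed per second)
        "num_outputs": num_outputs,  # batch size; cost scales with this
        "thinking": thinking,        # Chain-of-Thought mode, billed at 2x
    }
    if lyrics:
        inputs["lyrics"] = lyrics    # optional custom lyrics
    return {"model": "ace-step-1-5-text-to-music", "input": inputs}
```

The returned dict would then be sent as the JSON body of the POST (for example with `requests.post(API_URL, json=body, headers=HEADERS)`), and the response's prediction ID saved for the result-polling step.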

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
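The polling loop can be sketched as below. The status values (`"success"`, `"error"`) and the idea of injecting the fetch call are illustrative; only the repeat-until-success behavior comes from this page.

```python
import time

def poll_prediction(fetch_status, interval=2.0, timeout=300.0):
    """Repeatedly check a prediction until it finishes or the timeout expires.

    fetch_status is any zero-argument callable returning a dict with a
    "status" key (e.g. a GET against the prediction endpoint with your
    prediction ID); it is injected here so the loop itself needs no network.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = fetch_status()
        if result.get("status") == "success":
            return result  # expected to contain the FLAC output URL(s)
        if result.get("status") == "error":  # assumed failure status name
            raise RuntimeError(f"prediction failed: {result}")
        time.sleep(interval)  # wait before the next check
    raise TimeoutError("prediction did not finish in time")
```

In practice `fetch_status` would wrap a GET request to the prediction endpoint using the ID returned by the create call.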

Readme

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview


ACE-Step 1.5 | Text to Music revolutionizes music creation by transforming natural-language prompts into full songs with vocals, complete with custom lyrics and structured sections. Hosted on each::labs, part of the eachlabs family, this diffusion and language model-based system stands out with its Chain-of-Thought reasoning, enabling higher-quality outputs through step-by-step musical composition logic. Users can generate professional-grade tracks in FLAC format, supporting multilingual vocals and automatic detection of BPM, key, and time signature.

Ideal for creators seeking instant music without instruments or studios, ACE-Step 1.5 | Text to Music handles everything from verses to choruses using simple markers like [verse] or [chorus]. It offers multi-output batches for efficiency and user-defined durations, billed per output second—making it a cost-effective choice for each::labs music-generation workflows. Whether prototyping ideas or producing final tracks, this model delivers coherent, structured music from text alone.

Technical Specifications

  • Model Type: Diffusion + language model for text-to-music generation
  • Output Format: High-fidelity FLAC audio files
  • Max Duration: User-defined, up to practical limits based on billing (per second of output)
  • Audio Features: Automatic BPM, musical key, and time signature detection; multilingual vocals
  • Batch Support: Multi-output generation for efficiency
  • Reasoning Mode: Chain-of-Thought for enhanced quality (billed at double rate)
  • Input: Natural-language prompts with optional custom lyrics and structure markers
  • Processing Time: Varies by duration and complexity; typically seconds to minutes per track
  • API Access: Available via each::labs ACE-Step 1.5 | Text to Music API

Key Considerations


Before using ACE-Step 1.5 | Text to Music, note that it is billed per output second, with Chain-of-Thought mode doubling costs for superior results; that makes it a good fit for final productions, while basic mode suits drafts. There are no prerequisites beyond an each::labs account; prompts work best in English, but multilingual vocals are supported. Choose this model over simpler generators when you need structured songs with vocals and automatic analysis of musical elements.

Performance shines in creative workflows but may vary with prompt complexity. Test short durations first to optimize costs in each::labs music-generation pipelines. Ideal for users valuing vocal coherence and song structure without manual editing.

Tips & Tricks


Master ACE-Step 1.5 | Text to Music with precise prompt engineering: use markers such as [verse], [chorus], and [bridge] for song sections, and [inst] or [instrumental] for instrumental passages, to structure songs explicitly. Include genre, mood, tempo hints, and custom lyrics for best results. Enable Chain-of-Thought for complex tracks to leverage step-by-step reasoning.

Optimize parameters by specifying duration upfront and starting with batches of 2-4 outputs. For multilingual vocals, prefix with language, e.g., "French pop ballad." Example prompts:

  • "[intro] Soft piano [verse] Heartbreak in the rain, she left me alone [chorus] I'll never love again, 90 BPM pop ballad with female vocals"
  • "[instrumental] Epic orchestral build-up to [drop] heavy EDM synths, 128 BPM, no lyrics"
  • "[verse 1] Waking up early [chorus] Coffee and dreams, upbeat folk with male vocals, detect key"

Iterate by refining based on auto-detected BPM and key outputs.
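The marker convention in the examples above can be wrapped in a small helper so prompts stay consistently structured across iterations. This is purely an illustrative client-side utility, not part of the API:

```python
def build_prompt(sections, style=""):
    """Join (marker, text) pairs into a single marker-structured prompt.

    sections: list of tuples like ("verse", "Heartbreak in the rain").
    style: trailing free-text hints (genre, BPM, vocal type).
    """
    parts = [f"[{marker}] {text}".strip() for marker, text in sections]
    if style:
        parts.append(style)
    return " ".join(parts)
```

For example, `build_prompt([("intro", "Soft piano"), ("verse", "Heartbreak in the rain")], "90 BPM pop ballad with female vocals")` reproduces the shape of the first sample prompt above.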

Capabilities

  • Generates complete songs with vocals from text prompts and optional custom lyrics
  • Supports song structure via markers: [verse], [chorus], [bridge], [inst], [instrumental]
  • Chain-of-Thought reasoning for improved musical coherence and quality
  • Automatic detection of BPM, musical key, and time signature
  • Multilingual vocal generation for global music creation
  • Multi-output batches to produce variations efficiently
  • User-defined track durations with FLAC output for professional use
  • Accessible via each::labs music-generation API for seamless integration

What Can I Use It For?


Content Creators: Produce custom background tracks for YouTube videos. Example: "[intro] Calm ambient [verse] Exploring new worlds [chorus] Adventure calls, cinematic orchestral with soft vocals, 80 BPM"—leveraging auto-BPM detection for perfect sync.

Marketers: Generate branded jingles quickly. Example: "[chorus] each::labs AI magic, upbeat electronic pop 120 BPM, male rap vocals"—using structure markers for catchy hooks in ads.

Music Producers: Prototype song ideas with vocals. Enable Chain-of-Thought for "[bridge] Emotional guitar solo [outro] Fade with echoes, indie rock 100 BPM"—iterating batches for refinements.

Developers: Integrate into apps via ACE-Step 1.5 | Text to Music API. Example prompt for user-generated multilingual tracks: "Spanish flamenco [verse] Noche de pasión, detect key"—powering dynamic soundtracks.

Things to Be Aware Of


ACE-Step 1.5 | Text to Music excels with clear, structured prompts but may produce inconsistent vocals from overly abstract descriptions. Edge cases such as extreme genres (e.g., avant-garde noise) or very long durations (over 5 minutes) increase processing time and variability. A common mistake is omitting markers, which leads to unstructured outputs; always specify [verse]/[chorus].

Resource needs on your side are low, since generation runs in the each::labs cloud, but Chain-of-Thought mode doubles per-second billing. Monitor costs in iterative workflows.

Limitations


ACE-Step 1.5 | Text to Music cannot import existing audio or MIDI for remixing; it's purely text-driven. Vocals may lack perfect pitch accuracy in complex polyphony, and outputs cap at user-defined durations without infinite loops. Rare prompt ambiguities cause genre drifts. No real-time generation—processing takes time. Multilingual support varies by language prominence.

---

Pricing

Pricing Type: Dynamic


Current Pricing

Thinking-enabled (default): $0.0006 per output second (2x rate), multiplied by num_outputs
Using default pricing (no specific rule matched)

Pricing Rules

  • Condition: thinking matches "false" → Non-thinking mode: $0.0003 per output second, multiplied by num_outputs
  • Condition: Default (fallback, currently active) → Thinking-enabled (default): $0.0006 per output second (2x rate), multiplied by num_outputs
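These rules reduce to a simple formula: per-second rate × duration × batch size. A minimal cost estimator using the rates listed above:

```python
RATE_THINKING = 0.0006      # USD per output second, Chain-of-Thought enabled (2x rate)
RATE_NON_THINKING = 0.0003  # USD per output second, thinking disabled

def estimate_cost(duration_s, num_outputs=1, thinking=True):
    """Estimated charge: per-second rate x output duration x num_outputs."""
    rate = RATE_THINKING if thinking else RATE_NON_THINKING
    return rate * duration_s * num_outputs
```

For example, a 60-second track in the default thinking mode with 2 outputs costs 0.0006 × 60 × 2 = $0.072, while the same request with thinking disabled costs half that.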