Eachlabs | AI Workflows for app builders

kling-voice by Kuaishou — AI Model Family

The kling-voice family comprises advanced AI models from Kuaishou Technology, specializing in Voice Create capabilities integrated with video generation. These models produce native audio (realistic voices, dialogue, sound effects, and ambient tones) synchronized with visuals, removing the need for separate audio post-production. The family sits within the broader Kling ecosystem: Kling is the core model powering Voice to Text and related audio features, with specialized tools such as lipsync and multilingual speech across versions including Kling 3.0, 2.6, and O1.

kling-voice Capabilities and Use Cases

The kling-voice family excels in Voice Create (Voice to Text), transforming text prompts into synchronized speech, emotional tones, and full soundscapes within video outputs. Key models include Kling 3.0, which supports multilingual speech in languages like Chinese, English, Japanese, Korean, and Spanish, with precise control over tone and dialogue structure. Kling 2.6 T2V Pro and I2V Pro introduce native audio generation for voices, sound effects, ambience, and emotional cues in a single pass, while Kling Lipsync (a video-to-video model) applies realistic lip movements to uploaded character videos using custom audio.

Use cases span cinematic storytelling, character animation, and social media content:

  • Narrative videos: Generate multi-shot scenes with dialogue-driven plots, ideal for short films or ads.
  • Character-driven clips: Animate avatars speaking custom lines with emotional depth.
  • Ambient-enhanced visuals: Add background sounds like traffic or nature to match scene mood.

A realistic example using Kling 3.0: "Dad (excited tone, English): 'Look what I found!' Mom (calm, Spanish): '¡Qué maravilloso!' with faint ocean waves in the background." This prompt produces a video where characters lip-sync perfectly, emotions align with delivery, and audio syncs natively.

Models integrate into pipelines: start with Kling I2V Pro to animate an image into a sequence with native audio from kling-voice, then refine with Kling Lipsync for custom voiceovers, or chain into Kling O1 for high-control editing. Technical specs include 1080p resolution (up to 4K at 60 FPS in Kling 3.0), support for longer clips (6-15 seconds), multi-shot consistency, and modes covering text-to-video (T2V), image-to-video (I2V), and video-to-video (V2V).
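The chaining described above can be sketched as a simple three-stage pipeline. Note that this is illustrative only: the function names, model identifiers, and payload fields are assumptions, not the each::labs SDK, and `run_stage` stands in for the actual API calls.

```python
# Hypothetical sketch of chaining kling-voice stages. Stage names mirror the
# models described in the text; everything else is illustrative.

def run_stage(model: str, inputs: dict) -> dict:
    """Placeholder for an API call: tags the inputs with the model that
    would process them, so the chain's data flow stays visible."""
    return {"model": model, **inputs}

def voice_pipeline(image_url: str, script: str, custom_audio_url: str) -> list:
    """Chain: I2V Pro (animate + native audio) -> Lipsync (custom voiceover)
    -> O1 (high-control editing)."""
    steps = []
    # Stage 1: animate the image with native audio generation.
    i2v = run_stage("kling-2.6-i2v-pro", {"image": image_url, "prompt": script})
    steps.append(i2v)
    # Stage 2: apply a custom voiceover with realistic lip movements.
    lipsync = run_stage("kling-lipsync", {"video": i2v, "audio": custom_audio_url})
    steps.append(lipsync)
    # Stage 3: refine with high-control editing.
    steps.append(run_stage("kling-o1", {"video": lipsync}))
    return steps
```

Each stage's output feeds the next stage's input, matching the I2V-then-V2V flow the family supports.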

What Makes kling-voice Stand Out

kling-voice distinguishes itself through native audio synthesis embedded in video generation, eliminating post-production friction by producing synchronized voices, SFX, and ambiences in one step—unlike traditional workflows requiring separate TTS or editing tools. Strengths include exceptional lip-sync accuracy, emotional tone control (e.g., calm, excited, sad), and character consistency across multi-shot scenes, with Kling 3.0 leading in realistic emotions and multilingual support.

Key features:

  • Prompt-driven audio: Structure like "<who> (<tone, language>) <dialogue>" for precise control.
  • Cinematic quality: High-fidelity motion, physics simulation, and 1080p+ outputs with smooth transitions.
  • Speed and coherence: Faster iterations for complex scenes, preserving identity in longer segments.
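The prompt structure above lends itself to programmatic assembly. The following sketch builds prompts in the "&lt;who&gt; (&lt;tone, language&gt;) &lt;dialogue&gt;" shape; the exact formatting (e.g., whether "tone" is spelled out) is an assumption based on the example prompt, not a documented grammar.

```python
def dialogue_line(who, tone, language, dialogue):
    """Format one line in the "<who> (<tone, language>) <dialogue>" structure."""
    return f"{who} ({tone} tone, {language}): '{dialogue}'"

def build_prompt(lines, ambience=""):
    """Join dialogue lines and optionally append an ambient-sound cue,
    mirroring the "faint ocean waves" example."""
    parts = [dialogue_line(*line) for line in lines]
    if ambience:
        parts.append(f"with {ambience} in the background")
    return " ".join(parts)
```

For instance, `build_prompt([("Dad", "excited", "English", "Look what I found!")], ambience="faint ocean waves")` yields a single prompt string combining the dialogue and the ambience cue.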

This family shines for filmmakers, content creators, agencies, and social media producers needing quick, professional-grade audio-visual drafts. Its Omni mode combines elements like characters and scenes for intricate narratives, while reference-driven consistency ensures stable results from images or videos.

Access kling-voice Models via each::labs API

each::labs is the premier platform for accessing the full kling-voice family through a unified API, bringing Kuaishou's cutting-edge models to developers and creators worldwide. All variants—Kling 3.0, 2.6 Pro, Lipsync, and more—are available in one seamless interface, supporting T2V, I2V, and V2V with native audio.

Experiment instantly in the each::labs Playground for prompt testing and previews, or integrate via SDK for production pipelines. Scale effortlessly with API credits tailored for high-volume use. Sign up to explore the full kling-voice model family on each::labs.
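As a rough sketch of what an SDK integration might assemble, the snippet below builds (but does not send) a generation request. The base URL, endpoint path, header names, and payload schema here are all assumptions for illustration; consult the each::labs API documentation for the real interface.

```python
import json

# Assumed base URL for illustration only; not the documented endpoint.
API_BASE = "https://api.eachlabs.ai"

def build_generation_request(model: str, prompt: str, api_key: str) -> dict:
    """Assemble the pieces of a hypothetical kling-voice generation call
    without sending it, so the payload shape can be inspected."""
    return {
        "url": f"{API_BASE}/v1/predictions",  # assumed path
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"model": model, "input": {"prompt": prompt}}),
    }
```

Separating payload construction from transport like this also makes prompt-building logic easy to unit-test before spending API credits.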