Kling v1 Pro Text to Video

kling-v1-pro-text-to-video

Kling v1 Pro Text to Video converts written text into high-quality videos with stable and consistent results.

Fast Inference

REST API


Model Information

Response Time: ~220 sec

Status: Active

Version: 0.0.1

Updated: about 13 hours ago

Live Demo

Average runtime: ~220 seconds

Input

Prompt: A stylish man walks down a Tokyo street filled with warm glowing neon and animated city signage. He wears a tailored black leather jacket over a dark turtleneck, slim-fit charcoal trousers, and polished black Chelsea boots. He carries a sleek black crossbody bag.

Duration: The duration of the generated video in seconds.

Each execution costs $0.49. With $1 you can run this model about 2 times.
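
The "about 2 times" figure is the whole number of runs that fit in the budget. The sketch below reproduces that arithmetic, assuming the $0.49 price quoted above stays fixed and that only whole executions are billed; `runs_for_budget` is a hypothetical helper, not part of the Eachlabs API.

```python
from decimal import Decimal

PRICE_PER_RUN = Decimal("0.49")  # per-execution price quoted on this page


def runs_for_budget(budget_usd: str) -> int:
    """Return how many whole executions a budget covers (hypothetical helper)."""
    # Decimal avoids binary floating-point rounding on currency amounts.
    return int(Decimal(budget_usd) // PRICE_PER_RUN)


print(runs_for_budget("1.00"))   # 2
print(runs_for_budget("10.00"))  # 20
```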

API Reference


Prerequisites

Create an API Key from the Eachlabs Console
Install the required dependencies for your chosen language (e.g., requests for Python)

API Integration Steps

1. Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.

import requests
import time

API_KEY = "YOUR_API_KEY"  # Replace with your API key
HEADERS = {
    "X-API-Key": API_KEY,
    "Content-Type": "application/json"
}

def create_prediction():
    """Create a prediction and return its ID."""
    response = requests.post(
        "https://api.eachlabs.ai/v1/prediction/",
        headers=HEADERS,
        json={
            "model": "kling-v1-pro-text-to-video",
            "version": "0.0.1",
            "input": {
                "cfg_scale": 0.5,
                "negative_prompt": "blur, distort, and low quality",
                "aspect_ratio": "16:9",
                "duration": 5,
                "prompt": "your prompt here"
            },
            "webhook_url": ""
        }
    )
    prediction = response.json()

    # Any status other than "success" means the prediction was not created
    if prediction["status"] != "success":
        raise Exception(f"Prediction failed: {prediction}")

    return prediction["predictionID"]

2. Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. Repeat the request until the status is success, and stop if the status is error.

def get_prediction(prediction_id):
    while True:
        result = requests.get(
            f"https://api.eachlabs.ai/v1/prediction/{prediction_id}",
            headers=HEADERS
        ).json()
        
        if result["status"] == "success":
            return result
        elif result["status"] == "error":
            raise Exception(f"Prediction failed: {result}")
        
        time.sleep(1)  # Wait before polling again
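
The loop above never gives up, so a stuck prediction would block the caller indefinitely. The following is a minimal sketch of the same polling logic with a deadline, assuming that any status other than success or error means the prediction is still pending. `wait_for_prediction` and its `fetch`, `sleep`, and `clock` parameters are hypothetical names introduced here for illustration; `fetch` stands in for the GET request shown above so the logic can be exercised without network access.

```python
import time


def wait_for_prediction(fetch, prediction_id, interval=5.0, timeout=600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until the prediction succeeds or fails, or until `timeout` seconds pass.

    `fetch(prediction_id)` must return the decoded JSON dict from the
    prediction endpoint (assumption: it wraps the GET request shown above).
    """
    deadline = clock() + timeout
    while True:
        result = fetch(prediction_id)
        status = result.get("status")
        if status == "success":
            return result
        if status == "error":
            raise RuntimeError(f"Prediction failed: {result}")
        # Any other status is treated as "still running" (assumption).
        if clock() >= deadline:
            raise TimeoutError(
                f"Prediction {prediction_id} not ready after {timeout}s"
            )
        sleep(interval)
```

With the ~220 second response time quoted on this page, a 5 second interval means roughly 44 status checks per prediction, and the 600 second default timeout leaves generous headroom; both defaults are illustrative choices, not documented values.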

3. Complete Example

Here's a complete example that puts it all together, including basic error handling and result processing. It creates a prediction, waits for the result, and prints the output URL and processing time.

try:
    # Create prediction
    prediction_id = create_prediction()
    print(f"Prediction created: {prediction_id}")
    
    # Get result
    result = get_prediction(prediction_id)
    print(f"Output URL: {result['output']}")
    print(f"Processing time: {result['metrics']['predict_time']}s")
except Exception as e:
    print(f"Error: {e}")

Additional Information

The API uses a two-step process: create prediction and poll for results
Response time: ~220 seconds
Rate limit: 60 requests/minute
Concurrent requests: 10 maximum
Use long-polling to check prediction status until completion
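
The rate limit and the concurrency limit together constrain how often each prediction can be polled. The sketch below works through that arithmetic, assuming that status checks count toward the 60 requests/minute limit and that the limit is shared by all predictions under one API key; neither assumption is confirmed on this page. Both function names are hypothetical.

```python
import math


def min_poll_interval(active_predictions: int,
                      rate_limit_per_minute: int = 60) -> float:
    """Smallest per-prediction polling interval, in seconds, that keeps the
    combined polling rate at or below the rate limit."""
    if active_predictions < 1:
        raise ValueError("active_predictions must be at least 1")
    return active_predictions * 60.0 / rate_limit_per_minute


def expected_polls(response_time_s: float, interval_s: float) -> int:
    """Approximate number of status checks needed for one prediction."""
    return math.ceil(response_time_s / interval_s)


print(min_poll_interval(1))     # 1.0
print(min_poll_interval(10))    # 10.0
print(expected_polls(220, 10))  # 22
```

Under these assumptions, running the maximum of 10 concurrent predictions calls for polling each one no more than once every 10 seconds, which still leaves no room for the create requests themselves, so a somewhat longer interval is the safer choice.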

Overview

Kling v1 Pro Text to Video is a generative video model designed to convert natural language descriptions into coherent short video clips. It allows users to define the duration, aspect ratio, and visual elements of the resulting video using a prompt-based interface. The model focuses on temporal coherence, smooth motion, and accurate representation of described scenes.

Technical Specifications

Kling v1 Pro Text to Video uses a diffusion-based video generation framework optimized for short-form synthesis.

Video generation maintains temporal consistency with keyframe stabilization over multiple frames.

The model is optimized for rendering fluid motion, camera stability, and visual fidelity in short sequences.

Kling v1 Pro Text to Video supports horizontal (16:9), vertical (9:16), and square (1:1) outputs, with internal frame interpolation to keep motion smooth.

The model accepts natural-language prompts in English and can recognize various object classes, environments, and actions.

Key Considerations

Prompts must be concise and direct. Overly long or poetic descriptions may lead to abstract or distorted results.

Video outputs are limited to predefined durations (5 or 10 seconds) and cannot be extended beyond this range.

Kling v1 Pro Text to Video is not intended for use cases requiring facial accuracy, lip synchronization, or dialogue.

Adding a negative prompt can improve results by suppressing distortions and unwanted objects.

Output resolution and frame rate are fixed and cannot be customized at this stage.

Legal Information for Kling v1 Pro Text to Video

By using Kling v1 Pro Text to Video, you agree to:

Kling Privacy
Kling Service Agreement

Tips & Tricks

Prompt: Use visually rich but concise language. Example:
“A futuristic city skyline at sunset with flying cars”
Avoid: “The most amazing futuristic scene ever imagined”
✔️ Include lighting conditions, objects, actions, and style (e.g., realistic, cinematic).
✖️ Avoid vague adjectives without context.
CFG Scale (0–1):
- Values around 0.7–0.9 are optimal for balancing prompt fidelity with creativity.
- Lower values (0.3–0.6) may yield more abstract or loosely interpreted results.
- Higher values (close to 1.0) generate literal interpretations but may reduce visual diversity.
Negative Prompt: Use this to suppress unwanted elements.
Example: “blurry, distorted, out of frame” can help refine output.
Aspect Ratio:
- 16:9: Ideal for web or desktop use.
- 9:16: Best for mobile or social media visuals.
- 1:1: Suitable for avatars or square-format content.
Duration:
- 5: Quick preview or short scene. Faster rendering.
- 10: Longer scene with more motion; may contain more content variation.
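
The parameter ranges listed above can be checked on the client before a request is sent, which avoids paying for a call that would be rejected or misinterpreted. This is a minimal sketch, assuming the allowed values are exactly those named on this page (CFG scale 0 to 1, aspect ratios 16:9, 9:16, and 1:1, durations 5 and 10); `build_input` is a hypothetical helper, and the API may accept or reject other values.

```python
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}  # values listed on this page
ALLOWED_DURATIONS = {5, 10}                      # seconds


def build_input(prompt, duration=5, aspect_ratio="16:9",
                cfg_scale=0.5, negative_prompt=""):
    """Validate parameters and return the "input" dict for a prediction request."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt must not be empty")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {sorted(ALLOWED_DURATIONS)}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(
            f"aspect_ratio must be one of {sorted(ALLOWED_ASPECT_RATIOS)}"
        )
    if not 0.0 <= cfg_scale <= 1.0:
        raise ValueError("cfg_scale must be between 0 and 1")
    return {
        "cfg_scale": cfg_scale,
        "negative_prompt": negative_prompt,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "prompt": prompt.strip(),
    }


payload = build_input(
    "A futuristic city skyline at sunset with flying cars",
    duration=10,
    aspect_ratio="9:16",
    cfg_scale=0.8,
    negative_prompt="blurry, distorted, out of frame",
)
print(payload["duration"], payload["aspect_ratio"])  # 10 9:16
```

The returned dict has the same keys as the "input" object in the create-prediction request, so it can be passed there directly.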

Capabilities

Generates short-form video clips from English-language text prompts.

Supports basic scene animation such as object motion, environment panning, and atmospheric changes.

Maintains temporal consistency for subjects in motion across frames.

Compatible with various prompt styles, including cinematic, realistic, abstract, or stylized.

Allows suppression of unwanted visual elements through negative prompts.

What can I use it for?

Creating visual concepts or mood boards from text.

Visualizing creative ideas for short video formats.

Designing social media visuals or visual references for design and storytelling.

Rapid prototyping of motion scenes for creative projects or pitch decks.

Things to be aware of

Try describing an action paired with an environment:
"A robot walking through a neon-lit alley at night"

Experiment with negative prompts to reduce common issues like blur:
"blurry, low contrast, disfigured"

Test different aspect ratios for different publishing formats:
"16:9" for widescreen, "9:16" for vertical video.

Limitations

Does not support text overlays or subtitles within generated video.

Faces, fine object details, or small text elements may appear distorted.

No direct control over background music, audio, or frame rate.

Cannot depict complex multi-shot storytelling or scene transitions.

Lighting and color rendering may vary across outputs.

Output Format: MP4
