Kling v2 Text to Video

kling-v2-text-to-video

Model Information

Response Time: ~340 sec
Status: Active
Version: 0.0.1
Updated: 6 days ago

Prerequisites

  • Create an API Key from the Eachlabs Console
  • Install the required dependencies for your chosen language (e.g., requests for Python)

API Integration Steps

1. Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.

import requests
import time

API_KEY = "YOUR_API_KEY"  # Replace with your API key
HEADERS = {
    "X-API-Key": API_KEY,
    "Content-Type": "application/json",
}


def create_prediction():
    """Submit a generation request and return the prediction ID."""
    response = requests.post(
        "https://api.eachlabs.ai/v1/prediction/",
        headers=HEADERS,
        json={
            "model": "kling-v2-text-to-video",
            "version": "0.0.1",
            "input": {
                "cfg_scale": 0.5,
                "negative_prompt": "your negative prompt here",
                "aspect_ratio": "16:9",
                "duration": 5,
                "prompt": "your prompt here",
            },
            "webhook_url": "",
        },
    )
    prediction = response.json()
    if prediction["status"] != "success":
        raise Exception(f"Prediction failed: {prediction}")
    return prediction["predictionID"]

2. Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.

def get_prediction(prediction_id):
    """Poll the prediction endpoint until it reports success or error."""
    while True:
        result = requests.get(
            f"https://api.eachlabs.ai/v1/prediction/{prediction_id}",
            headers=HEADERS,
        ).json()
        if result["status"] == "success":
            return result
        elif result["status"] == "error":
            raise Exception(f"Prediction failed: {result}")
        time.sleep(1)  # Wait before polling again
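
The loop above polls once per second and never gives up. With a response time of roughly 340 seconds, a slower interval and an overall timeout may be a better fit. The following is a minimal sketch, not part of the Eachlabs API: `wait_for_result`, `fetch`, `interval`, and `timeout` are hypothetical names, and the only status values it relies on are the "success" and "error" strings used in the code above.

```python
import time


def wait_for_result(fetch, interval=5.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until it reports success or error, or until timeout.

    fetch is any zero-argument callable returning a dict with a "status"
    key, for example:
        lambda: requests.get(url, headers=HEADERS).json()
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        result = fetch()
        status = result.get("status")
        if status == "success":
            return result
        if status == "error":
            raise RuntimeError(f"Prediction failed: {result}")
        # Any other status is treated as "still running" (an assumption).
        if clock() >= deadline:
            raise TimeoutError(f"No result after {timeout} seconds")
        sleep(interval)
```

Passing the HTTP call in as `fetch` keeps the retry logic separate from the request itself, so the same loop works with `requests` or any other client.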

3. Complete Example

Here's a complete example that puts it all together, including error handling and result processing. This shows how to create a prediction and wait for the result in a production environment.

try:
# Create prediction
prediction_id = create_prediction()
print(f"Prediction created: {prediction_id}")
# Get result
result = get_prediction(prediction_id)
print(f"Output URL: {result['output']}")
print(f"Processing time: {result['metrics']['predict_time']}s")
except Exception as e:
print(f"Error: {e}")

Additional Information

  • The API uses a two-step process: create prediction and poll for results
  • Response time: ~340 seconds
  • Rate limit: 60 requests/minute
  • Concurrent requests: 10 maximum
  • Use long-polling to check prediction status until completion
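
The rate and concurrency limits above interact with polling. If status checks count toward the 60 requests/minute limit (an assumption; this page does not say), then polling several predictions at once needs a longer interval per prediction. A rough sketch of that arithmetic follows; `min_poll_interval` and its parameters are hypothetical names.

```python
def min_poll_interval(in_flight, rate_limit_per_minute=60, reserve=0):
    """Smallest whole-second poll interval that stays within the rate limit.

    in_flight: number of predictions being polled at the same time.
    reserve:   requests per minute set aside for creating new predictions.
    Assumes every status check counts against rate_limit_per_minute.
    """
    budget = rate_limit_per_minute - reserve
    if in_flight < 1 or budget < 1:
        raise ValueError("in_flight and the remaining budget must be at least 1")
    # Ceiling division: in_flight pollers share `budget` requests per 60 seconds.
    return (60 * in_flight + budget - 1) // budget


print(min_poll_interval(1))               # 1
print(min_poll_interval(10))              # 10
print(min_poll_interval(10, reserve=10))  # 12
```

Under that assumption, ten concurrent predictions polled every second would send about 600 requests per minute, ten times the stated limit.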

Overview

Kling v2 Text to Video is a video generation model that converts text descriptions into short, high-quality video clips. It interprets descriptive prompts to produce realistic or stylized motion visuals according to the user's settings. It supports aspect ratio customization, motion scaling, and prompt control options for targeted video outcomes.

Technical Specifications

  • Always craft clear and descriptive prompts. Avoid ambiguous language.
  • Use short, action-based phrases for better motion interpretation.
  • Limit duration values to 5 or 10 seconds for consistent video quality.
  • Balance CFG Scale values between 0.5 and 0.8 for natural prompt adherence without losing creativity.
  • When possible, pair prompts with Negative Prompts to suppress unwanted details.
  • The Aspect Ratio setting directly influences video framing and should match the intended display platform.
  • Complex scenes may require simplified phrasing for smoother video generation.

Key Considerations

  • Kling v2 Text to Video does not support uploading images or videos as input sources.
  • Kling v2 Text to Video requires well-defined prompts for coherent motion sequences.
  • Overly complex or abstract prompts may result in less predictable outputs.
  • Video duration is strictly limited to either 5 or 10 seconds.
  • Aspect Ratio changes significantly affect composition; test different ratios for best framing.
  • CFG Scale influences creativity versus strict prompt fidelity; values above 0.8 can overly restrict motion diversity.


Legal Information

By using Kling v2 Text to Video model, you agree to:

Tips & Tricks

  • Prompt: Keep language simple and direct. Use action verbs (e.g. "A cat jumping on a table"). Avoid vague terms.
  • Duration:
    • Set to 5 seconds for quick, sharp motions.
    • Set to 10 seconds for sequences needing room to develop visually.
  • Aspect Ratio:
    • Use 16:9 for wide scenes like landscapes or multi-subject action.
    • Use 9:16 for portrait or vertical video formats suitable for mobile content.
    • Use 1:1 for social media square posts or focused subject shots.
  • CFG Scale:
    • Recommended values: 0.5 to 0.8
    • Lower values (0.5) allow more creative freedom and abstract interpretation.
    • Higher values (0.8) enforce stricter alignment with the prompt description.
  • Negative Prompt: Always fill this when specific unwanted elements are to be avoided (e.g., “blurry, distorted, low quality”).
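
The parameter guidance above can be checked before a request is sent. The sketch below assumes the accepted values are the ones named on this page (durations of 5 and 10 seconds; aspect ratios 16:9, 9:16, and 1:1); the API itself may accept others, so unknown aspect ratios and out-of-range CFG values only produce warnings. `build_input` is a hypothetical helper, and the field names match the request body in step 1.

```python
ALLOWED_DURATIONS = {5, 10}                   # stated on this page as a strict limit
KNOWN_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}  # ratios mentioned here; may not be exhaustive
RECOMMENDED_CFG = (0.5, 0.8)                  # recommended range, not a hard limit


def build_input(prompt, negative_prompt="", aspect_ratio="16:9",
                duration=5, cfg_scale=0.5):
    """Return (input_dict, warnings) for the "input" field of the request."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt must not be empty")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be 5 or 10, got {duration!r}")

    warnings = []
    if aspect_ratio not in KNOWN_ASPECT_RATIOS:
        warnings.append(f"aspect_ratio {aspect_ratio!r} is not one of the ratios listed in the docs")
    low, high = RECOMMENDED_CFG
    if not low <= cfg_scale <= high:
        warnings.append(f"cfg_scale {cfg_scale} is outside the recommended {low} to {high} range")

    return {
        "prompt": prompt.strip(),
        "negative_prompt": negative_prompt,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "cfg_scale": cfg_scale,
    }, warnings
```

For example, `build_input("A cat jumping on a table", negative_prompt="blurry, distorted, low quality")` returns a dictionary that can be passed as `"input"` in the create request, along with an empty warning list.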

Capabilities

  • Generates animated video content from text instructions.
  • Supports dynamic motion rendering based on descriptive language.
  • Handles multiple scene types: nature, objects, actions, characters.
  • Adaptable aspect ratios for different display needs.
  • Can exclude unwanted elements via negative prompts.
  • Balances prompt faithfulness and creative output with CFG scaling.

What can I use it for?

  • Short promotional videos.
  • Concept visualization clips.
  • Quick content creation for social media.
  • Prototype video generation for design previews.
  • Visual storytelling based on text descriptions.
  • Character or scene animation based solely on narrative cues.

Things to be aware of

  • Test the same prompt across different Aspect Ratios to see framing impact.
  • Adjust CFG Scale incrementally to find the optimal creativity-control balance.
  • Use Negative Prompts to block artifacts like “blurry faces” or “oversaturated colors.”
  • Create action-based prompts (e.g. “a dog chasing a ball through a park”) for best motion results.
  • Combine abstract and literal terms (e.g. “a dreamy floating city at sunset”) for cinematic outputs.
  • Compare 5-second vs 10-second durations for pacing differences.
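
The comparisons suggested above (same prompt across aspect ratios, CFG values, and durations) amount to a small parameter sweep. A minimal sketch follows, assuming the input field names from the request body in step 1; `sweep_inputs` and the middle CFG value 0.65 are illustrative choices, not part of the API.

```python
from itertools import product


def sweep_inputs(prompt, negative_prompt="",
                 aspect_ratios=("16:9", "9:16", "1:1"),
                 durations=(5, 10),
                 cfg_scales=(0.5, 0.65, 0.8)):
    """Yield one input dict per combination of the given settings."""
    for aspect_ratio, duration, cfg_scale in product(aspect_ratios, durations, cfg_scales):
        yield {
            "prompt": prompt,
            "negative_prompt": negative_prompt,
            "aspect_ratio": aspect_ratio,
            "duration": duration,
            "cfg_scale": cfg_scale,
        }


variants = list(sweep_inputs("a dog chasing a ball through a park"))
print(len(variants))  # 18
```

With the defaults this produces 18 requests. At roughly 340 seconds per prediction and a maximum of 10 concurrent requests, narrowing the sweep to one setting at a time keeps the comparison manageable.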

Limitations

  • No support for image or video input conditioning.
  • Maximum video duration is capped at 10 seconds.
  • Excessively detailed or long prompts might not translate well into coherent motion.
  • Limited control over fine-grain frame-by-frame content.
  • Higher CFG values may reduce creative variation.
  • Outputs may occasionally differ in style or detail intensity based on prompt phrasing.

Output Format: MP4
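
Because the output is an MP4 file and the complete example prints `result['output']` as a URL, the clip can be saved locally once the prediction succeeds. The sketch below uses only the Python standard library and assumes the output URL can be downloaded directly without authentication headers, which this page does not confirm. `save_video` and its `opener` parameter are hypothetical names.

```python
import shutil
import urllib.request


def save_video(output_url, path="output.mp4", opener=urllib.request.urlopen):
    """Stream the file at output_url to path and return the path.

    opener is injectable so the function can be exercised without network access.
    """
    with opener(output_url) as response, open(path, "wb") as out:
        # Copy in chunks so the whole video is never held in memory.
        shutil.copyfileobj(response, out)
    return path
```

A typical call after polling would be `save_video(result["output"], "clip.mp4")`.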