Generator Autocaption

autocaption

Generator Autocaption generates accurate captions for videos, making them more accessible and engaging.

L40S 45GB
Fast Inference
REST API

Model Information

Response Time: ~15 sec
Status: Active
Version: 0.0.1
Updated: about 1 month ago

Prerequisites

  • Create an API Key from the Eachlabs Console
  • Install the required dependencies for your chosen language (e.g., requests for Python)

API Integration Steps

1. Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.

import requests
import time

API_KEY = "YOUR_API_KEY"  # Replace with your API key
HEADERS = {
    "X-API-Key": API_KEY,
    "Content-Type": "application/json"
}

def create_prediction():
    # Submit the captioning job; the response carries the prediction ID.
    response = requests.post(
        "https://api.eachlabs.ai/v1/prediction/",
        headers=HEADERS,
        json={
            "model": "autocaption",
            "version": "0.0.1",
            "input": {
                "font": "Poppins/Poppins-ExtraBold.ttf",
                "color": "white",
                "kerning": "-5",
                "opacity": "0",
                "MaxChars": "20",
                "fontsize": "7",
                "translate": False,  # Python booleans are capitalized
                "output_video": "True",
                "stroke_color": "black",
                "stroke_width": "2.6",
                "right_to_left": False,
                "subs_position": "bottom75",
                "highlight_color": "yellow",
                "video_file_input": "your_file.video/mp4",
                "output_transcript": "True",
                "transcript_file_input": "your transcript file input here"
            }
        }
    )
    prediction = response.json()
    if prediction["status"] != "success":
        raise Exception(f"Prediction failed: {prediction}")
    return prediction["predictionID"]

2. Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so check the status repeatedly until it is success, and stop if it is error.

def get_prediction(prediction_id):
    while True:
        result = requests.get(
            f"https://api.eachlabs.ai/v1/prediction/{prediction_id}",
            headers=HEADERS
        ).json()
        if result["status"] == "success":
            return result
        elif result["status"] == "error":
            raise Exception(f"Prediction failed: {result}")
        time.sleep(1)  # Wait before polling again
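
The loop above polls forever if a prediction never reaches success or error. A minimal sketch of a bounded variant follows, assuming only the status values shown above ("success" and "error"). The helper name wait_for_prediction, its parameters, and the backoff factor are illustrative choices, not part of the Eachlabs API.

```python
import time
from typing import Any, Callable, Dict


def wait_for_prediction(
    fetch_status: Callable[[], Dict[str, Any]],
    timeout_s: float = 120.0,
    interval_s: float = 1.0,
    max_interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status() until it reports success, error, or the deadline passes.

    fetch_status is any zero-argument callable returning the decoded JSON body
    of the prediction endpoint (hypothetical wrapper, supplied by the caller).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "success":
            return result
        if status == "error":
            raise RuntimeError(f"Prediction failed: {result}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"No result after {timeout_s} seconds")
        sleep(interval_s)
        # Back off gradually so long jobs generate fewer requests.
        interval_s = min(interval_s * 1.5, max_interval_s)
```

With the definitions above, it could be called as wait_for_prediction(lambda: requests.get(f"https://api.eachlabs.ai/v1/prediction/{prediction_id}", headers=HEADERS, timeout=30).json()). Passing the fetch step as a callable keeps the retry logic testable without network access.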

3. Complete Example

Here's a complete example that puts it all together, including error handling and result processing. This shows how to create a prediction and wait for the result in a production environment.

try:
    # Create prediction
    prediction_id = create_prediction()
    print(f"Prediction created: {prediction_id}")
    # Get result
    result = get_prediction(prediction_id)
    print(f"Output URL: {result['output']}")
    print(f"Processing time: {result['metrics']['predict_time']}s")
except Exception as e:
    print(f"Error: {e}")

Additional Information

  • The API uses a two-step process: create prediction and poll for results
  • Response time: ~15 seconds
  • Rate limit: 60 requests/minute
  • Concurrent requests: 10 maximum
  • Use long-polling to check prediction status until completion
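
Because each poll counts as a request, a client can hit the 60 requests/minute limit quickly. One way to stay under it is a client-side sliding-window throttle, sketched below. The RateLimiter class is a hypothetical helper, not something the API provides, and it assumes the limit applies per API key over a rolling 60-second window, which the text above does not specify.

```python
import time
from collections import deque
from typing import Callable, Deque


class RateLimiter:
    """Block until a call is allowed under max_calls per period_s (single-threaded)."""

    def __init__(
        self,
        max_calls: int = 60,
        period_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._max_calls = max_calls
        self._period_s = period_s
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = deque()  # timestamps of recent calls

    def acquire(self) -> None:
        while True:
            now = self._clock()
            # Forget calls that have left the sliding window.
            while self._calls and now - self._calls[0] >= self._period_s:
                self._calls.popleft()
            if len(self._calls) < self._max_calls:
                self._calls.append(now)
                return
            # Window is full: wait until the oldest call expires, then re-check.
            self._sleep(self._period_s - (now - self._calls[0]))
```

Calling limiter.acquire() before every requests.post or requests.get keeps one process within the window. The class is not thread-safe; for the limit of 10 concurrent requests, a threading.BoundedSemaphore(10) held around each in-flight request is one possible complement.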

Overview

Generator Autocaption is a model designed to automatically add captions to videos, enhancing accessibility and viewer engagement. By utilizing advanced speech recognition, it transcribes audio from video files and overlays customizable captions.

Technical Specifications

Generator Autocaption employs state-of-the-art speech recognition technology to transcribe audio from video files. It offers various customization parameters, allowing users to adjust caption appearance and behavior to suit their needs.

Key Considerations

Audio Quality for Autocaption: High-quality audio inputs yield more accurate transcriptions. Minimize background noise for best results.

Transcript Accuracy: When providing a transcript file, ensure it aligns precisely with the video's audio to maintain synchronization.

Font Selection: Choose fonts that are legible and appropriate for your audience to enhance readability.

Tips & Tricks

  • subs_position: Select the caption position that best suits your video's content. Options include:
    • bottom75: Places captions 75% from the bottom.
    • center: Centers captions on the screen.
    • top: Positions captions at the top.
    • bottom: Places captions at the bottom.
    • left: Aligns captions to the left.
    • right: Aligns captions to the right.
  • fontsize: Adjust the font size to ensure captions are readable without obstructing important visual elements. A moderate size is often effective.
  • color and highlight_color: Select contrasting colors for text and highlights to enhance visibility against the video background.
  • opacity: Set the opacity level to make captions distinguishable without completely blocking the underlying video. A value around 0.8 is typically effective.
  • stroke_color and stroke_width: Adding a stroke can improve text readability against complex backgrounds. Choose a stroke color that contrasts with the text color and set an appropriate width to enhance clarity.
  • kerning: Adjust the spacing between characters to improve readability, especially for larger font sizes.
  • MaxChars: Limit the maximum number of characters per caption line to prevent overcrowding. Keeping it under 40 characters is advisable.
  • right_to_left: Enable this option for languages that read from right to left to ensure proper text alignment.
  • translate: If your video's language is not English, enabling translation will convert captions to English, broadening accessibility.
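
The tips above can be folded into a small helper that assembles the input object and catches mistakes before a request is sent. This is a sketch under assumptions: build_caption_input is a hypothetical name, the defaults are copied from the request example earlier on this page, and the six subs_position values listed above are treated as the complete set, which the API may not enforce.

```python
import warnings
from typing import Any, Dict

# Assumption: the positions listed in the tips are the only valid ones.
SUBS_POSITIONS = {"bottom75", "center", "top", "bottom", "left", "right"}

# Defaults copied from the request example; values stay strings as shown there.
DEFAULT_INPUT: Dict[str, Any] = {
    "font": "Poppins/Poppins-ExtraBold.ttf",
    "color": "white",
    "kerning": "-5",
    "opacity": "0",
    "MaxChars": "20",
    "fontsize": "7",
    "translate": False,
    "output_video": "True",
    "stroke_color": "black",
    "stroke_width": "2.6",
    "right_to_left": False,
    "subs_position": "bottom75",
    "highlight_color": "yellow",
    "output_transcript": "True",
}
ALLOWED_KEYS = set(DEFAULT_INPUT) | {"transcript_file_input"}


def build_caption_input(video_file: str, **overrides: Any) -> Dict[str, Any]:
    """Return an "input" dict for the prediction request, with basic checks."""
    unknown = set(overrides) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    params = {**DEFAULT_INPUT, **overrides, "video_file_input": video_file}
    if params["subs_position"] not in SUBS_POSITIONS:
        raise ValueError(f"Unsupported subs_position: {params['subs_position']!r}")
    if int(params["MaxChars"]) >= 40:
        # The tips advise keeping lines under 40 characters; warn, do not fail.
        warnings.warn("MaxChars of 40 or more may overcrowd captions")
    return params
```

For example, build_caption_input("your_file.video/mp4", subs_position="top", MaxChars="30") returns a dict that can be passed as the "input" field of the create-prediction request.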

Capabilities

Automatically transcribe and caption videos in various formats.

Customize caption appearance, including font, color, size, position, and more.

Translate captions into English to enhance accessibility.

What can I use it for?

Enhancing Accessibility: Make video content accessible to viewers who are deaf or hard of hearing by providing accurate captions.

Improving Engagement: Cater to viewers who prefer reading captions or are in environments where audio is not feasible.

Educational Content: Add captions to instructional videos to aid comprehension and retention.

Content Localization: Translate and caption videos to reach a broader, non-native audience.

Things to be aware of

Experiment with Customization: Adjust various settings like font, color, and position to see what best fits your video's aesthetic.

Use Pre-existing Transcripts: Input your own transcripts to improve caption accuracy and synchronization.

Test Translation Feature: Enable translation to generate English captions for non-English videos and assess the quality.

Optimize for Different Platforms: Customize captions to meet the specific requirements or best practices of various video-sharing platforms.

Limitations

Speech Recognition Accuracy: Transcription accuracy may drop with poor audio quality or heavy accents.

Customization Constraints: Although Generator Autocaption offers many customization options, it may not support all font types or special characters.

Processing Time for Autocaption: Longer videos may require more processing time, especially with high customization settings.

Output Format: MP4

Related AI Models

  • CogVLM2 (cogvlm2-video): Video to Text
  • Youtube Transcriptor (youtube-transcriptor): Video to Text