Video Transcribe
whisperx-video-transcribe
Transform videos into accurate, text-based transcripts effortlessly with Video Transcribe
Result
John, please introduce yourself. All right. I'm John Pitt. I come from England and live in Switzerland. I am 19 years old. I'm a student. I speak English and French. I play the guitar. I enjoy fishing and watching basketball on TV. I often go for a run. That's all. Thanks, John.
Prerequisites
- Create an API Key from the Eachlabs Console
- Install the required dependencies for your chosen language (e.g., requests for Python)
API Integration Steps
1. Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
```python
import requests
import time

API_KEY = "YOUR_API_KEY"  # Replace with your API key
HEADERS = {
    "X-API-Key": API_KEY,
    "Content-Type": "application/json"
}

def create_prediction():
    response = requests.post(
        "https://api.eachlabs.ai/v1/prediction/",
        headers=HEADERS,
        json={
            "model": "whisperx-video-transcribe",
            "version": "0.0.1",
            "input": {
                "url": "your url here",
                "debug": False,
                "batch_size": "16"
            }
        }
    )
    prediction = response.json()
    if prediction["status"] != "success":
        raise Exception(f"Prediction failed: {prediction}")
    return prediction["predictionID"]
```
2. Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The client polls repeatedly, checking the status on each request until it receives a success status (or an error).
```python
def get_prediction(prediction_id):
    while True:
        result = requests.get(
            f"https://api.eachlabs.ai/v1/prediction/{prediction_id}",
            headers=HEADERS
        ).json()
        if result["status"] == "success":
            return result
        elif result["status"] == "error":
            raise Exception(f"Prediction failed: {result}")
        time.sleep(1)  # Wait before polling again
```
3. Complete Example
Here's a complete example that puts it all together, including error handling and result processing. This shows how to create a prediction and wait for the result in a production environment.
```python
try:
    # Create prediction
    prediction_id = create_prediction()
    print(f"Prediction created: {prediction_id}")

    # Get result
    result = get_prediction(prediction_id)
    print(f"Output URL: {result['output']}")
    print(f"Processing time: {result['metrics']['predict_time']}s")
except Exception as e:
    print(f"Error: {e}")
```
Additional Information
- The API uses a two-step process: create prediction and poll for results
- Response time: ~35 seconds
- Rate limit: 60 requests/minute
- Concurrent requests: 10 maximum
- Poll the prediction status endpoint repeatedly until the job completes
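Given the 60 requests/minute rate limit above, a client can space out status checks with a capped exponential backoff instead of the fixed one-second sleep used earlier. The delay schedule below is an assumption for illustration, not an API requirement:

```python
def backoff_delays(base=1.0, factor=2.0, cap=10.0, retries=6):
    """Yield capped exponential backoff delays (in seconds) between polls."""
    delay = base
    for _ in range(retries):
        yield min(delay, cap)
        delay *= factor

# Delays used between successive status checks
print(list(backoff_delays()))  # [1.0, 2.0, 4.0, 8.0, 10.0, 10.0]
```

Replacing `time.sleep(1)` with these growing delays keeps early responses fast while staying well under the rate limit on long-running jobs.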
Overview
WhisperX is an advanced transcription model designed for video and audio processing. Developed by adidoes, this model leverages cutting-edge AI to provide accurate, efficient, and context-aware transcriptions. Its capabilities extend beyond basic transcription to speaker identification and timestamping, making it a powerful tool for multimedia content creators and analysts.
Technical Specifications
- Architecture: Built on the Whisper architecture with enhancements for real-time processing and multi-language support.
- Speaker Diarization: Identifies and labels multiple speakers in audio, aiding in meeting transcription and interview analysis.
- Timestamping: Generates precise timestamps for each segment of the transcription, enabling easy navigation and editing.
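If the returned segments carry start and end times in seconds (the exact response shape is an assumption here; inspect the actual model output), they can be rendered as SRT-style timestamps for navigation and editing:

```python
def to_srt_time(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

segments = [  # hypothetical segment list for illustration
    {"start": 0.0, "end": 2.4, "text": "John, please introduce yourself."},
    {"start": 2.4, "end": 4.1, "text": "All right. I'm John Pitt."},
]
for i, seg in enumerate(segments, 1):
    print(f"{i}\n{to_srt_time(seg['start'])} --> {to_srt_time(seg['end'])}\n{seg['text']}\n")
```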
Key Considerations
- Audio Quality: Background noise, overlapping speech, or low-quality recordings can affect transcription accuracy.
- Language Accuracy: While the model is proficient in multiple languages, certain dialects or rare languages may yield less accurate results.
Tips & Tricks
- Audio Quality:
- For best results, use clear, high-quality audio files with minimal background noise.
- Pre-process noisy or distorted audio to improve transcription accuracy.
- Timestamps:
- Ensure the audio has consistent pacing for accurate timestamp alignment.
Capabilities
- Accurate Transcription: Provides high-quality transcriptions with minimal errors for clear audio sources.
- Speaker Labeling: Identifies and tags individual speakers, aiding in multi-speaker content analysis.
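With speaker labels in hand, a transcript can be grouped by speaker for multi-speaker analysis. A minimal sketch, assuming diarized segments of the form shown (the field names are hypothetical):

```python
from collections import defaultdict

def group_by_speaker(segments):
    """Collect each speaker's lines from diarized transcript segments."""
    grouped = defaultdict(list)
    for seg in segments:
        grouped[seg["speaker"]].append(seg["text"])
    return dict(grouped)

segments = [  # hypothetical diarized output for illustration
    {"speaker": "SPEAKER_00", "text": "John, please introduce yourself."},
    {"speaker": "SPEAKER_01", "text": "All right. I'm John Pitt."},
    {"speaker": "SPEAKER_00", "text": "Thanks, John."},
]
print(group_by_speaker(segments))
```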
What can I use it for?
- Meeting Transcriptions: Record and transcribe meetings or interviews for easy documentation and review.
- Podcast Summaries: Convert podcast audio into text for blog posts, summaries, or SEO optimization.
Things to be aware of
- Podcast and Interview Transcription:
- Convert audio content into searchable, editable text for archiving or publication.
- Academic and Market Research:
- Transcribe focus groups, interviews, or lectures for data analysis and reporting.
- Language Practice and Learning:
- Use transcriptions to study pronunciation, grammar, and vocabulary in real-world contexts.
Limitations
- Background Noise Sensitivity: The model may struggle with heavily distorted or noisy audio sources.
- Complex Speaker Overlap: In scenarios with multiple speakers talking simultaneously, diarization may not be fully accurate.
Output Format: Text