Incredibly Fast Whisper

incredibly-fast-whisper

Transcribe 150 minutes of audio in 100 seconds with Incredibly Fast Whisper

L40S 45GB
Fast Inference
REST API

Model Information

Response Time: ~11 sec
Status: Active
Version: 0.0.1
Updated: 8 days ago

Prerequisites

  • Create an API Key from the Eachlabs Console
  • Install the required dependencies for your chosen language (e.g., requests for Python, installed with pip install requests)

API Integration Steps

1. Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.

import requests
import time

API_KEY = "YOUR_API_KEY"  # Replace with your API key

HEADERS = {
    "X-API-Key": API_KEY,
    "Content-Type": "application/json"
}

def create_prediction():
    response = requests.post(
        "https://api.eachlabs.ai/v1/prediction/",
        headers=HEADERS,
        json={
            "model": "incredibly-fast-whisper",
            "version": "0.0.1",
            "input": {
                "task": "transcribe",
                "audio": "your_file.audio/mp3",  # Replace with your audio file
                "hf_token": "your hf token here",
                "language": "None",
                "timestamp": "chunk",
                "batch_size": "24",
                "diarise_audio": False
            }
        }
    )
    prediction = response.json()
    if prediction["status"] != "success":
        raise Exception(f"Prediction failed: {prediction}")
    return prediction["predictionID"]
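
The input above leaves speaker diarisation disabled. In the upstream insanely-fast-whisper pipeline, diarisation relies on a Hugging Face token for the pyannote models, which appears to be why the hf_token field exists here; treat that as an assumption to verify. A minimal sketch of a diarisation-enabled input follows:

# Hypothetical variant of the input above: enabling speaker diarisation.
# Assumption (based on the upstream insanely-fast-whisper pipeline): "hf_token"
# must be a valid Hugging Face access token when "diarise_audio" is True.
diarised_input = {
    "task": "transcribe",
    "audio": "your_file.audio/mp3",     # placeholder, as above
    "hf_token": "hf_xxxxxxxxxxxxxxxx",  # your Hugging Face access token
    "language": "None",
    "timestamp": "chunk",
    "batch_size": "24",
    "diarise_audio": True
}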

2. Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.

def get_prediction(prediction_id):
    while True:
        result = requests.get(
            f"https://api.eachlabs.ai/v1/prediction/{prediction_id}",
            headers=HEADERS
        ).json()
        if result["status"] == "success":
            return result
        elif result["status"] == "error":
            raise Exception(f"Prediction failed: {result}")
        time.sleep(1)  # Wait before polling again

3. Complete Example

Here's a complete example that puts it all together, including error handling and result processing. This shows how to create a prediction and wait for the result in a production environment.

try:
    # Create prediction
    prediction_id = create_prediction()
    print(f"Prediction created: {prediction_id}")

    # Get result
    result = get_prediction(prediction_id)
    print(f"Output URL: {result['output']}")
    print(f"Processing time: {result['metrics']['predict_time']}s")
except Exception as e:
    print(f"Error: {e}")

Additional Information

  • The API uses a two-step process: create prediction and poll for results
  • Response time: ~11 seconds
  • Rate limit: 60 requests/minute
  • Concurrent requests: 10 maximum
  • Use long-polling to check prediction status until completion (a rate-limit-aware polling sketch follows this list)
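
If you run many transcriptions in parallel, the 1-second polling loop above can approach the 60 requests/minute limit. Below is a minimal sketch of a more conservative polling helper; the interval and max_wait values are illustrative assumptions, not API requirements.

import time
import requests

HEADERS = {"X-API-Key": "YOUR_API_KEY", "Content-Type": "application/json"}

def poll_prediction(prediction_id, interval=2.0, max_wait=300):
    # Illustrative values: interval=2.0 keeps polling at ~30 requests/minute,
    # comfortably under the documented limit; max_wait bounds total waiting.
    deadline = time.time() + max_wait
    while time.time() < deadline:
        result = requests.get(
            f"https://api.eachlabs.ai/v1/prediction/{prediction_id}",
            headers=HEADERS,
        ).json()
        if result["status"] == "success":
            return result
        if result["status"] == "error":
            raise Exception(f"Prediction failed: {result}")
        time.sleep(interval)
    raise TimeoutError(f"Prediction {prediction_id} did not finish within {max_wait}s")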

Overview

Incredibly Fast Whisper is a state-of-the-art, high-performance speech-to-text model optimized for lightning-fast transcription and multilingual capabilities. It is based on OpenAI's Whisper model but has been significantly optimized for speed and efficiency, making it ideal for real-time transcription and large-scale audio processing tasks.

Technical Specifications

Base Model: OpenAI Whisper Large v3.

Optimizations:

  • Inference speed increased through model quantization and architectural improvements.
  • Reduced latency for real-time use cases.

Language Support: Over 50 languages, including English, Spanish, Mandarin, and Arabic.

Key Considerations

Audio Quality Matters:

  • Low-quality audio with excessive noise or low bitrates can reduce transcription accuracy.

Multilingual Transcription:

  • Audio in mixed languages may require manual post-editing for perfect accuracy.

Tips & Tricks

Use Timestamps:

  • Leverage JSON output for detailed analysis, including word-level timestamps.
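
As a starting point, the sketch below turns timestamped output into readable lines. It assumes the transcription contains a "chunks" list of {"text": ..., "timestamp": [start, end]} entries, the shape produced by the upstream insanely-fast-whisper pipeline; verify this against an actual API response before relying on it.

# Sketch only: assumes a "chunks" list of {"text", "timestamp": [start, end]} entries.
def chunks_to_lines(transcription: dict) -> str:
    lines = []
    for i, chunk in enumerate(transcription.get("chunks", []), start=1):
        start, end = chunk["timestamp"]
        end = end if end is not None else start  # the final chunk may lack an end time
        lines.append(f"{i}. [{start:.2f}s - {end:.2f}s] {chunk['text'].strip()}")
    return "\n".join(lines)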

Custom Fine-Tuning:

  • Fine-tune the model with domain-specific datasets for industry-specific transcription needs.

Real-Time Applications:

  • Pair with streaming services to transcribe live events or meetings.

Capabilities

Real-Time Transcription: Ideal for live meetings, conferences, and events.

Multilingual Transcription: Handles different languages and accents seamlessly.

Noise Tolerance: Performs well even with moderate background noise.

Timestamps: Includes word-level timestamps for precise tracking.

Customizability: Supports fine-tuning for specialized use cases.

What can I use it for?

Content Creation: Generate subtitles or transcribe interviews and podcasts.

Live Captions: Provide accessibility for live streams or webinars.

Customer Support: Transcribe call center conversations for analysis and insights.

Education: Create transcriptions of lectures, workshops, or training sessions.

Research: Process large volumes of audio data for linguistic or market analysis.

Meeting Transcripts: Quickly document long meetings or lectures with accurate transcripts.

Things to be aware of

Transcribe a Podcast:

  • Input: A one-hour podcast with clear audio.
  • Output: A detailed transcript with timestamps.

Live Streaming:

  • Integrate the model with live video streams for real-time captions.

Multilingual Scenarios:

  • Transcribe an audio file containing English, Spanish, and German segments.

Interactive Applications:

  • Use the model to power speech-to-text features in an application.

Noisy Audio Files:

  • Test its capabilities by transcribing moderately noisy recordings (e.g., outdoor interviews).

Limitations

Accuracy Variations:

  • Heavily accented or fast speech may slightly reduce transcription quality.

Challenging Audio Scenarios:

  • Overlapping speakers may require additional tools to track speaker activity.

Language Specificity:

  • Clear language settings are necessary for accurate multilingual transcription.
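
When the language is known in advance, setting it explicitly instead of leaving it as "None" can help. The snippet below is a sketch; whether the API expects a language name or an ISO code is an assumption to verify in the Eachlabs Console.

# Sketch: pinning the language in the prediction input instead of "None".
explicit_language_input = {
    "task": "transcribe",
    "audio": "your_file.audio/mp3",   # placeholder, as in the main example
    "hf_token": "your hf token here",
    "language": "spanish",            # explicit language (format is an assumption)
    "timestamp": "chunk",
    "batch_size": "24",
    "diarise_audio": False
}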

Maximum Audio Length:

  • Recordings up to 90 minutes long are supported; extremely long recordings may require significant computational resources (one hypothetical approach to splitting longer files is sketched below).
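
For recordings beyond that limit, one common approach (not part of the Eachlabs API) is to split the audio into fixed-length segments locally and submit each piece separately. The sketch below assumes ffmpeg is installed; the segment length and file names are illustrative.

import subprocess
from pathlib import Path

def split_audio(source: str, out_dir: str = "segments", segment_seconds: int = 1800) -> list:
    # Splits the source file into 30-minute MP3 segments; requires ffmpeg on PATH.
    Path(out_dir).mkdir(exist_ok=True)
    pattern = str(Path(out_dir) / "part_%03d.mp3")
    subprocess.run(
        ["ffmpeg", "-i", source, "-f", "segment",
         "-segment_time", str(segment_seconds), pattern],
        check=True,
    )
    return sorted(str(p) for p in Path(out_dir).glob("part_*.mp3"))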

Supported Formats:

  • Supports WAV, MP3, and M4A audio formats; ensure compatibility before processing.
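
As a small convenience, a check like the one below can reject unsupported extensions before a prediction is created; the helper is illustrative and not part of the API.

from pathlib import Path

SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a"}  # formats listed above

def is_supported_audio(path_or_url: str) -> bool:
    # Checks only the file extension; it does not inspect the actual codec.
    return Path(path_or_url.split("?")[0]).suffix.lower() in SUPPORTED_EXTENSIONS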

Known Limitations:

  • Performance may vary depending on hardware capabilities and the complexity of the audio content.

Output Format: TEXT


Related AI Models

  • whisperx-video-transcribe – Video Transcribe
  • whisperx-video-transcribe – Voice to Text