Incredibly Fast Whisper
incredibly-fast-whisper
Transcribe 150 minutes of audio in 100 seconds with Incredibly Fast Whisper
Prerequisites
- Create an API Key from the Eachlabs Console
- Install the required dependencies for your chosen language (e.g., requests for Python)
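For the Python examples below, the only third-party dependency is requests, which can be installed with pip:

pip install requests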
API Integration Steps
1. Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
import requests
import time

API_KEY = "YOUR_API_KEY"  # Replace with your API key

HEADERS = {
    "X-API-Key": API_KEY,
    "Content-Type": "application/json"
}

def create_prediction():
    response = requests.post(
        "https://api.eachlabs.ai/v1/prediction/",
        headers=HEADERS,
        json={
            "model": "incredibly-fast-whisper",
            "version": "0.0.1",
            "input": {
                "task": "transcribe",
                "audio": "your_file.mp3",  # replace with your audio file or URL
                "hf_token": "your hf token here",
                "language": "None",
                "timestamp": "chunk",
                "batch_size": "24",
                "diarise_audio": False
            }
        }
    )
    prediction = response.json()
    if prediction["status"] != "success":
        raise Exception(f"Prediction failed: {prediction}")
    return prediction["predictionID"]
2. Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
def get_prediction(prediction_id):
    while True:
        result = requests.get(
            f"https://api.eachlabs.ai/v1/prediction/{prediction_id}",
            headers=HEADERS
        ).json()
        if result["status"] == "success":
            return result
        elif result["status"] == "error":
            raise Exception(f"Prediction failed: {result}")
        time.sleep(1)  # Wait before polling again
3. Complete Example
Here's a complete example that puts it all together, including error handling and result processing. This shows how to create a prediction and wait for the result in a production environment.
try:
    # Create prediction
    prediction_id = create_prediction()
    print(f"Prediction created: {prediction_id}")

    # Get result
    result = get_prediction(prediction_id)
    print(f"Output URL: {result['output']}")
    print(f"Processing time: {result['metrics']['predict_time']}s")
except Exception as e:
    print(f"Error: {e}")
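The example above prints the output as a URL. If that URL points at a plain-text transcript (an assumption based on the print statement and the TEXT output format, not documented behaviour), it can be fetched with one more request:

transcript = requests.get(result["output"]).text  # assumes result["output"] is a URL to the transcript text
print(transcript[:500])  # preview the first 500 characters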
Additional Information
- The API uses a two-step process: create prediction and poll for results
- Response time: ~11 seconds
- Rate limit: 60 requests/minute
- Concurrent requests: 10 maximum
- Use long-polling to check prediction status until completion (a polling sketch with backoff follows this list)
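The polling loop in step 2 checks once per second. As a minimal sketch of staying comfortably under the 60 requests/minute limit while still returning quickly for fast predictions, the variant below backs off between checks; the delay values are illustrative rather than required by the API, and HEADERS is the dictionary defined in step 1.

import time
import requests

def get_prediction_with_backoff(prediction_id, initial_delay=1.0, max_delay=10.0):
    # Poll the prediction endpoint, doubling the wait after each check
    # (capped at max_delay) so a single client stays well under
    # 60 requests/minute even for long-running predictions.
    delay = initial_delay
    while True:
        result = requests.get(
            f"https://api.eachlabs.ai/v1/prediction/{prediction_id}",
            headers=HEADERS,  # HEADERS defined in step 1
        ).json()
        if result["status"] == "success":
            return result
        if result["status"] == "error":
            raise Exception(f"Prediction failed: {result}")
        time.sleep(delay)
        delay = min(delay * 2, max_delay)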
Overview
Insanely Fast Whisper is a state-of-the-art, high-performance speech-to-text model optimized for lightning-fast transcription and multilingual capabilities. It is based on OpenAI's Whisper model but has been significantly optimized for speed and efficiency, making it ideal for real-time transcription and large-scale audio processing tasks.
Technical Specifications
Base Model: OpenAI Whisper Large v3
Optimizations:
- Inference speed increased through model quantization and architectural improvements.
- Reduced latency for real-time use cases.
Language Support: Over 50 languages, including English, Spanish, Mandarin, and Arabic
Key Considerations
Audio Quality Matters:
- Low-quality audio with excessive noise or low bitrates can reduce transcription accuracy; a simple preprocessing sketch follows this list.
Multilingual Transcription:
- Audio in mixed languages may require manual post-editing for perfect accuracy.
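If your source audio is noisy or stored at a low bitrate, it can help to normalise it before uploading. The sketch below converts a file to 16 kHz mono WAV using ffmpeg; this is a local preprocessing step that assumes ffmpeg is installed on your machine and is not part of the Eachlabs API.

import subprocess

def normalize_audio(src_path, out_path="normalized.wav"):
    # Re-encode to 16 kHz mono WAV, a safe format for speech models.
    # Requires ffmpeg to be available on PATH.
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", src_path,
            "-ar", "16000",  # resample to 16 kHz
            "-ac", "1",      # downmix to mono
            out_path,
        ],
        check=True,
    )
    return out_path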
Tips & Tricks
Use Timestamps:
- Leverage JSON output for detailed analysis, including word-level timestamps (see the sketch after this list).
Custom Fine-Tuning:
- Fine-tune the model with domain-specific datasets for industry-specific transcription needs.
Real-Time Applications:
- Pair with streaming services to transcribe live events or meetings.
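The create_prediction function from step 1 requests chunk-level timestamps. To ask for word-level timestamps instead, you might pass an input like the one below; "word" is assumed to be an accepted value for the "timestamp" field alongside "chunk", so check the model's input schema before relying on it.

word_timestamp_input = {
    "task": "transcribe",
    "audio": "your_file.mp3",   # replace with your audio file or URL
    "language": "None",
    "timestamp": "word",        # word-level instead of chunk-level (assumed value)
    "batch_size": "24",
    "diarise_audio": False
}

Pass this dictionary as the "input" field of the create_prediction request from step 1, then poll for the result as in step 2.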
Capabilities
Real-Time Transcription: Ideal for live meetings, conferences, and events.
Multilingual Transcription: Handles different languages and accents seamlessly.
Noise Tolerance: Performs well even with moderate background noise.
Timestamps: Includes word-level timestamps for precise tracking.
Customizability: Supports fine-tuning for specialized use cases.
What can I use it for?
Content Creation: Generate subtitles or transcribe interviews and podcasts.
Live Captions: Provide accessibility for live streams or webinars.
Customer Support: Transcribe call center conversations for analysis and insights.
Education: Create transcriptions of lectures, workshops, or training sessions.
Research: Process large volumes of audio data for linguistic or market analysis.
Meeting Transcripts: Quickly document long meetings or lectures with accurate transcripts.
Things to be aware of
Transcribe a Podcast:
- Input: A one-hour podcast with clear audio.
- Output: A detailed transcript with timestamps (see the example request after this list).
Live Streaming:
- Integrate the model with live video streams for real-time captions.
Multilingual Scenarios:
- Transcribe an audio file containing English, Spanish, and German segments.
Interactive Applications:
- Use the model to power speech-to-text features in an application.
Noisy Audio Files:
- Test its capabilities by transcribing moderately noisy recordings (e.g., outdoor interviews).
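As a concrete illustration of the podcast scenario above, the input below requests chunk-level timestamps and enables speaker diarisation. The audio URL is a placeholder, and it is assumed here that diarisation relies on the hf_token field shown in step 1; adjust both to your own values.

podcast_input = {
    "task": "transcribe",
    "audio": "https://example.com/episodes/episode-42.mp3",  # placeholder URL
    "hf_token": "your hf token here",  # assumed to be needed when diarise_audio is True
    "language": "None",
    "timestamp": "chunk",
    "batch_size": "24",
    "diarise_audio": True
}

Pass this as the "input" field of the create_prediction request from step 1.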
Limitations
Accuracy Variations:
- Heavily accented or fast speech may slightly reduce transcription quality.
Challenging Audio Scenarios:
- Overlapping speakers may require additional tools to track speaker activity.
Language Specificity:
- Clear language settings are necessary for accurate multilingual transcription.
Maximum Audio Length:
- Recordings of up to 90 minutes are supported; extremely long recordings may require significant computational resources.
Supported Formats:
- Supports WAV, MP3, and M4A audio formats; ensure compatibility before processing.
Known Limitations:
- Performance may vary depending on hardware capabilities and the complexity of the audio content.
Output Format: TEXT