Eachlabs | AI Workflows for app builders
wizper

WHISPER

Wizper is a multilingual speech recognition and translation model based on Whisper v3 that quickly and accurately converts audio files into text. It is optimized for real-time transcription and translation.

Avg Run Time: 10.000s

Model Slug: wizper

Playground

Example Result


the little tales they tell are false the door was barred locked and bolted as well ripe pears are fit for a queen's table a big wet stain was on the round carpet the kite dipped and swayed but stayed aloft the pleasant hours fly by much too soon the room was crowded with a mild wab the room was crowded with a wild mob this strong arm shall shield your honour she blushed when he gave her a white orchid The beetle droned in the hot June sun.

The total cost depends on how long the model runs. It costs $0.001080 per second. Based on an average runtime of 10 seconds, each run costs about $0.0108. With a $1 budget, you can run the model around 92 times.

API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
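
As a sketch, building such a request might look like the following. The base URL, endpoint path, auth header name, and payload keys are illustrative assumptions, not the documented schema; check the Eachlabs API reference for the exact shape.

```python
import json
import urllib.request

API_BASE = "https://api.eachlabs.ai"  # assumed base URL; verify in the docs

def build_prediction_request(api_key: str, audio_url: str) -> urllib.request.Request:
    """Build a POST request that creates a wizper prediction.

    The endpoint path, auth header, and payload keys below are
    illustrative guesses, not the documented schema.
    """
    payload = {
        "model": "wizper",                  # the model slug from this page
        "input": {"audio_url": audio_url},  # assumed input field name
    }
    return urllib.request.Request(
        url=f"{API_BASE}/v1/predictions",         # assumed path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,                 # assumed header name
        },
        method="POST",
    )

# Sending it would return a JSON body containing the prediction ID:
# resp = urllib.request.urlopen(build_prediction_request(key, audio_url))
```

Keeping the request construction in a pure function makes it easy to adapt once the real endpoint and field names are confirmed.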

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
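
A minimal polling loop could be structured like this; the `"status"`/`"success"` values are assumptions about the response shape, and `fetch_status` stands in for whatever function performs the actual GET request.

```python
import time

def poll_prediction(fetch_status, prediction_id: str,
                    interval: float = 1.0, timeout: float = 60.0) -> dict:
    """Poll until the prediction reaches a terminal status.

    `fetch_status` is any callable that takes a prediction ID and returns
    the API's JSON response as a dict. The status values checked below
    are assumptions about the response schema.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = fetch_status(prediction_id)
        if result.get("status") in ("success", "error"):
            return result
        time.sleep(interval)
    raise TimeoutError(f"prediction {prediction_id} not ready after {timeout}s")
```

Injecting the fetch function keeps the loop testable and lets you swap in retry or backoff logic without touching the polling code.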

Readme

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview

wizper — Voice-to-Text AI Model

Wizper is built on Whisper v3, an advanced model in OpenAI's Whisper family. It delivers fast, precise multilingual speech recognition, converting audio files to text and supporting real-time transcription and translation. Developers and creators who need a voice-to-text AI model rely on wizper to handle diverse accents and languages accurately, minimizing errors in noisy environments and live streams. Optimized for efficiency, it processes mono audio at a 16 kHz sample rate, enabling seamless integration into voice-to-text app workflows without quality loss.

Technical Specifications

What Sets wizper Apart

Wizper stands out in the competitive landscape of voice-to-text AI models through its robust handling of 98 languages, backed by massive pre-training on 680,000 hours of audio, outperforming many rivals in multilingual accuracy and accent robustness. This lets users transcribe speech in English, Spanish, Chinese, or less widely supported languages with 95%+ accuracy even amid background noise, making it well suited to global applications where traditional systems falter.

Built on WhisperX enhancements, wizper employs voice activity detection (VAD) to chunk audio into speech segments, drastically reducing hallucinations and boosting precision on long recordings. Developers benefit from faster processing times, with real-world benchmarks showing 50% latency cuts at 32 kbps bitrates, supporting inputs up to 25 MB (over 100 minutes at optimal settings) in MP3 or M4A formats.

Its tolerance for compressed audio—down to 16 kbps without accuracy drops—makes wizper uniquely efficient for real-time speech transcription, enabling low-latency pipelines with 1-2 second end-to-end delays in live scenarios.

  • Multilingual mastery: Automatic language detection across 98 languages with strong accent handling.
  • Optimized formats: 16 kHz mono, 32-64 kbps MP3/M4A for minimal file sizes and rapid API responses.
  • VAD-powered accuracy: Segments speech to eliminate errors in extended or noisy audio.
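
The input recommendations above can be captured in a small pre-flight check. The numeric thresholds come straight from this page (16 kHz mono, 16-64 kbps tolerance, 25 MB limit); the helper itself is only an illustrative sketch, not part of any SDK.

```python
def check_audio_specs(sample_rate_hz: int, channels: int,
                      bitrate_kbps: int, size_bytes: int) -> list[str]:
    """Return warnings for audio that strays from wizper's recommended
    input profile: 16 kHz mono, 16-64 kbps, at most 25 MB."""
    warnings = []
    if sample_rate_hz != 16_000:
        warnings.append(f"sample rate {sample_rate_hz} Hz; 16 kHz is recommended")
    if channels != 1:
        warnings.append(f"{channels} channels; mono is recommended")
    if not 16 <= bitrate_kbps <= 64:
        warnings.append(f"{bitrate_kbps} kbps is outside the 16-64 kbps range")
    if size_bytes > 25 * 1024 * 1024:
        warnings.append("file exceeds the 25 MB input limit")
    return warnings
```

Running such a check before upload avoids paying for runs on audio that will transcode poorly or be rejected for size.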

Key Considerations


Tips & Tricks

How to Use wizper on Eachlabs

Access wizper through Eachlabs' Playground for instant testing with audio uploads in MP3/M4A formats, or integrate via API/SDK by specifying parameters like sample rate (16 kHz), mono channels, and bitrate (32-64 kbps). Upload files up to 25 MB to get high-accuracy text outputs optimized for real-time wizper API calls, with VAD ensuring precise transcriptions ready for translation or search.
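
One way to produce a conforming file before upload is ffmpeg. The helper below only assembles the command line (`-ac`, `-ar`, and `-b:a` are standard ffmpeg options for channel count, sample rate, and audio bitrate); actually running it requires ffmpeg installed locally.

```python
def ffmpeg_transcode_cmd(src: str, dst: str = "out.mp3",
                         bitrate_kbps: int = 48) -> list[str]:
    """Build an ffmpeg command that downmixes to mono, resamples to
    16 kHz, and encodes MP3 at the requested bitrate."""
    return [
        "ffmpeg", "-y",                 # overwrite the output if it exists
        "-i", src,
        "-ac", "1",                     # mono
        "-ar", "16000",                 # 16 kHz sample rate
        "-b:a", f"{bitrate_kbps}k",     # e.g. 48 kbps, within the 32-64 range
        dst,
    ]

# subprocess.run(ffmpeg_transcode_cmd("talk.wav"), check=True)
```

A 48 kbps mono MP3 keeps even hour-long recordings comfortably under the 25 MB limit.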

---

Capabilities


What Can I Use It For?

Use Cases for wizper

Content creators building real-time voice-to-text apps use wizper to transcribe live podcasts, leveraging its VAD for clean segmentation and multilingual support to handle guest speakers from diverse regions without manual edits.

Developers integrating OpenAI voice-to-text into mobile apps feed optimized 16 kHz MP3 clips—like a user saying "Schedule meeting for Friday at 3 PM"—and receive instant, accurate text outputs, perfect for voice note apps with automatic language detection.

Marketers analyzing global webinars rely on wizper's noise-robust transcription to convert hours-long sessions into searchable text, supporting formats up to 25 MB for detailed post-event summaries across 98 languages.

Enterprise teams developing multilingual customer service bots use wizper for low-latency translation pipelines, processing compressed audio in real-time to generate responses that maintain contextual accuracy in accented speech.

Things to Be Aware Of


Limitations


Pricing

Pricing Detail

This model runs at a cost of $0.001080 per second.

The average execution time is 10 seconds, but this may vary depending on your input data.

The average cost per run is $0.010800.

Pricing Type: Execution Time

Cost Per Second means the total cost is calculated based on how long the model runs. Instead of paying a fixed fee per run, you are charged for every second the model is actively processing. This pricing method provides flexibility, especially for models with variable execution times, because you only pay for the actual time used.
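
The arithmetic behind these numbers is easy to reproduce:

```python
PRICE_PER_SECOND = 0.001080  # USD, from the pricing table above

def run_cost(seconds: float) -> float:
    """Cost of a single run billed per second of execution."""
    return PRICE_PER_SECOND * seconds

def runs_within_budget(budget: float, avg_seconds: float = 10.0) -> int:
    """Whole number of average-length runs that fit in a budget."""
    return int(budget // run_cost(avg_seconds))

# run_cost(10)            -> about 0.0108 USD
# runs_within_budget(1.0) -> 92
```

Since billing tracks actual execution time, shorter clips cost proportionally less than the 10-second average used here.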