wizper-with-timestamp

WHISPER

Wizper with Timestamp is a multilingual speech recognition and translation model built on Whisper v3 that transcribes audio with precise word-level timestamps. It delivers fast, accurate, and time-aligned transcripts, making it ideal for subtitles, media indexing, and real-time transcription workflows.

Avg Run Time: 0.000s

Model Slug: wizper-with-timestamp

Playground

Input

Enter a URL or choose a file from your computer.

Output

Example Result

Preview and download your result.

{
  "output": {
    "chunks": [
      {
        "text": "the little tales they tell are false the door was barred locked and bolted as well ripe pears are fit for a queen's table a big wet stain was on the round carpet the kite dipped and swayed but stayed aloft the pleasant hours fly by much too soon the room was crowded with a mild wab the room was crowded with a wild mob this strong arm shall shield your honour she blushed when he gave her a white orchid",
        "timestamp": [1.8, 44.24]
      },
      {
        "text": "The beetle droned in the hot June sun.",
        "timestamp": [46.16, 51.74]
      }
    ],
    "languages": ["en"],
    "text": "the little tales they tell are false the door was barred locked and bolted as well ripe pears are fit for a queen's table a big wet stain was on the round carpet the kite dipped and swayed but stayed aloft the pleasant hours fly by much too soon the room was crowded with a mild wab the room was crowded with a wild mob this strong arm shall shield your honour she blushed when he gave her a white orchid The beetle droned in the hot June sun."
  }
}
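Each chunk pairs its text with a [start, end] timestamp in seconds, so the output maps directly onto caption formats. A minimal sketch that converts a parsed response of the shape above into SRT-style subtitle entries (the function names are illustrative, not part of the API):

def to_srt_time(seconds):
    # Convert seconds (float) into the HH:MM:SS,mmm format used by SRT subtitles.
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def chunks_to_srt(result):
    # "result" is the parsed JSON object shown above (e.g. response.json()).
    entries = []
    for i, chunk in enumerate(result["output"]["chunks"], start=1):
        start, end = chunk["timestamp"]
        entries.append(f"{i}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{chunk['text']}\n")
    return "\n".join(entries)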
The total cost depends on how long the model runs. It costs $0.001080 per second. Based on an average runtime of 20 seconds, each run costs about $0.0216. With a $1 budget, you can run the model around 46 times.

API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
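A minimal sketch of this step in Python; the endpoint path, header name, and request fields below (https://api.eachlabs.ai/v1/prediction/, X-API-Key, model, input) are assumptions for illustration and should be checked against the official API reference:

import requests

API_KEY = "YOUR_API_KEY"  # your Eachlabs API key
CREATE_URL = "https://api.eachlabs.ai/v1/prediction/"  # assumed endpoint; verify in the docs

payload = {
    "model": "wizper-with-timestamp",                      # model slug from this page
    "input": {"audio": "https://example.com/sample.wav"},  # model inputs
}

resp = requests.post(CREATE_URL, json=payload, headers={"X-API-Key": API_KEY})
resp.raise_for_status()
prediction_id = resp.json()["predictionID"]  # response field name assumed; check the schema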

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
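A matching polling sketch, with the same caveat that the URL shape and status values are assumptions to verify against the API reference:

import time
import requests

def wait_for_result(prediction_id, api_key, interval=2.0, timeout=300):
    # Poll the prediction endpoint until it reports success or the timeout expires.
    url = f"https://api.eachlabs.ai/v1/prediction/{prediction_id}"  # assumed URL shape
    deadline = time.time() + timeout
    while time.time() < deadline:
        resp = requests.get(url, headers={"X-API-Key": api_key})
        resp.raise_for_status()
        body = resp.json()
        if body.get("status") == "success":  # status value assumed
            return body                      # contains the "output" object shown above
        if body.get("status") == "error":
            raise RuntimeError(f"Prediction failed: {body}")
        time.sleep(interval)
    raise TimeoutError("Prediction did not finish within the timeout")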

Readme

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview

Wizper with Timestamp builds on OpenAI's Whisper v3, a multilingual speech recognition and translation model, and is optimized for transcribing and translating audio files with timestamp support, including real-time use. Whisper v3 and variants such as Faster-Whisper use a transformer architecture for accurate speech-to-text conversion and support language detection, timestamp annotation, and translation, which makes them well suited to voice-processing workflows. What sets Whisper apart is its end-to-end training on vast multilingual datasets, giving it robust performance across languages; optimizations such as the CTranslate2 inference engine reduce memory use and deliver up to 4x faster inference than the original implementation.

Notable developments in this family include Faster-Whisper for production efficiency and WHISPER-LIVE for real-time systems, with community attention focused on balancing accuracy, speed, and hardware resources.

Technical Specifications

  • Architecture: Transformer-based (Whisper v3 variants like Faster-Whisper using CTranslate2 inference engine)
  • Parameters: Not stated for this deployment; the referenced large-v3 checkpoints are resource-heavy (over 10GB VRAM unquantized, roughly 3-6GB quantized)
  • Resolution: Not applicable (audio model); timestamps can be emitted at sentence or word granularity
  • Input/Output formats: Audio/video files in, text transcripts out, with timestamps, segments, language detection, and translation; additional formats available through downstream tooling
  • Performance metrics: 2-4x faster inference with Faster-Whisper (e.g., 3x on NVIDIA T4 with int8 quantization); memory reduced by half; 1-2 second end-to-end latency in real-time setups; minor accuracy drop (2-5%) with quantization

Key Considerations

  • Balance model size with hardware: base models use 1GB VRAM, large-v3 over 10GB; use quantization (int8) for resource-constrained environments
  • Best practices: Tune beam_size for the quality vs. latency trade-off; select compute_type (float16/int8) based on GPU/CPU; integrate VAD for real-time efficiency (see the sketch after this list)
  • Common pitfalls: Long segments increase latency; poor segmentation affects LLM integration coherence; avoid over-reliance on quantized models for high-accuracy needs
  • Quality vs speed trade-offs: Quantization boosts speed/memory efficiency but reduces accuracy by 2-5%; original Whisper best for benchmarks, Faster-Whisper for production
  • Prompt engineering tips: Use optional prompts for context/proper nouns; lower temperature for deterministic outputs, higher for variety
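A minimal local sketch of these trade-offs using the open-source Faster-Whisper library (the hosted wizper-with-timestamp endpoint makes these choices server-side; the model name, file name, and parameter values below are illustrative):

from faster_whisper import WhisperModel

# int8 quantization keeps large-v3 within a few GB of VRAM at a small (2-5%) accuracy cost;
# switch to compute_type="float16" on a roomier GPU when accuracy matters most.
model = WhisperModel("large-v3", device="cuda", compute_type="int8")

segments, info = model.transcribe(
    "meeting.wav",     # illustrative input file
    beam_size=5,       # wider beam = better quality, higher latency
    vad_filter=True,   # skip silence for real-time efficiency
)

print("Detected language:", info.language)
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")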

Tips & Tricks

  • Optimal parameter settings: Set beam_size to control search width (higher for accuracy, lower for speed); use int8 quantization on GPU for 3GB VRAM usage on large-v3
  • Prompt structuring advice: Provide an initial text prompt to guide style, carry over context from previous segments, or spell out proper nouns
  • How to achieve specific results: Enable timestamps (sentence or word granularity) for segmented outputs; combine with diarization for speaker identification (see the sketch after this list)
  • Iterative refinement strategies: Start with lightweight models for prototyping, refine with cloud for accuracy; regenerate outputs with adjusted temperature
  • Advanced techniques: Use dynamic executors in Faster-Whisper for auto-kernel selection; integrate circular buffers and VAD for low-latency streaming (20-40ms blocks)
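A short sketch of the word-timestamp and prompting options above, again using Faster-Whisper locally for illustration (file name and prompt text are placeholders):

from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")

segments, info = model.transcribe(
    "interview.mp3",                    # placeholder input
    word_timestamps=True,               # word-level instead of sentence-level granularity
    initial_prompt="Eachlabs, Wizper",  # bias spelling of proper nouns
    temperature=0.0,                    # deterministic decoding; raise for more variety
)

# Print one line per word with its start/end time in seconds.
for segment in segments:
    for word in segment.words:
        print(f"{word.start:6.2f} {word.end:6.2f} {word.word}")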

Capabilities

  • Accurate multilingual speech-to-text transcription and translation from audio/video
  • Real-time processing with low latency (1-2 seconds end-to-end) via WHISPER-LIVE
  • Timestamped segments, language detection, and optional diarization/speaker labels
  • High versatility across batch and streaming modes, with optimizations for constrained hardware
  • Strong performance in video transcription, maintaining structure during translation

What Can I Use It For?

  • Transcribing hours of audio data for research/thesis, running locally on M1 Macs
  • Video-to-text conversion for content repurposing, like YouTube videos to searchable transcripts
  • Real-time meeting transcription and virtual assistants with low-latency pipelines
  • Generating interactive AI personas from video transcripts for conversational content analysis
  • Production-grade speech workflows, including server apps built with FastAPI for scalable voice processing (see the sketch after this list)
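As a sketch of that last pattern, a tiny FastAPI service that forwards an audio URL to the hosted model; the endpoint, header, and field names are the same assumptions as in the API examples above:

from fastapi import FastAPI
from pydantic import BaseModel
import requests

app = FastAPI()

class TranscribeRequest(BaseModel):
    audio_url: str

@app.post("/transcribe")
def transcribe(req: TranscribeRequest):
    # Create a prediction for the submitted audio URL and return the raw response;
    # a production service would then poll for the result as shown in the API section.
    resp = requests.post(
        "https://api.eachlabs.ai/v1/prediction/",  # assumed endpoint
        json={"model": "wizper-with-timestamp", "input": {"audio": req.audio_url}},
        headers={"X-API-Key": "YOUR_API_KEY"},     # assumed header name
    )
    resp.raise_for_status()
    return resp.json()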

Things to Be Aware Of

  • Experimental real-time features like WHISPER-LIVE show efficient VAD but require careful segmentation for coherence with LLMs
  • Known quirks: Quantized models drop accuracy slightly (2-5%); energy-based VAD may miss subtle speech
  • Performance varies by hardware: NVIDIA T4 achieves 0.5-3x real-time speed depending on optimization
  • Resource needs: Original large models demand 10GB VRAM, quantized versions 3GB; suitable for servers to embedded devices
  • Consistency is strong across languages, though benchmarks note trade-offs between speed and accuracy
  • Positive feedback: Users praise the roughly 4x speed gains and halved memory use of Faster-Whisper in practical deployments
  • Common concerns: Latency in long segments; need for business-specific tuning of delay/accuracy/resources

Limitations

  • Strictly an audio/speech-to-text and translation model; not designed for image or other media generation
  • Quantization and optimizations reduce accuracy slightly; less ideal for absolute precision benchmarks vs. originals
  • Real-time setups sensitive to segmentation, potentially introducing delays or coherence issues in LLM integrations