rvc-v2

RVC

Voice-to-Voice with RVC v2 converts your spoken voice into any RVC v2 trained AI voice while preserving your tone, emotion, and natural delivery.

Avg Run Time: 20.000s

Model Slug: rvc-v2

Release Date: December 10, 2025

Playground

Input

Enter a URL or choose a file from your computer.

Output

Example Result

Preview and download your result.

The total cost depends on how long the model runs. It costs $0.000247 per second. Based on an average runtime of 20 seconds, each run costs about $0.004940. With a $1 budget, you can run the model around 202 times.
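
As a quick check on those numbers, the per-run cost is just the per-second rate times runtime; a minimal sketch for estimating cost at other runtimes:

```python
# Estimate rvc-v2 cost from the published per-second rate.
COST_PER_SECOND = 0.000247  # USD, from the pricing note above

def estimate_cost(runtime_s: float, budget: float = 1.0) -> tuple[float, int]:
    """Return (cost per run in USD, whole runs affordable on the budget)."""
    cost_per_run = COST_PER_SECOND * runtime_s
    return cost_per_run, int(budget // cost_per_run)

cost, runs = estimate_cost(20)  # the average runtime quoted above
print(f"${cost:.6f} per run, ~{runs} runs per $1")  # $0.004940 per run, ~202 runs
```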

API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
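
A minimal sketch of the create step in Python, assuming illustrative endpoint, header, and field names (confirm the exact URL, auth header, and input schema against the Eachlabs API reference):

```python
import requests

API_KEY = "YOUR_EACHLABS_API_KEY"
# Endpoint and field names below are assumptions for illustration only.
CREATE_URL = "https://api.eachlabs.ai/v1/prediction"

payload = {
    "model": "rvc-v2",
    "input": {
        # URL to your source audio, or an uploaded file reference
        "audio_url": "https://example.com/my_voice.wav",
    },
}
resp = requests.post(CREATE_URL, json=payload,
                     headers={"X-API-Key": API_KEY}, timeout=30)
resp.raise_for_status()
prediction_id = resp.json()["predictionID"]  # assumed response field name
print("created prediction:", prediction_id)
```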

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
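
And a matching long-polling loop, continuing from the sketch above with the same assumed endpoint and field names; keep polling until the status flips to success:

```python
import time
import requests

API_KEY = "YOUR_EACHLABS_API_KEY"
prediction_id = "PREDICTION_ID_FROM_CREATE_STEP"
# URL pattern and status/field names are assumptions; check the API reference.
GET_URL = f"https://api.eachlabs.ai/v1/prediction/{prediction_id}"

while True:
    result = requests.get(GET_URL, headers={"X-API-Key": API_KEY}, timeout=30).json()
    status = result.get("status")
    if status == "success":
        print("output audio:", result.get("output"))  # URL of the converted audio
        break
    if status in ("error", "failed"):
        raise RuntimeError(f"prediction failed: {result}")
    time.sleep(1)  # brief pause before polling again
```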

Readme

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview

rvc-v2 — Voice-to-Voice AI Model

Developed by RVC Project as part of the rvc family, rvc-v2 is a powerful voice-to-voice AI model that converts your spoken input into any RVC-trained AI voice while preserving tone, emotion, and natural delivery. This makes it ideal for creators and developers seeking voice-to-voice AI models that maintain linguistic content and prosody without extensive retraining. Built on a conditional variational autoencoder using HuBERT for content encoding and CREPE for pitch extraction, rvc-v2 excels in high-fidelity voice conversion, delivering outputs with UTMOS perceptual quality scores up to 4.190—outperforming alternatives like kNN-VC in naturalness.

Whether you're cloning voices for content creation or building real-time applications, rvc-v2 from RVC Project handles clean audio inputs seamlessly, supporting use cases like "RVC Project voice-to-voice" transformations that users search for daily.

Technical Specifications

What Sets rvc-v2 Apart

rvc-v2 stands out in the competitive landscape of voice conversion tools due to its architecture tailored for realistic self-voice conversion and multi-speaker support. Unlike basic retrieval methods, it uses a conditional variational autoencoder with HuBERT content features and optional CREPE pitch tracking, enabling precise disentanglement of speaker identity from linguistic content. This allows users to generate high-perceptual-quality outputs (UTMOS 4.190) even in adversarial scenarios like watermark removal, while keeping word error rates low at 0.120.

  • Pitch-preserving conversion via CREPE integration: Extracts and fuses pitch contours accurately, enabling singing voice cloning or emotional speech transfer that retains prosody—ideal for "best voice-to-voice AI" applications where natural inflection is critical.
  • ECAPA-adapted multi-speaker embeddings: Supports one-to-one and same-speaker reconstruction through finetuned embeddings, preserving speaker similarity at 0.748 while outperforming in quality over kNN-VC.
  • High-quality training from short clean audio: Requires just 3-5 minutes of mono 44.1 kHz 16-bit WAV data for effective models, with training over 200-1000 epochs for rapid deployment in "RVC v2 voice cloning" workflows.

Processing supports standard audio formats like WAV, with real-time capable inference on GPUs (Nvidia/AMD) or high-end CPUs, making rvc-v2 a top choice for rvc-v2 API integrations.
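
To make the disentanglement concrete, here is a structural sketch of the pipeline described above; the function bodies are placeholder stubs standing in for HuBERT, CREPE, the speaker-embedding lookup, and the decoder, not the RVC Project's actual implementation:

```python
import numpy as np

# Structural sketch only: each stub mirrors a component named in the text.
def hubert_content(audio: np.ndarray, sr: int) -> np.ndarray:
    """Content encoding (HuBERT's role): speaker-independent phonetic features."""
    return np.zeros((len(audio) // 320, 256))  # placeholder frames

def crepe_f0(audio: np.ndarray, sr: int) -> np.ndarray:
    """Pitch extraction (CREPE's role): per-frame f0 contour in Hz."""
    return np.full(len(audio) // 320, 120.0)   # placeholder contour

def speaker_embedding(model_name: str) -> np.ndarray:
    """Target identity: ECAPA-style embedding from the trained model."""
    return np.zeros(192)                        # placeholder embedding

def decode(content, f0, spk) -> np.ndarray:
    """Conditional-VAE decoder: fuse content + pitch + target timbre."""
    return np.zeros(len(f0) * 320)              # placeholder waveform

def convert(audio: np.ndarray, sr: int, model_name: str, transpose: float = 0.0):
    content = hubert_content(audio, sr)                # what is said
    f0 = crepe_f0(audio, sr) * 2 ** (transpose / 12)   # how it is said, shifted in semitones
    spk = speaker_embedding(model_name)                # who should say it
    return decode(content, f0, spk)
```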

Key Considerations

  • Adjust transpose (pitch) precisely, using decimal values like -4.3, to match the target model's tone for natural results
  • Select an embedder matching the model's training (e.g., ContentVec for most models, Spin for better breath handling and noise robustness)
  • Set Protect Voiceless Consonants to 0.5 or lower to reduce breath artifacts, but avoid extreme values, which can suppress consonants and make speech sound unnatural
  • Use Volume Envelope near 0 to preserve input loudness; values closer to 1 match the training dataset's volume
  • Enable Split Audio for longer files to ensure consistent volume and faster inference by processing segments individually
  • RMVPE is the go-to pitch extractor for speed and convenience, though it can sound harsh; test FCPE for fuller voices
  • Balance quality vs. speed: caching skips redundant steps, but high protection values or advanced embedders increase processing time (a starting-point settings sketch follows this list)
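
The settings above, collected into a starting-point dictionary; the key names are illustrative, not the exact rvc-v2 input field names:

```python
# Starting-point inference settings distilled from the list above.
# Key names are illustrative; map them to the actual rvc-v2 input fields.
rvc_settings = {
    "transpose": -4.3,           # decimal semitones to match the target model's tone
    "embedder": "contentvec",    # or "spin" for better breath/noise handling
    "protect_voiceless": 0.5,    # <= 0.5 reduces breath artifacts; extremes sound unnatural
    "volume_envelope": 0.0,      # near 0 keeps input loudness; 1 matches training volume
    "split_audio": True,         # segment long files for steady volume and faster inference
    "pitch_extractor": "rmvpe",  # fast default; try "fcpe" for fuller-sounding voices
}
```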

Tips & Tricks

How to Use rvc-v2 on Eachlabs

Access rvc-v2 seamlessly through the Eachlabs Playground for instant testing, the API for production-scale "rvc-v2 API" integrations, or the SDK for custom apps. Upload a clean WAV input (mono, 44.1 kHz recommended), specify your trained RVC model name, epoch count (200-1000), and parameters like pitch tracking, then generate high-fidelity voice outputs in seconds with preserved prosody and emotion.

---

Capabilities

  • High-fidelity voice conversion preserving tone, emotion, and delivery in speech or song
  • Real-time voice changing via microphone input with low latency using caching and efficient pitch methods
  • Text-to-speech generation using trained RVC models for audiobooks or character voices
  • Multi-step processing for song covers: vocal separation, pitch extraction, timbre swap, and remixing (sketched after this list)
  • Robust to noise with advanced embedders like Spin, separating timbre from phonetic content accurately
  • Customizable effects: pitch shift, volume matching, consonant protection, filters (low/high-pass, reverb, chorus)
  • Versatile for polyphonic audio with RMVPE+ variants; supports user-trained models via .pth uploads
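
The song-cover workflow from the list above, sketched end to end with hypothetical helpers (no real rvc-v2 SDK call is shown; the stubs just mark each stage):

```python
from typing import Tuple

def separate_vocals(song_path: str) -> Tuple[str, str]:
    """1. Vocal separation: split a mix into (vocals, instrumental) stems."""
    return "vocals.wav", "instrumental.wav"    # placeholder paths

def convert_voice(vocals_path: str, model_name: str) -> str:
    """2-3. Pitch extraction + timbre swap with a trained rvc-v2 .pth model."""
    return "converted_vocals.wav"              # placeholder path

def remix(vocals_path: str, instrumental_path: str) -> str:
    """4. Remixing: align and mix the converted vocal over the instrumental."""
    return "cover.wav"                         # placeholder path

def make_cover(song_path: str, model_name: str) -> str:
    vocals, instrumental = separate_vocals(song_path)
    converted = convert_voice(vocals, model_name)
    return remix(converted, instrumental)
```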

What Can I Use It For?

Use Cases for rvc-v2

Content creators cloning custom voices: Podcasters or YouTubers upload 3-5 minutes of clean target voice audio to train an rvc-v2 model, then convert their narration—preserving emotion for "clone any voice RVC" results without noise or re-recording. For example, input a script read in your voice with prompt parameters like "convert to trained 'Baha' model, maintain pitch contour," yielding singing-capable outputs.

Developers building real-time voice changers: Integrate rvc-v2 via API for apps needing live "voice-to-voice AI model" conversion, leveraging its cross-platform support (Windows, Mac, Linux) and DirectML for AMD GPUs—perfect for gaming or streaming tools where low WER (0.120) ensures clear communication.

Musicians experimenting with vocal styles: Train on a singer's 32-minute WAV dataset (44.1 kHz mono), then apply to tracks for seamless style transfer, using rvc-v2's HuBERT-CREPE pipeline to match vocal range intelligently without quality loss.

Marketers personalizing audio ads: Convert spokesperson audio to brand voices quickly, capitalizing on rvc-v2's high UTMOS scores for natural delivery in targeted campaigns searching for "RVC Project voice-to-voice."

Things to Be Aware Of

  • Embedder mismatches (e.g., using Spin on a ContentVec-trained model) cause poor timbre transfer; always check the model description
  • RMVPE can sound harsh on non-harmonic voices; users recommend FCPE or RMVPEGPU forks for smoother results
  • Volume drops in long audio are fixed by enabling Split Audio, per inference guides; it also improves speed and consistency
  • Resource needs: GPU acceleration via RMVPEGPU variants reduces CPU load for real-time use
  • Breaths and noise are handled better by the Spin embedder, though it is less common in older models
  • Positive feedback: Fast setup, quality improvements in v2 with autotuning; users praise caching for efficiency
  • Common concerns: Over-protection makes speech robotic; test intermediate values to avoid artifacts

Limitations

  • Dependent on training data quality; poor datasets lead to inconsistent timbre or pronunciation bleed
  • Real-time mode sensitive to input noise or mic quality, potentially causing artifacts without preprocessing
  • Limited to RVC-compatible .pth models; incompatible formats like gpt-sovits fail silently