Vocal Remover for Podcast Editing
Separate spoken word from background music, tuned for dialogue.
Tuned for speech, not song
A dedicated speech-separation mode prioritizes dialogue intelligibility, not stereo width or harmonic preservation.
Preview before you commit
Preview a 15-second sample before spending processing minutes on your full episode.
Isolate dialogue or the bed
Export the clean dialogue track if you want to re-score with different, cleared music, or export the music bed on its own.
Upload the recording
Dialogue and background music mixed together, as recorded.
Choose Speech isolate mode
Not the default music-separation mode — this one is tuned for spoken word.
Export what you need
The isolated dialogue track, the music bed on its own, or both.
Why speech separation isn’t just "vocal removal for talking"
Sung vocals and speech have very different acoustic signatures: pitch contour, sustain, and harmonic structure all differ. A model trained on singing tends to underperform on spoken word, sometimes leaving a faint "underwater" residue of the music bed behind the dialogue. This mode is trained specifically on speech-plus-music mixtures.
What it won’t do
This isn’t a noise-reduction, de-essing, or de-reverb tool. For room echo, mic hiss, or plosives, run your existing podcast editing software after separation. It solves one specific problem: pulling speech and music apart when they’re mixed together.
- Does it work on two overlapping speakers at once?
- It separates speech from music reliably; separating two overlapping human voices from each other is a different, harder problem this mode doesn’t attempt.
- What if I need both the dialogue and the music bed, just rebalanced?
- Export both stems separately and remix them at whatever balance you want in your existing editor.
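
If you would rather script the rebalance than do it in an editor, the idea can be sketched in a few lines of Python. This is a minimal illustration, not part of the product: the function names `db_to_gain` and `remix` are hypothetical, and it assumes both stems are already decoded to equal-length mono samples as floats between -1.0 and 1.0.

```python
def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def remix(dialogue, music, dialogue_db=0.0, music_db=-12.0):
    """Mix two equal-length mono stems at new relative levels.

    Assumes samples are floats in [-1.0, 1.0]; the default levels are
    illustrative, not recommended settings.
    """
    if len(dialogue) != len(music):
        raise ValueError("stems must have the same number of samples")

    d_gain = db_to_gain(dialogue_db)
    m_gain = db_to_gain(music_db)

    # Scale each stem, then sum them sample by sample.
    mixed = [d * d_gain + m * m_gain for d, m in zip(dialogue, music)]

    # Hard-clip so the summed signal never exceeds full scale.
    return [max(-1.0, min(1.0, s)) for s in mixed]
```

Calling `remix(dialogue, music, music_db=-18.0)` would keep the dialogue at its exported level and pull the music bed down by 18 dB. A real editor or audio library would usually apply a limiter instead of hard clipping; the clip here only keeps the sketch safe.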