Vocal Remover for Podcast Editing

Separate spoken word from background music, tuned for dialogue.

Tuned for speech, not song

A dedicated speech-separation mode prioritizes dialogue intelligibility, not stereo width or harmonic preservation.

Preview before you commit

Preview a 15-second sample before spending processing minutes on your full episode.

Isolate the dialogue or the music bed

Export the clean dialogue track, or the music bed alone if you want to re-score with different, cleared music.

Upload the recording

Dialogue and background music mixed together, as recorded.

Choose Speech isolate mode

Use this instead of the default music-separation mode; it is tuned for spoken word.

Export what you need

The isolated dialogue track, the music bed on its own, or both.

Why speech separation isn’t just “vocal removal for talking”

Sung vocals and speech have very different acoustic signatures: pitch contour, sustain, and harmonic structure. A model trained on singing underperforms on spoken word, sometimes leaving a faint “underwater” residue of the music bed behind the dialogue. This mode is trained specifically on speech-plus-music mixtures.

What it won’t do

This isn’t a noise-reduction, de-essing, or de-reverb tool. For room echo, mic hiss, or plosives, use your existing podcast editing software after separation. It solves one specific problem: pulling speech and music apart when they’re mixed together.

Does it work on two overlapping speakers at once?

It separates speech from music reliably. Separating two overlapping human voices from each other is a different, harder problem that this mode doesn’t attempt.

What if I need both the dialogue and the music bed, just rebalanced?

Export both stems separately and remix them at whatever balance you want in your existing editor.

Ready to get started?

Start free — no credit card required.