Cloned Voices, and the Call That Sounds Right

The Desk, 7 min read

A few seconds of recorded speech is now enough to reproduce a person's voice convincingly. The clip can come from an interview, a conference talk, a podcast, a voice note or a video posted online. From it, software generates new sentences in that voice, saying things the person never said.

This matters most for people whose voices are easy to find and whose instructions move money or open doors. A finance manager who takes a call from what sounds like the principal, asking for an urgent transfer, has little to go on if the voice is right and the detail is plausible.

Early this year a finance employee at a large company paid out tens of millions after a video call with people who looked and sounded like colleagues but were synthetic, convincing enough in face and voice to carry the instruction. The same method works just as well at the scale of a single family.

The defence is procedural, not technical. A second channel to confirm any instruction that moves money. A pre-agreed word or question a stranger could not answer. A habit of slowing down when a request arrives with urgency attached, because urgency is the part the caller controls.
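For teams that want to write the rule down, the three habits above can be sketched as a short checklist in code. This is an illustration only, not a product or a recommended implementation: the `Instruction` fields, the `reasons_to_hold` function and the example phrase are all hypothetical names chosen for this sketch, and a real process would depend on how an office actually confirms payments.

```python
import hmac
from dataclasses import dataclass


@dataclass
class Instruction:
    """A request received by phone or video (all fields are illustrative)."""
    moves_money: bool
    confirmed_on_second_channel: bool  # e.g. a call back to a number already on file
    challenge_answer: str              # what the caller said when asked the agreed question
    marked_urgent: bool


def reasons_to_hold(instr: Instruction, agreed_answer: str) -> list[str]:
    """Return the reasons to hold an instruction; an empty list means proceed."""
    reasons = []

    # Habit 1: anything that moves money is confirmed on a second channel.
    if instr.moves_money and not instr.confirmed_on_second_channel:
        reasons.append("not confirmed on a second channel")

    # Habit 2: the caller must give the pre-agreed answer.
    given = instr.challenge_answer.strip().lower().encode()
    expected = agreed_answer.strip().lower().encode()
    if not hmac.compare_digest(given, expected):
        reasons.append("challenge answer missing or wrong")

    # Habit 3: urgency never waives a check; unconfirmed and urgent means slow down.
    if instr.marked_urgent and not instr.confirmed_on_second_channel:
        reasons.append("urgent and unconfirmed: slow down and call back")

    return reasons


suspicious = Instruction(True, False, "", True)
print(reasons_to_hold(suspicious, "blue heron"))
```

The point of the sketch is that urgency appears only as a reason to pause, never as a reason to skip a step. The sound of the voice is not an input at all.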

A convincing voice is no longer proof of who is speaking, and the clips needed to copy one are often already public. We map how much of a principal's voice and likeness is in open circulation, and help put in place the simple confirmation habits that a synthetic call cannot pass. The technology will keep improving; the discipline that defeats it does not have to.
