Professional voice cloning

Professional voice cloning is only available to Enterprise customers.

Introduction

Professional voice cloning lets you train a highly realistic model of a voice. We achieve this by training a dedicated model on scripted speaker data, such as narrated articles.

A professional voice clone mirrors the speaker data it is trained on. For an optimal clone, we require speakers to record a tailored script, built from material such as your articles, so that the model captures your desired speaking style. Recordings must be free of unwanted artefacts and sounds; otherwise, the model will replicate them, resulting in a subpar voice clone.

Professional voice cloning is currently available in 62 languages and accents, including: Afrikaans, Arabic, Bulgarian, Catalan, Chinese (Cantonese), Chinese (Mandarin), Czech, Danish, German, Greek, English, Spanish, Finnish, French, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Korean, Malay, Norwegian, Dutch, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Swedish, Tamil, Telugu, Thai, Turkish, and Vietnamese.

Interested in cloning a voice to narrate your articles? Book a meeting.

Script

To help capture your ideal speaking style, we’ll create a script using 10-25 of your own articles. Articles serve as a natural example of the content the voice clone will narrate, making it easier for the speaker to match your desired style.

Voice cloning process

Professional voice clones require 10-25 article recordings, equating to approximately 30 to 90 minutes of voice data, depending on the language. Training can be completed in as little as 24 hours.
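As a rough illustration of these figures, the sketch below checks whether a planned set of recordings falls within the stated ranges (10-25 article recordings, approximately 30 to 90 minutes in total). The function name and input format are hypothetical and not part of any product API, and the amount of audio actually needed depends on the language.

```python
# Ranges taken from the text above; names and structure are illustrative assumptions.
MIN_ARTICLES, MAX_ARTICLES = 10, 25
MIN_TOTAL_MINUTES, MAX_TOTAL_MINUTES = 30.0, 90.0


def check_recording_set(durations_minutes):
    """Return a list of issues; an empty list means the set fits the stated ranges."""
    issues = []
    count = len(durations_minutes)
    total = sum(durations_minutes)

    # Number of article recordings should fall within the stated 10-25 range.
    if not MIN_ARTICLES <= count <= MAX_ARTICLES:
        issues.append(
            f"{count} recordings is outside the stated range "
            f"of {MIN_ARTICLES}-{MAX_ARTICLES}"
        )

    # Total audio should fall within the approximate 30-90 minute range.
    if not MIN_TOTAL_MINUTES <= total <= MAX_TOTAL_MINUTES:
        issues.append(
            f"{total:.1f} minutes of audio is outside the approximate range "
            f"of {MIN_TOTAL_MINUTES:.0f}-{MAX_TOTAL_MINUTES:.0f} minutes"
        )

    return issues


if __name__ == "__main__":
    # Twelve recordings of about 3.5 minutes each (made-up values).
    planned = [3.5] * 12
    print(check_recording_set(planned))
```

An empty list means both the number of recordings and the total duration sit inside the ranges quoted above. Treat the check as a planning aid only, since the upper and lower bounds are approximate.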

Stage | Description | Estimate
Speaker selection | You select a speaker. | -
Script generation | We create a script using 10-25 of your articles. | 0.5 days
Script recording | Your selected speaker records the script. See script recording guidelines and specifications. | 1 day
Pre-processing | We review and pre-process the speaker data to prepare it for training. | 0.5 days
Fine-tuning | We train a single-speaker voice model using the pre-processed speaker data. | 1 day
Deployment | The trained voice model is deployed to your account for immediate use. | Immediate
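
The stage estimates can be added up to get a rough end-to-end timeline. The sketch below does that arithmetic. It assumes the stages run back to back and treats speaker selection (no estimate given) and deployment (immediate) as zero; all names are hypothetical.

```python
# Stage estimates in days, copied from the stages listed above.
# Speaker selection has no estimate and deployment is immediate, so both are
# treated as 0 here (an assumption made for this illustration).
STAGE_ESTIMATES_DAYS = {
    "Speaker selection": 0.0,
    "Script generation": 0.5,
    "Script recording": 1.0,
    "Pre-processing": 0.5,
    "Fine-tuning": 1.0,
    "Deployment": 0.0,
}


def total_days(estimates):
    """Sum the per-stage estimates, assuming the stages run sequentially."""
    return sum(estimates.values())


if __name__ == "__main__":
    print(f"Estimated total: {total_days(STAGE_ESTIMATES_DAYS)} days")
```

Under these assumptions the total comes to 3 days, not counting the time you need to select a speaker.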
