Professional voice cloning

Voice cloning is only available to Enterprise customers.


Professional voice cloning lets you train a highly realistic model of a voice. We achieve this by training a dedicated model on a large set of scripted speaker data.

A professional voice clone will mirror the speaker data it is trained on. For an optimal clone, we require speakers to record a tailored script based on your content to help you achieve your desired speaking style. It's important that unwanted artefacts or sounds are not present during the recording. Otherwise, the model will replicate unwanted features, resulting in a subpar voice clone.

Professional voice cloning is currently available in English, Norwegian, Swedish, Danish, German, and Afrikaans. More languages will be added soon.

To speak to a member of our team about professional voice cloning, please book a meeting.

Custom script

To help capture your ideal speaking style, we’ll use our Script Generator to build a custom script based on your articles. Articles serve as a natural example of the content that the voice clone will be used to narrate, making it easier for the speaker to achieve your desired speaking style. The number of articles selected (typically 50 to 200) will depend on their features, their length, and the voice language.

Voice cloning process

Professional voice clones can require 2 to 6 hours of speaker data, depending on the language. Training can take up to 2 weeks.


Script data collection

Submit up to 2,000 article URLs. For existing BeyondWords users, this step can be skipped if the data is already available in your BeyondWords audio CMS.

24 hours

Speaker and studio validation

Provide a sample audio recording from your selected speaker, recorded in the intended studio environment. This allows our team to evaluate both the speaker and studio quality to ensure they meet our standards.

24 hours

Script generation

Our Script Generator will produce a custom script with a shareable URL.

24 hours

Script recording

Your selected speaker will record the script. This process may vary in duration based on script length and complexity.

See script recording guidelines and specifications.

1-5 days


We will review and preprocess the speaker data to prepare it for training.

1-2 days


We will train a single-speaker voice model using the pre-processed speaker data.

5-10 days


The voice will undergo thorough testing to evaluate its performance and ensure it meets our quality standards.

5 days

Voice deployment

After successful testing and approval, the finalized voice model will be deployed to your BeyondWords account for immediate use.

24 hours
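
As a rough planning aid, the step durations listed above can be added up to estimate the overall timeline. The Python sketch below is illustrative only and is not an official BeyondWords tool. It assumes the steps run strictly one after another with no gaps, treats "24 hours" as one day, and uses descriptive placeholder names for the three steps that are listed without a title. Under those assumptions, the process takes roughly 16 to 26 days end to end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    min_days: int
    max_days: int


# Durations are copied from the steps above; "24 hours" is counted as 1 day.
# The names marked "(placeholder)" are descriptive labels invented for this
# sketch, not official step titles.
STEPS = [
    Step("Script data collection", 1, 1),
    Step("Speaker and studio validation", 1, 1),
    Step("Script generation", 1, 1),
    Step("Script recording", 1, 5),
    Step("Review and preprocessing (placeholder)", 1, 2),
    Step("Model training (placeholder)", 5, 10),
    Step("Testing (placeholder)", 5, 5),
    Step("Voice deployment", 1, 1),
]


def total_range(steps):
    """Return (shortest, longest) total duration in days.

    Assumes the steps are sequential and do not overlap.
    """
    shortest = sum(step.min_days for step in steps)
    longest = sum(step.max_days for step in steps)
    return shortest, longest


if __name__ == "__main__":
    shortest, longest = total_range(STEPS)
    print(f"Estimated timeline: {shortest} to {longest} days")
```

Actual timelines may differ, for example if steps overlap or if a recording needs to be repeated after validation.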
