Script recording guidelines

Learn how to read and record the voice cloning script effectively.

To create a professional voice clone, we need the speaker to record a script tailored to your content. This script is designed to provide all the speech data required for effective voice training.

The quality of the script reading, in terms of both performance and sound quality, affects the resulting voice clone. We have put together these speaking and recording guidelines to help you achieve the best possible result.

Speaking guidelines

Our scripts contain thousands of utterances, which we use to train a dedicated voice model. We require a minimum of 1,000 high-quality utterances to train production-ready voices, but this number may vary depending on the language.

We recommend recording 100 utterances per session; each session takes around 45 minutes. Working in sessions of this size reduces the risk of errors and voice fatigue.
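As a rough planning aid, the figures above can be turned into a session estimate. The sketch below is illustrative only: the function name `plan_sessions` is hypothetical, and the numbers are the minimums and recommendations quoted above, which may differ for your language or script.

```python
import math

MIN_UTTERANCES = 1000          # minimum stated above (may vary by language)
UTTERANCES_PER_SESSION = 100   # recommended session size
MINUTES_PER_SESSION = 45       # approximate session length

def plan_sessions(total_utterances,
                  per_session=UTTERANCES_PER_SESSION,
                  minutes_per_session=MINUTES_PER_SESSION):
    """Return (number of sessions, total recording hours) for a script."""
    sessions = math.ceil(total_utterances / per_session)
    hours = sessions * minutes_per_session / 60
    return sessions, hours

sessions, hours = plan_sessions(MIN_UTTERANCES)
print(f"{sessions} sessions, about {hours:.1f} hours of recording")
# prints "10 sessions, about 7.5 hours of recording"
```

A script longer than the minimum scales the same way: pass its utterance count as `total_utterances`.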

The speaker should read the script as if they were narrating an article.

Below are a few tips to help ensure the highest-quality recording:

  • Recording location: It is important to record in a quiet location and to use the same recording equipment throughout. We recommend recording in a professional studio and sitting at a consistent distance from the microphone. The speaker should make sure they are comfortable before recording, to eliminate the need for movement.

  • Pronunciations: To ensure that words are mapped to their correct sounds, it is crucial that words are pronounced accurately and distinctly, exactly as they are in the script. The script has been normalized for text-to-speech, so you will notice some unusual punctuation and formatting (for example, '2020' written as 'twenty twenty'). Where letters should be pronounced individually, spaces or hyphens will be used to indicate breaks (for example, 'I S S', 'CAR-T'). The speaker should take the time to review the script beforehand and clarify the pronunciation of any unfamiliar or ambiguous words.

  • Speaking style: Use a natural speaking style that you will be able to maintain consistently throughout the recordings. Each line that you record should be plausible in isolation. This means that you shouldn't give particular emphasis to any word which would rely on context from outside the text. While some variance is natural, it is important to keep volume, pitch, intonation, and tempo as consistent as possible.

  • Voice quality: The speaker should take regular water breaks and rest their voice to ensure consistency. Rather than recording the script all at once, we recommend recording in multiple short sessions, to reduce the risk of the voice becoming tired or strained.

  • Breathing and pausing: Make sure to pause after each utterance, and try to breathe away from the microphone before starting the next one. Otherwise, try to keep your breathing at a low and consistent volume, or else the voice clone's breaths can become unnatural and distracting.
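When reviewing the script before a session, it can help to list the tokens that are meant to be spelled out letter by letter, so their pronunciation can be clarified in advance. The following is a minimal sketch that assumes the two conventions mentioned in the pronunciation tip (single capital letters separated by spaces, and hyphenated capitals). The function name `find_spelled_tokens` and the pattern are illustrative heuristics, not part of any BeyondWords tooling, and they may miss or over-match cases in a real script.

```python
import re

# Heuristic: 'I S S' style (single capitals separated by spaces)
# or 'CAR-T' style (capital groups joined by hyphens).
SPELLED_TOKEN_RE = re.compile(
    r"\b[A-Z](?: [A-Z])+\b"      # e.g. I S S
    r"|\b[A-Z]+(?:-[A-Z]+)+\b"   # e.g. CAR-T
)

def find_spelled_tokens(line):
    """Return the tokens in a script line that look letter-spelled."""
    return SPELLED_TOKEN_RE.findall(line)

for line in [
    "The I S S orbits the Earth",
    "CAR-T therapy was approved in twenty twenty",
]:
    print(find_spelled_tokens(line))
```

Each flagged token is a candidate to check with whoever supplied the script before recording starts.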

Recording guidelines

We recommend saving each utterance as an individual .wav audio file, with the file name matching the utterance ID — for example, 1.wav.

If you wish to record multiple utterances per file, there needs to be a pause of at least 3 seconds between each utterance. The file name should match the utterance ID range — for example, 1-100.wav.
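Before submitting, you may want to confirm that the file names cover every utterance. The sketch below assumes that utterance IDs are positive integers starting at 1, as the examples `1.wav` and `1-100.wav` suggest; if your script uses a different ID scheme, the pattern would need adjusting. The helper names `utterance_ids` and `missing_ids` are hypothetical.

```python
import re

# Matches "12.wav" (single utterance) or "1-100.wav" (ID range).
FILENAME_RE = re.compile(r"^(\d+)(?:-(\d+))?\.wav$")

def utterance_ids(filename):
    """Return the range of utterance IDs a file name covers, or None."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return None
    first = int(match.group(1))
    last = int(match.group(2)) if match.group(2) else first
    if last < first:
        return None  # reversed range such as "100-1.wav"
    return range(first, last + 1)

def missing_ids(filenames, expected_count):
    """List utterance IDs in 1..expected_count not covered by any file."""
    covered = set()
    for name in filenames:
        ids = utterance_ids(name)
        if ids is not None:
            covered.update(ids)
    return sorted(set(range(1, expected_count + 1)) - covered)

print(missing_ids(["1.wav", "2-4.wav", "notes.txt"], 5))  # prints [5]
```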

  • File format: *.wav, mono

  • Sampling rate: 22 kHz minimum

  • Sample format: 16-bit PCM minimum

  • Peak volume levels: -3 dB to -6 dB

  • > 35 dB

  • Environment noise, echo: the level of noise at the start of the wave, before speaking: < -70 dB
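Some of these requirements can be checked automatically before submission. The following is a minimal sketch using only Python's standard library. It rests on several assumptions that are not stated in the guidelines: "22 kHz" is read as 22,000 Hz, peak levels are measured in dB relative to full scale, and the peak is only computed for 16-bit files. The function name `check_wav` is hypothetical, and the noise-related requirements are not covered.

```python
import array
import math
import sys
import wave

MIN_SAMPLE_RATE = 22000   # assumption: "22 kHz minimum" read as 22,000 Hz
PEAK_RANGE_DB = (-6.0, -3.0)

def check_wav(path):
    """Return a list of problems found in a PCM .wav file (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wf:
        channels = wf.getnchannels()
        rate = wf.getframerate()
        width = wf.getsampwidth()          # bytes per sample
        frames = wf.readframes(wf.getnframes())

    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if rate < MIN_SAMPLE_RATE:
        problems.append(f"sampling rate {rate} Hz is below {MIN_SAMPLE_RATE} Hz")
    if width < 2:
        problems.append(f"sample format is {8 * width} bit, expected 16 bit or more")

    # Peak level relative to full scale, for 16-bit audio only (assumption).
    if width == 2 and frames:
        samples = array.array("h", frames)
        if sys.byteorder == "big":
            samples.byteswap()             # WAV sample data is little-endian
        peak = max(abs(s) for s in samples)
        if peak == 0:
            problems.append("file is silent")
        else:
            peak_db = 20 * math.log10(peak / 32768)
            low, high = PEAK_RANGE_DB
            if not low <= peak_db <= high:
                problems.append(f"peak level {peak_db:.1f} dB is outside {low} to {high} dB")
    return problems
```

Calling `check_wav("1.wav")` returns an empty list when none of the checked requirements is violated; otherwise each entry describes one issue to fix before re-recording or re-exporting.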


Can I provide existing audio recordings instead?

Unfortunately, we cannot create your voice clone using pre-existing audio recordings. This is because we cannot guarantee that the recordings contain the necessary speech data or meet the standards required for a high-quality voice clone.

What happens when recording is complete?

Once recording is complete, you will need to submit the audio files to BeyondWords. Our team will then use the recordings to train your voice model.

