Script recording guidelines

The voice clone will accurately replicate the style and performance of the speaker. For this reason, it is important that each article is recorded with the same energy, pace, and style that you would like the voice clone to have.

Speaking guidelines

Below are a few tips to help ensure the highest-quality recording:

  • Read as separate articles: Please record and deliver the files as separate articles, one audio file per article. This will allow the speaker to give correct meaning and structure to their performance.

  • Take breaks: We recommend adequate breaks during the recording process to reduce the risk of error and voice fatigue.

  • Correcting mistakes: If you make a mistake, please re-record from an appropriate place in the article, such as the start of the sentence or paragraph, to maintain the naturalness and fluency of the recording. "Punching in" (re-recording only the affected passage and splicing it into the existing take) is permitted. Please let us know if you would like guidelines on achieving this.

  • Indicating mistakes/issues with the script: Click the flag icon on the right-hand side of the script recording page to report any errors, provide comments, or let us know about any edits made to the script.

  • Recording location: It is important to record in a quiet location and to use the same recording equipment throughout. We recommend recording in a professional studio. If a studio is not available, you can create a temporary setup with thick fabrics like duvets or quilts to dampen unwanted sounds and echoes.

  • Distance: Keep a consistent distance from the microphone throughout; about two fists' width is a good starting point. The speaker should get comfortable before recording so that they do not need to shift position during a take.

  • Plosives: Use a pop filter to reduce the bursts of air produced by plosive sounds such as "p" and "b", which otherwise cause audible pops in the recording.

  • Pronunciations: To ensure that words are mapped to their correct sounds, words must be pronounced accurately and distinctly, precisely as they are in the script. The script may be normalised for text-to-speech, so you may notice some unusual punctuation and formatting (for example, "2020" might be written as "twenty-twenty"). Where letters should be pronounced individually, spaces or hyphens may be used to indicate breaks (for example, "I S S", "CAR-T"). The speaker should take the time to review the script beforehand and clarify the pronunciation of any unfamiliar or ambiguous words.

  • Speaking style: Use a natural speaking style that you can maintain consistently throughout the recordings. While some variance is natural and desirable, keeping volume, pitch, intonation, and tempo as consistent as possible is important.

  • Voice quality: To ensure consistency, the speaker should take regular water breaks and rest their voice. Rather than recording the script all at once, we recommend recording in multiple short sessions to reduce the risk of the voice becoming tired or strained.

  • Breathing and pausing: Pause naturally at punctuation and try to breathe away from the microphone. Keep your breathing at a low and consistent volume; otherwise the voice clone's breaths can become unnatural and distracting.

  • Hydration and mouth noise: Mouth noise can be copied by the voice clone and cause unpredictable results. It is often caused by insufficient hydration. To help reduce it, stay well hydrated in the days leading up to the recording sessions and throughout them. Do not wait until the day of recording to start drinking more water, because your body will not retain the excess. Reducing caffeine and alcohol can help. If you are sufficiently hydrated and still have audible mouth noise, chewing gum with xylitol or taking a bite of green apple can reduce it on the day of recording.

Recording specifications

We recommend saving each article as an individual .wav audio file, with the file name matching the article ID (for example, 1.wav).

  • File format: *.wav, Mono

  • Sampling rate: Minimum of 22 kHz for clear audio capture.

  • Sample format: Minimum of 16-bit PCM (uncompressed) for lossless audio quality.

  • Volume levels: Between -23 dB and -18 dB RMS across the recording, with a maximum peak of -3 dB to avoid clipping and distortion.

  • Signal-to-noise ratio (SNR): Greater than 35 dB (higher is better) for minimal background noise.

  • Environment noise, echo: Background noise level before speaking should be less than -70 dB for optimal clarity.

  • Processing: Send us the files as unprocessed as possible. For example, do not apply filters, compression, limiters, or similar effects. We'll standardise your files in-house so that they are suitable for voice cloning.
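Before sending files, it can be useful to run a rough check against the specifications above. The following is a minimal sketch, not an official validation tool. It uses only the Python standard library, and the function name check_wav is hypothetical. It assumes the dB figures are relative to digital full scale (dBFS), which the specifications do not state explicitly. It measures RMS over the whole file, silences included, so the level readings are approximate. Levels are only measured for 16-bit mono files, and SNR and the background noise floor are not estimated.

```python
import array
import math
import sys
import wave

# Thresholds taken from the recording specifications above.
MIN_SAMPLE_RATE_HZ = 22_000
MIN_SAMPLE_WIDTH_BYTES = 2          # 16-bit PCM
RMS_RANGE_DB = (-23.0, -18.0)       # assumed to be dBFS
MAX_PEAK_DB = -3.0                  # assumed to be dBFS


def to_dbfs(value, full_scale=32768.0):
    """Convert a linear 16-bit amplitude to dB relative to full scale."""
    if value <= 0:
        return float("-inf")
    return 20.0 * math.log10(value / full_scale)


def check_wav(path):
    """Return a list of problems found in the file; empty means none found."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        width = wav.getsampwidth()
        frames = wav.readframes(wav.getnframes())

    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if rate < MIN_SAMPLE_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE_HZ} Hz")
    if width < MIN_SAMPLE_WIDTH_BYTES:
        problems.append(f"sample width {8 * width}-bit is below 16-bit")

    # Level measurement is only implemented for 16-bit mono in this sketch.
    if channels == 1 and width == 2 and frames:
        samples = array.array("h")
        samples.frombytes(frames)
        if sys.byteorder == "big":
            samples.byteswap()      # WAV sample data is little-endian
        peak_db = to_dbfs(max(abs(s) for s in samples))
        rms_db = to_dbfs(math.sqrt(sum(s * s for s in samples) / len(samples)))
        if peak_db > MAX_PEAK_DB:
            problems.append(f"peak {peak_db:.1f} dB exceeds {MAX_PEAK_DB} dB")
        if not RMS_RANGE_DB[0] <= rms_db <= RMS_RANGE_DB[1]:
            problems.append(f"RMS {rms_db:.1f} dB is outside {RMS_RANGE_DB}")
    return problems
```

Calling check_wav("1.wav") returns an empty list when none of these checks fail. A non-empty list is a prompt to re-check the recording setup rather than a definitive rejection, since the in-house standardisation may tolerate small deviations.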
