Professional voice cloning
Voice cloning is only available to Enterprise customers.
Introduction
Professional voice cloning lets you create a highly realistic model of a voice. We do this by training a dedicated model on a large set of scripted speaker recordings.
A professional voice clone mirrors the speaker data it is trained on. For the best results, we require speakers to record a script tailored to your content, which helps capture your desired speaking style. The recordings must be free of unwanted artefacts and sounds; otherwise, the model will reproduce them and the resulting voice clone will be of lower quality.
Professional voice cloning is currently available in 62 languages.
To speak to a member of our team about professional voice cloning, please book a meeting.
Custom script
To help capture your ideal speaking style, we'll use our Script Generator to build a custom script based on your articles. Articles are natural examples of the content the voice clone will narrate, which makes it easier for the speaker to achieve your desired speaking style. The number of articles selected (typically 20 to 50) depends on their characteristics, their length, and the language of the voice.
Voice cloning process
Professional voice clones require 1 to 2 hours of speaker data, depending on the language. Training can take up to 48 hours.
Stage | Description | Estimated duration |
---|---|---|
Script data collection | Submit up to 1,000 article URLs. For existing BeyondWords users, this step can be skipped if the data is already available in your BeyondWords audio CMS. | 24 hours |
Speaker and studio validation | Provide a sample audio recording from your selected speaker, recorded in the intended studio environment. This allows our team to evaluate both the speaker and studio quality to ensure they meet our standards. | 24 hours |
Script generation | Our Script Generator creates a custom script and provides a shareable URL for it. | 24 hours |
Script recording | Your selected speaker will record the script. This process may vary in duration based on script length and complexity. | 1-5 days |
Pre-processing | We will review and pre-process the speaker data to prepare it for training. | 1-2 days |
Fine-tuning | We will train a single-speaker voice model using the pre-processed speaker data. | 1-2 days |
Testing | The voice will undergo thorough testing to evaluate its performance and ensure it meets our quality standards. | 1-2 days |
Voice deployment | After successful testing and approval, the finalized voice model will be deployed to your BeyondWords account for immediate use. | 24 hours |