As much as I would like to open this discussion with a bold statement about how artificial intelligence voices can never replace human voice actors (being one of the latter myself), I cannot. Learning and Development (L&D) professionals have adapted to many different applications of AI in recent years. Now, AI is moving to replace human voice actors in voiceover narrations.
The real question is not whether AI voices can be used effectively for voiceovers in everything from podcasts to explainer videos to eLearning modules. It is when and why to use them in certain circumstances, when to stick with tried-and-true voiceover professionals, and how to make those decisions.
AI Voices Today
We have come a long way from the robotic-sounding synthetic voices of early computer-generated speech. The new AI voices use natural language learning algorithms that either create voices from scratch in a more believable manner or use actual human voices to create voices that can be manipulated to sound more realistic (i.e., avatars). The latest technology also allows you to alter everything from language to pace and inflection. That said, there are benefits and drawbacks to even this advanced programming.
Pros of AI Voices
- AI has expanded markets for voiceover. Smaller companies that would not otherwise hire a professional voice actor might now be able to offer audio versions of their blogs and training projects because of the lower costs of AI voices. Interactive voice recordings and call center responses may be needed in different languages, and AI makes that possible for more companies.
- AI voices can be less costly than voice actors. The cost of AI voice generation varies greatly depending on the platform and the extent of usage. Some platforms offer free trials. Paid plans can range from $44 per month for 24 voice avatars, 30 voice styles, and five projects to $179 per month for 100 projects with dedicated support (wellsaidlabs.com). Other sources cite pricing from $6,000 to $40,000 per year. Comparison shopping is a must when choosing the right platform. Designing a custom voice for your brand, one you will never encounter elsewhere, costs more.
- Using AI voices can be faster than using voice actors. Although professional voice actors often will give you a quick turnaround, even delivering within 24 hours, AI narration can be created at your desk in a matter of minutes once you understand the program’s intricacies. You do need to choose the right AI voice and lay out plans for the script’s interpretation by the program.
- AI voices offer flexibility and consistency. If you change a term that is used throughout your script, it is easy to just change it. No need to have the voice actor re-record it. If a change needs to be made much later in the process, the voice actor may no longer be available. Or, if you are doing your own narration, you may sound different on the day a change is needed. A consideration, especially with longer eLearning modules, is consistency. The AI voice will always be the same: today, next week, and next year.
- AI voices come in many different languages. It is simple to have the same script translated into different languages. This is important these days when call centers and websites need to offer information in the listener’s own language, or when your company has employees all over the world. Some AI voice firms offer 500-plus voices across more than 130 language locales. However, if your script is more than a simple instructional audio, be sure to use a translator to revise it for regional word usage and cultural nuances.
Cons of AI Voices
- The initial investment in AI voice software can be cost-prohibitive. Training personnel to use an AI voice program, setting it up, and continuously optimizing it can be costly not only in money but also in the time it takes your staff to master it. The initial investment must be weighed against the type and number of projects the software will be used for, and the point at which you would break even compared with hiring voice actors for those same projects. You also will need to develop a relationship with your AI voice company, as the system needs to be optimized continuously.
- AI voices are taking jobs from professional voice actors. Historically, technological advances, such as robotics, have replaced blue-collar workers. Now those losing jobs to artificial intelligence capabilities are more likely to be highly paid, often female, office workers. In most cases, AI assists workers rather than replaces them. That is not the case with voice actors. There are wide swaths of voiceover jobs that are being impacted. They are typically what many call “low-hanging fruit”: interactive voice prompts, low-budget training modules, and audio articles and newsletters.
- AI voices cannot duplicate human voices entirely. Although your learners may be used to Siri, Cortana, and Alexa, AI voices fall short in several areas: relatability, empathy, pronunciations, emotions, personality, humor. Bottom line, they just can’t connect on a personal level. They often can be robotic and emotionless. You can’t get AI to laugh or even sigh. Even though they have gotten much better recently, there is still something missing: a soul.
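The break-even question raised in the first point above can be roughed out with simple arithmetic. The sketch below is only an illustration, not a pricing tool. The function name and its parameters are hypothetical, and the example figures (a $179-per-month plan held for a year, $749 per narrated article) are borrowed from elsewhere in this article purely for illustration. It assumes no per-project AI cost and ignores staff training time, which you would add to the fixed cost.

```python
import math


def break_even_projects(fixed_ai_cost, ai_cost_per_project, human_cost_per_project):
    """Return the number of projects at which AI becomes no more expensive
    than hiring a voice actor, or None if AI never catches up.

    All three inputs are assumptions you supply; nothing here is a quoted price.
    """
    saving_per_project = human_cost_per_project - ai_cost_per_project
    if saving_per_project <= 0:
        # AI saves nothing per project, so the fixed cost is never recovered.
        return None
    return math.ceil(fixed_ai_cost / saving_per_project)


# Illustrative only: one year of a $179/month plan vs. $749 per narrated article.
print(break_even_projects(179 * 12, 0, 749))  # → 3
```

With these assumed numbers, the subscription pays for itself by the third article in a year. Adding training and setup time to the fixed cost pushes that point further out.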
“Words mean more than what is set down on paper. It takes the human voice to infuse them with deeper meaning.” —Maya Angelou
Pros of Human Voice Actors*
*Note that Google’s Bard generated many of the reasons not to use AI voices, when queried.
- Voice actors can connect to other humans on an emotional level. People may not always be able to put their finger on it, but they often perceive something as not quite there with AI voices. Humans seek out interactions with other humans. You probably will choose your voice actors by matching them to your learners’ demographics or other traits. The voice actor’s own history may give them an ability to relate to the learner in important ways. Those factors add credibility and trust to the encounter, enhancing acceptance of the material, as well as retention.
- Voice actors are more engaging. They bring emotion, passion, and often empathy to their interpretation of a script. They can impart a broad range of tones—from excited to funny to sad—that you may need to get your message across in a meaningful way. The very timbre of their voices may suggest a given interpretation, where AI voices can’t truly duplicate those human qualities.
- Voice actors can modulate their voices to convey meaning. Voice actors read the context of a script and use tone, pitch, inflection, volume, and pace to bring those thoughts to life and give them meaning. They bring nuances to the material that may not be clear if you are just reading words from a script, as TTS (Text to Speech) does. In other words, they can read between the lines.
- Voice actors are adaptable. They can be directed to perform the way you want the script expressed. Either through detailed instructions or in a live, directed recording session, it is easy to define the exact delivery and emphasis you want from a voice actor. Ask for two to three takes of your script. The voice actor may show you something you hadn’t thought about. They often suggest a different word or approach that would sound better than what is in the script. They will be less likely to mispronounce words.
- Voice actors speak in many different dialects. Although AI companies are trying to offer regional dialects, the choices are limited. On the flip side, voice actors can be found with every dialect imaginable in every corner of the country and the world. This becomes critical when it is important to connect with particular audiences.
Cons of Human Voice Actors
- Voice actors can be expensive. Technically, you can use your own voice, your phone or computer, and free software to create audio for nothing. Businesses that want to hire a podcast production team are looking at $1,000 to $15,000 per episode. If you need a voice actor for a 750-word article, for example, you can expect to pay $749. Depending upon the number of productions you do, costs can add up. You can find less expensive voice actors on casting sites, but you may get what you pay for by sacrificing quality.
- Hiring a voice actor can be time-consuming. Any comparison of the time needed must include the time it takes to seek out voice actors, have them audition, choose the right one, lay out instructions for the proper delivery, and wait for them to record your script remotely or direct them in a studio. And you may need retakes once you receive the MP3 file. This sometimes can rule out humans for time-sensitive projects, such as daily news briefs.
How to Decide
There are several factors to consider when deciding whether to use AI voices or human voice actors in your eLearning, training project, or even interactive voice response (IVR) system. And they vary dramatically in how important they are to the success of your project.
Some of these factors are intangibles: trust, credibility, relatability, passion, and adaptability. The voice actor brings so much more than a voice to every voiceover. They bring personality and an ability to read the context in such a way that it is more believable and easier to retain.
Other criteria are easier to quantify: cost, time, and translation into other languages.
In making the decision to use AI voices or human voice actors, you must weigh the relevance and importance of these tangible and intangible criteria. Depending upon the type of voiceover project, you may be willing to sacrifice relatability for cost savings, for instance. There are trade-offs to be made that ultimately will result in a program being well-received and retained by your audience—or not.
A Weighted Model
To quantify this difficult decision, you can use a weighted model, such as the Excel model available for download at www.uttervoiceovers.com under “Stuff.” The Excel file computes the scores for you automatically, and examples of the model in use for an HR hotline voiceover and an onboarding video voiceover also are provided. First, rate each tangible and intangible voiceover criterion on whether it should play a role in your decision-making at all. These criteria are listed in the model below.
The most critical part of the model has you rank each criterion on importance to the success of the project, from least important at 1 to most important at 10. (Yes, these are subjective decisions.)
The model attempts to aid in decision-making by subjectively assigning a rating on what to consider at all, and then ranking each voiceover criterion on its importance to the project’s success. The purpose here is not to give a definitive answer about which to choose, but to highlight the way to go about making that decision.
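For readers who prefer code to a spreadsheet, the two steps described above (decide whether a criterion counts at all, then weight it by importance from 1 to 10) can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a reproduction of the Excel file's formulas. In particular, the 1-to-5 "fit" ratings for each option, the class and function names, and every number in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str
    consider: bool   # should this criterion play a role in the decision at all?
    importance: int  # 1 (least important) to 10 (most important)
    ai_fit: int      # ASSUMED scale: 1-5, how well an AI voice meets the criterion
    human_fit: int   # ASSUMED scale: 1-5, how well a voice actor meets it


def weighted_scores(criteria):
    """Sum importance x fit for each option, skipping criteria not considered."""
    ai_total = 0
    human_total = 0
    for c in criteria:
        if not c.consider:
            continue
        if not 1 <= c.importance <= 10:
            raise ValueError(f"importance for {c.name!r} must be between 1 and 10")
        ai_total += c.importance * c.ai_fit
        human_total += c.importance * c.human_fit
    return ai_total, human_total


# Hypothetical ratings for an imaginary low-budget project.
criteria = [
    Criterion("cost", True, 9, 5, 2),
    Criterion("time", True, 7, 5, 3),
    Criterion("translation into other languages", False, 1, 4, 3),
    Criterion("trust", True, 6, 2, 5),
    Criterion("relatability", True, 4, 2, 5),
]

ai_score, human_score = weighted_scores(criteria)
print(f"AI voice: {ai_score}, human voice actor: {human_score}")
# → AI voice: 100, human voice actor: 89
```

As with the spreadsheet, the totals are only as good as the subjective ratings behind them. Raising the importance of trust and relatability in this example would tip the result toward the human voice actor.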
The Bottom Line
I suggest that there will always be circumstances where human voice actors are more appropriate, where AI, no matter how advanced, cannot truly duplicate the intricacies of the human voice. As the range of digital media continues to grow, however, there will be more opportunities where AI voices are the optimal choice. Trade-offs will always need to be made.