To Read or Not to Read: How to use voiceover in elearning

Incorporating voiceover into e-learning programs presents learning designers with a few key questions to consider. For example: “What is the proper balance between text and voiceover narration?”, “Should you present most of your learning content through text or through voiceover?”, “How long should I make my voiceover segments?”, and “Should there be a limit on the amount of voiceover?”

Many designers create presentations in which voiceover is used to read all of the onscreen text. However, this removes control over pacing from the user, and actually interferes with the user’s ability to read and retain the text content. To test this, try reading a document while someone is talking to you and see how much you retain.

Other designers progressively display a series of bullet points, and provide the details as voiceover, replicating the classroom lecture experience. This also impacts pacing and user control, as most users can read at a much faster pace than is practical for voiceover. This approach may be influenced by using television programming as a model for multimedia design. The television video model suggests that the ideal approach is to present summary text on screen, while providing the bulk of the content through voiceover and background interview audio segments. This approach is used widely in television news programs and other programs that communicate factual content, and is the result of a period when television broadcasts had insufficient visual resolution to make large amounts of on-screen text a viable element of the programming. However, this standard is not necessarily appropriate for an interactive elearning presentation delivered on a high resolution screen placed 24 inches from the viewer.

The higher resolution of computer-based multimedia presentations changes the balance in favor of more text and less voice for delivering detailed content. Higher resolution displays and the expectation of interactive user control of pacing and content make on-screen text more desirable in many cases.

But voiceover should still be considered an important component of e-learning. The key is in understanding where voiceover is appropriate and effective. This is based primarily on the nature of the content you are trying to communicate. For example:

  • If your objective is to have the audience digest a substantial amount of detailed and complex information, text is likely the preferred option. In this case, if desired, you can use brief voiceover segments to quickly identify the main points, displaying the text progressively one paragraph at a time while the voiceover identifies the subject of the paragraph in a concise manner. The user can use that summary to decide whether to read the full paragraphs, or continue by skipping ahead.
  • Back up bullet points. As noted above, many designers use voiceover to narrate the progressive display of bullet points. This is an appropriate use, but try to keep the voiceover as short as possible so that it coincides with the display of the on-screen content. A good rule of thumb is to ensure that there is always something happening on screen as the voiceover plays; consider long periods of static screen displays with voiceover in the background as a sign of something wrong in your design.
  • Voiceover is most useful in describing aspects of a graphic visual representation. Displaying a visual depiction of a topic progressively, describing the issues via  voiceover as the image builds on screen, is a natural strength of voiceover, and the optimal application of voiceover to elearning.
  • Setting a tone for the program. Voiceover adds a human element that sets the tone. Use voiceover to provide an “outside perspective” on the other module content by providing introductions to sections of your program or specific topics, providing a friendly introduction to quiz interactions, briefly describing a series of on-screen options as they appear, etc. In these cases, the user interprets the voiceover as taking on a specific role in the presentation – establishing context, and providing instructions and interpretations – that may benefit from a more human, natural feel.
  • Introduce a human element where needed. A human voice that plays when the user selects a “Listen to a customer” or “Ask the expert” link helps highlight the human aspects of your learning.

For more information on using voiceover, check out this Learning Solutions Magazine article providing some of the research background.

Determining rates for voiceover performances

Determining an appropriate rate for voiceover services can be complex, especially when creating voiceover for broadcast. Here is a posting of SAG-AFTRA union voiceover rates for the Los Angeles area that illustrates some of the complexity.

On the internet, rates are all over the place, creating disruption in the market and lots of anger from traditional voiceover artists. At least one online source has a rate sheet posted that covers a range of voiceover applications, both broadcast and non-broadcast. Voiceover artist Todd Shick has a nice description of some voiceover rate issues.

Determining an appropriate rate for your projects (or for your voiceover work) should include the following considerations:

  • What is the final use of the performance files?  Broadcast is a higher quality, more demanding performance that has a traditional pricing model that is attractive to professional VO artists. Non-broadcast, less demanding applications, such as elearning or industrials, can obtain VO services from a range of semi-pro online voiceover performers, so the rates are much lower.  The audiences and budgets are much smaller as well, so that makes sense. Since these are separate types of products with different requirements, pricing varies accordingly.


  • What is the level of quality that you are required to provide? Smaller home studios should not charge as much as performers providing professional level studio recordings. So a greater investment in professional level sound should allow you to charge higher rates.


  • What is your level of skill in performing the voiceover?  This is the key issue when determining rates. You are selling your performance, not your equipment (though a pro level performance is diminished in value when recorded on a substandard system).  Voiceovers are like other types of professional performances. You can immediately hear the level of skill when you review the final product. The characteristics include consistency and control, an ability to create a performance that matches the specific characteristics needed for the final product, etc. Comparing VO with music, there are bar bands, classically trained quartets, union studio musicians, and touring name acts. Each provides a different level and type of service, and each gets paid according to a different calculation that can include name recognition, size of following, type of venue, etc.


  • Are you starting out, or have you built a name that draws in clients? Building your name and reputation takes time, but will eventually increase your earning potential. If you are new, without an established track record, (using the music analogy) you will not be able to charge the same as a touring name act…


Creating Audio-based media programs for learning

Audio-based learning programs

A brief scan of the radio dial illustrates the power of audio as a communication medium: news and talk programs, National Public Radio magazine-style programs, commercials, etc.  The popularity of podcasts and audio books suggests that workers are looking for productive ways to use their commuting time.

Learning professionals can leverage this low cost media option as an innovative method for transmitting information to their organization’s staff.  Audio is relatively simple to produce, and can be distributed via podcasts, CDs, MP3s posted on a website, etc.  Audio programs can be easily and conveniently reviewed by busy employees, hard to reach road warriors, long-term on-site consultants, and everyone else on your staff.  Almost any category of employee can benefit from this effective and compelling training medium.

Audio program content examples

Audio-based learning programs can be composed of any combination of the following elements:

  • Moderated panel discussions can be arranged quickly, and require minimal up-front instructional design labor. Just identify the discussion topics, brief the panelists on the objectives of the discussions, schedule a room, and bring the recording equipment. Also consider including an audience that can prompt the panel with questions and issues. This often helps increase the discussion’s authenticity and relevance.  After the discussion is recorded, the ID can use low cost audio editing software to edit the content of the discussion, to improve the content flow and delete dead space and errors.
  • Individual interviews are also easy to design and arrange.  Identify the topics, select an appropriate subject matter expert, and record his/her comments.  Then edit to produce a tight and compelling presentation.
  • Narrated, scripted segments can be used to deliver detailed information on specific topics, explain complicated issues, or describe a current situation. These types of scripted segments are best delivered by a professional narrator or staff member with a professional sounding voice, or by someone within your organization that can add special credibility to the message.
  • Scenarios can be used to model appropriate behaviors. For example, you can provide the audience with a series of scenarios depicting sales or support interactions with typical customers. These can be actual recordings, or simulated performances using your staff or voice actors to play the roles.
  • Telephone conferences can be included when historically important or especially suitable for the topic; however, the lower quality audio of a typical telephone conference may make these elements less desirable.
  • Use scripted narration to deliver introductions and tie together segments, and music to help identify transitions.
  • Commercials for your products and services can provide breaks in the narrative. Mock commercials can provide opportunities for humor to make the program more entertaining.
  • Humor and fun segments can also be interspersed to increase the entertainment value and lighten the mood.

When creating audio programs for your organization, always keep your audience in mind, and strive to create an entertaining and informative program that will capture and maintain their interest.

Adding up voiceover duration

I often need to add up the duration of voiceover files that I have created for a client, or use the audio to determine the overall duration of a presentation.  A quick method for adding up the total audio time within a program is to select all the audio files within a folder.  At the bottom of the folder window, the total run time of the audio files should appear, along with the total size of the selected files.  Note that you may have to select “more details” to display the detailed rundown on the folder contents (which includes the total duration).

While this will not necessarily represent the total seat time of your presentation or elearning course, since you will likely need to include time for the user to read text screens, it will give you an idea of the duration of the voiceover components.
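If you would rather script the same tally than rely on the folder display, here is a minimal sketch. It assumes the narration files are uncompressed WAV files (Python's standard `wave` module does not read MP3s), and the function names are my own, chosen for illustration.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the run time of one WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def total_duration_seconds(folder):
    """Add up the run time of every .wav file directly inside the folder."""
    return sum(wav_duration_seconds(p) for p in sorted(Path(folder).glob("*.wav")))


def format_duration(seconds):
    """Format a number of seconds as H:MM:SS."""
    whole = int(round(seconds))
    hours, remainder = divmod(whole, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"
```

For example, `format_duration(total_duration_seconds("narration"))` would report the total for a folder named "narration" (a hypothetical folder name).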

Creating narrated software demonstrations in PowerPoint

I often prefer using PowerPoint to create narrated software demonstrations, instead of some of the real time screen capture utilities, like Captivate.   Here is my general process:

1. Determine the objectives and focus of the demonstration. This is the scripting phase, where you should be able to refine your message and create an outline or script that will guide your overall development process.

2. Open the Software, as well as PowerPoint.

3. Go through each step of the process that you want to demonstrate.  At each step, press ALT/PrtScrn to capture an image of the current window.

4. Switch to the PowerPoint document, and paste the image within the document to a separate page, in the sequence that you intend for your presentation.  Also enter some text describing the action, or other notes to yourself, that you will later use as a basis for scripting narration and any on-screen text messages that support the instruction.  Use this approach to go through the entire software process that you plan to present.

Note that you can simulate specific software functions by ensuring that you grab all steps of a software function display.  For example, grab the initial screen, then grab the screen with the highlighted menu button, grab the screen with the drop down menu visible, and grab the screen with the appropriate option highlighted on the drop down menu.   This will allow you to later prompt the user to select all of the options in the procedure (as in “select File, then Copy”) by creating hyper-linked areas within the PowerPoint display that correspond to the appropriate software option screen location.

5. If you plan to do a presentation that shows the software screens as full screen, then resize as necessary to fill the presentation screen.  If you want to use a less than full screen size (for example, when you are presenting the content within a standard elearning interface with banner and navigation buttons), then you may need to place a box in the background of the master screen to serve as a guide for sizing and placement (to ensure consistent sizing and location across all of the screens).   You can always delete this box from your master slide after placement of the software screens, if necessary.  Note that if you also want to trim some elements of the software screen, such as browser menus, do that for each screen prior to sizing and placement.

6. Go through each screen, and finalize any additional on-screen text, highlights, or other elements, and finalize your narration script.   Also, if you are simulating the software options, create a transparent hyperlink box that routes users to the next screen, and place this box over the position that you want the users to click on.   As users interact with the end product, it will appear like they are using a live version of the software, since the screen display will progress as they click on the appropriate on-screen options.

7. Record and incorporate your narration, finalize animations, etc.  Then package using a flash conversion utility such as Articulate.

The look of the end product can be as basic as you want, or can be made to look like standard elearning (by including a banner and navigation), and can take advantage of the animation and graphic features of PowerPoint.

Voiceover narration rates

The typical pricing approach used by many professional voiceover narrators seems to be stuck in the past – based on the radio/TV broadcast model that required expensive studio time and union performers.  Non-broadcast clients expect a lower cost approach to creating voiceover for small audience presentations, such as elearning.

Hourly rates based on how much time it takes in a studio to complete a project are obsolete in the era of low cost PC-based recording systems.  Going into a “studio” for recording voiceover is no longer necessary for the majority of projects, when you can record great quality files using a PC-based recording system in an office-based dedicated recording environment (with some attention to soundproofing, etc.).

Having to submit a script to get a rate quote also seems to be overly cumbersome and opaque for non-broadcast elearning, online projects, or corporate videos.

Voiceover artists should be able to post rates in a transparent manner –  rates based on duration, pages, or words – like other types of media production services.   Producers need this information – judgments should be based on price, quality and level of service.   And the voiceover artist can always adjust their quoted typical rate if an initial review of the script indicates that the project is overly complex, or requires additional time to complete.  And you can use a traditional pricing model for broadcast.

I know that the old school narrators deride this approach as a race to the bottom in rates (which is true, but unfortunately seems to be  inevitable), but this approach is likely going to be the future for this and many other media production services.  And there will still be producers and projects that require a higher level of quality that will support a traditional approach and more traditional rates.

Elearning voiceover narration vs. traditional media voiceover

Creating voiceover narration for elearning programs requires an understanding of the specific characteristics of the user’s experience of the program.

Radio and television are passive media – the user’s experience of “watching” these media has certain characteristics.  Users passively receive the media, usually for the purpose of entertainment.  And they expect either a natural conversational voice, or a traditional “announcer” voice.

eLearning is an active experience.  Users make conscious selections and interactions with the programs and content, follow topics, actively read and evaluate information with the specific purpose of integrating it with their existing knowledge, and are prompted to answer questions or perform other interactive tasks.

You can likely expect elearning users to be more focused, and therefore a bit less patient, especially with narration that reads long blocks of on-screen text.  It is therefore often desirable to create voiceover narration with a slightly quicker pacing than other traditional media, and to keep it as lean as possible.

Also keep in mind that these segments will often be played back through lower quality computer speakers, or laptop speakers with less than optimal intelligibility.  And many users may speak other languages as their primary language. This suggests that along with the quicker pacing, the narrator should strive to place additional emphasis on clearly pronouncing the narrative, using slightly higher registers when possible, and/or boosting EQ to increase the clarity of the enunciation.  It may sound a bit less “natural” and conversational, but the audience will be better able to understand the narrative content…

Using portable handheld audio recorders for voiceover

I picked up a Sony digital handheld stereo field recorder to play around with.  This category of portable digital recorders is primarily made for location recording of music performances and ambient sound (I use it for both of these purposes), and for use by journalists conducting interviews, etc.

After playing around with this great recorder, I have found that it can function pretty well as a recorder for capturing voiceover.  The Sony (there are a number of comparable brands available as well) is battery operated, has 2 condenser mics, and can record 24-bit/96kHz stereo WAV files on an internal 4 gig flash drive.  The recorders usually interface to a PC via a USB cable allowing fast transfer of files to a PC for editing.

These types of recorders can be an excellent option for use within a corporate production environment, allowing the recording system to be moved to the voiceover narrator (or a quieter location) as needed.   They set up quickly (get a small photo tripod for this purpose – 8 inch is fine), and have a headphone jack that allows for monitoring.  The Sony has a fur-style windscreen that minimizes breath pop.   And the quality of the microphones is great.

Consider this approach as a low-cost option to building a standard recording setup.

How much text should you narrate per screen?

When scripting an elearning program, it can be challenging to determine how much to narrate on a specific screen.  Do you read the entire screen text content, allow the user to read the text, or land somewhere in-between?

I strive to keep the narration segments brief  – 30 seconds or less, if possible.  You can go longer if there is something meaningful happening on the screen appearing in sync with the narrative, but opting for a shorter duration screen is usually best.
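One way to check a draft script against that 30-second target before recording is to estimate the read time from the word count. The sketch below assumes a speaking pace of 150 words per minute, which is my own placeholder rather than a measured figure; time yourself on a sample passage and substitute your own pace. The function names are also just for illustration.

```python
def estimate_narration_seconds(script_text, words_per_minute=150):
    """Estimate how long a script takes to read aloud at the given pace."""
    word_count = len(script_text.split())
    return word_count * 60.0 / words_per_minute


def exceeds_target(script_text, target_seconds=30, words_per_minute=150):
    """Return True when the estimated narration runs longer than the target."""
    return estimate_narration_seconds(script_text, words_per_minute) > target_seconds
```

At the assumed pace, a 30-second segment works out to roughly 75 words, so a screen script much longer than that is a candidate for trimming or splitting.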

Some options for addressing narrative associated with longer text blocks include:

  • If a screen has a large amount of text, consider displaying text for the user to read, but play a tone to let the user know that their audio is working.
  • Read a prompt that says something like “Take a moment to learn more about…”.
  • Read the first sentence or two of a large block of text, bolded, in sync with voiceover, then display the remainder of the text block, along with a verbal prompt to “take a moment to review these issues…” or something similar.
  • Try breaking up large text blocks into separate displays accessed via rollovers or pop-ups.  Then you can limit the narration to introducing the interaction.

Whatever approach you take, it is important to always have something happening on the screen during the narration – either text blocks appearing, or visuals appearing in sync with the narrative description.  Reading block after block of static text verbatim usually is an attention-killer…

Listen to your room

One of the key elements that will affect the final quality of a narration recording is the sound of the room in which the recording is created. Many program developers record in an office that has heating, air conditioning and ventilation (HVAC) noises, computer fans, street  noises, or excessive reverberation.

While these problems can be masked to some degree using processing tools, the optimal solution is to select or create a room that provides a suitable environment for recording audio.  The characteristics to address include:

  • HVAC, computer fan and other noises – these are always present, and can be difficult to address.  If possible, select a room without a heating vent, and move your PC into a separate room (using long cables for the monitor, mouse and keyboard).  If not possible, then find a location in the area that is least affected by these noises.
  • Experiment with mic placement, and some soundproofing elements.  For computers, get extension cables for your monitor, and wireless keyboards and mice so that you can move the CPU fan as far from the mic as possible.
  • Reverberation – listen to your room to determine the level and characteristics of the reverberation.  Clap your hands sharply, speak loudly, and listen to how long the sound remains.  Many rooms have excessive reflective surfaces that will be noticeable on recordings.  In some cases, this can be addressed by a combination of microphone placement, and use of sound absorbing materials (either commercial sound absorbing surfaces such as acoustic foams, or ad hoc approaches, such as placing pillows or curtains around the recording area).
  • Manage noise as much as possible – Be sure to close doors and windows to reduce external noises, shut down nearby PCs that are not in use, shut off overhead lighting that may have a noticeable hum (such as some fluorescent fixtures), ask nearby coworkers to keep their noise levels low, etc.
  • Move around  – in some cases, moving to a different position within the room can eliminate the major issues.  For example, moving from a corner or wall location to the middle of the room may eliminate some of the more problematic reflections.

I love Noise Gates

When you are speaking, the higher sound levels of your voice will mostly cover up the lower level ambient noise present within the room.  You can hear it if you are looking for it, but in many cases, background noises can be effectively masked by your voice. However, the ongoing ambient noise becomes much more apparent during the pauses between words and sentences.

A noise gate can make a big difference when recording within a less than optimal environment.  Noise gates are hardware or software-based audio processing tools that cut out the audio signal, in effect inserting silence in the spaces between phrases and during pauses, when the audio level drops below a user-defined threshold level. Narrators can use gates to reduce the impact of background sounds and noise within their recording space.

Hardware based noise gates are a great addition to any recording setup. Most recording software also includes software-based noise gates that you can apply to a noisy file after recording.  These tools are relatively easy to operate, and can really make a difference in the quality of your final voiceover narration file.

You can also use software-based noise reduction functions that analyse the characteristics of the ongoing noises present within the recording, and apply selective processing to diminish these noises. These noise reduction functions are available as plug-ins or as part of the audio recording and editing software package.
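To make the threshold idea behind a gate concrete, here is a deliberately simplified sketch. It works on a plain list of integer sample values, checks the peak level of each short window, and replaces any window that stays below the threshold with silence. Commercial gates add attack, hold and release controls to avoid abrupt cut-offs; none of that is modeled here, and the function name and default window size are my own assumptions.

```python
def apply_noise_gate(samples, threshold, window=441):
    """Silence every window of samples whose peak level is below the threshold.

    samples   -- list of signed integer amplitudes (for example 16-bit PCM)
    threshold -- peak level below which a window is treated as background noise
    window    -- samples per window (441 is about 10 ms at 44.1 kHz, an assumed default)
    """
    gated = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        peak = max(abs(s) for s in chunk)
        if peak < threshold:
            gated.extend([0] * len(chunk))  # below the threshold: insert silence
        else:
            gated.extend(chunk)  # at or above the threshold: pass through unchanged
    return gated
```

Setting the threshold just above the level of the room noise, and well below the quietest spoken passages, is the part that takes some experimenting.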

Working with external voiceover / narrator talent

Here are a few tips to facilitate the process of working with external voiceover/narrator providers:

  •  Be sure that your scripts are as final as possible.  Correct any errors.  Keep in mind that the narrator will assume that you want the script read verbatim, and will read it as is (unless the error is very obviously an error).
  • Take the time to read the script out loud, prior to finalizing.  This will help you to identify phrases that do not work, overly long sentences, etc.
  • Spell out any specialized language or acronyms phonetically.  It is usually a great idea to send links to audio files containing the correct pronunciations – you can often locate English pronunciations online that you can include via links within your scripts.
  • As you finalize your script, pay attention to punctuation – it provides critical guidance to the narrator for how to read a specific passage.  For example, commas indicate pauses, use of  quotes or italic indicates titles, phrases or names that require special treatment, and bolding can be used to indicate special emphasis.  Also include performance suggestions that will guide the reading in the desired direction.  “With enthusiasm” can help guide the narrator to punch up segments with a specific emphasis, or use “serious” to indicate that a section should be read with greater gravity, etc.
  • Send your script as a final edit that has all changes accepted and all comments hidden. I have received scripts that included many levels of review comments that were distracting.  I was always a bit worried that the script was not final.

Scripting vs. extemporaneous narrative

Many developers, especially those creating software demonstrations using tools such as Captivate, often create extemporaneous narration as they perform the software demonstration.

Some projects I have had involved using subject matter experts as narrators, who insisted on speaking off the cuff instead of scripting out their comments.   These were usually SMEs who had done stand up lectures of their topics, and were confident that they could save effort by reproducing their classroom narrative as the basis for the narration of an elearning version of the program.

Neither of these approaches is optimal.

Voice content created off the top of the SME’s head often either requires extensive editing to produce a focused and efficient narrative, or is later abandoned in favor of creating a script (often based on the recorded narrative, as in “…just create a script on what I recorded, and just tighten it up a bit…”).

I would strongly recommend discouraging your SMEs or instructors from attempting an off-the-cuff performance as the basis for the narration for an elearning program (though you may have to convince them via a test recording session).

Remote monitoring of Narration Recording Session

Should you monitor narration sessions live to ensure that the narration performance has proper emphasis?

Many narrators use a live link that allows the client to monitor the performance in real-time in order to provide performance directions.  This is a useful tool for directors that have a very specific vision of the narration performance.  However, this usually adds to the overall duration and cost of the narration session (and adds to the labor hours of the director/producer). This can be important during high stakes projects, but for smaller projects it is often not necessary.

In most cases for non-broadcast applications such as elearning,  remote narrators given proper performance instructions within the script can produce an acceptable performance without live direction.  Personally, I find that I can concentrate more on the performance if I do not have live monitoring.

Recording process tips

I often get tasked to record long scripts with many short segments.  I have found that it is usually better to record the entire script, instead of recording individual segments.  This allows me to concentrate on the performance, instead of having to change focus by stopping the recording and saving a smaller file.

When you do a reading in this manner, feel free to re-read portions throughout the script as necessary to change emphasis, correct problems, identify the best phrasing, etc.  Just continue the recording, with your alternative takes of various segments.  You can then edit the recording of the performance later, and cut out the problems.

When recording large portions of a script with multiple segments, when I reach the next segment I pause for 2 seconds, say “Next”, then pause for another  2 seconds prior to starting the subsequent segment.  This short statement is easy to identify visually within the editing display.

I also apply general processing to the entire file, prior to editing.  This would typically include applying a noise gate and EQ to the entire file.  This helps ensure that all of the segments get the same processing treatment, and will sound similar once edited into individual files.

Once I have done the edit for content, I have a master file composed of the individual segments separated by the word “next”, in a manner that I can visually identify.  I then cut out each segment, paste into a new file, and save the new files as the individual narration segment.
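The cut-and-save step can also be scripted once you know where each segment starts and ends. The sketch below is one possible approach, not a description of any particular editor: it assumes the master file is an uncompressed WAV file and that you have noted the start and end time of each segment (in seconds) from the editing display, leaving the “Next” markers outside the ranges. The function name and the output file naming pattern are hypothetical.

```python
import wave


def save_segments(master_path, cut_points, output_pattern="segment_{:02d}.wav"):
    """Copy each (start_seconds, end_seconds) range of the master file into its own WAV file."""
    written = []
    with wave.open(master_path, "rb") as master:
        params = master.getparams()
        rate = master.getframerate()
        for number, (start, end) in enumerate(cut_points, start=1):
            # Jump to the first frame of the segment and read its full length.
            master.setpos(int(start * rate))
            frames = master.readframes(int((end - start) * rate))
            out_path = output_pattern.format(number)
            with wave.open(out_path, "wb") as segment:
                # Reuse the master's channel count, sample width and sample rate.
                segment.setparams(params)
                segment.writeframes(frames)
            written.append(out_path)
    return written
```

Because every segment is copied from the same processed master, the individual files keep the same noise gate and EQ treatment.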