Monitoring and Isolation

The vast majority of producers, engineers, and/or “editors” working with typical spoken word Podcast audio are not using calibrated reference monitors in quiet work spaces with optimized acoustics.

That said, there’s some chatter out there referring to producing and mixing Podcasts solely through fancy near field monitors.

Consider this: what about efficiently dealing with inherent audio clip attributes that require isolation as well as the subjective processing tasks/optimizations typically applied at the pre-mixing stage?

Without proper isolation – it would be difficult to:

(A) Establish audible awareness of (low-level) noise floor nuances

(B) Accurately capture noise profiles

(C) Evaluate S/N

(D) Execute intricate/seamless dialogue edits

(E) Recognize and eliminate subtle mouth noises

(F) Optimize breaths

(G) Replicate typical consumption methods and environments

In my view the sole use of near field monitors for Podcast post production is not your best option. Closed back headphones OTOH are paramount. They are absolutely vital for this type of audio post throughout various stages of your workflow.

Note it is certainly fine to check and/or monitor a MIX through (various types of) near field monitors *after* all of the above variables have been addressed.

And don’t forget to maintain awareness of typical consumption methods and devices, such as laptop speakers, trendy headphones, smart phones, earbuds, and vehicles.

-paul.

Optimizing Dialogue Levels

I was just reading Chris Curran’s Daily Goody segment, published today. The piece is titled Balancing the Levels of All Voices. Chris explains the importance of consistent dialogue levels across multiple participants, and shares various methods to achieve this.

Chris states in his second tip:

>>> “Another way to quickly balance the levels of various participants is to process each participants track to be the same LUFS level. This will make them close to level, but you will always want to adjust the levels slightly using your ears. Because even when the LUFS level of two different voices is the same, the perceived loudness of each voice can differ due to things like proximity to the mic, dynamic range, frequency response of the mic, the timbre of individual voices, etc. So it’s a handy practice to set the LUFS level of each participant to the same value, but then you still have to use your ears.” <<<

Good advise IMHO. Here’s my perspective …

The term LUFS Level is a generalization. It requires clarification.

There are 3 notable measurement descriptors that indicate perceptual Loudness in LUFS/LKFS (or LU’s when using a relative scale):

• Integrated Loudness (also referred to as Program Loudness)

• Short Term Loudness

• Momentary Loudness

Their distinguishing attributes are distinct time and/or averaging intervals: Integrated (cumulative measurement from start to finish), Short Term (3 sec.), and Momentary (400ms). It’s important to recognize the significance of each descriptor.

As well, (and Chris alludes to this in his piece) – you must recognize how a consistent Integrated Loudness measurement across multiple spoken word segments (or session participants) does not necessarily guarantee suitable matched level perception and/or optimized intelligibility.

Remember – Integrated Loudness represents a cumulative measurement from start to finish. For 100% accuracy – the piece must be measured in it’s entirety. Also, the descriptor does not reflect inherent dynamic attributes and/or inconsistencies that my in turn marginalize attempts to optimize perception.

With this in mind, if you choose to use Integrated Loudness as a perceptual Loudness matching indicator – audio optimization (compression, etc.) and target accuracy must be applied and established before relying on any common Integrated Loudness measurement.

What about Short Term/Momentary Loudness?

The 3 sec. averaging interval of the Short Term Loudness descriptor indicates an active, foreground measurement. It is highly useful when analyzing the loudness consistency of spoken word/dialogue. Momentary Loudness will provide even finer “detail” – once again due to it’s inherent averaging interval (400ms).

To summerize: “LUFS Level” is a generalization. As noted there are 3 descriptors (Integrated, Short Term, Momentary). Short Term and Momentary Loudness are useful indicators for the establishment of spoken word consistency. Learn how to use a Loudness Meter (online or offline) to closely monitor each descriptor.

With regards to Loudness Normalization – some processing tools such as RX Loudness Control by iZotope (AAX/Pro Tools only) support user defined Short Term and Momentary Loudness targeting within a certain tolerance range.

These options, along with the ubiquitous Integrated Loudness definition (and of course subjective audio processing) should provide everything you need in your quest to achieve optimized dialogue.

-paul.

Quantifying Podcast Audio Dynamics

I’ve discussed the reasons why there is a need for revised (optimized) Loudness Standards for Internet and Mobile audio distribution. Problematic (noisy) consumption environments and possible device gain deficiencies justify an elevated Integrated Loudness target. Highly dynamic audio complicates matters further.

In essence audio for the Internet/Mobile platform must be perceptually louder on average compared to audio targeted for Broadcast. The audio must also exhibit carefully constrained dynamics in order to maintain optimized intelligibility.

The recommended Integrated Loudness targets for Internet and Mobile audio are -16.0 LUFS for stereo files and -19.0 LUFS for mono. They are perceptually equal.

In terms of Dynamics, I’ve expressed my opinion regarding compression. In my view spoken word audio intelligibility will be improved after careful Dynamic Range Compression is applied. Note that I do not advocate aggressive compression that may result in excessive loudness and possible quality degradation. The process is a subjective art. It takes practice with accessibility to well designed tools along with a full understanding of all settings.

Dynamic-480

I thought I would discuss various aspects of Podcast audio Dynamics. Mainly, the potential problematic significance of wide Dynamics and how to quantify aspects as such using various descriptors and measurement tools. I will also discuss the benefits of Dynamic Range management as a precursor to Loudness Normalization. Lastly I will disclose recommended benchmarks that are certainly not requirements. Feel free to draw your own conclusions and target what works best for you.

Highly Dynamic Audio in Noisy Environments

At it’s core extended or Wide Dynamic Range describes notable disparities between high and low level passages throughout a piece of audio. When this is prevalent in a spoken word segment, intelligibility will be compromised – especially in situations where the listening environment is less than ideal.

For example if you are traveling below Manhattan on a noisy subway, and a Podcast talent’s delivery is inconsistent, you may need to make realtime playback volume adjustments to compensate for any inconsistent high and low level passages.

As well – if the Integrated Loudness is below what is recommended, the listening device may be incapable of applying sufficient gain. Dynamic Range Compression will reestablish intelligibility.

From a post perspective – carefully constrained dynamics will provide additional headroom. This will optimize audio for further down stream processing and ultimately efficient Loudness Normalization.

Dynamic Range Compression and Loudness Normalization

I would say in most cases successful Loudness Normalization for Broadcast compliance requires nothing more than a simple subtractive gain offset. For example if your mastered piece checks in at -20.0 LUFS (stereo), and you are targeting R128 (-23.0 LUFS Integrated), subtracting -3 LU of gain will most likely result in compliant audio. By doing so the original dynamic attributes of the piece will be retained.

Things get a bit more complicated when your Integrated Loudness target is higher than the measured source. For example a mastered -20.0 LUFS piece will require additional gain to meet a -16.0 LUFS target. In this case you may need to apply a significant amount of limiting to prevent the Maximum True Peak from exceeding your target. In essence without safeguards, added gain may result in clipping. The key is to avoid excessive limiting if at all possible.

How do we optimize audio before a gain offset is applied?

I recommend applying a moderate to low amount of (global) final stage Dynamic Range Compression before Loudness Normalization. When processing highly dynamic audio this final stage compression will prevent instances of excessive limiting. The amount of compression is of course subjective. Often a mere 1-2 dB of gain reduction will be sufficient. Effectiveness will always depend on the attributes of the mastered source audio before L.Normalizing.

I carefully manage spoken word dynamics throughout client project workflows. I simply maintain sufficient headroom prior to Loudness Normalization. In most cases I am able to meet the intended Integrated Loudness and Maximum True Peak targets (without limiting) by simply adding gain.

RX Loudness Control

By design iZotope’s RX Loudness Control also applies compression in certain instances of Loudness Normalization. I suggest you read through the manual. It is packed with information regarding audio loudness processing and Loudness Normalization.

RX-LC_site

iZotope states the following:

“For many mixes, dynamics are not affected at all . This is because only a fixed gain is required to meet the spec . However, if your mix is too dynamic or has significant transients, compression and/or limiting are required to meet Short-term/Momentary or True Peak parts of the spec.”

“RX Loudness Control uses compression in a way that preserves the quality of your audio . When needed, a compressor dynamically adjusts your audio to ensure you get the 
best sound while remaining compliant . For loudness standards that require Short-term 
or Momentary compliance, the compressor is engaged automatically when loudness exceeds the specified target.”

It’s a highly recommended tool that simplifies offline processing in Pro Tools. Many of it’s features hook into Adobe’s Premiere Pro and Media Encoder.

LRA, PLR, and Measurement Tools

So how do we quantify spoken word audio dynamics? Most modern Loudness Meters are capable of calculating and displaying what is referred to as the Loudness Range (LRA). This particular descriptor is displayed in Loudness Units (LU’s). Loudness Range quantifies the differences in loudness measurements over time. This statistical perspective can help operators decide whether Dynamic Range Compression may be necessary for optimum intelligibility on a particular platform. (Note in order to prevent a skewed measurement due to various factors – the LRA algorithm incorporates relative and absolute threshold gating. For more information: refer to EBU Tech doc 3342).

I will say before I came across sort of rule of thumb (recommended) guidelines for Internet and Mobile audio distribution, the LRA in the majority of the work that I’ve produced over the years hovered around 3-5 LU. In the highly regarded article Audio for Mobile TV, iPad and iPod, the author and leading expert Thomas Lund of TC Electronic suggests an LRA not much higher than 8 LU for optimal Pod Listening. Basically higher LRA readings suggest inconsistent dynamics which in turn may not be suitable for Mobile platform distribution.

Some Loudness Meters also display the PLR descriptor, or Peak to Loudness Ratio. This correlates with headroom and dynamic range. It is the difference between the Program (average) Loudness and maximum amplitude. Assuming a piece of audio has been Loudness normalized to -16.0 LUFS along with an awareness of a True Peak Maximum somewhere around -1.0 dBTP, it is easy to recognize the general sweet spot for the Mobile platform ->> (e.g. a PLR reasonably less than 16 for stereo).

Note that heavily compressed or aggressively limited (loud) audio will exhibit very low PLR readings. For example if the measured Integrated Loudness of a particular program is -10.0 LUFS with a Maximum True Peak of -1.0 dBTP, the reduced PLR (9) clearly indicates aggressive processing resulting in elevated perceptual loudness. This should be avoided.

If you are targeting -16.0 LUFS (Integrated), and your True Peak Maximum is somewhere between -1.0 and -3.0 dBTP, your PLR is well within the recommended range.

In Conclusion

An optimal LRA is vital for Podcast/Spoken Word distribution. Use it to gauge delivery consistency, dynamics, and whether further optimization may be necessary. At this point in time I suggest adhering to an LRA < 7 LU for spoken word.

LRA Measurements may be performed in real time using a compliant Loudness Meter such as Nugen Audio’s VisLM 2, TC Electronic’s LM2n Loudness Radar, and iZotope’s Insight (also check out the Youlean Loudness Meter). Some meters are capable of performing offline measurements in supported DAWs. There are a number of stand alone third party measurement options available as well, such as iZotope’s RX7 Advanced Audio Editor, Auphonic Leveler, FFmpeg, and r128x.

-paul.

***Please note I personally paid for my RX Loudness Control license and I have no formal affiliation with iZotope.