Podcast Post Production: Gain, Limiting, and Ramifications

When working to best practice audio compliance guidelines for internet distribution [mobile, podcast, streaming, etc.], processed intermediates may require significant upward gain adjustments in preparation for distribution encoding.

If you elect to apply offline Loudness Normalization, the process is simply (after measurement) a linear gain offset + limiting if necessary.

There are several [what I refer to as] pre-gain offset issues that all producers and engineers must be aware of …

Noise Floor

This is rudimentary: adding gain will boost the audibility of preexisting broadband noise and possibly degrade audio fidelity.

A specific example where applied gain over noise is problematic ➝ post downward expansion. 

Are you under the assumption that applying downward expansion is a form of broadband noise reduction? It isn’t. The process does not remove persistent audible noise. When talent is actively speaking and the audio level is above a predefined threshold – residual noise will be audible as well. 

If you think talent transitions from silent inactive speech passages to active passages with audible noise sound bad – imagine how this will translate after adding significant gain. Terrible. 

In essence you must do whatever you can to mask [or attenuate] your noise floor prior to downstream processing. Just be careful. Heavy noise reduction will certainly introduce artifacts. Of course the best case is to circumvent noise at its origin.

Breath Levels

Adding significant gain elevates breath amplitude. Pre and/or post gain optimization is paramount.

IMO in order to preserve natural human speech characteristics breath retention is vital. ‘Ever experience speech passages with all breaths cut and subsequent ripple edits applied? It sounds robotic and horrible. Yet there are cowboys out there that preach the technique suggesting “listeners do not want to hear breaths.” Questionable perspective in my book.

I do agree that breaths elevated in level or those exhibiting snap syndrome* may be bothersome to listeners.

There are several tools available that attempt to sense and attenuate breaths. As far as I am concerned the only way to properly optimize persistent breaths is manually, instance by instance. Sure it’s time consuming. So saddle up.

Limiting Considerations

In order to adhere to a subjective spec. or a best practice imposed [true peak] ceiling – producers must assess how added gain will impact potential limiting requirements.

Do not ignore this fact: integrated loudness ‘targets’ for internet, mobile, and podcast distribution differ from broadcast specifications. You must produce audio intermediates (or mixes) that are prepped, optimized, and capable of sustaining additional gain without compromising fidelity and/or negating adherence to best practice or subjective specs.

Whether you are driving your mix into a limiter ceiling (I don’t necessarily recommend this) or you are applying offline loudness normalization – limiting (in most cases) will be necessary.

The key of course is proper intermediate optimization before the audio is bumped up (or limited) at the final stage. In fact the goal is to avoid excessive limiting! Encoding heavily limited audio into a lossy codec is a workplace hazard. Not a good idea.

What to do? 

Assess the attributes of your audio prior to final stage processing. Be aware of available headroom. Gauge the amount of limiting that will be necessary to meet your compliance goals. If required limiting is excessive and there is even the slightest indication of audible distortion – revert and make changes.
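For readers who prefer to see the arithmetic, here is a minimal Python sketch of that assessment. The helper name and figures are illustrative only – plug in your own measurements:

def projected_limiting(measured_lufs, measured_tp, target_lufs, tp_ceiling):
    # Estimate how much limiting a linear gain offset would force.
    gain = target_lufs - measured_lufs        # required gain offset (dB)
    projected_peak = measured_tp + gain       # peak after the offset (dBTP)
    overshoot = projected_peak - tp_ceiling   # amount the limiter must absorb
    return gain, max(overshoot, 0.0)

# Hypothetical intermediate: -20.0 LUFS with a -3.0 dBTP max, -16.0 LUFS / -2.0 dBTP goal
gain, limiting = projected_limiting(-20.0, -3.0, -16.0, -2.0)
print(gain, limiting)   # 4.0, 3.0 -> 3 dB of limiting is likely excessive; revisit the mix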

A technique that I frequently discuss and implement for speech/podcasts is what’s referred to as “glueing.” It is achieved by using a bus modeled compressor to tame dynamic transients and/or instances of inconsistent amplitude. If a final required gain offset is significant – a moderate amount of applied bus compression before loudness normalization will help alleviate the necessity for excessive limiting. 

Example: processed/edited stereo intermediate checks in at -20.98 LUFS. Speech dynamics are far from optimized. Intent is a -16 LUFS deliverable, -2.0 dBTP ceiling. 

I’m using the new version of Elixir by FLUX:: Immersive. You can insert it in a DAW session or apply offline. 

First I take the intermediate down to -24 LUFS: so input gain is set to -3.02 dB. Next the limiter threshold is set to -10 dB and the output gain is set to +8 dB. The integrated target and ceiling now comply. However notice the intermittent limiter gain reduction. 

In this scenario I would contemplate re-mastering the intermediate and attempt to tighten things up. Maybe check for asymmetry. Apply bus compression and zone in on the RT Short Term loudness descriptor [3 sec. averaging window]. Tweak as necessary.

Anyway … when teaching or discussing how to create and supply high quality deliverables, we must not forget the foundational aspects of speech based audio production. Basic gain manipulation and associated ramifications are vital aspects of professional podcast production. So dig in.

-paul. 

* What is “snap syndrome?” I’m fairly certain I coined the phrase. The anomaly occurs when talent breaths (when they inhale) exhibit a sudden snapping sound for whatever reason. I zone in and remove all audible instances. 

Significance Of Target Based Audio Processing For Independent Podcasters

Are you producing a Podcast and hosting it on your website? Your website is essentially a proprietary distribution platform. Sound familiar? Maybe similar in concept to a broadcast network?

Regarding vague perspectives on whether the “Target Loudness” post production mindset is relevant or not … hear me out.

Broadcast networks specify audio submission Integrated Loudness targets which include tolerance margins. If an audio submission does not meet the specified requirement(s) – the work is rejected. 

In essence networks expect the submitter to properly (let’s say) manipulate prepared works in order to meet requirements prior to submission. 

Conversely most music streaming services handle this so called manipulation internally using proprietary methods. They apply perceptual loudness manipulation across submissions in order to establish playback consistency. 

For example if -14.0 LUFS is the recognized distribution Integrated Loudness for an arbitrary music streaming service and your mastered music submission checks in at -10 LUFS … the service will subtract 4 LU of gain. 

Note if the above scenario is reversed I’m not entirely sure if adding gain is now commonplace. I’ve heard this practice is not widespread. However I do believe select streaming services add gain (and possibly limiting) if necessary. 

BTW Loudness Normalization in concept is nothing more than adding/subtracting gain in order to meet a specified target. If added gain causes spec. defined True Peak overshoots – limiting may be applied. 

Music Submissions

Many music mastering engineers recommend producers simply ignore the loudness target concept. They widely suggest mastering for optimum fidelity and present streaming services with a well produced product that may be efficiently manipulated according to the service’s requirements. All good.

Podcasts

I don’t have access to valid data specifying whether ubiquitous streaming services currently manipulate spoken word Podcasts using the same methods applied to music submissions. I’ll look into it.

* * *

Back to hosting your Podcast on your personal website, or in essence – your platform …

Efficient website accessibility for your Podcast is an essential requirement. My guess is your implemented site player does not manipulate the attributes of your embedded files in order to standardize distribution Integrated Loudness across your hosted catalogue. And I doubt independent producers at large hire coders to build server side audio processing engines to establish what I previously described. 

Remember, you –  the site owner, producer, whatever – bear the responsibility to serve your listeners with let’s call it optimized audio that is perceptually consistent across all of your hosted programs. Your target may be subjective or it may adhere to published best practices. Again, all good.

Point is –  without a recognized Integrated Loudness target including acceptable tolerance margins (and an ultimate True Peak ceiling) – any standardization concept would be near impossible to efficiently implement.

How to Do It

You can certainly attempt to “mix” your programs in RT using a loudness meter while monitoring the various descriptors. However final stage off-line target processing is much more efficient.

Of course the quality of your intermediate and/or pre-master prior to Loudness Normalization will dictate final fidelity and speech intelligibility of the processed output.

Bottom Line

Let’s not marginalize the significance of target based audio processing and Loudness Normalization with full True Peak compliance. The general concept works for proprietary broadcast platforms and it is certainly applicable for your personal website where you host your spoken word Podcast.

-paul.

Reworking a Master

It’s been a while since I last posted. The last few years have been difficult. I was compelled to tend to my Dad Sonny. He passed away on April 26. I’m obviously heartbroken. However he would have wanted me to move forward and continue to share insight …

I recently listened to a program consisting of a group of “Podcast Editors.” Group members are also business owners providing podcast production services for a wide range of clients. 

I believe a few (or all?) group members were trained by Chris Curran, who runs Podcast Engineering School.

The business owners disclosed they are often in a position (for various reasons) to outsource work. All good. However I thought to myself if I were to consider outsourcing work – what method(s) or criteria would I implement to assess applicant talent and/or proficiency?

A top level requirement would be obvious: the ability to effectively edit speech/dialogue and optimize intelligibility. DSP audio processing proficiency (and tool accessibility) would be prerequisites as well.

Let’s have a look at a specific “test” scenario that I might propose for applicant assessment:

A new self producing client’s podcast has been accepted by an imaginary powerhouse spoken word audio network. The network has strict audio submission compliance requirements. WAV files are to be submitted. The network will create the lossy distribution copies. 

The client seeks assistance conforming the following self produced program as measured:

Attributes: -18 LUFS [stereo], -0.8 dBTP. LRA: 3 LU

*** In my opinion the example above visually indicates careless mastering. The narrow headroom may pose difficulties if added gain is ever necessary. It doesn’t sound inherently bad. However it’s not properly optimized for submission to our imaginary network.

Client/Network Compliance Requirements: 

-16 LUFS stereo (tolerance: +/- 1 LU). Ceiling: -2.0 dBTP. LRA < 6 LU

Client’s Specific Instructions:

• Integrated Loudness target compliance for network (source audio needs to be bumped up)

• Compliant True Peak ceiling and prevention of excessive limiting and/or induced distortion 

• Avoidance of breath elevation and noise due to gain offset requirements

• Retention of reference fidelity

For the record I’m not going to disclose the source of this audio. There are many similar examples out there.

Here is my re-produced output as measured:

-16 LUFS, -2.2 dBTP. LRA: 2.9 LU

In the zoomed selection below notice there is no visual indication of an elevated noise floor. Also overly aggressive dynamics processing/limiting has been avoided. Fidelity is excellent.

Final Thoughts

From a general perspective the client’s original source audio is suitable for a typical podcast regardless of the visual attributes of the waveforms. However that assessment is not the purpose of this article. The question is – are you capable? Do you think you can pass my proposed test? Can you satisfy our imaginary client?

Be aware there’s a lot more to this than you may assume. Compression is only one aspect of my optimization process (of course we can discuss). Also note this remastering scenario does not include accessibility to discrete mix stage audio assets.

* * *

When you add definitive compliance requirements to any workflow the level of complexity elevates. This is especially true in situations where you as an engineer may be called upon to “fix” audio masters that may not be suitable or properly optimized for downstream program preparation and distribution.

-paul.

Podcast Dynamics: Loudness Range vs. PSR/PLR

I’ve heard a few savvy people refer to the LRA (Loudness Range) descriptor as inherent Dynamic Range. This reference is for the most part inaccurate.

LRA is a threshold gated statistical representation of measured Loudness or variations as such over time. Incorporated Absolute and Relative gating prevents potentially skewed measurements that may result when the passing audio includes sudden instances of impactful amplitude (e.g. gun shots, explosions, etc.) and/or extended periods of silence.

Correlation certainly exists between inherent LRA and Dynamics. In fact – in order to optimize audio for a particular delivery platform, an accurately measured LRA may indicate whether further dynamics manipulation across a segment of audio may be necessary.

PSR and PLR

It is commonplace to acknowledge PSR (Peak to Short Term Loudness Ratio) and PLR (Peak to Loudness Ratio) as accurate indicators of audio dynamics.

PSR is the differential between the measured (ungated) Short Term Loudness and the max. True Peak ceiling. The duration of the averaging window (3 sec.) and the resulting Short Term Loudness measurement relative to the maximum True Peak reflects a near real time representation of playback audio dynamics. High relative PSR values suggest wide dynamics. Conversely low relative PSR values suggest reduced dynamics, excessive limiting, and elevated perceived loudness.

E.g. RT Short Term Loudness: -12 LUFS. Max True Peak -2.0. PSR = 10. As the Short Term Loudness elevates, the differential between it and the True Peak max. decreases thus indicating reduced RT dynamics.

PLR is the differential between measured (gated) Integrated Loudness of (in most cases) an entire audio segment from start to stop and the max True Peak ceiling. In essence PLR represents a long term gated average of inherent dynamics over time.

E.g. measured Integrated Loudness: -16 LUFS. Max True Peak -2.0. PLR = 14. In comparison – if the measured Integrated Loudness checked in at -12 LUFS, the PLR would shift to 10, thus indicating reduced global dynamics and elevated loudness.
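If it helps to see the math spelled out, here is a tiny Python sketch of the PSR/PLR arithmetic using the same example figures (the helper names are mine):

def psr(short_term_lufs, max_true_peak_dbtp):
    # Peak to Short Term Loudness Ratio - near real time dynamics indicator
    return max_true_peak_dbtp - short_term_lufs

def plr(integrated_lufs, max_true_peak_dbtp):
    # Peak to Loudness Ratio - global dynamics indicator
    return max_true_peak_dbtp - integrated_lufs

print(psr(-12.0, -2.0))   # 10.0
print(plr(-16.0, -2.0))   # 14.0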

LRA vs. Dynamic Range

As far as this vague reference to LRA indicating Dynamic range – consider the following:

A hypothetical mastered (spoken word) Podcast checks in at -16 LUFS with a -2 dBTP max. The inherent PSR (measured in RT using a Loudness Meter) = approx. 10. The PLR = 14, and the measured LRA = 4 LU.

In this example – it is obvious the LRA (4 LU) does not reflect the theoretical dynamic range of the piece. In fact the PSR is the suitable indicator of RT audio dynamics. The PLR represents the global dynamics over the entire duration of the audio segment.

In Conclusion

The LRA descriptor is an algorithmic calculation incorporating gated thresholds. It does not indicate the measured Dynamic Range of a piece of audio. However it is certainly a viable indicator representing the statistical variation of measured Loudness over time.

An elevated spoken word LRA (> 7 LU) may indicate compromised intelligibility, and as noted – the necessity for further DSP processing and re-mastering.

For RT measurement of inherent audio dynamics, use a supported tool to display the running PSR and PLR values. There are various third party options available, such as Dynameter by MeterPlugs, MasterCheck by Nugen Audio, and the Youlean Loudness Meter.

For a (stereo) -16.0 LUFS spoken word Podcast – PLR 15/14 is optimal. Corresponding PSR values will vary based on the attributes of applied dynamics processing.

Incidentally – if you are producing Podcasts professionally, you need to learn how to use a Loudness Meter. It is an essential tool, providing a broad scope of RT descriptors, such as  Loudness, LRA, Dynamic Range, and True Peak. A number of meters support offline measurements within certain DAW environments.

-paul.

“Loudness Leveling” Denotes a Vague Description of Two Discrete Processes

Scores of audio producers in the Podcast Production space have adopted an inaccurate term when referring to basic Loudness Normalization: Loudness Leveling.

First – what is Loudness Normalization? Actually, it’s quite simple:

Audio is measured in its entirety. The existing Integrated (Program) Loudness is determined. A gain offset is applied relative to a spec. based or subjective Integrated Loudness target.

For example: if the source audio measures -20 LUFS, and the Loudness Target is -16 LUFS, +4 LU of gain will be applied.

As well, a True Peak Max. Ceiling is defined, which again may be spec. based or subjective. If the required Integrated Loudness gain offset results in overshoots – limiting is applied in order to maintain compliance.
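As a simple illustration, the gain offset stage boils down to a few lines of Python. The names and figures below are illustrative only; the measurement itself would come from a full-file pass through a compliant Loudness Meter:

import numpy as np

def apply_offset(samples, measured_lufs, target_lufs):
    # Linear gain offset: e.g. -20.0 LUFS source, -16.0 LUFS target -> +4 LU
    offset_db = target_lufs - measured_lufs
    return samples * (10.0 ** (offset_db / 20.0))

audio = np.zeros(48000)                # stand-in for a decoded source file
normalized = apply_offset(audio, measured_lufs=-20.0, target_lufs=-16.0)
# If the offset pushes True Peaks past the defined ceiling, a limiting pass follows.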

It’s important to note that Loudness Normalization does not correct wide variations in audio levels. As well – it does not guarantee optimized intelligibility for spoken word. If an audio piece (e.g. multiple participant segment) contains inconsistencies as such, the Loudness Normalization gain offset will simply elevate (or reduce) the relative perceptual loudness of the audio. The original dynamic attributes will persist.

That’s it. There’s nothing more to it unless the Loudness Normalization tool features some sort of dynamics optimization process that may or may not be active.

For the record – the Loudness Module included in iZotope’s RX 7 Advanced Audio Editor applies basic Loudness Normalization (measurement, gain, and limiting). It does not apply optimization processing.

Examples

View this source clip waveform. There are two participants with noticeable level inconsistencies:

This is the same clip Loudness Normalized (to -19.0 LUFS). The perceptual loudness is higher. However the level inconsistencies persist:

Leveling is a process that addresses and corrects noted inconsistencies and level variations. It is accomplished by the use of gain riding plugins and/or specialty tools that rely on complex algorithms. One basic example is the use of an “RMS” Compressor featuring an optimal and often extended release time parameter.

This is a “leveled” version of the original source clip displayed above. The previously persistent level inconsistencies no longer exist.

Finally, this is the leveled audio, Loudness Normalized to -19.0 LUFS. The described processes were in fact discrete.
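For the technically curious, here is a bare-bones gain-riding sketch of the Leveling concept in Python. It is not the RMS compressor workflow described above – just a crude illustration of level detection driving a slowly ridden gain. The window, target, and smoothing values are arbitrary:

import numpy as np

def level_ride(x, sr, target_db=-20.0, window_s=0.4, smooth=0.999):
    win = max(1, int(sr * window_s))
    # sliding mean-square level of the input, in dB
    ms = np.convolve(x.astype(np.float64) ** 2, np.ones(win) / win, mode="same")
    level_db = 10.0 * np.log10(ms + 1e-12)
    y = np.empty(len(x))
    gain_db = 0.0
    for n in range(len(x)):
        wanted = target_db - level_db[n]                      # gain that would hit the target
        gain_db = smooth * gain_db + (1.0 - smooth) * wanted  # slow, leveler-style gain ride
        y[n] = x[n] * 10.0 ** (gain_db / 20.0)
    return y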

I hope I’ve made it clear that the term Loudness Leveling is not an accurate term to describe Loudness Normalization. The key is that Loudness Normalization is gain and limiting. It does not correct inconsistent level variations. You’ll need to implement discrete Leveling processes to address any persistent inconsistencies.

-paul.

Optimizing Dialogue Levels

I was just reading Chris Curran’s Daily Goody segment, published today. The piece is titled Balancing the Levels of All Voices. Chris explains the importance of consistent dialogue levels across multiple participants, and shares various methods to achieve this.

Chris states in his second tip:

>>> “Another way to quickly balance the levels of various participants is to process each participants track to be the same LUFS level. This will make them close to level, but you will always want to adjust the levels slightly using your ears. Because even when the LUFS level of two different voices is the same, the perceived loudness of each voice can differ due to things like proximity to the mic, dynamic range, frequency response of the mic, the timbre of individual voices, etc. So it’s a handy practice to set the LUFS level of each participant to the same value, but then you still have to use your ears.” <<<

Good advice IMHO. Here’s my perspective …

The term LUFS Level is a generalization. It requires clarification.

There are 3 notable measurement descriptors that indicate perceptual Loudness in LUFS/LKFS (or LU’s when using a relative scale):

• Integrated Loudness (also referred to as Program Loudness)

• Short Term Loudness

• Momentary Loudness

Their distinguishing attributes are distinct time and/or averaging intervals: Integrated (cumulative measurement from start to finish), Short Term (3 sec.), and Momentary (400ms). It’s important to recognize the significance of each descriptor.

As well, (and Chris alludes to this in his piece) – you must recognize how a consistent Integrated Loudness measurement across multiple spoken word segments (or session participants) does not necessarily guarantee suitable matched level perception and/or optimized intelligibility.

Remember – Integrated Loudness represents a cumulative measurement from start to finish. For 100% accuracy – the piece must be measured in its entirety. Also, the descriptor does not reflect inherent dynamic attributes and/or inconsistencies that may in turn marginalize attempts to optimize perception.

With this in mind, if you choose to use Integrated Loudness as a perceptual Loudness matching indicator – audio optimization (compression, etc.) and target accuracy must be applied and established before relying on any common Integrated Loudness measurement.

What about Short Term/Momentary Loudness?

The 3 sec. averaging interval of the Short Term Loudness descriptor indicates an active, foreground measurement. It is highly useful when analyzing the loudness consistency of spoken word/dialogue. Momentary Loudness will provide even finer “detail” – once again due to its inherent averaging interval (400ms).

To summarize: “LUFS Level” is a generalization. As noted there are 3 descriptors (Integrated, Short Term, Momentary). Short Term and Momentary Loudness are useful indicators for the establishment of spoken word consistency. Learn how to use a Loudness Meter (online or offline) to closely monitor each descriptor.

With regards to Loudness Normalization – some processing tools such as RX Loudness Control by iZotope (AAX/Pro Tools only) support user defined Short Term and Momentary Loudness targeting within a certain tolerance range.

These options, along with the ubiquitous Integrated Loudness definition (and of course subjective audio processing) should provide everything you need in your quest to achieve optimized dialogue.

-paul.

LevelView by Grimm Audio

LevelView by Grimm Audio is a highly functional and well designed real time Loudness Meter.

Here are the details:

LevelView features a unique multifaceted Rainbow Meter. Clicking the Rainbow display toggles the Meter scale (EBU +9 or EBU +18).

There are three compliance modes: EBU R128, ATSC A/85, and a custom User specification (Gated or Ungated). The Rainbow Meter displays a Relative Scale. Consequently the defined target will be equivalent to 0 LU.

The upper blue Rainbow arc represents Short Term Loudness measured within a 3 sec. time frame. The inward blue arcs indicate slower time frame variances (10, 30, 90, and 270 seconds).

The arced needle meter located above the Rainbow Meter represents the Momentary Loudness measured within a 400ms time frame.

Visual dots displayed (and held) on both the Momentary and Short Term Loudness indicator plots represent the maximum values for each descriptor. Both indicators will shift to orange when their values exceed recognized guidelines (+8 max M, and +6 Max S).

The numerical descriptor table features a large Integrated Loudness value. This may display an Absolute Scale value in LUFS, or a Relative Scale value in LU’s. Clicking the descriptor text toggles its view.

Additional numerical descriptors include maximum Momentary Loudness (max M), maximum Short Term Loudness (max S), LRA (Loudness Range), PLR (Peak to Loudness Ratio), and maximum True Peak (max TP). Clicking the max TP descriptor text will toggle the measurement algorithm and display max TP or max SP (Sample Peak). Descriptors will shift to orange when a displayed value exceeds recognized or specification guidelines.

The graph located at the lower left is the Loudness Range histogram. It displays the distribution of the measured Loudness over time. The data will indicate whether further dynamic range compression may be necessary.

LevelView supports Manual start and stop measurements. Setting the meter to Auto will force it to follow the host DAW’s transport. In essence the meter will automatically start/stop and reset based on the status of the transport.

Link mode records and stores data continuously. This allows the operator to revert back in time and re-measure a passage without resetting the stored measurements. In the event a passage is skipped, a gap warning will appear in orange. Re-measurement of a skipped segment will clear the gap warning. The Stop button resets the memory. Note the LevelView documentation indicates that the host “must provide time code for the Link function to work.”

It is possible to run various connected (Host and Client) instances of LevelView on a network or over the Internet. I will be testing these options in the near future.

LevelView is available as an AU, VST, or AAX Plugin. The AU and VST versions support (5.1) Surround Sound measurement. The meter conforms to the SMPTE/ITU channel matrix standard (L-R-C-LFE-Ls-Rs).

The meter may also run in a stand-alone mode with no DAW dependency. I/O configuration options are provided.

My Assessment:

I like this meter and I appreciate its unique design and accuracy. The networking options, support for Surround Sound, and stand-alone capability make it highly flexible and well worth its reasonable cost ($70 U.S. at Don’tCrack). I’m happy to recommend it.

Improvements I’d like to see:

– Scalable UI
– Option to define a custom Maximum True Peak in the User mode (currently it defaults to -1.0 dBTP)

-paul.

Loudness Compliance Summarization

– I continue to endorse -16.0 LUFS for (stereo) Podcast distribution. If meeting this target requires an excessive amount of limiting, a slightly lower target is a viable option. However from my perspective a -20.0 LUFS spoken word piece consumed in a less than ideal environment on a mobile device would be problematic. I’m comfortable supporting upwards of a -2.0 LU deviation from the recommended -16.0 LUFS target (when applicable).

**Note mono files require a -3 LU offset to establish perceptual equivalence to stereo file targets.

– Loudness Range (LRA) is a statistical representation of the distribution of measured Loudness over time. An LRA no higher than 8 LU will help optimize intelligibility by restricting dynamics and/or wide variations in Loudness over time.

– Networks and Catalog based program sets managed by indie producers must institute Program Loudness consistency across all distributed media. This will free listeners from making constant playback volume adjustments when listening to several programs in succession. Up to 1.0 LU tolerance (+/-) is reasonable. However upside Program Loudness should never exceed -16.0 LUFS.

– Without sufficient headroom – lossy, low bitrate encoding may generate peak levels that exceed a compliance ceiling and/or introduce distortion. -1.5 dBTP is the favored maximum ceiling prior to lossy coding. Of course a lower value (e.g. -2.0 dBTP) is appropriate. However, a peak ceiling below -3.0 dBTP may indicate excessive limiting. This should be avoided.

-paul.

Intelligibility Optimization

The attached image displays a processing workflow designed to optimize Spoken Word intelligibility. The workflow also demonstrates a realtime example of Integrated Loudness compliance targeting.

There are 7 reference point Sections worth noting:

Section A includes the Adobe Audition Effects Rack Signal Level Meters indicating the source (Input) level and the (Output) level. The Output level reflects the results of the workflow’s inserted plugins. The chain includes a Compressor, a Limiter, and a Loudness Meter. Note the level meters indicate signal level. They do not indicate or represent perceptual Loudness.

Section B displays the gain reduction applied by the Compressor at the current position of the playhead. For the test/source audio I determined an average of 6dB of gain reduction would yield acceptable results. The purpose of this stage is to reduce the dynamic range and/or dynamic structure of the Spoken Word resulting in optimized intelligibility AND to prevent excessive downstream limiting. This is an important workflow element when preparing Spoken Word audio for Internet/Mobile, and Podcast distribution.

Section C includes my subjective limiting parameters. The Limiter will add the required amount of gain to achieve a -16.0 LUFS deliverable while adhering to a -1.5 dBTP (True Peak Max). If the client, platform, or workflow requires an alternative Loudness target and/or Maximum True Peak ceiling – the parameters and their mathematical relationship may be altered for customized targeting. Please note the Maximum True Peak referenced in any spec. is more of a ceiling as opposed to a target. In essence the measured signal level may be lower than the specified maximum.

Section D indicates the amount of limiting that is occurring at the current position of the playhead.

Section E displays the user defined Integrated Loudness target located above the circular Momentary Loudness LED (12 o’clock position). The defined Integrated Loudness target is also visually represented by the Radar’s second concentric circle. The Radar display indicates the Short Term Loudness measured over time within a 3 sec. window. The consistency of the Short Term Loudness is evident indicating optimized intelligibility.

Section F displays the unprocessed source audio that lacks optimization for Internet/Mobile, and Podcast distribution. Any attempt to consume the audio in it’s current state in a less than ideal listening environment will result in compromised intelligibility. Mobile device consumption in like environments will exacerbate compromised intelligibility.

Section G displays the processed/optimized audio suitable for the noted distribution platform. The Integrated Loudness, True Peak, and LRA descriptors now satisfy compliance targets. Notice there is no indication of excessive limiting.

-paul.

Loudness Meter Scale Variations

I thought I’d revisit various aspects of Loudness Meter Absolute/Relative Scale correlation, and provide a visual representation of a real time processing Session with both Scales active.

Descriptors and Scales

Modern Loudness Meters display various descriptors including Program Loudness – also referred to as Integrated Loudness. There are two scales that can be used to display measured Program or Integrated Loudness over time …

The most common is an Absolute Scale, displayed in LUFS or LKFS. LUFS refers to Loudness Units relative to Full Scale. LKFS refers to Loudness Units K-Weighted relative to Full Scale. There is no difference in the perceptual measured loudness between both descriptor references.

It is also possible to measure and display Integrated/Program Loudness as Loudness Units (or LU’s) on a Relative Scale where 1LU == 1 dB.

When shifting to a Relative Scale, the 0 LU increment is always equivalent to the Meter’s user defined or spec. defined Absolute Loudness target.

For example, in an R128 -23.0 LUFS Absolute Scale workflow, setting the Meter to display a Relative Scale changes the target to 0 LU.

So – if a piece of measured audio checks in at -23.0 LUFS on an Absolute Scale, it would be perceptually equal to measured audio checking in at 0 LU on a Relative Scale.

Likewise if the Meter’s Absolute Scale target is set to -16.0 LUFS, it will correlate to 0 LU on a Relative Scale. Again both would reflect perceptual equivalence.
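The conversion is nothing more than a subtraction against the defined target. A trivial Python helper (the name is mine) makes the relationship obvious:

def lufs_to_lu(absolute_lufs, target_lufs):
    return absolute_lufs - target_lufs   # 0 LU == the defined target

print(lufs_to_lu(-23.0, -23.0))   # 0.0 LU in an R128 workflow
print(lufs_to_lu(-16.0, -16.0))   # 0.0 LU when targeting -16.0 LUFS
print(lufs_to_lu(-18.0, -16.0))   # -2.0 LU (2 LU under target)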

All broadcast delivery specifications suggest Absolute Scale Integrated Loudness targets. However, for any number of subjective reasons – many operators prefer to use the alternative Relative Scale and “mix or master to 0 LU.”

Please note Loudness Units are also the proper way in which to describe Loudness differentials between two programs. For instance, “Program (A) is +2 LU louder than Program (B).” One might also describe gain offsets in LU’s as opposed to dB’s.

LU Meter

Hornet Plugins recently released Hornet LU Meter. This tool is a Loudness Meter plugin designed to measure and display Integrated/Program Loudness within a 400ms time window. This measurement represents the Momentary Loudness descriptor.

The Meter is indeed nifty and affordable. However there is one sort of caveat worth noting: As the name suggests, it is an LU Meter. In essence Integrated (Momentary) Loudness measurements are solely displayed on a Relative Scale.

Session

The displayed Session (image) consists of a single mono VO clip. The objective is to print a processed stereo version in RT checking in at -16.0 LUFS with a maximum True Peak no higher than -2.0 dBTP.

The output of the mono VO track is routed to a mono Auxiliary Input track titled Normalize. If you are not familiar with Pro Tools, an Auxiliary Input track is not the same as an Auxiliary Send. Auxiliary Input tracks allow the user to pass signal using buses, insert plugins, and adjust level. They are commonly used to create sub-mixes.

I’ve inserted a Compressor and a Limiter on the Normalize Auxiliary Input track. The processed audio is passing through at -19.0 LUFS (mono).

The audio is then routed to a second (now stereo) Auxiliary Input track titled Offset. I use the track fader to apply a +3 dB gain offset. This will reconstitute the loss of gain that occurs on center panned mono tracks. The attenuation is a direct result of the Pro Tools Pan Depth setting.

The signal flow/output is now passing -16.0 LUFS audio. It is routed to a standard audio track titled Print. When this track is armed to record, it is possible to initiate a realtime bounce of the processed/routed audio.

The Meters

Notice the instances of the Hornet LU Meter and TC Electronic Loudness Radar. Both Meters are inserted on the Master Bus and are measuring the session’s Master Output.

I set the Reference (target) on the Hornet LU Meter to -16.0 LUFS. In essence 0 LU on its Relative Scale represents -16.0 LUFS.

Conversely the TC Electronic Meter is configured to display Absolute Scale measurements. The circular LED that borders the Radar area indicates Momentary Loudness. The defined Integrated Loudness target is displayed under the arrow at the 12 o’clock position.

Remember the Hornet LU Meter solely displays Momentary Loudness. If you compare its current reading to the indication of Momentary Loudness on the TC Electronic Meter, the relationship between Relative Scale and Absolute Scale measurement is clearly indicated. Basically the Hornet Meter registers just below 0 LU. The TC Electronic Meter registers just below -16.0 LUFS.

I will say if you are comfortable monitoring real time Momentary Loudness and understand Relative/Absolute Scale correlation, the Hornet tool is quite useful. In fact it contains additional features such as Grouping, auto/manual Gain Compensation, and auto-Maximum Peak protection.

Additional insight on the K-weighting Curve or K-weighted filtering:

K-weighting de-emphasizes low frequencies by way of a high-pass filter. A high-shelving filter is applied to the upper frequency range, and the measured data is averaged.

TC Electronic describes applied K-weighting on audio channels as a “method to build a bridge between subjective impression and objective measurement.”
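For those who want to peek under the hood, below is a rough Python sketch of the two stage K-weighting filter and the per-block loudness sum, using the 48 kHz biquad coefficients published in ITU-R BS.1770 (verify against the current revision before relying on them). The helper names are mine, and surround channel weighting is ignored:

import numpy as np
from scipy.signal import lfilter

SHELF_B = [1.53512485958697, -2.69169618940638, 1.19839281085285]
SHELF_A = [1.0, -1.69065929318241, 0.73248077421585]
HIPASS_B = [1.0, -2.0, 1.0]
HIPASS_A = [1.0, -1.99004745483398, 0.99007225036621]

def k_weight(x):
    # stage 1: high-shelf boost, stage 2: RLB high-pass
    return lfilter(HIPASS_B, HIPASS_A, lfilter(SHELF_B, SHELF_A, x))

def block_loudness(channels):
    # channels: K-weighted mono arrays for one 400 ms block (L/R/C, weight 1.0)
    z = sum(np.mean(np.asarray(ch, dtype=np.float64) ** 2) for ch in channels)
    return -0.691 + 10.0 * np.log10(z + 1e-12)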

-paul.

Elixir ITU True Peak Limiter

Certain ISP/True Peak Limiters provide added compliance processing flexibility. Case in point: Elixir by Flux.

Preparation

Before processing or Loudness Normalizing, execute an offline measurement on an optimized source clip.

An optimized audio clip may exhibit the benefits of various stages of enhancement processing such as noise reduction and dynamic range compression.

The displayed clip (see attached image) checks in at -19.6 LUFS. It requires +3.6 dB of gain to meet a -16.0 LUFS Integrated Loudness target. Based on the pre-existing peak ceiling approximately 1.5 dB of limiting will be necessary to establish a -2.0 True Peak maximum.

Processing Example

We use the Limiter’s Input Gain setting to take the clip down to -24.0 LUFS (-4.4 dB for the measured displayed clip).

The initial -24.0 LUFS target will restore headroom and establish a consistent starting point for downstream limiting accuracy. This will allow the Threshold and Output Gain settings to be recognized and implemented as static parameters for all -16.0 LUFS/-2.0 dBTP (stereo) processing. The Input Gain setting however will be variable based on the measured attributes of the optimized source.

Set the Threshold to -10 dB(TP) and the Output Gain to +8dB. The processing may be implemented offline or in real time. The output audio will reflect accurate targets (-16.0 LUFS/-2.0 dBTP) and the applied limiting will be transparent.
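Here is the same arithmetic expressed as a few lines of Python. It models only the gain math – not the limiter itself – and the helper name is mine:

PRE_TARGET = -24.0     # LUFS starting point restored before limiting
THRESHOLD = -10.0      # dB(TP) limiter threshold (static)
OUTPUT_GAIN = 8.0      # dB output gain (static)

def elixir_settings(measured_lufs):
    input_gain = PRE_TARGET - measured_lufs   # the only per-clip variable
    return input_gain, THRESHOLD, OUTPUT_GAIN

print(elixir_settings(-19.6))    # (-4.4, -10.0, 8.0) for the displayed clip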

Note:

The proprietary functional parameters included on the Elixir Limiter are not necessarily included on Limiters designed by competing developers. In essence the described workflow may need to be customized based on the attributes of the Limiter.

The key is the “math” and static parameters never change, unless of course you decide to alter the referenced targets.

Let me know if you have questions …

-paul.

Programmatic Ads and Loudness Standardization

This is a re-post of an article that I published in October, 2015 …

In a recent Midroll article titled “Why Programmatic Ads Aren’t Necessarily Great for Podcasting,” the staff writer states:

“A number of players in the Podcasting and advertising industries are making bets on programmatic Ad delivery — dynamically inserting Ads into a Podcast as the episode is downloaded. It’s an understandable temptation, but we at Midroll see some tradeoffs.”

I wonder how networks will handle potential perceived Loudness inconsistencies between produced Ads and new or preexisting programs?


I’ve mentioned my past affiliation with IT Conversations and The Conversations Network, where I was the lead post audio engineer from 2005-2012. Executive Director Doug Kaye built a proprietary content management system and infrastructure that included an automated component based Show Assembly System. Audio components were essentially audio clips (Intros, Outros, Ads, Credits, etc.) combined server side into Podcasts in preparation for distribution.

One key element in this implementation was the establishment of perceived Loudness consistency across all submitted audio components. This was accomplished by standardizing an average Loudness Target using a proprietary software RMS Normalizer to process all server side audio components prior to assembly. (Loudness Normalization is now the recommended process for Integrated Loudness targeting and consistency).

Due to this consistency, all distributed Podcasts were perceptually equal with regard to Integrated or Program Loudness upon playback. This was for the benefit of the listener, removing the potential need to make constant playback volume adjustments within a single program and throughout all programs distributed on the network.

Regarding Programmatic Ad insertion, I have yet to come across a Podcast Network that clearly states a set Integrated Loudness Target for submitted programs. (A Maximum True Peak requirement is equally important. However this descriptor has no effect on perceptual Loudness consistency).

Due to the absence of any suggested internal network guidelines or any form of standardized Loudness Normalization, dynamic Ad insertion has the potential to ruin the perceptual consistency within single programs and throughout the contents of an entire network.

Many conscientious independent producers have embraced the credible -16.0 LUFS Integrated Loudness Target for stereo Internet/Mobile/Podcast audio distribution (the perceptual equivalent for mono distribution is -19.0 LUFS). It’s far from a requirement, and nothing more than a suggested guideline.

My hope is Podcast Networks will begin to recognize the advantages of standardization and consider the adoption of the -16.0 LUFS Integrated Loudness Target. Dynamically inserted Ads must be perceptually equal to the parent program. Without a standardized and pre-disclosed Integrated Loudness Target, it will be near impossible to establish any level of distribution consistency.

-paul.

CNN and Program Loudness Tolerance

I recently analyzed a few of the internal Podcasts produced by CNN. One particular installment is yet another example of a major media outlet distributing audio that is in my view unsuitable for this particular platform.

Let’s discuss file attributes and measured specs. for one of CNN’s distributed Podcasts:

The distributed audio is mono, 64kbps, with music elements. I’ve stated how I feel about this. I’m not a proponent of 64 kbps MP3 audio PERIOD (mono or stereo). In general audio in this format sounds horrible. Feel free to disagree.

Secondly, the Integrated (Program) Loudness for this particular program is just about -23.0 LUFS with a Maximum True Peak of +0.40 dBTP. From my perspective the perceptual Loudness misses the mark. And, the audio is clipped.

Lastly, the produced audio is way too dynamic for spoken word. The participants’ delivery is perceptually inconsistent – a problem when considering how (for the most part) this program will be consumed (mobile devices, problematic ambient spaces, etc.).

I decided to sort of showcase this particular program because it is a good candidate for flexible Target considerations. What do I mean by “flexible Target considerations?” Let me explain …

Again, the distributed file is mono. The recommended Integrated Loudness Target for mono Podcasts is -19.0 LUFS. This is the perceptual equivalent of -16.0 LUFS stereo. If I were to apply a +4 dB gain offset to Loudness Normalize this audio to -19.0 LUFS, there would be very little change in the original dynamic structure of the audio. However without some form of aggressive limiting, the maximum amplitude or Peak Ceiling would be driven into oblivion. In fact audible distortion may occur with or without limiting. This is obviously not recommended.

There are two options to consider: 1) apply Dynamic Range Compression before Loudness Normalization, or 2) shoot for a lower Integrated Loudness target. For this particular example I chose to implement both options.

First, in my view optimizing the dynamics in this program for Podcast distribution is unavoidable. It’s just way too choppy and it lacks delivery consistency for spoken word. Also, by lowering the L.Normalized Target, the necessary added gain offset will be reduced resulting in less aggressive limiting. In addition, the reduced amount of added gain will curtail noise floor elevation and other variables such as exaggerated breaths.

As noted the distributed Podcast (displayed in the attached upper waveform example) checks in at -23.0 LUFS and it is clipped. My optimized version (displayed in the lower waveform example) checks in at -20.2 LUFS with a Maximum True Peak of -1.23 dBTP. It is well within a reasonable level of Program Loudness tolerance for Podcast L.Normalization. In fact the perceptual difference between the processed -20.0 LUFS audio and a -19.0 LUFS version would be pretty much undetectable. In essence the audio has been optimized and it exhibits improved intelligibility. It is now well suited for Podcast distribution.


(If you are interested in the tools that I use, they are listed under Available Services).

It is no secret that I am a staunch proponent of the -16.0 LUFS/-19.0 LUFS recommendations for Podcasts. However, in certain situations – tolerance for slightly reduced Program Loudness Targets is acceptable.

For the record – my remaster is much easier to listen to. CNN can do better.

-paul.

Loudness Measurement and Silence

Consider this: Two extended segments of audio, Loudness Normalized (or mixed in real time) to the same Integrated Loudness Target.

Segment (A) is fairly consistent, with a very limited amount of intermittent silence gaps.

Segment (B) is far less consistent, due to a multitude of intermittent silence gaps.

When passing both segments through a Loudness Meter (or measuring the segments offline), and recognizing Integrated Loudness is a reflection of the average perceptual Loudness of an entire segment – how will inherent silence affect the accuracy of the cumulative measurements?

In theory the silence gaps in Segment (B) should affect the overall measurement by returning a lower representation of average Integrated Loudness. If additional gain is added to compensate, Segment (B) would be perceptually louder than Segment (A).

Basically without some sort of active measurement threshold, the algorithms would factor in silence gaps and return an inaccurate representation of Integrated Loudness.

The Fix

In order to establish perceptual accuracy, silence gaps must be removed from active measurements. Loudness Meters and their algorithms are designed to ignore silence gaps. The omission of silence is based on the relationship between the average signal level and a predefined threshold.

Loudness Meter (G10) Gate

The specification Gate (G10) is an aspect of the ITU Loudness Measurement algorithms included in compliant Loudness Meters. Its function is to temporarily pause Loudness measurements when the signal drops below a relative threshold, thus allowing only prominent foreground sound to be measured.

The relative threshold is -10 LU below ungated LUFS. Momentary and Short Term measurements are not gated. There is also a -70 LUFS Absolute Gate that will force metering to ignore extreme low level noise.
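A rough Python sketch of the two stage gating logic follows. It operates on a series of per-block loudness values (the overlapping 400ms K-weighted measurement stage is omitted), and the helper name is mine:

import numpy as np

def gated_integrated(block_lufs, abs_gate=-70.0, rel_gate_lu=-10.0):
    blocks = np.asarray(block_lufs, dtype=np.float64)
    blocks = blocks[blocks > abs_gate]            # Absolute Gate: ignore extreme low level
    if blocks.size == 0:
        return float("-inf")
    power = 10.0 ** (blocks / 10.0)
    ungated = 10.0 * np.log10(power.mean())
    keep = blocks > (ungated + rel_gate_lu)       # Relative Gate: -10 LU below ungated
    if not keep.any():
        return ungated
    return 10.0 * np.log10(power[keep].mean())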

Most Loudness Meters reveal a visual indication of active gating (see attached image) and confirm the accuracy of displayed measurements.


Additional “Gate” Generalizations and Nomenclature

A Downward Expander and its applied attenuation is dependent on signal level when the signal drops below a user defined threshold. The Ratio dictates the amount of attenuation. Alternatively a Noise Gate functions independently of signal level. When the level drops below the defined threshold, hard muting is applied.
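A quick gain-computer sketch in Python makes the distinction clear. Levels are in dB, and the threshold/ratio values are arbitrary examples:

def expander_gain(level_db, threshold_db=-50.0, ratio=2.0):
    # Downward expansion: attenuation scales with how far the level falls below threshold
    if level_db >= threshold_db:
        return 0.0
    return (ratio - 1.0) * (level_db - threshold_db)   # negative = attenuation

def noise_gate_gain(level_db, threshold_db=-50.0):
    # Noise gate: hard mute below threshold, regardless of how far below it sits
    return 0.0 if level_db >= threshold_db else float("-inf")

print(expander_gain(-56.0))    # -6.0 dB of attenuation (level dependent)
print(noise_gate_gain(-56.0))  # -inf (mute)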

Silence Gate

This is a somewhat proprietary term. It is a parameter setting available on the Aphex 320A and 320D Compellor hardware Leveler/Compressor.


When a passing signal level drops below the user defined Silence Gate threshold for 1 second or longer, the device’s VCA (Voltage Controlled Amplifier) gain is frozen. The Silence Gate will prevent the Leveling and Compression processing from releasing and inadvertently increasing the audibility of background noise.

-paul.

Understanding Pan Mode Options

Adobe Audition and Logic Pro X include Pan Mode preference options that determine track output gain for center panned mono clips included in stereo sessions. These options are often the source of confusion when working with a combination of mono and stereo clips, especially when clips are pre-Loudness Normalized prior to importing.

In Audition, the Left/Right Cut (Logarithmic) option retains center panned mono clip gain. The -3.0 dB Center option, which by the way is customizable – will attenuate center panned mono clip gain by the specified dB value.

For example if you were targeting -16.0 LUFS in a stereo session using a combination of pre-Loudness Normalized clips, and all channel faders were set to unity – the imported mono clips need to be -19.0 LUFS (Integrated). The stereo clips need to be -16.0 LUFS (Integrated). The Left/Right Cut Pan Mode option will not alter the gain of the center panned mono clips. This would result in a -16.0 LUFS stereo mixdown.

Conversely the -3.0 dB Center Pan Mode option will apply a -3 dB gain offset (it will subtract 3 dB of gain) to center panned mono clips resulting in a -19.0 LUFS stereo mixdown. In most cases this -3 LU discrepancy is not the desired target for a stereo mixdown. Note 1 LU == 1 dB.
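A back-of-the-envelope Python sketch shows why the option matters. A center panned mono source reproduced identically on both channels measures roughly +3 LU hotter than the same audio measured as a single channel, because both channels contribute equally to the loudness sum. The figures below are approximations for illustration only:

import math

MONO_TO_STEREO_LU = 10 * math.log10(2)    # ~ +3.01 LU

def stereo_mixdown_lufs(mono_clip_lufs, pan_cut_db):
    return mono_clip_lufs + MONO_TO_STEREO_LU - pan_cut_db

print(round(stereo_mixdown_lufs(-19.0, 0.0), 1))   # Left/Right Cut  -> -16.0 LUFS
print(round(stereo_mixdown_lufs(-19.0, 3.0), 1))   # -3.0 dB Center  -> -19.0 LUFS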

As stated Logic Pro X provides a similar level of Pan Mode flexibility. I’ve also tested Reaper, and its options are equally flexible.

Pro Tools

Pro Tools Pan Mode support (they call it Pan Depth) is somewhat restricted. The preference is limited to Center Pan Mode, with selectable dB compensation options (-2.5 dB, -3.0 dB, -4.5 dB, and -6.0 dB).

There are several ways to reconstitute the loss of gain that occurs in Pro Tools when working with center panned mono clips in stereo sessions. One option would be to duplicate a mono clip and place each instance of it on hard-panned discrete mono tracks (L+R respectively). Routing the mono tracks to a stereo output will reconstitute the loss of gain.

A second and much more efficient method is to route all individual instances of mono session clips to a stereo Auxiliary Input, and use it to apply the necessary compensating gain offset before the signal reaches the stereo Master Output. The gain offset can be applied using the Aux Input channel fader or by using an inserted gain trim plugin. Stereo clips included in the session can bypass this Aux and should be directly routed to the stereo Master Output. In essence stereo clips do not require compensation.

Example Session

Have a look at the attached Pro Tools session snapshot. In order to clearly display the signal path relative to its gain, I purposely implemented Pre-Fader Metering.


Notice how the mono spoken word clip included on track 1 is routed (by way of stereo Bus 1-2) to a stereo Auxiliary Input track (named to Stereo). Also notice how the stereo signal level displayed by the meters on the Stereo Auxiliary Input track is lower than the mono source that is feeding it. The level variation is clear due to Pre-Fader Metering. It is the direct result of the session’s Pan Depth setting that is subtracting -3dB of gain on this center panned mono track.

Next, notice how the signal level on the Master Output has been reconstituted and is in fact equal to the original mono source. We’ve effectively added +3dB of gain to compensate for the attenuation of the original center panned mono clip. The +3dB gain compensation was applied to the signal on the Auxiliary Input track (via fader) before routing it’s output to the stereo Master Output.

So it’s: Center Panned mono resulting in a -3dB gain attenuation —>> to a stereo Aux Input with +3dB of gain compensation —>> to stereo Master Output at unity.

In case you are wondering – why not add +3dB of gain to the mono clip and bypass all the fluff? By doing so you would be altering the native inherent gain structure of the mono source clip, possibly resulting in clipping. My described workflow simply reconstitutes the attenuated gain after it occurs on center panned mono clips. It is all necessary due to Pro Tool’s Pan Depth methods and implementation.

-paul.

AES “Recommendation for Loudness of Audio Streaming & Network File Playback.”

I’d like to share my observations and views on the recently published AES Technical Document AES TD1004.1.15-10 that specifies best practices for Loudness of Audio Streaming and Network File Playback.

The document is a collection of Loudness processing guidelines for diverse platform dependent media streaming and downloading. This would include music, spoken word, and possibly high dynamic range audio in video streams. The document credits some of the most well respected industry leading professionals, including Bob Katz, Thomas Lund, and Florian Camerer. The term “Podcast” is directly referenced once in the document, where the author(s) state:

“Network file playback is on-demand download of complete programs from the network, such as podcasts.”

I support the purpose of this document, and I understand the stated recommendations will most likely evolve. However in my view the guidelines have the potential to create a fair amount of confusion for producers of spoken word content, mainly Podcast producers. I’m specifically referring to the suggested 4 LU range (-16.0 to -20.0 LUFS) of acceptable Integrated Loudness Targets and the solutions for proper targeting.

Indeed compliance within this range will moderately curtail perceptual loudness disparities across a wide range of programs. However the leniency of this range is what concerns me.

I am all for what I refer to as reasonable deviation or “wiggle room” in regard to Integrated Loudness Target flexibility for Podcasts. However IMHO a -20 LUFS spoken word Podcast approaches the broadcast Loudness Targets that I feel are inadequate for this particular platform. A comparable audio segment with wide dynamics will complicate matters further.

I also question the notion (as stated in the document) of purposely precipitating clipping when adding gain “to handle excessive peaks.”

And there is no mention of the perceptual disparities between Mono and Stereo files Loudness Normalized to the same Integrated Loudness Target. For the record I don’t support mono file distribution. However this file format is prevalent in the space.

Perspective

I feel the document’s perspective is somewhat slanted towards platform dependent music streaming and preservation of musical dynamics. In this category, broad guidelines are for the most part acceptable. This is due to the wide range of production techniques and delivery methods used on a per musical genre basis. Conversely spoken word driven audio is not nearly as artistically diverse. Considering how and where most Podcasts are consumed, intelligibility is imperative. In my view they require much more stringent guidelines.

It’s important to note streaming services and radio stations have the capability to implement global Loudness Normalization. This frees content creators from any compliance responsibilities. All submitted media will be adjusted accordingly (turned up or turned down) in order to meet the intended distribution Target(s). This will result in consistency across the noted platform.

Unfortunately this is not the case in the now ubiquitous Podcasting space. At the time of this writing I am not aware of a single Podcast Network that (A) implements global Loudness Normalization … and/or … (B) specifies a requirement for Integrated Loudness and Maximum True Peak Targets for submitted media.

Currently Podcast Loudness compliance Targets are resolved by each individual producer. This is the root cause of wide perceptual loudness disparities across all programs in the space. In my view suggesting a diverse range of acceptable Targets especially for spoken word may further impede any attempts to establish consistency and standardization.

PLR and Retention of Music Dynamics

The document states: “Users may choose a Target Loudness that is lower than the -16.0 LUFS maximum, e.g., -18.0 LUFS, to better suit the dynamic characteristics of the program. The lower Target Loudness helps improve sound quality by permitting the programs to have a higher Peak to Loudness Ratio (PLR) without excessive peak limiting.”

The PLR correlates with headroom and dynamic range. It is the difference between the average Loudness and maximum amplitude. For example a piece of audio Loudness Normalized to -16.0 LUFS with a Maximum True Peak of -1 dBTP reveals a PLR of 15. As the Integrated Loudness Target is lowered, the PLR increases indicating additional headroom and wider dynamics.

In essence low Integrated Loudness Targets will help preserve dynamic range and natural fidelity. This approach is great for music production and streaming, and I support it. However in my view this may not be a viable solution for spoken word distribution, especially considering potential device gain deficiencies and ubiquitous consumption habits carried out in problematic environments. In fact in this particular scenario a moderately reduced dynamic range will improve spoken word intelligibility.

Recommended Processing Options and Limiting

If a piece of audio is measured in its entirety and the Integrated Loudness is higher than the intended Target, a subtractive gain offset normalizes the audio. For example if the audio checks in at -18.0 LUFS and you are targeting -20.0 LUFS, we simply subtract 2 dB of gain to meet compliance.

Conversely when the measured Integrated Loudness is lower than the intended Target, Loudness Normalization is much more complex. For example if the audio checks in at -20.0 LUFS, and the Integrated Loudness Target is -16.0 LUFS, a significant amount of gain must be added. In doing so the additional gain may very well cause overshoots, not only above the Maximum True Peak Target, but well above 0dBFS. Inevitably clipping will occur. From my perspective this would clearly indicate the audio needs to be remixed or remastered prior to Loudness Normalization.
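
Here is a minimal sketch of that offset math and the overshoot check, again in plain Python with hypothetical values (the function names are mine):

def normalization_offset(measured_lufs, target_lufs):
    """Linear gain offset (dB/LU) required to move the measured loudness to the Target."""
    return target_lufs - measured_lufs

def post_offset_true_peak(max_true_peak_dbtp, offset_db):
    """Where the existing Maximum True Peak lands once the offset is applied."""
    return max_true_peak_dbtp + offset_db

# Subtractive case: -18.0 LUFS source, -20.0 LUFS Target
print(normalization_offset(-18.0, -20.0))        # -2.0

# Additive case: -20.0 LUFS source, -16.0 LUFS Target, existing peak at -1.0 dBTP
offset = normalization_offset(-20.0, -16.0)      # +4.0
print(post_offset_true_peak(-1.0, offset))       # 3.0 dBTP -> overshoot; limit, remix, or remaster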

Under these circumstances I would be inclined to reestablish headroom by applying dynamic range compression. This approach will certainly curtail the need for aggressive limiting. As stated the reduced dynamic range may also improve spoken word intelligibility. I’m certainly not suggesting aggressive hyper-compression. The amount of dynamic range reduction is of course subjective. Let me also stress this technique may not be suitable for certain types of music.

Additional Document Recommendations and Efficiency

The authors of the document go on to share some very interesting suggestions in regard to effective Loudness Normalization:

1) “If level has to be raised, raise until it reaches Target level or until True Peak reaches 0 dBTP, whichever occurs first. Thus, the sound quality will be preserved, without introducing excessive peak limiting.”

2) “Perform what is noted in example 1, but keep raising the level until the program level reaches Target, and apply either peak limiting or allow some clipping to handle excessive peaks. The advantage is more consistent loudness in the stream, but this is a potential sonic compromise compared to example 1. The best way to retain sound quality and have more consistent loudness is by applying example 1 and implementing a lower Target.”

With these points in mind, please review/demo the following spoken word audio segment. In my opinion the audio in its current state is not optimized for Podcast distribution. It’s simply too low in terms of perceptual loudness and too dynamic for effective Loudness Normalization, especially if targeting -16.0 LUFS. Due to these attributes suggestion 1 above is clearly not an option. In fact neither is option 2. There is simply no available headroom to effectively add gain without driving the level well above full scale. Peak limiting is unavoidable.

1

I feel the document suggestions for the segment above are simply not viable, especially in my world where -16.0 LUFS remains my recommended Target for spoken word Podcasts. Targeting -18.0 LUFS as opposed to -16.0 LUFS is certainly an option. It’s clear peak limiting will still be necessary.

Below is the same audio segment with dynamic range compression applied before Loudness Normalization to -16.0 LUFS. Notice there is no indication of aggressive limiting, even with a Maximum True Peak of -1.7 dBTP.

2

Regarding peak limiting the referenced document includes a few considerations. For example: “Instead of deciding on 2 dB of peak limiting, a combination of a -1 dBTP peak limiter threshold with an overall attenuation of 1 dB from the previously chosen Target may produce a more desirable result.”

This modification is adequate. However the general concept continues to suggest the acceptance of flexible Targets for spoken word. This may impede perceptual consistency across multiple programs within a given network.

Conclusion

The flexible best practices suggested in the AES document are 100% valid for music producers and diverse distribution platforms. However in my opinion this level of flexibility may not be well suited for spoken word audio processing and distribution.

I’m willing to support the curtailment of heavy peak limiting when attempting to normalize spoken word audio (especially to -16.0 LUFS) by slightly reducing the intended Integrated Loudness Target … but not by much. I will only consider doing so if and when my personal optimization methods prior to normalization yield unsatisfactory results.

My recommendation for Podcast producers would be to continue to target -16.0 LUFS for stereo files and -19.0 LUFS for mono files. If heavy limiting occurs, consider remixing or remastering with reduced dynamics. If optimization is unsuccessful, consider lowering the intended Integrated Loudness Target by no more than 2 LU.

A True Peak Maximum of <= -1.0 dBTP is fine. I will continue to suggest -1.5 dBTP for lossless files prior to lossy encoding. This will help ensure compliance in encoded lossy files. What’s crucial here is a full understanding of how lossy, low bit rate coders will overshoot peaks. This is relevant due to the ubiquitous (and not necessarily recommended) use of 64kbps for mono Podcast audio files.

Let me finish by stating the observations and recommendations expressed in this article reflect my own personal subjective opinions based on 11 years of experience working with spoken word audio distributed on the Internet and Mobile platforms. Please feel free to draw your own conclusions and implement the techniques that work best for you.

-paul.

Quantifying Podcast Audio Dynamics

I’ve discussed the reasons why there is a need for revised (optimized) Loudness Standards for Internet and Mobile audio distribution. Problematic (noisy) consumption environments and possible device gain deficiencies justify an elevated Integrated Loudness target. Highly dynamic audio complicates matters further.

In essence audio for the Internet/Mobile platform must be perceptually louder on average compared to audio targeted for Broadcast. The audio must also exhibit carefully constrained dynamics in order to maintain optimized intelligibility.

The recommended Integrated Loudness targets for Internet and Mobile audio are -16.0 LUFS for stereo files and -19.0 LUFS for mono. They are perceptually equal.

In terms of Dynamics, I’ve expressed my opinion regarding compression. In my view spoken word audio intelligibility will be improved after careful Dynamic Range Compression is applied. Note that I do not advocate aggressive compression that may result in excessive loudness and possible quality degradation. The process is a subjective art. It takes practice, access to well designed tools, and a full understanding of all settings.

Dynamic-480

I thought I would discuss various aspects of Podcast audio Dynamics. Mainly, the potential problems caused by wide Dynamics and how to quantify them using various descriptors and measurement tools. I will also discuss the benefits of Dynamic Range management as a precursor to Loudness Normalization. Lastly I will disclose recommended benchmarks that are certainly not requirements. Feel free to draw your own conclusions and target what works best for you.

Highly Dynamic Audio in Noisy Environments

At its core extended or Wide Dynamic Range describes notable disparities between high and low level passages throughout a piece of audio. When this is prevalent in a spoken word segment, intelligibility will be compromised – especially in situations where the listening environment is less than ideal.

For example if you are traveling below Manhattan on a noisy subway, and a Podcast talent’s delivery is inconsistent, you may need to make realtime playback volume adjustments to compensate for any inconsistent high and low level passages.

As well – if the Integrated Loudness is below what is recommended, the listening device may be incapable of applying sufficient gain. Dynamic Range Compression will reestablish intelligibility.

From a post perspective – carefully constrained dynamics will provide additional headroom. This will optimize audio for further downstream processing and ultimately efficient Loudness Normalization.

Dynamic Range Compression and Loudness Normalization

I would say in most cases successful Loudness Normalization for Broadcast compliance requires nothing more than a simple subtractive gain offset. For example if your mastered piece checks in at -20.0 LUFS (stereo), and you are targeting R128 (-23.0 LUFS Integrated), applying a -3 LU (dB) gain offset will most likely result in compliant audio. By doing so the original dynamic attributes of the piece will be retained.

Things get a bit more complicated when your Integrated Loudness target is higher than the measured source. For example a mastered -20.0 LUFS piece will require additional gain to meet a -16.0 LUFS target. In this case you may need to apply a significant amount of limiting to prevent the Maximum True Peak from exceeding your target. In essence without safeguards, added gain may result in clipping. The key is to avoid excessive limiting if at all possible.

How do we optimize audio before a gain offset is applied?

I recommend applying a moderate to low amount of (global) final stage Dynamic Range Compression before Loudness Normalization. When processing highly dynamic audio this final stage compression will prevent instances of excessive limiting. The amount of compression is of course subjective. Often a mere 1-2 dB of gain reduction will be sufficient. Effectiveness will always depend on the attributes of the mastered source audio before Loudness Normalization.

I carefully manage spoken word dynamics throughout client project workflows. I simply maintain sufficient headroom prior to Loudness Normalization. In most cases I am able to meet the intended Integrated Loudness and Maximum True Peak targets (without limiting) by simply adding gain.

RX Loudness Control

By design iZotope’s RX Loudness Control also applies compression in certain instances of Loudness Normalization. I suggest you read through the manual. It is packed with information regarding audio loudness processing and Loudness Normalization.

RX-LC_site

iZotope states the following:

“For many mixes, dynamics are not affected at all. This is because only a fixed gain is required to meet the spec. However, if your mix is too dynamic or has significant transients, compression and/or limiting are required to meet Short-term/Momentary or True Peak parts of the spec.”

“RX Loudness Control uses compression in a way that preserves the quality of your audio. When needed, a compressor dynamically adjusts your audio to ensure you get the best sound while remaining compliant. For loudness standards that require Short-term or Momentary compliance, the compressor is engaged automatically when loudness exceeds the specified target.”

It’s a highly recommended tool that simplifies offline processing in Pro Tools. Many of its features hook into Adobe’s Premiere Pro and Media Encoder.

LRA, PLR, and Measurement Tools

So how do we quantify spoken word audio dynamics? Most modern Loudness Meters are capable of calculating and displaying what is referred to as the Loudness Range (LRA). This particular descriptor is displayed in Loudness Units (LU’s). Loudness Range quantifies the differences in loudness measurements over time. This statistical perspective can help operators decide whether Dynamic Range Compression may be necessary for optimum intelligibility on a particular platform. (Note in order to prevent a skewed measurement due to various factors – the LRA algorithm incorporates relative and absolute threshold gating. For more information: refer to EBU Tech doc 3342).

I will say that before I came across any rule of thumb (recommended) guidelines for Internet and Mobile audio distribution, the LRA in the majority of the work I’ve produced over the years hovered around 3-5 LU. In the highly regarded article Audio for Mobile TV, iPad and iPod, the author and leading expert Thomas Lund of TC Electronic suggests an LRA not much higher than 8 LU for optimal Pod Listening. Basically higher LRA readings suggest inconsistent dynamics which in turn may not be suitable for Mobile platform distribution.

Some Loudness Meters also display the PLR descriptor, or Peak to Loudness Ratio. This correlates with headroom and dynamic range. It is the difference between the Program (average) Loudness and maximum amplitude. For a piece of audio Loudness Normalized to -16.0 LUFS with a True Peak Maximum somewhere around -1.0 dBTP, it is easy to recognize the general sweet spot for the Mobile platform (e.g. a PLR reasonably less than 16 for stereo).

Note that heavily compressed or aggressively limited (loud) audio will exhibit very low PLR readings. For example if the measured Integrated Loudness of a particular program is -10.0 LUFS with a Maximum True Peak of -1.0 dBTP, the reduced PLR (9) clearly indicates aggressive processing resulting in elevated perceptual loudness. This should be avoided.

If you are targeting -16.0 LUFS (Integrated), and your True Peak Maximum is somewhere between -1.0 and -3.0 dBTP, your PLR is well within the recommended range.

In Conclusion

An optimal LRA is vital for Podcast/Spoken Word distribution. Use it to gauge delivery consistency, dynamics, and whether further optimization may be necessary. At this point in time I suggest adhering to an LRA < 7 LU for spoken word.

LRA Measurements may be performed in real time using a compliant Loudness Meter such as Nugen Audio’s VisLM 2, TC Electronic’s LM2n Loudness Radar, and iZotope’s Insight (also check out the Youlean Loudness Meter). Some meters are capable of performing offline measurements in supported DAWs. There are a number of stand alone third party measurement options available as well, such as iZotope’s RX7 Advanced Audio Editor, Auphonic Leveler, FFmpeg, and r128x.

-paul.

***Please note I personally paid for my RX Loudness Control license and I have no formal affiliation with iZotope.

Public Radio Loudness Compliance

PRSS (Public Radio Satellite System) recently published Loudness Standardization parameters intended for contributing producers:

[– Target Loudness: Integrated loudness shall be -24 LUFS per program segment with a variance of ±2 LU. This will apply to speech and/or music elements.

[– Maximum Peak Level: Shall be no higher than -3 dBFS for sample peaks and shall be no higher than -2 dBTP for True Peaks.

To supplement the published standards, my twitter acquaintance and fellow Loudness advocate Rob Byers posted The Audio Producer’s Guide to Loudness on Transom.org.

The article documents the basics of Loudness Meters, measurement descriptors, and mixing best practices. It’s a viable guide for anyone planning to submit compliant audio for Public Radio distribution. Incidentally Rob is the Interim Director of Broadcast and Media Operations with Marketplace at American Public Media.

Anyway … I’d like to share my personal perspective regarding the differences between real time compliance mixing vs. compliance processing. I’m confident my subjective insight will prove to be useful for Public Radio Producers targeting the PRSS spec.

Internet/Mobile vs. Broadcast

I’ve stated that targeted (Integrated/Program) Loudness for Radio/Broadcast differs from what I consider suitable for audio distributed on the Internet. This includes streaming audio, video, and Podcasts. Basically audio mixed and/or Loudness Normalized to -23.0/-24.0 LUFS, targeted to comply with a Broadcast spec. is simply not loud enough for Internet distribution. This is due to various aspects of consumption, including device deficiencies and problematic ambiance in less than ideal listening environments. The Integrated Loudness target for Internet/Mobile audio is -16.0 LUFS with allowance for a reasonable deviation. True Peaks should not exceed -1.0 dBTP in lossy files. Some institutions suggest additional headroom.

Mixing for Compliance

I rarely mix audio in real time while attempting to meet Integrated and True Peak compliance targets. This method is acceptable. However there are a few caveats.

First, in order to arrive upon an accurate representation of Integrated Loudness, audio mixes must be measured in their entirety. You cannot spot check a few passages of a mix and estimate this descriptor. Needless to say this can be a time consuming process.

Secondly, in my view real time mixing for compliance is tedious and potentially inaccurate. What I recommend is to use both the Short Term and Integrated Loudness descriptors to sort of gauge the current state of the mix as playback progresses and ends. Once the mix has concluded – simply apply a global Gain Offset to the entire mix. This will shift the Integrated Loudness to your intended target. This is essentially one way to apply Loudness Normalization.

For example if a concluded mix checks in at -20.0 LUFS, and you are targeting -24.0 LUFS, prior to bouncing, a -4LU (dB) global Gain Offset would bring the mix into spec. (The process is discussed in this video highlighting the TC Electronic Loudness Radar Meter included in Adobe Audition and Premiere Pro. Of course any compliant Loudness Meter would be suitable).

By the way let’s not forget the importance of True Peak compliance for any standard. This descriptor will also need to be monitored and dealt with accordingly while mixing.

Trust Your Ears!

This second (and preferred) method of Loudness Normalization requires proper use of the most important tool(s) available to all of us in any mixing or post production environment … our ears. Producers need to learn how to take advantage of natural perception and also apply thoughtful processing to session clips with the intent to achieve a well balanced, good sounding mix. In doing so the use of a Loudness Meter becomes much less of a distraction.

Of course the presence of an inserted meter is a necessity, and it’s descriptors will (over time) display a clear indication of the state of the mix. Trust your ears!

Off-line Loudness Normalization

The workflow that I’m about to describe will reward producers with Loudness compliance flexibility throughout a mixing session. The key is upon completion, the mixed (and exported) audio will be processed off-line resulting in 100% compliance.

As noted, the global Gain Offset method for Loudness Normalization requires knowledge of existing Integrated Loudness prior to applying the necessary adjustments. The following variation shares the same requirement. However the Integrated Loudness and True Peak of the mixed-down audio will be calculated off-line as opposed to in real time. Let me stress the existing Integrated Loudness must be established before we can move forward with any form of compliance processing. We will be targeting the PRSS specifications noted above.

FFmpeg: Cross Platform Support

There are many ways to measure audio off-line. The most accessible and economical cross-platform tool is the FFmpeg binary. Indeed this is a Command Line utility. Don’t fret! It’s not that big of a deal. You can easily download a pre-compiled binary compatible with your current operating system. You simply point your command line syntax to the location of the binary, key in the path to the location of the file to be measured, and fire away.

Below is example syntax for Loudness Measurement. In this particular instance I point to the binary stored in a root, system wide folder. If you are running a Mac, it may be easier to simply place the binary on your Desktop. In this case you would point to the binary like this: ~/Desktop/ffmpeg … then continue with the remaining displayed syntax, replacing yourSourceFile.wav with the actual path of the file to be measured.

ffmpeg_syntax

And here are the results. Notice the -19.9 LUFS Integrated Loudness (I), and the 1.8 dBFS (dBTP) True Peak (open the image for an extended view).

ffmpeg-small

The PRSS spec. calls for -24.0 LUFS Integrated Loudness with Sample Peaks not exceeding -3.0 dB and True Peaks not exceeding -2.0 dBTP. In this measured example the audio is roughly +4 LU louder than it should be and it is obviously clipped with its True Peak well above 0dBFS.
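
If you prefer to fold this measurement into a script, below is a minimal Python sketch that shells out to the same FFmpeg syntax and scrapes the summary. Treat it as an assumption-heavy starting point: the summary layout printed to stderr can vary between FFmpeg builds, so adjust the patterns as needed.

import re
import subprocess

def measure_with_ffmpeg(path, ffmpeg="ffmpeg"):
    """Run the ebur128 filter and pull Integrated Loudness and True Peak
    from the summary FFmpeg prints to stderr."""
    cmd = [ffmpeg, "-nostats", "-i", path,
           "-filter_complex", "ebur128=peak=true", "-f", "null", "-"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    integrated = re.search(r"I:\s*(-?\d+(?:\.\d+)?)\s*LUFS", result.stderr)
    true_peak = re.search(r"Peak:\s*(-?\d+(?:\.\d+)?)\s*dBFS", result.stderr)
    return {
        "integrated_lufs": float(integrated.group(1)) if integrated else None,
        "true_peak_dbtp": float(true_peak.group(1)) if true_peak else None,
    }

# Hypothetical usage:
# print(measure_with_ffmpeg("yourSourceFile.wav"))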

Setting Up The Normalization Session

In your preferred DAW, create a new stereo session and do the following:

[– Add a Stereo Audio Track, two Stereo AUX Input Channels (primary/secondary), and a Master Fader.

[– Route the Audio Track’s output to the input of the primary Aux Input Channel.

[– On the primary Aux Input Channel – first insert a Gain Trim plugin. Then insert a True Peak Limiter.

[– Now route the output of the primary Aux Input Channel to the input of the secondary Aux Input Channel.

[– Insert a second instance of a Gain Trim plugin on the secondary Aux Input Channel.

[– Route the processed signal to the Master Fader.

[– Set the True Peak Ceiling on the Limiter to -3.5dBTP. Set the Gain Trim inserted on the secondary Aux Input Channel to +1dB. Note that these settings are static and will never change.

Save the session as a Template.

Here is an example of how I do this in Pro Tools. Note that I have additional plugins inserted on the session’s Aux Input Channels. They are in fact deactivated. Please disregard them. I was using this example session for testing, using duplicate sets of plugins for various parameter adjustments. (click to enlarge).

pt-(-24)_620

Making it Work

Using the measured audio displayed above, note the Integrated Loudness (-19.9 LUFS). All you need to do is calculate an initial Gain Offset. This is the difference between the measured Integrated Loudness and -25.0. Add the mixed-down audio into the session’s Audio Track, and set the Gain Trim plugin inserted on the Primary Aux Input Channel to the calculated Gain Offset.

Bounce and you’re done.

Note that the initial Gain Offset will always be determined by calculating the difference between existing Integrated Loudness and -25.0. Once the core session Template is saved, subsequent use is simple: Measure mixed-down audio – Import audio into session – Calculate Gain Offset – Apply Offset to Primary Gain Trim – Bounce.
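
As a sketch of the arithmetic behind the Template (the names are mine, not a Pro Tools API; this is just the math the session performs):

INTERIM_TARGET_LUFS = -25.0    # normalize 1 LU below the -24.0 LUFS PRSS Target
LIMITER_CEILING_DBTP = -3.5    # static True Peak Limiter ceiling (primary Aux)
POST_GAIN_DB = 1.0             # static Gain Trim on the secondary Aux

def primary_gain_trim(measured_integrated_lufs):
    """The only variable setting in the session: the initial Gain Offset."""
    return INTERIM_TARGET_LUFS - measured_integrated_lufs

# Measured example from above: -19.9 LUFS
print(primary_gain_trim(-19.9))   # -5.1 dB
# After the static stages the bounce lands near -24.0 LUFS,
# with True Peaks no higher than -3.5 + 1.0 = -2.5 dBTP.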


TP-620


Asymmetric Waveforms: Should You Be Concerned?

In order to understand the attributes of asymmetric waveforms, it’s important to clarify the differences between DC Offset and Asymmetry …

Waveform Basics

A waveform consists of both a Positive and Negative side, separated by a center (X) axis or “Baseline.” This Baseline represents zero amplitude (-∞ dB) as displayed on the (Y) axis. The center portion of the waveform that is anchored to the Baseline may be referred to as the mean amplitude.

wf-480

DC Offset

DC Offset occurs when the mean amplitude of a waveform is off the center axis due to differing amounts of the signal shifting to the positive or negative side of the waveform.

One common cause of this shift is when faulty electronics insert a DC current into the signal. This abnormality can be corrected in most file based editing applications and DAW’s. Left uncorrected, audio with DC Offset will exhibit compromised dynamic range and a loss of headroom.

Notice the displacement of the mean amplitude:

dc-offset-ex-480-png

The same clip after applying DC Offset correction. Also, notice the preexisting placement of (+/-) energy:

dc-offset-removed-480
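
For illustration only, here is a naive numpy sketch of the idea behind DC Offset correction (real editors use more refined filtering, but the principle of re-centering the mean amplitude is the same):

import numpy as np

def remove_dc_offset(samples):
    """Shift the waveform so its mean amplitude sits back on the Baseline.
    A simplified illustration; editors typically use gentle filtering instead."""
    return samples - samples.mean()

# Hypothetical example: a 440 Hz sine riding on a +0.1 DC component
t = np.linspace(0, 1, 48000, endpoint=False)
shifted = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.1
print(round(shifted.mean(), 3))                    # 0.1  (offset present)
print(round(remove_dc_offset(shifted).mean(), 3))  # 0.0  (mean back on the Baseline)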

Asymmetry

Unlike waveforms that indicate DC Offset, an asymmetric waveform’s mean amplitude will reside on the center axis. However the representations of positive and negative amplitude (energy) will be disproportionate. This can inhibit the amount of gain that can be safely applied to the audio.

In fact, the elevated side of a waveform will tap the target ceiling before its counterpart resulting in possible distortion and the loss of headroom.

High-pass filters, and aggressive low-end processing are common causes of asymmetric waveforms. Adding gain to asymmetric waveforms will further intensify the disproportionate placement of energy.

In this example I applied a high-pass filter resulting in asymmetry:

asymm-matural-480
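
One rough way to quantify this (my own ad hoc check, not a standard descriptor) is to compare positive and negative peak magnitudes:

import numpy as np

def asymmetry_db(samples):
    """Positive vs. negative peak magnitude, in dB. Near 0 dB means the peaks
    are symmetric; larger absolute values mean one side of the waveform will
    tap a ceiling before the other."""
    pos_peak = float(np.max(samples))
    neg_peak = float(abs(np.min(samples)))
    return 20 * np.log10(pos_peak / neg_peak)

# Hypothetical example: positive peaks at 0.9, negative peaks at only 0.5
print(round(asymmetry_db(np.array([0.9, -0.5, 0.3, -0.2])), 1))  # 5.1 dB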

Broadcast Chains

Broadcast engineers closely monitor positive to negative energy distribution as their audio passes through various stages of processing and transmission. Proper symmetry aids in the ability to process a signal more effectively downstream. In essence uniform gain improves clarity and maximizes loudness.

Podcasts

In spoken word – symmetry allows the voice to ride higher in the mix with a lower risk of distortion. Since many Podcast Producers will be adding gain to their mastered audio when loudness normalizing to targets, the benefits of symmetric waveforms are obvious.

If an audio clip’s waveform(s) are asymmetric and the audio exhibits audible distortion and/or a loss of headroom, a Phase Rotator can be used to reestablish proper symmetry.

Below is a segment lifted from a distributed Podcast (full zoom out). Notice the lack of symmetry, with the positive side of the waveform limited much more aggressively than the negative:

podcast-asymm-480

The same clip after Phase Rotation:

asymm-podcas-fixed-480

(I processed the clip above using the Adaptive Phase Rotation option located in iZotope’s RX 4 Advanced Channel Ops module.)

In Conclusion

Please note that asymmetric waveforms are not necessarily bad. In fact the human voice (most notably male) is often asymmetric by nature. If your audio is well recorded, properly processed, and pleasing to the ear … there’s really no need to attempt to correct any indication of asymmetry.

However if you are noticing abnormal displacement of energy, it may be worth looking into. My suggestion would be to evaluate your workflow and determine possible causes. Listen carefully for any indication of distortion. Often a slight EQ tweak or a console setting modification is all that may be necessary to make noticeable (audible) improvements to your audio.

-paul.

Podcast Loudness: Mono vs. Stereo Perception …

Consider the following scenario:

Two copies of an audio file. File 1 is Stereo, Loudness Normalized to -16.0 LUFS. File 2 is Mono, also Loudness Normalized to -16.0 LUFS.

Passing both files through a Loudness Meter confirms equal numerical Program Loudness. However the numbers do not reflect an obvious perceptual difference during playback. In fact the Mono file is perceptually louder than its Stereo counterpart.

Why would the channel configuration affect perceptual loudness of these equally measured files?

mono-LN-480

The Explanation

I’m going to refer to a feature that I came across in a Mackie Mixer User Manual. Mackie makes reference to the “Constant Loudness” principle used in their mixers, specifically when panning Mono channels.

On a mixer, hard-panning a Mono channel left or right results in equal apparent loudness (perceived loudness). It would then make sense to assume that if the channel was panned center, the output level would be hotter due to the combined or “mixed” level of the channel. In order to maintain consistent apparent loudness, Mackie attenuates center panned Mono channels by about 3 dB.

We can now apply this concept to the DAW …

A Mono file played back through two speakers (channels) in a DAW would be the same as passing audio through a Mono analog mixer channel panned center. In this scenario, the analog mixer (that adheres to the Constant Loudness principle) would attenuate the output by 3dB.

In order to maintain equal perception between Loudness Normalized Stereo and Mono files targeting -16.0 LUFS, we can simulate the Constant Loudness principle in the DAW by attenuating Mono files by 3 LU. This compensation would shift the targeted Program Loudness for Mono files to -19.0 LUFS.

To summarize, if you plan to Loudness Normalize to the recommended targets for internet/mobile, and Podcast distribution … Stereo files should target -16.0 LUFS Program Loudness and Mono files should target -19.0 LUFS Program Loudness.
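
A trivial sketch of that rule (the helper is hypothetical, simply encoding the recommendation above):

def program_loudness_target(channels, stereo_target_lufs=-16.0, mono_compensation_lu=3.0):
    """Mono files are normalized 3 LU lower so they are perceived
    the same as their Stereo counterparts (Constant Loudness idea)."""
    return stereo_target_lufs - mono_compensation_lu if channels == 1 else stereo_target_lufs

print(program_loudness_target(2))  # -16.0 LUFS (Stereo)
print(program_loudness_target(1))  # -19.0 LUFS (Mono)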

Note that in my discussions with leading experts in the space, it has come to my attention that this approach may not be sustainable. Many pros feel it is the responsibility of the playback device and/or delivery system to apply the necessary compensation. If this support is implemented, the perceived loudness of -16.0 LUFS Mono will be equal to -16.0 LUFS Stereo. There would be no need to apply manual compensation.

-paul.

Loudness Meter Descriptors …

In the recent article published on Current.org “Working Group Nears Standard for Audio Levels in PRSS Content”, the author states:

“Working group members believe that one solution may lie in promoting the use of Loudness Meters, which offer more precision by measuring audio levels numerically. Most shows are now mixed using peak meters, which are less exact.”

Peak Meters are exact – when they are used to display what they are designed to measure: Sample Peak Amplitude. They do not display an accurate representation of average, perceived loudness over time. They should only be used to monitor and ultimately prevent overload (clipping).

It’s great that the people in Public Radio are finally addressing distribution Loudness consistency and compliance. My hope is their initiative will carry over into their podcast distribution models. In my view before any success is achieved, a full understanding of all spec. descriptors and targets would be essential. I’m referring to Program (Integrated) Loudness, Short Term Loudness, Momentary Loudness, Loudness Range, and True Peak.

Loudness Meter

A Loudness Meter will display all delivery specification descriptors numerically and graphically. Meter descriptors will update in real time as audio passes through the meter.

Short Term Loudness values are often displayed from a graphical perspective as designed by the developer. For example TC Electronic’s set of meters (with the exception of the LM1n) display Short Term Loudness on a circular graph referred to as Radar. Nugen Audio’s VisLM meter displays Short Term Loudness on a grid based histogram. Both versions can be customized to suit your needs and work equally well.

meters-480

Loudness Meters also include True Peak Meters that display any occurrences of Intersample Peaks.

Descriptors

All Loudness standardization guidelines specify a Program Loudness or “Integrated Loudness” target. This time scaled descriptor indicates the average, perceived loudness of an entire segment or program from start to finish. It is displayed on an Absolute scale in LUFS (Loudness Units relative to Full Scale), or LKFS (Loudness Units K Weighted relative to Full Scale). Both are basically the same. LUFS is utilized in the EBU R128 spec. and LKFS is utilized in the ATSC A/85 spec. What is important is that a Loudness Meter can display Program Loudness in either LUFS or LKFS.

The Short Term Loudness (S) descriptor is measured within a time window of 3 seconds, and the Momentary Loudness (M) descriptor is measured within a time window of 400 ms.

The Loudness Range (LRA) descriptor can be associated with dynamic range and/or loudness distribution. It is the difference between average soft and average loud parts of an audio segment or program. This useful indicator can help operators decide whether dynamic range compression is necessary.

Gating

The specification Gate (G10) function temporarily pauses loudness measurements when the signal drops below a relative threshold, thus allowing only prominent foreground sound to be measured. The relative threshold is -10 LU below ungated LUFS. Momentary and Short Term measurements are not gated. There is also a -70 LUFS Absolute Gate that will force metering to ignore extreme low level noise.
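
For intuition, here is a simplified Python illustration of the two-stage gate operating on per-block (400 ms) loudness values. It is a sketch only, not a full BS.1770 meter; note the averaging is done in the energy domain, not directly on LUFS values.

import math

def gated_loudness(block_loudness_lufs, absolute_gate=-70.0, relative_gate_lu=10.0):
    """Two-stage gating over pre-computed 400 ms block loudness values (LUFS)."""
    def average(blocks):
        # Convert each block back to energy, average, then return to LUFS
        energies = [10 ** ((l + 0.691) / 10.0) for l in blocks]
        return -0.691 + 10.0 * math.log10(sum(energies) / len(energies))

    # Absolute gate: ignore extreme low level content
    above_abs = [l for l in block_loudness_lufs if l > absolute_gate]
    # Relative gate: 10 LU below the loudness of the absolutely gated blocks
    relative_threshold = average(above_abs) - relative_gate_lu
    return average([l for l in above_abs if l > relative_threshold])

# Hypothetical blocks: foreground speech around -18 LUFS plus quiet pauses
print(round(gated_loudness([-18.0, -19.0, -17.5, -45.0, -80.0]), 1))  # about -18.1 LUFS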

Absolute vs. Relative

I mentioned that LUFS and LKFS are displayed on an Absolute scale. For example the EBU R128 Program Loudness target is -23.0 LUFS. For Podcast/Internet/Mobile the Program Loudness target is -16.0 LUFS.

There is also a Relative scale that displays LU’s, or Loudness Units. A Relative LU scale corresponds to an Absolute LUFS/LKFS scale, where 0 LU would equal the specified Absolute target. In practice, -23 LUFS in EBU R128 is equal to 0 LU. For Podcast/Mobile -16.0 LUFS would also be equal to 0 LU. Note that the operator would need to set the proper Program Loudness target in the Meter’s Preferences in order to conform.

ab-rel

LU and dB Relationship

1 LU is equal to 1 dB. So for example you may have measured two programs: Program A checks in at -20 LUFS. Program B checks in at -15 LUFS. In this case program B is +5 LU louder than Program A.

Placement

Loudness Meter plugins mainly support online (Real Time) measurement of an audio signal. For an accurate measurement of Program Loudness of a clip or mixed segment the meter must be inserted in the DAW at the very end of a processing chain, preferably on the Master channel. If the inserts on the Master channel are post fader, any change in level using the Master Fader will result in a global gain offset to the entire mix. The meter would then (over time) display the altered Program Loudness.

If your DAW’s Master channel has pre fader inserts, the Loudness Meter should still be inserted on the Master Channel. However the operator would first need to route the mix through a Bus and use the Bus channel fader to apply global gain offset. The mix would then be routed to the Master channel where the Loudness Meter is inserted.

If your DAW totally lacks inserts on the Master channel, Buses would need to be used accordingly. Setup and routing would depend on whether the buses are pre or post fader.

Some Loudness Meter plugins are capable of performing offline measurements in certain DAW’s on selected regions and/or clips. In Pro Tools this would be an Audio Suite process. You can also accomplish this in Logic Pro X by initiating and completing an offline bounce through a Loudness Meter.

-paul.

Audition CC: Loudness Normalization Pt.2 …

In my previous article I discussed various aspects of the Match Volume Processor in Adobe Audition CC. I mentioned that the ITU Loudness processing option must be used with care due to the lack of support for a user defined True Peak Ceiling.

I also pointed to a video tutorial that I produced demonstrating a Loudness Normalization Processing Workflow recommended by Thomas Lund. It is the off-line variation of what I documented in this article.

Here’s how to implement the off-line processing version in Audition CC …

This is a snapshot of a stereo version of what may very well be the second most popular podcast in existence:

Amplitude Statistics in Audition:

Peak Amplitude: 0 dB
True Peak Amplitude: 0.18 dBTP
ITU Loudness: -15.04 LUFS

source-(480)

It appears the producer is Peak Normalizing to 0dBFS. In my opinion this is unacceptable. If I was handling post production for this program I would be much more comfortable with something like this at the source:

Amplitude Statistics in Audition:

Peak Amplitude: -0.81 dB
True Peak Amplitude: -0.81 dBTP
ITU Loudness: -15.88 LUFS

intermediate-(480)

We will be shooting for the Internet/Mobile/Podcast target of -16.0 LUFS Program Loudness with a suitable True Peak Ceiling.

The first step is to run Amplitude Statistics and determine the existing Program Loudness. In this case it’s -15.88 LUFS. Next we need to Loudness Normalize to -24.0 LUFS. We do this by simply calculating the difference (-8.1) and applying it as a Gain Offset to the source file.

The next step is to implement a static processing chain (True Peak Limiter and secondary Gain Offset) in the Audition Effects Rack. Since these processing instances are static, save the Effects Rack as a Preset for future use.

Set the Limiter’s True Peak Ceiling to -9.5dBTP. Set the secondary Gain Offset to +8dB. Note that the Limiter must be inserted before the secondary Gain Offset.

Process, and you are done.

In this snapshot the upper waveform is the Loudness Normalized source (-24.0 LUFS). The lower waveform in the Preview Editor is the processed audio after it was passed through the Effects Rack chain.

lund-method-(480)

In case you are wondering why the Limiter is before the secondary Gain instance – in a generic sense, if you start with -9.5 and add 8, the result will always be -1.5. This translates into the Limiter doing its job and never allowing the True Peaks in the audio to exceed -1.5dBTP. In essence this is the ultimate Ceiling. Of course it may be lower. It all depends on the state of the source file.

This last snapshot displays the processed audio that is fully compliant, followed by its Amplitude Statistics:

normalized-(480)

stats-audition

In Summary:

[– Determine Program Loudness of the source (Amplitude Statistics).

[– Loudness Normalize (Gain Offset) to -24.0 LUFS.

[– Run your saved Effects Rack chain that includes a True Peak Limiter (Ceiling set to -9.5dBTP) and a secondary +8dB Gain Offset.

Feel free to ping me with questions.

-paul.

Audition CC: Loudness Normalization …

*** UPDATE: Please note this post was written in 2014. The current version of Adobe Audition CC has been greatly enhanced, specifically in regard to the Match Loudness Module. It is now possible to define a True Peak Maximum, as well as Integrated/Program Loudness targets. It is also possible to customize Loudness Normalization Tolerance.

Adobe Audition CC has a handy Match Volume Processor with various options including Match To/ITU-R BS.1770-2 Loudness. The problem with this option is the Processor will not allow the operator to define a True Peak Ceiling. And so depending on various aspects of the input file, it’s possible the processed audio may not comply due to an unsuitable Peak Ceiling.

For example if you need to target -16.0 LUFS Program Loudness for internet/mobile distribution, the Match Volume Processor may need to increase gain in order to meet this target. Any time a gain increase is applied, you run the risk of pushing the Peak Ceiling to elevated levels.

The ITU Loudness processing option does supply a basic Limiting option. However – it’s sort of predefined. My tests revealed Peak Ceilings as high as -0.1dBFS. This will result in insufficient headroom for both True Peak compliance and preparation for MP3 encoding.

The Audition Match Volume Processor also features a Match To/True Peak Amplitude option with a user defined True Peak Ceiling (referred to as Peak Volume). This is essentially a True Peak Limiter that is independent of the ITU Loudness Processor. For Program Loudness and True Peak compliance, it may be necessary to run both processing stages sequentially.

processor

There are a few caveats …

[– If the Match Volume Processor (Match To/ITU-R BS.1770-2 Loudness) applies limiting that results in a Peak Ceiling close to full scale, any subsequent limiting (Match To/True Peak Amplitude) has the potential to reduce the existing Program Loudness.

[– If a Match Volume process (Match To/ITU-R BS.1770-2 Loudness) yields a compliant True Peak Ceiling right out of the box, there is no need to run any subsequent processing.

Conclusion

If you are going to use these processing options, my suggestion would be to make sure the measured Program Loudness of your input file is reasonably close to the Program Loudness that you are targeting. Also, make sure the input file has sufficient headroom, with existing True Peaks well below 0dBFS.

If you are finding it difficult to achieve acceptable results, I suggest you apply the concepts described in this video tutorial that I produced. I demonstrate a sort of manual “off-line” Loudness Normalization process. If you prefer to handle this in real time (on-line), refer to my article “Podcast Loudness Processing Workflow.”

-paul.

Podcast Loudness Processing Workflow …

Below is Elixir by Flux. This is an ITU-R BS.1770/EBU R128 compliant multichannel True Peak Limiter. It’s just one of the tools available that can be used in the workflow described below. In this post I also mention the ISL True Peak Limiter by Nugen Audio.

If you have any questions about these tools or Loudness Meters in general, ping me. In fact I think my next article will focus on the importance of learning how to use a Loudness Meter, so stay tuned …

elixir

In my previous post I made reference to an audio processing workflow recommended by Thomas Lund. The purpose of this workflow is to effectively process audio files targeting loudness specifications that are suitable for internet and mobile distribution. In other words – Podcasts.

My first exposure to this workflow was reading “Managing Audio Loudness Across Multiple Platforms” written by Mr. Lund and included in the January 2013 edition of Broadcast Engineering Magazine.

Mr. Lund states:

“Mobile and computer devices have a different gain structure and make use of different codecs than domestic AV devices such as television. Tests have been performed to determine the standard operating level on Apple devices.

Based on 1250 music tracks and 210 broadcast programs, the Apple normalization number comes out as -16.2 LKFS (Loudness, K-weighted, relative to Full Scale) on a BS.1770-3 scale.

It is, therefore, suggested that when distributing Podcast or Mobile TV, to use a target level no lower than -16 LKFS. The easiest and best-sounding way to accomplish this is to:

[– Normalize to target level (-24 LKFS)

[– Limit peaks to -9 dBTP (Units for measurement of true peak audio level, relative to full scale)

[– Apply a gain change of +8 dB

Following this procedure, the distinction between foreground and background isn’t blurred, even on low-headroom platforms.”

Here is my interpretation of the steps referenced in the described workflow:

Step 1 – Normalize to target level -24.0 LUFS. (Notice Mr. Lund refers to LKFS instead of LUFS. No worries. Both are the same. LKFS translates to Loudness Units K-Weighted relative to Full Scale).

So how do we accomplish this? Simple – the source file needs to be measured and the existing Program Loudness needs to be established. Once you have this descriptor, it’s simple math. You calculate the difference between the existing Program Loudness and -24.0. The result will give you the initial gain offset that you need to apply.

I’ll point to a few off-line measurement utilities at the end of this post. Of course you can also measure in real time (on-line). In this case you would need to measure the source in its entirety in order to arrive upon an accurate Program Loudness measurement.

Keep in mind since Program Loudness at the source will vary on a file to file basis, the necessary gain offset to normalize will always be different. In essence this particular step is variable. Conversely steps 2 and 3 in the workflow are static processes. They will never change. The Limiter Ceiling will always be -9.0 dBTP, and the final gain stage will always be +8 dB. The -16.0 LUFS target “math” will only work if the Program Loudness is -24.0 LUFS at the very beginning, from file to file.

Think about it – with the Limiter and final gain stage never changing, if you have two source files where File A checks in at -19.0 LUFS and File B checks in at -21.0 LUFS, the processed outputs will not be the same. On the other hand if you always begin with a measured Program Loudness of -24.0 LUFS, you will be good to go.

Examples:

[– If your source file checks in at -20.0 LUFS … with -24.0 as the target, the gain offset would be -4.0 dB.

gain

[– If your source file checks in at -15.6 LUFS … with -24.0 as the target, the gain offset would be -8.4 dB.

[– If your source file checks in at -26.0 LUFS … with -24.0 as the target, the gain offset would be +2.0 dB.

[– If your source file checks in at -27.3 LUFS … with -24.0 as the target, the gain offset would be +3.3 dB

In order to maintain accuracy, make sure you use the float values in the calculation. Also – it’s important to properly optimize the source file (see example below) before performing Step 1. I’m referring to dynamics processing, equalization, noise reduction, etc. These options are for the most part subjective. For example if you prefer less compression resulting in wider dynamics, that’s fine. Handle it accordingly.

Moving forward we’ve established how to calculate and apply the necessary gain offset to Loudness Normalize the source audio to -24.0 LUFS. On to the next step …

Step 2 – Pass the processed audio through a True Peak Limiter with its Peak Ceiling set to -9.0 dBTP. Typically I set the Channel or “Stereo” Link to 100%, limiting Look Ahead to 1.5ms and Release Time to 150ms.

Step 3 – Apply +8dB of gain.

You’re done.

You can set this up as an on-line process in a DAW, like this:

Lund-480

I’m using the gain adjustment feature in two instances of the Avid Time Adjuster plugin for the initial and final gain offsets. The source file on the track was first measured for Program Loudness. The necessary offset to meet the initial -24.0 LUFS target was -4 dB.

The audio then passes through the Nugen ISL True Peak Limiter with its Peak Ceiling set to -9.0 dBTP. Finally the audio is routed through the second instance of the Adjuster plugin adding +8 dB of gain. The Loudness meter displays the Program Loudness after 5 minutes of playback and will accurately display variations in Program Loudness throughout. Bouncing this session will output to the Normalized targets.

Note that you can also apply the initial gain offset, the limiting, and the final gain offset as independent off-line processes. The preliminary measurement of the audio file and gain offset are still required.
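
For those who want to script the off-line variation, here is a minimal Python sketch of the three steps. It assumes the soundfile and pyloudnorm packages for I/O and measurement, and the True Peak limiting stage is only a placeholder (a hard clip), so substitute a proper True Peak Limiter there. This is my own sketch, not Mr. Lund’s implementation.

import numpy as np
import soundfile as sf
import pyloudnorm as pyln

def lund_style_normalize(in_path, out_path):
    audio, rate = sf.read(in_path)

    # Step 1 (variable): measure, then offset to -24.0 LUFS
    measured = pyln.Meter(rate).integrated_loudness(audio)
    audio = audio * (10 ** ((-24.0 - measured) / 20.0))

    # Step 2 (static): True Peak ceiling of -9.0 dBTP.
    # Placeholder only - a hard clip is NOT a True Peak Limiter. Use a real one.
    ceiling = 10 ** (-9.0 / 20.0)
    audio = np.clip(audio, -ceiling, ceiling)

    # Step 3 (static): +8 dB of gain, landing near -16.0 LUFS with peaks <= -1 dBTP
    audio = audio * (10 ** (8.0 / 20.0))
    sf.write(out_path, audio, rate)

# Hypothetical usage:
# lund_style_normalize("intermediate.wav", "podcast_normalized.wav")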

Example Workflow

Review the file attributes:

measurements-480
source_480

The audio is fairly dynamic. So I apply an initial stage of compression:

Intermediate-480

Next I apply additional processing options that I feel are necessary to create a suitable intermediate. I reiterate these processing options are entirely subjective. Your desire may be to retain the Loudness Range and/or dynamic attributes present in the original file. If so you will need to process the audio accordingly.

Here is the intermediate:

processed-stats-480
Processed-480

The Program Loudness for this intermediate file is -20.2 LUFS. The initial gain offset required would be -3.8 dB before proceeding.

After applying the initial gain offset, pass the audio through the limiter, and then apply the final gain stage.

This is the resulting output:

normalized-specs-480
new-loudness-normalized

That’s about it. We’re at -16.0 LUFS with a suitable True Peak Max.

I’ve experimented with this workflow countless times and I’ve found the results to be perfectly acceptable. As I previously stated – preparation of your source or intermediate file prior to implementing this three step process is subjective and totally up to you. The key is your output will always be in spec.

Offline Measuring Tools

I can recommend the following tools to measure files “off-line.” I’m sure there are many other options:

[– The new Loudness Meters by TC Electronic support off-line measurements of selected audio clips in Pro Tools (Audio Suite).

[– Auphonic Leveler Batch Processor. I don’t want to discount the availability and effectiveness of the products and services offered by Auphonic. It’s a highly recommended web service and standalone application that includes high quality audio processing algorithms, including Loudness Normalization.

[– Using FFmpeg from the command line.

Example syntax:

ffmpeg -nostats -i yourSourceFile.wav -filter_complex ebur128=peak=true -f null -

[– Using r128x from the command line.

Example syntax:

r128x yourSourceFile.wav

Note there is a Mac only front end (GUI) version of r128x available as well.

-paul.

Fresh Air Podcast: Audio Analysis …

In my No Free Pass for Podcasts post I talked about why the Broadcast Loudness specs. are not necessarily suitable for Podcasts. I noted that the Program Loudness targets for EBU R128 and ATSC A/85 are simply too low for internet and mobile audio distribution. Add excessively dynamic audio to the mix and it will complicate matters further, especially when listeners use mobile devices to consume their media in less than ideal ambient spaces.

fa-processed

Earlier today I was discussing this issue with someone who is well versed in all aspects of audio production and loudness processing. He noted that ” … the consensus of it all is, that it is a bad idea to take a really nice standard that leaves plenty of headroom and then start creating new standards with different reference values.” The fix would be to “keep production and storage at -23.0 LUFS and then adjust levels in distribution.” Valid points indeed. However in the real world this mindset is unrealistic, especially in the internet/mobile/Podcasting space.

The fact of the matter is there is no way to avoid the necessity to revise the standards that simply do not work on a platform that consists of unique variables.

And so considering these variables, the implementation of thoughtful, revised, best practices that include platform specific targets for Program Loudness, Loudness Range, and True Peak are unavoidable. Independent Podcasters and network driven Podcasts using arbitrary production techniques and delivery methods simply need direction and guidance in order to comply. In the end it’s all about presenting well produced media to the listener.

Recently I came across a tweet where someone stated “I love the show but it is consistently too quiet to listen to on my phone.” They were referring to the NPR program Fresh Air. I’m not exactly sure if this person was referring to the radio broadcast stream or the distributed Podcast. Either way it’s an interesting assertion that I can directly relate to.

I subscribe to the Fresh Air Podcast. This will probably not surprise you – I refuse to listen to the Podcast right out of the box. When a new show pops up in Instacast, I download the file, decode to WAV, convert to stereo, and then reprocess the audio. I tweak the dynamic range and address show participant audio level variations using various plugins. I then bump things up to -16.0 LUFS (using what I like to refer to as “The Lund Method”) while supplying enough headroom to comply with -1.0 dBTP as my ultimate ceiling. I’ll get into the specifics in a future post.

According to the leading expert Mr. Thomas Lund:

“Mobile and computer devices have a different gain structure and make use of different codecs than domestic AV devices such as television. Tests have been performed to determine the standard operating level on Apple devices. Based on 1250 music tracks and 210 broadcast programs, the Apple normalization number comes out as -16.2LKFS (Loudness, K-weighted, relative to Full Scale) on a BS.1770-3 scale.

It is, therefore, suggested that when distributing podcast or Mobile TV, to use a target level no lower than -16LKFS. The easiest and best-sounding way to accomplish this is to: 1) Normalize to target level (-24LKFS); 2) Limit peaks to -9dBTP (Units for measurement of true peak audio level, relative to full scale); and 3) Apply a gain change of +8dB. Following this procedure, the distinction between foreground and background isn’t blurred, even on low-headroom platforms.”

In this snapshot I demonstrate the described workflow. I’m using two independent instances of the bx_control plugin to apply the gain offsets at various stages of the signal flow. After the initial calculated offset is applied, the audio is routed through the Elixir True Peak Limiter and then out through the second instance of bx_control applying +8dB of static gain. You can also replicate this workflow on an off-line basis. Note that I’ve slightly altered the limiting recommendation.

Lund-small

So why do I feel the need to do this?

Podcast Source

These are the specs. and the waveform overview of a recently published Fresh Air Podcast in its entirety:

raw-specs
fa-source-complete

Next is a 3 min. audio segment lifted from the published Podcast. The stats. display measurements of the attached 3 min. segment:

source_revised
source-1

Podcast Optimized for Internet/Mobile

Below is the same 3 min. segment. I reprocessed the audio to make it suitable for Podcast distribution. The stats. display measurements of the attached audio segment:

web-specs-2
source-2

The difference between the published source audio and the reprocessed version is quite obvious. The Loudness Normalized audio is so much more intelligible and easier to listen to. In my view the published audio is simply out of spec. and unsuitable for a Podcast.

Bear in mind the condition of the source audio is not uncommon. The problems that persist are not exclusive to podcasts distributed by NPR or by any of their affiliates. Networks with global reach need to recognize their Podcast distribution platforms as important mechanisms to expand their mass appeal.

It has been noted that the Public Radio community in general is exploring ways to enhance the way in which they produce their programs with focus on loudness standardization. My hope is this carries over to their Podcast platforms as well.

-paul.

For more information please refer to “Managing Audio Loudness Across Multiple Platforms” by Thomas Lund at TVTechnology.com.

No Free Pass for Podcasts …

I think it was in the mid to late 1980’s. I was still living home, totally fixated on what was happening with Television devices, programming and transmission. Mainly the advent of MTS Stereo compatible TV’s and VCR’s. I remember waiting patiently for weekly episodes of programs like Miami Vice and Crime Story to air. I would pipe the program audio through my media system in glorious MTS stereo. For me this was a game changer.

vice

I also remember it was around the same time that Cable TV became available in the area. I convinced my Mom and Dad to allow me to order it. Initially it was installed on the living room TV, and eventually made its way on to additional TV’s throughout our home. For the most part it was a huge improvement in terms of reception and of course program diversity.

However there was one issue that struck me from the very beginning: the wide variations in loudness between network TV Shows, Movies, and Adverts. In fact it was common for targeted, poorly produced, and exceedingly loud local commercials to air repeatedly throughout broadcast transmissions. Reaching for the remote to apply volume attenuation was a common occurrence and a major annoyance.

Obviously this was not isolated. The issue was widespread and resulted in a public outcry to correct these inconsistencies. In 2010 the CALM Act was signed into law. The United States and Europe (and many other regions) adopted and now regulate loudness standardization guidelines for the benefit of the public at large.

If there is anyone out there who cannot relate to this “former” problem, I for one would be very surprised.

Well guess what? We now have the same exact problem existing on the most ubiquitous media distribution platform in existence – the internet.

I realize any expectation of widespread audio loudness standardization on the internet would be unreasonable. There’s just too much stuff out there. And those who create and distribute the media possess a wide scope of skills. However there is one sort of passionate and now ubiquitous subculture that may be ripe for some level of standardization. Of course I’m referring to the thousands upon thousands of independently produced Podcasts available to the masses.

In the past I’ve made similar public references to the following exercise. Just in case you missed it, please try this – at your own risk!

Put on your headphones and queue up this episode of The Audacity to Podcast. Set your playback volume at a comfortable level, sit back, and enjoy. After a few minutes, and without changing your playback volume setting – queue up this episode of the Entrepreneur on Fire podcast.

waves-1

Need I say more?

From what I gather both programs are quite popular and highly regarded. I have no intention of suggesting that either producer is doing anything wrong. The way in which they process their audio is their artistic right. On the other hand in my view there is one responsibility they both share. That would be the obligation to deliver well produced content to their subscribers, especially if the Podcast generates a community driven revenue stream. It’s the one thing they will always have in common. And so I ask … wouldn’t it make sense to distribute media following audio processing best practices resulting in some level of consistency within this passionate subculture?

I suspect that some Podcast producers purposely implement extreme Program Loudness levels in an attempt to establish “supremacy on the dial.” This issue also exists in radio broadcast and music production, although things have improved ever since Loudness War participants were called to task with the inception of mandatory compliance guidelines.

I’ve also noticed that many prolific Podcast Producers (including major networks) are publishing content with a total lack of Program Loudness consistency within their own catalogs from show to show. Even more troubling, Podcast aggregation networks rarely specify standardization guidelines for content creators.

It’s important to note that many people who consume audio delivered on the internet do so in less than ideal ambient spaces (automobiles, subways, airplanes etc.) using low-fi gear (ear buds, headphones, mobile devices, and compromised desktop near fields). Simply adopting the broadcast standards wouldn’t work. The existing Program Loudness targets are simply unsuitable, especially if the media is highly dynamic. The space needs revised specs. in order to optimize the listening experience.

Loudness consistency from a Podcast listener’s perspective is solely in the hands of the producers who create the content. In fact it is possible producers may even share common subscribers. Like I said – the space is ripe for standardization.

Currently loudness compliance recommendations are sparse within this massive community driven network. In my view it’s time to raise awareness. A target specification would universally improve the listening experience and ultimately legitimize the viability of the platform.

For the record, I advocate:

File Format: Stereo, 128kbps minimum.
Program Loudness: -16.0 LUFS with acceptance of a reasonable deviation.
Loudness Range: 8 LU, or less.
True Peak Ceiling: -1.0 dBTP in the distribution file. Of course this may be lower.

Quick note: when I refer to Podcasts, from a general perspective I am referring to audio programs and videos/screencasts/tutorials that primarily consist of spoken word soundtracks. Music based Podcasts or cinema styled videos with high impact driven soundtracks may not necessarily translate well when the Loudness Range (and Dynamic Range) is constricted.

For further technical insight, please refer to “Audio for Mobile TV, iPad, and iPod” – Thomas Lund, TC Electronic.

-paul.