Intelligibility Optimization

The attached image displays a processing workflow designed to optimize Spoken Word intelligibility. The workflow also demonstrates a realtime example of Integrated Loudness/Maximum True Peak compliance targeting.

There are 7 reference point Sections worth noting:

Section A includes the Adobe Audition Effects Rack Signal Level Meters indicating the source (Input) level and the (Output) level. The Output level reflects the results of the workflow’s inserted plugins. The chain includes a Compressor, a Limiter, and a Loudness Meter. Note the level meters indicate signal level. They do not indicate or represent perceptual Loudness.

Section B displays the gain reduction applied by the Compressor at the current position of the playhead. For the test/source audio I determined that an average of 6 dB of gain reduction would yield acceptable results. The purpose of this stage is to reduce the dynamic range and/or dynamic structure of the Spoken Word, resulting in optimized intelligibility, and to prevent excessive downstream limiting. This is an important workflow element when preparing Spoken Word audio for Internet/Mobile and Podcast distribution.

Section C includes my subjective limiting parameters. The Limiter will add the required amount of gain to achieve a -16.0 LUFS deliverable while adhering to a -1.5 dBTP (True Peak Max). If the client, platform, or workflow requires an alternative Loudness target and/or Maximum True Peak ceiling – the parameters and their mathematical relationship may be altered for customized targeting. Please note the Maximum True Peak referenced in any spec. is more of a ceiling as opposed to a target. In essence the measured signal level may be lower than the specified maximum.

Section D indicates the amount of limiting that is occurring at the current position of the playhead.

Section E displays the user defined Integrated Loudness target located above the circular Momentary Loudness LED (12 o’clock position). The defined Integrated Loudness target is also visually represented by the Radar’s second concentric circle. The Radar display indicates the Short Term Loudness measured over time within a 3 sec. window. The consistency of the Short Term Loudness is evident indicating optimized intelligibility.

Section F displays the unprocessed source audio that lacks optimization for Internet/Mobile and Podcast distribution. Any attempt to consume the audio in its current state in a less than ideal listening environment will result in compromised intelligibility. Mobile device consumption in like environments will exacerbate compromised intelligibility.

Section G displays the processed/optimized audio suitable for the noted distribution platform. The Integrated Loudness, True Peak, and LRA descriptors now satisfy compliance targets. Notice there is no indication of excessive limiting.



Real Time Print To Track

Logic and Audition users will be familiar with the term Bounce to Track. This process allows the user to perform an Off-line Mixdown of a selected group of Session Tracks without physically exporting. In most cases the Mixdown appears on a supplemental target Track.

Bouncing Off-line is a time saver. However, it can be precarious. It would be irresponsible to submit a finished piece of audio to a client without 100% confirmation that the bounced delivery file (most likely slated for distribution) is glitch free. In essence it is imperative to thoroughly check your piece prior to submission.

Off-line Bounce (aka Bounce to Disk) was once notoriously absent from Pro Tools. Avid finally implemented support a few years ago.

In professional Post Production, engineers may perform a real time (On-line) Bounce of a mix Session. The process is commonly referred to as Printing. It requires the operator to sit through the Session in its entirety.

Besides glitch detection capabilities, it is possible to edit clips before the playhead reaches their location. As well, you can edit clips and/or sub-segments within a previously completed Print and only re-Print the manipulated segment.

So how is this done? Simple – if the DAW or Interface supports it.

For instance in Pro Tools the user can assign Bus outputs to the input of a standard Audio Track. The key is you can ARM a standard Audio Track to record any signal that is passing through it. This would be the Print Track.

Adobe Audition CC does not support direct Bus Output —>> Audio Track assignments. However, it is still possible to implement a Print workflow (see attached image). You will need a supported Audio Interface with a Mix Return. Simply assign all Session Tracks and Buses to the Main Output. Then add a supplemental Audio Track. Set its input to Mix Return. ARM the Track to record and fire away.



Loudness Meter Scale Variations

I thought I’d revisit various aspects of Loudness Meter Absolute/Relative Scale correlation, and provide a visual representation of a real time processing Session with both Scales active.

Descriptors and Scales

Modern Loudness Meters display various descriptors including Program Loudness – also referred to as Integrated Loudness. There are two scales that can be used to display measured Program or Integrated Loudness over time …

The most common is an Absolute Scale, displayed in LUFS or LKFS. LUFS refers to Loudness Units relative to Full Scale. LKFS refers to Loudness Units K-Weighted relative to Full Scale. The two terms are interchangeable: a measurement expressed in LUFS is identical to the same measurement expressed in LKFS.

It is also possible to measure and display Integrated/Program Loudness as Loudness Units (or LU’s) on a Relative Scale where 1LU == 1 dB.

When shifting to a Relative Scale, the 0 LU increment is always equivalent to the Meter’s user defined or spec. defined Absolute Loudness target.

For example, in an R128 -23.0 LUFS Absolute Scale workflow, setting the Meter to display a Relative Scale changes the target to 0 LU.

So – if a piece of measured audio checks in at -23.0 LUFS on an Absolute Scale, it would be perceptually equal to measured audio checking in at 0 LU on a Relative Scale.

Likewise if the Meter’s Absolute Scale target is set to -16.0 LUFS, it will correlate to 0 LU on a Relative Scale. Again both would reflect perceptual equivalence.
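The Absolute/Relative correlation described above comes down to a single subtraction. Here is a minimal Python sketch of it. The function names are my own, used purely for illustration, and do not come from any particular Loudness Meter.

```python
def lufs_to_lu(measured_lufs: float, target_lufs: float) -> float:
    """Relative Scale reading (LU) for an Absolute Scale measurement (LUFS).

    0 LU always corresponds to the Meter's defined target.
    """
    return measured_lufs - target_lufs


def lu_to_lufs(relative_lu: float, target_lufs: float) -> float:
    """Absolute Scale reading (LUFS) for a Relative Scale measurement (LU)."""
    return relative_lu + target_lufs


# R128 workflow: the -23.0 LUFS target reads as 0 LU.
print(lufs_to_lu(-23.0, -23.0))  # 0.0
# Podcast workflow: the -16.0 LUFS target also reads as 0 LU.
print(lufs_to_lu(-16.0, -16.0))  # 0.0
# A program measuring -21.0 LUFS against a -23.0 LUFS target is +2 LU.
print(lufs_to_lu(-21.0, -23.0))  # 2.0
```

Because 1 LU == 1 dB, the same arithmetic describes the gain offset needed to bring a program back to 0 LU.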

All broadcast delivery specifications suggest Absolute Scale Integrated Loudness targets. However, for any number of subjective reasons – many operators prefer to use the alternative Relative Scale and “mix or master to 0 LU.”

Please note Loudness Units are also the proper way in which to describe Loudness differentials between two programs. For instance, “Program (A) is +2 LU louder than Program (B).” One might also describe gain offsets in LU’s as opposed to dB’s.

LU Meter

Hornet Plugins recently released Hornet LU Meter. This tool is a Loudness Meter plugin designed to measure and display Loudness within a 400 ms time window. This measurement represents the Momentary Loudness descriptor, not Integrated/Program Loudness.

The Meter is indeed nifty and affordable. However, there is one caveat worth noting: As the name suggests, it is an LU Meter. In essence its Momentary Loudness measurements are solely displayed on a Relative Scale.


The displayed Session (image) consists of a single mono VO clip. The objective is to print a processed stereo version in RT checking in at -16.0 LUFS with a maximum True Peak no higher than -2.0 dBTP.

The output of the mono VO track is routed to a mono Auxiliary Input track titled Normalize. If you are not familiar with Pro Tools, an Auxiliary Input track is not the same as an Auxiliary Send. Auxiliary Input tracks allow the user to pass signal using buses, insert plugins, and adjust level. They are commonly used to create sub-mixes.

I’ve inserted a Compressor and a Limiter on the Normalize Auxiliary Input track. The processed audio is passing through at -19.0 LUFS (mono).

The audio is then routed to a second (now stereo) Auxiliary Input track titled Offset. I use the track fader to apply a +3 dB gain offset. This will reconstitute the loss of gain that occurs on center panned mono tracks. The attenuation is a direct result of the Pro Tools Pan Depth setting.

The signal flow/output is now passing -16.0 LUFS audio. It is routed to a standard audio track titled Print. When this track is armed to record, it is possible to initiate a realtime bounce of the processed/routed audio.

The Meters

Notice the instances of the Hornet LU Meter and TC Electronic Loudness Radar. Both Meters are inserted on the Master Bus and are measuring the session’s Master Output.

I set the Reference (target) on the Hornet LU Meter to -16.0 LUFS. In essence 0 LU on its Relative Scale represents -16.0 LUFS.

Conversely the TC Electronic Meter is configured to display Absolute Scale measurements. The circular LED that borders the Radar area indicates Momentary Loudness. The defined Integrated Loudness target is displayed under the arrow at the 12 o’clock position.

Remember the Hornet LU Meter solely displays Momentary Loudness. If you compare its current reading to the indication of Momentary Loudness on the TC Electronic Meter, the relationship between Relative Scale and Absolute Scale measurement is clearly indicated. Basically the Hornet Meter registers just below 0 LU. The TC Electronic Meter registers just below -16.0 LUFS.

I will say if you are comfortable monitoring real time Momentary Loudness and understand Relative/Absolute Scale correlation, the Hornet tool is quite useful. In fact it contains additional features such as Grouping, auto/manual Gain Compensation, and auto-Maximum Peak protection.

Additional insight on the K-weighting Curve or K-weighted filtering:

K-weighting suggests de-emphasized low frequencies by way of a high-pass filter. A high-shelving filter is applied to the upper frequency range, and the measured data is averaged.

TC Electronic describes applied K-weighting on audio channels as a “method to build a bridge between subjective impression and objective measurement.”



Elixir ITU True Peak Limiter

Certain ISP/True Peak Limiters provide added compliance processing flexibility. Case in point: Elixir by Flux.


Before processing or Loudness Normalizing, execute an offline measurement on an optimized source clip.

An optimized audio clip may exhibit the benefits of various stages of enhancement processing such as noise reduction and dynamic range compression.

The displayed clip (see attached image) checks in at -19.6 LUFS. It requires +3.6 dB of gain to meet a -16.0 LUFS Integrated Loudness target. Based on the pre-existing peak ceiling, approximately 1.5 dB of limiting will be necessary to establish a -2.0 dBTP True Peak maximum.

Processing Example

We use the Limiter’s Input Gain setting to take the clip down to -24.0 LUFS (-4.4 dB for the measured displayed clip).

The initial -24.0 LUFS target will restore headroom and establish a consistent starting point for downstream limiting accuracy. This will allow the Threshold and Output Gain settings to be recognized and implemented as static parameters for all -16.0 LUFS/-2.0 dBTP (stereo) processing. The Input Gain setting however will be variable based on the measured attributes of the optimized source.

Set the Threshold to -10 dB(TP) and the Output Gain to +8 dB. The processing may be implemented offline or in real time. The output audio will reflect accurate targets (-16.0 LUFS/-2.0 dBTP) and the applied limiting will be transparent.
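The arithmetic behind these settings can be sketched in a few lines of Python. This is a hedged illustration, not a model of Elixir itself. It assumes the Output Gain is applied after the Threshold (so the final ceiling is Threshold plus Output Gain) and that transparent limiting leaves the Integrated Loudness essentially unchanged. The function and key names are my own.

```python
def limiter_settings(measured_lufs: float,
                     target_lufs: float = -16.0,
                     ceiling_dbtp: float = -2.0,
                     staging_lufs: float = -24.0) -> dict:
    """Derive Input Gain, Threshold and Output Gain for the described workflow.

    Assumption: ceiling = threshold + output gain, and limiting does not
    meaningfully alter Integrated Loudness.
    """
    input_gain = staging_lufs - measured_lufs   # variable, depends on the source
    output_gain = target_lufs - staging_lufs    # static for a given target
    threshold = ceiling_dbtp - output_gain      # static for a given ceiling
    return {
        "input_gain_db": round(input_gain, 1),
        "threshold_db": round(threshold, 1),
        "output_gain_db": round(output_gain, 1),
    }


# The clip measured above checks in at -19.6 LUFS.
print(limiter_settings(-19.6))
# {'input_gain_db': -4.4, 'threshold_db': -10.0, 'output_gain_db': 8.0}
```

Only the Input Gain changes from clip to clip. The Threshold and Output Gain stay fixed unless the targets themselves are altered.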


The proprietary functional parameters included on the Elixir Limiter are not necessarily included on Limiters designed by competing developers. In essence the described workflow may need to be customized based on the attributes of the Limiter.

The key is the “math” and static parameters never change, unless of course you decide to alter the referenced targets.

Let me know if you have questions …



Programmatic Ads and Loudness Standardization

This is a re-post of an article that I published in October, 2015 …

In a recent Midroll article titled “Why Programmatic Ads Aren’t Necessarily Great for Podcasting,” the staff writer states:

“A number of players in the Podcasting and advertising industries are making bets on programmatic Ad delivery — dynamically inserting Ads into a Podcast as the episode is downloaded. It’s an understandable temptation, but we at Midroll see some tradeoffs.”

I wonder how networks will handle potential perceived Loudness inconsistencies between produced Ads and new or preexisting programs?


I’ve mentioned my past affiliation with IT Conversations and The Conversations Network, where I was the lead post audio engineer from 2005-2012. Executive Director Doug Kaye built a proprietary content management system and infrastructure that included an automated component based Show Assembly System. Audio components were essentially audio clips (Intros, Outros, Ads, Credits, etc.) combined server side into Podcasts in preparation for distribution.

One key element in this implementation was the establishment of perceived Loudness consistency across all submitted audio components. This was accomplished by standardizing an average Loudness Target using a proprietary software RMS Normalizer to process all server side audio components prior to assembly. (Loudness Normalization is now the recommended process for Integrated Loudness targeting and consistency).

Due to this consistency, all distributed Podcasts were perceptually equal with regard to Integrated or Program Loudness upon playback. This was for the benefit of the listener, removing the potential need to make constant playback volume adjustments within a single program and throughout all programs distributed on the network.

Regarding Programmatic Ad insertion, I have yet to come across a Podcast Network that clearly states a set Integrated Loudness Target for submitted programs. (A Maximum True Peak requirement is equally important. However this descriptor has no effect on perceptual Loudness consistency).

Due to the absence of any suggested internal network guidelines or any form of standardized Loudness Normalization, dynamic Ad insertion has the potential to ruin the perceptual consistency within single programs and throughout the contents of an entire network.

Many conscientious independent producers have embraced the credible -16.0 LUFS Integrated Loudness Target for stereo Internet/Mobile/Podcast audio distribution (the perceptual equivalent for mono distribution is -19.0 LUFS). It’s far from a requirement, and nothing more than a suggested guideline.

My hope is Podcast Networks will begin to recognize the advantages of standardization and consider the adoption of the -16.0 LUFS Integrated Loudness Target. Dynamically inserted Ads must be perceptually equal to the parent program. Without a standardized and pre-disclosed Integrated Loudness Target, it will be near impossible to establish any level of distribution consistency.



Adobe Audition CC Productivity

Below I’ve listed a few Adobe Audition CC (ver.2015.2.1) features/options that may be obscure and perhaps underutilized.



1- Maximize Active Frame (⌘↓). This command toggles full screen display accessibility of the active (blue outlined) UI Panel.

2- Lock In Time (Multitrack). When activated, selected clips are pinned to their current location. I mapped ⌥⌘L for this function.

3- Group (⌘G) (Multitrack). Multiple clips will be congregated and may be repositioned cumulatively.

4- Suspend Groups (⏎⌘G) (Multitrack). This function temporarily deactivates the Group. Actually, this command toggles the behavior between deactivate and activate. There are also options to Remove Focus Clip from Group and Ungroup Selected Clips. They both support custom shortcut mapping.

5- Right-click on any Clip’s Fade Handle (Multitrack) to display the following customization menu:

– No Fade
– Fade In/Out
– Crossfade
– Symmetrical
– Asymmetrical
– Linear
– Cosine
– Automatic Crossfade Enabled

6- Bounce to New Track (Multitrack). This feature will process and combine multiple clips located on a single track or multiple tracks. This will free up system resources. The following options support custom shortcut mapping:

– Selected Track
– Time Selection
– Selected Clips In Time Selection
– Selected Clips Only

7- Convert To Unique Copy (Multitrack). This function creates a sub clip derived from the original trimmed source clip. Media Handles are no longer accessible in the converted copy (Multitrack and/or Waveform Editor environments). I mapped ⌥⌘C for this function.


1- Time Selection in all Tracks (Multitrack). This is a Ripple Delete variation (⏎⌘⌦) that will retain clip relevant Marker position(s).

2- Split All Clips Under Playhead (Multitrack). I mapped ⌥⌘R for this function.

3- Merge Clips (remove thru edits) (Multitrack). I mapped ⌥⌘J for this function.

Mixer/Track Inserts and Sends

1- Individual Track supplied buttons will designate Sends and Inserts as Pre or Post Fader.


1- Markers implemented in the Waveform Editor may be Merged thus allowing easy selection of encapsulated audio.

2- Selected Range Markers present in the Waveform Editor may be exported as individual clips.

3- Selected Range Markers present in the Waveform Editor may be added to a Playlist where they may be reordered for auditioning.


1- The (Multitrack) Session Export Dialog includes user defined Mixdown options:

– Master: Stereo, Mono, or 5.1
– Signal present on individual Tracks
– Signal present on individual Busses

2- Export with Adobe Media Encoder (Multitrack). This Export option runs Media Encoder and requires the user to select a predefined Media Encoder preset. Routing options are available as well.



CNN and Program Loudness Tolerance

I recently analyzed a few of the internal Podcasts produced by CNN. One particular installment is yet another example of a major media outlet distributing audio that is in my view unsuitable for this particular platform.

Let’s discuss file attributes and measured specs. for one of CNN’s distributed Podcasts:

The distributed audio is mono, 64kbps, with music elements. I’ve stated how I feel about this. I’m not a proponent of 64 kbps MP3 audio PERIOD (mono or stereo). In general audio in this format sounds horrible. Feel free to disagree.

Secondly, the Integrated (Program) Loudness for this particular program is just about -23.0 LUFS with a Maximum True Peak of +0.40 dBTP. From my perspective the perceptual Loudness misses the mark. And, the audio is clipped.

Lastly, the produced audio is way too dynamic for spoken word. The perceptual inconsistency of the delivery by the participants is inadequate when considering how (for the most part) this program will be consumed (mobile devices, problematic ambient spaces, etc.).

I decided to sort of showcase this particular program because it is a good candidate for flexible Target considerations. What do I mean by “flexible Target considerations?” Let me explain …

Again, the distributed file is mono. The recommended Integrated Loudness Target for mono Podcasts is -19.0 LUFS. This is the perceptual equivalent of -16.0 LUFS stereo. If I were to apply a +4 dB gain offset to Loudness Normalize this audio to -19.0 LUFS, there would be very little change in the original dynamic structure of the audio. However, without some form of aggressive limiting, the maximum amplitude or Peak Ceiling would be driven into oblivion. In fact audible distortion may occur with or without limiting. This is obviously not recommended.
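To put numbers on the problem, here is a small Python sketch that projects where the True Peak would land after a straight gain offset. The function name is hypothetical. The -1.0 dBTP ceiling is an assumption chosen only for illustration, since no ceiling is specified for this example.

```python
def normalization_plan(measured_lufs: float,
                       measured_peak_dbtp: float,
                       target_lufs: float,
                       ceiling_dbtp: float) -> dict:
    """Project the effect of a static gain offset on the True Peak.

    'overshoot_db' is how far the projected peak exceeds the ceiling,
    i.e. roughly how much limiting would be needed on the loudest peaks.
    """
    gain = target_lufs - measured_lufs
    projected_peak = measured_peak_dbtp + gain
    overshoot = max(0.0, projected_peak - ceiling_dbtp)
    return {
        "gain_db": round(gain, 2),
        "projected_peak_dbtp": round(projected_peak, 2),
        "overshoot_db": round(overshoot, 2),
    }


# Distributed file: about -23.0 LUFS with a +0.40 dBTP Maximum True Peak.
# Ceiling of -1.0 dBTP is an assumed value, not taken from the measurements.
print(normalization_plan(-23.0, 0.40, -19.0, -1.0))
# {'gain_db': 4.0, 'projected_peak_dbtp': 4.4, 'overshoot_db': 5.4}
```

A projected peak more than 4 dB above full scale is the "driven into oblivion" scenario: either the file clips badly or the limiter has to work very hard.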

There are two options to consider: 1) apply Dynamic Range Compression before Loudness Normalization, or 2) shoot for a lower Integrated Loudness target. For this particular example I chose to implement both options.

First, in my view optimizing the dynamics in this program for Podcast distribution is unavoidable. It’s just way too choppy and it lacks delivery consistency for spoken word. Also, by lowering the L.Normalized Target, the necessary added gain offset will be reduced resulting in less aggressive limiting. In addition, the reduced amount of added gain will curtail noise floor elevation and other variables such as exaggerated breaths.

As noted the distributed Podcast (displayed in the attached upper waveform example) checks in at -23.0 LUFS and it is clipped. My optimized version (displayed in the lower waveform example) checks in at -20.2 LUFS with a Maximum True Peak of -1.23 dBTP. It is well within a reasonable level of Program Loudness tolerance for Podcast L.Normalization. In fact the perceptual difference between the processed -20.0 LUFS audio and a -19.0 LUFS version would be pretty much undetectable. In essence the audio has been optimized and it exhibits improved intelligibility. It is now well suited for Podcast distribution.


(If you are interested in the tools that I use, they are listed under Available Services).

It is no secret that I am a staunch proponent of the -16.0 LUFS/-19.0 LUFS recommendations for Podcasts. However, in certain situations – tolerance for slightly reduced Program Loudness Targets is acceptable.

For the record – my remaster is much easier to listen to. CNN can do better.



Loudness Measurements and Silence

Consider this: Two extended segments of audio, Loudness Normalized (or mixed in real time) to the same Integrated Loudness Target.

Segment (A) is fairly consistent, with a very limited amount of intermittent silence gaps.

Segment (B) is far less consistent, due to a multitude of intermittent silence gaps.

When passing both segments through a Loudness Meter (or measuring the segments offline), and recognizing Integrated Loudness is a reflection of the average perceptual Loudness of an entire segment – how will inherent silence affect the accuracy of the cumulative measurements?

In theory the silence gaps in Segment (B) should affect the overall measurement by returning a lower representation of average Integrated Loudness. If additional gain is added to compensate, Segment (B) would be perceptually louder than Segment (A).

Basically without some sort of active measurement threshold, the algorithms would factor in silence gaps and return an inaccurate representation of Integrated Loudness.

The Fix

In order to establish perceptual accuracy silence gaps must be removed from active measurements. Loudness Meters and their algorithms are designed to ignore silence gaps. The omission of silence is based on the relationship between the average signal level and a predefined threshold.

Loudness Meter (G10) Gate

The specification Gate (G10) is an aspect of the ITU Loudness Measurement algorithms included in compliant Loudness Meters. Its function is to temporarily pause Loudness measurements when the signal drops below a relative threshold, thus allowing only prominent foreground sound to be measured.

The relative threshold is 10 LU below the ungated Loudness level. Momentary and Short Term measurements are not gated. There is also a -70 LUFS Absolute Gate that will force metering to ignore extreme low level noise.
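The two gates can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the input is a list of pre-computed block Loudness values in LUFS (a compliant Meter derives these from K-weighted, overlapping 400 ms blocks, which is not shown here), and the function names are my own.

```python
import math

ABSOLUTE_GATE_LUFS = -70.0   # blocks at or below this level are always ignored
RELATIVE_GATE_LU = -10.0     # offset applied to the ungated level


def mean_loudness(block_lufs):
    """Average block Loudness values in the power domain, not in dB."""
    powers = [10 ** (value / 10) for value in block_lufs]
    return 10 * math.log10(sum(powers) / len(powers))


def gated_integrated_loudness(block_lufs):
    """Integrated Loudness with the Absolute Gate and relative (G10) Gate."""
    above_absolute = [v for v in block_lufs if v > ABSOLUTE_GATE_LUFS]
    if not above_absolute:
        return float("-inf")  # nothing but silence or extreme low level noise
    relative_threshold = mean_loudness(above_absolute) + RELATIVE_GATE_LU
    kept = [v for v in above_absolute if v > relative_threshold]
    return mean_loudness(kept)


# Hypothetical Segment (B): half speech at -20 LUFS, half near-silent gaps.
segment_b = [-20.0] * 10 + [-60.0] * 10
print(round(mean_loudness(segment_b), 1))              # -23.0 (gaps included)
print(round(gated_integrated_loudness(segment_b), 1))  # -20.0 (gaps ignored)
```

Without the Gate, the gaps pull the reading down by about 3 LU, and any gain added to compensate would make the speech perceptually louder than intended.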

Most Loudness Meters reveal a visual indication of active gating (see attached image) and confirm the accuracy of displayed measurements.


Additional Gate Generalizations and Nomenclature

Common Noise Gate

The attenuation applied by a Downward Expander is dependent on signal level when the signal drops below a user defined threshold. The Ratio dictates the amount of attenuation. Alternatively, a Noise Gate functions independent of signal level. When the level drops below the defined threshold, hard muting is applied.
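The difference is easiest to see in the static gain curves. Below is a minimal Python sketch of both. It ignores attack, release, hold and range controls that real processors provide. The function names and the example settings are hypothetical.

```python
def downward_expander_gain_db(level_db: float, threshold_db: float,
                              ratio: float) -> float:
    """Gain change (dB) applied by an idealized Downward Expander.

    Below the threshold, every 1 dB of input drop becomes `ratio` dB of
    output drop, so the attenuation grows as the level falls.
    """
    if level_db >= threshold_db:
        return 0.0
    return (level_db - threshold_db) * (ratio - 1.0)


def noise_gate_gain_db(level_db: float, threshold_db: float) -> float:
    """Gain change (dB) applied by an idealized hard Noise Gate."""
    return 0.0 if level_db >= threshold_db else float("-inf")


# Assumed settings: -40 dB threshold, 2:1 expansion ratio.
print(downward_expander_gain_db(-45.0, -40.0, 2.0))  # -5.0
print(downward_expander_gain_db(-50.0, -40.0, 2.0))  # -10.0
print(noise_gate_gain_db(-45.0, -40.0))              # -inf (muted)
```

The Expander's attenuation scales with how far the signal has dropped, while the Gate applies the same hard mute regardless of level.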

Silence Gate

This is a somewhat proprietary term. It is a parameter setting available on the Aphex 320A and 320D Compellor hardware Leveler/Compressor.


When a passing signal level drops below the user defined Silence Gate threshold for 1 second or longer, the device’s VCA (Voltage Controlled Amplifier) gain is frozen. The Silence Gate will prevent the Leveling and Compression processing from releasing and inadvertently increasing the audibility of background noise.



Understanding Pan Mode Options

Adobe Audition and Logic Pro X include Pan Mode preference options that determine track output gain for center panned mono clips included in stereo sessions. These options are often the source of confusion when working with a combination of mono and stereo clips, especially when clips are pre-Loudness Normalized prior to importing.

In Audition, the Left/Right Cut (Logarithmic) option retains center panned mono clip gain. The -3.0 dB Center option, which by the way is customizable – will attenuate center panned mono clip gain by the specified dB value.

For example if you were targeting -16.0 LUFS in a stereo session using a combination of pre-Loudness Normalized clips, and all channel faders were set to unity – the imported mono clips need to be -19.0 LUFS (Integrated). The stereo clips need to be -16.0 LUFS (Integrated). The Left/Right Cut Pan Mode option will not alter the gain of the center panned mono clips. This would result in a -16.0 LUFS stereo mixdown.

Conversely the -3.0 dB Center Pan Mode option will apply a -3 dB gain offset (it will subtract 3 dB of gain) to center panned mono clips resulting in a -19.0 LUFS stereo mixdown. In most cases this -3 LU discrepancy is not the desired target for a stereo mixdown. Note 1 LU == 1 dB.
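The two outcomes can be checked with a short Python sketch. It rests on an assumption worth stating: a center panned mono clip feeds both stereo channels equally, and the Loudness measurement sums the power of the two channels, which adds roughly 3 dB relative to the mono clip. The function name is my own.

```python
import math


def stereo_mixdown_lufs(mono_clip_lufs: float, pan_offset_db: float = 0.0) -> float:
    """Estimated Integrated Loudness of a stereo mixdown of one centered mono clip.

    pan_offset_db is the gain the Pan Mode applies at center:
    0.0 for Left/Right Cut, -3.0 for the -3.0 dB Center option.
    """
    channel_sum_db = 10 * math.log10(2)  # two identical channels, about +3.01 dB
    return mono_clip_lufs + channel_sum_db + pan_offset_db


# Left/Right Cut (Logarithmic): center gain retained.
print(round(stereo_mixdown_lufs(-19.0, 0.0), 1))   # -16.0
# -3.0 dB Center: center panned mono clip attenuated by 3 dB.
print(round(stereo_mixdown_lufs(-19.0, -3.0), 1))  # -19.0
```

This is also why -19.0 LUFS mono and -16.0 LUFS stereo are treated as perceptual equivalents.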

As stated, Logic Pro X provides a similar level of Pan Mode flexibility. I’ve also tested Reaper, and its options are equally flexible.

Pro Tools

Pro Tools Pan Mode support (they call it Pan Depth) is somewhat restricted. The preference is limited to Center Pan Mode, with selectable dB compensation options (-2.5 dB, -3.0 dB, -4.5 dB, and -6.0 dB).

There are several ways to reconstitute the loss of gain that occurs in Pro Tools when working with center panned mono clips in stereo sessions. One option would be to duplicate a mono clip and place each instance of it on hard-panned discrete mono tracks (L+R respectively). Routing the mono tracks to a stereo output will reconstitute the loss of gain.

A second and much more efficient method is to route all individual instances of mono session clips to a stereo Auxiliary Input, and use it to apply the necessary compensating gain offset before the signal reaches the stereo Master Output. The gain offset can be applied using the Aux Input channel fader or by using an inserted gain trim plugin. Stereo clips included in the session can bypass this Aux and should be directly routed to the stereo Master Output. In essence stereo clips do not require compensation.

Example Session

Have a look at the attached Pro Tools session snapshot. In order to clearly display the signal path relative to its gain, I purposely implemented Pre-Fader Metering.


Notice how the mono spoken word clip included on track 1 is routed (by way of stereo Bus 1-2) to a stereo Auxiliary Input track (named to Stereo). Also notice how the stereo signal level displayed by the meters on the Stereo Auxiliary Input track is lower than the mono source that is feeding it. The level variation is clear due to Pre-Fader Metering. It is the direct result of the session’s Pan Depth setting that is subtracting 3 dB of gain on this center panned mono track.

Next, notice how the signal level on the Master Output has been reconstituted and is in fact equal to the original mono source. We’ve effectively added +3 dB of gain to compensate for the attenuation of the original center panned mono clip. The +3 dB gain compensation was applied to the signal on the Auxiliary Input track (via fader) before routing its output to the stereo Master Output.

So it’s: Center Panned mono resulting in a -3dB gain attenuation —>> to a stereo Aux Input with +3dB of gain compensation —>> to stereo Master Output at unity.

In case you are wondering – why not add +3 dB of gain to the mono clip and bypass all the fluff? By doing so you would be altering the native inherent gain structure of the mono source clip, possibly resulting in clipping. My described workflow simply reconstitutes the attenuated gain after it occurs on center panned mono clips. It is all necessary due to Pro Tools’ Pan Depth methods and implementation.



Utilizing Multiple Outputs for Recording

The vast majority of audio industry professionals use DAWs running on proficient computer systems to record audio directly to secondary hard disks. For some reason direct to disk recording is not widely endorsed in the Podcasting space. Many consultants (for various reasons) advise against this recording method. Instead, they recommend the use of inexpensive hand-held solid state Recorders.

For instance I’ve heard a few people state “computers cause ground loops”, hence the widespread Portable Recorder recommendation. In my opinion that is a half-baked assertion. In fact, ANY electronic component in a signal chain (including your electrical system) is capable of producing inherent noise. Often the replacement of cheaply manufactured components (interfaces, mixers, processors, cables, etc.) will solve audible noise problems. The key is to isolate the source and correct or replace it.

Portable Recorders are well suited for location interviews and video shoots. For in-studio sessions I feel direct to disk recording on a proficient system is much more flexible compared to the use of an external device. More so, the sole use of a Portable Recorder without a proper backup strategy is flat out risky.

That being said I thought I would document a basic Skype Recording session that I implemented in Pro Tools using a multi-output Motu Audio Interface. The incoming audio will be recorded on a secondary hard disk installed (or interfaced) on the host system. The real time session audio will also be routed to an alternate Interface Output, feeding an external Recorder for backup purposes.


Note a multi-output Mixer can be used in place of an Audio Interface. As far as software you can use any modern DAW to replicate the described session. If you are using a Mac, Rogue Amoeba’s distinctive Audio Hijack application is also highly capable.


1-Record Studio Host and Skype Participant on discrete mono tracks in real time.

2-Combine the discrete recordings and create a split-stereo clip with independent dynamics processing applied to each channel, all in real time.

3-Use a Pre-Fader Send to independently control the level of the split-stereo discrete recording, and patch the real time signal to the Interface S/PDIF Output. This will feed the external Recorder’s S/PDIF Input.

4-Monitor the session through Headphones and play out through Desktop near-field Monitors.

Please review the displayed Pro Tools session snapshot.

• The Input for the mono Host track is the Interface connected mic. The Input for the mono Skype track is “Mix 1 Return.” This is an Interface supported feature, allowing the operator to route the computer’s Output (in this case Skype) to an available DAW Input. This configuration effectively creates a mix-minus with discrete, unprocessed recordings on individual mono tracks.

• The mono recording tracks are routed to individual mono Aux Input tracks using Buses. The Aux Input tracks are hard-panned L+R and contain various inserted processing options, including a Gain Trim, Expander, and Compressor.

The processing applied in this session is not intended to replace what would normally occur in post. The Compressors are there just to tame dynamics in the event either participant exceeds nominal input levels. The Expander is set up to apply mild attenuation when the host is not speaking.

• The Aux Input tracks have their Outputs set to a common stereo Bus.

• Finally, a third standard stereo audio track (Rec-Sum) uses the stereo Bus Output(s) as its Inputs. By hard panning the channels L+R we are able to maintain discrete channel separation within any printed stereo clip.

To record the discrete raw audio and the processed split-stereo audio in real time, we simply arm all session Audio tracks to record and fire away. The session can be monitored through Headphones and played out through near fields via the Main Output.

Secondary Output

The Motu Interface used for this session has a total of 8 Outputs, including a stereo S/PDIF option. I implemented a Pre-Fader Send on the session’s Rec-Sum channel with its Output set to S/PDIF. This will route the track’s split-stereo audio to the S/PDIF stereo Input of an external Marantz CF Recorder. With the Send designated as Pre-Fader, its level control will be independent of the parent (Rec-Sum) channel fader, thus allowing discrete control of the real time signal being fed to the Recorder.

Note in the displayed Pro Tools session snapshot – the floating fader positioned to the left of the mixer is a user friendly and easily accessible copy of the much smaller Send fader displayed in the parent (Rec-Sum) track.

In summary, we can successfully initialize and capture 4 recordings in a single pass: the raw Host audio, the raw Skype participant audio, a split-stereo processed version of the Skype session, and a split-stereo copy of the processed Skype session stored on the Recorder.

The image below displays the completed session with the split-stereo clip playing through the Main Outputs.


My general recommendation: when it is feasible, use direct to disk and Portable recording options in unison on a proficient system to capture in-studio multitrack and single participant Podcast sessions.



Bit Depth and Dithering

In a professional environment Dithering will be applied to audio clips when reducing word length. This process will mask errors that occur due to the removal of digital audio bits. I thought I’d cover the basics.


Digital Audio

Digital Audio incorporates individual samples consisting of bits created by the process of Quantization. This is essentially the conversion of a continuous, linear range of values present in analog audio into a fixed range of discrete values. Bit Depth (a.k.a. Word Length or Resolution) represents the number of bits stored in a sample’s measure of amplitude. It indicates the extent of inherent vertical precision. Higher bit depths (or bits per sample) encompass improved vertical dynamic resolution resulting in an extended Dynamic Range.

1 bit = 6dB of Dynamic Range. Theoretically 16bit audio has a quantified Dynamic Range of 96 dB. 24bit audio has a quantified Dynamic Range of 144 dB. However, in order to accurately assess Dynamic Range we must also recognize the amplitude of the highest spectral component of the inherent noise floor. Specifically, where it resides relative to the maximum Peak value that a system is capable of reproducing. Dynamic Range is the measurement of this ratio or range.
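The 6 dB per bit figure is a rounded value. As a minimal sketch (the function name is my own, chosen for illustration), the theoretical figure can be computed as the ratio between full scale and a single quantization step, expressed in dB:

```python
import math

def theoretical_dynamic_range_db(bit_depth: int) -> float:
    """Ratio between full scale and one quantization step, in dB."""
    return 20 * math.log10(2 ** bit_depth)

for bits in (16, 24):
    print(f"{bits}bit: {theoretical_dynamic_range_db(bits):.1f} dB")
```

Each bit contributes roughly 6.02 dB, which is why the commonly quoted 96 dB and 144 dB figures are slightly rounded down. Keep in mind this is the theoretical figure only. It does not account for the noise floor discussed above.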

Signal to Noise Ratio (SNR) is the quantified range between the nominal average signal level and the average level of the noise floor. Audio with an extended Dynamic Range will exhibit a higher SNR compared to audio with a reduced Dynamic Range. In essence 24bit audio will allow you to work with additional headroom without any increase in noise compared to 16bit audio.

Word Length Reduction

Truncation is the removal of bits with no compensating replacement. The repositioning of samples after converting to a lower resolution creates Quantization Errors resulting in audible artifacts and distortion. Dithering is technology that adds minimal perceived noise to audio before word length reduction. This noise will minimize and/or mask the audibility of distortion caused by Quantization Errors. It will also help preserve the sound quality and Dynamic Range of a higher resolution clip when converting or exporting to a lower bit depth.

There is a trade off: you are replacing bad noise with alternative “good” noise that is smoother, less audible, and much more consistent.
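To illustrate the difference, here is a minimal sketch in plain Python. It assumes floating point samples in the range -1.0 to 1.0 and uses TPDF (triangular) dither, which is one common choice and not necessarily what your DAW applies. The function name and the test signal are hypothetical.

```python
import math
import random

def to_16bit(sample: float, rng: random.Random, dither: bool = True) -> int:
    """Convert one float sample in [-1.0, 1.0] to a signed 16bit integer."""
    scaled = sample * 32768.0
    if dither:
        # TPDF dither: the sum of two uniform values spans +/- 1 LSB
        scaled += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        value = round(scaled)
    else:
        # Truncation: the fractional part is discarded with no compensation
        value = math.floor(scaled)
    # Keep the result inside the 16bit range
    return max(-32768, min(32767, value))

rng = random.Random(1)
tiny = 0.3 / 32768.0  # a signal sitting at 0.3 of one 16bit step
truncated = [to_16bit(tiny, rng, dither=False) for _ in range(10000)]
dithered = [to_16bit(tiny, rng) for _ in range(10000)]
print("truncated average:", sum(truncated) / len(truncated))
print("dithered average:", sum(dithered) / len(dithered))
```

Truncation discards the low level signal entirely. With dither the output is noisier sample by sample, but its average still tracks the original level. That is the trade off in practice.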

Noise Shaping is a supplemental feature that pushes Dithering noise into frequency ranges that are less audible to humans, thus allowing greater Dither with reduced perceptual noise.

Take a look at the Noise Shaped frequency response curve in the attached image. There is a clear visual indication of increased gain at higher frequencies, where our hearing is less sensitive.


So what does this all mean for the typical Podcast Producer? Is Dithering just another obscure aspect of professional Audio Mastering and Post Production that can be safely ignored?

Consider the following variables:

If you are recording spoken word in a well suited environment that is reasonably quiet, and you are using capable (and trouble free) gear that is properly configured, there is really no reason to record 24bit audio. In my honest opinion with proper handling 16bit audio from acquisition to distribution will be perfectly acceptable.

Remember, I’m specifically referring to spoken word audio slated for Podcast distribution. If you are tracking music, well then by all means make full use of the advantages of higher resolution audio recording.

If you elect to record 24bit audio, and you are not properly implementing the word length reduction to 16bit, you are essentially nulling the advantages of the original higher resolution audio. When down-converting, you will be unknowingly degrading the sound quality by introducing artifacts and distortion. That’s not my opinion – it is a fact.

Consider this: The stand-alone version of iZotope’s Ozone 7 Mastering Suite processes all imported audio to 32bit word length. The manual specifically states:

“If you select a bit depth other than 32-bit, you may want to apply dither to your export. Ozone processes files at 32-bit so dither is desirable for files being exported to values lower than 32-bit.”

Most DAWS include Dithering options. In some cases it’s by way of a plugin. You may also notice Dithering options included in application Preferences or Export dialogs. Hopefully after reading this article you will understand what it all means and whether you should consider implementing it. Please note that Dither must be applied at the very last stage of any processing chain.



AES “Recommendation for Loudness of Audio Streaming & Network File Playback.”

I’d like to share my observations and views on the recently published AES Technical Document AES TD1004.1.15-10 that specifies best practices for Loudness of Audio Streaming and Network File Playback.

The document is a collection of Loudness processing guidelines for diverse platform dependent media streaming and downloading. This would include music, spoken word, and possible high dynamic audio in video streams. The document credits some of the most well respected industry leading professionals, including Bob Katz, Thomas Lund, and Florian Camerer. The term “Podcast” is directly referenced once in the document, where the author(s) state:

“Network file playback is on-demand download of complete programs from the network, such as podcasts.”

I support the purpose of this document, and I understand the stated recommendations will most likely evolve. However in my view the guidelines have the potential to create a fair amount of confusion for producers of spoken word content, mainly Podcast producers. I’m specifically referring to the suggested 4 LU range (-16.0 to -20.0 LUFS) of acceptable Integrated Loudness Targets and the solutions for proper targeting.

Indeed compliance within this range will moderately curtail perceptual loudness disparities across a wide range of programs. However the leniency of this range is what concerns me.

I am all for what I refer to as reasonable deviation or “wiggle room” in regard to Integrated Loudness Target flexibility for Podcasts. However IMHO a -20 LUFS spoken word Podcast approaches the broadcast Loudness Targets that I feel are inadequate for this particular platform. A comparable audio segment with wide dynamics will complicate matters further.

I also question the notion (as stated in the document) of purposely precipitating clipping when adding gain “to handle excessive peaks.”

And there is no mention of the perceptual disparities between Mono and Stereo files Loudness Normalized to the same Integrated Loudness Target. For the record I don’t support mono file distribution. However this file format is prevalent in the space.


I feel the document’s perspective is somewhat slanted towards platform dependent music streaming and preservation of musical dynamics. In this category, broad guidelines are for the most part acceptable. This is due to the wide range of production techniques and delivery methods used on a per musical genre basis. Conversely spoken word driven audio is not nearly as artistically diverse. Considering how and where most Podcasts are consumed, intelligibility is imperative. In my view they require much more stringent guidelines.

It’s important to note streaming services and radio stations have the capability to implement global Loudness Normalization. This frees content creators from any compliance responsibilities. All submitted media will be adjusted accordingly (turned up or turned down) in order to meet the intended distribution Target(s). This will result in consistency across the noted platform.

Unfortunately this is not the case in the now ubiquitous Podcasting space. At the time of this writing I am not aware of a single Podcast Network that (A) implements global Loudness Normalization … and/or … (B) specifies a requirement for Integrated Loudness and Maximum True Peak Targets for submitted media.

Currently Podcast Loudness compliance Targets are resolved by each individual producer. This is the root cause of wide perceptual loudness disparities across all programs in the space. In my view suggesting a diverse range of acceptable Targets especially for spoken word may further impede any attempts to establish consistency and standardization.

PLR and Retention of Music Dynamics

The document states: “Users may choose a Target Loudness that is lower than the -16.0 LUFS maximum, e.g., -18.0 LUFS, to better suit the dynamic characteristics of the program. The lower Target Loudness helps improve sound quality by permitting the programs to have a higher Peak to Loudness Ratio (PLR) without excessive peak limiting.”

The PLR correlates with headroom and dynamic range. It is the difference between the average Loudness and maximum amplitude. For example a piece of audio Loudness Normalized to -16.0 LUFS with a Maximum True Peak of -1 dBTP reveals a PLR of 15. As the Integrated Loudness Target is lowered, the PLR increases indicating additional headroom and wider dynamics.
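The PLR arithmetic is simple enough to sketch in a couple of lines. The function name below is my own and is used for illustration only:

```python
def peak_to_loudness_ratio(max_true_peak_dbtp: float, integrated_lufs: float) -> float:
    """PLR: the distance between the Maximum True Peak and the Integrated Loudness."""
    return max_true_peak_dbtp - integrated_lufs

print(peak_to_loudness_ratio(-1.0, -16.0))  # 15.0
print(peak_to_loudness_ratio(-1.0, -18.0))  # 17.0
```

The second call shows the point made above: with the same peak ceiling, lowering the Integrated Loudness Target from -16.0 to -18.0 LUFS widens the PLR by 2.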

In essence low Integrated Loudness Targets will help preserve dynamic range and natural fidelity. This approach is great for music production and streaming, and I support it. However in my view this may not be a viable solution for spoken word distribution, especially considering potential device gain deficiencies and ubiquitous consumption habits carried out in problematic environments. In fact in this particular scenario a moderately reduced dynamic range will improve spoken word intelligibility.

Recommended Processing Options and Limiting

If a piece of audio is measured in its entirety and the Integrated Loudness is higher than the intended Target, a subtractive gain offset normalizes the audio. For example if the audio checks in at -18.0 LUFS and you are targeting -20.0 LUFS, we simply subtract 2 dB of gain to meet compliance.

Conversely when the measured Integrated Loudness is lower than the intended Target, Loudness Normalization is much more complex. For example if the audio checks in at -20.0 LUFS, and the Integrated Loudness Target is -16.0 LUFS, a significant amount of gain must be added. In doing so the additional gain may very well cause overshoots, not only above the Maximum True Peak Target, but well above 0dBFS. Inevitably clipping will occur. From my perspective this would clearly indicate the audio needs to be remixed or remastered prior to Loudness Normalization.
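Both cases come down to the same arithmetic, sketched below. The function names are my own, and the source Maximum True Peak of -2.0 dBTP is a hypothetical value chosen for illustration. It is not a measurement from any audio referenced in this article.

```python
def normalization_gain_db(measured_lufs: float, target_lufs: float) -> float:
    """Static gain offset needed to move the measured loudness to the target."""
    return target_lufs - measured_lufs

def peak_after_gain_dbtp(max_true_peak_dbtp: float, gain_db: float) -> float:
    """Where the Maximum True Peak lands if only static gain is applied."""
    return max_true_peak_dbtp + gain_db

# Subtractive case: -18.0 LUFS source, -20.0 LUFS target
print(normalization_gain_db(-18.0, -20.0))  # -2.0

# Additive case: -20.0 LUFS source, -16.0 LUFS target
gain = normalization_gain_db(-20.0, -16.0)
peak = peak_after_gain_dbtp(-2.0, gain)  # hypothetical -2.0 dBTP source peak
print(gain, peak)  # 4.0 2.0
```

With these assumed numbers the peak would land 2 dB above full scale. Something (compression, limiting, or a lower Target) has to absorb that overshoot.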

Under these circumstances I would be inclined to reestablish headroom by applying dynamic range compression. This approach will certainly curtail the need for aggressive limiting. As stated the reduced dynamic range may also improve spoken word intelligibility. I’m certainly not suggesting aggressive hyper-compression. The amount of dynamic range reduction is of course subjective. Let me also stress this technique may not be suitable for certain types of music.

Additional Document Recommendations and Efficiency

The authors of the document go on to share some very interesting suggestions in regard to effective Loudness Normalization:

1) “If level has to be raised, raise until it reaches Target level or until True Peak reaches 0 dBTP, whichever occurs first. Thus, the sound quality will be preserved, without introducing excessive peak limiting.”

2) “Perform what is noted in example 1, but keep raising the level until the program level reaches Target, and apply either peak limiting or allow some clipping to handle excessive peaks. The advantage is more consistent loudness in the stream, but this is a potential sonic compromise compared to example 1. The best way to retain sound quality and have more consistent loudness is by applying example 1 and implementing a lower Target.”
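As I read it, the first suggestion amounts to capping the added gain at whatever headroom remains below 0 dBTP. Here is a minimal sketch of that interpretation. The function name and the sample measurements are hypothetical, and this is my reading of the quoted text, not code from the AES document.

```python
def gain_without_limiting(measured_lufs: float, max_true_peak_dbtp: float,
                          target_lufs: float, peak_ceiling_dbtp: float = 0.0) -> float:
    """Raise level until the Target or the peak ceiling is reached, whichever occurs first."""
    wanted = target_lufs - measured_lufs
    if wanted <= 0:
        # Turning the level down never needs limiting
        return wanted
    headroom = max(peak_ceiling_dbtp - max_true_peak_dbtp, 0.0)
    return min(wanted, headroom)

# Hypothetical source: -20.0 LUFS with a Maximum True Peak of -2.0 dBTP
applied = gain_without_limiting(-20.0, -2.0, -16.0)
print(applied, -20.0 + applied)  # 2.0 -18.0
```

With these assumed numbers only 2 dB of the required 4 dB can be added, so the program ends up at -18.0 LUFS and falls 2 LU short of a -16.0 LUFS Target.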

With these points in mind, please review/demo the following spoken word audio segment. In my opinion the audio in its current state is not optimized for Podcast distribution. It’s simply too low in terms of perceptual loudness and too dynamic for effective Loudness Normalization, especially if targeting -16.0 LUFS. Due to these attributes suggestion 1 above is clearly not an option. In fact neither is option 2. There is simply no available headroom to effectively add gain without driving the level well above full scale. Peak limiting is unavoidable.


I feel the document suggestions for the segment above are simply not viable, especially in my world where I will continue to recommend -16.0 LUFS as the Target for spoken word Podcasts. Targeting -18.0 LUFS as opposed to -16.0 LUFS is certainly an option. It’s clear peak limiting will still be necessary.

Below is the same audio segment with dynamic range compression applied before Loudness Normalization to -16.0 LUFS. Notice there is no indication of aggressive limiting, even with a Maximum True Peak of -1.7 dBTP.


Regarding peak limiting the referenced document includes a few considerations. For example: “Instead of deciding on 2 dB of peak limiting, a combination of a -1 dBTP peak limiter threshold with an overall attenuation of 1 dB from the previously chosen Target may produce a more desirable result.”

This modification is adequate. However the general concept continues to suggest the acceptance of flexible Targets for spoken word. This may impede perceptual consistency across multiple programs within a given network.


The flexible best practices suggested in the AES document are 100% valid for music producers and diverse distribution platforms. However in my opinion this level of flexibility may not be well suited for spoken word audio processing and distribution.

I’m willing to support the curtailment of heavy peak limiting when attempting to normalize spoken word audio (especially to -16.0 LUFS) by slightly reducing the intended Integrated Loudness Target … but not by much. I will only consider doing so if and when my personal optimization methods prior to normalization yield unsatisfactory results.

My recommendation for Podcast producers would be to continue to target -16.0 LUFS for stereo files and -19.0 LUFS for mono files. If heavy limiting occurs, consider remixing or remastering with reduced dynamics. If optimization is unsuccessful, consider lowering the intended Integrated Loudness Target by no more than 2 LU.

A True Peak Maximum of <= -1.0 dBTP is fine. I will continue to suggest -1.5 dBTP for lossless files prior to lossy encoding. This will help ensure compliance in encoded lossy files. What’s crucial here is a full understanding of how lossy, low bit rate coders will overshoot peaks. This is relevant due to the ubiquitous (and not necessarily recommended) use of 64kbps for mono Podcast audio files.

Let me finish by stating the observations and recommendations expressed in this article reflect my own personal subjective opinions based on 11 years of experience working with spoken word audio distributed on the Internet and Mobile platforms. Please feel free to draw your own conclusions and implement the techniques that work best for you.



Quantifying Podcast Audio Dynamics

I’ve discussed the reasons why there is a need for revised Loudness Standards for Internet and Mobile audio distribution. Problematic (noisy) consumption environments and possible device gain deficiencies justify an elevated Integrated Loudness target resulting in audio that is perceptually louder on average compared to Loudness Normalized audio targeted for Broadcast. Low level, highly dynamic audio complicates matters further. The recommended Integrated Loudness targets for Internet and Mobile audio are -16.0 LUFS for stereo files and -19.0 LUFS for mono. They are perceptually equal.

In terms of Dynamics, I’ve expressed my opinion regarding compression. In my view spoken word audio intelligibility will be improved after careful Dynamic Range Compression is applied. I stress that I do not advocate aggressive compression that may result in excessive loudness and possible quality degradation. The process is a subjective art that takes practice with accessibility to well designed tools along with a full understanding of all settings.


I thought I would discuss various aspects of Podcast audio Dynamics. Mainly, why an extended Dynamic Range is potentially problematic and how to quantify it using various descriptors and measurement tools. I will also discuss the benefits of Dynamic Range management as a precursor to Loudness Normalization. Lastly I will disclose recommended benchmarks that are certainly not requirements. Feel free to draw your own conclusions and target what works best for you.

Highly Dynamic Audio in Noisy Environments

Extended or “High Dynamic Range” at its core describes wide disparities in a piece of audio between high and low level passages. When this is prevalent in a spoken word segment, intelligibility will be compromised, especially if the listening environment is less than ideal.

For example if you are traveling below Manhattan on a noisy subway, and a Podcast talent’s delivery is inconsistent, you would be forced to make realtime playback volume adjustments to compensate for the inconsistent high and low level passages. And if the Integrated Loudness is well below what is recommended, the listening device may very well be incapable of applying a sufficient volume boost due to insufficient gain. Dynamic Range Compression will reestablish intelligibility. It will also provide additional headroom that will optimize the audio for Loudness Normalization.

Dynamic Range Compression and Loudness Normalization

I would say in most cases successful Loudness Normalization for Broadcast compliance requires nothing more than a simple subtractive gain offset. For example if your mastered piece checks in at -20.0 LUFS (stereo), and you were targeting R128 (-23.0 LUFS Integrated), subtracting 3 dB of gain will most likely result in compliant audio. By doing so the original dynamic attributes of the piece will be retained.

Things get a bit more complicated when your Integrated Loudness target is higher than that of the source. For example a mastered -20.0 LUFS piece would need additional gain to meet a -16.0 LUFS target. In this case you may need to apply a significant amount of limiting to prevent the Maximum True Peak from exceeding your target. In essence without safeguards, added gain may result in clipping. The key is to avoid aggressive limiting (aka “Hard Limiting”) if at all possible. So how do we optimize the audio before the gain offset is applied?

I’ve found that a moderate to low amount of Dynamic Range Compression applied to audio segments before Loudness Normalization will prevent instances of aggressive limiting when processing highly dynamic audio. The amount of compression is of course subjective. Often a mere 1-2 dB of gain reduction will be sufficient. The results will always depend on just how dynamic the source audio is before normalizing.

I carefully manage spoken word dynamics throughout client project workflows. I simply maintain sufficient headroom prior to Loudness Normalization. In most cases I am able to meet the intended Integrated Loudness and Maximum True Peak targets (without limiting) by simply adding gain.

By design iZotope’s RX Loudness Control also applies compression in certain instances of Loudness Normalization. I suggest you read through the manual. It is packed with information regarding audio loudness processing and Loudness Normalization.


iZotope states the following:

“For many mixes, dynamics are not affected at all. This is because only a fixed gain is required to meet the spec. However, if your mix is too dynamic or has significant transients, compression and/or limiting are required to meet Short-term/Momentary or True Peak parts of the spec.”

“RX Loudness Control uses compression in a way that preserves the quality of your audio. When needed, a compressor dynamically adjusts your audio to ensure you get the best sound while remaining compliant. For loudness standards that require Short-term or Momentary compliance, the compressor is engaged automatically when loudness exceeds the specified target.”

It’s a highly recommended tool that simplifies offline processing in Pro Tools. Many of its features hook into Adobe’s Premiere Pro and Media Encoder.

LRA, PLR, and Measurement Tools

So how do we quantify spoken word audio dynamics? Most modern Loudness Meters are capable of calculating and displaying what is referred to as the Loudness Range (LRA). This particular descriptor is displayed in Loudness Units (LU’s). It represents statistical differences in loudness over time. This indicator can help operators decide whether Dynamic Range Compression may be necessary for optimum intelligibility on a particular platform.

I will say that before I came across any rule-of-thumb (recommended) guidelines for Internet and Mobile audio distribution, the LRA in the majority of the work that I’ve produced hovered around 6 LU. In the highly regarded article “Audio for Mobile TV, iPad and iPod,” the author and leading expert Thomas Lund of TC Electronic suggests an LRA “not much higher than 8 LU” for optimal “Pod Listening.” Basically higher LRA readings suggest wider dynamics that may not be suitable for mobile platform distribution.

Some Loudness Meters also display the PLR descriptor, or Peak to Loudness Ratio. This correlates with headroom and dynamic range. It is the difference between the Program (average) Loudness and maximum amplitude. Assuming a piece of audio has been Loudness normalized to -16.0 LUFS along with an awareness of a True Peak Maximum somewhere around -1.0 dBTP, it is easy to recognize the general sweet spot for the mobile platform (PLR less than 16).

Note that aggressively compressed and heavily limited “loud” audio will exhibit very low PLR readings. For example if the measured Integrated Loudness of a particular program is -10.0 LUFS with a Maximum True Peak of -1.0 dBTP, the reduced PLR (9) clearly indicates aggressive processing resulting in elevated perceptual loudness. This should be avoided.

If you are targeting -16.0 LUFS (Integrated), and your True Peak Maximum is somewhere between -1.0 and -3.0 dBTP, your PLR is well within the recommended range.

Pay close attention to your Loudness Range. Use it to gauge delivery consistency, dynamics, and whether optimization may be necessary. If your Loudness Range is close to and not much higher than 8 LU, your audio will be well suited for a Podcast and will exhibit optimal intelligibility.

LRA Measurements can be performed in real time using a compliant Loudness Meter like Nugen Audio’s VisLM 2, TC Electronic’s LM2n Loudness Radar, and iZotope’s Insight. Some meters can also perform offline measurements in supported DAWs. There are a number of stand alone third party measurement options available as well, including iZotope’s RX5 Advanced Audio Editor, Auphonic Leveler, FFmpeg, and r128x.
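As one example of an offline measurement, FFmpeg’s loudnorm filter can print a JSON report that includes the measured Integrated Loudness, True Peak, and Loudness Range. The sketch below wraps that in Python. It assumes ffmpeg is installed and available on your PATH. The function names and the file name in the usage note are hypothetical.

```python
import json
import subprocess

def build_measure_command(path: str) -> list:
    """FFmpeg command that analyzes a file and discards the audio output."""
    return ["ffmpeg", "-hide_banner", "-nostats", "-i", path,
            "-af", "loudnorm=print_format=json", "-f", "null", "-"]

def parse_loudnorm_report(stderr_text: str) -> dict:
    """Pull the measured values out of the JSON block that loudnorm prints."""
    start = stderr_text.rindex("{")
    end = stderr_text.rindex("}") + 1
    report = json.loads(stderr_text[start:end])
    return {
        "integrated_lufs": float(report["input_i"]),
        "true_peak_dbtp": float(report["input_tp"]),
        "lra_lu": float(report["input_lra"]),
    }

def measure(path: str) -> dict:
    """Run the analysis pass. FFmpeg writes the report to stderr."""
    result = subprocess.run(build_measure_command(path),
                            capture_output=True, text=True, check=True)
    return parse_loudnorm_report(result.stderr)
```

Calling measure("episode.wav") on a file of your own would return the three descriptors in a dictionary. From there you can check the Loudness Range against the 8 LU guideline and derive the PLR from the other two values.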


“Audio for Mobile TV, iPad, and iPod” by Thomas Lund

***Please note I personally paid for my RX Loudness Control license and I have no formal affiliation with iZotope.


Adobe Audition Multiband Compressor

I thought I’d clear up a few misconceptions regarding the Multiband Compressor bundled in Adobe Audition. Also, I’d like to discuss the infamous “Broadcast” preset that I feel is being recommended without proper guidance. This is an aggressive preset that applies excessive compression and heavy limiting resulting in processed audio that is often fatiguing to the listener.


The Basics

The tool itself is “Powered by iZotope.” They are a well respected audio plugin and application development firm. Personally I think it’s great that Adobe decided to bundle this processor in Audition. However, it is far from a novice targeted tool. In fact it’s pretty robust.

What’s interesting is it’s referred to as a “Multiband Compressor.” This is slightly misleading, considering the processor includes a Peak Limiter stage along with its advertised Multiband Compressor. I think Dynamics Processor would be a more suitable name.

Basically the multi-band Compressor includes 3 adjustable crossovers, resulting in 4 independent Frequency Bands. Each Band includes a discrete Compressor with Threshold, Gain Compensation, Ratio, Attack, and Release settings. Bands can be soloed or bypassed.

There is a global Peak Limiter module located to the right of the Compressor settings. This module may be activated or bypassed. Without a clear understanding of the supplied settings for the Limiter, you run the risk of generating excessive loudness when processing audio. I’m referring to a substantial increase in perceived loudness.

The Limiter Parameters

The Threshold is the limiting trigger. When the input signal surpasses it, limiting is activated. The Margin is what defines the Peak Ceiling. As you decrease the Threshold, the signal is driven up to and against the Margin resulting in an increase in average loudness. This also results in dynamic range reduction.

Activating the “Brickwall Limiter” feature in the supplemental Options module will ensure accurate Margin compliance. In essence you will be implementing Hard Limiting. Deactivating this option may result in “overs” and/or peaks that exceed the specified Margin.

The bundled Broadcast preset defaults the Limiter Threshold setting to -10.0 dB with a Margin of -0.1 dBFS. Any alternative Threshold settings are of course subjective. I’m suggesting that it may be a good idea to ease up on this default Threshold setting. This will result in less aggressive limiting and a reduction of average levels.

I’m also suggesting that the default Margin setting of -0.1 is not recommended in this context. I would set this to -1.0 dBFS or lower (-1.5 dBFS, or even -2.0 dBFS).
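To get a rough feel for why the Threshold setting has such a large effect on loudness, consider the simplified static model sketched below. It assumes the Limiter holds anything above the Threshold at the Threshold and then raises the result so that the Threshold level lands on the Margin. This is my own simplification for illustration. It ignores Attack and Release behavior and is not a description of the plugin’s actual internals. The function names and the alternative settings are hypothetical.

```python
def added_gain_db(threshold_db: float, margin_db: float) -> float:
    """Level increase applied to everything below the Threshold (assumed static model)."""
    return margin_db - threshold_db

def limited_output_db(input_level_db: float, threshold_db: float, margin_db: float) -> float:
    """Output level for a given input level under the same assumed model."""
    return min(input_level_db, threshold_db) + added_gain_db(threshold_db, margin_db)

# Broadcast preset defaults: Threshold -10.0 dB, Margin -0.1 dBFS
print(round(added_gain_db(-10.0, -0.1), 1))  # 9.9

# A gentler, hypothetical alternative: Threshold -4.0 dB, Margin -1.5 dBFS
print(added_gain_db(-4.0, -1.5))  # 2.5
```

Under this model the preset defaults push average levels up by nearly 10 dB, while the gentler settings add 2.5 dB. Treat the numbers as a rough guide and confirm with a Loudness Meter and your ears.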

Please note this is not a True Peak Limiter. Your processed lossless audio file has the potential to lose headroom when and if it is converted to a lossy codec such as MP3.

At this point I suggest no changes should be made to the Attack and Release settings.

The Compressors

We cannot discount additional settings included in the Broadcast preset that are contributing to the aggressive processing. If you examine the Ratio settings for each independent compression module, 3:1 is the highest set Ratio. These predefined Ratios are fairly moderate and for starters require no adjustment.

However, notice the Threshold settings for each compression module as well as the Gain Compensation setting in Module (band) 4 (+3 dB).

First, the low Threshold settings result in fairly aggressive compression per band. Also, the band 4 gain compensation is generating a further increase in average level for that particular band.

Again the settings and any potential adjustments are subjective. My recommendation would be to experiment with the Threshold settings. Specifically, ease up on all Thresholds (move them closer to 0 dB) while maintaining their relative relationship. Do this by activating the “Link Band Controls” setting located in the supplemental Limiter Options.

View the red Gain Reduction meters included in each module. Monitor the amount of attenuation that occurs with the default Threshold settings. Compare initial readings with the gain reduction that occurs after you make your adjustments. Your goal is to ease up on the gain reduction. This will result in less aggressive compression. Remember to use your ears!


An area of misinformation for this processor is the purpose of the Output Gain adjustment, located at the far upper right of the interface. Please note this setting does not define the Peak Ceiling! Remember – it is the Margin setting in the Limiter module that defines your Ceiling. The Output Gain simply adds or cuts global output level after compression. Think of it as Global Gain compensation.

To prove my point, I dug out a short video demo that I created sometime last year for a community member.

With the Broadcast preset selected, and the Output Gain set to -1.5 dBFS – the actual output Peak Amplitude surpasses -1.5 dBFS, even with the Brickwall option turned ON. This reading is displayed numerically above the Output Gain meter(s) in real time.

In the second pass of the test I set the Output Gain to 0 dBFS. I then set the Limiter Margin to -1.5 dBFS. As the audio plays through you will notice the output is limited to and never surpasses -1.5 dBFS. Just keep your eye on the numerical, realtime display.

Video Demo Link

I purposely omitted any specific references to Attack and Release settings. They are the source for a future discussion.


Here’s an alternative use recommendation for this Adobe Multiband Compressor: DeEssing.

Use the Spectrum Analyzer to determine the frequency range where excessive sibilant energy occurs. Set two crossovers to encapsulate this range. Bypass the remaining associated compression modules. Tweak the remaining active band compression settings thus allowing the compressor to attenuate the problematic sibilant energy.

If you find the supplied Spectrum Analyzer difficult to read, consider using a third party option with higher resolution to perform your analysis.


Please note – in order to get the most out of this tool, you really need to learn and understand the basics of dynamics compression and how each setting will affect the source audio. More importantly, when someone simply suggests the use of a preset, take it with a grain of salt. More than likely this person lacks a full understanding of the tool, and may not be capable of providing clear instructional guidance for all functions. It’s a bad mix – especially when charging novices big bucks for training.

By the way, nothing wrong with being a novice. The point is paid consultants have an obligation to provide expert assistance. Boiler plate suggestions serve no purpose.



Skype, Logic Pro X, and Aggregate Devices …


Studio Host and Skype participant to be recorded inside Logic Pro X on a single machine (single pass) with no additional hardware other than a Mic Input Device.


• Two independent mono Host/Participant stems with no processing.

• One processed split-stereo mixdown of the session with the Host and Guest residing on discrete (L+R) channels.

• Real time Processing and Recording of all instances.


Of course the objectives noted above are easily attainable using two independent machines, with the recording box running Logic Pro X and the Skype machine handling the connection. In this case you would also need to use a mixer to set up a proper mix-minus.
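For anyone unfamiliar with the term, a mix-minus feeds the remote participant everything in the mix except their own voice, which prevents them from hearing a delayed echo of themselves. A minimal sketch of the idea, using hypothetical source names and sample values:

```python
def mix_minus(sources, exclude):
    """Sum every source except the one being fed back to its owner."""
    names = [name for name in sources if name != exclude]
    length = len(next(iter(sources.values())))
    return [sum(sources[name][i] for name in names) for i in range(length)]

# Hypothetical sources; "music" is only here to show a third element in the mix.
sources = {
    "host_mic": [0.25, 0.5, 0.25],
    "skype": [0.5, 0.5, 0.5],
    "music": [0.0, 0.25, 0.0],
}
to_skype = mix_minus(sources, exclude="skype")
```

The feed returned to Skype contains the host and the music but none of the Skype signal itself.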

You can also implement similar workflows by using two inexpensive USB audio interfaces connected to a single machine.

Considering the capabilities of modern Macs, I’m confident the following workflow will succeed while freeing the user from added complexity and cost.

OSX Aggregate Devices

The foundation of this setup is a user created Aggregate Audio Device. Aggregate devices appear in the OSX System Preferences/Sound I/O options for system wide use. By wrapping supported “Subdevices” into a single Aggregate, you effectively create a sort of cumulative Input Device that can be designated in Logic as the default. We also need a software utility that supports routing of the Skype Output to an Input in Logic.

I originally created this workflow using SoundFlower, which was installed on my secondary iMac and carried over from previous versions of OSX. SoundFlower, along with the iMac’s Line Input, was wrapped into a single Aggregate Device and then designated in Logic as the default Input.

This worked well. However, I had no plans to install the now unsupported SoundFlower on my production MacPro for further testing. So I looked around for a suitable, up-to-date (and actively developed) replacement for SoundFlower.

Sound Siphon

Sound Siphon by Static Z Software “… makes your Mac’s Audio Output available as an Audio Input Device. It enables you to send audio from one application to another where it can be processed, streamed, or recorded.”

Exactly what I needed.

Note that Sound Siphon is very diverse in terms of features, and the developer states that many useful enhancements are in the works. You can download a restricted demo. My hope is that you consider purchasing a $29.99 license. This will ensure the longevity of the application and continued development. Note that I have no affiliation with the developer and I gladly purchased a license.

This is a snapshot of Sound Siphon:


In the example above I display a user defined Device (“Capture Safari”) that is essentially a Custom Audio Input. I then associated the Safari Application with this device. This becomes a system wide option to capture Safari audio. For example QuickTime X will now display “Capture Safari” as an Input option for audio recording.

It’s important to note that this particular Sound Siphon feature is supplemental to the Skype recording implementation. In other words – it’s an entirely different use case. My goal here is to disclose the flexibility of the application.

Creating the Aggregate Device

Input 1 on my Mackie Onyx 1220i Mixer receives the output from a dbx 286A Voice Processor. The studio Mic is connected to the processor for proper gain staging. I needed to wrap the Mic signal along with the Skype audio into a single Input Device and designate it in Logic’s Preferences for proper routing.

To create an Aggregate Device, open Audio MIDI Setup, located in /Applications/Utilities. When creating a new Aggregate, supported Subdevices appear in the right side setup table.


Notice that Sound Siphon is listed as a 2 in/2 out device in the left source view. This is created when you install the application. Once installed, it will be available to be wrapped into an Aggregate Device along with pre-existing devices.

For my implementation I created “Skype Tracker” as a new Aggregate and selected my mixer (Onyx-(2528)) and Sound Siphon as Subdevices. Up top you set your Sample Rate and the Clock Source. My system seems to perform better with Sound Siphon set as the Clock Source.

It’s important to review the Input Channel matrix of the new Aggregate Device. Notice that Sound Siphon will only support Input channels (17+18). When routing Inputs in Logic, I will use Input 1 for the studio Mic and Input 17 for Skype.
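The channel numbering in the matrix follows from how an Aggregate Device lists its Subdevices: each Subdevice's inputs are numbered consecutively after the inputs of the one before it. The sketch below models that numbering. The input count of 16 for the mixer is an assumption inferred from Sound Siphon landing on channels 17+18, and the function name is hypothetical.

```python
def aggregate_input_map(subdevices):
    """Number the input channels of each subdevice consecutively (1-based),
    in the order the subdevices are listed."""
    mapping = {}
    next_channel = 1
    for name, input_count in subdevices:
        mapping[name] = list(range(next_channel, next_channel + input_count))
        next_channel += input_count
    return mapping

# Assumption: the mixer exposes 16 inputs, which is what places
# Sound Siphon's 2 inputs on channels 17 and 18.
channels = aggregate_input_map([("Onyx-(2528)", 16), ("Sound Siphon", 2)])
mic_input = channels["Onyx-(2528)"][0]
skype_inputs = channels["Sound Siphon"]
```

If the Subdevices are listed in a different order, or the mixer exposes a different number of inputs, the Skype channel numbers will shift accordingly, so always confirm them against the matrix itself.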


Here are the Skype settings that I am using:


The Microphone is set to the Aggregate Device. The Speakers option is set to Sound Siphon. This setting is imperative and, from what I can tell, inflexible.

Logic Pro X

The first thing we need to do is define the Input Device in Global Preferences/Audio/Devices. I set mine to the Aggregate Device:


Next we will address setup and routing. What’s important here is that I use an Object in Logic that may not be immediately obvious in your particular installation.

Specifically, I often use Input Channel Strip Objects in my projects. They are implemented in the Environment (aka “MIDI Environment”). It is accessible from the Logic Window Menu.

From the Logic Docs regarding Input Channel Strips:

“The Input Channel Strip allows you to directly route and control signals from your audio hardware’s Inputs. Once an Input Channel Strip is assigned to an Audio Channel Strip, it can be monitored and recorded directly into Logic Pro, along with its effect plug-ins.

The signal is processed, inclusive of plug-ins even while Logic Pro is not playing. In other words, Input Channel Strips can behave just like external hardware processors. Aux sends can be used pre- or post-fader.

Input Channel Strips can be used as live Inputs that can stream audio signals from external sources (such as MIDI synthesizers and sound modules) into a stereo mix (by bouncing an Output Channel Strip).”

You can also create Bus Channel Strip Objects in the Environment. They are not the same as Auxiliary Channel Strips and can be quite useful in certain instances. For more information about Bus Channel Strips please refer to this article.

The Environment

To make the Logic Environment accessible, open global Preferences and access the Advanced options. The MIDI option needs to be selected as part of the Advanced Tools:


Once that setting is ticked, “Open Midi Environment” will appear as an option in the Logic Window Menu.

Channel Strip Objects are added to the Environment from the New Menu/Channel Strip. Notice how the Environment emulates the Project Mixer:


Note that when adding Input Channel Strips in the Environment, you must define the corresponding (Aggregate) Device Inputs using the Channel Strip editor:


For this particular project I created two Input Channel Strips in the Environment using Inputs 1 and 17 respectively, based on Aggregate Subdevice availability (Input 1 = Mic, Input 17 = Skype).

You will also need 4 Audio Tracks (2 Mono, 1 Stereo, 1 PreListen), and 2 (Mono) Auxiliary Channel Strips. Create Audio Tracks using the Track/New Tracks option – located in the Logic Application Menu. Add Auxiliary Channel Strips using the Mixer’s Options Menu/Create New … Note that the Input Channel Strips created in the Environment should be designated Mono.

Here is my Project Mixer with all necessary Objects and Routing:



The reddish labeled channels are the two Input Channel Strips that I created in the Environment. If you look at the text at the very top of these Channel Strips, you will see their Input designations.

The signals coming in through the Inputs are routed to their own independent Aux Channels for processing. Notice I inserted a Gain Trim on the Mic Input Channel. All processing options are of course subjective. One example would be to insert two instances of a Compressor on each Aux Channel. You would set these up to apply real time, non-aggressive dynamic range compression as you record.

Moving forward – notice the Aux Channels are Mono and hard panned L+R respectively. This will maintain channel separation when recording the split-stereo version of the session. In this example each Aux Channel Output is routed to Audio Channel 3 (“Split Record”). This Stereo Audio Track is panned center. When armed it will record the Aux Channel Outputs to a split-stereo file.
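Conceptually, a split-stereo file is two mono streams interleaved into Left/Right frames with no bleed between them. A minimal sketch, assuming the Host is panned Left and the Guest Right, with hypothetical sample values and a hypothetical function name:

```python
def split_stereo(left_mono, right_mono):
    """Interleave two equal-length mono streams as (L, R) frames: one voice
    stays entirely on the left channel, the other entirely on the right."""
    if len(left_mono) != len(right_mono):
        raise ValueError("both stems must have the same number of samples")
    return list(zip(left_mono, right_mono))

# Hypothetical sample values for each voice.
host = [0.1, 0.2, 0.3]
guest = [-0.4, -0.5, -0.6]
frames = split_stereo(host, guest)
```

Because each frame keeps the two voices on separate channels, either one can later be extracted from the stereo file and processed on its own.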

Also study how I set up the remaining Audio Tracks – Audio Track 1 (“Rec. Mic”) and Audio Track 2 (“Rec. Skype”). Their Inputs are set to Bus 1 and 2 respectively, allowing these tracks to receive the unprocessed Outputs (“dry” audio) from the Input Channel Strips.

Keep in mind that if Effects are inserted on the Input Channel Strips, the audio routed to Audio Tracks 1+2 will be processed. In most cases I would not insert any Effects on the Input Channel Strips other than Gain. My intention here is to record dry stems.

I Grouped various aspects of these two channels, mainly Volume, Mute, Solo, and Record. This will link the faders and make it easy to control audibility of the mono stems cumulatively.

Wrap Up

That’s basically it. You can record and monitor all tracks in real time. When you are done there is no need to bounce, although you still can. You simply “Export” or “Export Region” to create the individual file(s).



You may have noticed that the Outputs for the Auxiliary Channel Strips (1+2) and the Input for Audio Track 3 (“Split Record”) are all set to Bus 3. This is in fact a virtual (permanent) Bus used to route the processed audio to Track 3 for recording.

When you select a permanent virtual Bus in Logic for routing, an Auxiliary Channel Strip is auto-created and will appear in the Mixer. For this particular workflow – we use two Auxiliary Channel Strips, one for Mic processing and a second for Skype processing.
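The complete signal flow can be summarized as a small routing table. This is a sketch of my reading of the Mixer rather than anything exported from Logic. In particular, the hop from each Input Channel Strip to Bus 1 or Bus 2 is an assumption, based on Audio Tracks 1+2 taking those Buses as their Inputs. The `destinations` helper is hypothetical.

```python
# Each entry: source -> list of destinations. Names follow the Mixer labels;
# the Input Channel Strip -> Bus 1/2 hops are an assumed reading of the routing.
ROUTING = {
    "Input 1 (Mic)": ["Bus 1"],
    "Input 17 (Skype)": ["Bus 2"],
    "Bus 1": ["Aux 1 (pan L)", "Audio Track 1 (Rec. Mic)"],
    "Bus 2": ["Aux 2 (pan R)", "Audio Track 2 (Rec. Skype)"],
    "Aux 1 (pan L)": ["Bus 3"],
    "Aux 2 (pan R)": ["Bus 3"],
    "Bus 3": ["Audio Track 3 (Split Record)"],
}

def destinations(source):
    """Return every recording track reachable from a source, in sorted order."""
    found = set()
    stack = [source]
    while stack:
        node = stack.pop()
        targets = ROUTING.get(node, [])
        if not targets:
            found.add(node)  # a node with no outgoing route is a recording track
        stack.extend(targets)
    return sorted(found)

mic_tracks = destinations("Input 1 (Mic)")
skype_tracks = destinations("Input 17 (Skype)")
```

Each Input ends up on two tracks: its own dry mono stem and the shared split-stereo track fed by Bus 3.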

Throughout this entire workflow no changes were made to my default OSX Audio I/O Settings located in System Preferences/Sound.

As I always say – Audio Tracking and Post are highly subjective arts. In fact many Logic “experts” have never heard of or utilized the options in the Environment. And your processing options are also subjective. My hope is this documentation will at the very least introduce you to the creation and usage of Aggregate Devices.

If by chance you develop a successful alternative solution, all well and good. In my tests I’ve found the documented implementation to work quite well.

Let me know if you have any questions.

I’d like to thank my friend Victor Cajiao for his help while testing this workflow.

