Adobe Audition Multiband Compressor

I thought I’d clear up a few misconceptions regarding the Multiband Compressor bundled in Adobe Audition. Also, I’d like to discuss the infamous “Broadcast” preset that I feel is being recommended without proper guidance. This is an aggressive preset that applies excessive compression and heavy limiting resulting in processed audio that is often fatiguing to the listener.


The Basics

The tool itself is “Powered by iZotope.” They are a well respected audio plugin and application development firm. Personally I think it’s great that Adobe decided to bundle this processor in Audition. However, it is far from a novice targeted tool. In fact it’s pretty robust.

What’s interesting is it’s referred to as a “Multiband Compressor.” This is slightly misleading, considering the processor includes a Peak Limiter stage along with it’s advertised Multiband Compressor. I think Dynamics Processor would be a more suitable name.

Basically the multi-band Compressor includes 3 adjustable crossovers, resulting in 4 independent Frequency Bands. Each Band includes a discrete Compressor with Threshold, Gain Compensation, Ratio, Attack, and Release settings. Bands can be soloed or bypassed.

To the right of the compression settings there is global Peak Limiter module that can be activated or bypassed. Without a clear understanding of the supplied settings for the Limiter, you run the risk of generating excessive loudness when processing audio. I’m referring to a substantial increase in perceived loudness.

The Limiter Parameters

The Threshold is the limiting trigger. When the input signal surpasses it, limiting is activated. The Margin is what defines the Peak Ceiling. And so as you decrease the Threshold, the signal is driven up to and against the Margin resulting in an increase in average loudness. This also results in dynamic range reduction.

Activating the “Brickwall Limiter” feature in the supplemental Options module will ensure accurate Margin compliance. In essence you will be implementing Hard Limiting. Deactivating this option may result in “overs” or peaks that exceed the specified Margin.

The bundled Broadcast preset defaults the Limiter Threshold setting to -10.0 dB with a Margin of -0.1 dBFS. Any alternative Threshold settings are of course subjective. I’m suggesting that it may be a good idea to ease up on this default Threshold setting. This will apply less aggressive limiting resulting in a reduction of average levels.

I’m also suggesting that the default Margin setting of -0.1 is not recommended in this context. I would set this to -1.0 dBFS or lower (-1.5 dBFS, or even -2.0 dBFS).

Please note this is not a True Peak Limiter. Your processed lossless audio file has the potential to loose headroom when and if it is converted to a lossy codec like MP3.

At this point I suggest no changes should be made to the Attack and Release settings.

The Compressors

We cannot discount additional settings included in the Broadcast preset that are contributing to the aggressive processing. If you examine the Ratio settings for each independent compression module, 3:1 is the highest set Ratio These predefined Ratios are fairly moderate and for starters require no adjustment.

However, notice the Threshold settings for each compression module as well as the Gain Compensation setting in Module (band) 4. It’s +3 dB.

First, the hefty Threshold settings result in some fairly aggressive compression per band. Also, the band 4 gain compensation is generating a further increase in average level for that particular band.

Again the settings and any potential adjustments are subjective. My recommendation would be to experiment with the Threshold settings. Specifically, cut back by reducing all Thresholds while maintaining their relative relationship. Do this by activating the “Link Band Controls” setting located in the supplemental Limiter Options.

Use the Red Gain Reduction meters included in each module to monitor the amount of attenuation that occurs with the default Threshold settings. Then compare these readings to what occurs after you make your adjustments. Your goal is to ease up on the gain reduction. This will result in less aggressive compression. Remember to use your ears!


An area of misinformation for this processor is the purpose of the Output Gain adjustment, located at the far upper right of the interface. Please note this setting does not define the Peak Ceiling! Remember – it’s the Margin setting in the Limiter module that defines your Ceiling. The Output Gain simply adds or cuts global output level after compression. Think of if it as Global Gain compensation.

To prove my point, I dug out a short video demo that I created sometime last year for a community member.

With the Broadcast preset selected, and the Output Gain set to -1.5 dBFS – the actual output Peak Amplitude surpasses -1.5 dBFS, even with the Brickwall option turned ON. This reading is displayed numerically above the Output Gain meter(s) in real time.

In the second pass of the test I set the Output Gain to 0 dBFS. I then set the Limiter Margin to -1.5 dBFS. As the audio plays through you will notice the output is limited to and never surpasses -1.5 dBTP. Just keep your eye on the numerical, realtime display.

Video Link

I purposely omitted any specific references to Attack and Release settings. They are the source for a future discussion.


Here’s an alternative use recommendation for this Adobe Multiband Compressor: DeEssing.

Use the Spectrum Analyzer to determine the frequency range where excessive sibilant energy occurs. Set two crossovers to encapsulate this range. Bypass the remaining associated compression modules. Tweak the remaining active band compression settings thus allowing the compressor to attenuate the problematic sibilant energy.

If you find the supplied Spectrum Analyzer difficult to read, consider using a third party option with higher resolution to perform your analysis.


Please note – in order to get the most out of this tool, you really need to learn and understand the basics of dynamics compression and how each setting will affect the source audio. More importantly, when someone simply suggests the use of a preset, take it with a grain of salt. More than likely this person lacks a full understanding of the tool, and may not be capable of providing clear instructional guidance for all functions. It’s a bad mix – especially when charging novices big bucks for training.

By the way, nothing wrong with being a novice. The point is paid consultants have an obligation to provide expert assistance. Boiler plate suggestions serve no purpose.


Technorati Tags: ,

dbx 286s: Beyond The Basics …

The dbx brand has been a favorite of mine since the late 1970’s. My first piece of dbx kit was a stand-alone noise reduction unit that I coupled with an old Teac Reel to Reel Tape Deck. Through the years I’ve owned various EQ’s and Dynamics processors, including the highly regarded 160A Compressor. I purchased mine in 2006.


In January 2011 I was skimming through eBay listings looking for a dbx 286A Microphone Preamp Processor. At the time I had heard the original 286 model was co-designed by Bob Orban, and both models were widely used in Radio Broadcast facilities. I found it interesting that Radio Engineers would use a piece of gear that was not only cheap in terms of cost – but unconventional in terms of controls.


One piece was available on eBay, supposedly used for 4 hours at a party in Hollywood Hills California, and then boxed for resale. The seller had a positive reputation, so I grabbed it for $115. Upon arrival it’s condition was as described, and it’s been in my rack ever since.

The 286/286A has evolved into the 286s, quite frankly an outright steal priced at $199. Due to it’s straight forward approach and affordable price, the Podcasting community has embraced it and often classifies it as “drool-worthy.” Pretty amusing.


In this article I am going to discuss the attributes of the Compressor stage and the De-Esser. I’ll demystify the DeEsser and talk about the importance of the Output (Gain) Compensation setting.


I mentioned the processor is unconventional. For example the Compressor’s Drive and Density controls essentially replace Threshold, Ratio, Attack, and Release – present on most Compressors.

The De-Esser requires a user defined High-Pass Frequency designation and Threshold to reduce excessive sibilance. Setup can be time consuming due to the lack of any visual representation of problematic energy in need of attenuation.


Compression results depend on the level (and dynamics) of the incoming signal and the corresponding settings. On a conventional compressor the Threshold monitors the incoming signal. When the signal surpasses the Threshold, processing engages and gain reduction is activated. The Ratio determines the amount of gain reduction. The Attack will affect how aggressively (or the speed at which) gain reduction initializes and then reaches maximum attenuation. The Release will control the speed of the transition from full attenuation – back to the original level

The Drive control on the 286s determines the amount of gain reduction (compression) that will be applied to the incoming signal. Higher settings will increase the input signal level and yield more aggressive compression (and noise).

How much gain reduction should you shoot for? Well that’s subjective. I would recommend experimenting with 6-12dB of gain reduction. Of course results will vary on a case by case basis due to obvious variables (mic selection, preamp level, etc.)


When using a compressor to process spoken word, improper Release settings can result in choppiness, often referred to as pumping. The key is to have the gain reduction occurrences smoothly transition between instances of audible sound and natural pauses (silence).

The 286s uses a variable program dependent Release. In the event you feel (and hear) the necessity to speed up or slow down the program dependent Release – the Density control will come in handy.

Note that the Density scale on the 286s is again somewhat unconventional. On a typical dynamics processor – setting the Release full counter-clockwise would result in a very fast Release. As the setting is adjusted clockwise, the Release duration would be extended. The scale usually transitions from milliseconds to full seconds.

On the 286s, think of Density as a linear speed controller, where “1” (counter-clockwise) is slow and “10” (full clockwise) is fast.

For normal speech I would recommend experimenting with the Density set between 3 and 5.

The De-Esser

If you check around you will notice a wide range of references regarding the frequency range where sibilance generally occurs. In reality there are so many variables and each instance of sibilance will need to be properly identified and handled accordingly.

The 286s De-Esser uses a variable high-pass filter that tells the processor where to begin to attenuate problematic energy. This Frequency control has a range of 800Hz-10kHz. The user manual states ” … settings between 4-8kHz will yield the best results for vocal processing.” This is good starting point. However proper setup requires time consuming arbitrary tweaking that may result in a low level of accuracy. A visual representation of the frequency range of the excessive sibilant energy will solve this problem. Once you identify the frequencies and/or range where most of the energy is present, setting the Frequency on the 286s will be demystified.

The De-Esser’s Threshold setting controls the amount of attenuation (sensitivity) and will remain constant as the input level changes.

Have a look at the spectral analysis below:


Notice the excessive energy in the 2-6kHz range (Frequency Range is represented on the X axis). For this particular segment of audio I would initially set the Frequency control on the 286s to 5kHz. Next I would adjust the Threshold until the sibilant energy is attenuated. I would then sweep the Frequency setting within the visual range of the sibilant energy and fine tune both settings until I achieve the most pleasing results. The key is not to over do it. Heavy attenuation will suppress vital energy and remove any hint of natural presence and sparkle.

To perform this analysis excersize – set the Threshold setting on the 286s to OFF. Pass the output of the processor to your DAW of choice and perform a real time spectral analysis of your voice using a software plugin the includes a Spectrum Analyzer. You can use any supported EQ plugin with it’s controls bypassed. You can also use something like the free (AU/VST) Span plugin by Voxengo (note that Span is CPU intensive).

Output Gain Compensation

Gain Compensation is an integral part of Audio Compression. It is most commonly used to offset the gain reduction that occurs when audio is compressed. It is often referred to as Make-up Gain. When this gain offset is applied to compressed audio, the perceived, average level of the audio is increased. Excessive Make-up Gain can sometimes elevate noise that may have been previously inaudible at lower average levels.

Earlier I discussed how an elevated Drive control setting on the 286s will increase the input signal of low level source audio. In doing so you may pick up a suitable amount of compression. However you also run the risk of a noticeable increase in noise. In this particular scenario, try setting the Output Gain on the 286s to a negative value to offset the gain (and noise) that may have been introduced by the Drive setting.


I think it’s important to first learn the basics of Audio Compression from a conventional perspective. In doing so you will find it easier to get the most out of the unconventional controls on the dbx 286s, especially Drive and Density.

And let’s not forget that De-Essing is really nothing more than frequency band compression that will attenuate problematic energy. Establishing a visual reference to the energy will simplify the process of accurate correction.


Technorati Tags: , ,

Skype, Logic Pro X, and Aggregate Devices …


Studio Host and Skype participant to be recorded inside Logic Pro X on a single machine (single pass) with no additional hardware other than a Mic Input Device.


[– Two independent mono Host/Participant stems with no processing.

[– One processed split-stereo mixdown of the session with the Host and Guest residing on discrete (L+R) channels.

[– Real time Processing and Recording of all instances.


Of course the objectives noted above are easily attainable using two independent machines, with the recording box running Logic Pro X and the Skype machine handling the connection. In this case you would also need to use a mixer to set up a proper mix-minus.

You can also implement similar workflows by using two inexpensive USB audio interfaces connected to a single machine.

Considering the resourcefulness of today’s modern day Macs, I’m confident the following workflow will be successful freeing the user from complexities and added costs.

OSX Aggregate Devices

The foundation of this setup is based on a user created Aggregate Audio Device. Aggregate devices appear in the OSX System Preferences/Sound I/O options for system wide use. By wrapping supported “Subdevices” into a single Aggregate, you effectivly create a sort of cumulative Input Device that can be designated in Logic as the default. We also need a software utility that supports routing of the Skype Output to an Input in Logic.

I originally created this workflow using SoundFlower that was installed on my secondary iMac and carried over form previous versions of OSX. SoundFlower, along with the iMac’s Line Input were wrapped into a single Aggregate Device, and then designated in Logic as the default Input.

This worked well. However, I had no plans to install the now unsupported SoundFlower on my production MacPro for further testing. And so I looked around for a suitable up to date (and actively developed) replacement for SoundFlower.

Sound Siphon

Sound Siphon by Static Z Software “… makes your Mac’s Audio Output available as an Audio Input Device. It enables you to send audio from one application to another where it can be processed, streamed, or recorded.

Exactly what I needed.

Note that Sound Siphon is very diverse in terms of features. And the developer states that many useful enhancements are in the works. You can download a restricted demo. My hope is that you consider purchasing a $29.99 license. This will ensure the longevity of the application and continued development. Note that I have no affilation and I gladly purchased a license.

This is a snapshot of Sound Siphon:


In the example above I display a user defined Device (“Capture Safari”) that is essentially a Custom Audio Input. I then associated the Safari Application with this device. This becomes a system wide option to capture Safari audio. For example QuickTime X will now display “Capture Safari” as an Input option for audio recording.

It’s important to note that this particular Sound Siphon feature is supplemental to the Skype recording implementation. In other words – it’s an entrley different use case scenario. My goal here is to disclose the flexibility of the application.

Creating the Aggregate Device

Input 1 on my Mackie Onyx 1220i Mixer receives the output from a dbx 286A Voice Processor. The studio Mic is connected to the processor for proper gain staging. I needed to wrap the Mic signal along with the Skype audio into a single Input Device and designate it in Logic’s Preferences for proper routing.

To create an Aggregate Device, open Audio MIDI Setup, located in ~/Applications/Utilities. When creating a new Aggregate, supported Subdevices appear in the right side setup table.


Notice that Sound Siphon is listed as a 2 in/2 out device in the left source view. This is created when you install the application. Once installed, it will be available to be wrapped into an Aggregate Device along with pre-existing devices.

For my implementation I created “Skype Tracker” as a new Aggregate and selected my mixer (Onyx-(2528)) and Sound Siphon as Subdevices. Up top you set your Sample Rate and the Clock Source. My system seems to perform better with Sound Siphon set as the Clock Source.

It’s important to review the Input Channel matrix of the new Aggregate Device. Notice that Sound Siphon will only support Input channels (17+18). When routing Inputs in Logic, I will use Input 1 for the studio Mic and Input 17 for Skype.


Here are the Skype settings that I am using:


The Microphone is set to the Aggregate Device. The Speakers option is set to Sound Siphon. This setting is imperative and from what I can tell non-flexiable.

Logic Pro X

The first thing we need to do is define the Input Device in Global Preferences/Audio/Devices. I set mine to the Aggregate Device:


Next we will address setup and routing. What’s important here is that I use an Object in Logic that may not be immediately obvious in your particular installation.

Specifically, I often use Input Channel Strip Objects in my projects. They are implemented in the Environemnt (aka “MIDI Environment”). It is accessible form the Logic Window Menu.

From the Logic Docs regarding Input Channel Strips:

“The Input Channel Strip allows you to directly route and control signals from your audio hardware’s Inputs. Once an Input Channel Strip is assigned to an Audio Channel Strip, it can be monitored and recorded directly into Logic Pro, along with its effect plug-ins.

The signal is processed, inclusive of plug-ins even while Logic Pro is not playing. In other words, Input Channel Strips can behave just like external hardware processors. Aux sends can be used pre- or post-fader.

Input Channel Strips can be used as live Inputs that can stream audio signals from external sources (such as MIDI synthesizers and sound modules) into a stereo mix (by bouncing an Output Channel Strip).”

You can also create Bus Channel Strip Objects in the Environment. They are not the same as Auxiliary Channel Strips and can be quite useful in certain instances. For more information about Bus Channel Strips please refer to this article.

The Environment

To expose the accessability of the Logic Environment, open global Preferences and access the Advanced options. The MIDI option needs to be selected as part of the Advanced Tools:


Once that setting is ticked, “Open Midi Environment” will appear as an option in the Logic Window Menu.

Channel Strip Objects are added to the Environment from the New Menu/Channel Strip. Notice how the Environment emulates the Project Mixer:


Note that when adding Input Channel Strips in the Environment, you must define the corresponding (Aggregate) Device Inputs using the Channel Strip editor:


For this particular project I created two Input Channel Strips in the Environment using Inputs 1 and 17 respectively, based on Aggregate Subdevice availability (Input 1 = Mic, Input 17 = Skype).

You will also need 4 Audio Tracks (2 Mono, 1 Stereo, 1 PreListen), and 2 (Mono) Auxiliary Channel Strips. Create Audio Tracks using the Track/New Tracks option – located in the Logic Application Menu. Add Auxiliary Channel Strips using the Mixer’s Options Menu/Create New … || Note that the Input Channel Strips created in the Environment should be designated Mono.

Here is my Project Mixer with all necessary Objects and Routing:



The reddish labeled channels are the two Input Channel Strips that I created in the Environment. If you look at the text at the very top of these Channel Strips, you will see their Input designations.

The signals coming in through the Inputs are routed to their own independent Aux Channels for processing. Notice I inserted a Gain Trim on the Mic Input Channel. All processing options are of course subjective. One example would be to insert two instances of a Compressor on each Aux Channel. You would set these up to apply real time, non-aggresive dynamic range compression as you record.

Moving forward – notice the Aux Channels are Mono and hard panned L+R respectivly. This will maintain channel separation when recording the split-stereo version of the session. In this example each Aux Channel Output is routed to Audio Channel 3 (“Split Record”). This Stereo Audio Track is panned center. When armed it will record the Aux Channel Outputs to a split-stereo file.

Also study how I set up the remaining Audio Tracks – Audio Track 1 (“Rec. Mic”) and Audio Track 2 (“Rec. Skype”). Their Inputs are set to Bus 1 and 2 respectively, allowing these tracks to receive the unprocessed Outputs (“dry” audio) from the Input Channel Strips.

Keep in mind that if Effects are inserted on the Input Channel Strips, the audio routed to Audio Tracks 1+2 will be processed. In most cases I would not insert any Effects on the Input Channel Strips other than Gain. My intension here is to record dry stems.

I Grouped various aspects of these two channels, mainly Volume, Mute, Solo, and Record. This will link the faders and make it easy to control audibility of the mono stems cumulatively.

Wrap Up

That’s basicilly it. You can record/monitor all tracks in real time. And when you are done, there is no need to bounce, although you still can. You simply “Export” or “Export Region” as an individual file(s).



You may have noticed the Outputs for the Auxiliary Channel Strips (1+2) and the Input for Audio Track 3 (“Split Record”) is Bus 3. This is in fact a virtual (permanent) Bus used to route the processed audio to Track 3 for recording.

When you select a permanent virtual Bus in Logic for routing, an Auxiliary Channel Strip is auto-created and will appear in the Mixer. For this particular workflow – we use two Auxiliary Channel Strips, one for Mic processing and a second for Skype processing.

Throughout this entire workflow no changes were made to my default OSX Audio I/O Settings located in System Preferences/Sound.

As I always say – Audio Tracking and Post are highly subjective arts. In fact many Logic “experts” have never heard of or utilized the options in the Environment. And your processing options are also subjective. My hope is this documentation will at the very least introduce you the creation and usage of Aggregate Devices.

If by chance you develop a successful alternative solution, all well and good. In my tests I’ve found the documented implementation to work quite well.

Let me know if you have any questions.

I’d like to thank my friend Victor Cajiao for his help while testing this workflow.


Technorati Tags: , ,

Asymmetric Waveforms: Should You Be Concerned?

In order to understand the attributes of asymmetric waveforms, it’s important to clarify the differences between DC Offset and Asymmetry …

Waveform Basics

A waveform consists of both a Positive and Negative side, separated by a center (X) axis or “Baseline.” This Baseline represents Zero (∞) amplitude as displayed on the (Y) axis. The center portion of the waveform that is anchored to the Baseline may be referred to as the mean amplitude.


DC Offset

DC Offset occurs when the mean amplitude of a waveform is off the center axis due to differing amounts of the signal shifting to the positive or negative side of the waveform.

One common cause of this shift is when faulty electronics insert a DC current into the signal. This abnormality can be corrected in most file based editing applications and DAW’s. Left uncorrected, audio with DC Offset will exhibit compromised dynamic range and a loss of headroom.

Notice the displacement of the mean amplitude:


The same clip after applying DC Offset correction. Also, notice the preexisting placement of (+/-) energy:



Unlike waveforms that indicate DC Offset, Asymmetric waveform’s mean amplitude will reside on the center axis. However the representations of positive and negative amplitude (energy) will be disproportionate. This can inhibit the amount of gain that can be safely applied to the audio.

In fact, the elevated side of a waveform will tap the target ceiling before it’s counterpart resulting in possible distortion and the loss of headroom.

High-pass filters, and aggressive low-end processing are common causes of asymmetric waveforms. Adding gain to asymmetric waveforms will further intensify the disproportionate placement of energy.

In this example I applied a high-pass filter resulting in asymmetry:


Broadcast Chains

Broadcast engineers closely monitor positive to negative energy distribution as their audio passes through various stages of transmission. Proper symmetry aides in the ability to process a signal more effectively downstream. In essence uniform gain improves clarity and maximizes loudness.


In spoken word – symmetry allows the voice to ride higher in the mix with a lower risk of distortion. Since many Podcast Producers will be adding gain to their mastered audio when loudness normalizing to targets, the benefits of symmetric waveforms are obvious.

In the event an asymmetric waveform represents audio with audible distortion and/or a loss of headroom, a Phase Rotator can be used to reestablish proper symmetry.

Below is a segment lifted from a distributed Podcast (full zoom out). Notice the lack of symmetry, with the positive side of the waveform limited much more aggressively than the negative:


The same clip after Phase Rotation:


(I processed the clip above using the Adaptive Phase Rotation option located in iZotope’s RX 4 Advanced Channel Ops module.)

In Conclusion

Please note that asymmetric waveforms are not necessarily bad. In fact the human voice (most notably male) is often asymmetric by nature. If your audio is well recorded, properly processed, and pleasing to the ear … there’s really no need to attempt to correct any indication of asymmetry.

However if you are noticing abnormal displacement of energy, it may be worth looking into. My suggestion would be to evaluate your workflow and determine possible causes. Listen carefully for any indication of distortion. Often a slight EQ tweak or a console setting modification is all that may be necessary to make noticeable (audible) improvements to your audio.


Technorati Tags: , ,

Intermediate File Format for New Media Producers: MP2

mp2-file If you are in the audio production business or involved in some sort of collaborative Podcast effort, moving large lossless audio files to and from various locations can be challenging.

Slow internet speeds, Hotel WiFi, and server bottlenecks have the potential to cripple efficient file management and ultimately impede timely delivery. And let’s not forget how quickly drive space can diminish when storing WAV and/or AIFF files for archival purposes.

The Requirements of a Suitable Intermediate

From the perspective of a Spoken Word New Media Producer, there are two requirements for Intermediate files: Size Reduction and Retention of Fidelity. The benefits of file size reduction are obvious. File transfers originating from locations with less than ideal connectivity would be much more efficient, and the consumption of local or remote disk/server space would be minimized. The key here is to use a flexible lossy codec that will reduce file sizes AND hold up well throughout various stages of encoding and decoding.

Consider the possible benefits of the following client/producer relationship: A client converts (encodes) lossless files to lossy and delivers the files to the producer via FTP, DropBox, etc. The Producer would then decode the files back to their original format in preparation for post production.

When the work is completed, the distribution file is created and delivered (in most cases) as an MP3. Finally with a bit of ingenuity, the producer can determine what needs to be retained for archival purposes, and convert these files back to the intermediate format for long term storage.

How about this scenario: Podcast Producer A is located in L.A.. Producer B is located in NYC. Producer B handles the audio post for a double-ender that will consist of 2 individual WAV files recorded locally at each location.


Upon completion of a session, the person in L.A must send the NY based audio producer a copy of the recorded lossless audio. The weekly published program typically runs upwards of 60 minutes. Needless to say the lossless files will be huge. Let’s hope the sender is not in a Hotel room or at Starbucks.

The good news is such a codec exists …

MPEG 1 Layer II (commonly referred to as MP2 with an .mp2 file extension) is in fact a lossy “perceptual” codec. What makes it so unique (by design) is the format’s ability to limit the introduction of artifacts throughout various stages of encoding and decoding. And get this – MP2’s check in at about 1/5th the size of a lossless source. For example a 30 minute (16 bit/44.1kHz) Stereo WAV file currently residing on my desktop is 323.5 megabytes. It’s MP2 counterpart is 58.7 megabytes.

Public Radio

If you look into the file submission requirements over at PRX (The Public Radio Exchange) and NPR (see requirements), you will notice MP2 audio files are what they ask for.

In fact during the early days of IT Conversations, founder and Executive Director Doug Kaye implemented the use of MP2 audio files as intermediates throughout the entire network based on recommendations by some of the most prominent engineers in the Public Radio space. We expected our show producers and content providers to convert their audio files to MP2 prior to submission to our servers using third party software applications.

Eventually a proprietary piece of software (encoder/uploader) was developed and distributed to our affilates. The server side MP2’s were downloaded by our audio engineers, decoded to lossless, produced, and then sent back up to the network as MP2 in preparation for server side distribution encoding (MP3).

From a personal perspective I was so impressed with the codec’s performance, I immediatly began to ask my clients to submit MP2 audio files to me, and I’ve never looked back. I have never experienced a noticeable degradation of audio quality when converting a client’s MP2 back to WAV in preparation for post.


In my view it’s always a good idea to have unfettered access to all previously produced project files. Besides produced masters, let’s not forget the accumulation of individual project assets that were edited, saved, and mixed in post.

On average my project folders that include audio assets for a 30 minute program may consume upwards of 3 Gigabytes of storage space. Needless to say an efficient method of storage is imperative.

Fidelity Retention

If you are concerned about the possibility of audio quality degradation due to compression artifacts, well that’s understandable. In certain instances accessability to raw, uncompressed audio will be more suitable. However I am convinced that you will be impressed with how well MP2 audio files hold up throughout various workflows.

In fact try this: (Suggested encoders listed below)

Convert a stereo WAV file to stereo MP2 (256 kbps). Compare the file sizes. Listen to the MP2 and assess fidelity retention. Then convert the stereo MP2 directly to stereo MP3 (128 kbps). Listen for any indication of noticeable artifacts.

Let me know what you think …

My recommendation would be to first experiment with converting a few of your completed project assets to MP2 in preparation for storage. I’ve found that I rarely need to dig back into old work. I have on a few occasions, and the decoded MP2’s were perfectly fine. Note that I always save a copy of the produced lossless master.

Specifications and Software

The requirements for mono and stereo MP2 files:

Stereo: 256 kbps, 16 bit, 44.1kHz
Mono: 128 kbps, 16 bit, 44.1kHz

There are many audio applications that support MP2 encoding. Since I have limited exposure to Windows based software, the scope of my awareness is narrow. I do know that Adobe Audition supports the format. In the past I’ve heard that dBPowerAmp is a suitable option.

On the Mac side, besides the cross platform Audition – there is a handy utility on the Mac App Store called Audio-Converter. It’s practically free, priced at $0.99. File encoding is also supported in FFmpeg either from the Command Line or through various third party front ends.

Here is the syntax (stereo, then mono) for Command Line use on a Mac. The converted file will land on your Desktop, named Output.mp2:

ffmpeg -i yourInputFile.wav -acodec mp2 -ab 256k ~/Desktop/Output.mp2

ffmpeg -i yourInputFile.wav -acodec mp2 -ab 128k ~/Desktop/Output.mp2

Here’s a good place to download pre-compiled FFmpeg binaries.

Many modern media applications support native playback of MP2 audio files, including iTunes and Quicktime.

In Conclusion

If you are in the business of moving around large Spoken Word audio files, or if you are struggling with disk space consumption issues, the use of MP2 audio files as intermediates is a worthy solution.


Technorati Tags: ,

iZotope Ozone 6

iZotope has released a newly designed version of Ozone, their flagship Mastering processor. Notice I didn’t refer to Ozone [6] as a plugin? Well I’m happy to report that Ozone [6] is now capable to run independent of a DAW as a stand-alone desktop processor.


Besides the stand-alone option and striking UI overhaul, Ozone’s flexibility has been greatly enhanced with the addition of support to host third party Audio Units and VST plugins. Preliminary tests here indicate that it functions very well in the stand-alone mode. More on this in moment …

I’ve been a customer and supporter of iZotope since early 2005. If I remember correctly Ozone 3 was the first version that I had access to. In fact back in the early days of Podcasting, many producers purchased an Ozone license based on my endorsement. This was an interesting scenario all due to the fact that most of the people in the community who bought it – had no idea how to use it! And so a steady flow of user support inquiries began to trickle in.

I decided the best way to bring users up to speed was to design Presets. I would distribute the underlying XML file and have the users move it to the proper location on their system’s. After doing so, the Preset would be accessible within Ozone’s Preset Manager.

The complexity of the Presets varied. Some people wanted basic Band-Pass filters. Others requested the simulation of a broadcast chain that would result in a signature sound for their recorded voice. In fact I remember one particular instance where the user requested a Preset that would make him sound like an “AM Radio DJ”. So I went to work and I think I made him happy.

As Ozone matured, it’s level of complexity increased resulting in somewhat sluggish performance (at least for me). When iZotope released Alloy 2, I bought it – and found it to be much more responsive. And so I sort of moved away from Ozone, especially Ozone 5. My guess is if my system’s were a bit more robust, poor performance would be less of an issue. Note that my personal experience with Ozone was not necessarily the general concensus. Up to this latest release, the plugin was highly regarded with widespread use in the Mastering community.

Over the past 24 hours I’ve been paying close attention to how Ozone users are reacting to this new version. Note that a few key features have been removed. The Reverb module is totally gone. Gating/Expansion has been removed from the Dynamics Module, and the Dithering options have been minimized. The good news is these particular features are not game changers for me based on how I use this tool. I will say the community reaction has been tepid. Some users are passing on the release due to the omissions that I’ve mentioned and others that I’m sure I’ve overlooked.

For me personally – the $99 upgrade was a no-brainer. In my view the stand-alone functionality and the support for third party plugins makes up for what has been removed. In stand-alone mode you can import multiple files, save your work as projects, implement processing chains in a specific order, apply head/tail cuts/fades, and export your work.

Ozone [6] will accept WAV, AIFF, or MP3 files. If you are exporting to lossless, you can convert Sample Rates and apply Dither. This all worked quite well on my 2010 MacPro. In fact the performance was quite good, with no signs of sluggish performance. I did notice some problematic issues with plugin wrappers not scaling properly. Also the Plugin Manager displayed duplicates of a few plugins. This did not hinder performance in any way. In fact all of my plugins functioned well.

And so that’s my preliminary take. My guess is this new version of Ozone is well suited for advanced New Media Producers who have a basic understanding of how to process audio dynamics and apply EQ. Of course there’s much more to it, and I’m around to answer any questions that you might have.

Look for more information in future posts …


Technorati Tags: , , ,

Podcast Loudness: Mono vs. Stereo Perception …

Consider the following scenario:

Two copies of an audio file. File 1 is Stereo, Loudness Normalized to -16.0 LUFS. File 2 is Mono, also Loudness Normalized to -16.0 LUFS.

Passing both files through a Loudness Meter confirms equal numerical Program Loudness. However the numbers do not reflect an obvious perceptual difference during playback. In fact the Mono file is perceptually louder than it’s Stereo counterpart.

Why would the channel configuration affect perceptual loudness of these equally measured files?


The Explanation

I’m going to refer to a feature that I came across in a Mackie Mixer User Manual. Mackie makes reference to the “Constant Loudness” principle used in their mixers, specifically when panning Mono channels.

On a mixer, hard-panning a Mono channel left or right results in equal apparent loudness (perceived loudness). It would then make sense to assume that if the channel was panned center, the output level would be hotter due to the combined or “mixed” level of the channel. In order to maintain consistent apparent loudness, Mackie attenuates center panned Mono channels by about 3 dB.

We can now apply this concept to the DAW …

A Mono file played back through two speakers (channels) in a DAW would be the same as passing audio through a Mono analog mixer channel panned center. In this scenario, the analog mixer (that adheres to the Constant Loudness principle) would attenuate the output by 3dB.

In order to maintain equal perception between Loudness Normalized Stereo and Mono files targeting -16.0 LUFS, we can simulate the Constant Loudness principle in the DAW by attenuating Mono files by 3 LU. This compensation would shift the targeted Program Loudness for Mono files to -19.0 LUFS.

To summarize, if you plan to Loudness Normalize to the recommend targets for internet/mobile, and Podcast distribution … Stereo files should target -16.0 LUFS Program Loudness and Mono files should target -19.0 LUFS Program Loudness.

Note that In my discussions with leading experts in the space, it has come to my attention that this approach may not be sustainable. Many pros feel it is the responsibility of the playback device and/or delivery system to apply the necessary compensation. If this support is implemented, the perceived loudness of -16.0 LUFS Mono will be equal to -16.0 LUFS Stereo. There would be no need to apply manual compensation.


Technorati Tags: ,

Loudness Meter Descriptors …

In the recent article published on “Working Group Nears Standard for Audio Levels in PRSS Content”, the author states:

“Working group members believe that one solution may lie in promoting the use of Loudness Meters, which offer more precision by measuring audio levels numerically. Most shows are now mixed using peak meters, which are less exact.”

Peak Meters are exact – when they are used to display what they are designed to measure:Sample Peak Amplitude. They do not display an accurate representation of average, perceived loudness over time. They should only be used to monitor and ultimately prevent overload (clipping).

It’s great that the people in Public Radio are finally addressing distribution Loudness consistency and compliance. My hope is their initiative will carry over into their podcast distribution models. In my view before any success is achieved, a full understanding of all spec. descriptors and targets would be essential. I’m referring to Program (Integrated) Loudness, Short Term Loudness, Momentary Loudness, Loudness Range, and True Peak.

Loudness Meter

A Loudness Meter will display all delivery specification descriptors numerically and graphically. Meter descriptors will update in real time as audio passes through the meter.

Short Term Loudness values are often displayed from a graphical perspective as designed by the developer. For example TC Electronic’s set of meters (with the exception of the LM1n) display Short Term Loudness on a circular graph referred to as Radar. Nugen Audio’s VisLM meter displays Short Term Loudness on a grid based histogram. Both versions can be customized to suit your needs and work equally well.


Loudness Meters also include True Peak Meters that display any occurrences of Intersample Peaks.


All Loudness standardization guidelines specify a Program Loudness or “Integrated Loudness” target. This time scaled descriptor indicates the average, perceived loudness of an entire segment or program from start to finish. It is displayed on an Absolute scale in LUFS (Loudness Units relative to Full Scale), or LKFS (Loudness Units K Weighted relative to Full Scale). Both are basically the same. LUFS is utilized in the EBU R128 spec. and LKFS is utilized in the ATSC A/85 spec. What is important is that a Loudness Meter can display Program Loudness in either LUFS or LKFS.

The Short Term Loudness (S) descriptor is measured within a time window of 3 seconds, and the Momentary Loudness (M) descriptor is measured within a time window of 400 ms.

The Loudness Range (LRA) descriptor can be associated with dynamic range and/or loudness distribution. It is the difference between average soft and average loud parts of an audio segment or program. This useful indicator can help operators decide whether dynamic range compression is necessary.


The specification Gate (G10) function temporarily pauses loudness measurements when the signal drops below a relative threshold in highly dynamic audio, thus allowing only prominent foreground sound to be measured. The relative threshold is -10 LU below ungated LUFS. Momentary and Short Term measurements are not gated. There is also a -70 LUFS Absolute Gate that will force metering to ignore extreme low level noise.

Absolute vs. Relative

I mentioned that LUFS and LKFS are displayed on an Absolute scale. For example the EBU R128 Program Loudness target is -23.0 LUFS. For Podcast/Internet/Mobile the Program Loudness target is -16.0 LUFS.

There is also a Relative scale that displays LU’s, or Loudness Units. A Relative LU scale corresponds to an Absolute LUFS/LKFS scale, where 0 LU would equal the specified Absolute target. In practice, -23 LUFS in EBU R128 is equal to 0 LU. For Podcast/Mobile -16.0 LUFS would also be equal to 0 LU. Note that the operator would need to set the proper Program Loudness target in the Meter’s Preferences in order to conform.


LU and dB Relationship

1 LU is equal to 1 dB. So for example you may have measured two programs: Program A checks in at -20 LUFS. Program B checks in at -15 LUFS. In this case program B is +5 LU louder than Program A.


Loudness Meter plugins mainly support online (Real Time) measurement of an audio signal. For an accurate measurement of Program Loudness of a clip or mixed segment the meter must be inserted in the DAW at the very end of a processing chain, preferably on the Master channel. If the inserts on the Master channel are post fader, any change in level using the Master Fader will result in a global gain offset to the entire mix. The meter would then (over time) display the altered Program Loudness.

If your DAW’s Master channel has pre fader inserts, the Loudness Meter should still be inserted on the Master Channel. However the operator would first need to route the mix through a Bus and use the Bus channel fader to apply global gain offset. The mix would then be routed to the Master channel where the Loudness Meter is inserted.

If your DAW totally lacks inserts on the Master channel, Buses would need to be used accordingly. Setup and routing would depend on whether the buses are pre or post fader.

Some Loudness Meter plugins are capable of performing offline measurements in certain DAW’s on selected regions and/or clips. In Pro Tools this would be an Audio Suite process. You can also accomplish this in Logic Pro X by initiating and completing an offline bounce through a Loudness Meter.


Technorati Tags: , ,

Audition CC: Loudness Normalization Pt.2 …

In my previous article I discussed various aspects of the Match Volume Processor in Adobe Audition CC. I mentioned that the ITU Loudness processing option must be used with care due to the lack of support for a user defined True Peak Ceiling.

I also pointed to a video tutorial that I produced demonstrating a Loudness Normalization Processing Workflow recommended by Thomas Lund. It is the off-line variation of what I documented in this article.

Here’s how to implement the off-line processing version in Audition CC …

This is a snapshot of a stereo version of what may very well be the second most popular podcast in existence:

Amplitude Statistics in Audition:

Peak Amplitude:0dB
True Peak Amplitude:0.18dBTP
ITU Loudness:-15.04 LUFS


It appears the producer is Peak Normalizing to 0dBFS. In my opinion this is unacceptable. If I was handling post production for this program I would be much more comfortable with something like this at the source:

Amplitude Statistics in Audition:

Peak Amplitude:-0.81dB
True Peak Amplitude:-0.81dBTP
ITU Loudness:-15.88 LUFS


We will be shooting for the Internet/Mobile/Podcast target of -16.0 LUFS Program Loudness with a suitable True Peak Ceiling.

The first step is to run Amplitude Statistics and determine the existing Program Loudness. In this case it’s -15.88 LUFS. Next we need to Loudness Normalize to -24.0 LUFS. We do this by simply calculating the difference (-8.1) and applying it as a Gain Offset to the source file.

The next step is to implement a static processing chain (True Peak Limiter and secondary Gain Offset) in the Audition Effects Rack. Since these processing instances are static, save the Effects Rack as a Preset for future use.

Set the Limiter’s True Peak Ceiling to -9.5dBTP. Set the secondary Gain Offset to +8dB. Note that the Limiter must be inserted before the secondary Gain Offset.

Process, and you are done.

In this snapshot the upper waveform is the Loudness Normalized source (-24.0 LUFS). The lower waveform in the Preview Editor is the processed audio after it was passed through the Effects Rack chain.


In case you are wondering why the Limiter is before the secondary Gain instance – in a generic sense, if you start with -9.5 and add 8, the result will always be -1.5. This translates into the Limiter doing it’s job and never allowing the True Peaks in the audio to exceed -1.5dBTP. In essence this is the ultimate Ceiling. Of course it may be lower. It all depends on the state of the source file.

This last snapshot displays the processed audio that is fully compliant, followed by it’s Amplitude Statistics:



In Summary:

[– Determine Program Loudness of the source (Amplitude Statistics).

[– Loudness Normalize (Gain Offset) to -24.0 LUFS.

[– Run your saved Effects Rack chain that includes a True Peak Limiter (Ceiling set to -9.5dBTP) and a secondary +8dB Gain Offset.

Feel free to ping me with questions.


Technorati Tags: ,

Audition CC: Loudness Normalization …

Adobe Audition CC has a handy Match Volume Processor with various options including Match To/ITU-R BS.1770-2 Loudness. The problem with this option is the Processor will not allow the operator to define a True Peak Ceiling. And so depending on various aspects of the input file, it’s possible the processed audio may not comply due to an unsuitable Peak Ceiling.

For example if you need to target -16.0 LUFS Program Loudness for internet/mobile distribution, the Match Volume Processor may need to increase gain in order to meet this target. Any time a gain increase is applied, you run the risk of pushing the Peak Ceiling to elevated levels.

The ITU Loudness processing option does supply a basic Limiting option. However – it’s sort of predefined. My tests revelaled Peak Ceilings as high as -0.1dBFS. This will result in insufficient headroom for both True Peak compliance and preparation for MP3 encoding.

The Audition Match Volume Processor also features a Match To/True Peak Amplitude option with a user defined True Peak Ceiling (referred to as Peak Volume). This is essentially a True Peak Limiter that is independent of the ITU Loudness Processor. For Program Loudness and True Peak compliance, it may be necessary to run both processing stages sequentially.


There are a few caveats …

[– If the Match Volume Processor (Match To/ITU-R BS.1770-2 Loudness) applies limiting that results in a Peak Ceiling close to full scale, any subsequent limiting (Match To/True Peak Amplitude) has the potential to reduce the existing Program Loudness.

[– If a Match Volume process (Match To/ITU-R BS.1770-2 Loudness) yields a compliant True Peak Ceiling right out of the box, there is no need to run any subsequent processing.


If you are going to use these processing options, my suggestion would be to make sure the measured Program Loudness of your input file is reasonably close to the Program Loudness that you are targeting. Also, make sure the input file has sufficient headroom, with existing True Peaks well below 0dBFS.

If you are finding it difficult to achieve acceptable results, I suggest you apply the concepts described in this video tutorial that I produced. I demonstrate a sort of manual “off-line” Loudness Normalization process. If you prefer to handle this in real time (on-line), refer to my article “Podcast Loudness Processing Workflow.”


Technorati Tags: ,

Skype in the Box …


Studio Host and Skype participant to be recorded inside your DAW utilizing a slightly advanced configuration.

The session will require a proper mix-minus using your mixer’s Aux Send to feed the Skype Input – minus the Skype participant.


[– Two discrete mono Host/participant recordings with minimal or no processing.

[– Host Mic routed through a voice processing chain using plugins.

[– Incoming Skype routed through a compressor to tame levels, if necessary.

[– One fully processed stereo mix of the session with the Host audio on the left channel and the Skype participant on the right channel.

[– Real time recording and output.

There are certainly various ways to accomplish these objectives utilizing a Bounce to Track concept. The optional inserted plugins and even the routing decisions noted below are entirely subjective. And success with this implementation will depend on how resourceful your system is. I would recommend that you send the session audio out in real time to an external recorder for backup.


This particular example works well for me in Pro Tools. I tried to make this design as generic as possible. My guess is you will have no trouble applying these concepts in any professional DAW. (Click to enlarge)



First I’ll mention that I’m using a Mackie Onyx 1220i Firewire Mixer. This device is defined as my default system I/O. The mixer has a sort nifty feature that allows the creation of a mix-minus just by the press of a button.


Pressing the Input button located on the mixer’s Line In 11-12 channel(s) sets the computer’s audio output as the channel’s input, passing the signal through Firewire 1-2. Disengaging this button will set the Input(s) to Line and the channels’s 1/4″ Input jacks would become active.

Skype recognizes the mixer as the default I/O. So I plug my mic into the mixer’s Channel 1 Input and hard-pan left. I then hard-pan Channel(s) 11-12 right. With the Input button pressed – I can hear Skype. In order to create a successful mix-minus you need to tell the mixer to prevent the Skype input from being inserted back into the Main Mix. These options are located in the mixer’s Source Matrix Control area.

This configuration translates into a Pro Tools session by setting the Track 1 Input (mono) to Onyx Channel 1 and the Track 2 Input (mono) to Onyx Channel 12. I now have discrete channels of audio coming into Pro Tools on independent tracks.

Typically I insert noise reduction plugins on the Mic Input Channel. A Gate basically mutes the channel when there is no signal, and iZotope’s Dialog DeNoiser handles problematic broadband noise in real time. At this stage the Skype Input is recorded with no processing.

Next, both Input Channels are bused out to independent mono Auxiliary Inputs that are hard-panned left + right respectively in preparation to route the passing audio to a Stereo Record bus. To process the mic signal passing through Aux 1 I usually insert something like Waves MaxxVolume, FabFilter’s Pro-DS, and Avid’s Impact Compressor.

For the Skype audio passing through Aux 2, I might insert a gain stage plugin and another instance of Avid’s Impact Compressor. This would keep the Skype audio in check in the event the guest’s delivery is problematic.

The last step is to bus out the processed audio to a Stereo Audio Track with it’s channels hard-panned left + right. This will maintain the channel separation that we established by hard-panning the Aux Inputs. On this track I may insert a Loudness Maximizer and a Peak Limiter. The processed and recorded stereo file will contain the Mic audio on the Left Channel and the Skype audio on the Right Channel.

Finally you’ll notice I have a Loudness Meter inserted on the Master in one of the Pro Tools Post Fader inserts. Once a session is completed I can disarm the “Record” track and monitor the stereo mixdown. Since the Loudness Meter will be operating Post Fader, I can apply a global gain offset using the Master Fader. Output measurements will be accurate. Of course at this point the channels that contain the original discrete mono recordings would need to be muted.


All the recording and processing steps in this session can be executed in real time. You simply define your Inputs, add Inserts, set up panning/routing, and finally arm your tracks to record. You will be able to converse with the Skype guest as you monitor the session through the mixer’s headphone output with no latency issues. When the session ends you will have access to independent mono recordings for both participants and a processed stereo mix with discrete channels.

Note that you can also implement this workflow as a two step process by first recording the Host/Skype session as discrete mono files. Then Bounce to Track (or Disk) to create the stereo mixdown.

Again the efficiency of this workflow will depend on how resourceful your system is. You might consider running Skype on a separate computer. And I reiterate: as you record in the box, consider sending the session audio out to an external recorder for backup.


Technorati Tags: ,

Podcasting System featuring the Allen & Heath XB-10 Console …

I continue to look around for a Broadcast Console that would be suitable to replace my trusty Mackie Onyx 1220i FW mixer. I was always aware of the XB-10 by Allen & Heath, although I did not pay much attention to it due to it’s use of pot-styled channel faders as opposed to sliding (long-throw) faders.


Last evening I skimmed through the manual for the XB-10. Looking past the pot-styled fader issue this $799 console is packed with features that make it highly attractive. And it’s smaller than my Mackie, checking in at 13.2 inches wide x 10 inches deep. Allen & Heath also offers the XB-14-2 Console. It checks in at 15.2 inches wide x 18.3 inches deep with ample surface space for long-throw sliding faders. Bottom line is it’s larger than my Mackie and the size just doesn’t work for me.

XB-10: The Basics

Besides all the useful routing options, the XB-10 has a dedicated Mix-Minus channel that can be switched to receive the output of a Telephone Hybrid or the output of the bi-directional USB bus. In this case it would be easy to receive a Skype guest from a computer.

The console has latching On/Off switches on all input channels, supports pre-fader listening, and has built-in Compressors on channels 1-3. The manual states ” … the Compressor is optimized to reduce the dynamic range of the presenter microphone(s). Low signal levels are given a 10dB gain boost. Soft Knee compression activates at -20dBu, and higher level signals are limited.” Personally I would use a dedicated voice processor for the main presenter. However having the dynamics processing on-board is a useful feature, especially when adding additional presenters to the program mix.

The XB-10 is also equipped with an Output Limiter that can be used to ensure that the final mix does not exceed a predefined level. There is an activation switch located on the back panel of the device with a trim pot control to set the limiting threshold. If the Limiter is active and functioning, a front panel LED illuminates.

One other feature that is worth mentioning is the Remote Connector interface located on the back of the device. This can be used to implement CD player remote triggering, ON AIR light illumination, and external metering options.

I decided to design a system using the XB-10 as the controller that is suitable for flexible Podcast Production and Recording. Bear in mind I don’t have any of these system components on hand except for older versions of the dbx Voice Processor and the Telos Phone Hybrid. I also have a rack-mounted Solid State Recorder by Marantz, similar to the Tascam. I’m confident that all displayed components would work well together yielding excellent results.

Also note there are many ways to integrate these components within the system in terms of connections and routing. This particular design is similar in concept to how I have my current system set up using the components that I currently own (Click to Enlarge).


System Design Concepts and Selections

The mic of choice is the Shure SM7B. The was the first broadcast style mic that I bought back in 2004 and it’s one of my prized possessions. As far as I’m concerned it’s the most forgiving broadcast mic available, with one caveat – it requires a huge amount of clean gain to drive it. Common +60dB gain trims on audio mixers will not be suitable, especially when setting the gain near or at it’s highest level. This will with no doubt result in problematic noise.

In my current system I plug my dynamic mic(s) into my dbx 286a Voice Processor (mic input) and then route the processor’s line output to a line input on one of the Mic channels on my Mackie mixer. By doing so I pick up an additional +40dB of available gain to drive the mic. Of course this takes a bit of tweaking to get the right balance between the gain setting on the processor and the gain setting on the Mackie. The key is not to max out either of the gain stages.

I’ve recreated this chain in the new design using the updated dbx 286s. In doing so the primary presenter gets the voice processor on her channel. If there is the necessity to expand the system by introducing a second presenter, I’ve implemented the Cloudlifter CL-1 gain stage between the mic and the console’s mic input on channel 2. The CL-1 will provide up to +20dB of additional clean gain when using any passive microphone. Finally I point to the availability of the on-board dynamics processor and consider this perfectly suitable for a second presenter.

I mentioned the XB-10 has a dedicated telephone interface channel with a built in mix-minus. Once again I’ve selected the Hx1 Digital Telephone Hybrid by Telos Systems for use in this system. The telephone interface channel can be set to receive an incoming telephone caller or something like the Skype output coming in from a computer. I’ve taken this a step further by also implementing an analog Skype mix-minus using the Console’s Aux Send to feed the computer input. The computer output is routed back into the Console on an available channel(s).

As noted the USB interface on the Console is bi-directional. One use case scenario would be to use the computer USB output to send sound effects and audio assets into the program mix. (I am displaying QCart for Mac as a possible option).

The rest is pretty self explanatory. I’m using the Monitor output bus to feed the studio speakers. The Console’s Main outputs are routed to the Tascam recorder, and it’s outputs are routed to an available set of inputs on the Console.

Like I said I’m fairly confident this system design would be quite functional and well suited for flexible Podcast Production and Recording.

In closing beginning in 2004 besides designing sort of generic systems based on various levels of cost and complexity, it was common for an aspiring Podcast Producer to reach out to me and ask for technical assistance with the components they purchased. In this case I would build detailed diagrams for the producer much the same as the example included in this post. A visual representation of system routing and configuration is a great way to expidite setup when and if the producer who purchased the gear is overwhelmed.


At one time I was providing a service where two individual participants were simultaneously calling into my studio for interview session recording. Since I had two dedicated phone lines and corresponding telephone hybrids, the participants were able two converse with each other using 2 Aux buses, in essence by creating two individual mix-minuses.

Here is the original diagram that I built in October 2006 that displays the routing of the callers via Aux sends:


Even though the XB-10 console contains a single Aux bus, a similar configuration may still be possible where an incoming caller from the telephone hybrid would be able to converse with a Skype guest, minus themselves. I need to read into this further before I am able to make a determination on whether this is supported.


[– Shure SM7B Broadcast Dynamic Microphone
[– Cloudlifter CL-1 Gain Stage
[– Allen & Heath XB-10 Broadcast Console
[– dbx 286s Voice Processor
[– Telos Hx1 Digital Telephone Hybrid
[– Tascam SS-R200 Solid State Recorder


[– QCart for Mac OSX
[– KRK Rokit 5 Powered Studio Monitors


Technorati Tags: , , ,

Podcast Loudness Processing Workflow …

Below is Elixir by Flux. This is an ITU-R BS.1770/EBU R128 compliant multichannel True Peak Limiter. It’s just one of the tools available that can be used in the workflow described below. In this post I also mention the ISL True Peak Limiter by Nugen Audio.

If you have any questions about these tools or Loudness Meters in general, ping me. In fact I think my next article will focus on the importance of learning how to use a Loudness Meter, so stay tuned …


In my previous post I made reference to an audio processing workflow recommended by Thomas Lund. The purpose of this workflow is to effectively process audio files targeting loudness specifications that are suitable for internet and mobile distribution. in other words – Podcasts.

My first exposure to this workflow was reading “Managing Audio Loudness Across Multiple Platforms” written by Mr. Lund and included in the January 2013 edition of Broadcast Engineering Magazine.

Mr. Lund states:

“Mobile and computer devices have a different gain structure and make use of different codecs than domestic AV devices such as television. Tests have been performed to determine the standard operating level on Apple devices.

Based on 1250 music tracks and 210 broadcast programs, the Apple normalization number comes out as -16.2 LKFS (Loudness, K-weighted, relative to Full Scale) on a BS.1770-3 scale.

It is, therefore, suggested that when distributing Podcast or Mobile TV, to use a target level no lower than -16 LKFS. The easiest and best-sounding way to accomplish this is to:

[– Normalize to target level (-24 LKFS)

[– Limit peaks to -9 dBTP (Units for measurement of true peak audio level, relative to full scale)

[– Apply a gain change of +8 dB

Following this procedure, the distinction between foreground and background isn’t blurred, even on low-headroom platforms.”

Here is my interpretation of the steps referenced in the described workflow:

Step 1 – Normalize to target level -24.0 LUFS. (Notice Mr. Lund refers to LKFS instead of LUFS. No worries. Both are the same. LKFS translates to Loudness Units K-Weighted relative to Full Scale).

So how do we accomplish this? Simple – the source file needs to be measured and the existing Program Loudness needs to be established. Once you have this descriptor, it’s simple math. You calculate the difference between the existing Program Loudness and -24.0. The result will give you the initial gain offset that you need to apply.

I’ll point to a few off-line measurement utilities at the end of this post. Of course you can also measure in real time (on-line). In this case you would need to measure the source in it’s entirety in order to arrive upon an accurate Program Loudness measurement.

Keep in mind since random Program Loudness descriptors at the source will vary on a file to file basis, the necessary gain offset to normalize will always be different. In essence this particular step is variable. Conversely steps 2 and 3 in the workflow are static processes. They will never change. The Limiter Ceiling will always be -9.0 dBTP, and the final gain stage will always be + 8dB. The -16.0 LUFS target “math” will only work if the Program Loudness is -24.0 LUFS at the very beginning from file to file.

Think about it – with the Limiter and final gain stage never changing, – if you have two source files where file A checks in at -19.0 LUFS and File B checks in at -21.0 LUFS, the processed outputs will not be the same. On the other hand if you always begin with a measured Program Loudness of -24.0 LUFS, you will be good to go.


[– If your source file checks in at -20.0 LUFS … with -24.0 as the target, the gain offset would be -4.0 dB.


[– If your source file checks in at -15.6 LUFS … with -24.0 as the target, the gain offset would be -8.4 dB.

[– If your source file checks in at -26.0 LUFS … with -24.0 as the target, the gain offset would be +2.0 dB.

[– If your source file checks in at -27.3 LUFS … with -24.0 as the target, the gain offset would be +3.3 dB

In order to maintain accuracy, make sure you use the float values in the calculation. Also – it’s important to properly optimize the source file (see example below) before performing Step 1. I’m referring to dynamics processing, equalization, noise reduction, etc. These options are for the most part subjective. For example if you prefer less compression resulting in wider dynamics, that’s fine. Handle it accordingly.

Moving forward we’ve established how to calculate and apply the necessary gain offset to Loudness Normalize the source audio to -24.0 LUFS. On to the next step …

Step 2 – Pass the processed audio through a True Peak Limiter with it’s Peak Ceiling set to -9.0 dBTP. Typically I set the Channel or “Stereo” Link to 100%, limiting Look Ahead to 1.5ms and Release Time to 150ms.

Step 3 – Apply +8dB of gain.

You’re done.

You can set this up as an on-line process in a DAW, like this:


I’m using the gain adjustment feature in two instances of the Avid Time Adjuster plugin for the initial and final gain offsets. The source file on the track was first measured for Program Loudness. The necessary offset to meet the initial -24.0 LUFS target was -4 dB.

The audio then passes through the Nugen ISL True Peak Limiter with it’s Peak Ceiling set to -9.0 dBTP. Finally the audio is routed through the second instance of the Adjuster plugin adding +8 dB of gain. The Loudness meter displays the Program Loudness after 5 minutes of playback and will accurately display variations in Program Loudness throughout. Bouncing this session will output to the Normalized targets.

Note that you can also apply the initial gain offset, the limiting, and the final gain offset as independent off-line processes. The preliminary measurement of the audio file and gain offset are still required.

Example Workflow

Review the file attributes:


The audio is fairly dynamic. So I apply an initial stage of compression:


Next I apply additional processing options that I feel are necessary to create a suitable intermediate. I reiterate these processing options are entirely subjective. Your desire may be to retain the Loudness Range and/or dynamic attributes present in the original file. If so you will need to process the audio accordingly.

Here is the intermediate:


The Program Loudness for this intermediate file is -20.2 LUFS. The initial gain offset required would be -3.8 dB before proceeding.

After applying the initial gain offset, pass the audio through the limiter, and then apply the final gain stage.

This is the resulting output:


That’s about it. We’re at -16.0 LUFS with a suitable True Peak Max.

I’ve experimented with this workflow countless times and I’ve found the results to be perfectly acceptable. As I previously stated – preparation of your source or intermediate file prior to implementing this three step process is subjective and totally up to you. The key is your output will always be in spec..

Offline Measuring Tools

I can recommend the following tools to measure files “off-line.” I’m sure there are many other options:

[– The new Loudness Meters by TC Electronic support off-line measurements of selected audio clips in Pro Tools (Audio Suite).

[– Auphonic Leveler Batch Processor. I don’t want to discount the availability and effectiveness of the products and services offered by Auphonic. It’s a highly recommended web service and the standalone application that includes high quality audio processing algorithms including Loudness Normalization.

[– Using FFmpeg from the command line.

Example syntax:

ffmpeg -nostats -i yourSourceFile.wav -filter_complex ebur128=peak=true -f null –

[– Using r128x from the command line.

Example syntax:

r128x yourSourceFile.wav

Note there is a Mac only front end (GUI) version of r128x available as well.


Technorati Tags: ,

Fresh Air Podcast: Audio Analysis …

In my No Free Pass for Podcasts post I talked about why the Broadcast Loudness specs. are not necessarily suitable for Podcasts. I noted that the Program Loudness targets for EBU R128 and ATSC A/85 are simply too low for internet and mobile audio distribution. Add excessively dynamic audio to the mix and it will complicate matters further, especially when listeners use mobile devices to consume their media in less than ideal ambient spaces.


Earlier today I was discussing this issue with someone who is well versed in all aspects audio production and loudness processing. He noted that ” … the consensus of it all is, that it is a bad idea to take a really nice standard that leaves plenty of headroom and then start creating new standards with different reference values.” The fix would be to “keep production and storage at -23.0 LUFS and then adjust levels in distribution.” Valid points indeed. However in the real world this mindset is unrealistic, especially in the internet/mobile/Podcasting space.

The fact of the matter is there is no way to avoid the necessity to revise the standards that simply do not work on a platform that consists of unique variables.

And so considering these variables, the implementation of thoughtful, revised, best practices that include platform specific targets for Program Loudness, Loudness Range, and True Peak are unavoidable. Independent Podcasters and network driven Podcasts using arbitrary production techniques and delivery methods simply need direction and guidance in order to comply. In the end it’s all about presenting well produced media to the listener.

Recently I came across a tweet where someone stated “I love the show but it is consistently too quiet to listen to on my phone.” They were referring to the NPR program Fresh Air. I’m not exactly sure if this person was referring to the radio broadcast stream or the distributed Podcast. Either way it’s an interesting assertion that I can directly relate to.

I subscribe to the Fresh Air Podcast. This will probably not surprise you – I refuse to listen to the Podcast right out of the box. When a new show pops up in Instacast, I download the file, decode to WAV, convert to stereo, and then reprocess the audio. I tweak the dynamic range and address show participant audio level variations using various plugins. I then bump things up to -16.0 LUFS (using what I like to refer to as “The Lund Method”) while supplying enough headroom to comply with -1.0 dBTP as my ultimate ceiling. I’ll get into the specifics in a future post.

According to the leading expert Mr. Thomas Lund:

“Mobile and computer devices have a different gain structure and make use of different codecs than domestic AV devices such as television. Tests have been performed to determine the standard operating level on Apple devices. Based on 1250 music tracks and 210 broadcast programs, the Apple normalization number comes out as -16.2LKFS (Loudness, K-weighted, relative to Full Scale) on a BS.1770-3 scale.

It is, therefore, suggested that when distributing podcast or Mobile TV, to use a target level no lower than -16LKFS. The easiest and best-sounding way to accomplish this is to: 1) Normalize to target level (-24LKFS); 2) Limit peaks to -9dBTP (Units for measurement of true peak audio level, relative to full scale); and 3) Apply a gain change of +8dB. Following this procedure, the distinction between foreground and background isn’t blurred, even on low-headroom platforms.”

In this snapshot I demonstrate the described workflow. I’m using two independent instances of the bx_control plugin to apply the gain offsets at various stages of the signal flow. After the initial calculated offset is applied, the audio is routed through the Elixr True Peak Limiter and then out through the second instance of bx_control applying +8dB of static gain. You can also replicate this workflow on an off-line basis. Note that I’ve slightly altered the limiting recommendation.


So why do I feel the need to do this?

Podcast Source

These are the specs. and the waveform overview of a recently published Fresh Air Podcast in it’s entirety:


Next is a 3 min. audio segment lifted from the published Podcast. The stats. display measurements of the attached 3 min. segment:


Podcast Optimized for Internet/Mobile

Below is the same 3 min. segment. I reprocessed the audio to make it suitable for Podcast distribution. The stats. display measurements of the attached audio segment:


The difference between the published source audio and the reprocessed version is quite obvious. The Loudness Normalized audio is so much more intelligible and easier to listen to. In my view the published audio is simply out of spec. and unsuitable for a Podcast.

Bear in mind the condition of the source audio is not uncommon. The problems that persist are not exclusive to podcasts distributed by NPR or by any of their affiliates. Networks with global reach need to recognize their Podcast distribution platforms as important mechanisms to expand their mass appeal.

It has been noted that the Public Radio community in general is exploring ways to enhance the way in which they produce their programs with focus on loudness standardization. My hope hope is this carries over to their Podcast platforms as well.


For more information please refer to “Managing Audio Loudness Across Multiple Platforms” by Thomas Lund at

Technorati Tags: , , ,

No Free Pass for Podcasts …

I think it was in the mid to late 1980’s. I was still living home, totally fixated on what was happening with Television devices, programming and transmission. Mainly the advent of MTS Stereo compatible TV’s and VCR’s. I remember waiting patiently for weekly episodes of programs like Miami Vice and Crime Story to air. I would pipe the program audio through my media system in glorious MTS stereo. For me this was a game changer.


I also remember that it was around the same time that Cable TV became available in the area. I convinced my Mom and Dad to allow me to order it. Initially it was installed on the living room TV, and eventually made it’s way on to additional TV’s throughout our home. For the most part it was a huge improvement in terms of reception and of course program diversity. However there was one issue that struck me from the very beginning:the wide variations in loudness between network TV Shows, Movies, and Adverts. In fact it was common for targeted, poorly produced, and exceedingly loud local commercials to air repeatedly throughout broadcast transmissions. Reaching for the remote to apply volume attenuation was a common occurrence and a major annoyance.

Obviously this was not isolated. The issue was widespread and resulted in a public outcry to correct these inconsistencies. In 2010 The CALM Act was implemented. The United States and Europe (and many other regions) adopted and now regulate loudness standardization guidelines for the benefit of the public at large.

If there is anyone out there who cannot relate to this “former” problem, I for one would be very surprised.

Well guess what? We now have the same exact problem existing on the most ubiquitous media distribution platform in existence – the internet.

I realize any expectation of widespread audio loudness standardization on the internet would be unreasonable. There’s just too much stuff out there. And those who create and distribute the media possess a wide scope of skills. However there is one sort of passionate and now ubiquitous subculture that may be ripe for some level of standardization. Of course I’m referring to the thousands upon thousands of independenlty produced Podcasts available to the masses.

In the past I’ve made similar public references to the following exercise. Just in case you missed it, please try this – at you own risk!

Put on your headphones and queue up this episode of The Audacity to Podcast. Set your playback volume at a comfortable level, sit back, and enjoy. After a few minutes, and without changing your playback volume setting – queue up this episode of the Entrepreneur on Fire podcast.


Need I say more?

From what I gather both programs are quite popular and highly regarded. I have no intension of suggesting that either producer is doing anything wrong. The way in which they process their audio is their artistic right. On the other hand in my view there is one responsibility that they both share. That would be the obligation to deliver well produced content to their subscribers, especially if the Podcast generates a community driven revenue stream. It’s the one thing they will always have in common. And so I ask … wouldn’t it make sense to distribute media following audio processing best practices resulting in some level of consistency within this passionate subculture?

I suspect that some Podcast producers purposely implement extreme Program Loudness levels in an attempt to establish “supremacy on the dial.” This issue also exists in radio broadcast and music production, although things have improved ever since Loudness War participants were called to task with the inception of mandatory compliance guidelines.

I’ve also noticed that many prolific Podcast Producers (including major networks) are publishing content with a total lack of Program Loudness consistency within their own catalogs form show to show. Even more troubling, Podcast aggregation networks rarely specify standardization guidelines for content creators.

It’s important to note that many people who consume audio delivered on the internet do so in less than ideal ambient spaces (automobiles, subways, airplanes etc.) using low-fi gear (ear buds, headphones, mobile devices, and compromised desktop near fields). Simply adopting the broadcast standards wouldn’t work. The existing Program Loudness targets are just not suitable, especially if the media is highly dynamic. The space needs revised specs. that would optimize the listening experience.

Loudness consistency from a Podcast listener’s perspective is solely in the hands of the producers who create the content. In fact it is possible these producers may even share common subscribers. Like I said – the space is ripe for standardization.

Currently loudness compliance recommendations are sparse within this massive community driven network. In my view it’s time to raise awareness. A target specification would universally improve the listening experience and ultimately legitimize the viability of the platform.

For the record, I advocate:

File Format: Stereo, 128kbps minimum.
Program Loudness: -16.0 LUFS with acceptance of a reasonable deviation.
Loudness Range: 8 LU, or less.
True Peak Ceiling: -1.0 dBTP in the distribution file. Of course this may be lower.

Quick note: when I refer to “Podcasts”, in a general sense I’m referring to audio programs and videos/screencasts/tutorials that primarily consist of spoken word soundtracks. Music based Podcasts or cinema styled videos with high impact driven soundtracks may not necessarily translate well when the Loudness Range (and Dynamic Range) is constricted.


Technorati Tags: ,