
Inside Airwindows' ToTape6 Plugin

Chris Johnson's Airwindows is an audio plugin company run unlike any other. All the 300+ AU and VST plugins are devoid of any graphical interface and are completely free and open source, with continued development supported by Patreon. The vast majority of these plugins are single-purpose effects for mixing and mastering engineers -- EQ filters, distortion units, compressors, DAC simulations, and the occasional delay or reverb.

For a long time, I've wanted to take an Airwindows project and do a detailed breakdown of how the DSP works. We can all learn a few things from Chris, who has developed a refined ear for the design of audio effects with his multiple released iterations on each plugin. In addition, the math that goes into these DSP algorithms is often unusual and quite different from what you find in signal processing textbooks. A lot of open source plugins and DSP learning resources only go over these cookbook algorithms, so it's worth looking at alternative approaches.

There are many plugins to choose from, but I keep coming back to the Airwindows tape simulations, IronOxide and ToTape. I have a background in virtual analog, where we have the luxury of the lumped element model and the mature field of ordinary differential equation solving. By contrast, electromechanical simulations I find extremely daunting, as the phenomena are often not accurately modelable with ODEs. Many tape simulations forgo detailed physical modeling in favor of a more romantic black-box or gray-box approach, capturing the psychoacoustics of tape in broad strokes and carefully tuning the plugin until it sounds realistic. Both IronOxide and ToTape take this route. Chris describes IronOxide as a grungy effect for enhancing specific sounds with distortion, while ToTape is more subtle and transparent, mimicking a reel-to-reel machine. I've gone with ToTape6 as the subject of today's blog post, although I am open to analyzing IronOxide in the future.

As an important note, I'm doing a very precise analysis here, but this doesn't mean that the DSP here is necessarily sacred and holy. Chris's process is essentially jazz improvisation in the medium of DSP. Along the way he makes some decisions I disagree with, and there are things that he himself plans on changing as he iterates on the plugin. It may seem incongruous to do an obsessive breakdown of a plugin developed in a casual manner, but I think both the academic and the intuitive perspectives are valuable in DSP. Chris got ToTape6 to sound good, and I'm here to play the role of scientist and try to explain why it sounds good.

Overview

ToTape6's controls are:

  • input gain

  • soften

  • head bump

  • flutter

  • output gain

  • dry/wet

Tape hiss is not modeled. The plugin processes stereo audio: the left and right channels are processed separately, but they share a single random flutter LFO.

Output gain and dry/wet (which is a linear fade) are applied in the obvious ways. Input gain is more complicated than a simple multiplication, as its position in the signal chain is dependent on whether it is greater than 1; it also has an interaction with the head bump.

Flutter

Flutter is implemented with a delay line with linear interpolation modulated by an LFO. (The DSP-heads in the audience might be thinking, "why not cubic interpolation?" Despite what the math says, it doesn't always sound better.) The LFO is a sine wave whose frequency is modulated by a smooth random signal. The following pseudocode implements the full flutter algorithm:

scale = sample_rate / 44100
flutter_trim = 0.0024 / scale * flutter^2
lfo = 0.5
next_max = 0.5

every sample:
    offset = 70 * scale * flutter^2 * (1 + lfo^2 * sin(phase))
    lfo = lfo * (1 - flutter_trim) + next_max * flutter_trim
    phase += lfo * flutter_trim
    if phase >= 2pi:
        phase -= 2pi
        next_max = 0.24 + random() * 0.74

The output here is offset, which is the number of samples of delay at a given time. The sole control is flutter, ranging from 0 to 1 and defaulting to 0.5. random() generates a random value from 0 to 1.

The smoothed random signal is lfo, which approaches next_max using a one-pole lowpass filter. next_max is a stepped random signal, re-randomized whenever the sine wave completes a cycle.

The flutter input controls three things: the depth of the delay line modulation, the frequency of the LFO, and the smoothing applied to the LFO's frequency signal.
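As a rough, runnable version of the pseudocode above, here is a Python sketch that generates the delay offset for each sample. To keep it simple it is fixed at 44.1 kHz, so that scale is 1. The function name, the starting phase of 0, and the seeded random generator are my own assumptions for illustration and are not taken from the plugin.

```python
import math
import random


def flutter_offsets(flutter, num_samples, seed=0):
    """Yield the delay offset (in samples) for each sample at 44.1 kHz."""
    rng = random.Random(seed)             # stand-in for random()
    depth = 70.0 * flutter ** 2           # scale == 1 at 44.1 kHz
    flutter_trim = 0.0024 * flutter ** 2  # scale == 1 at 44.1 kHz
    lfo = 0.5
    next_max = 0.5
    phase = 0.0                           # assumed starting phase
    for _ in range(num_samples):
        yield depth * (1.0 + lfo ** 2 * math.sin(phase))
        # One-pole smoothing of the stepped random target.
        lfo = lfo * (1.0 - flutter_trim) + next_max * flutter_trim
        phase += lfo * flutter_trim
        if phase >= 2.0 * math.pi:
            phase -= 2.0 * math.pi
            next_max = 0.24 + rng.random() * 0.74


offsets = list(flutter_offsets(flutter=1.0, num_samples=50_000))
print(min(offsets), max(offsets))
```

Because lfo always stays between 0.24 and 0.98, the offset never reaches zero, so the delay line only ever reads from the past.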

The multiplication of the sine wave by lfo^2 deserves some discussion. When a delay line is modulated, a change in pitch is detected, and there is a straightforward relationship between the delay amount and the playback rate (i.e. frequency ratio or perceived pitch shift):

\begin{equation*} \text{rate} = 1 - \frac{d}{dt}\text{delay} \end{equation*}

I call this the "rate-delay theorem," and it really comes in handy when thinking about modulated delays. If the delay is static, the rate is 1 and pitch is unchanged. If the delay signal is modulated with a sine wave LFO of constant frequency, we have \(\text{delay} = \sin \omega t\) plus a constant, so the rate is \(1 - \omega \cos \omega t\). (If \(\omega\) varies over time then this expression is an approximation, albeit a good one if the frequency variation is slow.) This means that as we increase the LFO frequency, the oscillation of the rate will increase not only in frequency, but in amplitude. To compensate for this, it's common to divide by the angular frequency with \(\text{delay} = \frac{1}{\omega} \sin \omega t\) plus constant.
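To make the frequency dependence concrete, here is a small Python sketch that differentiates a sinusoidally modulated delay numerically and compares the result with the analytic peak deviation. The function names, the 10 ms base delay, and the 0.5 ms depth are arbitrary values chosen for illustration.

```python
import math


def delay_seconds(t, depth, lfo_hz, base=0.01):
    """Sinusoidally modulated delay time in seconds."""
    return base + depth * math.sin(2.0 * math.pi * lfo_hz * t)


def playback_rate(t, depth, lfo_hz, dt=1e-6):
    """rate = 1 - d(delay)/dt, with the derivative taken numerically."""
    slope = (delay_seconds(t + dt, depth, lfo_hz)
             - delay_seconds(t - dt, depth, lfo_hz)) / (2.0 * dt)
    return 1.0 - slope


def peak_deviation(depth, lfo_hz):
    """Analytic peak of |rate - 1| for a sine LFO: depth * omega."""
    return depth * 2.0 * math.pi * lfo_hz


depth = 0.0005  # 0.5 ms of modulation depth
for lfo_hz in (1.0, 2.0, 4.0):
    # cos(0) = 1, so t = 0 is where the rate is at its minimum.
    rate = playback_rate(0.0, depth, lfo_hz)
    cents = 1200.0 * math.log2(rate)
    print(f"{lfo_hz} Hz LFO: rate {rate:.6f} ({cents:.2f} cents)")
```

With the depth held fixed, doubling the LFO frequency doubles the pitch deviation, which is exactly the effect the \(\frac{1}{\omega}\) factor is meant to cancel.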

However, this isn't what Chris did in ToTape6. Instead, he goes in the opposite direction, multiplying by a value proportional to the square of frequency: \(\text{delay} = \omega^2 \sin \omega t\). This means that as the LFO frequency randomly varies, it changes both the frequency and the amplitude of the rate oscillation. While the decision to use the square of lfo was probably an ad hoc thing that sounded good, it makes sense -- a pitch wobble with constant amplitude is unlikely to sound realistic, and we might as well reuse the smooth random signal we already have as an amplitude modulator for the rate.
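Treating \(\omega\) as constant and ignoring the overall depth factor, plugging this scaling into the rate-delay relation gives

\begin{equation*} \text{rate} = 1 - \frac{d}{dt}\left(\omega^2 \sin \omega t\right) = 1 - \omega^3 \cos \omega t \end{equation*}

so under those assumptions the amplitude of the pitch wobble grows with the cube of the LFO frequency.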

Chris cautions against overuse of flutter, especially on master. "The most amazingly awesome tape recorders did NOT have loads of flutter," he writes. "Anyone who's mastering and intentionally adds flutter ought to think hard about whether that's really helping." Also, flutter is intended to be used with the dry/wet control set to 1, as a flanging effect will happen if a delayed signal is mixed with a dry signal.

Soften

I've casually looked at a few Airwindows plugins at this point, and a lot of them have a common feature that Chris sometimes calls interleaved filters. To implement an interleaved filter, we take a standard filter but run it at half speed, only using the even-numbered samples of the input signal. We run another identical filter in parallel on the odd-numbered samples. To produce the output of the interleaved filter, we alternate between taking one sample from the first stream, then one sample from the second stream.

Although naive downsampling and upsampling are involved, this doesn't cause any aliasing at all, and merely creates another linear time-invariant filter. It turns out that this transformation is equivalent to replacing every single-sample delay in the filter's block diagram with a two-sample delay.

For example, if we have a one-pole lowpass with the difference equation

\begin{equation*} y[t] = (1 - k) x[t] + k y[t - 1] \end{equation*}

an interleaved version looks like this:

\begin{equation*} y[t] = (1 - k) x[t] + k y[t - 2] \end{equation*}

In terms of the Z-transform, if the original filter is \(H(z)\) then the interleaved filter has transfer function \(G(z) = H(z^2)\). Thus every zero \(z\) of \(H\) becomes two zeros \(\pm \sqrt{z}\) of \(G\), and the same for poles. Because \(G(e^{j\omega}) = H(e^{2j\omega})\), filter interleaving squashes the frequency response from 0 to \(\pi\) into the range \([0, \pi / 2]\), and the upper half of the spectrum mirrors the bottom half. Interleaving therefore turns a lowpass filter into a notch filter and a highpass filter into a bandpass. In both cases, the center frequency is fixed at half of the Nyquist frequency; if our sample rate is 48 kHz, half Nyquist is a high but audible 12 kHz.
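The equivalence between the two descriptions is easy to check numerically. The Python sketch below filters the even and odd samples with separate one-pole lowpasses, re-interleaves them, and compares the result with the two-sample-delay difference equation. The function names and test signal are made up for illustration.

```python
def one_pole_lowpass(x, k):
    """y[t] = (1 - k) x[t] + k y[t - 1], starting from rest."""
    y, prev = [], 0.0
    for sample in x:
        prev = (1.0 - k) * sample + k * prev
        y.append(prev)
    return y


def interleaved(x, k):
    """Filter even and odd samples separately, then re-interleave."""
    out = [0.0] * len(x)
    out[0::2] = one_pole_lowpass(x[0::2], k)
    out[1::2] = one_pole_lowpass(x[1::2], k)
    return out


def two_sample_delay(x, k):
    """y[t] = (1 - k) x[t] + k y[t - 2], starting from rest."""
    y = []
    for t, sample in enumerate(x):
        feedback = y[t - 2] if t >= 2 else 0.0
        y.append((1.0 - k) * sample + k * feedback)
    return y


signal = [1.0, -0.5, 0.25, 0.75, -1.0, 0.0, 0.5, 0.125, -0.25]
a = interleaved(signal, k=0.9)
b = two_sample_delay(signal, k=0.9)
print(max(abs(p - q) for p, q in zip(a, b)))
```

The two outputs agree sample for sample, which is the time-domain counterpart of \(G(z) = H(z^2)\).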

The "soften" feature in ToTape6 starts with a high bandpass filter implemented as an interleaved first-order highpass filter. The following waveshaping function is applied to the highs:

\begin{equation*} s(x) = \text{sgn}(x) (1 - \cos(\min(|x|,1) \pi / 2)) \end{equation*}

The function \(s\) pinches the signal towards zero using quarter sine wave segments and applies hard clipping. As \(s(-x) = -s(x)\), this waveshaping process adds odd harmonics only to a sine wave input.
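A direct Python transcription of \(s\) makes the pinching visible. The function name is mine, and the sample points are arbitrary.

```python
import math


def soften_shape(x):
    """s(x) = sgn(x) * (1 - cos(min(|x|, 1) * pi / 2))."""
    magnitude = 1.0 - math.cos(min(abs(x), 1.0) * math.pi / 2.0)
    return math.copysign(magnitude, x)


for x in (-2.0, -0.5, 0.0, 0.1, 0.5, 1.0):
    print(x, soften_shape(x))
```

Small inputs come out much smaller than they went in, while anything at or beyond full scale lands at (numerically very close to) \(\pm 1\).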

Downstream, the processed highs are subtracted from the dry signal. (Actually, the dry signal is processed a bit before this -- we'll see the full picture later.) The polarity is important here, as I believe the idea is to partially cancel out the high frequencies in a nonlinear, signal-dependent way.

Head bump

Head bump is also accomplished with interleaved filters, but with some key differences. This time, nonlinearities are embedded in the feedback loops.

The way the parallel half-rate filters are combined is distinct -- here, Chris averages the two signals instead of alternating between them. (Sorry, it's a bit hard to explain in prose.) The effect of this is still straightforward, equivalent to alternating the signals as above and then applying a two-point FIR averaging filter with transfer function \(\frac{1 + z^{-1}}{2}\).

Here is pseudocode for the head bump, with comments to help break everything down:

x_1 = 0
x_2 = 0

every sample:

    // Two-sample feedback.
    signal = x_2 + 0.05 * input_signal_1

    // Cubic distortion function.
    signal = signal - signal^3 * 0.12 * sample_rate / 44100

    // Sine wavefolding distortion.
    signal = sin(signal)

    apply interleaved biquad bandpass to signal (see below)

    // Clip and invert the sine wave distortion.
    signal = arcsin(max(min(signal, 1), -1))

    // Apply a signal-dependent waveshaper.
    signal = suppress(signal, 0.00013 * (1 - abs(input_signal_2)))

    // Two-point FIR averaging filter.
    signal = (signal + x_1) / 2
    x_2 = x_1
    x_1 = signal

    head_bump_signal = signal

    // Add to input signal.
    output_signal = input_signal_2 + bump * 0.25 * input_gain * signal

The input audio signals are input_signal_1 and input_signal_2, which come from different stages of processing -- once we see the big picture I'll show how these differ. bump is a user control ranging from 0 to 1, and input_gain is also a control.

The \(\text{suppress}\) function is defined as

\begin{equation*} \text{suppress}(x, r) = x - r \theta(x - r) + r \theta(-x - r) \end{equation*}

where \(\theta\) is the Heaviside step function.
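In code, the step functions become two comparisons. This Python sketch assumes \(\theta(0) = 0\), so a value exactly equal to \(\pm r\) passes through unchanged; that convention is my assumption.

```python
def suppress(x, r):
    """Shrink x toward zero by r, but only when |x| exceeds r."""
    if x > r:
        return x - r
    if x < -r:
        return x + r
    return x


for x in (-0.5, -0.05, 0.0, 0.05, 0.5):
    print(x, suppress(x, 0.1))
```

Unlike a soft threshold, values inside \([-r, r]\) are left alone rather than zeroed, so the function is discontinuous at \(\pm r\). In the head bump \(r\) is at most 0.00013, so the jump is tiny.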

The biquad filter, sans interleaving, is equivalent to the constant 0 dB peak gain bandpass filter from the Audio EQ Cookbook (the formula is written differently, but the results are identical). The center frequency is \(f_0 = 154.35 \text{ Hz}\) and the resonance is \(Q = 9 \cdot 10^{-4}\), so this is an extremely wide filter with a magnitude response mostly very close to 1, but dropping to 0 at dc and Nyquist.

With interleaving, I compute that the attenuation of this filter at 20 Hz is less than \(10^{-4}\text{ dB}\). There is a deep, narrow notch at half Nyquist as a product of the interleaving. Head bump is something I associate with changes to the bass range, so I found this confusing.
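For readers who want to reproduce this kind of number, here is a Python sketch that evaluates the interleaved response as \(G(e^{j\omega}) = H(e^{2j\omega})\). It uses the Audio EQ Cookbook coefficient formulas and assumes that \(f_0\) and \(Q\) describe the biquad before interleaving and that the sample rate is 44.1 kHz. The function names are mine.

```python
import cmath
import math


def bandpass_coeffs(w0, q):
    """Cookbook constant 0 dB peak gain bandpass; w0 in radians/sample."""
    alpha = math.sin(w0) / (2.0 * q)
    b = (alpha, 0.0, -alpha)
    a = (1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha)
    return b, a


def interleaved_magnitude(freq_hz, sample_rate, f0_hz, q):
    """|G(e^{jw})| = |H(e^{2jw})| for the interleaved biquad."""
    b, a = bandpass_coeffs(2.0 * math.pi * f0_hz / sample_rate, q)
    # Interleaving doubles the angle at which H is evaluated.
    z = cmath.exp(2j * (2.0 * math.pi * freq_hz / sample_rate))
    num = b[0] + b[1] / z + b[2] / z ** 2
    den = a[0] + a[1] / z + a[2] / z ** 2
    return abs(num / den)


fs = 44100.0
for f in (20.0, 77.175, 1000.0, fs / 4.0):
    mag = interleaved_magnitude(f, fs, f0_hz=154.35, q=9e-4)
    print(f"{f:9.3f} Hz: |G| = {mag:.8f}")
```

Under these assumptions the response stays within a tiny fraction of a dB of unity across the bass range and collapses to zero at half Nyquist.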

I reached out to Chris to ask what the intent was with this filter, and his response was that the notch is merely an unwanted side effect and that these filters might not be interleaved in the next iteration of ToTape.

Bandpass

Independent of the head bump, another filter is applied to the input signal. It is also an interleaved biquad bandpass filter, but with different settings: \(f_0 = 705.6 \text{ Hz}\) and \(Q = 7 \cdot 10^{-4}\). The effect on bass is still negligible, and like in the above section a deep notch at half Nyquist is introduced. Prior to this linear filter, a sine wavefolder is applied, and after the filter comes a clipper followed by an arcsine waveshaper.

Mojo

"Mojo" is a nonlinear waveshaper with the following formula:

\begin{equation*} m(x) = \frac{\sin\left(\frac{\pi}{2} x |x|^{1/4}\right)}{|x|^{1/4}} \text{ for } x \neq 0,\, m(0) = 0 \end{equation*}

Prior to applying this waveshaper, \(x\) is hard-clipped to the range \([-1,\,+1]\). This is once again an odd function, but unlike the waveshapers we've encountered so far, it is a nonmonotonic wavefolding function that peaks at \(x \approx \pm 0.93\).
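The location of the peak can be checked with a few lines of Python. This sketch folds the hard clip into the function and finds the maximum with a coarse grid search. The grid resolution and the function name are arbitrary choices of mine.

```python
import math


def mojo(x):
    """Hard-clip to [-1, 1], then apply the Mojo waveshaper m(x)."""
    x = max(min(x, 1.0), -1.0)
    root = abs(x) ** 0.25
    if root == 0.0:
        return 0.0
    return math.sin(0.5 * math.pi * x * root) / root


# Coarse grid search for the positive peak.
grid = [i / 10000.0 for i in range(10001)]
peak_x = max(grid, key=mojo)
print(peak_x, mojo(peak_x), mojo(1.0))
```

The output at the peak is slightly larger than the output at \(x = 1\), which is what makes the curve fold over just before the clipping point.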

ADClip

The ADClip algorithm is taken from Chris's previous plugin of the same name. It is a curious stateful clipper that works like so:

clip = 0.99
softness = 2 / (1 + sqrt(5))

last = 0

def do_clip(x_1, x_2, clip):
    if x_1 >= clip:
        if x_2 < clip:
            return clip * softness + x_2 * (1 - softness)
        else:
            return clip
    if x_1 <= -clip:
        if x_2 > -clip:
            return -clip * softness + x_2 * (1 - softness)
        else:
            return -clip
    return x_1

every sample:
    signal = input_signal
    last = do_clip(last, signal, clip)
    signal = do_clip(signal, last, clip)
    output_signal = signal
    // Remember this sample for the next iteration.
    last = signal

Chris describes this:

[I]f a sample clips and the PREVIOUS sample wasn’t, ADClip outputs an intermediate value. And if a sample clips and the NEXT sample isn’t (it runs a sample of latency to do this), likewise.
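Here is a runnable Python sketch of the clipper, handy for feeding in a few samples and watching the intermediate values appear. It assumes that last holds the previous, already clipped output sample at the start of each step. That bookkeeping, along with the function names, is my reading rather than something confirmed against the plugin source.

```python
import math

CLIP = 0.99
SOFTNESS = 2.0 / (1.0 + math.sqrt(5.0))  # about 0.618


def do_clip(x_1, x_2, clip=CLIP):
    """Clip x_1, easing toward x_2 when x_2 is not itself clipped."""
    if x_1 >= clip:
        if x_2 < clip:
            return clip * SOFTNESS + x_2 * (1.0 - SOFTNESS)
        return clip
    if x_1 <= -clip:
        if x_2 > -clip:
            return -clip * SOFTNESS + x_2 * (1.0 - SOFTNESS)
        return -clip
    return x_1


def adclip(samples):
    """Run the stateful clipper over a list of samples."""
    out, last = [], 0.0
    for signal in samples:
        last = do_clip(last, signal)
        signal = do_clip(signal, last)
        out.append(signal)
        last = signal  # assumption: carry the clipped sample forward
    return out


print(adclip([0.5, 1.5, 1.5, 0.5, -2.0]))
```

In this sketch, the first over-threshold sample after a quiet one comes out as a blend of the clip level and the previous sample instead of jumping straight to 0.99.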

The big picture

The pseudocode is as follows:

every sample:
    signal = input_signal
    if input_gain < 1:
        signal = signal * input_gain
    signal = flutter.process(signal)
    highs = soften.process(signal)
    dry = signal
    signal = bandpass.process(signal)
    ground = dry - signal
    if input_gain > 1:
        signal = signal * input_gain
    signal = signal - highs
    head_bump_signal = head_bump.process(dry, signal)
    signal = signal + head_bump_signal * 0.25 * bump * input_gain
    signal = mojo(max(min(signal, 1), -1))
    signal = signal + ground
    signal = signal * output_gain
    signal = adclip.process(signal)
    output_signal = signal * wet + input_signal * (1 - wet)