
Digital Modulation Synthesis

A while back, I tried recreating the sound of a dialup modem from scratch, which involved a dive into multiple PDFs of telecommunication standards. (I am not the first person to do this.) Very helpful was Oona Räisänen’s (aka windytan) beautiful analysis breaking down a spectrogram of a real dialup handshake. A particularly famous bit is the nasal mid-low “whine” at the end of the handshake, and if you’ve ever wondered what that is, you’re hearing line probing signals L1 and L2, which are synthesized using additive synthesis. The exact partial frequencies, phases, and amplitudes are specified in section 10.1.2.4 of ITU-T V.34. They’re at 9 seconds in this recording of a modem by thearchiveguy99 on Freesound:

The whine is legendary, but I want to turn your attention to the high-frequency gurgles before it, starting at about 4 seconds in, some of them resembling slurping the last bit of soda through a straw in a McDonald’s cup. These tones, specified at a high level in ITU-T V.8bis, are encoded using frequency-shift keying as given in the lower-level ITU-T V.21 spec. In its simplest form, FSK uses a single sine wave to encode a binary signal by modulating its frequency. Following V.21’s description of a “channel 1” signal, at regular intervals of 1/300 seconds, a single bit is transmitted. If the bit is 0, the sine wave’s frequency is set to 1180 Hz. If the bit is 1, the frequency is set to 980 Hz.

[image]

[audio]
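Here is a minimal numpy sketch of that scheme, with the sample rate and input data made up for illustration. The instantaneous frequency is integrated into a phase so the waveform stays continuous across bit boundaries:

    # Minimal sketch of V.21 channel-1 FSK: 300 bits/s, 980 Hz for a 1, 1180 Hz for a 0.
    import numpy as np

    sr = 44100                      # sample rate in Hz (assumed)
    bit_rate = 300                  # bits per second, so T = 1/300 s
    samples_per_bit = sr // bit_rate

    bits = np.random.randint(0, 2, size=150)          # made-up data
    freqs = np.where(bits == 1, 980.0, 1180.0)        # per-bit frequency
    freq_signal = np.repeat(freqs, samples_per_bit)   # piecewise-constant frequency signal

    # Integrate the instantaneous frequency so the phase (and the waveform) stays continuous.
    phase = 2 * np.pi * np.cumsum(freq_signal) / sr
    audio = np.sin(phase)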

Designing that dialup imitation made me realize that I really like how FSK sounds purely as a sound design tool. This got me looking into its various siblings, which are known as the “digital modulation” schemes for transmission of digital signals over an analog medium such as radio. This contrasts with “analog modulation,” where the signal being transmitted is continuous; that family encompasses familiar techniques like FM, AM, and single-sideband modulation. Today we’re going to honor the long tradition of adapting telecommunications methods for creative use by seeing what kind of sounds we can design by just directly listening to signals produced with FSK and related methods. I call this strategy Digital Modulation Synthesis, a name I’m not in love with, but I couldn’t think of a better one. It may be abbreviated to the cooler-sounding “DM Synthesis.”

In telecommunications, modulation is only half the story; you also need a demodulator, which takes the transmitted signal and gets the data back. Engineering modulators and demodulators in a way that’s robust to noise is an interesting problem subject to nearly a century of study, but here we’re going to be the weirdos who just want to listen to the modulated signal itself. I will spend no time discussing the engineering tradeoffs of these digital modulation schemes in a telecommunications setting; those problems are very interesting and deep but not relevant to this post.

This post is pretty light on math, and the DSP isn’t fancy, as the focus here is more on sound design and synthesis. I was able to implement DM Synthesis with built-in SuperCollider UGens, and I doubt it would be difficult in environments like Pd, Max, etc.

On-off keying (OOK)

To introduce some terminology, our digital modulation schemes will have at least one carrier frequency, which is just the frequency of the synthesized oscillator, and a signaling rate, which is the number of bits (or, more generally, symbols) transmitted per second. The reciprocal of the signaling rate, and therefore the time interval between bits, is notated \(T\) and given in seconds. In our above example, \(T = 1/300\).

Probably the simplest form of digital modulation is on-off keying (OOK), which encodes a binary signal by turning a sine wave on or off. You start by synthesizing a sine wave at the carrier frequency, and every \(T\) seconds, set the amplitude to either 1 or 0 as determined by the input bit.

[image]

[audio]
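A minimal numpy sketch of OOK, assuming an arbitrary 1200 Hz carrier and a 300 bit-per-second signaling rate:

    # On-off keying: gate a sine wave at the carrier frequency on or off every T seconds.
    import numpy as np

    sr = 44100
    carrier_freq = 1200.0            # carrier frequency (example value)
    signaling_rate = 300             # bits per second, so T = 1/300 s
    samples_per_bit = sr // signaling_rate

    bits = np.random.randint(0, 2, size=200)
    gate = np.repeat(bits, samples_per_bit).astype(float)   # piecewise-constant 0/1 amplitude

    t = np.arange(len(gate)) / sr
    audio = gate * np.sin(2 * np.pi * carrier_freq * t)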

The modulation parameters available to us are the carrier frequency and signaling rate. In particular, the ratio between them is of importance; if the carrier frequency is an integer multiple of the signaling rate, then the sine wave will always resume at the same phase. (This is directly analogous to inharmonic vs. harmonic FM synthesis.)

This is a good time to discuss the input bit stream. This is art, so we have the liberty of making up whatever data we want. The most obvious choice would be to flip a coin for each 0 or 1 bit, but other creative options are available too (a few are sketched in code after this list):

  • An independent random bit stream where the probability of a 1 isn’t necessarily 50%, but is instead controlled as a synthesis parameter.

  • A periodic bitstream, like 0, 1, 1, 0, 1, 1, 0, 1, 1, … These are audibly different from aperiodic signals.

  • Some mathematical pattern like the Thue-Morse sequence, Fibonacci word, characteristic functions of any integer sequence, etc.

  • A random bit stream where successive bits are not independent, generated using Markov chains, etc.

  • Raw data from somewhere else, such as ASCII text or a random file you have sitting around.

  • Switching among any subset of the above.
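Here are a few of those options sketched in numpy; the function names and parameter defaults are my own inventions, not anything standardized:

    # A handful of bit-stream generators for feeding a modulator.
    import numpy as np

    rng = np.random.default_rng()

    def biased_bits(n, p_one=0.8):
        """Independent random bits where the probability of a 1 is a synthesis parameter."""
        return (rng.random(n) < p_one).astype(int)

    def periodic_bits(n, pattern=(0, 1, 1)):
        """A periodic bit stream like 0, 1, 1, 0, 1, 1, ..."""
        return np.resize(np.array(pattern), n)

    def thue_morse_bits(n):
        """Thue-Morse sequence: parity of the number of 1s in the binary expansion of the index."""
        return np.array([bin(i).count("1") % 2 for i in range(n)])

    def markov_bits(n, p_stay=0.9):
        """Two-state Markov chain: each bit repeats the previous one with probability p_stay."""
        bits = np.empty(n, dtype=int)
        bits[0] = rng.integers(0, 2)
        for i in range(1, n):
            bits[i] = bits[i - 1] if rng.random() < p_stay else 1 - bits[i - 1]
        return bits

    def ascii_bits(text="hello modem"):
        """Raw data: unpack ASCII text into its bits."""
        return np.unpackbits(np.frombuffer(text.encode("ascii"), dtype=np.uint8))

    bits = markov_bits(200)   # pick any of the above as the modulator input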

Amplitude-shift keying (ASK)

OOK is a special case of amplitude-shift keying (ASK), where instead of just amplitudes 0 and 1 you can have any fixed number of amplitudes. As such, the input signal is no longer a bit stream, but a discrete signal with a fixed number of values.

[image]

[audio]
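A sketch of 4-level ASK along the same lines, with an arbitrary set of amplitude levels:

    # 4-level ASK: each input symbol picks one of a fixed set of amplitudes.
    import numpy as np

    sr = 44100
    carrier_freq = 1200.0
    signaling_rate = 300
    samples_per_symbol = sr // signaling_rate

    amplitudes = np.array([0.0, 0.33, 0.66, 1.0])        # fixed amplitude levels (arbitrary choice)
    symbols = np.random.randint(0, 4, size=200)          # discrete input signal with 4 values
    amp = np.repeat(amplitudes[symbols], samples_per_symbol)

    t = np.arange(len(amp)) / sr
    audio = amp * np.sin(2 * np.pi * carrier_freq * t)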

Frequency-shift keying (FSK)

I have already introduced FSK, which modulates a bit stream using a single sine wave that jumps between two carrier frequencies depending on the input bit. To clarify, the transitions between the frequencies are immediate, but the sine wave’s phase is continuous and the jumps do not actually create a discontinuity in the signal. (It’s not properly FSK if you’re just multiplexing between two independent free-running sine waves, but also it’s okay to do that if it sounds good to you.)

FSK is essentially frequency modulation where the modulator signal is binary.

FSK vowel synthesis: Since it generates two frequency peaks, FSK can be used as a formant synthesizer. The human ear is incredibly sensitive to audio with speech-like qualities, and in my experience only needs to hear two formants moving in opposite directions in the 500-3000 Hz range to start getting the impression of something vocal. Long ago I learned that you can make extremely bright growls by FMing a sine wave with a square wave and messing with the modulation index. So here’s a generalization of that: you have formants \(f_1\) and \(f_2\), which you use as the two carrier frequencies for an FSK scheme.

If you need more formants, you can either mix together multiple FSK instances, or use three or more frequencies (see next section).
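Here is a rough numpy sketch of the two-formant idea; the formant frequencies and signaling rate are just example values:

    # FSK "vowel": use two formant frequencies as the FSK carriers, keeping the phase continuous.
    import numpy as np

    sr = 44100
    signaling_rate = 100                 # slower rate makes the formant movement easier to hear
    samples_per_bit = sr // signaling_rate

    f1, f2 = 700.0, 1200.0               # two "formants" (example values)
    bits = np.random.randint(0, 2, size=300)
    freq_signal = np.repeat(np.where(bits == 1, f1, f2), samples_per_bit)

    # Integrate instantaneous frequency so the sine wave's phase never jumps.
    phase = 2 * np.pi * np.cumsum(freq_signal) / sr
    audio = np.sin(phase)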

Multiple frequency-shift keying (MFSK)

FSK usually refers to only two carrier frequencies, but you can use as many as you want to modulate an integer signal with 3 or more discrete values. The extension is straightforward to explain, but consider the musical options: you can tune these frequencies to vocal formants, or to a scale or chord, or harmonics of a fundamental, or to peak frequencies analyzed in an input signal.
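For instance, here is a sketch with the carrier frequencies tuned to a chord; the chord and signaling rate are arbitrary choices:

    # MFSK with four carrier frequencies tuned to a chord.
    import numpy as np

    sr = 44100
    signaling_rate = 50
    samples_per_symbol = sr // signaling_rate

    carrier_freqs = np.array([261.63, 311.13, 392.00, 466.16])   # C, Eb, G, Bb (a Cm7 chord), in Hz
    symbols = np.random.randint(0, len(carrier_freqs), size=100)
    freq_signal = np.repeat(carrier_freqs[symbols], samples_per_symbol)

    # Phase-continuous, as with two-frequency FSK.
    phase = 2 * np.pi * np.cumsum(freq_signal) / sr
    audio = np.sin(phase)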

Phase-shift keying (PSK)

Phase-shift keying has only a single carrier frequency, synthesized with a sine wave, but its phase is modulated in response to the input signal.

In binary phase-shift keying (BPSK), we synthesize a carrier sine wave \(x = \sin(2\pi f_c t)\). Every \(T\)-second interval, if the input bit is 1 then output \(x\); if it is 0 then output \(-x\), a 180-degree phase shift. Here the ratio between the signaling rate and the carrier frequency matters: \(T\) is chosen to span an integer number of carrier periods, so that the sign flips occur when \(x = 0\) and there are no discontinuities in the signal.

[image]

[audio]
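A minimal BPSK sketch; the carrier and signaling rate are chosen so each bit spans exactly four carrier periods, keeping the sign flips on zero crossings:

    # BPSK: multiply the carrier by +1 or -1 per bit.
    import numpy as np

    sr = 44100
    carrier_freq = 1200.0
    signaling_rate = 300                      # 1200 / 300 = 4 carrier periods per bit
    samples_per_bit = sr // signaling_rate

    bits = np.random.randint(0, 2, size=200)
    sign = np.repeat(np.where(bits == 1, 1.0, -1.0), samples_per_bit)

    t = np.arange(len(sign)) / sr
    audio = sign * np.sin(2 * np.pi * carrier_freq * t)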

Quadrature phase-shift keying (QPSK)

In quadrature phase-shift keying (QPSK), the input signal is no longer binary but has four different values. Let’s say the input signal is \(b(t)\) which takes on values 0, 1, 2, or 3 and is piecewise constant at intervals of \(T\) seconds. The synthesized signal is

\begin{equation*} x = \sin(2\pi f_c t + b \pi / 2) \end{equation*}

In synthesis terms this is phase modulation where the modulator may only take on the values \(0, \pi/2, \pi, 3\pi/2\). This scheme will produce discontinuities in the modulated signal. Again, in telecommunications \(T\) is chosen to be an integer multiple of the carrier period \(1/f_c\).

[image]

[audio]
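A sketch of QPSK following the formula above, with an arbitrary carrier and signaling rate:

    # QPSK: the 4-valued input signal b selects a phase offset of b * pi/2.
    import numpy as np

    sr = 44100
    carrier_freq = 1200.0
    signaling_rate = 300
    samples_per_symbol = sr // signaling_rate

    b = np.random.randint(0, 4, size=200)                      # input signal with values 0..3
    phase_offset = np.repeat(b * np.pi / 2, samples_per_symbol)

    t = np.arange(len(phase_offset)) / sr
    audio = np.sin(2 * np.pi * carrier_freq * t + phase_offset)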

Offset quadrature phase-shift keying (OQPSK) is a further variant of QPSK with the constraint that the phase may only jump in 90-degree increments. That is, \(b(t)\) may not jump directly between 0 and 2 or between 1 and 3. (The output signal is still discontinuous.) So each interval is really encoding a single bit, as each sample of \(b(t)\) either steps up or down by 1 (modulo 4).

[image]

[audio]
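A sketch of that constraint as described above: the input is a stream of up/down steps, and the phase index walks around the constellation one 90-degree step at a time.

    # OQPSK-style modulation: b(t) only steps up or down by 1 (mod 4), so the phase never
    # jumps by 180 degrees. Each interval therefore carries one bit (up or down).
    import numpy as np

    sr = 44100
    carrier_freq = 1200.0
    signaling_rate = 300
    samples_per_symbol = sr // signaling_rate

    steps = np.random.choice([-1, 1], size=200)     # one bit per interval: step up or down
    b = np.cumsum(steps) % 4                        # phase index walks around the constellation
    phase_offset = np.repeat(b * np.pi / 2, samples_per_symbol)

    t = np.arange(len(phase_offset)) / sr
    audio = np.sin(2 * np.pi * carrier_freq * t + phase_offset)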

For completeness, also in the QPSK family is minimum-shift keying, which is technically a variant of OQPSK but magically turns out to be a special case of FSK such that the two carrier frequencies are \(f_c \pm 1/(4T)\) and \(f_c\) is an integral multiple of \(1/(4T)\). Differential phase-shift keying, as I understand it, is more of a preprocessing step for the input signal and doesn’t need to be treated differently if we’re just making up data.

Other phase-shift keying variants

There’s also some others like “\(\pi/4\)-QPSK,” but broadly, we have the freedom to make up our own PSK variants by configuring the following:

  • A discrete set of phase offsets. For example, in BPSK the allowed phases are \(0\) and \(\pi\), and in (O)QPSK they are \(0, \pi/2, \pi, 3\pi/2\). You can divvy the circle up into as many phases as you want.

  • Constraints on how the phase offsets can change between successive samples. BPSK and QPSK have no constraints, but OQPSK adds the constraint that forbids 180-degree jumps.

To visualize this, telecommunications people use a constellation diagram, which places the phases as points on the unit circle and then connects together the allowed transitions using edges.

[constellation diagram]

As an aside, constellation diagrams actually show up in the real world if you run the signal through an oscilloscope, putting the signal on the X axis and a 90-degree phase-shifted version of it on Y.

Gaussian frequency-shift keying (GFSK)

GFSK, used in Bluetooth, is binary FSK where, prior to performing frequency modulation, the modulation signal is smoothed out using a Gaussian lowpass filter. (A Gaussian lowpass filter has an impulse response that’s just the normal distribution. Interestingly, the Fourier transform of a Gaussian function is a Gaussian function.) A 1984 paper by Goode tells me that they implemented this lowpass filter as an FIR filter.

We can generalize this by putting basically any EQ filter between the modulator and carrier in an FSK scheme. The Gaussian filter was chosen because it doesn’t have any Gibbs phenomenon (time-domain overshoot in the filtered signal), but we aren’t beholden to that.
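Here is a rough GFSK sketch that smooths the bit stream with a Gaussian FIR kernel before the frequency modulation; the kernel width is a guess for illustration, not anything from the Bluetooth spec:

    # GFSK: lowpass the 0/1 modulator with a Gaussian FIR kernel, then frequency-modulate.
    import numpy as np

    sr = 44100
    signaling_rate = 300
    samples_per_bit = sr // signaling_rate

    bits = np.random.randint(0, 2, size=200)
    square = np.repeat(bits.astype(float), samples_per_bit)      # raw piecewise-constant modulator

    # Gaussian FIR kernel, about one bit period wide (width chosen by ear).
    n = np.arange(-samples_per_bit, samples_per_bit + 1)
    kernel = np.exp(-0.5 * (n / (samples_per_bit / 3)) ** 2)
    kernel /= kernel.sum()
    smooth = np.convolve(square, kernel, mode="same")            # smoothed modulator, still 0..1

    freq_signal = 1180.0 + smooth * (980.0 - 1180.0)             # glide between the two carriers
    phase = 2 * np.pi * np.cumsum(freq_signal) / sr
    audio = np.sin(phase)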

Quadrature Amplitude Modulation (QAM)

QAM is actually more often an analog modulation technique, but here we’re looking at the digital version. In QAM both the phase and the amplitude of the carrier sine wave are modulated simultaneously, drawing from a fixed set of (phase, amplitude) pairs. Here we generalize the constellation diagram so that the phase offset is still plotted as an angle, but now we also plot amplitude using distance from the origin.

This is 16-QAM, a fairly common scheme. The input signal to be modulated takes on 16 different values, or 4 bits.

[image]
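A sketch of 16-QAM using a square 4x4 constellation (a common layout, though the exact constellation and bit mapping here are my own simplification):

    # 16-QAM: each 4-bit symbol picks a fixed (amplitude, phase) pair from a constellation.
    import numpy as np

    sr = 44100
    carrier_freq = 1200.0
    signaling_rate = 300
    samples_per_symbol = sr // signaling_rate

    # Constellation: complex points on a 4x4 grid; magnitude = amplitude, angle = phase offset.
    levels = np.array([-3, -1, 1, 3])
    constellation = (levels[:, None] + 1j * levels[None, :]).ravel() / (3 * np.sqrt(2))

    symbols = np.random.randint(0, 16, size=200)                 # 4 bits per symbol
    points = np.repeat(constellation[symbols], samples_per_symbol)

    t = np.arange(len(points)) / sr
    audio = np.abs(points) * np.sin(2 * np.pi * carrier_freq * t + np.angle(points))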

Miscellany

This isn’t the full gamut of digital modulation techniques that have appeared in the literature. Instead of piling on more established methods, I’ll throw in some wilder ones that we have the freedom to do in a synthesizer:

  • In all of these methods, the sine wave carrier can be swapped out for a non-sine-wave carrier.

  • In FSK-like methods, the “carrier” can be not an oscillator but the cutoff of a filter (sketched in code after this list).

  • QAM (if \(T\) is a multiple of the period) can be thought of as drawing from 16 short segments of audio, each one selected by the input signal. Those segments of audio need not be sine waves; they can be audio from any source or synthesized with any method.
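As an example of the filter-cutoff idea, here is a sketch that keys the cutoff of a simple one-pole lowpass on a sawtooth between two values; all parameter values are arbitrary:

    # "Filter cutoff as carrier": key a one-pole lowpass cutoff between two values every T seconds.
    import numpy as np

    sr = 44100
    signaling_rate = 30
    samples_per_bit = sr // signaling_rate

    bits = np.random.randint(0, 2, size=60)
    cutoff = np.repeat(np.where(bits == 1, 400.0, 3000.0), samples_per_bit)  # cutoff in Hz

    t = np.arange(len(cutoff)) / sr
    saw = 2 * ((55.0 * t) % 1.0) - 1.0          # 55 Hz sawtooth source

    # One-pole lowpass with a per-sample coefficient derived from the keyed cutoff.
    coeff = np.exp(-2 * np.pi * cutoff / sr)
    audio = np.zeros_like(saw)
    y = 0.0
    for i in range(len(saw)):
        y = (1 - coeff[i]) * saw[i] + coeff[i] * y
        audio[i] = y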

Discussion

I mentioned in the intro that there’s a long history of creative music DSP arising from telecommunications. That’s an understatement, because the entire field of signal processing basically originated from people designing communication systems, and most effects used in music production originated in that field. FM synthesis of course developed from FM radio, but we also have frequency shifters using tech from SSB radios. The first vocoder, SIGSALY, was developed for the military. A lot of the theory and practice of analog and digital filter design was originally created for radio.

I can envision four different broad approaches for employing codecs in musical applications:

  • Type 1: data → encode as an audio signal.

  • Type 2a: input audio → encode → decode.

  • Type 2b: input audio → encode → add noise or other effects → decode.

  • Type 3: input audio → decode as another audio signal.

  • Type 4: input audio → decode to some signal → encode as audio signal.

Here I’ve explored type 1. Types 2a and 2b would describe things like bitcrushers (there’s a surprising number of ways to bitcrush audio, possibly the subject of a future post), linear predictive coding, or audio effects that simulate MP3 artifacts – you encode the signal, usually in some lossy way, and then decode it. Types 3 and 4 would involve demodulating audio signals that were not necessarily intended to be demodulated.