The more I work with sound synthesis, the more I repeat the mantra that "it's not the patch you use, it's how you modulate it."

When I started out making electronic music, I was fixated on the cool-kid DSP algorithms like granular, FFT, physical modeling, and chaotic oscillators. I assumed my work would be most interesting if I went straight to the nerdiest techniques possible, skipping over the "boring" ones like subtractive, additive, and FM. As I work more with sound synthesis, I've found that the synthesis orthodoxy was right all along. Subtractive, additive, and FM are incredibly powerful synthesis methods, and you could spend a lifetime exploring their parameter space. It's all about layering, finding the sweet spots, and modulating them in the right ways.

Ultimately, both the DSP and the modulation are critical, but modulation has been a major focus for me, and I wanted to write some thoughts about it. Let's tune in to this video tutorial by Frequent demonstrating a workflow in the wavetable synthesizer Serum:

Critical to his sound design process is the spiky automation signal that he punches into the LFO section.

Prolific videomaker SeamlessR multiplies this process into a 16-lane highway of automations, each signal mapped to a different knob:

Their results sound great, but the process is definitely laborious. Since I work with real-time synthesis, I don't use timeline views or bother with manual entry of modulations. Instead, I often approximate SeamlessR-style modulations by mapping numerous independent random LFOs to different parameters.
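A minimal sketch of such a bank of independent random LFOs, in control-rate Python pseudocode (the parameter names are made up, and a real version would run inside a synthesis engine rather than a plain loop):

```python
import random

def random_lfo(rate_hz, sr=100, seed=None):
    """An endless control-rate stream of smoothed random values.

    The LFO picks a new random target at roughly `rate_hz` and linearly
    interpolates toward it, yielding one sample per control tick at
    control rate `sr`.
    """
    rng = random.Random(seed)
    samples_per_step = max(1, int(sr / rate_hz))
    current = rng.random()
    while True:
        target = rng.random()
        for i in range(samples_per_step):
            yield current + (target - current) * i / samples_per_step
        current = target

# One independent LFO per synth parameter (hypothetical names).
params = ["cutoff", "resonance", "wavetable_pos", "detune"]
bank = {p: random_lfo(rate_hz=2.0, seed=i) for i, p in enumerate(params)}

def control_tick():
    """One control-rate update: read every LFO and return a parameter dict."""
    return {p: next(lfo) for p, lfo in bank.items()}
```

Each LFO drifts on its own schedule, which is exactly why the result tends toward texture rather than gesture: nothing coordinates the sixteen lanes.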

Often this is serviceable for use in a piece. Unfortunately, random LFOs tend to lack the directionality and unity that manual modulations have. They often end up sounding like textures, whereas timeline-based methods like those of SeamlessR and Frequent yield results that sound like gestures.

One option is to get multidimensional control data from physical sensors operated by a human. There is an entire conference dedicated to making DIY musical interfaces. There is also one in your pocket: your phone has touch X and Y position, pressure, touch radius, a three-axis accelerometer, and a three-axis gyroscope. Map those ten parameters to a marginally interesting synthesizer patch.
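The mapping itself can be as simple as normalizing each sensor's range to [0, 1] and assigning it to a parameter. A sketch, with entirely made-up sensor ranges and parameter names:

```python
# Hypothetical raw ranges for the ten phone sensor streams.
SENSOR_RANGES = {
    "touch_x": (0.0, 1.0),
    "touch_y": (0.0, 1.0),
    "pressure": (0.0, 1.0),
    "touch_radius": (0.0, 50.0),   # pixels, made up
    "accel_x": (-10.0, 10.0),      # m/s^2
    "accel_y": (-10.0, 10.0),
    "accel_z": (-10.0, 10.0),
    "gyro_x": (-5.0, 5.0),         # rad/s
    "gyro_y": (-5.0, 5.0),
    "gyro_z": (-5.0, 5.0),
}

def map_sensors(readings, targets):
    """Normalize each raw reading to [0, 1] and assign it to a synth
    parameter. `targets` pairs sensor names with parameter names."""
    out = {}
    for sensor, param in targets.items():
        lo, hi = SENSOR_RANGES[sensor]
        x = (readings[sensor] - lo) / (hi - lo)
        out[param] = min(1.0, max(0.0, x))  # clamp out-of-range spikes
    return out
```

In practice the interesting work is not the scaling but choosing which sensor drives which parameter, and nesting the mappings so one gesture moves several knobs at once.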

The results sound more interesting than the random LFOs since 1) a human controller is immediately prone to making gestures, not textures, and 2) there is a rapid feedback loop between the sound and the interface. While playing, you're constantly listening for the sweet spots in the synthesizer patch, and adapting your motions to the synthesizer's strengths.

I shy away from hardware musical interfaces since they have a major drawback: reliability. It takes a lot of expertise in electronics and fabrication to make a DIY interface that doesn't break all the time. Even streaming data from a phone requires a reliable network connection.

We also have the option to record gestural data offline and play it back later. This keeps advantage #1, but forfeits advantage #2. Even then, it can sound pretty great.
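A recorded gesture is just a list of timestamped control values, which can be replayed later at any rate. A minimal sketch of record-and-playback with linear interpolation:

```python
import bisect

class GestureRecorder:
    """Record timestamped control values and replay them later with
    linear interpolation, so a gesture captured live can be reused
    offline as a modulation source."""

    def __init__(self):
        self.times = []
        self.values = []

    def record(self, t, value):
        """Append one (time, value) point; times must be non-decreasing."""
        self.times.append(t)
        self.values.append(value)

    def play(self, t):
        """Value at time t, linearly interpolated between recorded points."""
        if not self.times:
            raise ValueError("nothing recorded")
        if t <= self.times[0]:
            return self.values[0]
        if t >= self.times[-1]:
            return self.values[-1]
        i = bisect.bisect_right(self.times, t)
        t0, t1 = self.times[i - 1], self.times[i]
        v0, v1 = self.values[i - 1], self.values[i]
        return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
```

Because playback is just a function of time, the same gesture can be slowed down, scrubbed, or looped, none of which is possible while performing live.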

You can also sonify data other than gestures. Behold the Dow Jones LFO.
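Any time series can become a control signal by normalizing it and resampling it to control rate. A sketch of this idea (the closing prices below are fabricated sample data, not real market figures):

```python
def series_to_lfo(series, duration_s, sr=100):
    """Turn an arbitrary time series (e.g. daily closing prices) into a
    control signal in [0, 1] lasting `duration_s` seconds at control
    rate `sr`, via linear interpolation between data points."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0
    n_out = int(duration_s * sr)
    out = []
    for j in range(n_out):
        # Fractional position within the series.
        x = j / max(1, n_out - 1) * (len(series) - 1)
        i = int(x)
        frac = x - i
        a = series[i]
        b = series[min(i + 1, len(series) - 1)]
        out.append((a + (b - a) * frac - lo) / span)
    return out

closes = [100.0, 102.5, 101.0, 104.0, 99.5]  # fabricated sample data
lfo = series_to_lfo(closes, duration_s=2.0, sr=50)
```

Stretching a year of data over a few seconds gives the modulation the long-range directionality that independent random LFOs lack.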

Cross-modulation or sidechaining is the act of taking another audio signal and using its amplitude envelope as a modulation source. Vocoding is the same idea, but splits the input signal into numerous frequency bands. What about a vocoder where each frequency band controls some arbitrary synth parameter?
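A sketch of that analysis stage: each band is a bandpass filter followed by a rectifier and one-pole smoother, and instead of driving a matching carrier band, each envelope drives an arbitrary synth parameter. The biquad coefficients follow the widely used RBJ cookbook bandpass formulas; the parameter names are made up:

```python
import math

class Biquad:
    """Bandpass biquad (RBJ cookbook, constant 0 dB peak gain)."""

    def __init__(self, freq, q, sr):
        w0 = 2 * math.pi * freq / sr
        alpha = math.sin(w0) / (2 * q)
        a0 = 1 + alpha
        self.b0, self.b1, self.b2 = alpha / a0, 0.0, -alpha / a0
        self.a1, self.a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0

    def process(self, x):
        y = (self.b0 * x + self.b1 * self.x1 + self.b2 * self.x2
             - self.a1 * self.y1 - self.a2 * self.y2)
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y

class BandEnvelope:
    """Bandpass + rectifier + one-pole smoother: the per-band envelope
    follower at the heart of a vocoder's analysis stage."""

    def __init__(self, freq, sr, q=4.0, smooth_hz=10.0):
        self.bp = Biquad(freq, q, sr)
        self.coef = math.exp(-2 * math.pi * smooth_hz / sr)
        self.env = 0.0

    def process(self, x):
        rectified = abs(self.bp.process(x))
        self.env = rectified + self.coef * (self.env - rectified)
        return self.env

# Analysis bank: each band's envelope drives an arbitrary synth
# parameter instead of a matching carrier band (names are hypothetical).
sr = 44100
freqs = [200, 400, 800, 1600, 3200]
targets = ["cutoff", "fm_index", "wavetable_pos", "grain_rate", "reverb_mix"]
bank = [BandEnvelope(f, sr) for f in freqs]

def analyze(sample):
    """Feed one input sample; return a dict of parameter modulations."""
    return {name: band.process(sample) for name, band in zip(targets, bank)}
```

Feed it speech or drums and the spectral motion of the input becomes five correlated modulation signals, which is one way to get gesture-like unity without a timeline.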

Gesture analysis -> gesture synthesis.

Old patent.

12 Principles of Animation.