Decades after Curtis Roads’ Microsound, granular synthesis is making appearances here and there in the commercial plugin market. While it’s nice to see a wider audience for left-field sound design, I have my quibbles with some of the products out there. From what I’ve heard, so many of these products’ demos are covered in reverb in obvious compensation for something, suggesting that the plugins are best suited for background textures and transitional moments. In place of sound, the developers seem to prioritize graphics — does watching 3D particles fly around in a physics simulation inspire the process of music production, or distract from it?
Finally, and most importantly, so many granular “synths” are in fact samplers based on buffer playback. The resulting sound is highly dependent on the sampled source, almost more so than the granular transformations. Sample-based granular (including sampling live input such as in Ableton Live’s Grain Delay) is fun and I’ve done it, but in many ways it’s become the default approach to granular. This leaves you and me, the sound design obsessives, with an opportunity to explore an underutilized alternative to sampled grains: synthesized grains.
This post introduces a possibly novel approach to granular synthesis that I call Correlated Granular Synthesis. The intent is specifically to design an approach to granular that can produce musical results with synthesized grains. Sample-based granular can also serve as a backend, but the idea is to work with the inherent “unflattering” quality of pure synthesis instead of piggybacking off the timbres baked into the average sample.
Correlated Granular Synthesis is well suited for randomization in algorithmic music contexts. Here’s a random sequence of grain clouds generated with this method:
This post is my attempt at explaining my own philosophy of sound design. It’s not in final form, and subject to amendment in the future.
The type of sound design I refer to is specific to my own practice: the creation of sounds for electronic music, especially experimental music, and especially music produced using pure synthesis as opposed to recorded or sampled sound. The ideas presented here might have broader applications, but I have no delusions that they’re in any way universal.
The theory I expound on isn’t in the form of constraints or value judgements, but rather a set of traits. Some are general, some specific, and my hope is that considering how a sound or a piece relates to these traits will take me (and possibly you) in some directions not otherwise considered. Some of the traits contain assertions like “if your sound does X, your audience will feel Y,” which in a composition may be employed directly, carefully avoided, or deconstructed. You’ll also note that the theory is concerned mostly with the final product and its impact on the listener, not so much the compositional or technical process. (Friends and collaborators are well aware that I use highly idiosyncratic and constrained processes, but those are less about creating music and more about “creating creating music.”)
No theory of sound design will replace actually working on sound design. Sound design isn’t a spectator sport, nor a cerebral exercise, and it has to be practiced regularly like a musical instrument. Reading this post alone is unlikely to make you a better sound designer, but if it’s a useful supplement to time spent in the studio, I’d consider this work of writing a success.
I will deliberately avoid talking about the topics of melody, harmony, counterpoint, consonance vs. dissonance, and tuning systems, and I’ll only talk about rhythm abstractly. There are many existing resources dedicated to these topics in a wide variety of musical cultures.
Left: spectrogram of a child singing. Right: spectrogram of resynthesized audio.
Background
I was alerted to audio texture resynthesis methods by a student of mine who was interested in the collaborative work of researcher Vincent Lostanlen, musician Florian Hecker, and several others [Lostanlen2019][Lostanlen2021][Andén2019][Muradeli2022]. Their efforts are built on an analysis method called “Joint Time-Frequency Scattering” (JTFS), based on the Continuous Wavelet Transform. In an attempt to understand the work better, I binged a wavelet transform textbook, [1] implemented a simplified version of JTFS-based resynthesis, and briefly exchanged emails with Lostanlen. His helpful answers gave me the impression that while JTFS is a powerful analysis technique, resynthesis was more of a side project, and there are ways to accomplish similar effects that are more efficient and easier to code without compromising too much on musicality.
Audio texture resynthesis has some history in computer music literature [Schwartz2010], and some researchers have used resynthesis to help understand how the human brain processes audio [McDermott2011].
After some experimentation with these methods, I found that it’s not too hard to build a simple audio texture resynthesizer that exhibits clear musical potential. In this blog post, I’ll walk through a basic technique for making such a system yourself. There won’t be any novel research here, just a demonstration of a minimum viable resynthesizer and my ideas on how to expand on it.
Algorithm
The above-mentioned papers have used fancy techniques including the wavelet transform and auditory filter banks modeled after the human ear. However, I was able to get decent results with a standard STFT magnitude spectrogram, followed by phase reconstruction to get back to time-domain audio samples. The full process looks like this:
Compute a magnitude spectrogram \(S\) of the time-domain input signal \(x\). A fairly high overlap is advised.
Compute any number of feature vectors \(F_1(S),\, F_2(S),\, \ldots,\, F_n(S)\) and define their concatenation as \(F(S)\).
Initialize a randomized magnitude spectrogram \(\hat{S}\).
Use gradient descent on \(\hat{S}\) to minimize the error \(E(\hat{S}) = ||F(S) - F(\hat{S})||\), where \(||\cdot||\) can be any reasonable distance measure, such as the squared \(L^2\) error.
Use phase reconstruction such as the Griffin-Lim algorithm on \(\hat{S}\) to produce a resynthesized signal \(\hat{x}\).
The cornerstone of making this algorithm work well is choosing an \(F(S)\) that’s differentiable (or reasonably close). This means that the gradient \(\nabla E\) can be computed with automatic differentiation (classical backpropagation). As such, this algorithm is best implemented in a differentiable computing environment like PyTorch or TensorFlow.
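To make the steps above concrete, here’s a minimal PyTorch sketch of the whole pipeline, using torchaudio for file I/O and Griffin-Lim. The feature function (per-bin mean and standard deviation pooled across time) is a deliberately simple stand-in, and the filename, FFT settings, and optimizer hyperparameters are placeholders rather than recommendations:

```python
import torch
import torchaudio

N_FFT, HOP = 1024, 256  # fairly high overlap

def magnitude_spectrogram(x):
    window = torch.hann_window(N_FFT)
    spec = torch.stft(x, N_FFT, hop_length=HOP, window=window, return_complex=True)
    return spec.abs()  # 2D tensor: frequency x time

def features(S):
    # Pool across the time axis so large-scale structure is discarded.
    return torch.cat([S.mean(dim=1), S.std(dim=1)])

x, sample_rate = torchaudio.load("input.wav")  # placeholder input file
x = x.mean(dim=0)                              # mix down to mono
S = magnitude_spectrogram(x)
target = features(S)

# Initialize a random magnitude spectrogram and optimize it to match the features.
S_hat = torch.rand_like(S, requires_grad=True)
optimizer = torch.optim.Adam([S_hat], lr=1e-2)
for step in range(2000):
    optimizer.zero_grad()
    loss = torch.sum((features(S_hat.clamp(min=0.0)) - target) ** 2)
    loss.backward()
    optimizer.step()

# Phase reconstruction; power=1.0 because we pass a magnitude (not power) spectrogram.
griffin_lim = torchaudio.transforms.GriffinLim(n_fft=N_FFT, hop_length=HOP, power=1.0)
x_hat = griffin_lim(S_hat.detach().clamp(min=0.0))
torchaudio.save("resynth.wav", x_hat.unsqueeze(0), sample_rate)
```

Swapping in richer or differently weighted features only requires changing features(); the optimization loop stays the same.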
The features \(F(S)\), as well as their relative weights, greatly affect the sound. If \(F(S)\) is highly time-dependent then the resynthesized signal will mimic the original in evolution. On the other hand, if \(F(S)\) does a lot of pooling across the time axis then the resynthesized signal will mostly ignore the large-scale structure of the input signal. I’m mostly interested in the latter case, where \(F(S)\) significantly “remixes” the input signal and disregards the overall structure of the original.
We will represent \(S\) as a 2D tensor where the first dimension is frequency and the second is time. As a matrix, each row is an FFT bin, and each column a frame.
If using a fancy alternative to the magnitude spectrogram such as the CWT or cochlear filter banks, you may have to do gradient descent all the way back to the time-domain samples \(x\). These analysis methods boil down to linear frequency transforms producing complex numbers, followed by taking the absolute value of each bin, so differentiability is maintained.
I’m often asked where to find the best resources for learning SuperCollider. My informed and professional answer, which comes from over a decade of experience with this software as a user, developer, and educator, is as follows:
I’m not really sure?
The landscape of SC learning materials has changed a lot since I started in 2012/13. I’ve heard good things about Eli Fieldsteel’s video tutorials and Bruno Ruviaro’s “A Gentle Introduction” ebook, so I guess check those out. I haven’t perused their tutorials at length, but Eli and Bruno are both seasoned professionals at SC education, so you can’t really go wrong there. Also, I have my own SuperCollider Tips blog post (which I have just updated today), which is not a structured tutorial but addresses common beginner problems. But ultimately, the most efficient learning strategy really depends on what you intend to make, because the applications for SC are so diverse, as are the backgrounds of users coming to SC.
That’s probably not too enlightening. However, I do have a lot of advice for beginning SC users, drawing from my own tortuous path through learning to make electronic music.
What follows are a bunch of guidelines that I recommend beginning SC users follow, or at least consider. A good number of them are specific to my idiosyncratic approach to SC, which I’ve expounded on in my YouTube presence: heavy focus on sound design through synthesis, and only sporadic use of hardware and real-time interaction. Also, although some of my advice may help live coders, I’m pretty uninterested in live coding myself, so I can’t give specific advice there (creative coding I’m all for, but when I share my screen it’s for education, not seamless performance art).
One blog post I’ve been meaning to write for a while is a comprehensive review of the design of dynamic range compressors and limiters, both digital and analog. Textbook compressor designs can be easily found, but like reverbs there are lots of weird little tricks from both hardware and software designs that supposedly define the distinctive musical character of different compressors. It may be a while before I finish that post because, while I’ve read a lot about the DSP of compressors, I don’t yet feel qualified to write about their design. I haven’t yet designed a compressor plugin that I’m happy with, nor done a lot of compressor wine tasting, and the musical and psychoacoustic aspects of compressors are to me at least as important as the signal math.
Nevertheless, there’s a weird corner of compressor design that I feel inspired to talk about, and it’s called negative compression. It’s a feature of a few commercial compressors; I’m not sure which was the first, but I first learned about the concept from Klanghelm DC1A. Negative comp is the source of considerable confusion – just watch the Gearspace pundits go at it.
The brief description is that a standard compressor, upon receiving a signal with increasing amplitude, will reach a point where the output amplitude will increase at a slower rate. If the compressor is a perfect limiter, the output amplitude will hit a hard limit and refuse to increase. A negative compressor takes it further – the output signal will eventually get quieter over time as the amplitude increases. If you feed a percussive signal into a negative compressor and drive it hard enough, it will punch a hole in the signal’s amplitude, and can split a transient in two. It can be a pretty bizarre effect, and seems underutilized.
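To make those curves concrete, here’s a minimal Python sketch (my own illustration, not code from any particular product) of a static compression curve in decibels. Above the threshold the output level rises with slope 1/ratio; letting that slope go negative gives the behavior described above:

```python
# Static compression curve in decibels: below the threshold the signal passes
# through; above it, output level rises with slope 1/ratio. A negative ratio
# makes the slope negative, so louder input yields quieter output.
def compressed_level_db(in_db, threshold_db=-20.0, ratio=4.0):
    if in_db <= threshold_db:
        return in_db
    return threshold_db + (in_db - threshold_db) / ratio

# ratio = 4    -> ordinary 4:1 compression
# ratio -> inf -> hard limiting (output pinned at the threshold)
# ratio = -2   -> negative compression: 6 dB more input gives 3 dB less output
for ratio in (4.0, 1e9, -2.0):
    print([round(compressed_level_db(x, ratio=ratio), 1) for x in (-30, -20, -10, 0)])
```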
This explanation should be enough for most, but you know this blog. We do the math here. In this post, I will explain the basic mathematics of compressors to demystify negative compression, propose variants of negative compressors, and demonstrate how to do negative compression in SuperCollider.
Exactly three years ago I made an ambient EP under the name “Interstate Hydra” and put it up on Bandcamp. Up until now I have kept this alias a secret and only shown these tunes to a few people, but I figured it’s been long enough.
Nocturnes was made entirely in Audacity using the “sound dumplings” method described in an earlier post. This was a total 180 from my usual workflow, which is to synthesize everything with SuperCollider code.
“For Ellie” is a single released months after Nocturnes, and is formally the simplest Interstate Hydra track, comprising a row of sound dumplings in arch form.
I don’t plan on returning to the Interstate Hydra alias for quite some time, but I hope you enjoy these tracks. Thank you for listening.
In 2005, Clarence Barlow published a paper on Intra-Samplar Interpolating Sinusoids (ISIS), an audio analysis-synthesis algorithm. It did not make much of a splash, with the paper having only 7 citations in Google Scholar. Nevertheless, it produces some interesting sounds, so let’s dive into it.
First, some context. The precursor to ISIS is a technique Barlow calls spectastics. In this method, the short-time Fourier transform of an audio signal is computed, and at each frame, the magnitude spectrum is resampled logarithmically to 12EDO and used as a probability distribution to select a pitch. The pitch sequence forms an extremely rapid melody, which can be synthesized or played on a robotic instrument. Barlow describes the spectasized melody as “remarkably like the original sound recording.”
In ISIS, this concept is taken to an extreme by making the “melody” a constant amplitude sine wave whose frequency is changed every sample. Given a digital signal that doesn’t exceed ±1, we can interpolate between any two successive samples with a partial cycle of a sine wave. An image helps here; the dots show the sampled digital signal.
For example, if the samples are 0 and 1, as seen in the first two samples in the image, we can interpolate with a quarter sine wave with a period of 4 samples and a frequency of 1/4th the sample rate. There are actually infinitely many ways to do this interpolation. For example, from 0 to 1 we can also have a sine wave that completes 5/4ths of a cycle. We can even assume that the initial phase of the sine wave is \(\pi\) and the final phase \(5\pi/2\), or have the sine wave going backwards with phase ramping from 0 to \(-3\pi/2\).
To resolve the ambiguity, ISIS restricts the frequency to be always nonnegative so the phase never goes backward, and always assumes the phase at every sample point is in the range \([-\pi/2, \pi/2]\) modulo \(2\pi\). There are other approaches, such as picking the minimum possible nonnegative frequency or the frequency of minimum absolute value, which may produce interesting alternative sounds. I won’t get into these (this is a relatively low-effort post), but feel free to try them out.
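Here’s a short NumPy sketch of one natural reading of that constraint: keep each sample’s phase congruent to its arcsine (which lies in [-π/2, π/2]) and always pick the smallest phase increment that keeps the phase from going backward. This is my own reconstruction, not Barlow’s code:

```python
import numpy as np

def isis_phases(x):
    # Phase at each sample: congruent (mod 2*pi) to arcsin(x[n]), which lies in
    # [-pi/2, pi/2], and non-decreasing so the frequency is never negative.
    principal = np.arcsin(np.clip(x, -1.0, 1.0))
    phases = np.empty_like(principal)
    phases[0] = principal[0]
    for n in range(1, len(x)):
        # Smallest admissible phase that is >= the previous sample's phase.
        k = np.ceil((phases[n - 1] - principal[n]) / (2 * np.pi))
        phases[n] = principal[n] + 2 * np.pi * k
    return phases

def isis_frequencies(x, sample_rate):
    # Per-sample frequency (in Hz) of the interpolating sine wave.
    return np.diff(isis_phases(x)) / (2 * np.pi) * sample_rate
```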
An early obsession of mine when I was first learning about music tech was a paper by Stelios Manousakis titled “Non-Standard Sound Synthesis with L-Systems,” published in 2009 in Leonardo Music Journal. At the time I was a snob about making strictly “academic” electronic music (think Stockhausen or Xenakis) and viewed things like FM and subtractive as too plebeian, so synthesis methods that branded themselves as “non-standard” were very alluring. If I could give my younger self some advice, I’d tell him that you have to build foundations as a sound designer first before you can get to the crazy experimental stuff. How are you going to make a cool alien soundscape from another dimension if you can’t make a nice snare from scratch? Every good post-tonal composer understands tonality, if only to know how to avoid it. You have to know what rules you’re breaking.
With more artistic maturity under my belt, I revisited Manousakis’ paper recently and still found it interesting, so I decided to do some riffing on it. In this post, I’ll quickly explain what L-systems are (there are lots of better explanations online) and walk through a complete, “minimum viable” system for generating sound with them. Sound examples are embedded.
L-systems and turtle graphics
Briefly, L-systems (short for “Lindenmayer systems”) generate a sequence of words (strings) using rewriting rules applied repeatedly to an initial word called the axiom. A classic L-system is given by the rules a → ab and b → a with the axiom a. At each iteration, each symbol in the word is replaced by the right-hand side of its corresponding rule. The first few iterations of this L-system are:
a
ab
aba
abaab
abaababa
abaababaabaab
abaababaabaababaababa
Most L-systems that are objects of study undergo approximately exponential growth in this way. L-systems can produce simple patterns like abababab as well as aperiodic results like the above, generating apparent randomness while being completely deterministic.
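The rewriting process is only a few lines of Python; the following reproduces the iterations listed above:

```python
rules = {"a": "ab", "b": "a"}  # the classic L-system above

def iterate(word, rules):
    # Replace every symbol with its rule's right-hand side (or keep it if it has no rule).
    return "".join(rules.get(symbol, symbol) for symbol in word)

word = "a"  # the axiom
for _ in range(7):
    print(word)
    word = iterate(word, rules)
# a, ab, aba, abaab, abaababa, ...
```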
A popular use of L-systems is to apply them to turtle graphics. To do so, each symbol in the L-system corresponds to an instruction to a turtle in 2D space, such as “move forward one unit” or “turn left 30 degrees.” Following the instructions sequentially from one string and plotting the turtle’s trajectory in space produces a drawing. Organic-looking fractals may result.
In “Non-Standard Sound Synthesis with L-Systems” as well as his more detailed 2006 master’s thesis Musical L-Systems, Manousakis proposes a lot of different possibilities to explore, but the basic idea is that he’s sonifying turtle graphics produced by L-systems and mapping the turtle’s state over time (position, orientation, etc.) to synthesis parameters. These L-systems operate at different time scales, from microsound to the form of an entire piece.
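As a toy illustration (my own, much simpler than anything in the paper), here’s a turtle interpreter where “F” moves forward and “+”/“-” turn by a fixed angle, with the turtle’s y coordinate mapped to an oscillator frequency:

```python
import math

def turtle_path(word, step=1.0, angle=math.radians(30)):
    # Interpret the word as turtle instructions and return the list of visited points.
    x = y = heading = 0.0
    path = [(x, y)]
    for symbol in word:
        if symbol == "F":
            x += step * math.cos(heading)
            y += step * math.sin(heading)
            path.append((x, y))
        elif symbol == "+":
            heading += angle
        elif symbol == "-":
            heading -= angle
    return path

# An arbitrary mapping from the turtle's y position to frequencies in Hz.
freqs = [220.0 * 2 ** (y / 12.0) for _, y in turtle_path("F+F-F+FF")]
```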
I’ve been working on a big project using Blender. More on that in a future post, but here’s a preview. The project has gotten me into a 3D art headspace, and I took a detour recently from this major project to work on a small one.
Blender isn’t my first exposure to 3D. When I was a kid, I played around a lot with POV-Ray – an outdated program even back when I was using it, but still a fun piece of history. I also found out about something called TopMod, which is an obscure research 3D modeling program. I was interested in it primarily because of Bathsheba Grossman, who used it in a series of lovely steel sculptures based on regular polyhedra.
The core idea of TopMod is that, unlike other 3D modeling programs, meshes are always valid 2-manifolds, lacking open edges and other anomalies like doubled faces. This is ensured with a data structure called the Doubly Linked Face List or DLFL. In practice, TopMod is really a quirky collection of miscellaneous modeling algorithms developed by Ergun Akleman’s grad students. These features give a distinctive look to the many sculptures and artworks made with TopMod. I identify the following as the most important:
Subdivision surfaces. TopMod implements the well-known Catmull-Clark subdivision surface algorithm, which rounds off the edges of a mesh. However, it also has a lesser known subsurf algorithm called Doo-Sabin. To my eyes, Doo-Sabin has a “mathematical” look compared to the more organic Catmull-Clark.
Rind modeling. This feature makes the mesh into a thin crust, and punches holes in that crust according to a selected set of faces.
Curved handles. In this tool, the user selects two faces of the mesh, and TopMod interpolates between those two polygons while creating a loop-like trajectory that connects them. The user also picks a representative vertex from each of the two polygons. Selecting different vertices allows adding a “twist” to the handle.
The combination of these three features, as pointed out by Akleman et al., allows creating a family of cool-looking sculptures in just a few steps:
Start with a base polyhedron, often a Platonic solid.
Add various handles.
Perform one iteration of Doo-Sabin.
Apply rind modeling, removing contiguous loops of quadrilateral faces.
Perform one or more iterations of Catmull-Clark or Doo-Sabin.
(I would like to highlight step 3 to point out that while Doo-Sabin and Catmull-Clark look similar to each other in a final, smooth mesh, they produce very different results if you start manipulating the individual polygons they produce, and the choice of Doo-Sabin is critical for the “TopMod look.”)
TopMod has other features, but this workflow and variants thereof are pretty much the reason people use TopMod. The program also has the benefit of being easy to learn and use.
The catch to all this is that, unfortunately, TopMod doesn’t seem to have much of a future. Its GitHub repository has gone dormant and new features haven’t been added in a long time. Plus, it only seems to support Windows, and experienced users know it crashes a lot. It would be a shame if the artistic processes that TopMod pioneered were to die with the software, so I looked into ways of emulating the TopMod workflow in Blender. Let’s go feature by feature.
First we have subdivision surfaces. Blender’s Subdivision Surface modifier only supports Catmull-Clark (and a “Simple” mode that subdivides faces without actually smoothing out the mesh). However, a Doo-Sabin implementation is out there, and I can confirm that it works in Blender 3.3. An issue is that this Doo-Sabin implementation seems to produce duplicated vertices, so you have to go to edit mode and hit Mesh -> Merge -> By Distance, or you’ll get wacky results doing operations downstream. This may be fixable in the Doo-Sabin code if someone wants to take a stab at it. Also worth noting is that this implementation of Doo-Sabin is an operator, not a modifier, so it is destructive.
EDIT: Turns out, Doo-Sabin can be done without an addon, using Geometry Nodes! This StackExchange answer shows how, using the setup in the image below: a Subdivide Mesh (not a Subdivision Surface) followed by a Dual Mesh. The Geometry Nodes modifier can then be applied to start manipulating the individual polygons in the mesh.
Rind modeling can be accomplished by adding a Solidify modifier, entering face select mode, and simply removing the faces where you want holes punched. An advantage over TopMod is that modifiers are nondestructive, so you can create holes in a piecemeal fashion and see the effects interactively. To actually select the faces for rind modeling, TopMod has a tool to speed up the process called “Select Face Loop”; the equivalent in Blender’s edit mode is entering face select mode and holding down Alt while clicking an edge.
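If you’d rather script these modifier steps than click through them, here’s a hypothetical bpy sketch of the Solidify-based rind crust followed by the final Catmull-Clark smoothing (modifier names and parameter values are placeholders); the hole-punching itself is easiest done interactively in edit mode:

```python
import bpy

obj = bpy.context.active_object  # the base mesh after Doo-Sabin

# Rind modeling: Solidify turns the surface into a thin crust; deleting faces
# in edit mode afterwards punches the holes.
solidify = obj.modifiers.new(name="Rind", type='SOLIDIFY')
solidify.thickness = 0.05

# Final Catmull-Clark smoothing via the Subdivision Surface modifier.
subsurf = obj.modifiers.new(name="Smooth", type='SUBSURF')
subsurf.subdivision_type = 'CATMULL_CLARK'
subsurf.levels = 2
```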
Curved handles have no equivalent in Blender. There is an unresolved question on the Blender StackExchange about it. To compensate for this, I spent the past few days making a new Blender addon called blender-handle. It’s a direct port of code in TopMod and is therefore under the same license as TopMod, GPL.
My tool is a little awkward to use – it requires you to select two vertices and then two faces in that order. TopMod, by comparison, requires only two well-placed clicks to create a handle. I’m open to suggestions from more experienced Blender users on how to improve the workflow. That said, this tool also has an advantage over TopMod’s equivalent: parameters can be adjusted with real-time feedback in the 3D view, instead of having to set all the parameters prior to handle creation as TopMod requires. The more immediate the feedback, the more expressive an artistic tool is.
Installation and usage instructions are available at the README. blender-handle is 100% robust software entirely devoid of bugs of any kind, and does not have a seemingly intermittent problem where sometimes face normals are inverted.
The rest of this post will be a nerdy dive into the math of the handle algorithm, so I’ll put that after the break. Enjoy this thing I made in Blender with the above methods (dodecahedron base, six handles with 72-degree twists, Doo-Sabin, rind modeling, then Catmull-Clark):
Most of the music I produce doesn’t use any samples. I’m not prejudiced against sample-based workflows in any way; I just like the challenge of doing everything in SuperCollider, which naturally encourages synthesis-only workflows (as sample auditioning is far more awkward than it is in a DAW). But sometimes, it’s fun and rewarding to try a process that’s the diametric opposite of whatever you do normally.
Two things happened in 2019 that led me to the path of samples. The first was that I started making some mixes, solely for my own listening enjoyment, that mostly consisted of ambient and classical music, pitch shifted in Ardour so they were all in key and required no special transition work. I only did a few of these mixes, but with the later ones I experimented a bit with mashing up multiple tracks. The second was that I caught up to the rest of the electronic music fanbase and discovered Burial’s LPs. I was wowed by his use of Sound Forge, a relatively primitive audio editor.
Not long into my Burial phase, I decided I’d try making sample-based ambient music entirely in Audacity. I grabbed tracks from my MP3 collection (plus a few pirated via youtube-dl), used “Change Speed” to repitch them so they were all in key, threw a few other effects on there, and arranged them together into a piece. I liked the result, and I started making more tracks and refining my process.
Soon I hit on a workflow that I liked a lot. I call this workflow sound dumplings. A sound dumpling is created by the following process:
Grab a number of samples, each at least a few seconds long, that each fit in a diatonic scale and don’t have any strong rhythmic pulse.
Add a fade in and fade out to each sample.
Use Audacity’s “Change Speed” to get them all in a desired key. Use variable speed playback, not pitch shifting or time stretching.
Arrange the samples into a single gesture that increases in density, reaches a peak, then decreases in density. It’s dense in the middle – hence, dumpling.
Bounce the sound dumpling to a single track and normalize it.
The most difficult step is repitching. A semitone up is a ratio of 1.059, and a semitone down is 0.944. Memorize those and keep applying them until the sample sounds in key, and use 1.5 (a fifth up) and 0.667 (a fifth down) for larger jumps. It’s better to repitch down than up if you can – particularly with samples containing vocals, “chipmunk” effects can sound grating. Technically, repeated applications of “Change Speed” will degrade quality compared to a single run, but embrace your imperfections. Speaking of imperfections, don’t fuss too much about the fades sounding unnatural. You can just hide them by piling on more samples.
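If you want ratios for other intervals, they all come from the same formula: a change of n equal-tempered semitones is a speed factor of \(2^{n/12}\). A quick sketch:

```python
# Speed ratio for repitching by n equal-tempered semitones.
def speed_ratio(semitones):
    return 2.0 ** (semitones / 12.0)

print(round(speed_ratio(1), 3))   # 1.059 (semitone up)
print(round(speed_ratio(-1), 3))  # 0.944 (semitone down)
print(round(speed_ratio(7), 3))   # 1.498 (close to the 1.5 used for a fifth up)
print(round(speed_ratio(-7), 3))  # 0.667 (a fifth down)
```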
Most sound dumplings are at least 30 seconds long. Once you have a few sound dumplings, it is straightforward to arrange them into a piece. Since all your sound dumplings are in the same key, they can overlap arbitrarily. I like to use a formal structure that repeats but with slight reordering. For example, if I have five sound dumplings numbered 1 through 5, I could start with a 123132 A section, then a 454 B section, then 321 for the recap. The formal process is modular since everything sounds good with everything.
Sound dumplings are essentially a process for creating diatonic sound collages, and they allow working quickly and intuitively. I think of them as an approach to manufacturing music as opposed to building everything up from scratch, although plenty of creativity is involved in sample curation. Aside from the obvious choices of ambient and classical music, searching for the right terms on YouTube (like “a cappella” and “violin solo”) and sorting by most recent uploads can get you far. In a few cases, I sampled my previous sound dumpling work to create extra-dense dumplings.
The tracks I’ve produced with this process are some of my favorite music I’ve made. However, like my mixes they were created for personal listening and I only share them with friends, so I won’t be posting them here. (EDIT: I changed my mind, see my music as Interstate Hydra.) If you do want to hear examples of sound dumpling music, I recommend checking out my friend Nathan Turczan’s work under the name Equuipment. He took my sound dumpling idea and expanded on it by introducing key changes and automated collaging of samples in SuperCollider, and the results are very colorful and interesting.
If you make a sound dumpling track, feel free to send it my way. I’d be interested to hear it.