A Preliminary Theory of Sound Design

This post is my attempt at explaining my own philosophy of sound design. It’s not in final form and is subject to amendment in the future.

The type of sound design I refer to is specific to my own practice: the creation of sounds for electronic music, especially experimental music, and especially music produced using pure synthesis as opposed to recorded or sampled sound. The ideas presented here might have broader applications, but I have no delusions that they’re in any way universal.

The theory I expound on isn’t in the form of constraints or value judgements, but rather a set of traits. Some are general, some specific, and my hope is that considering how a sound or a piece relates to these traits will take me (and possibly you) in some directions not otherwise considered. Some of the traits contain assertions like “if your sound does X, your audience will feel Y,” which in a composition may be employed directly, carefully avoided, or deconstructed. You’ll also note that the theory is concerned mostly with the final product and its impact on the listener, not so much the compositional or technical process. (Friends and collaborators are well aware that I use highly idiosyncratic and constrained processes, but those are less about creating music and more about “creating creating music.”)

No theory of sound design will replace actually working on sound design. Sound design isn’t a spectator sport, nor a cerebral exercise, and it has to be practiced regularly like a musical instrument. Reading this post alone is unlikely to make you a better sound designer, but if it’s a useful supplement to time spent in the studio, I’d consider this work of writing a success.

I will deliberately avoid talking about the topics of melody, harmony, counterpoint, consonance vs. dissonance, and tuning systems, and I’ll only talk about rhythm abstractly. There are many existing resources dedicated to these topics in a wide variety of musical cultures.

Read more…

Audio Texture Resynthesis

Spectrograms of audio signals from later in the post. Left: a child singing; right: the resynthesized audio.

Background

I was alerted to audio texture resynthesis methods by a student of mine who was interested in the collaborative work of researcher Vincent Lostanlen, musician Florian Hecker, and several others [Lostanlen2019] [Lostanlen2021] [Andén2019] [Muradeli2022]. Their efforts are built on an analysis method called “Joint Time-Frequency Scattering” (JTFS), based on the Continuous Wavelet Transform. In an attempt to understand the work better, I binged a wavelet transform textbook, [1] implemented a simplified version of JTFS-based resynthesis, and briefly exchanged emails with Lostanlen. His helpful answers gave me the impression that while JTFS is a powerful analysis technique, resynthesis was more of a side project, and there are ways to accomplish similar effects that are more efficient and easier to code without compromising too much on musicality.

Audio texture resynthesis has some history in computer music literature [Schwartz2010], and some researchers have used resynthesis to help understand how the human brain processes audio [McDermott2011].

After some experimentation with these methods, I found that it’s not too hard to build a simple audio texture resynthesizer that exhibits clear musical potential. In this blog post, I’ll walk through a basic technique for making such a system yourself. There won’t be any novel research here, just a demonstration of a minimum viable resynthesizer and my ideas on how to expand on it.

Algorithm

The above-mentioned papers use fancy techniques, including the wavelet transform and auditory filter banks modeled after the human ear. However, I was able to get decent results with a standard STFT magnitude spectrogram, followed by phase reconstruction to recover time-domain audio samples. The full process looks like this:

  1. Compute a magnitude spectrogram \(S\) of the time-domain input signal \(x\). A fairly high overlap is advised.

  2. Compute any number of feature vectors \(F_1(S),\, F_2(S),\, \ldots,\, F_n(S)\) and define their concatenation as \(F(S)\).

  3. Initialize a randomized magnitude spectrogram \(\hat{S}\).

  4. Use gradient descent on \(\hat{S}\) to minimize the error \(E(\hat{S}) = ||F(S) - F(\hat{S})||\), where \(||\cdot||\) can be any distance measure, such as the squared error.

  5. Use phase reconstruction such as the Griffin-Lim algorithm on \(\hat{S}\) to produce a resynthesized signal \(\hat{x}\).

The cornerstone of making this algorithm work well is choosing an \(F(S)\) that’s differentiable (or reasonably close to differentiable). This means that the gradient \(\nabla E\) can be computed with automatic differentiation (classical backpropagation). As such, this algorithm is best implemented in a differentiable computing environment like PyTorch or TensorFlow.

The features \(F(S)\), as well as their relative weights, greatly affect the sound. If \(F(S)\) is highly time-dependent, the resynthesized signal will mimic the original’s evolution over time. On the other hand, if \(F(S)\) does a lot of pooling across the time axis, the resynthesized signal will mostly ignore the large-scale structure of the input. I’m mostly interested in the latter case, where \(F(S)\) significantly “remixes” the input signal and disregards the overall structure of the original.

We will represent \(S\) as a 2D tensor where the first dimension is frequency and the second is time. As a matrix, each row is an FFT bin, and each column a frame.
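To make steps 2–4 concrete, here’s a minimal PyTorch sketch using that (freq, time) convention. The specific features (time-pooled means, deviations, and cross-bin correlations) and their weights are my own illustrative picks, not taken from the papers above.

```python
import torch

def features(S):
    # S is a (freq, time) magnitude spectrogram. All three features pool
    # across the time axis, so large-scale structure is deliberately lost.
    mean = S.mean(dim=1)                     # average energy per bin
    std = S.std(dim=1)                       # variability per bin
    corr = (S @ S.T).flatten() / S.shape[1]  # time-averaged cross-bin products
    return torch.cat([mean, std, 1e-2 * corr])  # relative weights chosen by ear

def resynthesize(S, steps=2000, lr=0.05):
    target = features(S).detach()
    # Optimize log magnitudes so the spectrogram stays positive.
    log_S_hat = torch.randn_like(S, requires_grad=True)
    opt = torch.optim.Adam([log_S_hat], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.sum((features(log_S_hat.exp()) - target) ** 2)
        loss.backward()
        opt.step()
    return log_S_hat.exp().detach()
```

For step 5, torchaudio’s GriffinLim transform can take the optimized magnitude spectrogram back to time-domain samples.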

If you’re using a fancy alternative to the magnitude spectrogram such as the CWT or cochlear filter banks, you may have to do gradient descent all the way back to the time-domain samples \(x\). These analysis methods break down into linear frequency transforms that produce complex numbers, followed by computing the absolute value of each bin, so differentiability is maintained.

Read more…

Opinionated Advice for SuperCollider Beginners

I’m often asked where to find the best resources for learning SuperCollider. My informed and professional answer, which comes from over a decade of experience with this software as a user, developer, and educator, is as follows:

I’m not really sure?

The landscape of SC learning materials has changed a lot since I started in 2012/13. I’ve heard good things about Eli Fieldsteel’s video tutorials and Bruno Ruviaro’s “A Gentle Introduction” ebook, so I guess check those out. I haven’t perused their tutorials at length, but Eli and Bruno are both seasoned professionals at SC education, so you can’t really go wrong there. Also, I have my own SuperCollider Tips blog post (which I have just updated today), which is not a structured tutorial but addresses common beginner problems. But ultimately, the most efficient learning strategy really depends on what you intend to make, because the applications for SC are so diverse, as are the backgrounds of users coming to SC.

That’s probably not too enlightening. However, I do have a lot of advice for beginning SC users, drawing from my own tortuous path through learning to make electronic music.

What follows are a bunch of guidelines that I recommend beginning SC users follow, or at least consider. A good number of them are specific to my idiosyncratic approach to SC, which I’ve expounded on in my YouTube presence: heavy focus on sound design through synthesis, and only sporadic use of hardware and real-time interaction. Also, although some of my advice may help live coders, I’m pretty uninterested in live coding myself, so I can’t give specific advice there (creative coding I’m all for, but when I share my screen it’s for education, not seamless performance art).

Read more…

Negative Compression

One blog post I’ve been meaning to write for a while is a comprehensive review of the design of dynamic range compressors and limiters, both digital and analog. Textbook compressor designs are easy to find, but as with reverbs, there are lots of weird little tricks from both hardware and software designs that supposedly define the distinctive musical character of different compressors. It may be a while before I finish that post because, while I’ve read a lot about the DSP of compressors, I don’t yet feel qualified to write on design. I haven’t yet designed a compressor plugin that I’m happy with, nor done a lot of compressor wine tasting, and the musical and psychoacoustic aspects of compressors are to me at least as important as the signal math.

Nevertheless, there’s a weird corner of compressor design that I feel inspired to talk about, and it’s called negative compression. It’s a feature of a few commercial compressors; I’m not sure which was the first, but I first learned about the concept from Klanghelm DC1A. Negative comp is the source of considerable confusion – just watch the Gearspace pundits go at it.

The brief description is that a standard compressor, upon receiving a signal with increasing amplitude, will reach a point where the output amplitude increases at a slower rate. If the compressor is a perfect limiter, the output amplitude will hit a hard limit and refuse to increase. A negative compressor takes it further – the output signal will eventually get quieter as the input amplitude increases. If you feed a percussive signal into a negative compressor and drive it hard enough, it will punch a hole in the signal’s amplitude and can split a transient in two. It can be a pretty bizarre effect, and it seems underutilized.

This explanation should be enough for most, but you know this blog. We do the math here. In this post, I will explain the basic mathematics of compressors to demystify negative compression, propose variants of negative compressors, and demonstrate how to do negative compression in SuperCollider.
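As a taste of that math, here’s the static input-to-output level curve in a minimal sketch. This is my own generic formulation, not the curve of any particular commercial compressor:

```python
def static_curve_db(in_db, threshold_db=-20.0, ratio=-2.0):
    # Unity gain below the threshold; slope 1/ratio above it.
    # ratio = 4 gives ordinary 4:1 compression, ratio -> infinity is a
    # perfect limiter, and a negative ratio makes the output level fall
    # as the input level rises.
    if in_db <= threshold_db:
        return in_db
    return threshold_db + (in_db - threshold_db) / ratio
```

In a real compressor, this curve is applied to a smoothed amplitude envelope of the signal, which is what lets a hard-driven negative compressor punch a hole around a transient.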

Read more…

Interstate Hydra - Nocturnes

Album cover for Nocturnes. Abstract design, nearly completely black with some indigo.

Exactly three years ago I made an ambient EP under the name “Interstate Hydra” and put it up on Bandcamp. Up until now I have kept this alias a secret and only shown these tunes to a few people, but I figured it’s been long enough.

Nocturnes was made entirely in Audacity using the “sound dumplings” method described in an earlier post. This was a total 180 from my usual workflow, which is to synthesize everything with SuperCollider code.

“For Ellie” is a single released months after Nocturnes, and is formally the simplest Interstate Hydra track, comprising a row of sound dumplings in arch form.

I don’t plan on returning to the Interstate Hydra alias for quite some time, but I hope you enjoy these tracks. Thank you for listening.

A Closer Look at Clarence Barlow's ISIS

In 2005, Clarence Barlow published a paper on Intra-Samplar Interpolating Sinusoids (ISIS), an audio analysis-synthesis algorithm. It did not make much of a splash, with the paper having only 7 citations in Google Scholar. Nevertheless, it produces some interesting sounds, so let’s dive into it.

First, some context. The precursor to ISIS is a technique Barlow calls spectastics. In this method, the short-time Fourier transform of an audio signal is computed, and at each frame, the magnitude spectrum is resampled logarithmically to 12EDO and used as a probability distribution to select a pitch. The pitch sequence forms an extremely rapid melody, which can be synthesized or played on a robotic instrument. Barlow describes the spectasized melody as “remarkably like the original sound recording.”
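Here’s a rough NumPy sketch of how I understand spectastics working. It is my own reconstruction of the idea, not Barlow’s implementation, and the bucketing of FFT bins into 12EDO pitches is simplified:

```python
import numpy as np

def spectastics_melody(S, bin_freqs, rng=None):
    # S: (freq, time) magnitude spectrogram; bin_freqs: center frequency
    # in Hz of each FFT bin.
    rng = rng or np.random.default_rng()
    valid = bin_freqs > 0
    # Round each bin's frequency to the nearest 12EDO pitch (MIDI number).
    midi = np.round(69 + 12 * np.log2(bin_freqs[valid] / 440.0)).astype(int)
    melody = []
    for frame in S[valid].T:
        # Pool bin magnitudes into pitch buckets, normalize, and sample.
        weights = np.bincount(midi - midi.min(), weights=frame) + 1e-12
        pitch = rng.choice(len(weights), p=weights / weights.sum())
        melody.append(pitch + midi.min())
    return melody  # one MIDI note number per STFT frame
```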

In ISIS, this concept is taken to an extreme by making the “melody” a constant amplitude sine wave whose frequency is changed every sample. Given a digital signal that doesn’t exceed ±1, we can interpolate between any two successive samples with a partial cycle of a sine wave. An image helps here; the dots show the sampled digital signal.

Graph showing equally sampled points interpolated by a sine wave with rapidly varying frequency.

For example, if the samples are 0 and 1, as seen in the first two samples in the image, we can interpolate with a quarter sine wave with a period of 4 samples and a frequency of 1/4th the sample rate. There are actually infinitely many ways to do this interpolation. For example, from 0 to 1 we can also have a sine wave that completes 5/4ths of a cycle. We can even assume that the initial phase of the sine wave is \(\pi\) and the final phase \(5\pi/2\), or have the sine wave going backwards with phase ramping from 0 to \(-3\pi/2\).

To resolve the ambiguity, ISIS restricts the frequency to be always nonnegative so the phase never goes backward, and always assumes the phase at every sample point is in the range \([-\pi/2, \pi/2]\) modulo \(2\pi\). There are other approaches, such as picking the minimum possible nonnegative frequency or the frequency of minimum absolute value, which may produce interesting alternative sounds. I won’t get into these (this is a relatively low-effort post), but feel free to try them out.
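Here’s a short NumPy sketch of my reading of those constraints: take the principal arcsin phase (which lands in \([-\pi/2, \pi/2]\)), then unwrap it upward so the phase never decreases. The per-sample frequencies of the interpolating sine fall out of the phase differences. Again, this is my own reconstruction, not Barlow’s code:

```python
import numpy as np

def isis_frequencies(x, sample_rate):
    # Principal phase in [-pi/2, pi/2]; sin(phi) hits each sample exactly.
    phi = np.arcsin(np.clip(x, -1.0, 1.0))
    phase = np.empty_like(phi)
    phase[0] = phi[0]
    for n in range(1, len(phi)):
        p = phi[n]
        while p < phase[n - 1]:  # nonnegative frequency: phase never decreases
            p += 2 * np.pi
        phase[n] = p
    # Frequency (Hz) of the sine segment between samples n and n+1.
    return np.diff(phase) * sample_rate / (2 * np.pi)
```

For the 0-to-1 example above, this returns exactly a quarter of the sample rate.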

Read more…

An Approach to Sound Synthesis with L-Systems

An early obsession of mine when I was first learning about music tech was a paper by Stelios Manousakis titled “Non-Standard Sound Synthesis with L-Systems,” published in 2009 in Leonardo Music Journal. At the time I was a snob about making strictly “academic” electronic music (think Stockhausen or Xenakis) and viewed things like FM and subtractive as too plebeian, so synthesis methods that branded themselves as “non-standard” were very alluring. If I could give my younger self some advice, I’d tell him that you have to build foundations as a sound designer before you can get to the crazy experimental stuff. How are you going to make a cool alien soundscape from another dimension if you can’t make a nice snare from scratch? Every good post-tonal composer understands tonality, if only to know how to avoid it. You have to know what rules you’re breaking.

With more artistic maturity under my belt, I revisited Manousakis’ paper recently and still found it interesting, so I decided to do some riffing on it. In this post, I’ll quickly explain what L-systems are (there are lots of better explanations online) and walk through a complete, “minimum viable” system for generating sound with them. Sound examples are embedded.

L-systems and turtle graphics

Briefly, L-systems (short for “Lindenmayer systems”) generate a sequence of words (strings) using rewriting rules applied repeatedly to an initial word called the axiom. A classic L-system is given by the rules a → ab and b → a with the axiom a. At each iteration, each symbol in the word is replaced by the right-hand side of its corresponding rule. The first few iterations of this L-system are:

a
ab
aba
abaab
abaababa
abaababaabaab
abaababaabaababaababa

Most L-systems that are objects of study undergo approximately exponential growth in this way. L-systems can produce simple patterns like abababab as well as aperiodic results like the above, generating apparent randomness while being completely deterministic.
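The rewriting itself takes only a few lines; here’s a quick sketch that reproduces the listing above:

```python
def lsystem(axiom, rules, iterations):
    # Rewrite every symbol in parallel on each iteration;
    # symbols without a rule are left unchanged.
    word = axiom
    for _ in range(iterations):
        word = "".join(rules.get(symbol, symbol) for symbol in word)
        yield word

for word in lsystem("a", {"a": "ab", "b": "a"}, 6):
    print(word)  # ab, aba, abaab, ...
```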

A popular use of L-systems is to apply them to turtle graphics. To do so, each symbol in the L-system corresponds to an instruction to a turtle in 2D space, such as “move forward one unit” or “turn left 30 degrees.” Following the instructions sequentially from one string and plotting the turtle’s trajectory in space produces a drawing. Organic-looking fractals may result.
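As a minimal sketch of that interpretation, using the common convention where “F” moves the turtle forward and “+”/“-” turn it (my assumption here, not symbols from any specific system above):

```python
import math

def turtle_path(word, step=1.0, turn=math.radians(30)):
    # Walk a 2D turtle through the word, collecting its positions.
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for symbol in word:
        if symbol == "F":    # move forward one unit
            x += step * math.cos(heading)
            y += step * math.sin(heading)
            points.append((x, y))
        elif symbol == "+":  # turn left
            heading += turn
        elif symbol == "-":  # turn right
            heading -= turn
    return points
```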

In “Non-Standard Sound Synthesis with L-Systems,” as well as his more detailed 2006 master’s thesis Musical L-Systems, Manousakis proposes a lot of different possibilities to explore, but the basic idea is that he’s sonifying turtle graphics produced by L-systems and mapping the turtle’s state over time (position, orientation, etc.) to synthesis parameters. These L-systems operate at different time scales, from microsound to the form of an entire piece.

Read more…

TopMod, Blender, and Curved Handles

I’ve been working on a big project using Blender. More on that in a future post, but here’s a preview. The project has gotten me into a 3D art headspace, and I took a detour recently from this major project to work on a small one.

Blender isn’t my first exposure to 3D. When I was a kid, I played around a lot with POV-Ray – an outdated program even back when I was using it, but still a fun piece of history. I also found out about something called TopMod, which is an obscure research 3D modeling program. I was interested in it primarily because of Bathsheba Grossman, who used it in a series of lovely steel sculptures based on regular polyhedra.

The core idea of TopMod is that, unlike other 3D modeling programs, meshes are always valid 2-manifolds, lacking open edges and other anomalies like doubled faces. This is ensured with a data structure called the Doubly Linked Face List or DLFL. In practice, TopMod is really a quirky collection of miscellaneous modeling algorithms developed by Ergun Akleman’s grad students. These features give a distinctive look to the many sculptures and artworks made with TopMod. I identify the following as the most important:

Subdivision surfaces. TopMod implements the well-known Catmull-Clark subdivision surface algorithm, which rounds off the edges of a mesh. However, it also has a lesser known subsurf algorithm called Doo-Sabin. To my eyes, Doo-Sabin has a “mathematical” look compared to the more organic Catmull-Clark.

Seven cubes joined together in a 3D + symbol.

Original mesh.

The above mesh smoothed out into an organic-looking figure comprising quadrilaterals.

Catmull-Clark subdivision surface.

The above mesh smoothed out into a figure looking like an assembly of octagonal rods.

Doo-Sabin subdivision surface.

Rind modeling. This feature makes the mesh into a thin crust, and punches holes in that crust according to a selected set of faces.

A cantellated truncated icosahedron with hexagonal and pentagonal faces highlighted in pink.

Original mesh with faces selected.

The same mesh but with holes punched into the hexagonal and pentagonal faces, revealing that it's a shell.

After rind modeling.

Curved handles. In this tool, the user selects two faces of the mesh, and TopMod interpolates between those two polygons while creating a loop-like trajectory that connects them. The user also picks a representative vertex from each of the two polygons. Selecting different vertices allows adding a “twist” to the handle.

A cube.

Original mesh.

The same cube with a twisted handle connecting two faces.

Handle added.

The combination of these three features, as pointed out by Akleman et al., allows creating a family of cool-looking sculptures in just a few steps:

  1. Start with a base polyhedron, often a Platonic solid.

  2. Add various handles.

  3. Perform one iteration of Doo-Sabin.

  4. Apply rind modeling, removing contiguous loops of quadrilateral faces.

  5. Perform one or more iterations of Catmull-Clark or Doo-Sabin.

(I would like to highlight step 3 to point out that while Doo-Sabin and Catmull-Clark look similar to each other in a final, smooth mesh, they produce very different results if you start manipulating the individual polygons they produce, and the choice of Doo-Sabin is critical for the “TopMod look.”)

TopMod has other features, but this workflow and variants thereof are pretty much the reason people use TopMod. The program also has the benefit of being easy to learn and use.

The catch to all this is that, unfortunately, TopMod doesn’t seem to have much of a future. Its GitHub repository has gone dormant, and new features haven’t been added in a long time. Plus, it only seems to support Windows, and experienced users know it crashes a lot. It would be a shame if the artistic processes that TopMod pioneered were to die with the software, so I looked into ways of emulating the TopMod workflow in Blender. Let’s go feature by feature.

First we have subdivision surfaces. Blender’s Subdivision Surface modifier only supports Catmull-Clark (and a “Simple” mode that subdivides faces without actually smoothing out the mesh). However, a Doo-Sabin implementation is out there, and I can confirm that it works in Blender 3.3. An issue is that this Doo-Sabin implementation seems to produce duplicated vertices, so you have to go to edit mode and hit Mesh -> Merge -> By Distance, or you’ll get wacky results doing operations downstream. This may be fixable in the Doo-Sabin code if someone wants to take a stab at it. Also worth noting is that this implementation of Doo-Sabin is an operator, not a modifier, so it is destructive.

EDIT: Turns out, Doo-Sabin can be done without an addon, using Geometry Nodes! This StackExchange answer shows how, using the setup in the image below: a Subdivide Mesh (not a Subdivision Surface) followed by a Dual Mesh. The Geometry Nodes modifier can then be applied to start manipulating the individual polygons in the mesh.

An image of Blender's Geometry Nodes editor showing a Group Input connected to a Subdivide Mesh connected to a Dual Mesh connected to a Group Output.

Rind modeling can be accomplished by adding a Solidify modifier, entering face select mode, and simply removing the faces where you want holes punched. An advantage over TopMod is that modifiers are nondestructive, so you can create holes in a piecemeal fashion and see the effects interactively. To actually select the faces for rind modeling, TopMod has a tool to speed up the process called “Select Face Loop;” the equivalent in Blender’s edit mode is entering face select mode and holding down Alt while clicking an edge.

Curved handles have no equivalent in Blender. There is an unresolved question on the Blender StackExchange about it. To compensate for this, I spent the past few days making a new Blender addon called blender-handle. It’s a direct port of code in TopMod and is therefore under the same license as TopMod, GPL.

My tool is a little awkward to use – it requires you to select two vertices and then two faces in that order. TopMod, by comparison, requires only two well-placed clicks to create a handle. I’m open to suggestions from more experienced Blender users on how to improve the workflow. That said, this tool also has an advantage over TopMod’s equivalent: parameters can be adjusted with real-time feedback in the 3D view, instead of having to set all the parameters prior to handle creation as TopMod requires. The more immediate the feedback, the more expressive an artistic tool is.

Installation and usage instructions are available at the README. blender-handle is 100% robust software entirely devoid of bugs of any kind, and does not have a seemingly intermittent problem where sometimes face normals are inverted.

The rest of this post will be a nerdy dive into the math of the handle algorithm, so I’ll put that after the break. Enjoy this thing I made in Blender with the above methods (dodecahedron base, six handles with 72-degree twists, Doo-Sabin, rind modeling, then Catmull-Clark):

A render of a strange striped object with six symmetrical loops.

Read more…

Sound Dumplings: a Sound Collage Workflow

Most of the music I produce doesn’t use any samples. I’m not prejudiced against sample-based workflows in any way; I just like the challenge of doing everything in SuperCollider, which naturally encourages synthesis-only workflows (as sample auditioning is far more awkward than it is in a DAW). But sometimes, it’s fun and rewarding to try a process that’s the diametric opposite of whatever you do normally.

Two things happened in 2019 that led me to the path of samples. The first was that I started making some mixes, solely for my own listening enjoyment, that mostly consisted of ambient and classical music, pitch shifted in Ardour so they were all in key and required no special transition work. I only did a few of these mixes, but with the later ones I experimented a bit with mashing up multiple tracks. The second was that I caught up to the rest of the electronic music fanbase and discovered Burial’s LPs. I was wowed by his use of Sound Forge, a relatively primitive audio editor.

Not long into my Burial phase, I decided I’d try making sample-based ambient music entirely in Audacity. I grabbed tracks from my MP3 collection (plus a few pirated via youtube-dl), used “Change Speed” to repitch them so they were all in key, threw a few other effects on there, and arranged them together into a piece. I liked the result, and I started making more tracks and refining my process.

Soon I hit on a workflow that I liked a lot. I call this workflow sound dumplings. A sound dumpling is created by the following process:

  1. Grab a number of samples, each at least a few seconds long, that each fit in a diatonic scale and don’t have any strong rhythmic pulse.

  2. Add a fade in and fade out to each sample.

  3. Use Audacity’s “Change Speed” to get them all in a desired key. Use variable speed playback, not pitch shifting or time stretching.

  4. Arrange the samples into a single gesture that increases in density, reaches a peak, then decreases in density. It’s dense in the middle – hence, dumpling.

  5. Bounce the sound dumpling to a single track and normalize it.

The step that is most difficult is repitching. A semitone up is a ratio of 1.059, and a semitone down is 0.944. Memorize those and keep doing them until the sample sounds in key, and use 1.5 (a fifth up) and 0.667 (a fifth down) for larger jumps. It’s better to repitch down than up if you can – particularly with samples containing vocals, “chipmunk” effects can sound grating. Technically repeated applications of “Change Speed” will degrade quality compared to a single run, but embrace your imperfections. Speaking of imperfections, don’t fuss too much about the fades sounding unnatural. You can just hide it by piling on more samples.
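For reference, those constants are all powers of the equal-tempered semitone ratio 2^(1/12); a quick check (the 1.5 and 0.667 shortcuts are just-intonation fifths, a hair off from equal temperament):

```python
# Speed ratios for repitching by n equal-tempered semitones.
def speed_ratio(semitones):
    return 2 ** (semitones / 12)

print(round(speed_ratio(1), 3))   # 1.059 (semitone up)
print(round(speed_ratio(-1), 3))  # 0.944 (semitone down)
print(round(speed_ratio(7), 3))   # 1.498 (vs. the just fifth, 1.5)
print(round(speed_ratio(-7), 3))  # 0.667 (a fifth down)
```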

Most sound dumplings are at least 30 seconds long. Once you have a few sound dumplings, it is straightforward to arrange them into a piece. Since all your sound dumplings are in the same key, they can overlap arbitrarily. I like to use a formal structure that repeats but with slight reordering. For example, if I have five sound dumplings numbered 1 through 5, I could start with a 123132 A section, then a 454 B section, then 321 for the recap. The formal process is modular since everything sounds good with everything.

Sound dumplings are essentially a process for creating diatonic sound collages, and they allow working quickly and intuitively. I think of them as an approach to manufacturing music as opposed to building everything up from scratch, although plenty of creativity is involved in sample curation. Aside from the obvious choices of ambient and classical music, searching for the right terms on YouTube (like “a cappella” and “violin solo”) and sorting by most recent uploads can get you far. In a few cases, I sampled my previous sound dumpling work to create extra-dense dumplings.

The tracks I’ve produced with this process are some of my favorite music I’ve made. However, like my mixes they were created for personal listening and I only share them with friends, so I won’t be posting them here. (EDIT: I changed my mind, see my music as Interstate Hydra.) If you do want to hear examples of sound dumpling music, I recommend checking out my friend Nathan Turczan’s work under the name Equuipment. He took my sound dumpling idea and expanded on it by introducing key changes and automated collaging of samples in SuperCollider, and the results are very colorful and interesting.

If you make a sound dumpling track, feel free to send it my way. I’d be interested to hear it.

Matrix Modular Synthesis

Today’s blog post is about a feedback-based approach to experimental sound synthesis that arises from the union of two unrelated inspirations.

Inspiration 1: Buchla Music Easel

The Buchla Music Easel is a modular synthesizer I’ve always admired for its visual appearance, almost more so than its sound. I mean, look at it! Candy-colored sliders, knobs, and banana jacks! It has many modules that are well explored in this video by Under the Big Tree, and its standout features in my view are two oscillators (one a “complex oscillator” with a wavefolder integrated), two of the famous vactrol-based lowpass gates, and a five-step sequencer. The video I linked says that “the whole is greater than the sum of the parts” with the Easel – I’ll take his word for it given the price tag.

The Music Easel is built with live performance in mind, which encompasses live knob twiddling, live patching, and playing the capacitive touch keyboard. Artists such as Kaitlyn Aurelia Smith have used this synth to create ambient tonal music, which appears tricky due to the delicate nature of pitch on the instrument. Others have created more out-there and noisy sounds on the Easel, which offers choices between built-in routing and flexible patching between modules and enables a variety of feedback configurations for your bleep-bloop-fizz-fuzz needs.

Read more…