Nathan Ho (Posts about effects)

Audio Effects with Wavesets and K-Means Clustering

Nathan Ho — Tue, 09 Apr 2024 12:13:06 GMT

(original)

(processed)

I would like to congratulate wavesets (not wavelets) for entering their 30th year of being largely ignored outside of a very small circle of computer music nerds. Introduced by Trevor Wishart in [Wishart1994] and popularized by Microsound [Roads2002] and the Composers Desktop Project, a waveset is defined as a segment of an audio signal between two consecutive upward zero crossings. For simple oscillators like sine and saw waves, wavesets divide the signal into pitch periods, but for general periodic signals there may be any number of wavesets per period. For signals containing noise or multiple pitches at once, waveset segmentation is completely unpredictable.

Many simple audio effects fall out of this idea. You can reverse individual wavesets, omit every other waveset, repeat each waveset, sort them, whatever.

I like waveset-based effects best on input signals that are monophonic (having only one pitch) and low in noise. Synthetic signals can make for particularly interesting results. Much as the phase vocoder tends to sound blurry and phasey, waveset transformations also have their own “house style” in the form of highly digital glitches and crackles. These glitches are particularly pronounced when a waveset-based algorithm is fed non-monophonic signals or signals containing strong high-frequency noise. Wavesets are extremely sensitive to any kind of prefiltering applied to the input signal; it’s a good idea to highpass filter the signal to block dc, and it’s fun to add pre-filters as a musical parameter.

Today, we’re putting a possibly new spin on wavesets by combining them with basic statistical learning. The idea is to perform k-means clustering on waveset features. The steps of the algorithm are as follows:

Segmentation: Divide a single-channel audio signal into \(N\) wavesets.
Analysis: Compute the feature vector \(\mathbf{x}_i\) for the \(i\)-th waveset. I use just two features: length \(\ell_i\), or number of samples between the zero crossings, and RMS \(r_i\) of the waveset’s samples. All the lengths are compiled into a single size-\(N\) vector \(\mathbf{\ell}\) and the RMSs into \(\mathbf{r}\).
Normalization: Scale \(\mathbf{\ell}\) and \(\mathbf{r}\) so that they each have variance 1.
Weighting: Scale \(\mathbf{\ell}\) by a weighting parameter \(w\), which controls how much the clustering stage emphasizes differences in length vs. differences in amplitude. We’ll talk more about this later, but \(w = 5\) seems to work as a start.
Clustering: Run k-means clustering on the feature vectors \(\mathbf{x}\), producing \(k\) clusters.
For each cluster, pick one representative waveset, the one closest to the centroid of the cluster.
Quantization: In the original audio signal, replace each waveset with the representative waveset from its cluster.
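The first four steps (segmentation, analysis, normalization, weighting) can be sketched in plain Python. This is only an illustration and not the implementation used for the audio examples: the function names are made up, the zero-crossing test `signal[i - 1] <= 0 < signal[i]` is one possible convention, and samples before the first or after the last upward crossing are simply left out of the waveset list.

```python
import math

def split_wavesets(signal):
    """Return (start, end) sample ranges between consecutive upward zero crossings."""
    crossings = [
        i for i in range(1, len(signal))
        if signal[i - 1] <= 0.0 < signal[i]
    ]
    return list(zip(crossings, crossings[1:]))

def unit_variance(values):
    """Scale values so their variance is 1 (left unscaled if they are all equal)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    scale = math.sqrt(variance) or 1.0
    return [v / scale for v in values]

def waveset_features(signal, wavesets, w=5.0):
    """Return one (weighted length, RMS) feature vector per waveset."""
    lengths = [float(end - start) for start, end in wavesets]
    rms = [
        math.sqrt(sum(x * x for x in signal[start:end]) / (end - start))
        for start, end in wavesets
    ]
    weighted_lengths = [w * v for v in unit_variance(lengths)]
    return list(zip(weighted_lengths, unit_variance(rms)))
```

The resulting list of 2D points is what the clustering step consumes. Since k-means only looks at distances, scaling without centering is enough here.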

Implementation is very lightweight, clocking in at about 30 lines of Python with scikit-learn. There are only two parameters here other than the input audio: \(w\) and \(k\). [1] The length of the input audio signal is important too, so for musical reasons let’s not think in terms of \(k\) but rather “clusters per second” \(c\), which is \(k\) divided by the signal length in seconds. As we will see, with only two-dimensional control we can produce a tremendous variety of sounds.
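For readers without scikit-learn at hand, here is a dependency-free sketch of the clustering and quantization steps, plus the conversion from clusters per second to \(k\). It uses a plain Lloyd's iteration as a stand-in for `sklearn.cluster.KMeans`, so results will not match scikit-learn exactly. All function names are hypothetical, `points` is assumed to be a list of 2D feature tuples, and `wavesets` a list of `(start, end)` sample ranges.

```python
import random

def clusters_for(c, n_samples, sample_rate):
    """Convert clusters per second c into a cluster count k."""
    return max(1, round(c * n_samples / sample_rate))

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))
        for p in points
    ]

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iterations):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assign(points, centroids), centroids

def representatives(points, labels, centroids):
    """Map each non-empty cluster to the index of the waveset nearest its centroid."""
    reps = {}
    for j, centroid in enumerate(centroids):
        members = [i for i, label in enumerate(labels) if label == j]
        if members:
            reps[j] = min(members, key=lambda i: dist2(points[i], centroid))
    return reps

def quantize(signal, wavesets, labels, reps):
    """Rebuild the signal with every waveset replaced by its cluster's representative."""
    out = list(signal[:wavesets[0][0]])  # samples before the first waveset
    for label in labels:
        start, end = wavesets[reps[label]]
        out.extend(signal[start:end])
    out.extend(signal[wavesets[-1][1]:])  # samples after the last waveset
    return out
```

With a 17-second file at 44.1 kHz and \(c = 4\), `clusters_for` gives \(k = 68\). Because representatives generally differ in length from the wavesets they replace, the output is usually not the same length as the input.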

Exploration

I started with an a cappella cover of “Stay With Me” by a singer named Aiva, which I use often for testing monophonic algorithms. This segment is 17 seconds long, 44.1 kHz — as I mentioned, the length of the audio file is important.

This file splits into about 26,000 wavesets. Running the algorithm with \(w = 5.0\) and \(c = 4.0\):

The pitch content is retained, but the voice is now speckled with all kinds of glitchy mutations. (As an aside, sklearn.cluster.KMeans crunches it in a fraction of a second, and I have no concerns with efficiency.)

It is always instructive to look at plots if we can, and in this case it is quite easy given that we have only two-dimensional data to plot. I’ve placed waveset duration on the X-axis and waveset RMS on the Y-axis, and assigned each cluster to a random color. (Sorry, some different clusters might have similar-looking colors.)

\(c\) is the most important parameter here. Let’s step it down from 10 to 0.5, which changes the severity of the glitches.

\(c = 10.0\):

\(c = 5.0\):

\(c = 2.0\):

\(c = 1.0\):

\(c = 0.5\):

We also have \(w\) to tune, which affects the “warbliness” of pitch. Higher \(w\) values result in clusters forming more vertically, while lower values result in more horizontal stripes. For the sake of space, I will not demonstrate this tuning here; I don’t find it a particularly musical parameter to tune and prefer to leave it at 5.

Trying on different audio files

This algorithm is highly signal-dependent, and worth trying on a wide variety of audio files. Here are just a few. The morphology of the 2D plot is particularly interesting.

Speech

Singing tends to hover around quantized pitches, but speech does not, and the clustering algorithm is aware of this. The effect on pitch is particularly interesting here, and very squelchy. (This is perhaps not a fair comparison because this file is considerably shorter than the sung example.)

(original)

(processed)

Percussion

Manhattan Transfer’s “Soul Food to Go” sounds like a Clipping beat.

(original)

(processed)

Full mix

Scott Krippayne’s “I’m Not Cool” turns into digital soup.

(original)

(processed)

Discussion

No question this could be done in real time, by performing k-means on a sliding window of the last N seconds of wavesets. There is considerable literature on “streaming k-means” algorithms [Ailon2009], but for our application the accuracy of clustering is not too important and simply re-running an iteration step of the standard k-means algorithm might work fine. Now imagine filtering it with EQ to screw up the zero crossings, and routing it back into itself with a delay.

I messed a bit with expanding the contextual information of the feature vectors, adding the features of nearby wavesets (with reduced weights so they have less impact on the clustering). However, I was pleased enough with the context-free wavesets that I decided not to push this. I also feel there are other approaches to context that are perhaps more immediately musical. You could for example train a Markov chain on the sequenced wavesets.

This use of clustering is known as vector quantization (VQ), where a vector is replaced with the closest one in a “codebook” of vectors. A very important application of VQ is compression, so it appears commonly in lossy codecs, including audio codecs where it is often used to quantize speech parameters (see Code-Excited Linear Prediction).

Another precedent for what we’re doing here is concatenative synthesis [2], whose standard pipeline involves segmenting audio, analyzing the segments to produce feature vectors, and performing any manner of statistical operations on said feature vectors. Maybe I didn’t look hard enough, but I was unable to find anyone combining wavesets with concatenative synthesis. I did find that a lot of concatenative synthesizers pile on all sorts of fancy machine listening features. There’s nothing wrong with doing that, but the success of the method presented here demonstrates that a “dumb” algorithm — in this case an elementary segmentation method and just two features — is fully capable of sounding distinctive and musical.

Footnotes

References

[Wishart1994]

Wishart, Trevor. 1994. Audible Design.

[Roads2002]

Roads, Curtis. 2002. Microsound.

[Ailon2009]

Ailon, Nir et al. 2009. “Streaming k-means approximation.” https://proceedings.neurips.cc/paper/2009/hash/4f16c818875d9fcb6867c7bdc89be7eb-Abstract.html

Audio Effects with Cepstral Processing

Nathan Ho — Fri, 24 Nov 2023 02:02:56 GMT

Much like the previously discussed wavelet transforms, the cepstrum is a frequency-domain method that I see talked about a lot in the scientific research literature, but only occasionally applied to the creative arts. The cepstrum is sometimes described as “the FFT of the FFT” (although this is an oversimplification since there are nonlinear operations sandwiched in between those two transforms, and the second is really the Discrete Cosine Transform). In contrast to wavelets, the cepstrum is very popular in audio processing, most notably in the ubiquitous mel-frequency cepstral coefficients (MFCCs). Some would not consider the MFCCs a true “cepstrum,” others would say the term “cepstrum” is broad enough to encompass them. I have no strong opinion.

In almost all applications of the cepstrum, it is used solely for analysis and generally isn’t invertible. This is the case for MFCCs, where the magnitude spectrum is downsampled in the conversion to the mel scale, resulting in a loss of information. Resynthesizing audio from the cepstral descriptors commonly used in the literature is an underdetermined problem, usually tackled with machine learning or other complex optimization methods.

However, it is actually possible to implement audio effects in the MFCC domain with perfect reconstruction. You just have to keep around all the information that gets discarded, resulting in this signal chain:

1. Take the STFT. The following steps apply for each frame.
2. Compute the power spectrum (square of magnitude spectrum) and the phases.
3. Compute a bank of bandpass filters on the power spectrum, equally spaced on the mel-frequency scale. This is the mel spectrum, and it downsamples the power spectrum, losing information.
4. Upsample the mel spectrum back up to a full-resolution spectral envelope. Divide the power spectrum by the envelope to produce the residual spectrum. (You have to add a little epsilon to the envelope to prevent zero division.)
5. Compute the logarithm and then the Discrete Cosine Transform of the mel spectrum to produce the MFCCs.
6. Perform any processing desired.
7. Invert step 5: take the inverse DCT and then the exponential to produce the mel spectrum.
8. Invert step 4: upsample the mel spectrum to the spectral envelope, and multiply it by the residual spectrum to produce the power spectrum.
9. Take the square root of the power spectrum to recover the magnitudes, and recombine them with the phases to produce the complex spectrum.
10. Inverse FFT, then overlap-add to resynthesize the signal.
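The per-frame core of this chain (mel spectrum, residual, MFCCs, and back) can be sketched for a single power-spectrum frame as follows. Several details are assumptions of this sketch rather than anything specified above: the triangular filter shape, the common `2595 * log10(1 + f / 700)` mel formula, upsampling by projecting the mel spectrum back through the same filter bank, dividing the power spectrum (rather than the magnitudes) by the envelope, and the value of `EPS`. The STFT, phase handling, and overlap-add are left out.

```python
import math

EPS = 1e-12  # assumed guard value; the text only asks for "a little epsilon"

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_bank(n_bands, n_bins, sample_rate, f_lo=20.0, f_hi=20000.0):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    m_lo, m_hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    edges = [mel_to_hz(m_lo + (m_hi - m_lo) * i / (n_bands + 1)) for i in range(n_bands + 2)]
    bin_hz = [i * sample_rate / (2.0 * (n_bins - 1)) for i in range(n_bins)]
    return [
        [max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid))) for f in bin_hz]
        for lo, mid, hi in zip(edges, edges[1:], edges[2:])
    ]

def dct(x):
    """Unnormalized DCT-II."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi / n * (j + 0.5) * k) for j in range(n)) for k in range(n)]

def idct(c):
    """Inverse of dct() above (a scaled DCT-III)."""
    n = len(c)
    return [
        (c[0] + 2.0 * sum(c[k] * math.cos(math.pi / n * (j + 0.5) * k) for k in range(1, n))) / n
        for j in range(n)
    ]

def upsample(mel, bank):
    """Spread the mel spectrum back over all FFT bins through the filter bank."""
    return [sum(row[i] * m for row, m in zip(bank, mel)) for i in range(len(bank[0]))]

def analyze(power, bank):
    """Return (mfccs, residual) for one power-spectrum frame."""
    mel = [sum(w * p for w, p in zip(row, power)) for row in bank]
    envelope = upsample(mel, bank)
    residual = [p / (e + EPS) for p, e in zip(power, envelope)]
    return dct([math.log(m + EPS) for m in mel]), residual

def synthesize(mfccs, residual, bank):
    """Rebuild the power-spectrum frame from (possibly processed) MFCCs."""
    mel = [math.exp(v) - EPS for v in idct(mfccs)]
    envelope = upsample(mel, bank)
    return [r * (e + EPS) for r, e in zip(residual, envelope)]
```

Calling `synthesize` on unmodified MFCCs returns the original frame up to floating-point error, which is the perfect-reconstruction property. Any processing goes between the two calls.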

It’s a lot of steps, but as an extension of the basic MFCC algorithm, it’s not that much of a leap. I would not be surprised if someone has done this before, storing all residuals when computing the MFCCs so the process can be inverted, but I had difficulty finding prior work on this for the particular application of musical effects. Something similar is done in MFCC-based vocoders, where the “residual spectrum” is instead replaced with speech parameters such as pitch, but I haven’t seen this done on general, non-speech signals.

I will be testing on the following mono snippet of Ed Sheeran’s “Perfect.” (If you plan on doing many listening tests on a musical signal, never use a sample of music you enjoy.)

As for the parameters: mono, 48 kHz sample rate, 2048-sample FFT buffer with Hann window and 50% overlap, 30-band mel spectrum from 20 Hz to 20 kHz.

Cepstral EQ

Because of the nonlinearities involved in the signal chain, merely multiplying the MFCCs by a constant can do some pretty strange things. Zeroing out all MFCCs has the effect of removing the spectral envelope and whitening the signal. The effect on vocal signals is pronounced, turning Ed into a bumblebee.

Multiplying all MFCCs by 2 gives the signal a subtly hollower quality, acting as an expander for the spectral envelope.

MFCCs are signed and can also be multiplied by negative values, which inverts the phase of a cosine wave component. The effect on the signal is hard to describe:

We can apply any MFCC envelope desired. Here’s a sine wave:

Cepstral frequency shifting

Technically this would be “quefrency shifting.” This cyclically rotates the MFCCs to brighten the signal:

And here’s the downward equivalent:

Cepstral frequency scaling

Resampling the MFCCs sounds reminiscent of formant shifting. This is related to the time-scaling property of the Fourier transform: if you resample the spectrum, you’re also resampling the signal. Here’s upward scaling:

Here’s downward scaling:
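The per-frame operations from the last three sections (EQ, shifting, scaling) reduce to small list manipulations on one frame of MFCCs. The sketch below is a guess at their general shape, not the code behind the audio examples: the function names are hypothetical, linear interpolation with zero padding is an assumed resampling method, and which shift direction or scale factor counts as "upward" is not established here.

```python
def scale_mfccs(mfccs, gains):
    """Cepstral EQ: multiply each coefficient by its own gain."""
    return [c * g for c, g in zip(mfccs, gains)]

def rotate_mfccs(mfccs, shift):
    """Cyclic shift: coefficient i moves to index (i + shift) mod N."""
    n = len(mfccs)
    return [mfccs[(i - shift) % n] for i in range(n)]

def resample_mfccs(mfccs, factor):
    """Stretch (factor > 1) or squeeze (factor < 1) the coefficients along the
    quefrency axis with linear interpolation, padding with zeros past the end."""
    padded = list(mfccs) + [0.0]
    out = []
    for i in range(len(mfccs)):
        position = i / factor
        lo = int(position)
        if lo < len(mfccs):
            frac = position - lo
            out.append((1.0 - frac) * padded[lo] + frac * padded[lo + 1])
        else:
            out.append(0.0)
    return out
```

Passing all-zero gains to `scale_mfccs` corresponds to zeroing out the MFCCs, a constant gain of 2 to doubling them, and negative gains to the sign inversion described above.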

Cepstral time-based effects

Here’s what happens when we freeze the MFCCs every few frames:

Lowpass filtering the MFCCs over time tends to slur speech:
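Both time-based effects operate on the sequence of MFCC frames rather than on a single frame. As a rough sketch, assuming `frames` is a list of per-frame MFCC lists, a sample-and-hold for the freeze, and a one-pole filter for the lowpass (the filter type and all names are assumptions):

```python
def freeze_mfccs(frames, hold):
    """Keep the MFCCs of every hold-th frame and repeat them until the next one."""
    return [list(frames[(i // hold) * hold]) for i in range(len(frames))]

def lowpass_mfccs(frames, coeff):
    """One-pole lowpass applied to each coefficient along the time axis.
    coeff is in [0, 1); values closer to 1 smooth more heavily."""
    state = list(frames[0])
    out = []
    for frame in frames:
        state = [coeff * s + (1.0 - coeff) * x for s, x in zip(state, frame)]
        out.append(state)
    return out
```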

Stray thoughts

I have barely scratched the surface of cepstral effects here, opting only to explore the most mathematically straightforward operations. That the MFCCs produce some very weird and very musical effects, even with such simple transformations, is encouraging.

In addition to playing with additional types of effects, it is also worthwhile to adjust the transforms being used. The DCT as the space for the spectral envelope could be improved on. One (strange) possibility that came to mind is messing with the Multiresolution Analysis of the mel spectrum; I have no idea if that would sound interesting or not, but it’s worth a shot.

It’s possible to bypass the MFCCs and just do the DCT of the log-spectrogram. I experimented with this and found that I couldn’t get it to sound as musical as the mel-based equivalent. I believe this is because the resolution of the FFT isn’t very perceptually salient. The mel scale is in fact doing a lot of heavy lifting here.

Audio Texture Resynthesis

Nathan Ho — Tue, 25 Apr 2023 19:58:19 GMT

Left: spectrogram of a child singing. Right: spectrogram of resynthesized audio.

Background

I was alerted to audio texture resynthesis methods by a student of mine who was interested in the collaborative work of researcher Vincent Lostanlen, musician Florian Hecker, and several others [Lostanlen2019] [Lostanlen2021] [Andén2019] [Muradeli2022]. Their efforts are built on an analysis method called “Joint Time-Frequency Scattering” (JTFS) based on the Continuous Wavelet Transform. In an attempt to understand the work better, I binged a wavelet transform textbook, [1] implemented a simplified version of JTFS-based resynthesis, and briefly exchanged emails with Lostanlen. His helpful answers gave me the impression that while JTFS is a powerful analysis technique, resynthesis was more of a side project and there are ways to accomplish similar effects that are more efficient and easier to code without compromising too much on musicality.

Audio texture resynthesis has some history in computer music literature [Schwartz2010], and some researchers have used resynthesis to help understand how the human brain processes audio [McDermott2011].

After some experimentation with these methods, I found that it’s not too hard to build a simple audio texture resynthesizer that exhibits clear musical potential. In this blog post, I’ll walk through a basic technique for making such a system yourself. There won’t be any novel research here, just a demonstration of a minimum viable resynthesizer and my ideas on how to expand on it.

Algorithm

The above-mentioned papers have used fancy techniques including the wavelet transform and auditory filter banks modeled after the human ear. However, I was able to get decent results with a standard STFT spectrogram, followed by phase reconstruction to get time-domain audio samples. The full process looks like this:

Compute a magnitude spectrogram \(S\) of the time-domain input signal \(x\). A fairly high overlap is advised.
Compute any number of feature vectors \(F_1(S),\, F_2(S),\, \ldots,\, F_n(S)\) and define their concatenation as \(F(S)\).
Initialize a randomized magnitude spectrogram \(\hat{S}\).
Use gradient descent on \(\hat{S}\) to minimize the error \(E(\hat{S}) = ||F(S) - F(\hat{S})||\) (using any norm such as the squared error).
Use phase reconstruction such as the Griffin-Lim algorithm on \(\hat{S}\) to produce a resynthesized signal \(\hat{x}\).

The cornerstone of making this algorithm work well is that we choose an \(F(S)\) that’s differentiable (or reasonably close). This means that the gradient \(\nabla E\) can be computed with automatic differentiation (classical backpropagation). As such, this algorithm is best implemented in a differentiable computing environment like PyTorch or TensorFlow.

The features \(F(S)\), as well as their relative weights, greatly affect the sound. If \(F(S)\) is highly time-dependent then the resynthesized signal will mimic the original in evolution. On the other hand, if \(F(S)\) does a lot of pooling across the time axis then the resynthesized signal will mostly ignore the large-scale structure of the input signal. I’m mostly interested in the latter case, where \(F(S)\) significantly “remixes” the input signal and disregards the overall structure of the original.
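To make the optimization loop concrete, here is a toy version with a single, heavily time-pooled feature: the mean magnitude of each frequency row. It is deliberately minimal and makes several assumptions. The gradient is derived by hand instead of with automatic differentiation, the feature choice and step size are arbitrary, magnitudes are clamped at zero, and phase reconstruction is omitted. A real system would use richer features in PyTorch or similar.

```python
import random

def pooled_features(spec):
    """F(S): mean magnitude of each frequency row, pooled across all frames."""
    return [sum(row) / len(row) for row in spec]

def feature_error(target, spec):
    """E: squared error between the target features and the features of spec."""
    return sum((t - f) ** 2 for t, f in zip(target, pooled_features(spec)))

def resynthesize(spec, steps=200, rate=0.5, seed=0):
    """Gradient descent on a random spectrogram until its features match spec's."""
    rng = random.Random(seed)
    n_frames = len(spec[0])
    target = pooled_features(spec)
    estimate = [[rng.random() for _ in range(n_frames)] for _ in spec]
    for _ in range(steps):
        current = pooled_features(estimate)
        for row, t, c in zip(estimate, target, current):
            # dE/d(row[j]) = -2 * (t - c) / n_frames for every frame j.
            gradient = -2.0 * (t - c) / n_frames
            for j in range(n_frames):
                row[j] = max(0.0, row[j] - rate * gradient)  # keep magnitudes >= 0
    return estimate
```

Because this feature discards all timing, the result keeps the random temporal structure of its initialization and only inherits the average spectrum of the input, which is the extreme case of the "remixing" behavior described above.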

We will represent \(S\) as a 2D tensor where the first dimension is frequency and the second is time. As a matrix, each row is an FFT bin, and each column a frame.

If using a fancy alternative to the magnitude spectrogram such as CWT or cochlear filter banks, you may have to do gradient descent all the way back to the time-domain samples \(x\). These analysis methods break down to linear frequency transforms that produce complex numbers followed by computing the absolute value of each bin, so differentiability is maintained.

Negative Compression

Nathan Ho — Thu, 23 Feb 2023 17:59:46 GMT

One blog post I’ve been meaning to write for a while is a comprehensive review of the design of dynamic range compressors and limiters, both digital and analog. Textbook compressor designs can be easily found, but like reverbs there are lots of weird little tricks from both hardware and software designs that supposedly define the distinctive musical character of different compressors. It may be a while before I finish that post because, while I’ve read a lot about the DSP of compressors, I don’t yet feel qualified to write on design. I haven’t yet designed a compressor plugin that I’m happy with, nor done a lot of compressor wine tasting, and the musical and psychoacoustic aspects of compressors are to me at least as important as the signal math.

Nevertheless, there’s a weird corner of compressor design that I feel inspired to talk about, and it’s called negative compression. It’s a feature of a few commercial compressors; I’m not sure which was the first, but I first learned about the concept from Klanghelm DC1A. Negative comp is the source of considerable confusion – just watch the Gearspace pundits go at it.

The brief description is that a standard compressor, upon receiving a signal with increasing amplitude, will reach a point where the output amplitude will increase at a slower rate. If the compressor is a perfect limiter, the output amplitude will hit a hard limit and refuse to increase. A negative compressor takes it further – the output signal will eventually get quieter as the input amplitude increases. If you feed a percussive signal into a negative compressor and drive it hard enough, it will punch a hole in the signal’s amplitude, and can split a transient in two. It can be a pretty bizarre effect, and seems underutilized.

This explanation should be enough for most, but you know this blog. We do the math here. In this post, I will explain the basic mathematics of compressors to demystify negative compression, propose variants of negative compressors, and demonstrate how to do negative compression in SuperCollider.
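The static behavior described above can be captured by a textbook-style gain curve in decibels. This is a generic hard-knee sketch, not a reconstruction of any particular compressor: the `slope` parameter (output dB per input dB above the threshold, i.e. the reciprocal of the usual ratio) and the default threshold are choices made for illustration.

```python
def output_level_db(input_db, threshold_db=-20.0, slope=0.5):
    """Static compressor curve in decibels.

    slope = 1 leaves the signal alone, 0 < slope < 1 is ordinary compression,
    slope = 0 is a perfect limiter, and slope < 0 is negative compression.
    """
    if input_db <= threshold_db:
        return input_db
    return threshold_db + slope * (input_db - threshold_db)
```

With `slope=-1.0`, an input 10 dB over a -20 dB threshold comes out at -30 dB, and an input 20 dB over comes out at -40 dB: louder in, quieter out.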

A Closer Look at Clarence Barlow's ISIS

Nathan Ho — Sun, 15 Jan 2023 22:37:23 GMT

In 2005, Clarence Barlow published a paper on Intra-Samplar Interpolating Sinusoids (ISIS), an audio analysis-synthesis algorithm. It did not make much of a splash, with the paper having only 7 citations in Google Scholar. Nevertheless, it produces some interesting sounds, so let’s dive into it.

First, some context. The precursor to ISIS is a technique Barlow calls spectastics. In this method, the short-time Fourier transform of an audio signal is computed, and at each frame, the magnitude spectrum is resampled logarithmically to 12EDO and used as a probability distribution to select a pitch. The pitch sequence forms an extremely rapid melody, which can be synthesized or played on a robotic instrument. Barlow describes the spectasized melody as “remarkably like the original sound recording.”

In ISIS, this concept is taken to an extreme by making the “melody” a constant amplitude sine wave whose frequency is changed every sample. Given a digital signal that doesn’t exceed ±1, we can interpolate between any two successive samples with a partial cycle of a sine wave. An image helps here; the dots show the sampled digital signal.

For example, if the samples are 0 and 1, as seen in the first two samples in the image, we can interpolate with a quarter sine wave with a period of 4 samples and a frequency of 1/4th the sample rate. There are actually infinitely many ways to do this interpolation. For example, from 0 to 1 we can also have a sine wave that completes 5/4ths of a cycle. We can even assume that the initial phase of the sine wave is \(\pi\) and the final phase \(5\pi/2\), or have the sine wave going backwards with phase ramping from 0 to \(-3\pi/2\).

To resolve the ambiguity, ISIS restricts the frequency to be always nonnegative so the phase never goes backward, and always assumes the phase at every sample point is in the range \([-\pi/2, \pi/2]\) modulo \(2\pi\). There are other approaches, such as picking the minimum possible nonnegative frequency or the frequency of minimum absolute value, which may produce interesting alternative sounds. I won’t get into these (this is a relatively low-effort post), but feel free to try them out.
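One way to read these two constraints in code: take the phase at each sample as the principal arcsine, and advance from one sample to the next by the smallest nonnegative amount that lands on the next phase modulo \(2\pi\). The "smallest" part is an assumption of this sketch (the constraints alone still allow extra full cycles), and the function names are invented.

```python
import math

TWO_PI = 2.0 * math.pi

def isis_phase_steps(samples):
    """Phase advance in radians between each pair of successive samples."""
    phases = [math.asin(x) for x in samples]  # principal value in [-pi/2, pi/2]
    return [(b - a) % TWO_PI for a, b in zip(phases, phases[1:])]

def isis_render(samples, oversample):
    """Fill each gap between samples with `oversample` points of a unit sine arc."""
    out = []
    for x, step in zip(samples, isis_phase_steps(samples)):
        start = math.asin(x)
        out.extend(math.sin(start + step * j / oversample) for j in range(oversample))
    out.append(samples[-1])
    return out
```

Dividing a phase step by \(2\pi\) gives the frequency as a fraction of the sample rate. For the samples 0 and 1 the step is \(\pi/2\), giving 1/4 of the sample rate as in the example above, while going from 1 back to 0 takes a step of \(3\pi/2\).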

Resource: "The Tube Screamer's Secret"

Nathan Ho — Tue, 23 Aug 2022 16:50:01 GMT

A few years ago I bookmarked Boğaç Topaktaş’ 2005 article titled “The Tube Screamer’s Secret,” but today I was dismayed to discover that the domain had expired. This ensures that the page is now nearly impossible to find unless you already know the URL. I don’t normally make posts that are just a link to a third party, but this valuable resource might be forgotten otherwise. Here’s the page in the Wayback Machine:

https://web.archive.org/web/20180127031808/http://bteaudio.com/articles/TSS/TSS.html

Integer Ring Modulation

Nathan Ho — Thu, 31 Mar 2022 22:34:42 GMT

When I think of ring modulation – or multiplication of two bipolar audio signals – I usually think of a complex, polyphonic signal being ring modulated by an unrelated sine wave, producing an inharmonic effect. Indeed, this is what “ring modulator” means in many synthesizers’ effect racks. I associate it with early electronic music and frankly find it a little cheesy, so I don’t use it often.

But if both signals are periodic and their frequencies are small integer multiples of a common fundamental, the resulting sound is harmonic. Mathematically this is no surprise, but the timbres you can get out of this are pretty compelling.

I tend to get the best results from pulse waves, in which case ring modulation is identical to an XOR gate (plus an additional inversion). Here’s a 100 Hz square wave multiplied by a second square wave that steps from 100 Hz, 200 Hz, etc. to 2000 Hz and back.

As usual, here is SuperCollider code:

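(
{
    var freq, snd;
    freq = 100;
    // Multiply a pulse wave by a second one whose frequency is an integer
    // multiple (1 to 20) of freq, swept up and down by a slow triangle LFO.
    snd = Pulse.ar(freq) * Pulse.ar(freq * LFTri.ar(0.3, 3).linlin(-1, 1, 1, 20).round);
    snd ! 2; // duplicate to stereo
}.play(fadeTime: 0);
)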
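The XOR relationship mentioned above is easy to check numerically outside of SuperCollider. The sketch below uses naive (non-band-limited) square waves built with integer arithmetic, which is an idealization compared to `Pulse.ar`; the function names and the 48 kHz sample rate are arbitrary choices.

```python
def square(freq, n_samples, sample_rate=48000):
    """Bipolar (+1/-1) square wave; integer arithmetic keeps the edges exact."""
    half = sample_rate // 2
    return [1 if (i * freq) % sample_rate < half else -1 for i in range(n_samples)]

def ring_modulate(a, b):
    """Ring modulation is sample-by-sample multiplication."""
    return [x * y for x, y in zip(a, b)]

def xnor_gate(a, b):
    """Treat +1 as logic high; the output is high when the inputs agree
    (an XOR followed by an inversion)."""
    return [-1 if (x > 0) ^ (y > 0) else 1 for x, y in zip(a, b)]
```

For a 100 Hz square wave and any integer multiple of it, `ring_modulate` and `xnor_gate` give identical output, and that output repeats every 480 samples (one period of the 100 Hz fundamental), which is why the result stays harmonic.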

Try pulse-width modulation, slightly detuning oscillators for a beating effect, multiplying three or more oscillators, and filtering the oscillators prior to multiplication. There are applications here to synthesizing 1-bit music.

Credit goes to Sahy Uhns for showing me this one some years ago.

EDIT 2023-01-12: I have learned that Dave Rossum used this technique in Trident, calling it “zing modulation.” See this YouTube video.

Moisture Bass

Nathan Ho — Sat, 18 Sep 2021 01:34:03 GMT

If you haven’t heard of the YouTube channel Bunting, it gets my strong recommendation. Bunting creates excellent style imitations of experimental bass music artists and breaks them down with succinct explanations. Notable is his minimal tooling: he uses mostly Ableton Live stock plugins and the free and open source wavetable synth Vital.

His latest tutorial, mimicking the style of the artist Resonant Language, contains several bass sounds with a property he calls “moisture” (timestamp). These bass sounds are created by starting with a low saw wave, boosting the highs, and running the result through Ableton Live’s vocoder set on “Modulator” mode. According to the manual, this enables self-vocoding, where the same signal is the modulator and carrier. An abstract view of a vocoder would suggest that this does little or nothing to the saw wave other than change its spectral tilt, but the reality is much more interesting. Hear for yourself an EQ’d saw wave before and after self-vocoding:

A closer inspection of the latter waveform shows why the self-vocoded saw sounds the way it does. Here’s a single pitch period:

The discontinuity in the saw signal is decorated with a chirp, or a sine wave that rapidly descends in frequency. This little 909 kick drum every pitch period is responsible for the “moisture” sound. Certainly there have been no studies on the psychoacoustics of moisture bass (for lack of a better term), but I suspect that it mimics dispersive behavior, lending a vaguely acoustic sound.

The chirp originates from the bandpass filters in the vocoder. The frequencies of the vocoder are exponentially spaced, so the bandpass filters have to increase in bandwidth for higher frequencies to cover the gaps. Larger bandwidth means lower Q, and lower Q reduces the ring time in the filter’s impulse response. The result is that low frequencies ring longer when the vocoder is pinged and high frequencies ring shorter. Mix them all together, and you have an impulse response resembling a chirp.

Self-vocoding with exponentially spaced bands is clever, but it isn’t the only way to create this effect. One option is to eliminate the vocoding part and use only the exponentially spaced bandpass filters, like an old-school filter bank. This sounds just like self-vocoding but requires fewer bandpass filters to work. In my experiments, I found that putting the resulting signal through nonlinear distortion is necessary to bring out the moisture property.

A more direct approach is to use wavefolding on a curved signal. The slope of the input signal controls the rate that it scrubs through the wavefolding function, and thus controls the frequency of the resulting triangle wave. By modulating the slope from high in absolute value down to zero, a triangle wave descending in frequency is created. This is best explained visually:

And here’s how it sounds:
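The wavefolding approach can also be sketched numerically. The curve used here, an inverted parabola whose slope falls linearly to zero over one pitch period, and the `drive` amount are assumptions chosen for illustration; any curve with a decaying slope should behave similarly.

```python
def triangle_fold(x):
    """Triangle-wave wavefolder: identity on [-1, 1], reflecting at the rails."""
    return 1.0 - abs((x + 1.0) % 4.0 - 2.0)

def moisture_period(n_samples, drive=20.0):
    """One pitch period: a curve whose slope falls from 2 * drive to 0, folded."""
    out = []
    for i in range(n_samples):
        t = i / n_samples
        curve = drive * (1.0 - (1.0 - t) ** 2)
        out.append(triangle_fold(curve))
    return out

def zero_crossing_gaps(signal):
    """Distances in samples between successive sign changes."""
    crossings = [
        i for i in range(1, len(signal))
        if (signal[i - 1] < 0.0) != (signal[i] < 0.0)
    ]
    return [b - a for a, b in zip(crossings, crossings[1:])]
```

The gaps between zero crossings of `moisture_period(4096)` grow steadily across the period, which is the descending chirp: the folded triangle wave slows down as the input curve flattens out.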

Chip Fuzzing Synthesis

Nathan Ho — Sun, 22 Nov 2020 08:00:00 GMT

I’m unsure whether I read about this or dreamt it (this year has been a blur) but I recall someone fuzzing a retro sound chip, most likely the Yamaha OPL3, by sending it random bits for its synthesis parameters and recording the output. Drawing from this, we can explore “chip fuzzing synthesis,” the art of feeding total digital randomness into a synthesis algorithm and seeing what comes out.

There is no specific need for a real retro sound chip or even an accurate emulation of one, but it helps to understand how some old sound chips operate to look for inspiration. As an example, we can look at the Commodore 64’s SID. This chip is an analog subtractive synthesizer, providing three oscillators with frequency inputs, waveform selection (saw, pulse, triangle, noise), and ADSR envelope generators, all mixed into a filter with controllable cutoff and famously nonfunctioning resonance.

The parameters of the SID are controlled by an internal set of 32 8-bit registers, which are written to using a 5-bit parallel address bus and an 8-bit parallel data bus. In C-like pseudocode, communication with the SID can be emulated like so: [1]

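#include <stdint.h>

uint8_t sidRegisters[32];

// Parallel ports used to communicate with SID.
uint8_t addressBus = 0; // 5 bits wide
uint8_t dataBus = 0;    // 8 bits wide

void writeSid(uint8_t address, uint8_t value)
{
    // Only the low 5 bits of the address reach the chip.
    addressBus = address & 31;
    dataBus = value;
    sidRegisters[addressBus] = dataBus;
}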

The SID interprets the sidRegisters array and maps various bits and bytes to analog synth parameters. For example, registers 0 and 1, taken as a 16-bit integer, control the frequency of an oscillator, and individual bits in register 4 select the waveform and enable ring modulation and hard sync.

Fuzzing the address and data buses is the equivalent of calling writeSid repeatedly with randomized address and value, writing random data to random registers. The exact rate at which random data is written is up to you. I find that slow randomization produces the most coherent results and has the least chance of turning the output into white noise. A few hundred times a second is a good start.
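As a rough Python counterpart to the pseudocode above, the fuzzing loop might look like this. `write_sid` mirrors `writeSid`, while `fuzz` and `oscillator_frequency_word` are hypothetical helpers. The sketch only performs the register writes; a real patch would interleave them with audio rendering at the chosen rate. The 300 writes per second default is just one value in the "few hundred" range, and reading register 0 as the low byte of the frequency follows the SID's register layout.

```python
import random

def write_sid(registers, address, value):
    """Python stand-in for writeSid: 5-bit address bus, 8-bit data bus."""
    registers[address & 31] = value & 255

def fuzz(registers, seconds, writes_per_second=300, seed=None):
    """Write random bytes to random registers at the given average rate."""
    rng = random.Random(seed)
    for _ in range(int(seconds * writes_per_second)):
        write_sid(registers, rng.randrange(32), rng.randrange(256))

def oscillator_frequency_word(registers):
    """Registers 0 and 1 read as one 16-bit value (register 0 is the low byte)."""
    return registers[0] | (registers[1] << 8)
```

A typical use would be `registers = bytearray(32)` followed by `fuzz(registers, seconds=1.0)`, then decoding whatever parameters the synthesis algorithm reads from the register array.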

It also suffices to take a simpler route and feed high-frequency random noise (sample-and-hold, maybe) into every parameter of a synth. Again, we don’t need a vintage emulation at all – a minimal subtractive monosynth with waveform selection, ADSR envelope, and a few switchable filter types is adequate to get glitchy sounds. So here’s a little patch:

This is sonically uncompromising (and hey, maybe that’s your thing), but still makes a useful raw source for more polished sound design. Here’s the same patch as above with some minor modifications and a lot of post-effects like granulators, distortion, and reverb:

The outcome of chip fuzzing synthesis is highly dependent on the choice of synthesis algorithm, the set of parameters, and the ranges for said parameters. I can imagine fuzzing FM, subtractive, additive, physical modelling, and parameters of an effects chain. The more inputs to fuzz, the better – especially inputs that switch features on and off, exhibit complex interactions with other inputs, and/or unearth bugs and artifacts.

Low Battery Audio Effects

Nathan Ho — Mon, 25 May 2020 07:00:00 GMT

Searching YouTube for videos of low battery toys and keyboards brings up results like “Demon Possessed Singing Trout.” Please watch the video before proceeding.

I have a limited understanding of electronics, but a compulsive need to explain this phenomenon due to tech blogger ego syndrome. A low battery has an abnormally high internal resistance, causing its voltage to sag in response to the loads it’s supporting. If it’s powering multiple things, they will interact in strange ways. The distorted audio from the singing fish sounds like the clock rate is dropping in reaction to the load of the speaker. (The servo motors might also be causing voltage sag, although it isn’t entirely clear from the video.)

The speaker/clock interaction is interesting since it works in a feedback loop: the clock controls the playback rate, and the amplitude of the output audio draws current that affects the clock. This inspires a general method for turning an audio algorithm into a “low battery” version:

Run a DSP algorithm such as sample playback, synthesizer, effect, etc. that can be operated at a variable clock rate.
Apply filters like a full-wave rectifier, envelope follower, or simple lowpass to simulate speaker load. Optional.
Apply a highpass filter to block dc. (This helps prevent the algorithm from getting stuck.)
Use this signal to control the clock rate of the DSP algorithm, so that a signal of higher amplitude lowers the clock rate.
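Here is a minimal sketch of that loop applied to sample playback. Everything specific in it is an assumption: linear interpolation for the variable-rate read, a rectifier plus one-pole lowpass as the "speaker load", a one-pole DC blocker, and the parameter names and defaults. The clock rate is modeled as the amount the read position advances per output sample.

```python
def low_battery(samples, sag=4.0, smoothing=0.99, dc_block=0.995, min_rate=0.05):
    """Play back samples with a clock rate that sags as the output gets louder."""
    out = []
    position = 0.0       # read position; the "clock" advances it each sample
    load = 0.0           # smoothed, rectified output level
    previous_load = 0.0
    control = 0.0        # DC-blocked load, used to pull the clock down
    while position < len(samples) - 1:
        i = int(position)
        frac = position - i
        y = (1.0 - frac) * samples[i] + frac * samples[i + 1]
        out.append(y)
        # Simulated speaker load: full-wave rectify, then lowpass.
        load = smoothing * load + (1.0 - smoothing) * abs(y)
        # Highpass (DC blocker) so a sustained level cannot stall playback.
        control = load - previous_load + dc_block * control
        previous_load = load
        # Higher amplitude lowers the clock rate; never let it reach zero.
        position += max(min_rate, 1.0 - sag * control)
    return out
```

With `sag=0.0` the function returns the input unchanged (minus its final sample), so `sag` plays the role of the "battery" knob: turning it up increases how far the clock droops in response to the output.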

The casual experiments I’ve done with this are promising. At subtle settings, this creates wandering, droopy pitch bends. Pushed to the extreme, it produces squelchy signal-dependent distortion. I especially like its effect on percussive signals, where louder transients are stretched out and any rhythmic pulse becomes irregular. I’m imagining software plugins that emulate digital hardware could be augmented with a “battery” knob that lets the user control how much the clock rate sags in response to the output signal.