<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Nathan Ho (Posts about effects)</title><link>https://nathan.ho.name/</link><description></description><atom:link href="https://nathan.ho.name/categories/effects.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>© 2026</copyright><lastBuildDate>Thu, 07 May 2026 05:13:17 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Audio Effects with Wavesets and K-Means Clustering</title><link>https://nathan.ho.name/posts/wavesets-clustering/</link><dc:creator>Nathan Ho</dc:creator><description>&lt;div&gt;&lt;img alt="/images/waveset-clustering/stay_with_me_5.0.png" src="https://nathan.ho.name/images/waveset-clustering/stay_with_me_5.0.png"&gt;
&lt;p&gt;&lt;audio controls src="https://nathan.ho.name/audio/waveset-clustering/stay_with_me_original.mp3"&gt;&lt;/audio&gt; (original)&lt;/p&gt;
&lt;p&gt;&lt;audio controls src="https://nathan.ho.name/audio/waveset-clustering/stay_with_me_5.0.mp3"&gt;&lt;/audio&gt; (processed)&lt;/p&gt;&lt;p&gt;I would like to congratulate wavesets (not &lt;a class="reference external" href="https://nathan.ho.name/posts/wavelets/"&gt;wavelets&lt;/a&gt;) for entering their 30th year of being largely ignored outside of a very small circle of computer music nerds. Introduced by Trevor Wishart in &lt;a class="citation-reference" href="https://nathan.ho.name/posts/wavesets-clustering/#wishart1994" id="citation-reference-1" role="doc-biblioref"&gt;[Wishart1994]&lt;/a&gt; and popularized by &lt;em&gt;Microsound&lt;/em&gt; &lt;a class="citation-reference" href="https://nathan.ho.name/posts/wavesets-clustering/#roads2002" id="citation-reference-2" role="doc-biblioref"&gt;[Roads2002]&lt;/a&gt; and the &lt;a class="reference external" href="https://composersdesktop.com/"&gt;Composers Desktop Project&lt;/a&gt;, a waveset is defined as a segment of an audio signal between two consecutive upward zero crossings. For simple oscillators like sine and saw waves, wavesets divide the signal into pitch periods, but for general periodic signals there may be any number of wavesets per period. For signals containing noise or multiple pitches at once, waveset segmentation is completely unpredictable.&lt;/p&gt;
&lt;p&gt;Many simple audio effects fall out of this idea. You can reverse individual wavesets, omit every other waveset, repeat each waveset, sort them, whatever.&lt;/p&gt;
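&lt;p&gt;As a rough illustration of the definition and of the simple effects just listed, here is a minimal Python sketch. The function names and the 8 kHz test tone are made up for this example, and the few samples before the first crossing and after the last one are simply discarded.&lt;/p&gt;

```python
import math

def split_wavesets(signal):
    """Split a list of samples into wavesets at upward zero crossings."""
    # An upward zero crossing: previous sample negative, current one non-negative.
    starts = [i for i in range(1, len(signal))
              if 0 > signal[i - 1] and signal[i] >= 0]
    # Leftover samples before the first and after the last crossing are dropped.
    return [signal[a:b] for a, b in zip(starts, starts[1:])]

def repeat_each(wavesets, times=2):
    """Play every waveset `times` times in a row."""
    return [s for w in wavesets for _ in range(times) for s in w]

def reverse_each(wavesets):
    """Reverse the samples inside every waveset."""
    return [s for w in wavesets for s in reversed(w)]

def omit_every_other(wavesets):
    """Keep wavesets 0, 2, 4, ... and drop the rest."""
    return [s for w in wavesets[::2] for s in w]

# A 100 Hz sine at 8 kHz: each waveset is one 80-sample pitch period.
sr = 8000
sine = [math.sin(2 * math.pi * 100 * (n + 0.5) / sr) for n in range(sr // 10)]
wavesets = split_wavesets(sine)
```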
&lt;p&gt;I like waveset-based effects best on input signals that are monophonic (having only one pitch) and low in noise. Synthetic signals can make for particularly interesting results. Much as the phase vocoder tends to sound blurry and phasey, waveset transformations also have their own “house style” in the form of highly digital glitches and crackles. These glitches are particularly pronounced when a waveset-based algorithm is fed non-monophonic signals or signals containing strong high-frequency noise. Wavesets are extremely sensitive to any kind of prefiltering applied to the input signal; it’s a good idea to highpass filter the signal to block dc, and it’s fun to add pre-filters as a musical parameter.&lt;/p&gt;
&lt;p&gt;Today, we’re putting a possibly new spin on wavesets by combining them with basic statistical learning. The idea is to perform &lt;a class="reference external" href="https://en.wikipedia.org/wiki/K-means_clustering"&gt;k-means clustering&lt;/a&gt; on waveset features. The steps of the algorithm are as follows:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Segmentation:&lt;/strong&gt; Divide a single-channel audio signal into &lt;span class="math"&gt;\(N\)&lt;/span&gt; wavesets.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Analysis:&lt;/strong&gt; Compute the feature vector &lt;span class="math"&gt;\(\mathbf{x}_i\)&lt;/span&gt; for the &lt;span class="math"&gt;\(i\)&lt;/span&gt;-th waveset. I use just two features: length &lt;span class="math"&gt;\(\ell_i\)&lt;/span&gt;, or number of samples between the zero crossings, and RMS &lt;span class="math"&gt;\(r_i\)&lt;/span&gt; of the waveset’s samples. All the lengths are compiled into a single size-&lt;span class="math"&gt;\(N\)&lt;/span&gt; vector &lt;span class="math"&gt;\(\mathbf{\ell}\)&lt;/span&gt; and the RMSs into &lt;span class="math"&gt;\(\mathbf{r}\)&lt;/span&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Normalization:&lt;/strong&gt; Scale &lt;span class="math"&gt;\(\mathbf{\ell}\)&lt;/span&gt; and &lt;span class="math"&gt;\(\mathbf{r}\)&lt;/span&gt; so that they each have variance 1.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Weighting:&lt;/strong&gt; Scale &lt;span class="math"&gt;\(\mathbf{\ell}\)&lt;/span&gt; by a weighting parameter &lt;span class="math"&gt;\(w\)&lt;/span&gt;, which controls how much the clustering stage emphasizes differences in length vs. differences in amplitude. We’ll talk more about this later, but &lt;span class="math"&gt;\(w = 5\)&lt;/span&gt; seems to work as a start.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Clustering:&lt;/strong&gt; Run k-means clustering on the feature vectors &lt;span class="math"&gt;\(\mathbf{x}\)&lt;/span&gt;, producing &lt;span class="math"&gt;\(k\)&lt;/span&gt; clusters.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For each cluster, pick one &lt;em&gt;representative waveset&lt;/em&gt;, the one closest to the centroid of the cluster.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; In the original audio signal, replace each waveset with the representative waveset from its cluster.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Implementation is very lightweight, clocking in at about 30 lines of Python with scikit-learn. There are only two parameters here other than the input audio: &lt;span class="math"&gt;\(w\)&lt;/span&gt; and &lt;span class="math"&gt;\(k\)&lt;/span&gt;. &lt;a class="brackets" href="https://nathan.ho.name/posts/wavesets-clustering/#footnote-1" id="footnote-reference-1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; The length of the input audio signal is important too, so for musical reasons let’s not think in terms of &lt;span class="math"&gt;\(k\)&lt;/span&gt; but rather “clusters per second” &lt;span class="math"&gt;\(c\)&lt;/span&gt;, which is &lt;span class="math"&gt;\(k\)&lt;/span&gt; divided by the signal length in seconds. As we will see, with only two-dimensional control we can produce a tremendous variety of sounds.&lt;/p&gt;
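&lt;p&gt;The seven steps can be sketched as follows. This is not the scikit-learn implementation mentioned above: to stay dependency-free it substitutes a hand-rolled Lloyd's iteration for scikit-learn's k-means, and every function name, the seed, and the iteration count are assumptions made for illustration.&lt;/p&gt;

```python
import math
import random

def split_wavesets(signal):
    starts = [i for i in range(1, len(signal))
              if 0 > signal[i - 1] and signal[i] >= 0]
    return [signal[a:b] for a, b in zip(starts, starts[1:])]

def unit_variance(values):
    """Scale (without centering) so the values have variance 1."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [v / std if std > 0 else 0.0 for v in values]

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centroids):
    return min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm (stand-in for a library k-means)."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return [nearest(p, centroids) for p in points], centroids

def quantize_wavesets(signal, k, w=5.0):
    wavesets = split_wavesets(signal)                          # 1. segmentation
    lengths = [float(len(ws)) for ws in wavesets]              # 2. analysis
    rms = [math.sqrt(sum(s * s for s in ws) / len(ws)) for ws in wavesets]
    lengths, rms = unit_variance(lengths), unit_variance(rms)  # 3. normalization
    points = [(w * l, r) for l, r in zip(lengths, rms)]        # 4. weighting
    labels, centroids = kmeans(points, k)                      # 5. clustering
    reps = {}
    for j in set(labels):                                      # 6. representatives
        members = [i for i, lab in enumerate(labels) if lab == j]
        reps[j] = min(members, key=lambda i: dist2(points[i], centroids[j]))
    out = []
    for lab in labels:                                         # 7. quantization
        out.extend(wavesets[reps[lab]])
    return out

sr = 8000
signal = [math.sin(2 * math.pi * 100 * (n + 0.5) / sr) for n in range(sr // 10)]
c = 20                                   # clusters per second
k = max(1, round(c * len(signal) / sr))  # 0.1 s of audio gives k = 2
out = quantize_wavesets(signal, k)
```

&lt;p&gt;Lloyd's algorithm only finds a local optimum, so a different seed or initialization can pick different representative wavesets.&lt;/p&gt;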
&lt;p&gt;&lt;a href="https://nathan.ho.name/posts/wavesets-clustering/"&gt;Read more…&lt;/a&gt; (11 min remaining to read)&lt;/p&gt;&lt;/div&gt;</description><category>data science</category><category>dsp</category><category>effects</category><category>granular synthesis</category><category>machine learning</category><guid>https://nathan.ho.name/posts/wavesets-clustering/</guid><pubDate>Tue, 09 Apr 2024 12:13:06 GMT</pubDate></item><item><title>Audio Effects with Cepstral Processing</title><link>https://nathan.ho.name/posts/cepstrum/</link><dc:creator>Nathan Ho</dc:creator><description>&lt;p&gt;Much like the previously discussed &lt;a class="reference external" href="https://nathan.ho.name/posts/wavelets/"&gt;wavelet transforms&lt;/a&gt;, the cepstrum is a frequency-domain method that I see talked about a lot in the scientific research literature, but only occasionally applied to the creative arts. The cepstrum is sometimes described as “the FFT of the FFT” (although this is an oversimplification since there are nonlinear operations sandwiched in between those two transforms, and the second is really the Discrete Cosine Transform). In contrast to wavelets, the cepstrum is very popular in audio processing, most notably in the ubiquitous mel-frequency cepstral coefficients (MFCCs). Some would not consider the MFCCs a true “cepstrum,” others would say the term “cepstrum” is broad enough to encompass them. I have no strong opinion.&lt;/p&gt;
&lt;p&gt;In almost all applications of the cepstrum, it is used solely for analysis and generally isn’t invertible. This is the case for MFCCs, where the magnitude spectrum is downsampled in the conversion to the mel scale, resulting in a loss of information. Resynthesizing audio from the cepstral descriptors commonly used in the literature is an underdetermined problem, usually tackled with machine learning or other complex optimization methods.&lt;/p&gt;
&lt;p&gt;However, it is actually possible to implement audio effects in the MFCC domain with perfect reconstruction. You just have to keep around all the information that gets discarded, resulting in this signal chain:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Take the STFT. The following steps apply for each frame.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Compute the power spectrum (square of magnitude spectrum) and the phases.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Compute a bank of bandpass filters on the power spectrum, equally spaced on the mel-frequency scale. This is the mel spectrum, and it downsamples the magnitude spectrum, losing information.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Upsample the mel spectrum back up to full spectral envelope. Divide the magnitude spectrum by the envelope to produce the &lt;em&gt;residual spectrum&lt;/em&gt;. (You have to add a little epsilon to the envelope to prevent zero division.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Compute the logarithm and then the Discrete Cosine Transform of the mel spectrum to produce the MFCCs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Perform any processing desired.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Invert step 5: take the inverse DCT and then the exponential to produce the mel spectrum.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Invert step 4: upsample the mel spectrum to the spectral envelope, and multiply it by the residual spectrum to produce the power spectrum.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Recombine the power spectrum with the phases to produce the complex spectrum.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Inverse FFT, then overlap-add to resynthesize the signal.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;It’s a lot of steps, but as an extension of the basic MFCC algorithm, it’s not that much of a leap. I would not be surprised if someone has done this before, storing all residuals when computing the MFCCs so the process can be inverted, but I had difficulty finding prior work on this for the particular application of musical effects. Something similar is done in MFCC-based vocoders, where the “residual spectrum” is instead replaced with speech parameters such as pitch, but I haven’t seen this done on general, non-speech signals.&lt;/p&gt;
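&lt;p&gt;To make the per-frame part of the chain (steps 3 to 8) concrete, here is a heavily simplified Python sketch. It makes several assumptions that are not in the chain above: equal-width bands replace the mel-spaced bandpass filters, the envelope is upsampled by simple repetition, the power spectrum is used for both the envelope and the residual, and the STFT, phases, and overlap-add are left out entirely. All names are hypothetical.&lt;/p&gt;

```python
import math

def dct2(x):
    """Unnormalized DCT-II."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * (j + 0.5) * k / n) for j in range(n))
            for k in range(n)]

def idct2(c):
    """Exact inverse of dct2 (a scaled DCT-III)."""
    n = len(c)
    return [(c[0] / 2 + sum(c[k] * math.cos(math.pi * (j + 0.5) * k / n)
                            for k in range(1, n))) * 2 / n
            for j in range(n)]

def band_energies(power, bands):
    """Average the power spectrum inside equal-width bands (stand-in for mel filters)."""
    size = len(power) // bands
    return [sum(power[b * size:(b + 1) * size]) / size for b in range(bands)]

def upsample(band_values, n_bins):
    """Repeat each band value across its bins to get a full-resolution envelope."""
    size = n_bins // len(band_values)
    return [band_values[min(i // size, len(band_values) - 1)] for i in range(n_bins)]

def process_frame(power, bands, effect, eps=1e-12):
    mel = band_energies(power, bands)                         # step 3
    envelope = upsample(mel, len(power))                      # step 4
    residual = [p / (e + eps) for p, e in zip(power, envelope)]
    coeffs = dct2([math.log(m + eps) for m in mel])           # step 5
    coeffs = effect(coeffs)                                   # step 6
    new_mel = [math.exp(v) for v in idct2(coeffs)]            # step 7
    new_env = upsample(new_mel, len(power))                   # step 8
    return [e * r for e, r in zip(new_env, residual)]

power = [(1 + i % 5) * math.exp(-i / 16) for i in range(64)]
same = process_frame(power, 8, lambda c: c)                   # no-op effect
white = process_frame(power, 8, lambda c: [0.0] * len(c))     # zero all coefficients
```

&lt;p&gt;With a no-op effect the output matches the input up to rounding, which is the perfect-reconstruction property; zeroing every coefficient leaves only the residual.&lt;/p&gt;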
&lt;p&gt;I will be testing on the following mono snippet of Ed Sheeran’s “Perfect.” (If you plan on doing many listening tests on a musical signal, never use a sample of music you enjoy.)&lt;/p&gt;
&lt;audio controls src="https://nathan.ho.name/audio/cepstral/NoOp.mp3"&gt;&lt;/audio&gt;&lt;p&gt;As for the parameters: mono, 48 kHz sample rate, 2048-sample FFT buffer with Hann window and 50% overlap, 30-band mel spectrum from 20 Hz to 20 kHz.&lt;/p&gt;
&lt;section id="cepstral-eq"&gt;
&lt;h2&gt;Cepstral EQ&lt;/h2&gt;
&lt;p&gt;Because of the nonlinearities involved in the signal chain, merely multiplying the MFCCs by a constant can do some pretty strange things. Zeroing out all MFCCs has the effect of removing the spectral envelope and whitening the signal. The effect on vocal signals is pronounced, turning Ed into a bumblebee.&lt;/p&gt;
&lt;audio controls src="https://nathan.ho.name/audio/cepstral/Whiten.mp3"&gt;&lt;/audio&gt;&lt;p&gt;Multiplying all MFCCs by 2 has a subtle, hollower quality, acting as an expander for the spectral envelope.&lt;/p&gt;
&lt;audio controls src="https://nathan.ho.name/audio/cepstral/Deepen.mp3"&gt;&lt;/audio&gt;&lt;p&gt;MFCCs are signed and can also be multiplied by negative values, which inverts the phase of a cosine wave component. The effect on the signal is hard to describe:&lt;/p&gt;
&lt;audio controls src="https://nathan.ho.name/audio/cepstral/SignFlip.mp3"&gt;&lt;/audio&gt;&lt;p&gt;We can apply any MFCC envelope desired. Here’s a sine wave:&lt;/p&gt;
&lt;audio controls src="https://nathan.ho.name/audio/cepstral/PartialSignFlip.mp3"&gt;&lt;/audio&gt;&lt;/section&gt;
&lt;section id="cepstral-frequency-shifting"&gt;
&lt;h2&gt;Cepstral frequency shifting&lt;/h2&gt;
&lt;p&gt;Technically this would be “quefrency shifting.” This cyclically rotates the MFCCs to brighten the signal:&lt;/p&gt;
&lt;audio controls src="https://nathan.ho.name/audio/cepstral/ShiftUp.mp3"&gt;&lt;/audio&gt;&lt;p&gt;And here’s the downward equivalent:&lt;/p&gt;
&lt;audio controls src="https://nathan.ho.name/audio/cepstral/ShiftDown.mp3"&gt;&lt;/audio&gt;&lt;/section&gt;
&lt;section id="cepstral-frequency-scaling"&gt;
&lt;h2&gt;Cepstral frequency scaling&lt;/h2&gt;
&lt;p&gt;Resampling the MFCCs sounds reminiscent of formant shifting. This is related to the time-scaling property of the Fourier transform: if you resample the spectrum, you’re also resampling the signal. Here’s upward scaling:&lt;/p&gt;
&lt;audio controls src="https://nathan.ho.name/audio/cepstral/ScaleUp.mp3"&gt;&lt;/audio&gt;&lt;p&gt;Here’s downward scaling:&lt;/p&gt;
&lt;audio controls src="https://nathan.ho.name/audio/cepstral/ScaleDown.mp3"&gt;&lt;/audio&gt;&lt;/section&gt;
&lt;section id="cepstral-time-based-effects"&gt;
&lt;h2&gt;Cepstral time-based effects&lt;/h2&gt;
&lt;p&gt;Here’s what happens when we freeze the MFCCs every few frames:&lt;/p&gt;
&lt;audio controls src="https://nathan.ho.name/audio/cepstral/Latch.mp3"&gt;&lt;/audio&gt;&lt;p&gt;Lowpass filtering the MFCCs over time tends to slur speech:&lt;/p&gt;
&lt;audio controls src="https://nathan.ho.name/audio/cepstral/Smooth.mp3"&gt;&lt;/audio&gt;&lt;/section&gt;
&lt;section id="stray-thoughts"&gt;
&lt;h2&gt;Stray thoughts&lt;/h2&gt;
&lt;p&gt;I have barely scratched the surface of cepstral effects here, opting only to explore the most mathematically straightforward operations. That the MFCCs produce some very weird and very musical effects, even with such simple transformations, is encouraging.&lt;/p&gt;
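&lt;p&gt;Each of the operations explored above amounts to a few lines acting on a frame's MFCC vector, or on a list of such vectors over time. The Python sketch below is illustrative only: the function names, the linear interpolation used for resampling, and the one-pole smoother are assumptions, not the code behind the audio examples.&lt;/p&gt;

```python
def scale(coeffs, gain):
    """Cepstral EQ with a flat envelope: 0 whitens, negative values flip signs."""
    return [gain * c for c in coeffs]

def rotate(coeffs, shift):
    """Cyclic rotation; a positive shift moves coefficients to higher indices."""
    n = len(coeffs)
    return [coeffs[(i - shift) % n] for i in range(n)]

def resample(coeffs, factor):
    """Stretch (factor above 1) or squeeze the vector with linear interpolation."""
    n = len(coeffs)
    out = []
    for i in range(n):
        pos = i / factor
        lo = int(pos)
        frac = pos - lo
        a = coeffs[lo] if n > lo else 0.0          # read zeros past the end
        b = coeffs[lo + 1] if n > lo + 1 else 0.0
        out.append((1 - frac) * a + frac * b)
    return out

def latch(frames, hold):
    """Freeze the coefficients, refreshing them every `hold` frames."""
    return [frames[i - i % hold] for i in range(len(frames))]

def smooth(frames, alpha):
    """One-pole lowpass over time applied to each coefficient track."""
    out, state = [], list(frames[0])
    for frame in frames:
        state = [alpha * s + (1 - alpha) * c for s, c in zip(state, frame)]
        out.append(state)
    return out
```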
&lt;p&gt;In addition to exploring more types of effects, it is also worthwhile to adjust the transforms being used. The DCT as the space for the spectral envelope could be improved on. One (strange) possibility that came to mind is messing with the Multiresolution Analysis of the mel spectrum; I have no idea if that would sound interesting or not, but it’s worth a shot.&lt;/p&gt;
&lt;p&gt;It’s possible to bypass the MFCCs and just do the DCT of the log-spectrogram. I experimented with this and found that I couldn’t get it to sound as musical as the mel-based equivalent. I believe this is because the resolution of the FFT isn’t very perceptually salient. The mel scale is in fact doing a lot of heavy lifting here.&lt;/p&gt;
&lt;/section&gt;</description><category>dsp</category><category>effects</category><category>frequency transforms</category><guid>https://nathan.ho.name/posts/cepstrum/</guid><pubDate>Fri, 24 Nov 2023 02:02:56 GMT</pubDate></item><item><title>Audio Texture Resynthesis</title><link>https://nathan.ho.name/posts/texture-resynthesis/</link><dc:creator>Nathan Ho</dc:creator><description>&lt;div&gt;&lt;img alt="Spectrograms of the audio signals later in the post." class="align-center" src="https://nathan.ho.name/images/texture_resynthesis.png"&gt;
&lt;p&gt;&lt;em&gt;Left: spectrogram of a child singing. Right: spectrogram of resynthesized audio.&lt;/em&gt;&lt;/p&gt;
&lt;section id="background"&gt;
&lt;h2&gt;Background&lt;/h2&gt;
&lt;p&gt;I was alerted to audio texture resynthesis methods by a student of mine who was interested in the collaborative work of researcher Vincent Lostanlen, musician Florian Hecker, and several others &lt;a class="citation-reference" href="https://nathan.ho.name/posts/texture-resynthesis/#lostanlen2019" id="citation-reference-1" role="doc-biblioref"&gt;[Lostanlen2019]&lt;/a&gt; &lt;a class="citation-reference" href="https://nathan.ho.name/posts/texture-resynthesis/#lostanlen2021" id="citation-reference-2" role="doc-biblioref"&gt;[Lostanlen2021]&lt;/a&gt; &lt;a class="citation-reference" href="https://nathan.ho.name/posts/texture-resynthesis/#anden2019" id="citation-reference-3" role="doc-biblioref"&gt;[Andén2019]&lt;/a&gt; &lt;a class="citation-reference" href="https://nathan.ho.name/posts/texture-resynthesis/#muradeli2022" id="citation-reference-4" role="doc-biblioref"&gt;[Muradeli2022]&lt;/a&gt;. Their efforts are built on an analysis method called “Joint Time-Frequency Scattering” (JTFS) based on the Continuous Wavelet Transform. In an attempt to understand the work better, I binged a wavelet transform textbook, &lt;a class="brackets" href="https://nathan.ho.name/posts/texture-resynthesis/#footnote-1" id="footnote-reference-1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; implemented a simplified version of JTFS-based resynthesis, and briefly exchanged emails with Lostanlen. His helpful answers gave me the impression that while JTFS is a powerful analysis technique, resynthesis was more of a side project and there are ways to accomplish similar effects that are more efficient and easier to code without compromising too much on musicality.&lt;/p&gt;
&lt;p&gt;Audio texture resynthesis has some history in computer music literature &lt;a class="citation-reference" href="https://nathan.ho.name/posts/texture-resynthesis/#schwartz2010" id="citation-reference-5" role="doc-biblioref"&gt;[Schwartz2010]&lt;/a&gt;, and some researchers have used resynthesis to help understand how the human brain processes audio &lt;a class="citation-reference" href="https://nathan.ho.name/posts/texture-resynthesis/#mcdermott2011" id="citation-reference-6" role="doc-biblioref"&gt;[McDermott2011]&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;After some experimentation with these methods, I found that it’s not too hard to build a simple audio texture resynthesizer that exhibits clear musical potential. In this blog post, I’ll walk through a basic technique for making such a system yourself. There won’t be any novel research here, just a demonstration of a minimum viable resynthesizer and my ideas on how to expand on it.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="algorithm"&gt;
&lt;h2&gt;Algorithm&lt;/h2&gt;
&lt;p&gt;The above-mentioned papers have used fancy techniques including the wavelet transform and auditory filter banks modeled after the human ear. However, I was able to get decent results with a standard STFT spectrogram, then using phase reconstruction to get time-domain audio samples. The full process looks like this:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Compute a magnitude spectrogram &lt;span class="math"&gt;\(S\)&lt;/span&gt; of the time-domain input signal &lt;span class="math"&gt;\(x\)&lt;/span&gt;. A fairly high overlap is advised.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Compute any number of feature vectors &lt;span class="math"&gt;\(F_1(S),\, F_2(S),\, \ldots,\, F_n(S)\)&lt;/span&gt; and define their concatenation as &lt;span class="math"&gt;\(F(S)\)&lt;/span&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Initialize a randomized magnitude spectrogram &lt;span class="math"&gt;\(\hat{S}\)&lt;/span&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use gradient descent on &lt;span class="math"&gt;\(\hat{S}\)&lt;/span&gt; to minimize the error &lt;span class="math"&gt;\(E(\hat{S}) = ||F(S) - F(\hat{S})||\)&lt;/span&gt; (using any norm such as the squared error).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use phase reconstruction such as the Griffin-Lim algorithm on &lt;span class="math"&gt;\(\hat{S}\)&lt;/span&gt; to produce a resynthesized signal &lt;span class="math"&gt;\(\hat{x}\)&lt;/span&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The cornerstone of making this algorithm work well is that we choose an &lt;span class="math"&gt;\(F(S)\)&lt;/span&gt; that’s differentiable (or reasonably close). This means that the gradient &lt;span class="math"&gt;\(\nabla E\)&lt;/span&gt; can be computed with automatic differentiation (classical backpropagation). As such, this algorithm is best implemented in a differentiable computing environment like PyTorch or TensorFlow.&lt;/p&gt;
&lt;p&gt;The features &lt;span class="math"&gt;\(F(S)\)&lt;/span&gt;, as well as their relative weights, greatly affect the sound. If &lt;span class="math"&gt;\(F(S)\)&lt;/span&gt; is highly time-dependent then the resynthesized signal will mimic the original in evolution. On the other hand, if &lt;span class="math"&gt;\(F(S)\)&lt;/span&gt; does a lot of pooling across the time axis then the resynthesized signal will mostly ignore the large-scale structure of the input signal. I’m mostly interested in the latter case, where &lt;span class="math"&gt;\(F(S)\)&lt;/span&gt; significantly “remixes” the input signal and disregards the overall structure of the original.&lt;/p&gt;
&lt;p&gt;We will represent &lt;span class="math"&gt;\(S\)&lt;/span&gt; as a 2D tensor where the first dimension is frequency and the second is time. As a matrix, each row is an FFT bin, and each column a frame.&lt;/p&gt;
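&lt;p&gt;Here is a toy Python sketch of steps 2 to 4 on a tiny spectrogram stored as a list of rows. It is built on assumptions chosen for brevity: the features are just the time-pooled mean and mean square of each row, the gradient is written out by hand instead of using automatic differentiation, magnitudes are clamped at zero after each update, and the learning rate and step count are arbitrary. Phase reconstruction (step 5) is not shown.&lt;/p&gt;

```python
import math
import random

def features(S):
    """Time-pooled features: mean and mean square of each frequency row."""
    T = len(S[0])
    return [sum(row) / T for row in S] + [sum(v * v for v in row) / T for row in S]

def error(target, S_hat):
    """Squared error between the target features and those of S_hat."""
    return sum((a - b) ** 2 for a, b in zip(target, features(S_hat)))

def gradient(target, S_hat):
    """Hand-derived gradient of the squared error for every bin of S_hat."""
    n, T = len(S_hat), len(S_hat[0])
    f = features(S_hat)
    grad = []
    for i, row in enumerate(S_hat):
        d_mean = -2 * (target[i] - f[i]) / T        # through the row mean
        d_sq = -2 * (target[n + i] - f[n + i]) / T  # through the row mean square
        grad.append([d_mean + d_sq * 2 * v for v in row])
    return grad

def resynthesize(S, steps=500, rate=0.5, seed=0):
    rng = random.Random(seed)
    target = features(S)
    S_hat = [[rng.random() for _ in row] for row in S]   # randomized start
    for _ in range(steps):                                # gradient descent
        grad = gradient(target, S_hat)
        S_hat = [[max(0.0, v - rate * g) for v, g in zip(row, grow)]
                 for row, grow in zip(S_hat, grad)]
    return S_hat

# Four frequency rows, sixteen frames, values between 0 and 1.
S = [[(i + 1) / 4 * (0.5 + 0.5 * math.sin(2 * math.pi * t / 8)) for t in range(16)]
     for i in range(4)]
S_hat = resynthesize(S)
```

&lt;p&gt;Because these features pool over the whole time axis, the result matches each row's statistics but has no reason to follow the original's evolution in time.&lt;/p&gt;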
&lt;p&gt;If using a fancy alternative to the magnitude spectrogram such as the CWT or cochlear filter banks, you may have to do gradient descent all the way back to the time-domain samples &lt;span class="math"&gt;\(x\)&lt;/span&gt;. These analysis methods break down to linear frequency transforms that produce complex numbers followed by computing the absolute value of each bin, so differentiability is maintained.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://nathan.ho.name/posts/texture-resynthesis/"&gt;Read more…&lt;/a&gt; (8 min remaining to read)&lt;/p&gt;&lt;/section&gt;&lt;/div&gt;</description><category>data science</category><category>dsp</category><category>effects</category><category>machine learning</category><category>machine listening</category><category>projects</category><guid>https://nathan.ho.name/posts/texture-resynthesis/</guid><pubDate>Tue, 25 Apr 2023 19:58:19 GMT</pubDate></item><item><title>Negative Compression</title><link>https://nathan.ho.name/posts/negative-compression/</link><dc:creator>Nathan Ho</dc:creator><description>&lt;div&gt;&lt;p&gt;One blog post I’ve been meaning to write for a while is a comprehensive review of the design of dynamic range compressors and limiters, both digital and analog. Textbook compressor designs can be easily found, but like reverbs there are lots of weird little tricks from both hardware and software designs that supposedly define the distinctive musical character of different compressors. It may be a while before I finish that post because, while I’ve read a lot about the DSP of compressors, I don’t feel yet qualified to write on design. I haven’t yet designed a compressor plugin that I’m happy with, nor done a lot of compressor wine tasting, and the musical and psychoacoustic aspects of compressors are to me at least as important as the signal math.&lt;/p&gt;
&lt;p&gt;Nevertheless, there’s a weird corner of compressor design that I feel inspired to talk about, and it’s called negative compression. It’s a feature of a few commercial compressors; I’m not sure which was the first, but I first learned about the concept from &lt;a class="reference external" href="https://klanghelm.com/contents/products/DC1A.php"&gt;Klanghelm DC1A&lt;/a&gt;. Negative comp is the source of considerable confusion – just watch the &lt;a class="reference external" href="https://gearspace.com/board/so-much-gear-so-little-time/919446-negative-compression-ratios.html"&gt;Gearspace pundits&lt;/a&gt; go at it.&lt;/p&gt;
&lt;p&gt;The brief description is that a standard compressor, upon receiving a signal with increasing amplitude, will reach a point where the output amplitude will increase at a slower rate. If the compressor is a perfect limiter, the output amplitude will hit a hard limit and refuse to increase. A negative compressor takes it further – the output signal will eventually get quieter over time as the amplitude increases. If you feed a percussive signal into a negative compressor and drive it hard enough, it will punch a hole in the signal’s amplitude, and can split a transient in two. It can be a pretty bizarre effect, and seems underutilized.&lt;/p&gt;
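&lt;p&gt;The three behaviors can be seen in the usual textbook static curve, sketched below in Python. This is a generic formulation working in decibels, not necessarily the one developed in the full post; the function name, the default threshold, and the ratio values are chosen only for illustration.&lt;/p&gt;

```python
def compressor_output_db(input_db, threshold_db=-20.0, ratio=4.0):
    """Static curve: unity gain below the threshold, slope 1/ratio above it."""
    if threshold_db >= input_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio

levels = [-30.0, -20.0, -10.0, 0.0]
standard = [compressor_output_db(x, ratio=4.0) for x in levels]
limiter = [compressor_output_db(x, ratio=float("inf")) for x in levels]
negative = [compressor_output_db(x, ratio=-2.0) for x in levels]
# standard: [-30.0, -20.0, -17.5, -15.0]  output keeps rising, but more slowly
# limiter:  [-30.0, -20.0, -20.0, -20.0]  output stops at the threshold
# negative: [-30.0, -20.0, -25.0, -30.0]  output falls as the input rises
```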
&lt;p&gt;This explanation should be enough for most, but you know this blog. We do the math here. In this post, I will explain the basic mathematics of compressors to demystify negative compression, propose variants of negative compressors, and demonstrate how to do negative compression in SuperCollider.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://nathan.ho.name/posts/negative-compression/"&gt;Read more…&lt;/a&gt; (6 min remaining to read)&lt;/p&gt;&lt;/div&gt;</description><category>dsp</category><category>dynamic range compression</category><category>effects</category><category>supercollider</category><guid>https://nathan.ho.name/posts/negative-compression/</guid><pubDate>Thu, 23 Feb 2023 17:59:46 GMT</pubDate></item><item><title>A Closer Look at Clarence Barlow's ISIS</title><link>https://nathan.ho.name/posts/clarence-barlow-isis/</link><dc:creator>Nathan Ho</dc:creator><description>&lt;div&gt;&lt;p&gt;In 2005, Clarence Barlow published a paper on &lt;em&gt;Intra-Samplar Interpolating Sinusoids&lt;/em&gt; (ISIS), an audio analysis-synthesis algorithm. It did not make much of a splash, with the paper having only 7 citations in Google Scholar. Nevertheless, it produces some interesting sounds, so let’s dive into it.&lt;/p&gt;
&lt;p&gt;First, some context. The precursor to ISIS is a technique Barlow calls &lt;em&gt;spectastics&lt;/em&gt;. In this method, the short-time Fourier transform is computed of an audio signal, and at each frame, the magnitude spectrum is resampled logarithmically to 12EDO and used as a probability distribution to select a pitch. The pitch sequence forms an extremely rapid melody, which can be synthesized or played on a robotic instrument. Barlow describes the spectasized melody as “remarkably like the original sound recording.”&lt;/p&gt;
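&lt;p&gt;The selection step can be sketched in Python as weighted random sampling. This assumes the magnitude spectrum of each frame has already been resampled to one value per 12EDO pitch; the function name and the toy frames are hypothetical.&lt;/p&gt;

```python
import random

def spectasize(frames, rng):
    """Pick one pitch index per frame, with probability proportional to magnitude."""
    melody = []
    for magnitudes in frames:
        pitches = range(len(magnitudes))
        melody.append(rng.choices(pitches, weights=magnitudes, k=1)[0])
    return melody

# Each frame holds one magnitude per pitch; a frame needs some nonzero magnitude.
frames = [[0.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 0.0], [0.2, 0.5, 0.2, 0.1]]
melody = spectasize(frames, random.Random(0))
```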
&lt;p&gt;In ISIS, this concept is taken to an extreme by making the “melody” a constant amplitude sine wave whose frequency is changed every sample. Given a digital signal that doesn’t exceed ±1, we can interpolate between any two successive samples with a partial cycle of a sine wave. An image helps here; the dots show the sampled digital signal.&lt;/p&gt;
&lt;img alt="Graph showing equally sampled points interpolated by a sine wave with rapidly varying frequency." class="align-center" src="https://nathan.ho.name/images/isis_interpolation_example.png"&gt;
&lt;p&gt;For example, if the samples are 0 and 1, as seen in the first two samples in the image, we can interpolate with a quarter sine wave with a period of 4 samples and a frequency of 1/4th the sample rate. There are actually infinitely many ways to do this interpolation. For example, from 0 to 1 we can also have a sine wave that completes 5/4ths of a cycle. We can even assume that the initial phase of the sine wave is &lt;span class="math"&gt;\(\pi\)&lt;/span&gt; and the final phase &lt;span class="math"&gt;\(5\pi/2\)&lt;/span&gt;, or have the sine wave going backwards with phase ramping from 0 to &lt;span class="math"&gt;\(-3\pi/2\)&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;To resolve the ambiguity, ISIS restricts the frequency to be always nonnegative so the phase never goes backward, and always assumes the phase at every sample point is in the range &lt;span class="math"&gt;\([-\pi/2, \pi/2]\)&lt;/span&gt; modulo &lt;span class="math"&gt;\(2\pi\)&lt;/span&gt;. There are other approaches, such as picking the minimum possible nonnegative frequency or the frequency of minimum absolute value, which may produce interesting alternative sounds. I won’t get into these (this is a relatively low-effort post), but feel free to try them out.&lt;/p&gt;
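&lt;p&gt;One way to read that rule in code: take the arcsine of each sample as its phase, then advance by the smallest nonnegative amount that lands on the next sample's phase. The Python sketch below follows this reading of the description above and has not been checked against Barlow's paper; the function names are made up.&lt;/p&gt;

```python
import math

def isis_frequencies(samples):
    """Frequency, in cycles per sample, of the sine segment between neighbors."""
    phases = [math.asin(s) for s in samples]     # each lies in [-pi/2, pi/2]
    freqs = []
    for a, b in zip(phases, phases[1:]):
        advance = (b - a) % (2 * math.pi)        # wrapped so it is never negative
        freqs.append(advance / (2 * math.pi))
    return freqs

def isis_render(first_sample, freqs):
    """Rebuild the sample values by accumulating phase one sample at a time."""
    phase = math.asin(first_sample)
    out = [math.sin(phase)]
    for f in freqs:
        phase += 2 * math.pi * f
        out.append(math.sin(phase))
    return out

samples = [0.0, 1.0, 0.3, -0.8, -0.8, 0.5]
freqs = isis_frequencies(samples)
rebuilt = isis_render(samples[0], freqs)
```

&lt;p&gt;The first interval, from 0 to 1, comes out as a quarter cycle per sample, matching the example above, and a repeated sample value gives a frequency of zero.&lt;/p&gt;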
&lt;p&gt;&lt;a href="https://nathan.ho.name/posts/clarence-barlow-isis/"&gt;Read more…&lt;/a&gt; (7 min remaining to read)&lt;/p&gt;&lt;/div&gt;</description><category>dsp</category><category>effects</category><category>electronic music</category><guid>https://nathan.ho.name/posts/clarence-barlow-isis/</guid><pubDate>Sun, 15 Jan 2023 22:37:23 GMT</pubDate></item><item><title>Resource: "The Tube Screamer's Secret"</title><link>https://nathan.ho.name/posts/the-tube-screamers-secret/</link><dc:creator>Nathan Ho</dc:creator><description>&lt;p&gt;A few years ago I bookmarked Boğaç Topaktaş’ 2005 article titled “The Tube Screamer’s Secret,” but today I was dismayed to discover that the domain had expired. This ensures that the page is now nearly impossible to find unless you already know the URL. I don’t normally make posts that are just a link to a third party, but this valuable resource might be forgotten otherwise. Here’s the page in the Wayback Machine:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://web.archive.org/web/20180127031808/http://bteaudio.com/articles/TSS/TSS.html"&gt;https://web.archive.org/web/20180127031808/http://bteaudio.com/articles/TSS/TSS.html&lt;/a&gt;&lt;/p&gt;</description><category>distortion</category><category>dsp</category><category>effects</category><category>virtual analog</category><guid>https://nathan.ho.name/posts/the-tube-screamers-secret/</guid><pubDate>Tue, 23 Aug 2022 16:50:01 GMT</pubDate></item><item><title>Integer Ring Modulation</title><link>https://nathan.ho.name/posts/integer-ring-modulation/</link><dc:creator>Nathan Ho</dc:creator><description>&lt;p&gt;When I think of ring modulation – or multiplication of two bipolar audio signals – I usually think of a complex, polyphonic signal being ring modulated by an unrelated sine wave, producing an inharmonic effect. Indeed, this is what “ring modulator” means in many synthesizers’ effect racks. I associate it with early electronic music and frankly find it a little cheesy, so I don’t use it often.&lt;/p&gt;
&lt;p&gt;But if both signals are periodic and their frequencies are small integer multiples of a common fundamental, the resulting sound is harmonic. Mathematically this is no surprise, but the timbres you can get out of this are pretty compelling.&lt;/p&gt;
&lt;p&gt;I tend to get the best results from pulse waves, in which case ring modulation is identical to an XOR gate (plus an additional inversion). Here’s a 100 Hz square wave multiplied by a second square wave that steps from 100 Hz, 200 Hz, etc. to 2000 Hz and back.&lt;/p&gt;
&lt;audio controls src="https://nathan.ho.name/audio/integer_ring_modulation.mp3"&gt;&lt;/audio&gt;&lt;p&gt;As usual, here is SuperCollider code:&lt;/p&gt;
&lt;pre class="literal-block"&gt;(
{
    var freq, snd;
    freq = 100;
    snd = Pulse.ar(freq) * Pulse.ar(freq * LFTri.ar(0.3, 3).linlin(-1, 1, 1, 20).round);
    snd ! 2;
}.play(fadeTime: 0);
)&lt;/pre&gt;
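&lt;p&gt;The XOR equivalence mentioned earlier is easy to check numerically. This small Python sketch (separate from the SuperCollider patch; the helper names and the sample rate are arbitrary) treats +1 as logic high and -1 as logic low.&lt;/p&gt;

```python
def square(freq, t):
    """Bipolar square wave: +1 for the first half of each cycle, -1 for the second."""
    return 1 if 0.5 > (freq * t) % 1.0 else -1

def product_equals_inverted_xor(f1, f2, sr=48000, n_samples=480):
    """Compare ring modulation with an inverted XOR over n_samples samples."""
    for n in range(n_samples):
        t = (n + 0.5) / sr
        a, b = square(f1, t), square(f2, t)
        xor = (a > 0) ^ (b > 0)            # XOR of the two logic levels
        inverted = 1 if not xor else -1    # inversion, mapped back to +1/-1
        if a * b != inverted:
            return False
    return True

matches = product_equals_inverted_xor(100, 300)
```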
&lt;p&gt;Try pulse-width modulation, slightly detuning oscillators for a beating effect, multiplying three or more oscillators, and filtering the oscillators prior to multiplication. There are applications here to synthesizing &lt;a class="reference external" href="https://www.ludomusicology.org/2018/12/09/what-is-1-bit-music/"&gt;1-bit music&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Credit goes to &lt;a class="reference external" href="https://proximalrecords.bandcamp.com/album/courtship-dances"&gt;Sahy Uhns&lt;/a&gt; for showing me this one some years ago.&lt;/p&gt;
&lt;p&gt;EDIT 2023-01-12: I have learned that Dave Rossum used this technique in Trident, calling it “zing modulation.” See &lt;a class="reference external" href="https://www.youtube.com/watch?v=dhBRItwQhag"&gt;this YouTube video&lt;/a&gt;.&lt;/p&gt;</description><category>dsp</category><category>effects</category><category>oscillators</category><category>synthesis</category><guid>https://nathan.ho.name/posts/integer-ring-modulation/</guid><pubDate>Thu, 31 Mar 2022 22:34:42 GMT</pubDate></item><item><title>Moisture Bass</title><link>https://nathan.ho.name/posts/moisture-bass/</link><dc:creator>Nathan Ho</dc:creator><description>&lt;p&gt;If you haven’t heard of the YouTube channel &lt;a class="reference external" href="https://www.youtube.com/channel/UCVav0C4mJwkZkFBKlb35VhA"&gt;Bunting&lt;/a&gt;, it gets my strong recommendation. Bunting creates excellent style imitations of experimental bass music artists and breaks them down with succinct explanations. Notable is his minimal tooling: he uses mostly Ableton Live stock plugins and the free and open source wavetable synth &lt;a class="reference external" href="https://vital.audio/"&gt;Vital&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;His &lt;a class="reference external" href="https://www.youtube.com/watch?v=m-29y7vwPqs"&gt;latest tutorial&lt;/a&gt;, mimicking the style of the artist Resonant Language, contains several bass sounds with a property he calls “moisture” (&lt;a class="reference external" href="https://youtu.be/m-29y7vwPqs?t=1089"&gt;timestamp&lt;/a&gt;). These bass sounds are created by starting with a low saw wave, boosting the highs, and running the result through Ableton Live’s vocoder set on “Modulator” mode. According to the &lt;a class="reference external" href="https://www.ableton.com/en/manual/live-audio-effect-reference/#24-44-vocoder"&gt;manual&lt;/a&gt;, this enables self-vocoding, where the same signal is the modulator and carrier. An abstract view of a vocoder would suggest that this does little or nothing to the saw wave other than change its spectral tilt, but the reality is much more interesting. Hear for yourself an EQ’d saw wave before and after self-vocoding:&lt;/p&gt;
&lt;p&gt;&lt;audio controls src="https://nathan.ho.name/audio/moisture_saw.mp3"&gt;&lt;/audio&gt;&lt;/p&gt;
&lt;p&gt;&lt;audio controls src="https://nathan.ho.name/audio/moisture_vocoder.mp3"&gt;&lt;/audio&gt;&lt;/p&gt;&lt;p&gt;A closer inspection of the latter waveform shows why the self-vocoded saw sounds the way it does. Here’s a single pitch period:&lt;/p&gt;
&lt;img alt="A single pitch period of the second audio. It is mostly smooth but contains some very spiky oscillations in one portion of the waveform. The oscillations have a sudden onset and rapidly slow down." class="align-center" src="https://nathan.ho.name/images/moisture.png"&gt;
&lt;p&gt;The discontinuity in the saw signal is decorated with a chirp, or a sine wave that rapidly descends in frequency. This little 909 kick drum every pitch period is responsible for the “moisture” sound. Certainly there have been no studies on the psychoacoustics of moisture bass (for lack of a better term), but I suspect that it mimics dispersive behavior, lending a vaguely acoustic sound.&lt;/p&gt;
&lt;p&gt;The chirp originates from the bandpass filters in the vocoder. The frequencies of the vocoder are exponentially spaced, so the bandpass filters have to increase in bandwidth for higher frequencies to cover the gaps. A filter’s ring time is inversely proportional to its bandwidth, so the wider high-frequency bands have shorter impulse responses. The result is that low frequencies ring longer when the vocoder is pinged and high frequencies ring shorter. Mix them all together, and you have an impulse response resembling a chirp.&lt;/p&gt;
&lt;p&gt;Self-vocoding with exponentially spaced bands is clever, but it isn’t the only way to create this effect. One option is to eliminate the vocoding part and use only the exponentially spaced bandpass filters, like an old-school filter bank. This sounds just like self-vocoding but requires fewer bandpass filters to work. In my experiments, I found that putting the resulting signal through nonlinear distortion is necessary to bring out the moisture property.&lt;/p&gt;
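&lt;p&gt;As a rough illustration of the filter bank approach, here is a minimal SuperCollider sketch. It is not the patch from my experiments: the band count, frequency range, &lt;code class="docutils literal"&gt;rq&lt;/code&gt; value, and the choice of &lt;code class="docutils literal"&gt;tanh&lt;/code&gt; as the distortion are all assumptions picked for illustration.&lt;/p&gt;
&lt;pre class="literal-block"&gt;(
{
    var freq = 50, numBands = 24, saw, freqs, snd;
    saw = Saw.ar(freq);
    // Exponentially spaced center frequencies (assumed range: 100 Hz to 8 kHz).
    freqs = Array.fill(numBands, { |i| i.linexp(0, numBands - 1, 100, 8000) });
    // A fixed rq makes each band's bandwidth proportional to its center
    // frequency, so the high bands ring for a shorter time than the low bands.
    snd = BPF.ar(saw, freqs, 0.1).sum;
    // Nonlinear distortion; tanh and the drive of 4 are arbitrary choices.
    snd = (snd * 4).tanh;
    (snd * 0.3) ! 2;
}.play;
)&lt;/pre&gt;
&lt;p&gt;Changing the number of bands, the &lt;code class="docutils literal"&gt;rq&lt;/code&gt;, and the drive should alter how pronounced the chirp is.&lt;/p&gt;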
&lt;p&gt;A more direct approach is to use wavefolding on a curved signal. The slope of the input signal controls the rate at which it scrubs through the wavefolding function, and thus controls the frequency of the resulting triangle wave. By modulating the slope from high in absolute value down to zero, a triangle wave descending in frequency is created. This is best explained visually:&lt;/p&gt;
&lt;img alt='A graph of two functions. One is labeled "curved signal," looking like a downward saw wave but with decreasing slope as it reaches a trough. The other is labeled "wavefolder output," showing the result of wavefolding on the curved signal, which displays triangle wave oscillations that onset at each pitch period and quickly decrease in frequency.' class="align-center" src="https://nathan.ho.name/images/moisture_wavefold.png"&gt;
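&lt;p&gt;The same idea can be sketched in SuperCollider. This is a minimal example under my own assumptions, not necessarily the patch used for the audio: the curved signal is a squared downward ramp, and the fold depth of 40 is an arbitrary value that sets how high the chirp starts.&lt;/p&gt;
&lt;pre class="literal-block"&gt;(
{
    var freq = 50, depth = 40, phase, curved, snd;
    // Downward ramp from 1 to 0 once per pitch period.
    phase = 1 - LFSaw.ar(freq).range(0, 1);
    // Squaring makes the slope steep at the start of the period
    // and flattens it to zero at the trough.
    curved = phase.squared * depth;
    // Fold into the range -1 to 1. A steeper input slope yields a
    // faster triangle wave, so the output chirps downward.
    snd = curved.fold2(1);
    // Remove any DC offset left over from folding.
    snd = LeakDC.ar(snd);
    (snd * 0.3) ! 2;
}.play;
)&lt;/pre&gt;
&lt;p&gt;With these values the triangle wave should start at around 1 kHz (slope divided by the fold period of 4) and fall to zero by the end of each pitch period.&lt;/p&gt;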
&lt;p&gt;And here’s how it sounds:&lt;/p&gt;
&lt;audio controls src="https://nathan.ho.name/audio/moisture_wavefold.mp3"&gt;&lt;/audio&gt;</description><category>dsp</category><category>effects</category><category>oscillators</category><category>synthesis</category><guid>https://nathan.ho.name/posts/moisture-bass/</guid><pubDate>Sat, 18 Sep 2021 01:34:03 GMT</pubDate></item><item><title>Chip Fuzzing Synthesis</title><link>https://nathan.ho.name/posts/chip-fuzzing-synthesis/</link><dc:creator>Nathan Ho</dc:creator><description>&lt;p&gt;I’m unsure whether I read about this or dreamt it (this year has been a blur) but I recall someone fuzzing a retro sound chip, most likely the Yamaha OPL3, by sending it random bits for its synthesis parameters and recording the output. Drawing from this, we can explore “chip fuzzing synthesis,” the art of feeding total digital randomness into a synthesis algorithm and seeing what comes out.&lt;/p&gt;
&lt;p&gt;There is no specific need for a real retro sound chip or even an accurate emulation of one, but it helps to understand how some old sound chips operate to look for inspiration. As an example, we can look at the Commodore 64’s &lt;a class="reference external" href="https://www.c64-wiki.com/wiki/SID"&gt;SID&lt;/a&gt;. This chip is an analog subtractive synthesizer, providing three oscillators with frequency inputs, waveform selection (saw, pulse, triangle, noise), and ADSR envelope generators, all mixed into a filter with controllable cutoff and famously nonfunctioning resonance.&lt;/p&gt;
&lt;p&gt;The parameters of the SID are controlled by an internal set of 32 8-bit registers, which are written to using a 5-bit parallel address bus and an 8-bit parallel data bus. In C-like pseudocode, communication with the SID can be emulated like so: &lt;a class="brackets" href="https://nathan.ho.name/posts/chip-fuzzing-synthesis/#footnote-1" id="footnote-reference-1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code text"&gt;&lt;a id="rest_code_b0701dbf731640dfbffcfdb792ada133-1" name="rest_code_b0701dbf731640dfbffcfdb792ada133-1" href="https://nathan.ho.name/posts/chip-fuzzing-synthesis/#rest_code_b0701dbf731640dfbffcfdb792ada133-1"&gt;&lt;/a&gt;char sidRegisters[32];
&lt;a id="rest_code_b0701dbf731640dfbffcfdb792ada133-2" name="rest_code_b0701dbf731640dfbffcfdb792ada133-2" href="https://nathan.ho.name/posts/chip-fuzzing-synthesis/#rest_code_b0701dbf731640dfbffcfdb792ada133-2"&gt;&lt;/a&gt;
&lt;a id="rest_code_b0701dbf731640dfbffcfdb792ada133-3" name="rest_code_b0701dbf731640dfbffcfdb792ada133-3" href="https://nathan.ho.name/posts/chip-fuzzing-synthesis/#rest_code_b0701dbf731640dfbffcfdb792ada133-3"&gt;&lt;/a&gt;// Parallel ports used to communicate with SID.
&lt;a id="rest_code_b0701dbf731640dfbffcfdb792ada133-4" name="rest_code_b0701dbf731640dfbffcfdb792ada133-4" href="https://nathan.ho.name/posts/chip-fuzzing-synthesis/#rest_code_b0701dbf731640dfbffcfdb792ada133-4"&gt;&lt;/a&gt;char addressBus = 0;
&lt;a id="rest_code_b0701dbf731640dfbffcfdb792ada133-5" name="rest_code_b0701dbf731640dfbffcfdb792ada133-5" href="https://nathan.ho.name/posts/chip-fuzzing-synthesis/#rest_code_b0701dbf731640dfbffcfdb792ada133-5"&gt;&lt;/a&gt;char dataBus = 0;
&lt;a id="rest_code_b0701dbf731640dfbffcfdb792ada133-6" name="rest_code_b0701dbf731640dfbffcfdb792ada133-6" href="https://nathan.ho.name/posts/chip-fuzzing-synthesis/#rest_code_b0701dbf731640dfbffcfdb792ada133-6"&gt;&lt;/a&gt;
&lt;a id="rest_code_b0701dbf731640dfbffcfdb792ada133-7" name="rest_code_b0701dbf731640dfbffcfdb792ada133-7" href="https://nathan.ho.name/posts/chip-fuzzing-synthesis/#rest_code_b0701dbf731640dfbffcfdb792ada133-7"&gt;&lt;/a&gt;void writeSid(char address, char value)
&lt;a id="rest_code_b0701dbf731640dfbffcfdb792ada133-8" name="rest_code_b0701dbf731640dfbffcfdb792ada133-8" href="https://nathan.ho.name/posts/chip-fuzzing-synthesis/#rest_code_b0701dbf731640dfbffcfdb792ada133-8"&gt;&lt;/a&gt;{
&lt;a id="rest_code_b0701dbf731640dfbffcfdb792ada133-9" name="rest_code_b0701dbf731640dfbffcfdb792ada133-9" href="https://nathan.ho.name/posts/chip-fuzzing-synthesis/#rest_code_b0701dbf731640dfbffcfdb792ada133-9"&gt;&lt;/a&gt;    addressBus = address &amp;amp; 31;
&lt;a id="rest_code_b0701dbf731640dfbffcfdb792ada133-10" name="rest_code_b0701dbf731640dfbffcfdb792ada133-10" href="https://nathan.ho.name/posts/chip-fuzzing-synthesis/#rest_code_b0701dbf731640dfbffcfdb792ada133-10"&gt;&lt;/a&gt;    dataBus = value;
&lt;a id="rest_code_b0701dbf731640dfbffcfdb792ada133-11" name="rest_code_b0701dbf731640dfbffcfdb792ada133-11" href="https://nathan.ho.name/posts/chip-fuzzing-synthesis/#rest_code_b0701dbf731640dfbffcfdb792ada133-11"&gt;&lt;/a&gt;    sidRegisters[addressBus] = dataBus;
&lt;a id="rest_code_b0701dbf731640dfbffcfdb792ada133-12" name="rest_code_b0701dbf731640dfbffcfdb792ada133-12" href="https://nathan.ho.name/posts/chip-fuzzing-synthesis/#rest_code_b0701dbf731640dfbffcfdb792ada133-12"&gt;&lt;/a&gt;}
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The SID interprets the &lt;code class="docutils literal"&gt;sidRegisters&lt;/code&gt; array and maps various bits and bytes to analog synth parameters. For example, registers 0 and 1, taken as a 16-bit integer, control the frequency of an oscillator, and individual bits in register 4 select the waveform and enable ring modulation and hard sync.&lt;/p&gt;
&lt;p&gt;Fuzzing the address and data buses is the equivalent of calling &lt;code class="docutils literal"&gt;writeSid&lt;/code&gt; repeatedly with randomized &lt;code class="docutils literal"&gt;address&lt;/code&gt; and &lt;code class="docutils literal"&gt;value&lt;/code&gt;, writing random data to random registers. The exact rate at which random data is written is up to you. I find that slow randomization produces the most coherent results and has the least chance of turning the output into white noise. A few hundred times a second is a good start.&lt;/p&gt;
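&lt;p&gt;A minimal language-side sketch of this loop in SuperCollider might look like the following. It only emulates the register writes and makes no sound; the variable names mirror the pseudocode above, and the write count and 5 ms wait are assumptions.&lt;/p&gt;
&lt;pre class="literal-block"&gt;(
var sidRegisters = Array.fill(32, 0);
var writeSid = { |address, value|
    // Mask to a 5-bit address and an 8-bit value, as the buses would.
    sidRegisters[address.bitAnd(31)] = value.bitAnd(255);
};
Routine({
    500.do {
        // Write a random byte to a random register.
        writeSid.value(32.rand, 256.rand);
        // Roughly 200 writes per second at the default tempo.
        0.005.wait;
    };
    sidRegisters.postln;
}).play;
)&lt;/pre&gt;
&lt;p&gt;In a real patch, each write would also be forwarded to whatever synth or emulation interprets the registers.&lt;/p&gt;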
&lt;p&gt;It also suffices to take a simpler route and feed high-frequency random noise (sample-and-hold, maybe) into every parameter of a synth. Again, we don’t need a vintage emulation at all – a minimal subtractive monosynth with waveform selection, ADSR envelope, and a few switchable filter types is adequate to get glitchy sounds. So here’s a little patch:&lt;/p&gt;
&lt;audio controls src="https://nathan.ho.name/audio/chip_fuzzing.mp3"&gt;&lt;/audio&gt;&lt;p&gt;This is sonically uncompromising (and hey, maybe that’s your thing), but still makes a useful raw source for more polished sound design. Here’s the same patch as above with some minor modifications and a lot of post-effects like granulators, distortion, and reverb:&lt;/p&gt;
&lt;audio controls src="https://nathan.ho.name/audio/chip_fuzzing_processed.mp3"&gt;&lt;/audio&gt;&lt;p&gt;The outcome of chip fuzzing synthesis is highly dependent on the choice of synthesis algorithm, the set of parameters, and the ranges for said parameters. I can imagine fuzzing FM, subtractive, additive, physical modelling, and parameters of an effects chain. The more inputs to fuzz, the better – especially inputs that switch features on and off, exhibit complex interactions with other inputs, and/or unearth bugs and artifacts.&lt;/p&gt;
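&lt;p&gt;For the simpler route of feeding sample-and-hold noise into every parameter, a minimal SuperCollider sketch could look like this. It is not the patch used for the audio above, and every parameter range here is an assumption.&lt;/p&gt;
&lt;pre class="literal-block"&gt;(
{
    var rate = 200, freq, width, cutoff, amp, osc, snd;
    // Each LFNoise0 is an independent sample-and-hold random source.
    freq = LFNoise0.kr(rate).exprange(40, 4000);
    width = LFNoise0.kr(rate).range(0.05, 0.95);
    cutoff = LFNoise0.kr(rate).exprange(200, 8000);
    amp = LFNoise0.kr(rate).range(0, 1);
    // Random waveform selection: saw, pulse, triangle, or noise.
    osc = Select.ar(LFNoise0.kr(rate).range(0, 3.99), [
        Saw.ar(freq), Pulse.ar(freq, width), LFTri.ar(freq), WhiteNoise.ar
    ]);
    snd = RLPF.ar(osc, cutoff, 0.3) * amp;
    (snd * 0.2) ! 2;
}.play;
)&lt;/pre&gt;
&lt;p&gt;Lowering &lt;code class="docutils literal"&gt;rate&lt;/code&gt; slows the randomization, which tends to give more coherent results.&lt;/p&gt;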
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="footnote-1" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="https://nathan.ho.name/posts/chip-fuzzing-synthesis/#footnote-reference-1"&gt;1&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;The above code may seem on the vacuous side, but writing it out this way allows for transformations that fall under the umbrella of bit bending. For example, performing a bitwise AND on the &lt;code class="docutils literal"&gt;addressBus&lt;/code&gt; or &lt;code class="docutils literal"&gt;dataBus&lt;/code&gt; is the equivalent of disconnecting pins of the chip. Similar bitwise operations permit shorting pins to 1, rewiring pins to other pins, downsampling individual pins, etc. Bit bending operations can be used in conjunction with chip fuzzing, or for glitching out “normal” musical input.&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;</description><category>bending</category><category>dsp</category><category>effects</category><guid>https://nathan.ho.name/posts/chip-fuzzing-synthesis/</guid><pubDate>Sun, 22 Nov 2020 08:00:00 GMT</pubDate></item><item><title>Low Battery Audio Effects</title><link>https://nathan.ho.name/posts/low-battery-audio-effects/</link><dc:creator>Nathan Ho</dc:creator><description>&lt;p&gt;Searching YouTube for videos of low battery toys and keyboards brings up results like “&lt;a class="reference external" href="https://www.youtube.com/watch?v=dHchmWsrfUo"&gt;Demon Possessed Singing Trout&lt;/a&gt;.” Please watch the video before proceeding.&lt;/p&gt;
&lt;p&gt;I have a limited understanding of electronics, but a compulsive need to explain this phenomenon due to tech blogger ego syndrome. A low battery has an abnormally high internal resistance, causing its voltage to sag in response to the loads it’s supporting. If it’s powering multiple things, they will interact in strange ways. The distorted audio from the singing fish sounds like the clock rate is dropping in reaction to the load of the speaker. (The servo motors might also be causing voltage sag, although it isn’t entirely clear from the video.)&lt;/p&gt;
&lt;p&gt;The speaker/clock interaction is interesting since it works in a feedback loop: the clock controls the playback rate, and the amplitude of the output audio draws current that affects the clock. This inspires a general method for turning an audio algorithm into a “low battery” version:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Run a DSP algorithm such as sample playback, synthesizer, effect, etc. that can be operated at a variable clock rate.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Apply filters like a full-wave rectifier, envelope follower, or simple lowpass to simulate speaker load. Optional.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Apply a highpass filter to block DC. (This helps prevent the algorithm from getting stuck.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use this signal to control the clock rate of the DSP algorithm, so that a signal of higher amplitude lowers the clock rate.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
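&lt;p&gt;The steps above can be sketched in SuperCollider roughly as follows. This is one possible arrangement under my own assumptions, not a definitive implementation: the “DSP algorithm” is a toy four-note sequencer, and &lt;code class="docutils literal"&gt;sag&lt;/code&gt; is a hypothetical name for the amount of feedback.&lt;/p&gt;
&lt;pre class="literal-block"&gt;(
{
    var sag = 0.8, load, rate, clock, freq, snd;
    // Read the load signal from the previous control block.
    load = LocalIn.ar(1);
    // A higher load lowers the clock rate; clip to keep it sane.
    rate = (1 - (sag * load)).clip(0.1, 2);
    // Toy algorithm whose tempo and pitch both follow the clock rate.
    clock = Impulse.ar(8 * rate);
    freq = Demand.ar(clock, 0, Dseq([220, 330, 440, 294], inf)) * rate;
    snd = Pulse.ar(freq) * Decay2.ar(clock, 0.005, 0.2);
    // Envelope follower to simulate speaker load.
    load = Amplitude.ar(snd, 0.01, 0.1);
    // Highpass (DC blocker) so the loop does not get stuck.
    load = LeakDC.ar(load);
    LocalOut.ar(load);
    (snd * 0.3) ! 2;
}.play;
)&lt;/pre&gt;
&lt;p&gt;Raising &lt;code class="docutils literal"&gt;sag&lt;/code&gt; should push the result from droopy pitch bends toward more erratic behavior.&lt;/p&gt;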
&lt;p&gt;The casual experiments I’ve done with this are promising. At subtle settings, this creates wandering, droopy pitch bends. Pushed to the extreme, it produces squelchy signal-dependent distortion. I especially like its effect on percussive signals, where louder transients are stretched out and any rhythmic pulse becomes irregular. I’m imagining software plugins that emulate digital hardware could be augmented with a “battery” knob that lets the user control how much the clock rate sags in response to the output signal.&lt;/p&gt;</description><category>bending</category><category>dsp</category><category>effects</category><category>virtual analog</category><guid>https://nathan.ho.name/posts/low-battery-audio-effects/</guid><pubDate>Mon, 25 May 2020 07:00:00 GMT</pubDate></item></channel></rss>