<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Nathan Ho (Posts about projects)</title><link>https://nathan.ho.name/</link><description></description><atom:link href="https://nathan.ho.name/categories/projects.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>© 2026</copyright><lastBuildDate>Mon, 11 May 2026 05:43:11 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Nathan Ho - Steganography</title><link>https://nathan.ho.name/posts/steganography/</link><dc:creator>Nathan Ho</dc:creator><description>&lt;a class="reference external image-reference" href="https://nathanho.bandcamp.com/album/steganography"&gt;
&lt;img alt="Album cover for Steganography. Diamond-shaped white fractal against a black background." class="align-center" src="https://nathan.ho.name/images/steganography_cover_art_medium_res.png" style="width: 30em;"&gt;
&lt;/a&gt;
&lt;p&gt;&lt;a class="reference external" href="https://nathanho.bandcamp.com/album/steganography"&gt;&lt;em&gt;Steganography&lt;/em&gt;&lt;/a&gt; is my second album. Eleven tracks, 30 minutes. Fully synthesized. Distributed by &lt;a class="reference external" href="https://3op.xyz/"&gt;3OP&lt;/a&gt;. Digital download on Bandcamp.&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Vertex Figurine&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Growing Plums in the Desert&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Why Becoming Selfish is the Best Thing I Ever Did&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Unmaker Process&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Despite Our Best Efforts&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Warring Factions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Irreversible Information Acquisition&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Things Toxic Friends Say&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Systematic Instinct&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Stab Variation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Treachery&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Mastering by Nil Hartman.&lt;/p&gt;
&lt;p&gt;Available as a hand-bound 250-page hardcover book containing the complete SuperCollider source code to the album. A hole cut in it contains a USB stick with 48k/24 audio files. &lt;strong&gt;Limited run of five books&lt;/strong&gt;. I am making them all by hand. (I may make another batch if they sell out, but no promises!)&lt;/p&gt;
&lt;p&gt;The source code is not available digitally. These books are the only place to get it.&lt;/p&gt;
&lt;a class="reference external image-reference" href="https://nathanho.bandcamp.com/album/steganography"&gt;
&lt;img alt="Photo of the Steganography book." class="align-center" src="https://nathan.ho.name/images/steganography/steganography_book_photo_1_small.jpg" style="width: 30em;"&gt;
&lt;/a&gt;
&lt;a class="reference external image-reference" href="https://nathanho.bandcamp.com/album/steganography"&gt;
&lt;img alt="Photo of the Steganography book." class="align-center" src="https://nathan.ho.name/images/steganography/steganography_book_photo_2_small.jpg" style="width: 30em;"&gt;
&lt;/a&gt;
&lt;a class="reference external image-reference" href="https://nathanho.bandcamp.com/album/steganography"&gt;
&lt;img alt="Photo of the Steganography book." class="align-center" src="https://nathan.ho.name/images/steganography/steganography_book_photo_3_small.jpg" style="width: 30em;"&gt;
&lt;/a&gt;</description><category>electronic music</category><category>projects</category><guid>https://nathan.ho.name/posts/steganography/</guid><pubDate>Mon, 11 May 2026 05:00:00 GMT</pubDate></item><item><title>Lime68k &amp; Nathan Ho - Striations of Grace</title><link>https://nathan.ho.name/posts/striations-of-grace/</link><dc:creator>Nathan Ho</dc:creator><description>&lt;a class="reference external image-reference" href="https://evel.bandcamp.com/album/striations-of-grace"&gt;
&lt;img alt="Album cover for Striations of Grace. Abstract 3D art with dark red and black." class="align-center" src="https://nathan.ho.name/images/striations_of_grace_cover.png" style="width: 30em;"&gt;
&lt;/a&gt;
&lt;p&gt;&lt;a class="reference external" href="https://evel.bandcamp.com/album/striations-of-grace"&gt;&lt;em&gt;Striations of Grace&lt;/em&gt;&lt;/a&gt;, a collaborative album by &lt;a class="reference external" href="https://l68k.com/"&gt;Lime68k&lt;/a&gt; and me, has been announced on EVEL Records, releasing October 3rd on digital, CD, and a 7” vinyl edition. Seven tracks, 53 minutes of cosmic horror synthesis mutations.&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Lime68k &amp;amp; Nathan Ho - SOLVE&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Nathan Ho - The Kraken&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lime68k - Out to Own Concern&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lime68k &amp;amp; Nathan Ho - We Regret the Error I&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lime68k - Who’s Out Concerns [vinyl release side A]&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Nathan Ho - Exterior Algebra and Heroin [vinyl release side B]&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lime68k &amp;amp; Nathan Ho - We Regret the Error II&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Lime68k used Max/MSP, I used SuperCollider. One year in the making.&lt;/p&gt;
&lt;p&gt;Mastering by Alfonso. Cover art by Aplauso.&lt;/p&gt;</description><category>electronic music</category><category>projects</category><guid>https://nathan.ho.name/posts/striations-of-grace/</guid><pubDate>Tue, 30 Sep 2025 15:23:17 GMT</pubDate></item><item><title>Nathan Ho - Haywire Frontier</title><link>https://nathan.ho.name/posts/haywire-frontier/</link><dc:creator>Nathan Ho</dc:creator><description>&lt;a class="reference external image-reference" href="https://nathanho.bandcamp.com/album/haywire-frontier"&gt;
&lt;img alt="Album cover for Haywire Frontier. Digital drawing of an androgynous figure, mid-leap, brandishing two swords above their head." class="align-center" src="https://nathan.ho.name/images/haywire_frontier_cover.png" style="width: 30em;"&gt;
&lt;/a&gt;
&lt;p&gt;My first full-length solo album, &lt;em&gt;Haywire Frontier&lt;/em&gt;, is releasing on Saturday, September 9th on the Japanese label Tokinogake. It is &lt;a class="reference external" href="https://nathanho.bandcamp.com/album/haywire-frontier"&gt;available for preorder&lt;/a&gt; now, and you can listen to the opening track “Trickster Deity.”&lt;/p&gt;
&lt;p&gt;Here are the liner notes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In 2008, at the age of 11, I created Googology Wiki on my parents’ computer. “Googology” is a word for the study of large numbers and fast-growing functions, deriving from the 9-year-old Milton Sirotta’s coinage of the term “googol.” The website was never meant to go beyond my personal use, and I gradually drifted away from it. Fifteen years later, it has grown to tens of thousands of articles and a community of hundreds of active users.&lt;/p&gt;
&lt;p&gt;Haywire Frontier is a 40-minute musical tribute to a strange corner of amateur mathematics whose growth I somewhat-inadvertently catalyzed, with rhythmic and formal material deriving from Georg Cantor’s “ordinal number” system, integral to the study of large numbers.&lt;/p&gt;
&lt;p&gt;The album was sequenced and synthesized entirely in SuperCollider with no samples, external hardware, or third-party plugins.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Credits:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="http://paletterecordings.com/"&gt;John Tejada&lt;/a&gt;, mastering&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://www.isawiitch.com/"&gt;Isa Hanssen&lt;/a&gt; (&lt;a class="reference external" href="https://www.instagram.com/isawiitch/"&gt;Instagram&lt;/a&gt;), cover art&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Special thanks to a0n0, Charlie Burgin (Sahy Uhns), William Fields, RM Francis, Joonas Siren (Forces), Ben Tillotson, Nathan Turczan.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I expect to write about this project in the near future. Thank you for listening, and for all your support.&lt;/p&gt;</description><category>electronic music</category><category>projects</category><guid>https://nathan.ho.name/posts/haywire-frontier/</guid><pubDate>Sun, 03 Sep 2023 00:57:37 GMT</pubDate></item><item><title>Audio Texture Resynthesis</title><link>https://nathan.ho.name/posts/texture-resynthesis/</link><dc:creator>Nathan Ho</dc:creator><description>&lt;div&gt;&lt;img alt="Spectrograms of the audio signals later in the post." class="align-center" src="https://nathan.ho.name/images/texture_resynthesis.png"&gt;
&lt;p&gt;&lt;em&gt;Left: spectrogram of a child singing. Right: spectrogram of resynthesized audio.&lt;/em&gt;&lt;/p&gt;
&lt;section id="background"&gt;
&lt;h2&gt;Background&lt;/h2&gt;
&lt;p&gt;I was alerted to audio texture resynthesis methods by a student of mine who was interested in the collaborative work of researcher Vincent Lostanlen, musician Florian Hecker, and several others &lt;a class="citation-reference" href="https://nathan.ho.name/posts/texture-resynthesis/#lostanlen2019" id="citation-reference-1" role="doc-biblioref"&gt;[Lostanlen2019]&lt;/a&gt; &lt;a class="citation-reference" href="https://nathan.ho.name/posts/texture-resynthesis/#lostanlen2021" id="citation-reference-2" role="doc-biblioref"&gt;[Lostanlen2021]&lt;/a&gt; &lt;a class="citation-reference" href="https://nathan.ho.name/posts/texture-resynthesis/#anden2019" id="citation-reference-3" role="doc-biblioref"&gt;[Andén2019]&lt;/a&gt; &lt;a class="citation-reference" href="https://nathan.ho.name/posts/texture-resynthesis/#muradeli2022" id="citation-reference-4" role="doc-biblioref"&gt;[Muradeli2022]&lt;/a&gt;. Their efforts are built on an analysis method called “Joint Time-Frequency Scattering” (JTFS) based on the Continuous Wavelet Transform. In an attempt to understand the work better, I binged a wavelet transform textbook, &lt;a class="brackets" href="https://nathan.ho.name/posts/texture-resynthesis/#footnote-1" id="footnote-reference-1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; implemented a simplified version of JTFS-based resynthesis, and briefly exchanged emails with Lostanlen. His helpful answers gave me the impression that while JTFS is a powerful analysis technique, resynthesis was more of a side project and there are ways to accomplish similar effects that are more efficient and easier to code without compromising too much on musicality.&lt;/p&gt;
&lt;p&gt;Audio texture resynthesis has some history in computer music literature &lt;a class="citation-reference" href="https://nathan.ho.name/posts/texture-resynthesis/#schwartz2010" id="citation-reference-5" role="doc-biblioref"&gt;[Schwartz2010]&lt;/a&gt;, and some researchers have used resynthesis to help understand how the human brain processes audio &lt;a class="citation-reference" href="https://nathan.ho.name/posts/texture-resynthesis/#mcdermott2011" id="citation-reference-6" role="doc-biblioref"&gt;[McDermott2011]&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;After some experimentation with these methods, I found that it’s not too hard to build a simple audio texture resynthesizer that exhibits clear musical potential. In this blog post, I’ll walk through a basic technique for making such a system yourself. There won’t be any novel research here, just a demonstration of a minimum viable resynthesizer and my ideas on how to expand on it.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="algorithm"&gt;
&lt;h2&gt;Algorithm&lt;/h2&gt;
&lt;p&gt;The above-mentioned papers have used fancy techniques including the wavelet transform and auditory filter banks modeled after the human ear. However, I was able to get decent results with a standard STFT spectrogram, then using phase reconstruction to get time-domain audio samples. The full process looks like this:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Compute a magnitude spectrogram &lt;span class="math"&gt;\(S\)&lt;/span&gt; of the time-domain input signal &lt;span class="math"&gt;\(x\)&lt;/span&gt;. A fairly high overlap is advised.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Compute any number of feature vectors &lt;span class="math"&gt;\(F_1(S),\, F_2(S),\, \ldots,\, F_n(S)\)&lt;/span&gt; and define their concatenation as &lt;span class="math"&gt;\(F(S)\)&lt;/span&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Initialize a randomized magnitude spectrogram &lt;span class="math"&gt;\(\hat{S}\)&lt;/span&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use gradient descent on &lt;span class="math"&gt;\(\hat{S}\)&lt;/span&gt; to minimize the error &lt;span class="math"&gt;\(E(\hat{S}) = ||F(S) - F(\hat{S})||\)&lt;/span&gt; (using any norm such as the squared error).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use phase reconstruction such as the Griffin-Lim algorithm on &lt;span class="math"&gt;\(\hat{S}\)&lt;/span&gt; to produce a resynthesized signal &lt;span class="math"&gt;\(\hat{x}\)&lt;/span&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The cornerstone of making this algorithm work well is that we choose an &lt;span class="math"&gt;\(F(S)\)&lt;/span&gt; that’s differentiable (or reasonably close). This means that the gradient &lt;span class="math"&gt;\(\nabla E\)&lt;/span&gt; can be computed with automatic differentiation (classical backpropagation). As such, this algorithm is best implemented in a differentiable computing environment like PyTorch or TensorFlow.&lt;/p&gt;
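&lt;p&gt;As a rough illustration of steps 2 through 4, here is a minimal NumPy sketch. It assumes a single toy feature, the time-averaged spectrum, whose gradient is simple enough to derive by hand; a real system would hand a richer &lt;span class="math"&gt;\(F\)&lt;/span&gt; to an autodiff framework instead. The names &lt;code class="docutils literal"&gt;feature&lt;/code&gt; and &lt;code class="docutils literal"&gt;resynthesize_spectrogram&lt;/code&gt;, the step size, and the iteration count are hypothetical choices for this example.&lt;/p&gt;

```python
import numpy as np


def feature(S):
    # Toy feature F(S): average every frequency bin over the time axis.
    # Rows are FFT bins and columns are frames, so we pool over axis 1.
    return S.mean(axis=1)


def resynthesize_spectrogram(S, steps=50, step_size=None, seed=0):
    rng = np.random.default_rng(seed)
    n_frames = S.shape[1]
    if step_size is None:
        # With this value each iteration halves the feature residual.
        step_size = n_frames / 4.0
    target = feature(S)
    S_hat = rng.random(S.shape)  # step 3: randomized starting spectrogram
    errors = []
    for _ in range(steps):
        residual = feature(S_hat) - target
        errors.append(float(np.sum(residual ** 2)))  # squared error E
        # Hand-derived gradient of E: every entry in row f receives
        # 2 * residual[f] / n_frames. Broadcasting spreads it over time.
        grad = 2.0 * residual[:, None] / n_frames
        S_hat = S_hat - step_size * grad
    return S_hat, errors
```

&lt;p&gt;Because this feature pools over the entire time axis, the optimized &lt;span class="math"&gt;\(\hat{S}\)&lt;/span&gt; matches the input only in its average spectrum and keeps the random temporal structure it started with. The sketch also does nothing to keep magnitudes non-negative, which would need handling before phase reconstruction.&lt;/p&gt;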
&lt;p&gt;The features &lt;span class="math"&gt;\(F(S)\)&lt;/span&gt;, as well as their relative weights, greatly affect the sound. If &lt;span class="math"&gt;\(F(S)\)&lt;/span&gt; is highly time-dependent then the resynthesized signal will mimic the original in evolution. On the other hand, if &lt;span class="math"&gt;\(F(S)\)&lt;/span&gt; does a lot of pooling across the time axis then the resynthesized signal will mostly ignore the large-scale structure of the input signal. I’m mostly interested in the latter case, where &lt;span class="math"&gt;\(F(S)\)&lt;/span&gt; significantly “remixes” the input signal and disregards the overall structure of the original.&lt;/p&gt;
&lt;p&gt;We will represent &lt;span class="math"&gt;\(S\)&lt;/span&gt; as a 2D tensor where the first dimension is frequency and the second is time. As a matrix, each row is an FFT bin, and each column a frame.&lt;/p&gt;
&lt;p&gt;If using a fancy alternative to the magnitude spectrogram such as the CWT or cochlear filter banks, you may have to do gradient descent all the way back to the time-domain samples &lt;span class="math"&gt;\(x\)&lt;/span&gt;. These analysis methods break down to linear frequency transforms that produce complex numbers followed by computing the absolute value of each bin, so differentiability is maintained.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://nathan.ho.name/posts/texture-resynthesis/"&gt;Read more…&lt;/a&gt; (8 min remaining to read)&lt;/p&gt;&lt;/section&gt;&lt;/div&gt;</description><category>data science</category><category>dsp</category><category>effects</category><category>machine learning</category><category>machine listening</category><category>projects</category><guid>https://nathan.ho.name/posts/texture-resynthesis/</guid><pubDate>Tue, 25 Apr 2023 19:58:19 GMT</pubDate></item><item><title>Interstate Hydra - Nocturnes</title><link>https://nathan.ho.name/posts/interstate-hydra-nocturnes/</link><dc:creator>Nathan Ho</dc:creator><description>&lt;a class="reference external image-reference" href="https://interstatehydra.bandcamp.com/album/nocturnes"&gt;
&lt;img alt="Album cover for Nocturnes. Abstract design, nearly completely black with some indigo." class="align-center" src="https://nathan.ho.name/images/nocturnes_cover.jpg" style="width: 30em;"&gt;
&lt;/a&gt;
&lt;p&gt;Exactly three years ago I made an ambient EP under the name “&lt;a class="reference external" href="https://interstatehydra.bandcamp.com/"&gt;Interstate Hydra&lt;/a&gt;” and put it up on Bandcamp. Up until now I have kept this alias a secret and only shown these tunes to a few people, but I figured it’s been long enough.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://interstatehydra.bandcamp.com/album/nocturnes"&gt;&lt;em&gt;Nocturnes&lt;/em&gt;&lt;/a&gt; was made entirely in Audacity using the “&lt;a class="reference external" href="https://nathan.ho.name/posts/sound-dumplings/"&gt;sound dumplings&lt;/a&gt;” method described in an earlier post. This was a total 180 from my usual workflow, which is to synthesize everything with SuperCollider code.&lt;/p&gt;
&lt;p&gt;“&lt;a class="reference external" href="https://interstatehydra.bandcamp.com/track/for-ellie"&gt;For Ellie&lt;/a&gt;” is a single released months after &lt;em&gt;Nocturnes&lt;/em&gt;, and is formally the simplest Interstate Hydra track, comprising a row of sound dumplings in arch form.&lt;/p&gt;
&lt;p&gt;I don’t plan on returning to the Interstate Hydra alias for quite some time, but I hope you enjoy these tracks. Thank you for listening.&lt;/p&gt;</description><category>electronic music</category><category>projects</category><category>releases</category><guid>https://nathan.ho.name/posts/interstate-hydra-nocturnes/</guid><pubDate>Mon, 16 Jan 2023 08:00:00 GMT</pubDate></item><item><title>OddVoices Dev Log 3: Pitch Contours</title><link>https://nathan.ho.name/posts/oddvoices-dev-log-3/</link><dc:creator>Nathan Ho</dc:creator><description>&lt;p&gt;This is part of an ongoing series of posts about &lt;a class="reference external" href="https://gitlab.com/oddvoices/oddvoices/"&gt;OddVoices&lt;/a&gt;, a singing synthesizer I’ve been building. OddVoices has a Web version, which you can now access at the newly registered domain &lt;a class="reference external" href="https://oddvoices.org/"&gt;oddvoices.org&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Unless we’re talking about pitch correction settings, the pitch of a human voice is generally not piecewise constant. A big part of any vocal style is pitch inflections, and I’m happy to say that these have been greatly improved in OddVoices based on studies of real pitch data. But first, we need…&lt;/p&gt;
&lt;section id="pitch-detection"&gt;
&lt;h2&gt;Pitch detection&lt;/h2&gt;
&lt;p&gt;A robust and high-precision monophonic pitch detector is vital to OddVoices for two reasons: first, the input vocal database needs to be normalized in pitch during the PSOLA analysis process, and second, the experiments we conduct later in this blog post require such a pitch detector.&lt;/p&gt;
&lt;p&gt;There’s probably tons of Python code out there for pitch detection, but I felt like writing my own implementation to learn a bit about the process. My requirements are that the pitch detector should work on speech signals, have high accuracy, be as immune to octave errors as possible, and not require an expensive GPU or a massive dataset to train. I don’t need real-time capabilities (although reasonable speed is desirable), high background noise tolerance, or polyphonic operation.&lt;/p&gt;
&lt;p&gt;I shopped around a few different papers and spent long hours implementing different algorithms. I coded up the following:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Cepstral analysis &lt;a class="citation-reference" href="https://nathan.ho.name/posts/oddvoices-dev-log-3/#noll1966" id="citation-reference-1" role="doc-biblioref"&gt;[Noll1966]&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Autocorrelation function (ACF) with prefiltering &lt;a class="citation-reference" href="https://nathan.ho.name/posts/oddvoices-dev-log-3/#rabiner1977" id="citation-reference-2" role="doc-biblioref"&gt;[Rabiner1977]&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Harmonic Product Spectrum (HPS)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A simplified variant of Spectral Peak Analysis (SPA) &lt;a class="citation-reference" href="https://nathan.ho.name/posts/oddvoices-dev-log-3/#dziubinski2004" id="citation-reference-3" role="doc-biblioref"&gt;[Dziubinski2004]&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Special Normalized Autocorrelation (SNAC) &lt;a class="citation-reference" href="https://nathan.ho.name/posts/oddvoices-dev-log-3/#mcleod2008" id="citation-reference-4" role="doc-biblioref"&gt;[McLeod2008]&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fourier Approximation Method (FAM) &lt;a class="citation-reference" href="https://nathan.ho.name/posts/oddvoices-dev-log-3/#kumaraswamy2015" id="citation-reference-5" role="doc-biblioref"&gt;[Kumaraswamy2015]&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;There are tons more algorithms out there, but these were the ones that caught my eye for some reason or another. All methods have their own upsides and downsides, and all of them are clever in their own ways. Some algorithms have parameters that can be tweaked, and I did my best to experiment with those parameters to try to maximize results for the test dataset.&lt;/p&gt;
&lt;p&gt;I created a test dataset of 10000 random single-frame synthetic waveforms with fundamentals ranging from 60 Hz to 1000 Hz. Each one has harmonics ranging up to the Nyquist frequency, and the amplitudes of the harmonics are randomized and multiplied by &lt;span class="math"&gt;\(1 / n\)&lt;/span&gt; where &lt;span class="math"&gt;\(n\)&lt;/span&gt; is the harmonic number. Whether this is really representative of speech is not an easy question, but I figured it would be a good start.&lt;/p&gt;
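&lt;p&gt;A dataset along these lines could be generated roughly as follows. This is a sketch under several assumptions: the frame length and sample rate are borrowed from the detector settings given later in this post, and the log-uniform choice of fundamentals and the random phases are guesses on my part. The function names are hypothetical.&lt;/p&gt;

```python
import numpy as np


def synthetic_frame(f0, sample_rate=48000, n_samples=2048, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # Keep every harmonic whose frequency does not exceed Nyquist.
    n_harmonics = int((sample_rate / 2) // f0)
    n = np.arange(1, n_harmonics + 1)
    # Random amplitude per harmonic, scaled by 1 / n.
    amplitudes = rng.random(n_harmonics) / n
    # Random starting phases (an assumption, not stated in the post).
    phases = rng.uniform(0.0, 2.0 * np.pi, n_harmonics)
    t = np.arange(n_samples) / sample_rate
    partials = np.sin(2.0 * np.pi * f0 * n[:, None] * t + phases[:, None])
    return np.sum(amplitudes[:, None] * partials, axis=0)


def make_dataset(count=10000, f_low=60.0, f_high=1000.0, seed=0):
    rng = np.random.default_rng(seed)
    # Fundamentals drawn log-uniformly between f_low and f_high (assumed).
    f0s = np.exp(rng.uniform(np.log(f_low), np.log(f_high), count))
    frames = np.stack([synthetic_frame(f0, rng=rng) for f0 in f0s])
    return f0s, frames
```

&lt;p&gt;Each detector can then be scored by counting the frames where its estimate lands within a semitone of the matching entry in &lt;code class="docutils literal"&gt;f0s&lt;/code&gt;.&lt;/p&gt;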
&lt;p&gt;I scored each algorithm by how many times it produced a pitch within a semitone of the actual fundamental frequency. We’ll address accuracy issues in a moment. The scores are:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th class="head"&gt;&lt;p&gt;Algorithm&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Score&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;Cepstrum&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;9961/10000&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;SNAC&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;9941/10000&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;FAM&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;9919/10000&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;Simplified SPA&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;9789/10000&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;ACF&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;9739/10000&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;HPS&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;7743/10000&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;All the algorithms performed quite acceptably with the exception of the Harmonic Product Spectrum, which leads me to conclude that HPS is not really appropriate for pitch detection, although it does have other applications such as computing the chroma &lt;a class="citation-reference" href="https://nathan.ho.name/posts/oddvoices-dev-log-3/#lee2006" id="citation-reference-6" role="doc-biblioref"&gt;[Lee2006]&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;What surprised me most is that one of the simplest algorithms, cepstral analysis, also appears to be the best! Confusingly, a subjective study of seven pitch detection algorithms by McGonegal et al. &lt;a class="citation-reference" href="https://nathan.ho.name/posts/oddvoices-dev-log-3/#mcgonegal1977" id="citation-reference-7" role="doc-biblioref"&gt;[McGonegal1977]&lt;/a&gt; ranked the cepstrum as the 2nd worst. Go figure.&lt;/p&gt;
&lt;p&gt;I hope this comparison was an interesting one in spite of how small and unscientific the study is. Be reminded that it is always possible that I implemented one or more of the algorithms wrong, didn’t tweak it in the right way, or didn’t look much into strategies for improving it.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="the-final-algorithm"&gt;
&lt;h2&gt;The final algorithm&lt;/h2&gt;
&lt;p&gt;I arrived at the following algorithm by crossbreeding my favorite approaches:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Compute the “modified cepstrum” as the absolute value of the IFFT of &lt;span class="math"&gt;\(\log(1 + |X|)\)&lt;/span&gt;, where &lt;span class="math"&gt;\(X\)&lt;/span&gt; is the FFT of a 2048-sample input frame &lt;span class="math"&gt;\(x\)&lt;/span&gt; at a sample rate of 48000 Hz. The input frame is &lt;em&gt;not&lt;/em&gt; windowed – for whatever reason that worked better!&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Find the highest peak in the modified cepstrum whose quefrency is above a threshold derived from the maximum frequency we want to detect.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Find all peaks that exceed 0.5 times the value of the highest peak.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Find the peak closest to the last detected pitch, or if there is no last detected pitch, use the highest peak.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Convert quefrency into frequency to get the initial estimate of pitch.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Recompute the magnitude spectrum of &lt;span class="math"&gt;\(x\)&lt;/span&gt;, this time with a Hann window.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Find the values of the three bins around the FFT peak at the estimated pitch.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use an artificial neural network (ANN) on the bin values to interpolate the exact frequency.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The idea of the modified cepstrum, i.e. adding 1 before taking the logarithm of the magnitude spectrum, is borrowed from Philip McLeod’s dissertation on SNAC, and prevents taking the logarithm of values too close to zero. The peak picking method is also taken from the same resource.&lt;/p&gt;
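&lt;p&gt;Steps 1, 2, and 5 can be sketched in a few lines of NumPy. This is a simplified illustration rather than the actual OddVoices code: it skips the peak tracking of steps 3 and 4 and the refinement of steps 6 through 8, and it adds an upper quefrency bound derived from a minimum frequency, which is my own assumption. The name &lt;code class="docutils literal"&gt;cepstral_pitch&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
import numpy as np


def cepstral_pitch(frame, sample_rate=48000, f_min=60.0, f_max=1000.0):
    # Step 1: FFT of the unwindowed frame, then log(1 + |X|), then IFFT.
    spectrum = np.fft.fft(frame)
    modified_cepstrum = np.abs(np.fft.ifft(np.log1p(np.abs(spectrum))))
    # Step 2: ignore quefrencies shorter than the period of f_max.
    q_low = int(np.ceil(sample_rate / f_max))
    # Assumed extra bound: ignore periods longer than that of f_min,
    # and stay in the first half of the (symmetric) cepstrum.
    q_high = min(int(np.floor(sample_rate / f_min)), len(frame) // 2)
    peak = int(np.argmax(modified_cepstrum[q_low:q_high + 1]))
    quefrency = q_low + peak  # period of the estimate, in samples
    # Step 5: convert quefrency into frequency.
    return sample_rate / quefrency
```

&lt;p&gt;The estimate is quantized to whole-sample periods, which is why a finer interpolation stage is still needed afterwards.&lt;/p&gt;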
&lt;p&gt;The use of an artificial neural network to refine the estimate is from the SPA paper &lt;a class="citation-reference" href="https://nathan.ho.name/posts/oddvoices-dev-log-3/#dziubinski2004" id="citation-reference-8" role="doc-biblioref"&gt;[Dziubinski2004]&lt;/a&gt;. The ANN in question is a classic feedforward perceptron, and takes as input the magnitudes of three FFT bins around a peak, normalized so the center bin has an amplitude of 1.0. This means that the center bin’s amplitude is not needed and only two input neurons are necessary. Next, there is a hidden layer with four neurons and a tanh activation function, and finally an output layer with a single neuron and a linear activation function. The output of the ANN ranges from -1 to +1 and indicates the offset of the sinusoidal frequency from the center bin, measured in bins.&lt;/p&gt;
&lt;p&gt;The ANN is trained on a set of synthetic data similar to the test data described above. I used the &lt;code class="docutils literal"&gt;MLPRegressor&lt;/code&gt; in scikit-learn, set to the default “adam” optimizer. The ANN works astonishingly well, yielding average errors less than 1 cent against my synthetic test set.&lt;/p&gt;
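&lt;p&gt;To make the input and output format concrete, here is a hedged sketch of how training pairs for such a network might be produced. It is a simplification: it uses single Hann-windowed sinusoids rather than the harmonic test waveforms, and the function name and the range of center bins are hypothetical. The resulting arrays have the shape a regressor with two inputs and one output would expect.&lt;/p&gt;

```python
import numpy as np


def ann_training_pairs(count, n_fft=2048, seed=0):
    rng = np.random.default_rng(seed)
    window = np.hanning(n_fft)
    n = np.arange(n_fft)
    features = np.empty((count, 2))
    # Target: offset of the true frequency from the center bin, in bins.
    offsets = rng.uniform(-1.0, 1.0, count)
    for i, offset in enumerate(offsets):
        center = int(rng.integers(8, n_fft // 4))  # hypothetical bin range
        phase = rng.uniform(0.0, 2.0 * np.pi)
        tone = np.sin(2.0 * np.pi * (center + offset) * n / n_fft + phase)
        mags = np.abs(np.fft.rfft(window * tone))
        # Normalize by the center bin so only the two neighbors remain.
        features[i, 0] = mags[center - 1] / mags[center]
        features[i, 1] = mags[center + 1] / mags[center]
    return features, offsets
```

&lt;p&gt;At inference time the predicted offset is added to the center bin index and multiplied by the bin spacing (sample rate divided by FFT size) to get a frequency in Hz.&lt;/p&gt;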
&lt;p&gt;In spite of the efforts to find a nearly error-free pitch detector, the above algorithm still sometimes produces errors. Errors are identified as pitch data points that exceed a manually specified range. Errors are corrected by linearly interpolating the surrounding good data points.&lt;/p&gt;
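&lt;p&gt;The correction step is easy to express with &lt;code class="docutils literal"&gt;numpy.interp&lt;/code&gt;. The sketch below assumes the pitch track is a list of frame times and frequencies in Hz, and that the acceptable range is passed in by hand; the function name is hypothetical.&lt;/p&gt;

```python
import numpy as np


def repair_pitch_track(times, pitches, low_hz, high_hz):
    times = np.asarray(times, dtype=float)
    pitches = np.asarray(pitches, dtype=float)
    # A point is "good" when it lies inside the manually chosen range.
    good = np.logical_and(
        np.greater_equal(pitches, low_hz),
        np.less_equal(pitches, high_hz),
    )
    # Rebuild every point from the good ones. Good points map to
    # themselves; rejected points are linearly interpolated from their
    # good neighbors (or held at the nearest good value at the edges).
    repaired = np.interp(times, times[good], pitches[good])
    return repaired, good
```

&lt;p&gt;For example, a track of 220, 221, 880, 223, 224 Hz at evenly spaced times with a range of 100 to 400 Hz has its third point replaced by 222 Hz. The function expects at least one good point.&lt;/p&gt;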
&lt;p&gt;Source code for the pitch detector is in need of some cleanup and is not yet publicly available as of this writing, but should be soon.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="vocal-pitch-contour-phenomena"&gt;
&lt;h2&gt;Vocal pitch contour phenomena&lt;/h2&gt;
&lt;p&gt;I’m sure the above was a bit dry for most readers, but now that we’re armed with an accurate pitch detector, we can study the following phenomena:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Drift: low frequency noise from 0 to 6 Hz &lt;a class="citation-reference" href="https://nathan.ho.name/posts/oddvoices-dev-log-3/#cook1996" id="citation-reference-9" role="doc-biblioref"&gt;[Cook1996]&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Jitter: high frequency noise from 6 to 12 Hz.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Vibrato: deliberate sinusoidal pitch variation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Portamento: lagging effect when changing notes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Overshoot: when moving from one pitch to another, the singer may extend beyond the target pitch and slide back into it &lt;a class="citation-reference" href="https://nathan.ho.name/posts/oddvoices-dev-log-3/#lai2009" id="citation-reference-10" role="doc-biblioref"&gt;[Lai2009]&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Preparation: when moving from one pitch to another, the singer may first move away from the target pitch before approaching it.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;There is useful literature on most of these six phenomena, but I also wanted to gather my own data and do a little replication work. I had a gracious volunteer sing a number of melodies consisting of one or two notes, with and without vibrato, and I ran them through my pitch detector to determine the pitch contours.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Drift and jitter:&lt;/strong&gt; In his study, Cook reported drift of roughly -50 dB and jitter at about -60 to -70 dB. Drift has a roughly flat spectrum and jitter has a sloping spectrum of around -8.5 dB per octave. My data is broadly consistent with these figures, as can be seen in the below spectra.&lt;/p&gt;
&lt;img alt='A diagram labeled "Measured drift and jitter, frequency domain" of four magnitude spectra with frequencies from 0 to 12 Hz. The spectra are quite complicated but a general downward slope is visible.' class="align-center" src="https://nathan.ho.name/images/pitch/measured_drift_and_jitter_frequency_domain.png"&gt;
&lt;p&gt;Drift and jitter are modeled as &lt;span class="math"&gt;\(f \cdot (1 + x)\)&lt;/span&gt; where &lt;span class="math"&gt;\(f\)&lt;/span&gt; is the static base frequency and &lt;span class="math"&gt;\(x\)&lt;/span&gt; is the deviation signal, so &lt;span class="math"&gt;\(x\)&lt;/span&gt; is the deviation relative to &lt;span class="math"&gt;\(f\)&lt;/span&gt;. This relative deviation is treated as an amplitude and converted to decibels, and this is what is meant by drift and jitter having a decibel value.&lt;/p&gt;
&lt;p&gt;Cook also notes that drift and jitter exhibit a small peak around the natural vibrato frequency, here around 3.5 Hz. Curiously, I don’t see any such peak in my data.&lt;/p&gt;
&lt;p&gt;Synthesis can be done with interpolated value noise for drift and “clipped brown noise” for jitter, added together. Interpolated value noise is downsampled white noise with sine wave segment interpolation. Clipped brown noise is defined as a random walk that can’t exceed the range [-1, +1].&lt;/p&gt;
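&lt;p&gt;Here is one way the two noise sources might look in NumPy. This is a sketch with several assumptions of mine: “sine wave segment interpolation” is read as half-cosine easing between random values, the random walk is held at the boundary when it would leave [-1, +1], and the decibel figures reported by Cook are used directly as peak scaling factors for the deviation signal &lt;span class="math"&gt;\(x\)&lt;/span&gt;. The noise rate, step size, and function names are hypothetical.&lt;/p&gt;

```python
import numpy as np


def interpolated_value_noise(n_samples, noise_hz, sample_rate, rng):
    # One new random value every 1 / noise_hz seconds.
    position = np.arange(n_samples) * noise_hz / sample_rate
    index = np.floor(position).astype(int)
    values = rng.uniform(-1.0, 1.0, index[-1] + 2)
    # Half-cosine easing from each value to the next.
    blend = 0.5 - 0.5 * np.cos(np.pi * (position - index))
    return values[index] * (1.0 - blend) + values[index + 1] * blend


def clipped_brown_noise(n_samples, step, rng):
    out = np.empty(n_samples)
    level = 0.0
    for i in range(n_samples):
        # Random walk, held inside [-1, +1] by clipping.
        level = float(np.clip(level + rng.uniform(-step, step), -1.0, 1.0))
        out[i] = level
    return out


def db_to_amplitude(db):
    return 10.0 ** (db / 20.0)


def drift_and_jitter(n_samples, sample_rate, rng, drift_db=-50.0, jitter_db=-65.0):
    drift = interpolated_value_noise(n_samples, 6.0, sample_rate, rng)
    jitter = clipped_brown_noise(n_samples, 0.1, rng)
    # Deviation signal x; the sung frequency is then f * (1 + x).
    return db_to_amplitude(drift_db) * drift + db_to_amplitude(jitter_db) * jitter
```

&lt;p&gt;Since pitch contours change slowly, it is cheaper to run this at a control rate of a few hundred or thousand samples per second than at the audio rate.&lt;/p&gt;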
&lt;p&gt;&lt;strong&gt;Vibrato&lt;/strong&gt; is, not surprisingly, a sine wave LFO. However, a perfect sine wave sounds pretty unrealistic. Based on visual inspection of vibrato data, I multiplied the sine wave by random amplitude modulation with interpolated value noise. The frequency of the interpolated value noise is the same as the vibrato frequency.&lt;/p&gt;
&lt;img alt='A plot labeled "Measured vibrato, time domain." The X-axis is a time span of four seconds, and the Y-axis is frequency. The vibrato is quite smooth, with a peak-to-peak amplitude of about 10 Hz and a frequency of about 4 Hz, but the peaks and troughs vary.' class="align-center" src="https://nathan.ho.name/images/pitch/measured_vibrato_time_domain.png"&gt;
&lt;img alt='A plot labeled "Synthetic vibrato, time domain." Again, the X-axis is a time span of four seconds, and the Y-axis is frequency. It looks similar to the measured plot but a bit smoother.' class="align-center" src="https://nathan.ho.name/images/pitch/synthetic_vibrato_time_domain.png"&gt;
&lt;p&gt;Also note that vibrato takes a moment to kick in, which is simple enough to emulate with a little envelope at the beginning of each note.&lt;/p&gt;
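&lt;p&gt;A rough sketch of such a vibrato generator is below. The 4 Hz rate and 5 Hz depth are read off the measured plot; the amplitude range of the modulation, the linear shape of the onset envelope, and its 0.3 second length are assumptions made for the example.&lt;/p&gt;

```python
import numpy as np

def vibrato(n_samples, sample_rate, rate_hz, depth_hz, onset_time, rng):
    """Sine LFO with a random per-cycle amplitude and a fade-in at the note start."""
    t = np.arange(n_samples) / sample_rate
    lfo = np.sin(2.0 * np.pi * rate_hz * t)

    # Interpolated value noise running at the vibrato rate: one random
    # amplitude per vibrato cycle, joined by half-cosine ramps.
    cycles = rate_hz * t
    index = cycles.astype(int)
    blend = 0.5 - 0.5 * np.cos(np.pi * (cycles - index))
    values = rng.uniform(0.5, 1.0, index[-1] + 2)   # amplitude range is a guess
    amplitude = values[index] * (1.0 - blend) + values[index + 1] * blend

    # Linear fade-in so the vibrato takes a moment to kick in.
    onset = np.clip(t / onset_time, 0.0, 1.0)
    return depth_hz * onset * amplitude * lfo

# Four seconds of frequency deviation in Hz at a 1 kHz control rate.
deviation = vibrato(4000, 1000, 4.0, 5.0, 0.3, np.random.default_rng(0))
```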
&lt;p&gt;&lt;strong&gt;Portamento, overshoot, and preparation&lt;/strong&gt; I couldn’t find much research on, so I sought to collect a good amount of data on them. I asked the singer to perform two-note melodies consisting of ascending and descending m2, m3, P4, P5, and P8, each four times, with instructions to use “natural portamento.” I then ran all the results through the pitch tracker and visually measured rough averages of preparation time, preparation amount, portamento time, overshoot time, and overshoot amount. Here’s the table of my results.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th class="head"&gt;&lt;p&gt;Interval&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Prep. time (s)&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Prep. amount (semitones)&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Port. time (s)&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Over. time (s)&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Over. amount (semitones)&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;m3 ascending&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;0.1&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;0.7&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;0.15&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;0.2&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;0.5&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;m3 descending&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="2"&gt;&lt;p&gt;no preparation&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;0.1&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;0.3&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;1&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;P4 ascending&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="2"&gt;&lt;p&gt;no preparation&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;0.1&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;0.3&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;0.5&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;P4 descending&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="2"&gt;&lt;p&gt;no preparation&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;0.2&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;0.2&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;1&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;P5 ascending&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;0.1&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;0.5&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;0.2&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="2"&gt;&lt;p&gt;no overshoot&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;P5 descending&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="2"&gt;&lt;p&gt;no preparation&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;0.2&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;0.1&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;1&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;P8 ascending&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;0.1&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;1&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;0.25&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="2"&gt;&lt;p&gt;no overshoot&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;P8 descending&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="2"&gt;&lt;p&gt;no preparation&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;0.15&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;0.1&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;1.5&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;As one might expect, portamento time gently increases as the interval gets larger. There is no preparation for downward intervals, and spotty overshoot for upward intervals, both of which make some sense physiologically – you’re much more likely to involuntarily relax in pitch than to tense up. Overshoot and preparation amounts have a slight upward trend with interval size. The overshoot time seems to have a downward trend, but overshoot measurement is pretty unreliable.&lt;/p&gt;
&lt;p&gt;Worth noting is the actual shape of overshoot and preparation.&lt;/p&gt;
&lt;img alt="A plot titled &amp;quot;Portamento, ascending m3&amp;quot; showing four measured pitch signals. The X-axis is time and the Y-axis is frequency measured in MIDI note, which starts at about 52 and jumps to about 55. Preparation is only clearly visible for two of the signals, but it's about 70 cents." class="align-center" src="https://nathan.ho.name/images/pitch/portamento_ascending_m3.png"&gt;
&lt;img alt='A plot titled "Portamento, descending m3" showing four measured pitch signals. The X-axis is time and the Y-axis is frequency measured in MIDI note, which starts at about 55 and jumps to about 52. Overshoot is clearly visible for three of the four signals, nearly a semitone and a half.' class="align-center" src="https://nathan.ho.name/images/pitch/portamento_descending_m3.png"&gt;
&lt;p&gt;In OddVoices, I model these three pitch phenomena by using quarter-sine-wave segments, and assuming no overshoot when ascending and no preparation when descending.&lt;/p&gt;
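&lt;p&gt;One way such a contour could be assembled is sketched below. The exact arrangement of segments in OddVoices is not spelled out here, so the segment order, the easing direction, and the helper names &lt;code class="docutils literal"&gt;quarter_sine&lt;/code&gt; and &lt;code class="docutils literal"&gt;transition&lt;/code&gt; are all assumptions. The example values are the m3 rows of the table, taking times to be in seconds and amounts in semitones.&lt;/p&gt;

```python
import numpy as np

def quarter_sine(start, end, n, ease_in=False):
    """n samples moving from start toward end along a quarter of a sine wave."""
    x = np.linspace(0.0, 1.0, n, endpoint=False)
    if ease_in:
        shape = 1.0 - np.cos(0.5 * np.pi * x)
    else:
        shape = np.sin(0.5 * np.pi * x)
    return start + (end - start) * shape

def transition(start, target, rate, port_time, prep_time=0.0, prep_amount=0.0,
               over_time=0.0, over_amount=0.0, hold_time=0.2):
    """Pitch contour in MIDI note numbers for one note change, sampled at rate Hz."""
    def samples(seconds):
        return int(round(seconds * rate))

    if target > start:
        # Ascending: dip below the old note (preparation), then glide up.
        dip = start - prep_amount
        parts = [quarter_sine(start, dip, samples(prep_time)),
                 quarter_sine(dip, target, samples(port_time))]
    else:
        # Descending: glide past the new note (overshoot), then recover.
        past = target - over_amount
        parts = [quarter_sine(start, past, samples(port_time)),
                 quarter_sine(past, target, samples(over_time))]
    parts.append(np.full(samples(hold_time), float(target)))
    return np.concatenate(parts)

ascending_m3 = transition(52, 55, 100, port_time=0.15,
                          prep_time=0.1, prep_amount=0.7)
descending_m3 = transition(55, 52, 100, port_time=0.1,
                           over_time=0.3, over_amount=1.0)
```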
&lt;/section&gt;
&lt;section id="further-updates"&gt;
&lt;h2&gt;Further updates&lt;/h2&gt;
&lt;p&gt;Pitch detection and pitch contours consumed most of my time and energy recently, but there are a few other updates too.&lt;/p&gt;
&lt;p&gt;As mentioned earlier, I registered the domain &lt;a class="reference external" href="https://oddvoices.org/"&gt;oddvoices.org&lt;/a&gt;, which currently hosts a copy of the OddVoices Web interface. The Web interface itself looks a little bland – I’d even say unprofessional – so I have plans to overhaul it, especially as new parameters are on the way.&lt;/p&gt;
&lt;p&gt;The &lt;a class="reference external" href="https://gitlab.com/oddvoices/oddvoices/-/blob/develop/README.md"&gt;README&lt;/a&gt; has been heavily updated, taking inspiration from the article &lt;a class="reference external" href="https://github.com/hackergrrl/art-of-readme"&gt;“Art of README”&lt;/a&gt;. I tried to keep it concise and prioritize information that a casual reader would want to know.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;div role="list" class="citation-list"&gt;
&lt;div class="citation" id="noll1966" role="doc-biblioentry"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="https://nathan.ho.name/posts/oddvoices-dev-log-3/#citation-reference-1"&gt;Noll1966&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;Noll, A. Michael. 1966. “Cepstrum Pitch Determination.”&lt;/p&gt;
&lt;/div&gt;
&lt;div class="citation" id="rabiner1977" role="doc-biblioentry"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="https://nathan.ho.name/posts/oddvoices-dev-log-3/#citation-reference-2"&gt;Rabiner1977&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;Rabiner, L. 1977. “On the Use of Autocorrelation Analysis for Pitch Detection.”&lt;/p&gt;
&lt;/div&gt;
&lt;div class="citation" id="dziubinski2004" role="doc-biblioentry"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;Dziubinski2004&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="https://nathan.ho.name/posts/oddvoices-dev-log-3/#citation-reference-3"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="https://nathan.ho.name/posts/oddvoices-dev-log-3/#citation-reference-8"&gt;2&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;Dziubinski, M. and Kostek, B. 2004. “High Accuracy and Octave Error Immune Pitch Detection Algorithms.”&lt;/p&gt;
&lt;/div&gt;
&lt;div class="citation" id="mcleod2008" role="doc-biblioentry"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="https://nathan.ho.name/posts/oddvoices-dev-log-3/#citation-reference-4"&gt;McLeod2008&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;McLeod, Philip. 2008. “Fast, Accurate Pitch Detection Tools for Music Analysis.”&lt;/p&gt;
&lt;/div&gt;
&lt;div class="citation" id="kumaraswamy2015" role="doc-biblioentry"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="https://nathan.ho.name/posts/oddvoices-dev-log-3/#citation-reference-5"&gt;Kumaraswamy2015&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;Kumaraswamy, B. and Poonacha, P. G. 2015. “Improved Pitch Detection Using Fourier Approximation Method.”&lt;/p&gt;
&lt;/div&gt;
&lt;div class="citation" id="cook1996" role="doc-biblioentry"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="https://nathan.ho.name/posts/oddvoices-dev-log-3/#citation-reference-9"&gt;Cook1996&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;Cook, P. R. 1996. “Identification of Control Parameters in an Articulatory Vocal Tract Model with Applications to the Synthesis of Singing.”&lt;/p&gt;
&lt;/div&gt;
&lt;div class="citation" id="lai2009" role="doc-biblioentry"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="https://nathan.ho.name/posts/oddvoices-dev-log-3/#citation-reference-10"&gt;Lai2009&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;Lai, Wen-Hsing. 2009. “An F0 Contour Fitting Model for Singing Synthesis.”&lt;/p&gt;
&lt;/div&gt;
&lt;div class="citation" id="lee2006" role="doc-biblioentry"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="https://nathan.ho.name/posts/oddvoices-dev-log-3/#citation-reference-6"&gt;Lee2006&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;Lee, Kyogu. 2006. “Automatic Chord Recognition from Audio Using Enhanced Pitch Class Profile.”&lt;/p&gt;
&lt;/div&gt;
&lt;div class="citation" id="mcgonegal1977" role="doc-biblioentry"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="https://nathan.ho.name/posts/oddvoices-dev-log-3/#citation-reference-7"&gt;McGonegal1977&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;McGonegal, Carol A. et al. 1977. “A Subjective Evaluation of Pitch Detection Methods Using LPC Synthesized Speech.”&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;</description><category>dsp</category><category>oddvoices</category><category>projects</category><category>vocal</category><guid>https://nathan.ho.name/posts/oddvoices-dev-log-3/</guid><pubDate>Mon, 07 Feb 2022 22:56:37 GMT</pubDate></item><item><title>OddVoices Dev Log 2: Phase and Volume</title><link>https://nathan.ho.name/posts/oddvoices-dev-log-2/</link><dc:creator>Nathan Ho</dc:creator><description>&lt;p&gt;This is the second in an ongoing series of dev updates about &lt;a class="reference external" href="https://gitlab.com/oddvoices/oddvoices/"&gt;OddVoices&lt;/a&gt;, a singing synthesizer I’ve been developing over the past year. Since we last checked in, I’ve released &lt;a class="reference external" href="https://gitlab.com/oddvoices/oddvoices/-/releases"&gt;version 0.0.1&lt;/a&gt;. Here are some of the major changes.&lt;/p&gt;
&lt;section id="new-voice"&gt;
&lt;h2&gt;New voice!&lt;/h2&gt;
&lt;p&gt;Exciting news: OddVoices now has a third voice. To recap, we’ve had Quake Chesnokov, a powerful and dark basso profondo, and Cicada Lumen, a bright and almost synth-like baritone. The newest voice joining us is Air Navier (nav-YEH), a soft, breathy alto. Air Navier makes a lovely contrast to the two more classical voices, and I’m imagining it will fit in great in a pop or indie rock track.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="goodbye-github"&gt;
&lt;h2&gt;Goodbye GitHub&lt;/h2&gt;
&lt;p&gt;OddVoices makes copious use of Git LFS to store original recordings for voices, and this caused some problems for me this past week. GitHub’s free tier caps both Git LFS storage and monthly download bandwidth at 1 gigabyte. It is possible to pay $5 to add 50 GB to both storage and bandwidth limits. These purchases are “data packs” and are orthogonal to GitHub Pro.&lt;/p&gt;
&lt;p&gt;What’s unfortunate is that all downloads by anyone (including those on forks) contribute to the monthly download bandwidth, and even worse, downloads from GitHub Actions count too. I easily run CI dozens of times per week, and with each run pulling the gigabyte or so of audio data, the plan is quickly maxed out.&lt;/p&gt;
&lt;p&gt;A free GitLab account has a much more workable storage limit of 10 GB, and claims unlimited bandwidth for now. GitLab it is. Consider this a word of warning for anyone making serious use of Git LFS together with GitHub, and especially GitHub Actions.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="goodbye-mbr-psola"&gt;
&lt;h2&gt;Goodbye MBR-PSOLA&lt;/h2&gt;
&lt;p&gt;OddVoices, taking after speech synthesizers of the 90’s, is based on concatenation of recorded segments. These segments are processed using PSOLA, which turns them into a sequence of frames (grains), each for one pitch period. PSOLA then allows manipulation of the segment in pitch, time, and formants, and sounds pretty clean. The synthesis component is also computationally efficient.&lt;/p&gt;
&lt;p&gt;One challenge with a concatenative synthesizer is making the segments blend together nicely. We are using a crossfade, but a problem arises – if the phases of the overlapping frames don’t approximately match, then unnatural croaks and “doubling” artifacts happen.&lt;/p&gt;
&lt;p&gt;There is a way to solve this: manually. If one lines up the locations of the frames so they are centered on the exact times when the vocal folds close (the so-called “glottal closure instant” or GCI), the phases will match. Since it’s difficult to find the GCI from a microphone signal, an electroglottograph (EGG) setup is typically used. I don’t have an EGG on hand, and I’m working remotely with singers, so this solution has to be ruled out.&lt;/p&gt;
&lt;p&gt;A less daunting solution is to use FFT processing to make all phases zero, or set every frame to minimum phase. These solve the phase mismatch problem but sound overtly robotic and buzzy. (Forrest Mozer’s TSI S14001A speech synthesis IC, memorialized in chipspeech’s &lt;a class="reference external" href="https://www.youtube.com/watch?v=-SkiekH5oRk"&gt;Otto Mozer&lt;/a&gt;, uses the zero phase method – see &lt;a class="reference external" href="https://patents.google.com/patent/US4214125A/"&gt;US4214125A&lt;/a&gt;.) MBR-PSOLA softens the blows of these methods by using a random set of phases that are fixed throughout the voice database. Dutoit recommends only randomizing the lower end of the spectrum while leaving the highs untouched. It sounds pretty good, but there is still an unnatural hollow and phasey quality to it.&lt;/p&gt;
&lt;p&gt;I decided to search around the literature and see if there’s any way OddVoices can improve on MBR-PSOLA. I found &lt;a class="citation-reference" href="https://nathan.ho.name/posts/oddvoices-dev-log-2/#stylianou2001" id="citation-reference-1" role="doc-biblioref"&gt;[Stylianou2001]&lt;/a&gt;, which seems to fit the bill. It recommends computing the “center” of a grain, then offsetting the frame so it is centered on that point. The center is not the exact same as the GCI, but it acts as a useful stand-in. When all grains are aligned on their centers, their phases should be roughly matched too – and all this happens without modifying the timbre of the voice, since all we’re doing is a time offset.&lt;/p&gt;
&lt;p&gt;I tried this on the Cicada voice, and it worked! I didn’t conduct any formal listening experiment, but it definitely sounded clearer and lacked the weird hollowness of the MBROLA voice. Then I tried it on the Quake voice, and it sounded extremely creaky and hoarse. This is the result of instabilities in the algorithm, producing random timing offsets for each grain.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="frame-adjustment"&gt;
&lt;h2&gt;Frame adjustment&lt;/h2&gt;
&lt;p&gt;Let &lt;span class="math"&gt;\(x[t]\)&lt;/span&gt; be a sampled quasiperiodic voice signal with period &lt;span class="math"&gt;\(T\)&lt;/span&gt;, with a sample rate of &lt;span class="math"&gt;\(f_s\)&lt;/span&gt;. We round &lt;span class="math"&gt;\(T\)&lt;/span&gt; to an integer, which works well enough for our application. Let &lt;span class="math"&gt;\(w[t]\)&lt;/span&gt; be a window function (I use a Hann window) of length &lt;span class="math"&gt;\(2T\)&lt;/span&gt;. Brackets are zero-indexed, because we are sensible people here.&lt;/p&gt;
&lt;p&gt;The PSOLA algorithm divides &lt;span class="math"&gt;\(x\)&lt;/span&gt; into a number of frames of length &lt;span class="math"&gt;\(2T\)&lt;/span&gt;, where the &lt;span class="math"&gt;\(n\)&lt;/span&gt;-th frame is given by &lt;span class="math"&gt;\(s_n[t] = w[t] x[t + nT]\)&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;Stylianou proposes the “differentiated phase spectrum” center, or DPS center, which is computed like so:&lt;/p&gt;
&lt;div class="math"&gt;
\begin{equation*}
\eta = \frac{T}{2\pi} \arg \sum_{t = 0}^{2T - 1} s_n^2[t] e^{2 \pi j t / T}
\end{equation*}
&lt;/div&gt;
&lt;p&gt;&lt;span class="math"&gt;\(\eta\)&lt;/span&gt; is here expressed in samples. The DPS center is not the GCI. It’s… something else, and it’s admitted in the paper that it isn’t well defined. However, it is claimed that it will be close enough to the GCI, hopefully by a near-constant offset. To normalize a frame on its DPS center, we recalculate the frame with an offset of &lt;span class="math"&gt;\(\eta\)&lt;/span&gt;: &lt;span class="math"&gt;\(s'_n[t] = w[t] x[t + nT + \text{round}(\eta)]\)&lt;/span&gt;.&lt;/p&gt;
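&lt;p&gt;As a sanity check, the DPS center formula can be sketched in NumPy by summing over the &lt;span class="math"&gt;\(2T\)&lt;/span&gt; samples of a frame. The function names and the synthetic pulse-train test signal are invented for this example, and &lt;code class="docutils literal"&gt;np.hanning&lt;/code&gt; stands in for whichever Hann window variant is actually used.&lt;/p&gt;

```python
import numpy as np

def dps_center(x, n, period):
    """DPS center, in samples, of the n-th PSOLA frame of x."""
    t = np.arange(2 * period)
    start = n * period
    frame = np.hanning(2 * period) * x[start:start + 2 * period]
    phasor = np.exp(2j * np.pi * t / period)
    return period / (2.0 * np.pi) * np.angle(np.sum(frame ** 2 * phasor))

def recentered_frame(x, n, period, eta):
    """Frame n recomputed with its start shifted by round(eta) samples."""
    start = n * period + int(round(eta))
    return np.hanning(2 * period) * x[start:start + 2 * period]

# Synthetic test signal: one impulse per period, 20 samples after each
# multiple of the period.
period = 100
x = np.zeros(1000)
x[20::period] = 1.0

eta = dps_center(x, 1, period)
frame = recentered_frame(x, 1, period, eta)
```

&lt;p&gt;For this idealized signal the offset comes out at 20 samples, and the recentered frame has its impulse in the middle of the window. Real voice frames are far less tidy, which is where the noise discussed next comes from.&lt;/p&gt;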
&lt;p&gt;The paper also discusses the center of gravity of a signal as a center close to the GCI. However, the center of gravity is less robust than the DPS center, as it can be shown that the center of gravity can be computed from just a single bin in the discrete Fourier transform, whereas the DPS center involves the entire spectrum.&lt;/p&gt;
&lt;p&gt;Here’s where we go beyond the paper. As discussed above, for certain signals &lt;span class="math"&gt;\(\eta\)&lt;/span&gt; can be noisy, and using this algorithm as-is can result in audible jitter in the result. The goal, then, is to find a way to remove noise from &lt;span class="math"&gt;\(\eta\)&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;After many hours of experimenting with different solutions, I ended up doing a lowpass filter on &lt;span class="math"&gt;\(\eta\)&lt;/span&gt; to remove high-frequency noise. A caveat is that &lt;span class="math"&gt;\(\eta\)&lt;/span&gt; is a circular value that wraps around with period &lt;span class="math"&gt;\(T\)&lt;/span&gt;, and performing a standard lowpass filter will smooth out discontinuities produced by wrapping, which is not what we want. The trick is to use an encoding common in &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Directional_statistics"&gt;circular statistics&lt;/a&gt;, and &lt;a class="reference external" href="https://stats.stackexchange.com/questions/218407/encoding-angle-data-for-neural-network"&gt;especially in machine learning&lt;/a&gt;: convert it to sine and cosine, perform filtering on both signals, and convert it back with atan2. A rectangular FIR filter worked perfectly well for my application.&lt;/p&gt;
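&lt;p&gt;The sine and cosine trick might look like the following in NumPy. This is a sketch only: the function name, the filter width, and the zero-padded edge handling of &lt;code class="docutils literal"&gt;np.convolve&lt;/code&gt; are choices made for the example.&lt;/p&gt;

```python
import numpy as np

def smooth_circular(eta, period, width):
    """Moving-average lowpass of a value that wraps around with the given period."""
    angle = 2.0 * np.pi * np.asarray(eta, dtype=float) / period
    kernel = np.ones(width) / width                  # rectangular FIR filter
    cosine = np.convolve(np.cos(angle), kernel, mode="same")
    sine = np.convolve(np.sin(angle), kernel, mode="same")
    # atan2 only depends on the direction of (cosine, sine), so the reduced
    # magnitude near the zero-padded edges does not bias the result.
    return period / (2.0 * np.pi) * np.arctan2(sine, cosine)

# Values hopping across the wrap point at +/- 50 samples.
wrapped = smooth_circular([49, -49, 49, -49, 49, -49], 100, 3)
```

&lt;p&gt;A plain average of the example values would land near 0, half a period away from where the data actually sits. The circular version stays close to the wrap point.&lt;/p&gt;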
&lt;p&gt;Overall the result sounds pretty good. There are still some minor issues with it, but I hope to iron those out in future versions.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="volume-normalization"&gt;
&lt;h2&gt;Volume normalization&lt;/h2&gt;
&lt;p&gt;I encountered two separate but related issues regarding the volume of the voices. The first is that the voices are inconsistent in volume – Cicada was much louder than the other two. The second, and the more serious of the two, is that segments can have different volumes when they are joined, and this results in a “choppy” sound with discontinuities.&lt;/p&gt;
&lt;p&gt;I fixed global volume inconsistency by taking the RMS amplitude of the entire segment database and normalizing it to -20 dBFS. For voices with higher dynamic range, this caused some of the louder consonants to clip, so I added a safety limiter that ensures the peak amplitude of each frame is no greater than -6 dBFS.&lt;/p&gt;
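&lt;p&gt;A simplified sketch of this two-stage normalization is below. How the limiter is actually implemented is not described here, so the per-frame gain reduction is an assumption, as are the function names.&lt;/p&gt;

```python
import numpy as np

def db_to_amplitude(db):
    return 10.0 ** (db / 20.0)

def normalize_database(frames, target_rms_db=-20.0, peak_limit_db=-6.0):
    """Scale all frames by one global gain, then tame frames that peak too high."""
    all_samples = np.concatenate(frames)
    rms = np.sqrt(np.mean(all_samples ** 2))
    gain = db_to_amplitude(target_rms_db) / rms
    limit = db_to_amplitude(peak_limit_db)
    result = []
    for frame in frames:
        scaled = gain * frame
        peak = np.max(np.abs(scaled))
        # Only frames whose peak exceeds the limit are turned down.
        result.append(scaled * min(1.0, limit / max(peak, 1e-12)))
    return result

# A quiet frame and a frame with one loud spike.
frames = [np.full(1000, 0.01), np.concatenate([np.zeros(999), [1.0]])]
normalized = normalize_database(frames)
```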
&lt;p&gt;Segment-level volume inconsistency can be addressed by examining diphones that join together and adjusting their amplitudes accordingly. Take the phoneme /k/, and gather a list of all diphones of the form &lt;code class="docutils literal"&gt;k*&lt;/code&gt; and &lt;code class="docutils literal"&gt;*k&lt;/code&gt;. Now inspect the amplitudes at the beginning of &lt;code class="docutils literal"&gt;k*&lt;/code&gt; diphones, and the amplitudes at the end of &lt;code class="docutils literal"&gt;*k&lt;/code&gt; diphones. Take the RMS of all these amplitudes together to form the “phoneme amplitude.” Repeat for all other phonemes. Then, for each diphone, apply a linear amplitude envelope so that the beginning frames match the first phoneme’s amplitude and the ending frames match the second phoneme’s amplitude. The result is that all joined diphones will have a matched amplitude.&lt;/p&gt;
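&lt;p&gt;The procedure above can be sketched as follows, under the simplifying assumption that each diphone is already reduced to a list of per-frame amplitudes. The dictionary layout and function names are hypothetical, and silent boundary frames (amplitude zero) would need special handling that is left out.&lt;/p&gt;

```python
from collections import defaultdict

import numpy as np

def phoneme_amplitudes(diphones):
    """RMS of the boundary amplitudes of every diphone touching each phoneme."""
    edges = defaultdict(list)
    for (first, second), amplitudes in diphones.items():
        edges[first].append(amplitudes[0])      # diphone begins in `first`
        edges[second].append(amplitudes[-1])    # diphone ends in `second`
    return {phoneme: float(np.sqrt(np.mean(np.square(values))))
            for phoneme, values in edges.items()}

def gain_envelope(diphones, key, targets):
    """Linear per-frame gain that pins both ends of a diphone to its phoneme amplitudes."""
    first, second = key
    amplitudes = diphones[key]
    start_gain = targets[first] / amplitudes[0]
    end_gain = targets[second] / amplitudes[-1]
    return np.linspace(start_gain, end_gain, len(amplitudes))

# Per-frame amplitudes for two diphones that join on /k/ (made-up numbers).
diphones = {("a", "k"): [0.5, 0.4, 0.2], ("k", "i"): [0.4, 0.5, 0.6]}
targets = phoneme_amplitudes(diphones)
end_of_ak = 0.2 * gain_envelope(diphones, ("a", "k"), targets)[-1]
start_of_ki = 0.4 * gain_envelope(diphones, ("k", "i"), targets)[0]
```

&lt;p&gt;After the envelopes are applied, the end of the first diphone and the start of the second both sit at the /k/ phoneme amplitude, so the join has no step in level.&lt;/p&gt;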
&lt;/section&gt;
&lt;section id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The volume normalization problem in particular taught me that developing a practical speech or singing synthesizer requires a lot more work than papers and textbooks might make you think. Rather, the descriptions in the literature are only baselines for a real system.&lt;/p&gt;
&lt;p&gt;More is on the way for OddVoices. I haven’t yet planned out the 0.0.2 release, but my hope is to work on refining the existing voices for intelligibility and naturalness instead of adding new ones.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;div role="list" class="citation-list"&gt;
&lt;div class="citation" id="stylianou2001" role="doc-biblioentry"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="https://nathan.ho.name/posts/oddvoices-dev-log-2/#citation-reference-1"&gt;Stylianou2001&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;Stylianou, Yannis. 2001. “Removing Linear Phase Mismatches in Concatenative Speech Synthesis.”&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;</description><category>dsp</category><category>oddvoices</category><category>projects</category><category>vocal</category><guid>https://nathan.ho.name/posts/oddvoices-dev-log-2/</guid><pubDate>Sun, 23 Jan 2022 18:25:00 GMT</pubDate></item><item><title>OddVoices Dev Log 1: Hello World!</title><link>https://nathan.ho.name/posts/oddvoices-dev-log-1/</link><dc:creator>Nathan Ho</dc:creator><description>&lt;p&gt;The free and open source singing synthesizer landscape has a few projects worth checking out, such as &lt;a class="reference external" href="http://sinsy.sourceforge.net/"&gt;Sinsy&lt;/a&gt;, &lt;a class="reference external" href="https://github.com/divVerent/ecantorix"&gt;eCantorix&lt;/a&gt;, &lt;a class="reference external" href="https://github.com/usdivad/mesing"&gt;meSing&lt;/a&gt;, and &lt;a class="reference external" href="https://github.com/numediart/mage"&gt;MAGE&lt;/a&gt;. While each one has its own unique voice and there’s no such thing as a bad speech or singing synthesizer, I looked into all of them and more and couldn’t find a satisfactory one for my musical needs.&lt;/p&gt;
&lt;p&gt;So, I’m happy to announce &lt;a class="reference external" href="https://github.com/oddvoices/oddvoices"&gt;OddVoices&lt;/a&gt;, my own free and open source singing synthesizer based on diphone concatenation. It comes with two English voices, with more on the way. If you’re not some kind of nerd who uses the command line, check out &lt;a class="reference external" href="https://nathan.ho.name/oddvoices-web/"&gt;OddVoices Web&lt;/a&gt;, a Web interface I built for it with WebAssembly. Just upload a MIDI file and write some lyrics and you’ll have a WAV file in your browser.&lt;/p&gt;
&lt;p&gt;OddVoices is based on MBR-PSOLA, which stands for Multi-Band Resynthesis Pitch Synchronous Overlap Add. PSOLA is a granular synthesis-based algorithm for playback of monophonic sounds such that the time, formant, and pitch axes can be manipulated independently. The MBR part is a slight modification to PSOLA that prevents unwanted phase cancellation when crossfading between concatenated samples, and solves other problems too. For more detail, check out papers from the MBROLA project. The &lt;a class="reference external" href="https://github.com/numediart/MBROLA"&gt;MBROLA&lt;/a&gt; codebase itself has some tech and licensing issues I won’t get into, but the algorithm is perfect for what I want in a singing synth. Note that OddVoices doesn’t interface with MBROLA.&lt;/p&gt;
&lt;p&gt;I’ll use this post to discuss some of the more interesting challenges I had to work on in the course of the project so far. This is the first in a series of posts I will be making about the technical side of OddVoices.&lt;/p&gt;
&lt;section id="vowel-mergers"&gt;
&lt;h2&gt;Vowel mergers&lt;/h2&gt;
&lt;p&gt;OddVoices currently only supports General American English (GA), or more specifically the varieties of English that I and the singers speak. I hope in the future that I can correct this bias by including other languages and other dialects of English.&lt;/p&gt;
&lt;p&gt;When assembling the list of phonemes, the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Cot%E2%80%93caught_merger"&gt;cot-caught merger&lt;/a&gt; immediately came up. I decided to merge them, and make /O/ and /A/ aliases except for /Or/ and /Ar/ (here using X-SAMPA). To reduce the number of phonemes and therefore phoneme combinations, I represent /Or/ internally as /oUr/.&lt;/p&gt;
&lt;p&gt;A more interesting merger concerns the problem of the schwa. In English, the schwa is used to represent an unstressed syllable, but the actual phonetics of that syllable can vary wildly. In singing, a syllable that would be unstressed in spoken English can be drawn out for multiple seconds and become stressed. The schwa isn’t actually sung in these cases, and is replaced with another phoneme. As one of the singers put it, “the schwa is a big lie.”&lt;/p&gt;
&lt;p&gt;This matters when working with the &lt;a class="reference external" href="http://www.speech.cs.cmu.edu/cgi-bin/cmudict/"&gt;CMU Pronouncing Dictionary&lt;/a&gt;, which I’m using for pronouncing text. Take a word like “imitate” – the second syllable is unstressed, and the CMUDict transcribes it as a schwa. But when sung, it’s more like /I/. This is simply a limitation of the CMUDict that I don’t have a good solution for. In the end I merge /@/ with /V/, since the two are closely related in GA. Similarly, /3`/ and /@`/ are merged, and the CMUDict doesn’t even distinguish those.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="real-time-vs-semi-real-time-operation"&gt;
&lt;h2&gt;Real-time vs. semi-real-time operation&lt;/h2&gt;
&lt;p&gt;A special advantage of OddVoices over alternative offerings is that it’s built from scratch to work in real time. That means that it can become a UGen for platforms like SuperCollider and Pure Data, or even a VST plugin in the far future. I have a SuperCollider UGen in the works, but there’s some tricky engineering work involving communication between RT and NRT threads that I haven’t tackled yet. Stay tuned.&lt;/p&gt;
&lt;p&gt;There is a huge caveat to real time operation: singers don’t operate in perfect real time! To see why, imagine the lyrics “rice cake,” sung with two half notes. The final /s/ in “rice” has to happen before the initial /k/ in “cake,” and the latter happens right on the third beat, so the singer has to anticipate the third beat with the consonant /s/. But in MIDI and real-time keyboard playing, there is no way to predict when the note off will happen until the third beat has already arrived.&lt;/p&gt;
&lt;p&gt;VOCALOID handles this by being its own DAW with a built-in sequencer, so it can look ahead as much as it needs. &lt;a class="reference external" href="https://plogue.com/products/chipspeech.html"&gt;chipspeech&lt;/a&gt; and &lt;a class="reference external" href="https://plogue.com/products/alter-ego.html"&gt;Alter/Ego&lt;/a&gt; work in real time. In their user guides, they ask the user to shorten every MIDI note to around 50%-75% of its length to accommodate final consonant clusters. If this is not done, a phenomenon I call “lyric drift” happens and the lyrics misalign from the notes.&lt;/p&gt;
&lt;p&gt;OddVoices supports two possible modes: true real-time mode and semi-real-time mode. In true real-time mode, we don’t know the durations of notes, so we trigger the final consonant cluster on a note off. As with chipspeech and Alter/Ego, this requires manual shortening of notes to prevent lyric drift. Alternatively, OddVoices supports a semi-real-time mode where every note on is accompanied by the duration of the note. This way OddVoices can predict the timing of the final consonant cluster while otherwise still operating in real time.&lt;/p&gt;
&lt;p&gt;Semi-real-time mode is used in OddVoices’ MIDI frontend, and can also be used in powerful sequencing environments like SC and Pd by sending a “note length” signal along with the note on trigger. I think it’s a nice compromise between the constraints of real-time and the omniscience of non-real-time.&lt;/p&gt;
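&lt;p&gt;The difference between the two modes comes down to when the final consonant cluster is allowed to start. The toy functions below illustrate the timing only; the names and the 0.12 second cluster length are made up, and this is not how the OddVoices scheduler is written.&lt;/p&gt;

```python
def cluster_onset_real_time(note_off):
    """True real-time mode: the cluster can only start once the note off arrives."""
    return note_off

def cluster_onset_semi_real_time(note_on, note_length, cluster_length):
    """Semi-real-time mode: start early enough that the cluster ends with the note."""
    return note_on + max(0.0, note_length - cluster_length)

# "rice" sung on a one-second note, with a final /s/ lasting 0.12 seconds.
late_end = cluster_onset_real_time(1.0) + 0.12
on_time_end = cluster_onset_semi_real_time(0.0, 1.0, 0.12) + 0.12
```

&lt;p&gt;In the first case the /s/ finishes 0.12 seconds after the next note should have begun, which is the source of lyric drift. In the second it finishes exactly at the note boundary.&lt;/p&gt;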
&lt;/section&gt;
&lt;section id="syllable-compression"&gt;
&lt;h2&gt;Syllable compression&lt;/h2&gt;
&lt;p&gt;After I implemented semi-real-time mode, another problem remained, one that reared its head in fast singing passages. It happens when, say, the lyric “rice cake” is sung very quickly and the diphones &lt;code class="docutils literal"&gt;_r raI aIs&lt;/code&gt; (here using X-SAMPA notation), when concatenated, are longer than the note length. The result is more lyric drift – the notes and the lyrics diverge.&lt;/p&gt;
&lt;p&gt;The fix for this was to peek ahead in the diphone queue and find the end of the final consonant cluster, then add up all the segment lengths from the beginning to that point. This is how long the entire syllable would last. This is then compared to the note length, and if it is longer, the playback speed is increased for that syllable to compensate. In short, consonants have to be spoken quickly in order to fit in quickly sung passages.&lt;/p&gt;
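&lt;p&gt;The speed-up factor itself is simple arithmetic, sketched here with made-up segment lengths in seconds. The function name is hypothetical, and the real code works on a diphone queue rather than a plain list.&lt;/p&gt;

```python
def syllable_speed(segment_lengths, note_length):
    """Playback speed needed for a syllable to fit in its note (never below 1)."""
    syllable_length = sum(segment_lengths)
    return max(1.0, syllable_length / note_length)

# Segments for "rice" (_r, raI, aIs) with illustrative lengths.
segments = [0.08, 0.20, 0.15]
fast = syllable_speed(segments, 0.25)   # syllable is longer than the note
slow = syllable_speed(segments, 1.0)    # syllable already fits
```

&lt;p&gt;With these numbers the 0.43 second syllable has to play at about 1.72 times normal speed to fit a 0.25 second note, while the one-second note leaves the speed untouched.&lt;/p&gt;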
&lt;p&gt;The result is still not entirely satisfactory to my ears, and I plan to improve it in future versions of the software. Syllable compression is of course only available in semi-real-time mode.&lt;/p&gt;
&lt;p&gt;Syllable compression is evidence that fast singing is phonetically quite different from slow singing, and perhaps more comparable to speech.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="stray-thoughts"&gt;
&lt;h2&gt;Stray thoughts&lt;/h2&gt;
&lt;p&gt;This is my second time using Emscripten and WebAssembly in a project, and I find it an overall pleasant technology to work with (especially with embind for C++ bindings). I did run into an obstacle, however, which was that I couldn’t figure out how to compile libsndfile to WASM. The only feature I needed was writing a 16-bit mono WAV file, so I dropped libsndfile and wrote my own code for that.&lt;/p&gt;
&lt;p&gt;I was surprised by the compactness of this project so far. The real-time C++ code adds up to 1,400 lines, and the Python offline analysis code only 600.&lt;/p&gt;
&lt;/section&gt;</description><category>dsp</category><category>oddvoices</category><category>projects</category><category>vocal</category><guid>https://nathan.ho.name/posts/oddvoices-dev-log-1/</guid><pubDate>Wed, 12 Jan 2022 02:00:09 GMT</pubDate></item><item><title>Hearing Graphs</title><link>https://nathan.ho.name/posts/hearing-graphs/</link><dc:creator>Nathan Ho</dc:creator><description>&lt;p&gt;My latest project is titled &lt;em&gt;Hearing Graphs&lt;/em&gt;, and you can access it by clicking on the image below.&lt;/p&gt;
&lt;a class="reference external image-reference" href="https://nathan.ho.name/hearing-graphs/"&gt;
&lt;img alt="A screenshot of the project, with an undirected graph to the left and music notation to the right." class="align-center" src="https://nathan.ho.name/images/hearing_graphs.png"&gt;
&lt;/a&gt;
&lt;p&gt;&lt;em&gt;Hearing Graphs&lt;/em&gt; is a sonification of graphs using the &lt;em&gt;graph spectrum&lt;/em&gt; – the eigenvalues of the graph’s adjacency matrix. Both negative and positive eigenvalues are represented by piano samples, and zero eigenvalues are interpreted with a bass drum. The multiplicity of each eigenvalue is represented by hitting notes multiple times.&lt;/p&gt;
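&lt;p&gt;The mapping can be sketched in a few lines of Python. This is an illustration only, not the project's code: the function name and the rounding tolerance are assumptions, and the way an eigenvalue is turned into a specific piano pitch is left out because it is not described here. The input is the known adjacency spectrum of the 4-cycle graph: 2, 0, 0, and -2.&lt;/p&gt;

```python
from collections import Counter


def spectrum_to_events(eigenvalues, digits=6):
    """Group eigenvalues and return (instrument, eigenvalue, hits) tuples."""
    # Round so that numerically equal eigenvalues count as one value;
    # adding 0.0 turns a negative zero into a plain zero.
    counts = Counter(round(ev, digits) + 0.0 for ev in eigenvalues)
    events = []
    for value in sorted(counts):
        # Zero eigenvalues go to the bass drum, everything else to piano.
        instrument = "bass drum" if value == 0 else "piano"
        # The multiplicity becomes the number of hits.
        events.append((instrument, value, counts[value]))
    return events


# Adjacency spectrum of the 4-cycle graph.
for event in spectrum_to_events([2.0, 0.0, 0.0, -2.0]):
    print(event)
```

&lt;p&gt;For the 4-cycle this gives one piano hit each for -2 and 2, and two bass drum hits for the double zero eigenvalue.&lt;/p&gt;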
&lt;p&gt;Most sonifications establish some audible relationship between the source material and the resulting audio. This one remains mostly incomprehensible, especially to a general audience, so I consider it a failed experiment in that regard. Still, it was fun to make.&lt;/p&gt;</description><category>mathematics</category><category>projects</category><category>sonification</category><guid>https://nathan.ho.name/posts/hearing-graphs/</guid><pubDate>Tue, 21 Sep 2021 00:38:19 GMT</pubDate></item><item><title>Announcing Canvas</title><link>https://nathan.ho.name/posts/canvas/</link><dc:creator>Nathan Ho</dc:creator><description>&lt;p&gt;&lt;a class="reference external" href="https://github.com/nhthn/canvas"&gt;Canvas&lt;/a&gt; (working title) is a visual additive synthesizer for Windows, macOS, and Linux where you can create sound by drawing an image. This is accomplished with a bank of 239 oscillators spaced at quarter tones, with stereo amplitudes mapped to the red and blue channels of the image. You can import images and sonify them, you can import sounds and convert their spectrograms to images, and you can apply several image filters like reverb, tremolo, and chorus. Check out the demo:&lt;/p&gt;
&lt;p&gt;Canvas is directly inspired by the Image Synth from U&amp;amp;I Software’s &lt;a class="reference external" href="https://uisoftware.com/metasynth/"&gt;MetaSynth&lt;/a&gt;. When I first heard of MetaSynth, I wrote it off as a gimmick that could only produce sweeps and whooshes. It wasn’t until I heard &lt;a class="reference external" href="https://www.youtube.com/watch?v=4-GDOIOAuU4"&gt;Benn Jordan’s recent demo&lt;/a&gt; and learned of its powerful set of image filters that I was sold on the approach. I decided to build an alternative that’s just featureful enough to yield musical results, a sort of MS Paint to MetaSynth’s Photoshop.&lt;/p&gt;
&lt;p&gt;Canvas has a lot of rough edges, and currently requires building from source on Linux or macOS. Nevertheless, I hope it is of some interest to the music tech community.&lt;/p&gt;</description><category>dsp</category><category>projects</category><category>synthesis</category><guid>https://nathan.ho.name/posts/canvas/</guid><pubDate>Wed, 15 Sep 2021 01:26:30 GMT</pubDate></item></channel></rss>