TL;DR

A writer experimented with applying common raster-image operations to audio data and vice versa, revealing how similar-looking manipulations produce very different perceptual artifacts across the two domains. The piece walks through examples — pixelation-style downsampling, bit-depth quantization, delay/comb filters, and FFT-based spectral editing — and explains why techniques like Hann-window overlap are needed to avoid audible glitches.

What happened

The author treated audio streams like raster images, ran a variety of image-processing routines on sound files, and then ran audio effects on pictures to compare the outcomes. Simple bucketed averaging (an image "pixelation" analogue) produced a staircase waveform that introduces strong, metallic overtones when played back; a subsequent rolling average (a temporal blur) reduces those artifacts. Reducing bit depth at the original sample rate produced broadband hiss rather than discrete squeals, demonstrating that quantization errors manifest differently than sample-rate loss. The writer also applied delayed, attenuated copies of data to images (producing blurs and double-exposure looks) and described how the same approach in audio produces reverb, flanging, and chorus effects. Finally, the post explains short-window FFT analysis (20–100 ms windows), the use of Hann windows with 50% overlap to avoid clicks, and simple spectral pitch-shifting experiments, with source code linked for the implementations.
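The bucketing and smoothing steps described above can be sketched in C. This is a minimal illustration with invented function names, not the author's actual code:

```c
#include <stddef.h>

/* "Pixelate" audio: replace each run of `bucket` samples with their mean,
   producing the stairstep waveform described in the post. */
void bucket_average(float *x, size_t n, size_t bucket) {
    for (size_t i = 0; i < n; i += bucket) {
        size_t end = (i + bucket < n) ? i + bucket : n;
        float sum = 0.0f;
        for (size_t j = i; j < end; j++) sum += x[j];
        float mean = sum / (float)(end - i);
        for (size_t j = i; j < end; j++) x[j] = mean;
    }
}

/* Rolling average (temporal blur): smooths the staircase edges that
   add the metallic overtones. Writes the smoothed signal into `y`. */
void rolling_average(const float *x, float *y, size_t n, size_t win) {
    for (size_t i = 0; i < n; i++) {
        size_t start = (i >= win / 2) ? i - win / 2 : 0;
        size_t end = (i + win / 2 + 1 < n) ? i + win / 2 + 1 : n;
        float sum = 0.0f;
        for (size_t j = start; j < end; j++) sum += x[j];
        y[i] = sum / (float)(end - start);
    }
}
```

The stairstep from `bucket_average` is what introduces the high-frequency overtones; running `rolling_average` over it rounds off the steps, which is why the post reports that blurring tames the metallic sound.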

Why it matters

  • Directly porting image techniques to audio often creates perceptually harsh artifacts because human hearing responds to waveform discontinuities differently than vision does.
  • Quantization and downsampling degrade audio fidelity differently than they degrade images, so audio needs its own filtering and DAC-stage smoothing.
  • Spectral editing requires careful windowing and overlap to prevent audible stitching artifacts; the Hann window plus overlapping is a practical solution.
  • Basic spectral tools can be used for pitch shifting and creative effects, but production-grade vocal correction typically relies on more sophisticated approaches.

Key facts

  • Bucketed averaging of contiguous samples (audio "pixelation") creates stairstep waveforms that add high-frequency overtones.
  • Applying a rolling-average (temporal blur) to the stairstep waveform reduces the metallic artifacts introduced by naive downsampling.
  • Reducing bit depth at full sample rate tends to produce hiss, since quantization errors occur at higher frequencies on average.
  • Digital-to-analog converters and sound cards normally include lowpass filtering to mask some quantization artifacts; injected errors larger than that filter tolerance remain audible.
  • Adding a delayed, scaled copy of a signal produces familiar audio effects (small delays → room presence, larger delays → echo; phase relationships yield flanger/phaser/chorus).
  • FFT-based spectral analysis requires chopping audio into short windows (roughly 20–100 ms) for useful frequency-domain snapshots.
  • Using Hann-shaped windows and overlapping adjacent windows by half a window length (50% overlap) prevents discontinuities when reconstructing the time-domain signal.
  • The author provided C code examples for an in-place FFT and for simple spectral pitch shifting; sample visuals included a dog photo (Skye) and a cover of "It Must Have Been Love" by Effie Passero.
  • Approximating spectral transforms on images produced visible jitter or blur depending on how frequency buckets were shifted.
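The delayed-copy effect from the list above amounts to mixing the signal with an attenuated, shifted version of itself. A minimal sketch (delay length and gain are illustrative values, not from the post):

```c
#include <stddef.h>

/* Mix a delayed, attenuated copy of the signal back into itself.
   Small `delay` values give a comb-filter/room-presence coloration;
   larger values give a distinct echo. Processes in place, iterating
   back to front so each dry sample is read before it is overwritten
   (a single feedforward echo, not a feedback loop). */
void add_echo(float *x, size_t n, size_t delay, float gain) {
    if (delay == 0 || delay >= n) return;
    for (size_t i = n; i-- > delay; )
        x[i] += gain * x[i - delay];
}
```

Sweeping `delay` over a few milliseconds while mixing produces the phase relationships behind flanger- and chorus-style effects mentioned in the post.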

What to watch next

  • Source code for the FFT and the pitch-shifting example is available from the author for people who want to reproduce the experiments.
  • Exploring how different window lengths and overlap ratios (50% vs. other choices) affect audible stitching in spectral edits is a practical next step.

Quick glossary

  • Downsampling: Reducing the number of samples in a signal, often by grouping samples and replacing them with a representative value.
  • Quantization / Bit depth: The process of mapping continuous amplitude values to a set of discrete levels; fewer bits mean fewer possible amplitudes and more quantization noise.
  • Fast Fourier Transform (FFT): An algorithm that converts a finite segment of a time-domain signal into its frequency-domain representation.
  • Hann window: A tapering function applied to each analysis window that reduces edge discontinuities and, when combined with overlapping windows, helps avoid reconstruction artifacts.
  • Spectrogram: A visual representation showing how the frequency content of a signal changes over time, typically built from successive FFT windows.

Reader FAQ

Does "pixelating" audio just sound like low sample rate audio?
No. Naive bucket averaging produces stairstep waveforms that add metallic overtones rather than the muffled character usually associated with downsampling.

Is lowering bit depth equivalent to downsampling?
Not exactly. Lowering bit depth at the original sample rate tends to create broadband hiss due to higher-frequency quantization errors, which differs from the effects of reduced sample rate.

Can you edit audio in the frequency domain without introducing clicks?
Yes — using window functions like the Hann window combined with overlapping windows (e.g., 50% overlap) prevents abrupt discontinuities when reconstructing the time-domain signal.

Is the author’s code production-quality pitch shifting?
The post notes the example code is short and easy to experiment with, but states that production-quality pitch shifting typically relies on more sophisticated methods.

Sources

  • "See it with your lying ears" (Jan 10, 2026): "This blog has a history of answering questions that no one should be asking. Today, we continue that proud legacy."
