Signal Processing


Timbral Shepard-illusion reveals perceptual ambiguity and context sensitivity of brightness perception

Kai Siedenburg

BACKGROUND. Recent research has described rapid and long-lasting effects of prior context on the perception of ambiguous pitch shifts of Shepard tones (Chambers et al., 2017, Nat. Commun.). Here, very similar effects are demonstrated for brightness shift judgments of harmonic complexes with partially cyclic spectral envelopes and fixed fundamental frequency. It is shown that frequency shifts of the cyclic envelopes are perceived as upward or downward shifts of brightness. In direct analogy to Chambers et al., the perceptual ambiguity of half-octave shifts is resolved by the presentation of prior context tones. These results constitute a context effect for the perceptual processing of spectral envelope shifts and indicate previously unknown commonalities of pitch and timbre.

Sound examples

[Please make sure to listen at a sufficient level using headphones with good low-frequency response. Otherwise the sounds might not be properly transmitted.]

Single pairs of tones were used in Exp. 1, and the quasi-cyclic envelopes were shifted by fractions of an octave:


In Exp. 2, target pairs were preceded by a context sequence that induced an upward or downward bias for the target shift (which was always a half-octave step):


If stepwise shifts (here 1/6 of an octave, i.e., 2 semitones) follow one another, this gives rise to a kind of timbral Shepard illusion, because brightness appears to ascend or descend continuously:
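
The following is a minimal sketch of how such stimuli could be generated, not the procedure used in the experiments. The page does not specify the envelope shape, so the cosine ripple with a period of one octave, the fixed Gaussian bell (centre 1 kHz, width 1.5 octaves), the 100 Hz fundamental, and the names `envelope_gains` and `synth_tone` are all assumptions chosen for illustration. The sketch shows two properties described above: a shift by a full octave returns the original envelope, and a half-octave shift upward is identical to a half-octave shift downward, which is why that step is ambiguous.

```python
import numpy as np

def envelope_gains(freqs, shift_oct, centre_hz=1000.0, width_oct=1.5):
    """Gain per harmonic: octave-periodic ripple times a fixed global bell.

    Assumed shapes (not taken from the source): cosine ripple on a
    log2-frequency axis and a Gaussian bell that does not move with the shift.
    """
    log_f = np.log2(freqs / centre_hz)
    ripple = 0.5 * (1.0 + np.cos(2.0 * np.pi * (log_f - shift_oct)))
    bell = np.exp(-0.5 * (log_f / width_oct) ** 2)
    return ripple * bell

def synth_tone(shift_oct, f0=100.0, sr=44100, dur=0.25):
    """Harmonic complex with fixed f0; only the spectral envelope is shifted."""
    t = np.arange(int(sr * dur)) / sr
    freqs = np.arange(f0, sr / 2, f0)          # harmonics below Nyquist
    gains = envelope_gains(freqs, shift_oct)
    tone = (gains[:, None] * np.sin(2.0 * np.pi * freqs[:, None] * t)).sum(axis=0)
    return tone / np.max(np.abs(tone))          # peak-normalise

# Stepwise shifts of 1/6 octave: after six steps the envelope is back
# where it started, although each step shifts the envelope upward.
steps = np.arange(7) / 6.0
sequence = np.concatenate([synth_tone(s) for s in steps])
```

Writing `sequence` to a sound file and playing it on a loop would be one way to check informally whether these assumed parameters produce the continuously ascending impression.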

Iterative structured shrinkage algorithms applied to stationary/transient separation

Kai Siedenburg and Simon Doclo

BACKGROUND. We present novel strategies for stationary/transient separation of audio signals that exploit the basic observation that stationary components are sparse in frequency and persistent over time, whereas transients are sparse in time and persistent across frequency. We utilize a multi-resolution STFT approach, which allows us to define structured shrinkage operators that tune into the characteristic spectrotemporal shapes of the stationary and transient signal layers. Structure is incorporated by considering the energy of time-frequency neighbourhoods or modulation spectrum regions instead of individual STFT coefficients, and the shrinkage operators are employed in a dual-layered Iterative Shrinkage/Thresholding Algorithm (ISTA) framework. We further propose a novel iterative scheme, Iterative Cross-Shrinkage (ICS).
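
To illustrate the idea of shrinking by neighbourhood energy rather than by individual coefficients, here is a minimal sketch in the style of a windowed group-lasso operator. It is not the operator from the paper: the rectangular neighbourhood, the function name `neighbourhood_shrink`, and the values of `lam` and `width` are assumptions, and neither the multi-resolution STFT nor the ISTA/ICS iterations are reproduced. The toy time-frequency matrix contains one line that persists over time (stationary) and one that persists across frequency (transient).

```python
import numpy as np

def neighbourhood_shrink(coeffs, lam, width, axis):
    """Scale each coefficient by max(0, 1 - lam / ||neighbourhood||_2).

    The neighbourhood is a rectangular window of `width` bins along `axis`
    (zero-padded at the borders). Assumes width <= coeffs.shape[axis].
    """
    energy = np.abs(coeffs) ** 2
    kernel = np.ones(width)
    local = np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="same"), axis, energy
    )
    norm = np.sqrt(local)
    gain = np.maximum(0.0, 1.0 - lam / np.maximum(norm, 1e-12))
    return coeffs * gain

# Toy matrix: rows are frequency bins, columns are time frames.
X = np.zeros((16, 16))
X[4, :] = 1.0   # persistent over time  -> stationary component
X[:, 9] = 1.0   # persistent over frequency -> transient component

# Neighbourhoods along time keep the stationary line,
# neighbourhoods along frequency keep the transient line.
stationary = neighbourhood_shrink(X, lam=1.5, width=5, axis=1)
transient = neighbourhood_shrink(X, lam=1.5, width=5, axis=0)
```

With these values an isolated coefficient has neighbourhood norm 1 and is set to zero, whereas a coefficient supported by neighbours along the chosen axis survives with reduced magnitude. The point where the two lines cross is retained in both layers, which is one reason a practical scheme needs more than a single shrinkage pass.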

This page presents a few representative audio examples for the tested algorithms. 

Example using recorded tones from acoustic instruments

The first example considers the case of isolated instrument tones. Note that only ICS DYN MOD is able to properly extract the transient layer. With the other algorithms, the transient layer is wrongly assigned to the stationary layer.


A second example concerns a musical mixture in the form of an excerpt from a jazz trio recording. Here, only ICS DYN MOD is presented.


Iteration scheme. ICS: Iterative Cross-Shrinkage; ISTA: Iterative Shrinkage/Thresholding Algorithm

Update of thresholds. DYN: dynamic update; QUANT: quantile-based update. 

Inter-coefficient structure. MOD: modulation-based shrinkage; IND: independent handling of coefficients (i.e., no structure).