EURASIP Journal on Applied Signal Processing
Volume 2003 (2003), Issue 7, Pages 659-667
doi:10.1155/S1110865703303051
Sparse Spectrotemporal Coding of Sounds
1Institute of Neuroinformatics, University of Zurich and ETH Zurich, Winterthurerstrasse 190, Zurich CH-8057, Switzerland
2Institute of Neurology, University College London, Queen Square, London WC1N 3BG, UK
Received 1 May 2002; Revised 28 January 2003
Copyright © 2003 David J. Klein et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Recent studies of biological auditory processing have revealed that sophisticated spectrotemporal analyses are performed by central auditory systems of various animals. The analysis is typically well matched with the statistics of relevant natural sounds, suggesting that it produces an optimal representation of the animal's acoustic biotope. We address this topic using simulated neurons that learn an optimal representation of a speech corpus. As input, the neurons receive a spectrographic representation of sound produced by a peripheral auditory model. The output representation is deemed optimal when the responses of the neurons are maximally sparse. Following optimization, the simulated neurons are similar to real neurons in many respects. Most notably, a given neuron only analyzes the input over a localized region of time and frequency. In addition, multiple subregions either excite or inhibit the neuron, together producing selectivity to spectral and temporal modulation patterns. This suggests that the brain's solution is particularly well suited for coding natural sound; therefore, it may prove useful in the design of new computational methods for processing speech.
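
As a rough illustration of what "maximally sparse" responses can mean, the following sketch computes the responses of one linear simulated neuron to spectrogram patches and scores them with excess kurtosis. Everything here is an assumption made for illustration: the abstract does not state which sparseness measure was optimized, the inputs are synthetic random patches rather than the output of a peripheral auditory model, and the names `neuron_responses` and `excess_kurtosis` are hypothetical.

```python
import numpy as np

def neuron_responses(patches, weights):
    """Linear responses of one simulated neuron to spectrogram patches.

    patches: array of shape (n_patches, n_freq, n_time)
    weights: array of shape (n_freq, n_time), the neuron's spectrotemporal weights
    """
    # Sum over the frequency and time axes of each patch.
    return np.tensordot(patches, weights, axes=([1, 2], [0, 1]))

def excess_kurtosis(r):
    """Excess kurtosis: about 0 for Gaussian responses, larger for heavier tails.

    Used here as one common sparseness proxy (an assumption, not necessarily
    the measure used in the paper).
    """
    r = np.asarray(r, dtype=float)
    z = (r - r.mean()) / r.std()
    return np.mean(z ** 4) - 3.0

rng = np.random.default_rng(0)
n_freq, n_time = 8, 6

# A localized receptive field with one excitatory and one inhibitory subregion.
weights = np.zeros((n_freq, n_time))
weights[3, 2] = 1.0   # excitatory subregion
weights[4, 2] = -1.0  # inhibitory subregion

# Synthetic stand-ins for spectrogram patches (not real speech data).
dense_patches = rng.normal(size=(20000, n_freq, n_time))
sparse_patches = rng.laplace(size=(20000, n_freq, n_time))

k_dense = excess_kurtosis(neuron_responses(dense_patches, weights))
k_sparse = excess_kurtosis(neuron_responses(sparse_patches, weights))
print(f"dense input:  excess kurtosis = {k_dense:.2f}")
print(f"sparse input: excess kurtosis = {k_sparse:.2f}")
```

In an optimization of the kind the abstract describes, a score like this would be maximized over the weights of a population of neurons; the sketch only evaluates the score for fixed weights.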