Utterance Clustering Using Stereo Audio Channels

<div>Visualization of the audio signal processing applied to each speaker. Boxes of the same color mark waveforms from the same speech segment. (a) A stereo waveform of a speaker's speech audio, (b) the stereo waveform divided into 0.5-second segments, (c) mono waveforms of the left- and right-channel audio signals extracted from each 0.5-second segment, and (d) the processed waveforms for each 0.5-second segment.</div>
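Steps (a) to (c) of the caption can be illustrated with a minimal sketch. It assumes the stereo audio is already loaded as a NumPy array of shape `(num_samples, 2)`. The function name `split_stereo_segments`, its parameters, and the choice to drop a trailing partial segment are hypothetical and not taken from the paper. The processing behind panel (d) is not described here, so it is left out.

```python
import numpy as np


def split_stereo_segments(stereo, sample_rate, segment_seconds=0.5):
    """Cut a stereo signal into fixed-length segments and split the channels.

    Returns two arrays of shape (num_segments, samples_per_segment):
    the left-channel and right-channel mono waveform of every segment.
    """
    stereo = np.asarray(stereo)
    if stereo.ndim != 2 or stereo.shape[1] != 2:
        raise ValueError("expected an array of shape (num_samples, 2)")

    seg_len = int(round(segment_seconds * sample_rate))
    if seg_len <= 0:
        raise ValueError("segment length must be at least one sample")

    # Assumption: a trailing segment shorter than 0.5 s is discarded.
    num_segments = stereo.shape[0] // seg_len
    trimmed = stereo[: num_segments * seg_len]

    # (b) stereo waveform in 0.5-second segments: (segments, samples, channels)
    segments = trimmed.reshape(num_segments, seg_len, 2)

    # (c) mono left- and right-channel waveform for each segment
    left = segments[:, :, 0]
    right = segments[:, :, 1]
    return left, right
```

For example, a 2-second recording sampled at 16 kHz yields four segments of 8000 samples per channel.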

Computational Intelligence and Neuroscience

Figure 1: Utterance Clustering Using Stereo Audio Channels