Table of Contents Author Guidelines Submit a Manuscript
BioMed Research International
Volume 2013 (2013), Article ID 720834, 27 pages
Research Article

Classifying Human Voices by Using Hybrid SFX Time-Series Preprocessing and Ensemble Feature Selection

1Department of Computer and Information Science, University of Macau, Macau
2School of Computer Science and Engineering, University of New South Wales, Kensington, NSW 2052, Australia

Received 25 June 2013; Accepted 1 August 2013

Academic Editor: Sabah Mohammed

Copyright © 2013 Simon Fong et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Voice biometrics is one kind of physiological characteristics whose voice is different for each individual person. Due to this uniqueness, voice classification has found useful applications in classifying speakers’ gender, mother tongue or ethnicity (accent), emotion states, identity verification, verbal command control, and so forth. In this paper, we adopt a new preprocessing method named Statistical Feature Extraction (SFX) for extracting important features in training a classification model, based on piecewise transformation treating an audio waveform as a time-series. Using SFX we can faithfully remodel statistical characteristics of the time-series; together with spectral analysis, a substantial amount of features are extracted in combination. An ensemble is utilized in selecting only the influential features to be used in classification model induction. We focus on the comparison of effects of various popular data mining algorithms on multiple datasets. Our experiment consists of classification tests over four typical categories of human voice data, namely, Female and Male, Emotional Speech, Speaker Identification, and Language Recognition. The experiments yield encouraging results supporting the fact that heuristically choosing significant features from both time and frequency domains indeed produces better performance in voice classification than traditional signal processing techniques alone, like wavelets and LPC-to-CC.