Mathematical Problems in Engineering
Volume 2015, Article ID 394083, 13 pages
http://dx.doi.org/10.1155/2015/394083
Research Article

Improved Emotion Recognition Using Gaussian Mixture Model and Extreme Learning Machine in Speech and Glottal Signals

1School of Mechatronic Engineering, Universiti Malaysia Perlis (UniMAP), Campus Pauh Putra, 02600 Arau, Perlis, Malaysia
2Department of Electrical and Electronics Engineering, Faculty of Engineering and Architecture, Abant Izzet Baysal University, 14280 Bolu, Turkey
3Universiti Kuala Lumpur Malaysian Spanish Institute, Kulim Hi-Tech Park, 09000 Kulim, Kedah, Malaysia

Received 25 August 2014; Accepted 29 December 2014

Academic Editor: Gen Qi Xu

Copyright © 2015 Hariharan Muthusamy et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

References

  1. S. G. Koolagudi and K. S. Rao, “Emotion recognition from speech: a review,” International Journal of Speech Technology, vol. 15, no. 2, pp. 99–117, 2012.
  2. R. Cowie, E. Douglas-Cowie, N. Tsapatsoulis et al., “Emotion recognition in human-computer interaction,” IEEE Signal Processing Magazine, vol. 18, no. 1, pp. 32–80, 2001.
  3. D. Ververidis and C. Kotropoulos, “Emotional speech recognition: resources, features, and methods,” Speech Communication, vol. 48, no. 9, pp. 1162–1181, 2006.
  4. M. El Ayadi, M. S. Kamel, and F. Karray, “Survey on speech emotion recognition: features, classification schemes, and databases,” Pattern Recognition, vol. 44, no. 3, pp. 572–587, 2011.
  5. D. Y. Wong, J. D. Markel, and A. H. Gray Jr., “Least squares glottal inverse filtering from the acoustic speech waveform,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 27, no. 4, pp. 350–355, 1979.
  6. D. E. Veeneman and S. L. BeMent, “Automatic glottal inverse filtering from speech and electroglottographic signals,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 33, no. 2, pp. 369–377, 1985.
  7. P. Alku, “Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering,” Speech Communication, vol. 11, no. 2-3, pp. 109–118, 1992.
  8. T. Drugman, B. Bozkurt, and T. Dutoit, “A comparative study of glottal source estimation techniques,” Computer Speech and Language, vol. 26, no. 1, pp. 20–34, 2012.
  9. P. A. Naylor, A. Kounoudes, J. Gudnason, and M. Brookes, “Estimation of glottal closure instants in voiced speech using the DYPSA algorithm,” IEEE Transactions on Audio, Speech and Language Processing, vol. 15, no. 1, pp. 34–43, 2007.
  10. K. E. Cummings and M. A. Clements, “Improvements to and applications of analysis of stressed speech using glottal waveforms,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '92), vol. 2, pp. 25–28, San Francisco, Calif, USA, March 1992.
  11. K. E. Cummings and M. A. Clements, “Application of the analysis of glottal excitation of stressed speech to speaking style modification,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '93), pp. 207–210, April 1993.
  12. K. E. Cummings and M. A. Clements, “Analysis of the glottal excitation of emotionally styled and stressed speech,” Journal of the Acoustical Society of America, vol. 98, no. 1, pp. 88–98, 1995.
  13. E. Moore II, M. Clements, J. Peifer, and L. Weisser, “Investigating the role of glottal features in classifying clinical depression,” in Proceedings of the 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 2849–2852, September 2003.
  14. E. Moore II, M. A. Clements, J. W. Peifer, and L. Weisser, “Critical analysis of the impact of glottal features in the classification of clinical depression in speech,” IEEE Transactions on Biomedical Engineering, vol. 55, no. 1, pp. 96–107, 2008.
  15. A. Ozdas, R. G. Shiavi, S. E. Silverman, M. K. Silverman, and D. M. Wilkes, “Investigation of vocal jitter and glottal flow spectrum as possible cues for depression and near-term suicidal risk,” IEEE Transactions on Biomedical Engineering, vol. 51, no. 9, pp. 1530–1540, 2004.
  16. A. I. Iliev and M. S. Scordilis, “Spoken emotion recognition using glottal symmetry,” EURASIP Journal on Advances in Signal Processing, vol. 2011, Article ID 624575, pp. 1–11, 2011.
  17. L. He, M. Lech, J. Zhang, X. Ren, and L. Deng, “Study of wavelet packet energy entropy for emotion classification in speech and glottal signals,” in 5th International Conference on Digital Image Processing (ICDIP '13), vol. 8878 of Proceedings of SPIE, Beijing, China, April 2013.
  18. P. Giannoulis and G. Potamianos, “A hierarchical approach with feature selection for emotion recognition from speech,” in Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC '12), pp. 1203–1206, European Language Resources Association, Istanbul, Turkey, 2012.
  19. F. Burkhardt, A. Paeschke, M. Rolfes, W. Sendlmeier, and B. Weiss, “A database of German emotional speech,” in Proceedings of the 9th European Conference on Speech Communication and Technology, pp. 1517–1520, Lisbon, Portugal, September 2005.
  20. S. Haq, P. J. B. Jackson, and J. Edge, “Audio-visual feature selection and reduction for emotion classification,” in Proceedings of the International Conference on Auditory-Visual Speech Processing (AVSP '08), pp. 185–190, Tangalooma, Australia, 2008.
  21. M. Sedaaghi, “Documentation of the Sahand Emotional Speech Database (SES),” Tech. Rep., Department of Electrical Engineering, Sahand University of Technology, Tabriz, Iran, 2008.
  22. L. R. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition, vol. 14, PTR Prentice Hall, Englewood Cliffs, NJ, USA, 1993.
  23. M. Hariharan, C. Y. Fook, R. Sindhu, B. Ilias, and S. Yaacob, “A comparative study of wavelet families for classification of wrist motions,” Computers & Electrical Engineering, vol. 38, no. 6, pp. 1798–1807, 2012.
  24. M. Hariharan, K. Polat, R. Sindhu, and S. Yaacob, “A hybrid expert system approach for telemonitoring of vocal fold pathology,” Applied Soft Computing Journal, vol. 13, no. 10, pp. 4148–4161, 2013.
  25. M. Hariharan, K. Polat, and S. Yaacob, “A new feature constituting approach to detection of vocal fold pathology,” International Journal of Systems Science, vol. 45, no. 8, pp. 1622–1634, 2014.
  26. M. Hariharan, S. Yaacob, and S. A. Awang, “Pathological infant cry analysis using wavelet packet transform and probabilistic neural network,” Expert Systems with Applications, vol. 38, no. 12, pp. 15377–15382, 2011.
  27. K. S. Rao, S. G. Koolagudi, and R. R. Vempada, “Emotion recognition from speech using global and local prosodic features,” International Journal of Speech Technology, vol. 16, no. 2, pp. 143–160, 2013.
  28. Y. Li, G. Zhang, and Y. Huang, “Adaptive wavelet packet filter-bank based acoustic feature for speech emotion recognition,” in Proceedings of 2013 Chinese Intelligent Automation Conference: Intelligent Information Processing, vol. 256 of Lecture Notes in Electrical Engineering, pp. 359–366, Springer, Berlin, Germany, 2013.
  29. M. Hariharan, K. Polat, and R. Sindhu, “A new hybrid intelligent system for accurate detection of Parkinson's disease,” Computer Methods and Programs in Biomedicine, vol. 113, no. 3, pp. 904–913, 2014.
  30. S. R. Krothapalli and S. G. Koolagudi, “Characterization and recognition of emotions from speech using excitation source information,” International Journal of Speech Technology, vol. 16, no. 2, pp. 181–201, 2013.
  31. R. Farnoosh and B. Zarpak, “Image segmentation using Gaussian mixture model,” IUST International Journal of Engineering Science, vol. 19, pp. 29–32, 2008.
  32. Z. Zivkovic, “Improved adaptive Gaussian mixture model for background subtraction,” in Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), pp. 28–31, August 2004.
  33. A. Blake, C. Rother, M. Brown, P. Perez, and P. Torr, “Interactive image segmentation using an adaptive GMMRF model,” in Computer Vision—ECCV 2004, vol. 3021 of Lecture Notes in Computer Science, pp. 428–441, Springer, Berlin, Germany, 2004.
  34. V. Majidnezhad and I. Kheidorov, “A novel GMM-based feature reduction for vocal fold pathology diagnosis,” Research Journal of Applied Sciences, vol. 5, no. 6, pp. 2245–2254, 2013.
  35. D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, “Speaker verification using adapted Gaussian mixture models,” Digital Signal Processing: A Review Journal, vol. 10, no. 1, pp. 19–41, 2000.
  36. A. I. Iliev, M. S. Scordilis, J. P. Papa, and A. X. Falcão, “Spoken emotion recognition through optimum-path forest classification using glottal features,” Computer Speech and Language, vol. 24, no. 3, pp. 445–460, 2010.
  37. M. H. Siddiqi, R. Ali, M. S. Rana, E.-K. Hong, E. S. Kim, and S. Lee, “Video-based human activity recognition using multilevel wavelet decomposition and stepwise linear discriminant analysis,” Sensors, vol. 14, no. 4, pp. 6370–6392, 2014.
  38. J. Rong, G. Li, and Y.-P. P. Chen, “Acoustic feature selection for automatic emotion recognition from speech,” Information Processing & Management, vol. 45, no. 3, pp. 315–328, 2009.
  39. J. Jiang, Z. Wu, M. Xu, J. Jia, and L. Cai, “Comparing feature dimension reduction algorithms for GMM-SVM based speech emotion recognition,” in Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA '13), pp. 1–4, Kaohsiung, Taiwan, October-November 2013.
  40. P. Fewzee and F. Karray, “Dimensionality reduction for emotional speech recognition,” in Proceedings of the International Conference on Privacy, Security, Risk and Trust (PASSAT '12) and International Conference on Social Computing (SocialCom '12), pp. 532–537, Amsterdam, The Netherlands, September 2012.
  41. S. Zhang and X. Zhao, “Dimensionality reduction-based spoken emotion recognition,” Multimedia Tools and Applications, vol. 63, no. 3, pp. 615–646, 2013.
  42. B.-C. Chiou and C.-P. Chen, “Feature space dimension reduction in speech emotion recognition using support vector machine,” in Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA '13), pp. 1–6, Kaohsiung, Taiwan, November 2013.
  43. S. Zhang, B. Lei, A. Chen, C. Chen, and Y. Chen, “Spoken emotion recognition using local Fisher discriminant analysis,” in Proceedings of the IEEE 10th International Conference on Signal Processing (ICSP '10), pp. 538–540, October 2010.
  44. D. J. Krusienski, E. W. Sellers, D. J. McFarland, T. M. Vaughan, and J. R. Wolpaw, “Toward enhanced P300 speller performance,” Journal of Neuroscience Methods, vol. 167, no. 1, pp. 15–21, 2008.
  45. G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, “Extreme learning machine for regression and multiclass classification,” IEEE Transactions on Systems, Man, and Cybernetics B: Cybernetics, vol. 42, no. 2, pp. 513–529, 2012.
  46. G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: theory and applications,” Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.
  47. W. Huang, N. Li, Z. Lin et al., “Liver tumor detection and segmentation using kernel-based extreme learning machine,” in Proceedings of the 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC '13), pp. 3662–3665, Osaka, Japan, July 2013.
  48. S. Ding, Y. Zhang, X. Xu, and L. Bao, “A novel extreme learning machine based on hybrid kernel function,” Journal of Computers, vol. 8, no. 8, pp. 2110–2117, 2013.
  49. A. Shahzadi, A. Ahmadyfard, A. Harimi, and K. Yaghmaie, “Speech emotion recognition using non-linear dynamics features,” Turkish Journal of Electrical Engineering & Computer Sciences, 2013.
  50. P. Henríquez, J. B. Alonso, M. A. Ferrer, C. M. Travieso, and J. R. Orozco-Arroyave, “Application of nonlinear dynamics characterization to emotional speech,” in Advances in Nonlinear Speech Processing, vol. 7015 of Lecture Notes in Computer Science, pp. 127–136, Springer, Berlin, Germany, 2011.
  51. S. Wu, T. H. Falk, and W.-Y. Chan, “Automatic speech emotion recognition using modulation spectral features,” Speech Communication, vol. 53, no. 5, pp. 768–785, 2011.
  52. S. R. Krothapalli and S. G. Koolagudi, “Emotion recognition using vocal tract information,” in Emotion Recognition Using Speech Features, pp. 67–78, Springer, Berlin, Germany, 2013.
  53. M. Kotti and F. Paternò, “Speaker-independent emotion recognition exploiting a psychologically-inspired binary cascade classification schema,” International Journal of Speech Technology, vol. 15, no. 2, pp. 131–150, 2012.
  54. A. S. Lampropoulos and G. A. Tsihrintzis, “Evaluation of MPEG-7 descriptors for speech emotional recognition,” in Proceedings of the 8th International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP '12), pp. 98–101, 2012.
  55. N. Banda and P. Robinson, “Noise analysis in audio-visual emotion recognition,” in Proceedings of the International Conference on Multimodal Interaction, pp. 1–4, Alicante, Spain, November 2011.
  56. N. S. Fulmare, P. Chakrabarti, and D. Yadav, “Understanding and estimation of emotional expression using acoustic analysis of natural speech,” International Journal on Natural Language Computing, vol. 2, no. 4, pp. 37–46, 2013.
  57. S. Haq and P. Jackson, “Speaker-dependent audio-visual emotion recognition,” in Proceedings of the International Conference on Audio-Visual Speech Processing, pp. 53–58, 2009.