Computational Intelligence and Neuroscience
Volume 2017, Article ID 1945630, 9 pages
https://doi.org/10.1155/2017/1945630
Research Article

Random Deep Belief Networks for Recognizing Emotions from Speech Signals

School of Computer Science and Engineering, South China University of Technology, Guangzhou, China

Correspondence should be addressed to Guihua Wen; crghwen@scut.edu.cn

Received 12 October 2016; Revised 21 January 2017; Accepted 6 February 2017; Published 5 March 2017

Academic Editor: Jens Christian Claussen

Copyright © 2017 Guihua Wen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

References

  1. B. Fong and J. Westerink, “Affective computing in consumer electronics,” IEEE Transactions on Affective Computing, vol. 3, no. 2, pp. 129–131, 2012.
  2. M. El Ayadi, M. S. Kamel, and F. Karray, “Survey on speech emotion recognition: features, classification schemes, and databases,” Pattern Recognition, vol. 44, no. 3, pp. 572–587, 2011.
  3. A. Harimi, A. AhmadyFard, A. Shahzadi, and K. Yaghmaie, “Anger or joy? Emotion recognition using nonlinear dynamics of speech,” Applied Artificial Intelligence, vol. 29, no. 7, pp. 675–696, 2015.
  4. Y. Sun and G. Wen, “Ensemble softmax regression model for speech emotion recognition,” Multimedia Tools and Applications, pp. 1–24, 2016.
  5. J.-S. Park, J.-H. Kim, and Y.-H. Oh, “Feature vector classification based speech emotion recognition for service robots,” IEEE Transactions on Consumer Electronics, vol. 55, no. 3, pp. 1590–1596, 2009.
  6. D. J. France and R. G. Shiavi, “Acoustical properties of speech as indicators of depression and suicidal risk,” IEEE Transactions on Biomedical Engineering, vol. 47, no. 7, pp. 829–837, 2000.
  7. Z.-W. Huang, W.-T. Xue, and Q.-R. Mao, “Speech emotion recognition with unsupervised feature learning,” Frontiers of Information Technology and Electronic Engineering, vol. 16, no. 5, pp. 358–366, 2015.
  8. Z. Wang, Q. Ruan, and G. An, “Projection-optimal local Fisher discriminant analysis for feature extraction,” Neural Computing and Applications, vol. 26, no. 3, pp. 589–601, 2015.
  9. W. Q. Zheng, J. S. Yu, and Y. X. Zou, “An experimental study of speech emotion recognition based on deep convolutional neural networks,” in Proceedings of the International Conference on Affective Computing and Intelligent Interaction (ACII '15), pp. 827–831, IEEE, Xi'an, China, September 2015.
  10. Y. Bengio and H. Lee, “Editorial introduction to the neural networks special issue on deep learning of representations,” Neural Networks, vol. 64, pp. 1–3, 2015.
  11. P. Liu, S. Han, Z. Meng, and Y. Tong, “Facial expression recognition via a boosted deep belief network,” in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 1805–1812, IEEE, Columbus, Ohio, USA, June 2014.
  12. A. Stuhlsatz, C. Meyer, F. Eyben, T. Zielke, G. Meier, and B. Schuller, “Deep neural networks for acoustic emotion recognition: raising the benchmarks,” in Proceedings of the 36th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '11), pp. 5688–5691, Prague, Czech Republic, May 2011.
  13. A. Pal and S. Baskar, “Speech emotion recognition using Deep Dropout Autoencoders,” in Proceedings of the IEEE International Conference on Engineering and Technology (ICETECH '15), Coimbatore, India, November 2015.
  14. T. L. Nwe, S. W. Foo, and L. C. De Silva, “Speech emotion recognition using hidden Markov models,” Speech Communication, vol. 41, no. 4, pp. 603–623, 2003.
  15. M. Bhaykar, J. Yadav, and K. S. Rao, “Speaker dependent, speaker independent and cross language emotion recognition from speech using GMM and HMM,” in Proceedings of the National Conference on Communications (NCC '13), February 2013.
  16. J. Lee and I. Tashev, “High-level feature representation using recurrent neural network for speech emotion recognition,” in Proceedings of the 16th Annual Conference of the International Speech Communication Association (INTERSPEECH '15), pp. 1537–1540, Dresden, Germany, 2015.
  17. T. Danisman and A. Alpkocak, “Emotion classification of audio signals using ensemble of support vector machines,” Perception in Multimodal Dialogue Systems, vol. 5078, pp. 205–216, 2008.
  18. S. Kim, Z. Yu, R. M. Kil, and M. Lee, “Deep learning of support vector machines with class probability output networks,” Neural Networks, vol. 64, pp. 19–28, 2015.
  19. X. Zhang, Y. Sun, and S. Duan, “Progress in speech emotion recognition,” in Proceedings of the IEEE Region 10 Annual International Conference, Bangkok, Thailand, March 2016.
  20. R. Mousavi and M. Eftekhari, “A new ensemble learning methodology based on hybridization of classifier ensemble selection approaches,” Applied Soft Computing, vol. 37, pp. 652–666, 2015.
  21. J. Wagner, F. Lingenfelser, E. André, and J. Kim, “Exploring fusion methods for multimodal emotion recognition with missing data,” IEEE Transactions on Affective Computing, vol. 2, no. 4, pp. 206–218, 2011.
  22. A. Milton and S. Tamil Selvi, “Class-specific multiple classifiers scheme to recognize emotions from speech signals,” Computer Speech and Language, vol. 28, no. 3, pp. 727–742, 2014.
  23. V. B. Kobayashi and V. B. Calag, “Detection of affective states from speech signals using ensembles of classifiers,” in Proceedings of the IET Intelligent Signal Processing Conference (ISP '13), pp. 1–9, IEEE, London, UK, December 2013.
  24. Y. Guo, L. Jiao, S. Wang et al., “A novel dynamic rough subspace based selective ensemble,” Pattern Recognition, vol. 48, no. 5, pp. 1638–1652, 2015.
  25. D.-Y. Huang, Z. Zhang, and S. S. Ge, “Speaker state classification based on fusion of asymmetric simple partial least squares (SIMPLS) and support vector machines,” Computer Speech and Language, vol. 28, no. 2, pp. 392–414, 2014.
  26. B. Schuller, S. Reiter, R. Muller, M. Al-Hames, M. Lang, and G. Rigoll, “Speaker independent speech emotion recognition by ensemble classification,” in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '05), pp. 864–867, Amsterdam, Netherlands, July 2005.
  27. M. K. Sarker, K. M. R. Alam, and M. Arifuzzaman, “Emotion recognition from speech based on relevant feature and majority voting,” in Proceedings of the International Conference on Informatics, Electronics and Vision (ICIEV '14), pp. 1–5, May 2014.
  28. V. Rozgić, S. Ananthakrishnan, S. Saleem, R. Kumar, and R. Prasad, “Ensemble of SVM trees for multimodal emotion recognition,” in Proceedings of the 4th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC '12), pp. 1–4, December 2012.
  29. C. Huang, W. Gong, W. Fu, and D. Feng, “A research of speech emotion recognition based on deep belief network and SVM,” Mathematical Problems in Engineering, vol. 2014, Article ID 749604, 7 pages, 2014.
  30. Y. Huang, A. Wu, G. Zhang, and Y. Li, “Speech emotion recognition based on deep belief networks and wavelet packet cepstral coefficients,” International Journal of Simulation: Systems, Science and Technology, vol. 17, no. 28, pp. 28.1–28.5, 2016.
  31. H. Ranganathan, S. Chakraborty, and S. Panchanathan, “Multimodal emotion recognition using deep learning architectures,” in Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV '16), March 2016.
  32. K. Mannepalli, P. N. Sastry, and M. Suman, “MFCC-GMM based accent recognition system for Telugu speech signals,” International Journal of Speech Technology, vol. 19, no. 1, pp. 87–93, 2016.
  33. R. Xia and Y. Liu, “A multi-task learning framework for emotion recognition using 2D continuous space,” IEEE Transactions on Affective Computing, vol. 99, pp. 1–11, 2015.
  34. X. Zhou, L. Xie, P. Zhang, and Y. Zhang, “An ensemble of deep neural networks for object tracking,” in Proceedings of the IEEE International Conference on Image Processing (ICIP '14), pp. 843–847, IEEE, Paris, France, 2014.
  35. H.-Y. Lee, T.-Y. Hu, H. Jing et al., “Ensemble of machine learning and acoustic segment model techniques for speech emotion and autism spectrum disorders recognition,” in Proceedings of the 14th Annual Conference of the International Speech Communication Association (INTERSPEECH '13), pp. 215–219, Lyon, France, August 2013.
  36. Y. Kim, H. Lee, and E. M. Provost, “Deep learning for robust feature generation in audiovisual emotion recognition,” in Proceedings of the 38th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '13), pp. 3687–3691, May 2013.
  37. S. E. Kahou, X. Bouthillier, P. Lamblin et al., “EmoNets: multimodal deep learning approaches for emotion recognition in video,” Journal on Multimodal User Interfaces, vol. 10, no. 2, pp. 99–111, 2016.
  38. C.-X. Zhang, J.-S. Zhang, N.-N. Ji, and G. Guo, “Learning ensemble classifiers via restricted Boltzmann machines,” Pattern Recognition Letters, vol. 36, no. 1, pp. 161–170, 2014.
  39. D. Eigen, J. Rolfe, R. Fergus, and Y. LeCun, “Understanding deep architectures using a recursive convolutional network,” in Proceedings of the International Conference on Learning Representations, Banff, Canada, April 2014.
  40. Y. Sun, G. Wen, and J. Wang, “Weighted spectral features based on local Hu moments for speech emotion recognition,” Biomedical Signal Processing and Control, vol. 18, pp. 80–90, 2015.
  41. C.-C. Lee, E. Mower, C. Busso, S. Lee, and S. Narayanan, “Emotion recognition using a hierarchical binary decision tree approach,” Speech Communication, vol. 53, no. 9-10, pp. 1162–1171, 2011.
  42. S. Wu, T. H. Falk, and W.-Y. Chan, “Automatic speech emotion recognition using modulation spectral features,” Speech Communication, vol. 53, no. 5, pp. 768–785, 2011.
  43. B. Schuller, S. Steidl, A. Batliner et al., “The INTERSPEECH 2010 paralinguistic challenge,” in Proceedings of the 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All (INTERSPEECH '10), pp. 2794–2797, September 2010.
  44. F. Eyben, M. Wöllmer, and B. Schuller, “openSMILE: the Munich versatile and fast open-source audio feature extractor,” in Proceedings of the 18th ACM International Conference on Multimedia (MM '10), pp. 1459–1462, ACM, Firenze, Italy, October 2010.
  45. F. Burkhardt, A. Paeschke, M. Rolfes, W. Sendlmeier, and B. Weiss, “A database of German emotional speech,” in Proceedings of the 9th European Conference on Speech Communication and Technology (INTERSPEECH '05), pp. 1517–1520, Lisbon, Portugal, September 2005.
  46. S. Haq and P. Jackson, “Speaker-dependent audio-visual emotion recognition,” in Proceedings of the International Conference on Audio-Visual Speech Processing, pp. 53–58, 2009.
  47. “The selected Speech Emotion Database of Institute of Automation Chinese Academy of Sciences (CASIA),” http://www.datatang.com/data/39277.
  48. S. Steidl, Automatic Classification of Emotion Related User States in Spontaneous Children's Speech, Logos, Berlin, Germany, 2009.
  49. A. Hassan, R. Damper, and M. Niranjan, “On acoustic emotion recognition: compensating for covariate shift,” IEEE Transactions on Audio, Speech and Language Processing, vol. 21, no. 7, pp. 1458–1468, 2013.
  50. Y. Zong, W. Zheng, T. Zhang, and X. Huang, “Cross-corpus speech emotion recognition based on domain-adaptive least-squares regression,” IEEE Signal Processing Letters, vol. 23, no. 5, pp. 584–588, 2016.