Review Article

A Review of the Advancement in Speech Emotion Recognition for Indo-Aryan and Dravidian Languages

Table 2

Review of some speech emotion recognition experiments for Indo-Aryan and Dravidian languages.

| S/N | Reference | Database name | Database language | Approach used | Recognized emotions | Results |
|---|---|---|---|---|---|---|
| 1 | Koolagudi et al. [79] | IITKGP-SESC | Telugu | SVM and GMM with energy and pitch parameters | Happy, anger, fear, disgust, sarcastic, sad, neutral, surprise | 63.75% average accuracy |
| 2 | Sultana et al. [3] | SUBESCO and RAVDESS | Bangla and English | DCNN and BLSTM network integrated with a TDF layer | Happy, calm, sad, surprise, fearful, disgust, angry, neutral | Weighted accuracies of 86.9% (SUBESCO) and 82.7% (RAVDESS) |
| 3 | Kumar and Yadav [80] | IITKGP-SEHSC | Hindi | Deep LSTM with GMFCC and DMFCC features | Happy, fear, angry, sad, neutral | Average accuracy of 91.2% for male speech and 87.6% for female speech |
| 4 | Mohanty and Swain [15] | Oriya emotional speech database | Oriya | Fuzzy K-means | Anger, sadness, astonishment, fear, happiness, neutral | 65.16% recognition rate using mean pitch, the first two formants, jitter, shimmer, and energy as feature vectors |
| 5 | Samantaray et al. [48] | MESDNEI | Assamese | SVM with dynamic, quality, derived, and prosodic features | Happy, anger, fear, disgust, surprise, sad, neutral | 82.26% average accuracy in the speaker-independent case |
| 6 | Bhavan et al. [81] | EmoDB, RAVDESS, and IITKGP-SEHSC | German, English, and Hindi | Bagged ensemble of SVMs using MFCCs and spectral centroids | Happy, sad, calm, angry, surprise, fear, disgust, neutral | Accuracy of 92.45% (EmoDB), 75.69% (RAVDESS), and 84.11% (IITKGP-SEHSC) |
| 7 | Swain et al. [82] | Self-created database of utterances in two native dialects of Odisha: Cuttacki and Sambalpuri | Oriya | SVM with MFCC feature vectors | Happiness, fear, anger, disgust, sadness, surprise, neutral | 82.14% recognition accuracy with the SVM classifier |
| 8 | Zaheer et al. [30] | SEMOUR+ | Urdu | Ensemble classifier: CNN combined with a VGG-19 model | Anger, disgust, happiness, surprise, boredom, sadness, fearful, neutral | 56% recognition rate in the speaker-independent setting |
| 9 | Wankhade et al. [47] | Emotional speech database of dialogues from various Bollywood movies | Hindi | SVM classifier with MFCC and MEDC feature sets | Angry, happy, sad, neutral | 71.66% recognition rate with the SVM classifier |
| 10 | Ali et al. [83] | Self-created emotional speech corpus recorded in five regional languages of Pakistan | Urdu, Sindhi, Pashto, Punjabi, and Balochi | Machine learning classifiers (AdaBoostM1, J48, classification via regression, decision stump) with prosodic features | Happiness, sad, anger, neutral | 40% classification accuracy with the pitch feature |
| 11 | Ancilin and Milton [84] | Urdu | Urdu | SVM classifier with Mel frequency magnitude coefficients (MFMC) | Happy, sad, anger, neutral | 95.25% emotion recognition rate using MFMC |
| 12 | Farhad et al. [85] | Urdu | Urdu | Neural network, random forest, and meta iterative classifiers with pitch and MFCC features | Happy, sad, angry | Random forest outperforms the other classifiers, with 78.75% accuracy |
| 13 | Darekar and Dhande [86] | Marathi database | Marathi | Adaptive ANN combining cepstral, non-negative matrix factorization (NMF), and pitch features | Happy, sad, angry, fear, neutral, surprised | 80% accuracy when the three feature types are combined |
| 14 | Koolagudi et al. [87] | IITKGP-SESC | Telugu | SVM and GMM models with epoch parameters | Happy, anger, fear, sadness, disgust, neutral | Average recognition rates of 58% (SVM) and 61% (GMM) |
| 15 | Kandali et al. [49] | Self-created acted emotional speech database recorded by 27 speakers | Assamese | GMM classifier with MFCC features | Happy, sad, disgust, fear, angry, surprise, neutral | Highest mean classification score of 76.5% |
| 16 | Dhar and Guha [88] | Abeg: self-collected Bangla emotional speech dataset | Bangla | Logistic regression model with MFCC and LPC features | Happy, angry, neutral | 92% accuracy when MFCC and LPC features are combined |
| 17 | Jacob [89] | Hindi emotional speech database of 2240 WAV files collected from 10 speakers | Hindi | ANN model with jitter and shimmer features | Happy, sad, anger, fear, surprise, disgust, neutral | 83.3% overall accuracy when jitter and shimmer features are combined |
| 18 | Fernandes and Mannepalli [90] | Acted emotional speech database of 1400 utterances by 10 actors | Tamil | LSTM and BiLSTM with MFCC, MFCC delta, spectral kurtosis, bark spectrum, and spectral skewness features | Happy, anger, sad, fear, boredom, disgust, neutral | 84% accuracy using LSTM and BiLSTM with dropout layers |
| 19 | Rajisha et al. [91] | Acted emotional dataset created by the authors | Malayalam | ANN and SVM classifiers with MFCC, short-time energy, and pitch features | Happy, anger, sad, neutral | Recognition rate of 88.4% with ANN and 78.2% with SVM |
| 20 | Kannadaguli and Bhat [92] | Self-created database of 2800 emotional recordings | Kannada | Bayesian and HMM models with MFCC features | Happy, excited, angry, sad | Average emotion error rate of 25.5% for the Bayesian approach and 0.2% for the HMM approach |
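The results column mixes "average accuracy", "weighted accuracy", "recognition rate", and "error rate", so the figures are not always directly comparable. A common convention in speech emotion recognition (papers differ, so this is an assumption rather than the definition used by every study above) treats weighted accuracy as the fraction of all test utterances classified correctly and unweighted accuracy as the mean of per-class recalls; the two diverge when emotion classes are imbalanced. The minimal Python sketch below illustrates that convention. The function names are illustrative and do not reproduce the evaluation protocol of any cited work.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: fraction of test utterances classified correctly.

    Each emotion class contributes in proportion to its number of utterances.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean per-class recall: every emotion class counts equally,
    regardless of how many test utterances it has."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    totals = Counter(y_true)                                   # utterances per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per true class
    recalls = [hits[label] / count for label, count in totals.items()]
    return sum(recalls) / len(recalls)


# Imbalanced toy example: 8 angry and 2 sad utterances; one sad utterance
# is misclassified as angry.
y_true = ["angry"] * 8 + ["sad"] * 2
y_pred = ["angry"] * 8 + ["angry", "sad"]
print(weighted_accuracy(y_true, y_pred))    # 0.9
print(unweighted_accuracy(y_true, y_pred))  # 0.75
```

In this toy case the weighted figure hides the fact that only half of the sad utterances were recognized, which is why a single accuracy number from an imbalanced corpus should be read with care.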
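Entry 18 augments static MFCCs with delta coefficients, which describe how each coefficient changes over time. A widely used regression formula computes the delta at frame t as d_t = Σ_{n=1..N} n(c_{t+n} - c_{t-n}) / (2 Σ_{n=1..N} n²), where c_t is the feature vector at frame t. The Python sketch below implements that formula over a list of per-frame feature vectors, assuming a window of N = 2 and edge frames replicated at the sequence boundaries; these settings are assumptions and are not claimed to match the cited study.

```python
def delta_features(frames, N=2):
    """Regression-based delta coefficients.

    frames: list of per-frame feature vectors (e.g. MFCCs), all the same length.
    N: half-width of the regression window (assumed value; 2 is a common choice).
    Returns a list of delta vectors with the same shape as `frames`.
    """
    if N < 1:
        raise ValueError("N must be at least 1")
    T = len(frames)
    if T == 0:
        return []
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, N + 1))
    deltas = []
    for t in range(T):
        acc = [0.0] * dim
        for n in range(1, N + 1):
            nxt = frames[min(t + n, T - 1)]  # replicate the last frame past the end
            prv = frames[max(t - n, 0)]      # replicate the first frame before the start
            for k in range(dim):
                acc[k] += n * (nxt[k] - prv[k])
        deltas.append([a / denom for a in acc])
    return deltas


# A single coefficient rising linearly by 1 per frame has a delta of 1
# away from the edges; edge frames are damped by the replication.
ramp = [[float(i)] for i in range(5)]
print(delta_features(ramp))  # [[0.5], [0.8], [1.0], [0.8], [0.5]]
```

Applying the same function to the delta output yields delta-delta (acceleration) coefficients, which many systems append as a further feature stream.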
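Entries 4 and 17 rely on jitter and shimmer, which quantify cycle-to-cycle perturbation of the pitch period and of the amplitude, respectively. One common pair of definitions (the "local" variants reported by tools such as Praat) divides the mean absolute difference between consecutive cycles by the mean value over all cycles. The Python sketch below assumes that pitch periods and per-cycle peak amplitudes have already been extracted from voiced speech (period extraction is not shown), and it omits the period-range and outlier constraints that practical tools apply; the helper names are hypothetical.

```python
def _local_perturbation(values):
    """Mean absolute difference between consecutive cycles divided by the mean value."""
    if len(values) < 2:
        raise ValueError("at least two consecutive cycles are required")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    if mean_value == 0:
        raise ValueError("mean value must be non-zero")
    return mean_diff / mean_value


def local_jitter(periods):
    """Relative period perturbation; `periods` are pitch periods (e.g. in ms)."""
    return _local_perturbation(periods)


def local_shimmer(amplitudes):
    """Relative amplitude perturbation; `amplitudes` are per-cycle peak amplitudes."""
    return _local_perturbation(amplitudes)


# Periods alternating between 10 ms and 12 ms: mean difference 2 ms,
# mean period 11 ms, so local jitter is 2/11 (about 18.2%).
print(local_jitter([10.0, 12.0, 10.0, 12.0]))
print(local_shimmer([1.0, 0.9, 1.0]))
```

Because both measures are ratios, they are dimensionless and are usually reported as percentages.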