Advances in Human-Computer Interaction

Review Article

A Review of the Advancement in Speech Emotion Recognition for Indo-Aryan and Dravidian Languages

Table 2

Review of some speech emotion recognition experiments for Indo-Aryan and Dravidian languages.


S/N	Reference	Database name	Database language	Approach used	Recognized emotions	Results

1	Koolagudi et al. [79]	IITKGP-SESC	Telugu	SVM and GMM with energy and pitch parameters	Happy, anger, fear, disgust, sarcastic, sad, neutral, surprise	63.75% average accuracy obtained

2	Sultana et al. [3]	SUBESCO and RAVDESS	Bangla and English	The system integrates a DCNN and a BLSTM network with a TDF layer	Happy, calm, sad, surprise, fearful, disgust, angry, neutral	The proposed model achieved weighted accuracies of 86.9% on SUBESCO and 82.7% on RAVDESS

3	Kumar and Yadav [80]	IITKGP-SEHSC	Hindi	Deep LSTM with GMFCC and DMFCC features	Happy, fear, angry, sad, neutral	The proposed framework gives an average accuracy of 91.2% for male speech and 87.6% for female speech

4	Mohanty and Swain [15]	Oriya emotional speech database	Oriya	Fuzzy K-means	Anger, sadness, astonishment, fear, happiness, neutral	65.16% recognition rate using mean pitch, first two formants, jitter, shimmer, and energy as feature vectors

5	Samantaray et al. [48]	MESDNEI	Assamese	SVM with dynamic, quality, derived, and prosodic features	Happy, anger, fear, disgust, surprise, sad, neutral	82.26% average accuracy rate for speaker-independent case

6	Bhavan et al. [81]	EmoDB, RAVDESS, and IITKGP-SEHSC	German, English, and Hindi	Bagged ensemble of SVMs using MFCCs and spectral centroids	Happy, sad, calm, angry, surprise, fear, disgust, neutral	Accuracy of 92.45% on EmoDB, 75.69% on RAVDESS, and 84.11% on IITKGP-SEHSC

7	Swain et al. [82]	Self-created database using utterances from two native languages of Odisha: Cuttacki and Sambalpuri	Oriya	SVM using MFCC as feature vector	Happiness, fear, anger, disgust, sadness, surprise, neutral	82.14% recognition accuracy for SVM classifier

8	Zaheer et al. [30]	SEMOUR⁺	Urdu	Ensemble classifier, CNN combined with VGG-19 model	Anger, disgust, happiness, surprise, boredom, sadness, fearful, neutral	The proposed model achieved a 56% speaker-independent recognition rate

9	Wankhade et al. [47]	Speech emotional database containing dialogues from different Bollywood movies	Hindi	SVM classifier with MFCC and MEDC feature set	Angry, happy, sad, neutral	71.66% recognition rate using SVM classifier

10	Ali et al. [83]	Self-created speech emotional corpus recorded in 5 regional languages of Pakistan	Urdu, Sindhi, Pashto, Punjabi, and Balochi	Learning classifiers (AdaBoostM1, J48, classification via regression, decision stump) with prosodic features	Happiness, sad, anger, neutral	40% classification accuracy with pitch feature

11	Ancilin and Milton [84]	Urdu	Urdu	SVM classifier with mel frequency magnitude coefficient (MFMC)	Happy, sad, anger, neutral	95.25% emotion recognition rate using MFMC

12	Farhad et al. [85]	Urdu	Urdu	Neural network, random forest and meta iterative classifiers with pitch and MFCC features	Happy, sad, angry	With an accuracy of 78.75%, random forest outperforms other classifiers

13	Darekar and Dhande [86]	Marathi database	Marathi	Adaptive ANN combining cepstral, non-negative matrix factorization (NMF) and pitch features	Happy, sad, angry, fear, neutral, surprised	Proposed model obtains 80% accuracy combining the 3 features

14	Koolagudi et al. [87]	IITKGP-SESC	Telugu	SVM and GMM models with epoch parameters	Happy, anger, fear, sadness, disgust, neutral	Average recognition rates are 58% and 61% for SVM and GMM, respectively

15	Kandali et al. [49]	Self-created acted emotional speech database by 27 speakers	Assamese	GMM classifier with MFCC features	Happy, sad, disgust, fear, angry, surprise, neutral	Highest mean classification score is 76.5%

16	Dhar and Guha [88]	Abeg: self-collected Bangla emotional speech dataset	Bangla	Logistic regression model with MFCC and LPC features	Happy, angry, neutral	Proposed model achieved 92% accuracy combining MFCC and LPC features

17	Jacob [89]	Hindi emotional speech database containing 2240 wav files collected from 10 speakers	Hindi	ANN model with jitter and shimmer features	Happy, sad, anger, fear, surprise, disgust, neutral	83.3% overall accuracy obtained combining jitter and shimmer features

18	Fernandes and Mannepalli [90]	Acted emotional speech database containing 1400 utterances by 10 actors	Tamil	LSTM and BiLSTM with MFCC, MFCC delta, spectral kurtosis, bark spectrum, and spectral skewness features	Happy, anger, sad, fear, boredom, disgust, neutral	84% accuracy rate obtained using LSTM and BiLSTM with dropout layers

19	Rajisha et al. [91]	Acted emotional dataset created by the authors	Malayalam	ANN and SVM classifier with MFCC, short-time energy, and pitch features	Happy, anger, sad, neutral	88.4% recognition rate obtained using ANN and 78.2% with SVM

20	Kannadaguli and Bhat [92]	Self-created database containing 2800 emotional recordings	Kannada	Bayesian and HMM models with MFCC features	Happy, excited, angry, sad	Average emotion error rate of 25.5% for the Bayesian approach and 0.2% for the HMM approach
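
The Results column mixes several metrics (average accuracy, weighted accuracy, recognition rate, error rate), so the figures are not always directly comparable. As a rough illustration only, the sketch below computes two of them under a convention often used in the speech emotion recognition literature: weighted accuracy as the overall fraction of correctly classified utterances, and unweighted accuracy as the mean of per-class recalls. This convention is an assumption here, since the individual studies in Table 2 may define their metrics differently. The function names and the toy labels are hypothetical and are not taken from any of the cited works.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Overall fraction of correctly classified utterances (assumed convention)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every emotion counts equally."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical toy labels; NOT data from any study in Table 2.
y_true = ["happy", "happy", "happy", "happy", "sad", "sad", "angry", "neutral"]
y_pred = ["happy", "happy", "happy", "sad", "sad", "angry", "angry", "neutral"]

wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
error_rate = 1.0 - wa  # an error rate is the complement of the accuracy

print(f"weighted accuracy:   {wa:.4f}")
print(f"unweighted accuracy: {ua:.4f}")
print(f"error rate:          {error_rate:.4f}")
```

On imbalanced emotion classes the two accuracies can diverge noticeably, which is one reason to check which metric a study reports before comparing rows.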