Research Article

Improved Emotion Recognition Using Gaussian Mixture Model and Extreme Learning Machine in Speech and Glottal Signals

Table 1

Some of the significant works on speech emotion recognition.

| Ref. number | Database | Signals | Number of emotions | Methods | Best result |
|---|---|---|---|---|---|
| [49] | BES | Speech signals | Anger, boredom, disgust, fear, happiness, sadness, and neutral | Nonlinear dynamic features + prosodic + spectral features + SVM classifier | 82.72% (females); 85.90% (males) |
| [50] | BES | Speech signals | Neutral, fear, and anger | Nonlinear dynamic features + neural network | 93.78% |
| [51] | BES | Speech signals | Anger, boredom, disgust, fear, happiness, sadness, and neutral | Modulation spectral features (MSFs) + multiclass SVM | 85.60% |
| [30] | BES | Speech signals | Anger, boredom, disgust, fear, happiness, sadness, and neutral | Combination of spectral excitation source features + autoassociative neural network | 82.16% |
| [27] | BES | Speech signals | Anger, boredom, disgust, fear, happiness, sadness, and neutral | Combination of utterancewise global and local prosodic features + SVM classifier | 62.43% |
| [52] | BES | Speech signals | Anger, boredom, disgust, fear, happiness, sadness, and neutral | LPCCs + formants + GMM classifier | 68% |
| [28] | BES | Speech signals | Anger, boredom, fear, happiness, sadness, and neutral | Discriminative band wavelet packet power coefficients (db-WPPC) with Daubechies filter of order 40 + GMM classifier | 75.64% |
| [53] | BES | Speech signals | Anger, boredom, disgust, fear, happiness, sadness, and neutral | Low level audio descriptors and high level perceptual descriptors with linear SVM | 87.7% |
| [54] | BES | Speech signals | Anger, boredom, disgust, fear, happiness, sadness, and neutral | MPEG-7 low level audio descriptors + SVM with radial basis function kernel | 77.88% |
| [55] | SAVEE | Speech signals | Anger, surprise, sadness, happiness, fear, disgust, and neutral | Mel-frequency cepstral coefficients + signal energy + correlation based feature selection + SVM with radial basis function kernels | 79% |
| [56] | SAVEE | Speech signals | Anger, surprise, sadness, happiness, fear, disgust, and neutral | Energy intensity + pitch + standard deviation + jitter + shimmer + NN | 74.39% |
| [57] | SAVEE | Speech signals | Anger, surprise, sadness, happiness, fear, disgust, and neutral | Audio features + LDA feature reduction + single component Gaussian classifier | 63% |
| [20] | SAVEE | Speech signals | Anger, surprise, sadness, happiness, fear, disgust, and neutral | Pitch + energy + duration + spectral + Gaussian classifier | 59.2% |