Review Article

Academic Emotion Classification Using FER: A Systematic Review

Table 2. Selected retrieved publications.

| Retrieved from | Authors, year | Emotion classifier | Dataset | Preprocessing | Feature extraction | Results | Strength | Limitation |
|---|---|---|---|---|---|---|---|---|
| IEEE Xplore | G. Li and Wang, 2018 [30] | SVM + CNN | FER-2013 | Histogram equalization; Dlib's `frontal_face_detector` for face detection | Global and geometric features | N/A | Analyzes learners' facial images in real time | Focuses only on general emotions |
| IEEE Xplore | El Hammoumi et al., 2018 [31] | CNN | CK+ and KDEF | OpenCV | CNN | Accuracy: 97.18% | High detection accuracy | Focuses only on general emotions |
| IEEE Xplore | Ma et al., 2018 [32] | CNN | FER-2013 | Cascaded Haar-feature real-time face detection based on AdaBoost | CNN | Identification error score: 0.14 | Real-time detection | Focuses only on general emotions |
| ScienceDirect | D. Yang et al., 2018 [33] | Neural network | JAFFE | Haar cascades | Neural network | Accuracy: sad 76%, surprise 87.72%, happy 94%, anger 87.66%, disgust 82.76%, fear 79.73% | High classification accuracy for the "happy" emotion | (i) Long processing time; (ii) illumination and pose of the image not considered |
| IEEE Xplore | Candra Kirana et al., 2018 [34] | Viola-Jones algorithm | Ten student expressions in class | Viola-Jones algorithm | Viola-Jones algorithm | Accuracy: 74% | Fastest of the three algorithms compared (vs. Viola-Jones + neural network and neural network alone) | (i) Small dataset; (ii) lower detection accuracy than more complex algorithms (Viola-Jones + neural network); (iii) works only on forward-facing faces |
| IEEE Xplore | Dewan et al., 2018 [35] | DBN | DAiSEE | Viola-Jones algorithm | Local directional pattern (LDP) | Accuracy: two-level 90.89%, three-level 87.25% | (i) High accuracy for two-level engagement detection; (ii) robust classification | Direct correlation between engagement and actual task performance unknown |
| Scopus | Sharma and Mansotra, 2019 [36] | CNN | FER-2013 | Viola-Jones algorithm | Haar-cascade extraction | N/A | Analyzes emotions in real time | Focuses only on general emotions |
| Scopus | Sharma and Mansotra, 2019 [37] | CNN | FER-2013 and emotional corpus | Viola-Jones algorithm | Haar-cascade extraction | N/A | Better classification accuracy | Focuses only on general emotions |
| IEEE Xplore | Mao et al., 2019 [38] | SVM | 617 images from 19 students (12 expressions) | Grayscale conversion, histogram equalization, and scale normalization | Extended LBP (ELBP) | Accuracy: 98.16% | Outperforms the original LBP algorithm | Focuses only on general emotions |
| ScienceDirect | Hung et al., 2019 [39] | CNN | FER-2013, JAFFE, and KDEF | AdaBoost | CNN | Accuracy: 84.59% | High recognition accuracy | Focuses only on general emotions |
| IEEE Xplore | Lasri et al., 2019 [40] | CNN | FER-2013 | AdaBoost | CNN | Accuracy: 70% | Good at predicting happy and surprised emotions | Focuses only on general emotions |
| Scopus | Tang et al., 2019 [41] | CNN | FER-2013 | Each grayscale face image converted pixel by pixel into a string | CNN | Accuracy: 70.10% | (i) High detection accuracy and robustness; (ii) real-time evaluation of students' classroom performance | Focuses only on general emotions |
| IEEE Xplore | Shi et al., 2019 [42] | CNN-SVM | 82 students taking different online courses | Viola-Jones face detection, image rotation, image-scale normalization | CNN | Accuracy: 93.80% | High predictive performance | Considers only two levels of confusion |
| IEEE Xplore | Healy et al., 2019 [43] | SVM | CK+ and MUG | Dlib library, which contains CNNs trained for face detection | Dlib extracts 68 landmark points from the detected face | Accuracy: 88.76% | Quick and reliable classification | (i) Focuses only on general emotions; (ii) computationally intensive |
| Google Scholar | Liang, 2019 [44] | SVM | Yale Face Database, BioID Face Database, and JAFFE (Kyushu University, Japan) | Feature-point normalization | Active shape model (ASM) | Accuracy: 79.65% | Good recognition rate for some facial expressions | Focuses only on general emotions |
| IEEE Xplore | Dash et al., 2019 [45] | CNN | DAiSEE | Viola-Jones algorithm | CNN | Accuracy: engaged / not engaged (values not reported) | High detection accuracy | Focuses only on engagement detection |
| Google Scholar | Bian et al., 2019 [46] | CNN | OL-SFED | VGG16 | CNN | Accuracy: 91.60% | High detection accuracy | Limited sample size |
| IEEE Xplore | Huang et al., 2019 [47] | BERN (temporal convolution, bidirectional LSTM, and an attention mechanism) | DAiSEE | N/A | Tracks changes in face position and pixels via deep learning | Top-1 accuracy: 60% for the four-class model | State-of-the-art performance vs. the 57.9% benchmark | Requires a large amount of training data and a long training time |
| Springer | T. S and Guddeti, 2020 [48] | Hybrid CNN | Self-created dataset (engaged, boredom, and neutral) | Blurred and duplicate frames removed | N/A | Accuracy: posed 86%, spontaneous 70% | Outperforms existing state-of-the-art methods | Long training time |
| Google Scholar | Mohamad Nezami et al., 2020 [49] | CNN | 4627 engaged and disengaged samples | CNN-based face detection | N/A | Precision: 60.42% | Outperforms their previous engagement-recognition method | Focuses only on engagement detection |
| Scopus | Alrayassi and Shilbayeh, 2020 [50] | CNN | 15 random students seeking admission in ADSM | AdaBoost | PCA algorithm | Accuracy: 56.60% | Detects happiness and neutral (no emotion) | (i) Focuses only on general emotions; (ii) difficulty detecting sadness and surprise |
| Google Scholar | Tang et al., 2020 [51] | CNN | JAFFE and CK+ | CNN | CNN | Accuracy: JAFFE 92.68%, CK+ 99.10% | High recognition accuracy | Focuses only on general emotions |
| Google Scholar | Leong, 2020 [52] | LSTM | DAiSEE | MTCNN library for face detection and cropping | FaceNet model | Accuracy vs. the EmotioNet model: boredom +16.26%, frustration -2.42% | Improved accuracy for the boredom emotion | (i) Decreased accuracy for the frustration emotion; (ii) covers negative emotions only |
| Springer | Zatarain Cabada et al., 2020 [53] | CNN | Database Insight (dbI) | CNN | Local binary patterns (LBP), geometric features, and convolutional filters (CF) | Accuracy: 82% | 8% accuracy improvement over previous work that used a trial-and-error method | Imbalanced dataset |
| Springer | Zhu and Chen, 2020 [54] | Hybrid deep neural network (hybrid DNN) | JAFFE, CK+, and FER-2013 | Eightfold data augmentation: the original image, a flipped image, and rotations at six angles | Face++ Detect API | Accuracy: 83.90% | High recognition accuracy | Requires a large amount of training data |
| Scopus | Wang et al., 2020 [7] | CNN | CK+, DISFA, and DISFA+ | IntraFace | CNN | N/A | Robust across various environments | Focuses only on general emotions |
| IEEE Xplore | Dubbaka and Gopalan, 2020 [55] | CNN | DISFA+ | (i) Grayscale conversion and reduction to the input pixel dimensions; (ii) split into upper, lower, and whole face | Dlib package | Accuracy: 95% | High detection accuracy | Facial obstructions limit the model's performance |
| Springer | Pise et al., 2020 [56] | Temporal relation network (TRN) | 30 frontal-image samples from different individuals, four emotion types | Scaling, alignment, and normalization | Base (SqueezeNet) CNN | Accuracy: 91.30% | High detection accuracy | (i) Prone to underfitting due to the small dataset; (ii) focuses only on general emotions |
| Google Scholar | Sabri, 2020 [57] | Support vector regression (SVR) | JAFFE | Viola-Jones algorithm | Gray-level co-occurrence matrix (GLCM) | Accuracy: 99.16% | (i) Less susceptible to overfitting; (ii) high detection accuracy | (i) Focuses only on general emotions; (ii) computationally expensive |
| Scopus | Kumar et al., 2020 [58] | SVM | JAFFE, CK+, and FER-2013 | Kanade-Lucas-Tomasi algorithm | Gabor filter | Accuracy: 62% | Includes depression detection | Focuses only on general emotions |
| IEEE Xplore | Murugappan et al., 2020 [59] | Extreme learning machine (ELM) and probabilistic neural network (PNN) | 55 undergraduate university students | Viola-Jones algorithm | Mathematical model places ten virtual markers at defined facial locations | Accuracy: ELM 88%, PNN 92% | High detection accuracy | Only a simple distance measure is used for emotion classification |
| IEEE Xplore | Murugappan et al., 2020 [60] | K-nearest neighbors (KNN) and decision tree | 55 subjects, six emotion types | Viola-Jones algorithm | Mathematical model places ten virtual markers at defined facial locations | Accuracy: 98.03% | (i) Low computational complexity; (ii) high detection accuracy | Focuses only on general emotions |
| Google Scholar | Rao and Rao, 2020 [61] | CNN | DAiSEE, JAFFE, and CK+ | (i) Videos cut into frames; (ii) frames resized to fixed dimensions; (iii) limited histogram equalization | CNN + pose estimator | Accuracy: DAiSEE 53.4%, JAFFE 71.4%, CK+ 99.95% | High detection accuracy on CK+ | Low recognition rate for frustration on DAiSEE |
| Google Scholar | Hingu, 2020 [62] | CNN | FER-2013 | Haar cascades | CNN | Accuracy: training set 65%, validation set 63% | The feature-extraction method outperforms the existing traditional approach | (i) Focuses only on general emotions; (ii) low detection accuracy |
| IEEE Xplore | Zakka and Vadapalli, 2020 [63] | CNN | FER-2013 | Haar cascades | CNN | Accuracy: 64.43% | Detects emotions in real time | (i) Focuses only on general emotions; (ii) low recognition accuracy |
| Springer | Liao et al., 2021 [64] | Deep facial spatiotemporal network (DFSTN) | DAiSEE | MTCNN | SE-ResNet-50 (SENet) | Accuracy: 58.84% | (i) Prediction remains effective even in challenging conditions; (ii) fuses facial spatiotemporal information | (i) Low detection accuracy; (ii) data deficiencies and imbalance |
| IEEE Xplore | Siam et al., 2021 [65] | CNN | FER-2013 | (i) Image resizing; (ii) image augmentation; (iii) normalization | CNN | Accuracy: 69% | Generates reviews from an image with multiple faces | (i) Focuses only on general emotions; (ii) low detection accuracy |
| Google Scholar | Li et al., 2021 [66] | CNN | FER-2013 | Haar cascades | CNN | Accuracy: 72.4% | Less complex model | Focuses only on general emotions |
| IEEE Xplore | Mohan et al., 2021 [67] | Deep CNN (DCNN) | FER-2013, JAFFE, CK+, KDEF, and RAF | Augmentation: rotation by +5° and -5°, horizontal flip, Gaussian noise | DCNN | Accuracy: FER-2013 78%, JAFFE 98%, CK+ 98%, KDEF 96%, RAF 83% | Outperforms 25 baseline methods when average time is considered | Performance generally below that of FER in a lab-controlled environment |
| Springer | Mohan et al., 2021 [68] | CNN | FER-2013, JAFFE, CK+, KDEF, and RAF | Image resizing via bilinear interpolation | CNN | Accuracy: FER-2013 78.9%, JAFFE 96.7%, CK+ 97.8%, KDEF 82.5%, RAF 81.68% | Better accuracy and execution time than 21 state-of-the-art methods | Focuses only on general emotions |
| Springer | Mohan and Seal, 2021 [69] | SVM, RF, KNN, MLP, and AdaBoost | Real Life (RL) dataset and Bag-of-Lies | (i) Facial regions cropped from selected frames; (ii) images reshaped via bilinear interpolation; (iii) LDP face images concatenated and resized via bilinear interpolation | N/A | Accuracy: Bag-of-Lies (video + audio + EEG data) 70%, RL (video + audio data) 76.07% | Combining modalities is consistent with deception detection | Small datasets |
| IEEE Xplore | Mohan et al., 2022 [70] | Deep CNN (DCNN) | RL trial, Bag-of-Lies, and MU3D | (i) Facial regions cropped from selected frames; (ii) images reshaped via bilinear interpolation; (iii) LDP face images concatenated and resized via bilinear interpolation | DCNN | Accuracy: RL trial 97%, Bag-of-Lies 96%, MU3D 98% | High detection accuracy | Data scarcity |
| Springer | Shen et al., 2022 [71] | Squeeze-and-excitation deep adaptation network (SE-DAN) | JAFFE, CK+, and RAF-DB | Random rotation and horizontal flip on RAF-DB | SE-CNN | Accuracy: 56% | (i) Higher accuracy than AlexNet, VGG-16, SE-CNN, and DAN; (ii) usable for transfer learning and domain adaptation | Focuses only on general emotions |
| Springer | Gupta et al., 2022 [72] | DCNN (ResNet-50, VGG19, Inception-V3) | FER-2013, CK+, RAF-DB, and own dataset (1800 coloured images: angry, sad, happy, neutral, surprised, fear) | Automatic frame selection | MediaPipe face mesh | Accuracy: ResNet-50 92.32%, VGG19 90.14%, Inception-V3 89.11% | Outperforms all other models for FER in real-time learning scenarios | Focuses only on engagement detection |
| IEEE Xplore | Savchenko et al., 2022 [73] | CNN | AffectNet | Cropped facial images rotated to align by eye position | CNN | Accuracy: 70.23% | Much faster; suitable for real-time processing | Less accurate than the best-known multimodal ensembles on the AFEW and VGAF datasets |
| Google Scholar | Hou et al., 2022 [74] | CNN (VGG16) | FER-2013 and CK+ | MTCNN | VGG16 | Accuracy: FER-2013 67.4%, CK+ 99.18% | VGG16 + ECANet is 2.76% more accurate than VGG16 alone | Slow running speed |
| Google Scholar | Yuan, 2022 [75] | MTCNN | RAF-DB, masked dataset, and classroom dataset | Histogram equalization | MTCNN | Accuracy: 93.53% | (i) Detects multiple faces in a single image; (ii) reduces data-storage pressure and collection workload | Small dataset sample size |
| Google Scholar | Wu, 2022 [76] | CNN | Self-collected dataset of 1073 images (boredom, surprise, happy, confusion, neutral) | (i) Grayscale conversion; (ii) resizing to fixed pixel dimensions | CNN | Accuracy: 80% | Performs well on happy, surprised, and neutral emotions | Insufficient dataset |
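Histogram equalization recurs as a preprocessing step in several of the reviewed pipelines ([30, 38, 61, 75]). As an illustrative sketch only (our own NumPy rendering of the standard operation, not code from any reviewed paper), it amounts to remapping intensities through the image's cumulative histogram:

```python
import numpy as np

def equalize_histogram(img: np.ndarray) -> np.ndarray:
    """Histogram-equalize an 8-bit grayscale image by remapping through the CDF."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # first occupied intensity bin
    denom = max(cdf[-1] - cdf_min, 1)  # guard against constant images
    # Lookup table: lowest occupied bin -> 0, highest -> 255
    lut = np.clip((cdf - cdf_min) * 255.0 / denom, 0, 255).astype(np.uint8)
    return lut[img]

# A low-contrast patch (values 100-101) is stretched to the full 0-255 range
low_contrast = np.array([[100, 100], [101, 101]], dtype=np.uint8)
print(equalize_histogram(low_contrast))
```

Production pipelines would typically call a library routine (e.g., OpenCV's `cv2.equalizeHist`) rather than reimplementing this.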
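Several entries rely on LBP-family texture descriptors: the extended LBP of Mao et al. [38], the local directional pattern of Dewan et al. [35], and the LBP features of Zatarain Cabada et al. [53]. The basic 3 × 3 operator those variants extend can be sketched as follows (our illustration, not the papers' implementations):

```python
import numpy as np

def lbp_code(patch: np.ndarray) -> int:
    """Basic 3x3 local binary pattern: threshold the 8 neighbours against the
    centre pixel and read them clockwise as an 8-bit code (top-left = MSB)."""
    center = patch[1, 1]
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    code = 0
    for n in neighbours:
        code = (code << 1) | int(n >= center)
    return code

def lbp_image(img: np.ndarray) -> np.ndarray:
    """LBP code for every interior pixel; histogramming these codes over the
    face region yields the texture feature vector used for classification."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = lbp_code(img[i:i + 3, j:j + 3])
    return out
```

ELBP and LDP modify the neighbourhood sampling and encoding, but the threshold-and-encode idea above is the common core.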
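Bilinear interpolation is the resizing method named for [68-70]. A self-contained sketch of the underlying mapping (our own illustration; the papers presumably used library routines):

```python
import numpy as np

def resize_bilinear(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Resize a 2-D grayscale image with bilinear interpolation
    (corner-aligned mapping from output to source coordinates)."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, out_h)  # fractional source rows
    xs = np.linspace(0, w - 1, out_w)  # fractional source columns
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]            # vertical blend weights
    wx = (xs - x0)[None, :]            # horizontal blend weights
    img = img.astype(float)
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

# Upsampling a 2x2 gradient to 3x3 inserts the exact midpoints
print(resize_bilinear(np.array([[0, 10], [20, 30]]), 3, 3))
```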
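Data augmentation appears repeatedly in the table ([54, 65, 67, 71]): flips, small rotations, and noise injection. Two of the transforms listed for [67], horizontal flip and additive Gaussian noise, can be sketched in NumPy (the ±5° rotations would normally come from an image library and are omitted here; the noise standard deviation below is our arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility

def augment(img: np.ndarray) -> list:
    """Return two augmented copies of an 8-bit grayscale face image:
    a horizontal flip and a version with additive Gaussian noise."""
    flipped = img[:, ::-1]
    noisy = np.clip(img.astype(float) + rng.normal(0, 5, img.shape),
                    0, 255).astype(np.uint8)
    return [flipped, noisy]
```

Augmentation like this is what lets [54] report an eightfold increase in training data without collecting new images.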