| Retrieved from | Authors, year | Emotion classifier | Dataset | Preprocessing | Feature extraction | Results | Strength | Limitation |
|---|---|---|---|---|---|---|---|---|
| IEEE Xplore | G. Li and Wang, 2018 [30] | SVM+CNN | FER-2013 | Histogram equalization; Dlib frontal_face_detector for face detection | Method based on global and geometric features | N/A | Analyzes learners' facial images in real time | Focus only on general emotions |
| IEEE Xplore | El Hammoumi et al., 2018 [31] | CNN | CK+ and KDEF | OpenCV | CNN | Accuracy: 97.18% | High detection accuracy | Focus only on general emotions |
| IEEE Xplore | Ma et al., 2018 [32] | CNN | FER-2013 | Cascaded Haar-feature real-time face detection based on AdaBoost | CNN | Identification error score: 0.14 | Real-time detection | Focus only on general emotions |
| ScienceDirect | D. Yang et al., 2018 [33] | Neural network | JAFFE | Haar cascades | Neural network | Accuracy: (i) Sad: 76% (ii) Surprise: 87.72% (iii) Happy: 94% (iv) Anger: 87.66% (v) Disgust: 82.76% (vi) Fear: 79.73% | High classification accuracy for the "happy" emotion | (i) Long processing time (ii) Does not account for image illumination and pose |
| IEEE Xplore | Candra Kirana et al., 2018 [34] | Viola-Jones algorithm | Ten student expressions in the class | Viola-Jones algorithm | Viola-Jones algorithm | Accuracy: 74% | Fastest of the three compared algorithms (vs. Viola-Jones+neural networks and neural networks alone) | (i) Small dataset (ii) Lower detection accuracy than more complex algorithms (Viola-Jones+neural networks) (iii) Works only on forward-facing faces |
| IEEE Xplore | Dewan et al., 2018 [35] | DBN | DAiSEE | Viola-Jones algorithm | Local directional pattern (LDP) | Accuracy: (i) Two-level: 90.89% (ii) Three-level: 87.25% | (i) High accuracy for two-level engagement detection (ii) Robust classification | Unknown direct correlation between engagement and actual task performance |
| Scopus | Sharma and Mansotra, 2019 [36] | CNN | FER-2013 | Viola-Jones algorithm | Haar cascade extraction | N/A | Analyzes emotions in real time | Focus only on general emotions |
| Scopus | Sharma and Mansotra, 2019 [37] | CNN | FER-2013 and an emotional corpus | Viola-Jones algorithm | Haar cascade extraction | N/A | Better classification accuracy | Focus only on general emotions |
| IEEE Xplore | Mao et al., 2019 [38] | SVM | 617 images from 19 students (12 expressions) | Gray processing, histogram equalization, and scale normalization | Extended LBP algorithm (ELBP) | Accuracy: 98.16% | Outperforms the original LBP algorithm | Focus only on general emotions |
| ScienceDirect | Hung et al., 2019 [39] | CNN | FER-2013, JAFFE, and KDEF | AdaBoost | CNN | Accuracy: 84.59% | High recognition accuracy | Focus only on general emotions |
| IEEE Xplore | Lasri et al., 2019 [40] | CNN | FER-2013 | AdaBoost | CNN | Accuracy: 70% | Good at predicting happy and surprised emotions | Focus only on general emotions |
| Scopus | Tang et al., 2019 [41] | CNN | FER-2013 | Conversion of each grayscale face image's pixels into a string | CNN | Accuracy: 70.10% | (i) High detection accuracy and robustness (ii) Real-time evaluation of students' classroom performance | Focus only on general emotions |
| IEEE Xplore | Shi et al., 2019 [42] | CNN-SVM | 82 students taking different online courses | Viola-Jones face detection, image rotation, and image scaling normalization | CNN | Accuracy: 93.80% | High predictive performance | Considers only two levels of confusion |
| IEEE Xplore | Healy et al., 2019 [43] | SVM | CK+ and MUG | Dlib library (CNN-based face detection) | Dlib extraction of the 68 landmark points from the detected face | Accuracy: 88.76% | Quick and reliable classification | (i) Focus only on general emotions (ii) Computationally intensive |
| Google Scholar | Liang, 2019 [44] | SVM | Yale face database (Kyushu University, Japan), BioID Face Database, and JAFFE | Normalization of feature points | Active shape model (ASM) | Accuracy: 79.65% | Good recognition rate for some facial expressions | Focus only on general emotions |
| IEEE Xplore | Dash et al., 2019 [45] | CNN | DAiSEE | Viola-Jones algorithm | CNN | Accuracy: (i) Engaged: (ii) Not engaged: | High detection accuracy | Focus only on engagement detection |
| Google Scholar | Bian et al., 2019 [46] | CNN | OL-SFED | VGG16 | CNN | Accuracy: 91.60% | High detection accuracy | Limited sample number |
| IEEE Xplore | Huang et al., 2019 [47] | BERN (combination of temporal convolution, bidirectional LSTM, and attention mechanism) | DAiSEE | N/A | Tracks changes in face position and pixels through deep learning | Top-1 accuracy: 60% for the four-class model | State-of-the-art performance versus the benchmark (57.9%) | Requires a large amount of training data and a long training time |
| Springer | T. S. and Guddeti, 2020 [48] | Hybrid CNN | Self-created dataset (engaged, boredom, and neutral) | Deletion of blurred and repeated frames | N/A | Accuracy: (i) Posed: 86% (ii) Spontaneous: 70% | Outperforms existing state-of-the-art methods | Long training time |
| Google Scholar | Mohamad Nezami et al., 2020 [49] | CNN | 4627 engaged and disengaged samples | CNN-based face detection | N/A | Precision: 60.42% | Outperforms their previous engagement recognition method | Focus only on engagement detection |
| Scopus | Alrayassi and Shilbayeh, 2020 [50] | CNN | 15 random students seeking admission to ADSM | AdaBoost | PCA algorithm | Accuracy: 56.60% | Detects happiness and no emotion | (i) Focus only on general emotions (ii) Difficulty detecting sad and surprised emotions |
| Google Scholar | Tang et al., 2020 [51] | CNN | JAFFE and CK+ | CNN | CNN | Accuracy: (i) JAFFE: 92.68% (ii) CK+: 99.10% | High recognition accuracy | Focus only on general emotions |
| Google Scholar | Leong, 2020 [52] | LSTM | DAiSEE | MTCNN library for face detection and cropping | FaceNet model | Accuracy versus the EmotioNet model: (i) Boredom: +16.26% (ii) Frustration: -2.42% | Improved accuracy for the boredom emotion | (i) Decreased accuracy for the frustration emotion (ii) Involves negative emotions only |
| Springer | Zatarain Cabada et al., 2020 [53] | CNN | Database Insight (dbI) | CNN | Local binary patterns (LBP), geometric-based feature extraction, and convolutional filters (CF) | Accuracy: 82% | Demonstrates an 8% accuracy improvement over previous work that used a trial-and-error method | Imbalanced dataset |
| Springer | Zhu and Chen, 2020 [54] | Hybrid deep neural network (hybrid DNN) | JAFFE, CK+, and FER-2013 | Eightfold data augmentation: the original image, a flipped image, and rotated images at six angles | Face++ detect API | Accuracy: 83.90% | High recognition accuracy | Requires a large amount of training data |
| Scopus | Wang et al., 2020 [7] | CNN | CK+, DISFA, and DISFA+ | IntraFace | CNN | N/A | Performs robustly in various environments | Focus only on general emotions |
| IEEE Xplore | Dubbaka and Gopalan, 2020 [55] | CNN | DISFA+ | (i) Grayscale conversion and reduction to the pixel input dimensions (ii) Splitting into upper, lower, and whole face | Dlib package | Accuracy: 95% | High detection accuracy | Face obstructions limit the model's performance |
| Springer | Pise et al., 2020 [56] | Temporal relation network (TRN) | 30 samples of different individuals' frontal images with four emotion types | Scaling, alignment, and normalization of the samples | Base (SqueezeNet) CNN | Accuracy: 91.30% | High detection accuracy | (i) Prone to underfitting due to the small dataset (ii) Focus only on general emotions |
| Google Scholar | Sabri, 2020 [57] | Support vector regression (SVR) | JAFFE | Viola-Jones algorithm | Gray-level co-occurrence matrix (GLCM) | Accuracy: 99.16% | (i) Less susceptible to overfitting (ii) High detection accuracy | (i) Focus only on general emotions (ii) Computationally expensive |
| Scopus | Kumar et al., 2020 [58] | SVM | JAFFE, CK+, and FER-2013 | Kanade-Lucas-Tomasi algorithm | Gabor filter | Accuracy: 62% | Includes depression detection | Focus only on general emotions |
| IEEE Xplore | Murugappan et al., 2020 [59] | Extreme learning machine (ELM) and probabilistic neural network (PNN) | 55 undergraduate university students | Viola-Jones algorithm | Mathematical model placing ten virtual markers at defined locations on the subject's face | Accuracy: (i) ELM: 88% (ii) PNN: 92% | High detection accuracy | Only a simple distance measure is used for emotion classification |
| IEEE Xplore | Murugappan et al., 2020 [60] | K-nearest neighbors (KNN) and decision tree | 55 subjects with six emotion types | Viola-Jones algorithm | Mathematical model placing ten virtual markers at defined locations on the subject's face | Accuracy: 98.03% | (i) Lower computational complexity (ii) High detection accuracy | Focus only on general emotions |
| Google Scholar | Rao and Rao, 2020 [61] | CNN | DAiSEE, JAFFE, and CK+ | (i) Each video cut into frames (ii) Resize to an image of size pixels (iii) Limited histogram equalization | CNN + pose estimator | Accuracy: (i) DAiSEE: 53.4% (ii) JAFFE: 71.4% (iii) CK+: 99.95% | High detection accuracy on the CK+ dataset | Low recognition rate for frustration on the DAiSEE dataset |
| Google Scholar | Hingu, 2020 [62] | CNN | FER-2013 | Haar cascades | CNN | Accuracy: (i) Training set: 65% (ii) Validation set: 63% | The feature extraction method outperforms the existing traditional approach | (i) Focus only on general emotions (ii) Low detection accuracy |
| IEEE Xplore | Zakka and Vadapalli, 2020 [63] | CNN | FER-2013 | Haar cascades | CNN | Accuracy: 64.43% | Detects emotions in real time | (i) Focus only on general emotions (ii) Low recognition accuracy |
| Springer | Liao et al., 2021 [64] | Deep facial spatiotemporal network (DFSTN) | DAiSEE | MTCNN | SE-ResNet-50 (SENet) | Accuracy: 58.84% | (i) Enhanced prediction even in challenging circumstances (ii) Fuses facial spatiotemporal information | (i) Low detection accuracy (ii) Data deficiencies and imbalances |
| IEEE Xplore | Siam et al., 2021 [65] | CNN | FER-2013 | (i) Resize images into (ii) Image augmentation (iii) Normalization | CNN | Accuracy: 69% | Generates reviews from an image with multiple faces | (i) Focus only on general emotions (ii) Low detection accuracy |
| Google Scholar | Li et al., 2021 [66] | CNN | FER-2013 | Haar cascades | CNN | Accuracy: 72.4% | Less complex model | Focus only on general emotions |
| IEEE Xplore | Mohan et al., 2021 [67] | Deep CNN (DCNN) | FER-2013, JAFFE, CK+, KDEF, and RAF | (i) Rotation by +5° (ii) Rotation by -5° (iii) Horizontal flip (iv) Gaussian noise | DCNN | Accuracy: (i) FER-2013: 78% (ii) JAFFE: 98% (iii) CK+: 98% (iv) KDEF: 96% (v) RAF: 83% | Outperforms 25 baseline methods when average time is considered | Performance is generally below that of FER in a lab-controlled environment |
| Springer | Mohan et al., 2021 [68] | CNN | FER-2013, JAFFE, CK+, KDEF, and RAF | Image resizing using bilinear interpolation | CNN | Accuracy: (i) FER-2013: 78.9% (ii) JAFFE: 96.7% (iii) CK+: 97.8% (iv) KDEF: 82.5% (v) RAF: 81.68% | Better accuracy and execution time than 21 state-of-the-art methods | Focus only on general emotions |
| Springer | Mohan and Seal, 2021 [69] | SVM, RF, KNN, MLP, and AdaBoost | Real Life (RL) dataset and Bag-of-Lies | (i) Facial regions of the selected frames cropped (ii) Images reshaped using bilinear interpolation (iii) LDP face images concatenated and resized using bilinear interpolation | N/A | Accuracy: (i) Bag-of-Lies (video + audio + EEG signal data): 70% (ii) RL dataset (video + audio data): 76.07% | Combining modalities is consistent with deception detection | Small datasets |
| IEEE Xplore | Mohan et al., 2022 [70] | Deep CNN (DCNN) | RL trial, Bag-of-Lies, and MU3D | (i) Facial regions of the selected frames cropped (ii) Images reshaped using bilinear interpolation (iii) LDP face images concatenated and resized using bilinear interpolation | DCNN | Accuracy: (i) RL trial: 97% (ii) Bag-of-Lies: 96% (iii) MU3D: 98% | High detection accuracy | Data scarcity |
| Springer | Shen et al., 2022 [71] | Squeeze-and-excitation deep adaptation network (SE-DAN) | JAFFE, CK+, and RAF-DB | Random rotation and horizontal flip on RAF-DB | SE-CNN | Accuracy: 56% | (i) Higher accuracy than AlexNet, VGG-16, SE-CNN, and DAN (ii) Usable for transfer learning and domain adaptation | Focus only on general emotions |
| Springer | Gupta et al., 2022 [72] | DCNN (ResNet-50, VGG19, Inception-V3) | FER-2013, CK+, RAF-DB, and their own dataset (1800 coloured images with the emotions angry, sad, happy, neutral, surprised, and fear) | Automatic frame selection | MediaPipe face mesh | Accuracy: (i) ResNet-50: 92.32% (ii) VGG19: 90.14% (iii) Inception-V3: 89.11% | Outperforms all other models for FER in real-time learning scenarios | Focus only on engagement detection |
| IEEE Xplore | Savchenko et al., 2022 [73] | CNN | AffectNet | Rotation of cropped facial images to align them by eye position | CNN | Accuracy: 70.23% | Much faster; suitable for real-time processing | Less accurate than the best-known multimodal ensembles on the AFEW and VGAF datasets |
| Google Scholar | Hou et al., 2022 [74] | CNN (VGG16) | FER-2013 and CK+ | MTCNN | VGG16 | Accuracy: (i) FER-2013: 67.4% (ii) CK+: 99.18% | VGG16+ECANet is 2.76% more accurate than VGG16 alone | Slow running speed |
| Google Scholar | Yuan, 2022 [75] | MTCNN | RAF-DB, a masked dataset, and a classroom dataset | Histogram equalization | MTCNN | Accuracy: 93.53% | (i) Detects multiple faces in a single image (ii) Alleviates data storage pressure and reduces collection workload | Small dataset sample size |
| Google Scholar | Wu, 2022 [76] | CNN | Self-collected dataset of 1073 images with the expressions boredom, surprise, happy, confusion, and neutral | (i) Conversion to grayscale (ii) Resize into pixels | CNN | Accuracy: 80% | Performs well on happy, surprised, and neutral emotions | Dataset insufficiency |
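Histogram equalization appears repeatedly in the preprocessing column above ([30], [38], [61], [75]), typically to normalize contrast across faces captured under uneven classroom lighting. A minimal NumPy sketch of the general technique (not any cited author's exact pipeline):

```python
import numpy as np

def equalize_histogram(gray: np.ndarray) -> np.ndarray:
    """Classic histogram equalization for an 8-bit grayscale image.

    Spreads the intensity distribution over the full 0-255 range so
    low-contrast face images present more uniform contrast before
    face detection or feature extraction.
    """
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    # Map each grey level through the normalized CDF (ignoring empty bins).
    cdf_masked = np.ma.masked_equal(cdf, 0)
    cdf_scaled = (cdf_masked - cdf_masked.min()) * 255 / (cdf_masked.max() - cdf_masked.min())
    lut = np.ma.filled(cdf_scaled, 0).astype(np.uint8)
    return lut[gray]  # apply the lookup table pixel-wise
```

Applying this to a low-contrast image stretches its dynamic range toward the full 0-255 interval while preserving the ordering of grey levels.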
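Several of the surveyed methods build texture descriptors from local binary patterns and their variants: LBP [53], extended LBP [38], and the local directional pattern [35]. As a reference point, a minimal NumPy sketch of the plain 3x3 LBP that these variants extend (illustrative, not any cited author's implementation):

```python
import numpy as np

def lbp_image(gray: np.ndarray) -> np.ndarray:
    """Basic 3x3 local binary pattern codes (interior pixels only).

    Each pixel is compared against its eight neighbours; every neighbour
    that is >= the centre contributes one bit, giving an 8-bit texture code.
    """
    c = gray[1:-1, 1:-1].astype(np.int16)
    # Eight neighbour offsets, clockwise from the top-left corner.
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        nb = gray[1 + dy:gray.shape[0] - 1 + dy,
                  1 + dx:gray.shape[1] - 1 + dx].astype(np.int16)
        code |= (nb >= c).astype(np.uint8) << bit
    return code

def lbp_histogram(gray: np.ndarray) -> np.ndarray:
    """256-bin normalized histogram of LBP codes: the texture descriptor."""
    h = np.bincount(lbp_image(gray).ravel(), minlength=256).astype(np.float64)
    return h / h.sum()
```

The normalized histogram of codes, rather than the code image itself, is what is typically fed to a classifier such as the SVM in [38].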
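Sabri [57] pairs support vector regression with gray-level co-occurrence matrix (GLCM) features. A small NumPy sketch of a single-offset GLCM and one classic feature derived from it; the offset and quantization level here are illustrative choices, not those of [57]:

```python
import numpy as np

def glcm(gray: np.ndarray, dx: int = 1, dy: int = 0, levels: int = 8) -> np.ndarray:
    """Normalized grey-level co-occurrence matrix for one pixel offset.

    Intensities are first quantized to `levels` bins; entry (i, j) is the
    relative frequency with which level i co-occurs with level j at
    offset (dy, dx).
    """
    q = gray.astype(np.int32) * levels // 256
    a = q[max(0, -dy):q.shape[0] - max(0, dy), max(0, -dx):q.shape[1] - max(0, dx)]
    b = q[max(0, dy):q.shape[0] - max(0, -dy), max(0, dx):q.shape[1] - max(0, -dx)]
    m = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(m, (a.ravel(), b.ravel()), 1)  # accumulate co-occurrence counts
    return m / m.sum()

def glcm_contrast(m: np.ndarray) -> float:
    """Contrast feature: sum over (i - j)^2 * P(i, j)."""
    i, j = np.indices(m.shape)
    return float(((i - j) ** 2 * m).sum())
```

A feature vector for classification would typically stack several such statistics (contrast, homogeneity, energy, correlation) over multiple offsets and directions.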