| Retrieved from | Authors, year | Emotion classifier | Dataset | Preprocessing | Feature extraction | Results | Strength | Limitation |
|---|---|---|---|---|---|---|---|---|
| IEEE Xplore | G. Li and Wang, 2018 [30] | SVM+CNN | FER-2013 | Histogram equalization; Dlib frontal_face_detector for face detection | Method based on global and geometric features | N/A | Analyzes learners' facial images in real time | Focus only on general emotions |
| IEEE Xplore | El Hammoumi et al., 2018 [31] | CNN | CK+ and KDEF | OpenCV | CNN | Accuracy: 97.18% | High detection accuracy | Focus only on general emotions |
| IEEE Xplore | Ma et al., 2018 [32] | CNN | FER-2013 | Cascaded Haar-feature real-time face detection based on AdaBoost | CNN | Identification error score: 0.14 | Real-time detection | Focus only on general emotions |
| ScienceDirect | D. Yang et al., 2018 [33] | Neural network | JAFFE | Haar cascades | Neural network | Accuracy: (i) Sad: 76% (ii) Surprise: 87.72% (iii) Happy: 94% (iv) Anger: 87.66% (v) Disgust: 82.76% (vi) Fear: 79.73% | High classification accuracy for the "happy" emotion | (i) Long processing time (ii) Does not account for image illumination and pose |
| IEEE Xplore | Candra Kirana et al., 2018 [34] | Viola-Jones algorithm | Ten student expressions in the class | Viola-Jones algorithm | Viola-Jones algorithm | Accuracy: 74% | Fastest of the three compared algorithms (vs. Viola-Jones+neural networks and neural networks alone) | (i) Small dataset (ii) Lower detection accuracy than more complex algorithms (Viola-Jones+neural networks) (iii) Works only on forward-facing faces |
| IEEE Xplore | Dewan et al., 2018 [35] | DBN | DAiSEE | Viola-Jones algorithm | Local directional pattern (LDP) | Accuracy: (i) Two-level: 90.89% (ii) Three-level: 87.25% | (i) High accuracy for two-level engagement detection (ii) Robust classification | Unknown direct correlation between engagement and actual task performance |
| Scopus | Sharma and Mansotra, 2019 [36] | CNN | FER-2013 | Viola-Jones algorithm | Haar cascade extraction | N/A | Analyzes emotions in real time | Focus only on general emotions |
| Scopus | Sharma and Mansotra, 2019 [37] | CNN | FER-2013 and an emotional corpus | Viola-Jones algorithm | Haar cascade extraction | N/A | Better classification accuracy | Focus only on general emotions |
| IEEE Xplore | Mao et al., 2019 [38] | SVM | 617 images from 19 students (12 expressions) | Gray processing, histogram equalization, and scale normalization | Extended LBP algorithm (ELBP) | Accuracy: 98.16% | Outperforms the original LBP algorithm | Focus only on general emotions |
| ScienceDirect | Hung et al., 2019 [39] | CNN | FER-2013, JAFFE, and KDEF | AdaBoost | CNN | Accuracy: 84.59% | High recognition accuracy | Focus only on general emotions |
| IEEE Xplore | Lasri et al., 2019 [40] | CNN | FER-2013 | AdaBoost | CNN | Accuracy: 70% | Good at predicting happy and surprised emotions | Focus only on general emotions |
| Scopus | Tang et al., 2019 [41] | CNN | FER-2013 | Conversion of each grayscale face image's pixels into a string | CNN | Accuracy: 70.10% | (i) High detection accuracy and robustness (ii) Real-time evaluation of students' classroom performance | Focus only on general emotions |
| IEEE Xplore | Shi et al., 2019 [42] | CNN-SVM | 82 students taking different online courses | Viola-Jones face detection, image rotation, and image scaling normalization | CNN | Accuracy: 93.80% | High predictive performance | Considers only two levels of confusion |
| IEEE Xplore | Healy et al., 2019 [43] | SVM | CK+ and MUG | Dlib library (CNN-based face detection) | Dlib extraction of the 68 landmark points from the detected face | Accuracy: 88.76% | Quick and reliable classification | (i) Focus only on general emotions (ii) Computationally intensive |
| Google Scholar | Liang, 2019 [44] | SVM | Yale face database (Kyushu University, Japan), BioID Face Database, and JAFFE | Normalization of feature points | Active shape model (ASM) | Accuracy: 79.65% | Good recognition rate for some facial expressions | Focus only on general emotions |
| IEEE Xplore | Dash et al., 2019 [45] | CNN | DAiSEE | Viola-Jones algorithm | CNN | Accuracy: (i) Engaged: (ii) Not engaged: | High detection accuracy | Focus only on engagement detection |
| Google Scholar | Bian et al., 2019 [46] | CNN | OL-SFED | VGG16 | CNN | Accuracy: 91.60% | High detection accuracy | Limited sample number |
| IEEE Xplore | Huang et al., 2019 [47] | BERN (combination of temporal convolution, bidirectional LSTM, and attention mechanism) | DAiSEE | N/A | Tracks changes in face position and pixels through deep learning | Top-1 accuracy: 60% for the four-class model | State-of-the-art performance versus the benchmark (57.9%) | Requires a large amount of training data and a long training time |
| Springer | T. S. and Guddeti, 2020 [48] | Hybrid CNN | Self-created dataset (engaged, boredom, and neutral) | Deletion of blurred and repeated frames | N/A | Accuracy: (i) Posed: 86% (ii) Spontaneous: 70% | Outperforms existing state-of-the-art methods | Long training time |
| Google Scholar | Mohamad Nezami et al., 2020 [49] | CNN | 4627 engaged and disengaged samples | CNN-based face detection | N/A | Precision: 60.42% | Outperforms their previous engagement recognition method | Focus only on engagement detection |
| Scopus | Alrayassi and Shilbayeh, 2020 [50] | CNN | 15 random students seeking admission to ADSM | AdaBoost | PCA algorithm | Accuracy: 56.60% | Detects happiness and no emotion | (i) Focus only on general emotions (ii) Difficulty detecting sad and surprised emotions |
| Google Scholar | Tang et al., 2020 [51] | CNN | JAFFE and CK+ | CNN | CNN | Accuracy: (i) JAFFE: 92.68% (ii) CK+: 99.10% | High recognition accuracy | Focus only on general emotions |
| Google Scholar | Leong, 2020 [52] | LSTM | DAiSEE | MTCNN library for face detection and cropping | FaceNet model | Accuracy versus the EmotioNet model: (i) Boredom: +16.26% (ii) Frustration: -2.42% | Improved accuracy for the boredom emotion | (i) Decreased accuracy for the frustration emotion (ii) Involves negative emotions only |
| Springer | Zatarain Cabada et al., 2020 [53] | CNN | Database Insight (dbI) | CNN | Local binary patterns (LBP), geometric-based feature extraction, and convolutional filters (CF) | Accuracy: 82% | Demonstrates an 8% accuracy improvement over previous work that used a trial-and-error method | Imbalanced dataset |
| Springer | Zhu and Chen, 2020 [54] | Hybrid deep neural network (hybrid DNN) | JAFFE, CK+, and FER-2013 | Eightfold data augmentation: the original image, a flipped image, and rotated images at six angles | Face++ detect API | Accuracy: 83.90% | High recognition accuracy | Requires a large amount of training data |
| Scopus | Wang et al., 2020 [7] | CNN | CK+, DISFA, and DISFA+ | IntraFace | CNN | N/A | Performs robustly in various environments | Focus only on general emotions |
| IEEE Xplore | Dubbaka and Gopalan, 2020 [55] | CNN | DISFA+ | (i) Grayscale conversion and reduction to the pixel input dimensions (ii) Splitting into upper, lower, and whole face | Dlib package | Accuracy: 95% | High detection accuracy | Face obstructions limit the model's performance |
| Springer | Pise et al., 2020 [56] | Temporal relation network (TRN) | 30 samples of different individuals' frontal images with four emotion types | Scaling, alignment, and normalization of the samples | Base (SqueezeNet) CNN | Accuracy: 91.30% | High detection accuracy | (i) Prone to underfitting due to the small dataset (ii) Focus only on general emotions |
| Google Scholar | Sabri, 2020 [57] | Support vector regression (SVR) | JAFFE | Viola-Jones algorithm | Gray-level co-occurrence matrix (GLCM) | Accuracy: 99.16% | (i) Less susceptible to overfitting (ii) High detection accuracy | (i) Focus only on general emotions (ii) Computationally expensive |
| Scopus | Kumar et al., 2020 [58] | SVM | JAFFE, CK+, and FER-2013 | Kanade-Lucas-Tomasi algorithm | Gabor filter | Accuracy: 62% | Includes depression detection | Focus only on general emotions |
| IEEE Xplore | Murugappan et al., 2020 [59] | Extreme learning machine (ELM) and probabilistic neural network (PNN) | 55 undergraduate university students | Viola-Jones algorithm | Mathematical model placing ten virtual markers at defined locations on the subject's face | Accuracy: (i) ELM: 88% (ii) PNN: 92% | High detection accuracy | Only a simple distance measure is used for emotion classification |
| IEEE Xplore | Murugappan et al., 2020 [60] | K-nearest neighbors (KNN) and decision tree | 55 subjects with six emotion types | Viola-Jones algorithm | Mathematical model placing ten virtual markers at defined locations on the subject's face | Accuracy: 98.03% | (i) Lower computational complexity (ii) High detection accuracy | Focus only on general emotions |
| Google Scholar | Rao and Rao, 2020 [61] | CNN | DAiSEE, JAFFE, and CK+ | (i) Each video cut into frames (ii) Resize to an image of size pixels (iii) Limited histogram equalization | CNN + pose estimator | Accuracy: (i) DAiSEE: 53.4% (ii) JAFFE: 71.4% (iii) CK+: 99.95% | High detection accuracy on the CK+ dataset | Low recognition rate for frustration on the DAiSEE dataset |
| Google Scholar | Hingu, 2020 [62] | CNN | FER-2013 | Haar cascades | CNN | Accuracy: (i) Training set: 65% (ii) Validation set: 63% | The feature extraction method outperforms the existing traditional approach | (i) Focus only on general emotions (ii) Low detection accuracy |
| IEEE Xplore | Zakka and Vadapalli, 2020 [63] | CNN | FER-2013 | Haar cascades | CNN | Accuracy: 64.43% | Detects emotions in real time | (i) Focus only on general emotions (ii) Low recognition accuracy |
| Springer | Liao et al., 2021 [64] | Deep facial spatiotemporal network (DFSTN) | DAiSEE | MTCNN | SE-ResNet-50 (SENet) | Accuracy: 58.84% | (i) Enhanced prediction even in challenging circumstances (ii) Fuses facial spatiotemporal information | (i) Low detection accuracy (ii) Data deficiencies and imbalances |
| IEEE Xplore | Siam et al., 2021 [65] | CNN | FER-2013 | (i) Resize images into (ii) Image augmentation (iii) Normalization | CNN | Accuracy: 69% | Generates reviews from an image with multiple faces | (i) Focus only on general emotions (ii) Low detection accuracy |
| Google Scholar | Li et al., 2021 [66] | CNN | FER-2013 | Haar cascades | CNN | Accuracy: 72.4% | Less complex model | Focus only on general emotions |
| IEEE Xplore | Mohan et al., 2021 [67] | Deep CNN (DCNN) | FER-2013, JAFFE, CK+, KDEF, and RAF | (i) Rotation by +5° (ii) Rotation by -5° (iii) Horizontal flip (iv) Gaussian noise | DCNN | Accuracy: (i) FER-2013: 78% (ii) JAFFE: 98% (iii) CK+: 98% (iv) KDEF: 96% (v) RAF: 83% | Outperforms 25 baseline methods when average time is considered | Performance is generally below that of FER in a lab-controlled environment |
| Springer | Mohan et al., 2021 [68] | CNN | FER-2013, JAFFE, CK+, KDEF, and RAF | Image resizing using bilinear interpolation | CNN | Accuracy: (i) FER-2013: 78.9% (ii) JAFFE: 96.7% (iii) CK+: 97.8% (iv) KDEF: 82.5% (v) RAF: 81.68% | Better accuracy and execution time than 21 state-of-the-art methods | Focus only on general emotions |
| Springer | Mohan and Seal, 2021 [69] | SVM, RF, KNN, MLP, and AdaBoost | Real Life (RL) dataset and Bag-of-Lies | (i) Facial regions of the selected frames cropped (ii) Images reshaped using bilinear interpolation (iii) LDP face images concatenated and resized using bilinear interpolation | N/A | Accuracy: (i) Bag-of-Lies (video + audio + EEG signal data): 70% (ii) RL dataset (video + audio data): 76.07% | Combining modalities is consistent with deception detection | Small datasets |
| IEEE Xplore | Mohan et al., 2022 [70] | Deep CNN (DCNN) | RL trial, Bag-of-Lies, and MU3D | (i) Facial regions of the selected frames cropped (ii) Images reshaped using bilinear interpolation (iii) LDP face images concatenated and resized using bilinear interpolation | DCNN | Accuracy: (i) RL trial: 97% (ii) Bag-of-Lies: 96% (iii) MU3D: 98% | High detection accuracy | Data scarcity |
| Springer | Shen et al., 2022 [71] | Squeeze-and-excitation deep adaptation network (SE-DAN) | JAFFE, CK+, and RAF-DB | Random rotation and horizontal flip on RAF-DB | SE-CNN | Accuracy: 56% | (i) Higher accuracy than AlexNet, VGG-16, SE-CNN, and DAN (ii) Usable for transfer learning and domain adaptation | Focus only on general emotions |
| Springer | Gupta et al., 2022 [72] | DCNN (ResNet-50, VGG19, Inception-V3) | FER-2013, CK+, RAF-DB, and their own dataset (1800 coloured images with the emotions angry, sad, happy, neutral, surprised, and fear) | Automatic frame selection | MediaPipe face mesh | Accuracy: (i) ResNet-50: 92.32% (ii) VGG19: 90.14% (iii) Inception-V3: 89.11% | Outperforms all other models for FER in real-time learning scenarios | Focus only on engagement detection |
| IEEE Xplore | Savchenko et al., 2022 [73] | CNN | AffectNet | Rotation of cropped facial images to align them by eye position | CNN | Accuracy: 70.23% | Much faster; suitable for real-time processing | Less accurate than the best-known multimodal ensembles on the AFEW and VGAF datasets |
| Google Scholar | Hou et al., 2022 [74] | CNN (VGG16) | FER-2013 and CK+ | MTCNN | VGG16 | Accuracy: (i) FER-2013: 67.4% (ii) CK+: 99.18% | VGG16+ECANet is 2.76% more accurate than VGG16 alone | Slow running speed |
| Google Scholar | Yuan, 2022 [75] | MTCNN | RAF-DB, a masked dataset, and a classroom dataset | Histogram equalization | MTCNN | Accuracy: 93.53% | (i) Detects multiple faces in a single image (ii) Alleviates data storage pressure and reduces collection workload | Small dataset sample size |
| Google Scholar | Wu, 2022 [76] | CNN | Self-collected dataset of 1073 images with the expressions boredom, surprise, happy, confusion, and neutral | (i) Conversion to grayscale (ii) Resize into pixels | CNN | Accuracy: 80% | Performs well on happy, surprised, and neutral emotions | Dataset insufficiency |
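Histogram equalization appears repeatedly in the preprocessing column above ([30], [38], [61], [75]), typically to normalize contrast across faces captured under uneven classroom lighting. A minimal NumPy sketch of the general technique (not any cited author's exact pipeline):

```python
import numpy as np

def equalize_histogram(gray: np.ndarray) -> np.ndarray:
    """Classic histogram equalization for an 8-bit grayscale image.

    Spreads the intensity distribution over the full 0-255 range so
    low-contrast face images present more uniform contrast before
    face detection or feature extraction.
    """
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    # Map each grey level through the normalized CDF (ignoring empty bins).
    cdf_masked = np.ma.masked_equal(cdf, 0)
    cdf_scaled = (cdf_masked - cdf_masked.min()) * 255 / (cdf_masked.max() - cdf_masked.min())
    lut = np.ma.filled(cdf_scaled, 0).astype(np.uint8)
    return lut[gray]  # apply the lookup table pixel-wise
```

Applying this to a low-contrast image stretches its dynamic range toward the full 0-255 interval while preserving the ordering of grey levels.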
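Several of the surveyed methods build texture descriptors from local binary patterns and their variants: LBP [53], extended LBP [38], and the local directional pattern [35]. As a reference point, a minimal NumPy sketch of the plain 3x3 LBP that these variants extend (illustrative, not any cited author's implementation):

```python
import numpy as np

def lbp_image(gray: np.ndarray) -> np.ndarray:
    """Basic 3x3 local binary pattern codes (interior pixels only).

    Each pixel is compared against its eight neighbours; every neighbour
    that is >= the centre contributes one bit, giving an 8-bit texture code.
    """
    c = gray[1:-1, 1:-1].astype(np.int16)
    # Eight neighbour offsets, clockwise from the top-left corner.
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        nb = gray[1 + dy:gray.shape[0] - 1 + dy,
                  1 + dx:gray.shape[1] - 1 + dx].astype(np.int16)
        code |= (nb >= c).astype(np.uint8) << bit
    return code

def lbp_histogram(gray: np.ndarray) -> np.ndarray:
    """256-bin normalized histogram of LBP codes: the texture descriptor."""
    h = np.bincount(lbp_image(gray).ravel(), minlength=256).astype(np.float64)
    return h / h.sum()
```

The normalized histogram of codes, rather than the code image itself, is what is typically fed to a classifier such as the SVM in [38].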
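Sabri [57] pairs support vector regression with gray-level co-occurrence matrix (GLCM) features. A small NumPy sketch of a single-offset GLCM and one classic feature derived from it; the offset and quantization level here are illustrative choices, not those of [57]:

```python
import numpy as np

def glcm(gray: np.ndarray, dx: int = 1, dy: int = 0, levels: int = 8) -> np.ndarray:
    """Normalized grey-level co-occurrence matrix for one pixel offset.

    Intensities are first quantized to `levels` bins; entry (i, j) is the
    relative frequency with which level i co-occurs with level j at
    offset (dy, dx).
    """
    q = gray.astype(np.int32) * levels // 256
    a = q[max(0, -dy):q.shape[0] - max(0, dy), max(0, -dx):q.shape[1] - max(0, dx)]
    b = q[max(0, dy):q.shape[0] - max(0, -dy), max(0, dx):q.shape[1] - max(0, -dx)]
    m = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(m, (a.ravel(), b.ravel()), 1)  # accumulate co-occurrence counts
    return m / m.sum()

def glcm_contrast(m: np.ndarray) -> float:
    """Contrast feature: sum over (i - j)^2 * P(i, j)."""
    i, j = np.indices(m.shape)
    return float(((i - j) ** 2 * m).sum())
```

A feature vector for classification would typically stack several such statistics (contrast, homogeneity, energy, correlation) over multiple offsets and directions.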