Review Article

Academic Emotion Classification Using FER: A Systematic Review

Table 3

Summary of the advantages and disadvantages of the emotion classifier algorithms.

Algorithm | Advantage | Disadvantage

FER classification method: conventional machine learning algorithms

Viola-Jones algorithm | Fast face detection [34] | Lower detection accuracy than more complex algorithms [34]
Support vector machine (SVM) | High accuracy in classification tasks [77] | Computationally intensive, especially for large datasets or complex models [77]
Support vector regression (SVR) | Less susceptible to overfitting [57] | Computationally expensive, particularly for large datasets [78]
Extreme learning machine (ELM) | Performs classification quickly [59] | Takes more computational time and has lower accuracy than PNN [59]
Probabilistic neural network (PNN) | Takes less computational time than ELM and is more efficient in classification [59] | Possible overfitting of the data [59]
Decision tree | Can handle missing data by not incorporating the missing feature during decision-making [79] | More computation time to build than KNN [60]
K-nearest neighbors (KNN) | Higher accuracy than the decision tree algorithm, with lower training-time cost [60] | Slower than a trained decision tree when classifying new samples [60]
Multilayer perceptron (MLP) | Capable of adaptive learning and optimized processing [80] | Lower classification accuracy than the random forest algorithm [69]
Adaptive boosting (AdaBoost) | Builds a strong classifier out of weak learners [81] | Susceptible to noisy data [82]
Random forest (RF) | Robust to noisy data and outliers [83] | Prone to overfitting [84]

FER classification method: deep learning algorithms

Convolutional neural network (CNN) | Effective at handling complex image and video data [61] | Requires a large amount of training data and significant data augmentation to avoid overfitting [85]
Neural network | Able to achieve high classification accuracy [33] | Long processing time [33]
Deep belief network (DBN) | Robust in classification [35] | Requires large amounts of training data [35]
Long short-term memory (LSTM) | Addresses the vanishing gradient problem [52] | Slow computation for complex architectures [52]
Temporal relation network (TRN) | Achieves state-of-the-art performance on FER benchmarks [56] | Prone to overfitting or underfitting on small datasets [56]
Deep facial spatiotemporal network (DFSTN) | Able to fuse facial spatiotemporal information [64] | Requires larger amounts of training data to learn effective feature representations and avoid overfitting [64]
Deep CNN (DCNN) | Effective at learning complex features from raw image data [86] | Difficult to interpret: the mechanisms behind the model's decision-making process can be hard to understand [87]
Squeeze and excitation-deep adaptation network (SE-DAN) | Suited to transfer learning and domain adaptation [71] | Requires significant computational resources and time to train [71]
Multitask cascaded convolutional neural network (MTCNN) | High accuracy in detection and classification tasks; able to detect multiple faces in a single image [75] | Requires a large amount of training data to achieve high accuracy [75]

FER classification method: hybrid algorithms

Support vector machine + convolutional neural network (SVM+CNN) | Better classification performance than either algorithm alone [30] | Hyperparameter tuning across the two combined algorithms is challenging and time-consuming [88]
BERN (combination of temporal convolution, bidirectional LSTM, and attention mechanism) | Achieved state-of-the-art performance [47] | Requires a large amount of training data and a long training time [47]
Hybrid convolutional neural network (hybrid CNN) | More robust to variations in input data [89] | Long training time [48]
Hybrid deep neural network (hybrid DNN) | Handles a wide range of data types and classification tasks [90] | Requires a large amount of training data [54]
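The trade-off noted for KNN in the table, little to no training cost but a full scan of the stored data at classification time, can be made concrete with a minimal sketch in plain Python. The 2D feature points and emotion labels below are illustrative toy values, not data from any of the cited studies:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of ((x, y), label) pairs. There is no training step:
    all the work happens at prediction time, which is why KNN slows down
    as the stored dataset grows."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy 2D "feature" points with two emotion labels (illustrative only)
train = [((0.0, 0.0), "neutral"), ((0.1, 0.2), "neutral"),
         ((0.9, 1.0), "happy"), ((1.1, 0.8), "happy"), ((1.0, 1.1), "happy")]

print(knn_predict(train, (0.95, 0.9)))  # the 3 nearest points are all "happy"
```

Every call to `knn_predict` sorts the whole training list, so prediction cost grows with dataset size, while a decision tree pays its computation cost once at build time and then classifies quickly.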
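AdaBoost's advantage in the table, building a strong classifier out of weak learners [81], can also be sketched in a few lines of plain Python. The weak learner here is a one-dimensional threshold "stump", and the data is an illustrative toy pattern, not from any of the reviewed studies:

```python
import math

def train_adaboost(xs, ys, rounds=5):
    """AdaBoost over 1-D threshold stumps (labels are +1/-1).

    Each round picks the stump with the lowest weighted error, then boosts
    the weights of the points it misclassified so the next stump focuses
    on them."""
    n = len(xs)
    w = [1.0 / n] * n                      # uniform initial weights
    ensemble = []                          # list of (alpha, threshold, sign)
    for _ in range(rounds):
        best = None
        for t in xs:                       # candidate thresholds at the data
            for s in (1, -1):              # stump predicts s if x >= t else -s
                err = sum(wi for wi, x, y in zip(w, xs, ys)
                          if (s if x >= t else -s) != y)
                if best is None or err < best[0]:
                    best = (err, t, s)
        err, t, s = best
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))
        ensemble.append((alpha, t, s))
        # increase weight on misclassified points, then renormalise
        w = [wi * math.exp(-alpha * y * (s if x >= t else -s))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * (s if x >= t else -s) for a, t, s in ensemble)
    return 1 if score >= 0 else -1

xs, ys = [0, 1, 2, 3], [1, -1, -1, 1]      # no single threshold separates this
model = train_adaboost(xs, ys)
print([predict(model, x) for x in xs])     # → [1, -1, -1, 1]
```

No single stump can label all four points correctly, but the boosted ensemble does, which is the "weak learners combined into a strong one" behaviour the table describes. The noted susceptibility to noisy data [82] is also visible in the sketch: misclassified points, including mislabelled ones, receive exponentially growing weight.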