Research Article

A Hybrid Feature Extraction Method for Nepali COVID-19-Related Tweets Classification

Table 3

Comparison of the three feature extraction methods across nine machine learning algorithms in terms of classification performance (%).

Classifier     | FastText                | TF-IDF                  | Hybrid
               | P     R     F     A     | P     R     F     A     | P     R     F     A
---------------+-------------------------+-------------------------+------------------------
LR             | 58.4  58.5  54.4  58.5  | 62.9  64.4  60.7  64.4  | 65.9  68.1  65.9  68.1
K-NN           | 65.2  65.2  65.2  60.2  | 58.0  61.0  58.0  60.1  | 64.1  66.0  63.6  66.0
NB             | 54.1  50.0  50.1  49.6  | 65.1  52.1  55.0  52.0  | 62.4  55.5  57.7  55.6
DT             | 48.4  48.0  48.2  48.0  | 58.1  60.5  58.6  59.5  | 57.0  57.9  57.4  57.9
RF             | 63.1  65.4  62.5  64.8  | 63.1  65.1  62.0  64.0  | 68.7  67.9  64.6  67.9
ETC            | 63.5  61.6  58.4  61.6  | 63.1  64.9  62.7  64.9  | 69.2  67.9  64.8  67.9
AdaBoost       | 54.2  56.1  52.9  56.1  | 63.7  62.7  58.2  59.6  | 60.1  62.9  60.1  62.9
MLP-NN         | 61.3  60.9  56.5  60.9  | 58.6  60.6  59.2  60.6  | 66.5  66.4  66.4  66.4
SVM + Linear   | 51.2  58.3  52.2  57.8  | 62.1  64.3  59.2  64.2  | 66.0  68.0  64.6  68.0
SVM + RBF      | 62.1  58.1  53.0  58.0  | 66.0  66.0  62.1  65.1  | 71.4  72.1  70.1  72.1

Note that P, R, F, and A denote the overall Precision, Recall, F1-score, and Accuracy, respectively, for the three feature extraction methods (FastText, TF-IDF, and Hybrid). The best hyperparameters of each machine learning algorithm are as follows: LR (C: 10, solver: lbfgs, max_iteration: 2000); K-NN (leaf_size: 35, n_neighbour: 120, p: 1); DT (criterion: gini, min_sample_leaf: 10, min_sample_split: 2); RF (min_sample_split: 6, min_sample_leaf: 3); ETC (min_sample_leaf: 1, min_sample_split: 2, n_estimator: 200); AdaBoost (learning_rate: 0.8, n_estimator: 100); MLP-NN (hidden_layer_size: 20, learning_rate_init: 0.01, solver: adam, max_iteration: 2000); SVM + Linear (C: 1, gamma: 0.1); and SVM + RBF (C: 100, gamma: 0.1). The highest value of every metric is obtained with the Hybrid features, with SVM + RBF performing best overall.
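
As a rough, illustrative sketch (not the authors' exact pipeline), the snippet below shows how a hybrid representation and the best-performing configuration reported above (SVM + RBF with C = 100 and gamma = 0.1) could be assembled with scikit-learn and Gensim. The choice to build the hybrid features by concatenating TF-IDF vectors with averaged FastText word vectors, the 300-dimensional embedding size, and the toy tweet examples are assumptions made for illustration only.

# Illustrative sketch only: the concatenation-based hybrid features, the
# FastText settings, and the toy data are assumptions, not the paper's code.
import numpy as np
from gensim.models import FastText
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Hypothetical pre-tokenized Nepali tweets with binary class labels.
tweets = [["कोरोना", "भाइरस", "फैलियो"], ["खोप", "लगाउन", "थालियो"]]
labels = [0, 1]

# TF-IDF features computed over the whitespace-joined tokens.
tfidf = TfidfVectorizer(lowercase=False)
X_tfidf = tfidf.fit_transform([" ".join(t) for t in tweets]).toarray()

# FastText features: average the word vectors of each tweet (embedding size assumed).
ft = FastText(sentences=tweets, vector_size=300, window=5, min_count=1, epochs=10)
X_ft = np.array([np.mean([ft.wv[w] for w in t], axis=0) for t in tweets])

# Hybrid features: concatenate the TF-IDF and FastText representations.
X_hybrid = np.hstack([X_tfidf, X_ft])

# Classifiers instantiated with the hyperparameters listed in the table note.
svm_rbf = SVC(kernel="rbf", C=100, gamma=0.1).fit(X_hybrid, labels)
lr = LogisticRegression(C=10, solver="lbfgs", max_iter=2000).fit(X_hybrid, labels)

print(classification_report(labels, svm_rbf.predict(X_hybrid)))

In practice, the same hybrid feature matrix would be split into training and test sets before fitting, and the precision, recall, F1-score, and accuracy in the table would be computed on the held-out portion.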