Research Article

A Hybrid Feature Extraction Method for Nepali COVID-19-Related Tweets Classification

Table 3

Comparison of the three feature extraction methods across nine machine learning algorithms in terms of classification performance (%).

Classifier     | FastText                | TF-IDF                  | Hybrid
               | P     R     F     A     | P     R     F     A     | P     R     F     A
---------------+-------------------------+-------------------------+------------------------
LR             | 58.4  58.5  54.4  58.5  | 62.9  64.4  60.7  64.4  | 65.9  68.1  65.9  68.1
K-NN           | 65.2  65.2  65.2  60.2  | 58.0  61.0  58.0  60.1  | 64.1  66.0  63.6  66.0
NB             | 54.1  50.0  50.1  49.6  | 65.1  52.1  55.0  52.0  | 62.4  55.5  57.7  55.6
DT             | 48.4  48.0  48.2  48.0  | 58.1  60.5  58.6  59.5  | 57.0  57.9  57.4  57.9
RF             | 63.1  65.4  62.5  64.8  | 63.1  65.1  62.0  64.0  | 68.7  67.9  64.6  67.9
ETC            | 63.5  61.6  58.4  61.6  | 63.1  64.9  62.7  64.9  | 69.2  67.9  64.8  67.9
AdaBoost       | 54.2  56.1  52.9  56.1  | 63.7  62.7  58.2  59.6  | 60.1  62.9  60.1  62.9
MLP-NN         | 61.3  60.9  56.5  60.9  | 58.6  60.6  59.2  60.6  | 66.5  66.4  66.4  66.4
SVM + Linear   | 51.2  58.3  52.2  57.8  | 62.1  64.3  59.2  64.2  | 66.0  68.0  64.6  68.0
SVM + RBF      | 62.1  58.1  53.0  58.0  | 66.0  66.0  62.1  65.1  | 71.4  72.1  70.1  72.1

Note that P, R, F, and A denote the overall Precision, Recall, F1-score, and Accuracy, respectively, for the three feature extraction methods (FastText, TF-IDF, and Hybrid). The best hyperparameters of each machine learning algorithm are as follows: LR (C: 10, solver: lbfgs, max_iteration: 2000); K-NN (leaf_size: 35, n_neighbour: 120, p: 1); DT (criterion: gini, min_sample_leaf: 10, min_sample_split: 2); RF (min_sample_split: 6, min_sample_leaf: 3); ETC (min_sample_leaf: 1, min_sample_split: 2, n_estimator: 200); AdaBoost (learning_rate: 0.8, n_estimator: 100); MLP-NN (hidden_layer_size: 20, learning_rate_init: 0.01, solver: adam, max_iteration: 2000); SVM + Linear (C: 1, gamma: 0.1); and SVM + RBF (C: 100, gamma: 0.1). The highest value of every metric is obtained with the Hybrid features, with SVM + RBF performing best overall.
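
As a rough, illustrative sketch (not the authors' exact pipeline), the snippet below shows how a hybrid representation and the best-performing configuration reported above (SVM + RBF with C = 100 and gamma = 0.1) could be assembled with scikit-learn and Gensim. The choice to build the hybrid features by concatenating TF-IDF vectors with averaged FastText word vectors, the 300-dimensional embedding size, and the toy tweet examples are assumptions made for illustration only.

# Illustrative sketch only: the concatenation-based hybrid features, the
# FastText settings, and the toy data are assumptions, not the paper's code.
import numpy as np
from gensim.models import FastText
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Hypothetical pre-tokenized Nepali tweets with binary class labels.
tweets = [["कोरोना", "भाइरस", "फैलियो"], ["खोप", "लगाउन", "थालियो"]]
labels = [0, 1]

# TF-IDF features computed over the whitespace-joined tokens.
tfidf = TfidfVectorizer(lowercase=False)
X_tfidf = tfidf.fit_transform([" ".join(t) for t in tweets]).toarray()

# FastText features: average the word vectors of each tweet (embedding size assumed).
ft = FastText(sentences=tweets, vector_size=300, window=5, min_count=1, epochs=10)
X_ft = np.array([np.mean([ft.wv[w] for w in t], axis=0) for t in tweets])

# Hybrid features: concatenate the TF-IDF and FastText representations.
X_hybrid = np.hstack([X_tfidf, X_ft])

# Classifiers instantiated with the hyperparameters listed in the table note.
svm_rbf = SVC(kernel="rbf", C=100, gamma=0.1).fit(X_hybrid, labels)
lr = LogisticRegression(C=10, solver="lbfgs", max_iter=2000).fit(X_hybrid, labels)

print(classification_report(labels, svm_rbf.predict(X_hybrid)))

In practice, the same hybrid feature matrix would be split into training and test sets before fitting, and the precision, recall, F1-score, and accuracy in the table would be computed on the held-out portion.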