Research Article
Predicting Coronavirus Pandemic in Real-Time Using Machine Learning and Big Data Streaming System
Table 2
The performance of the DT model.
| Feature extraction method | Matrix size | Testing accuracy | Testing precision | Testing recall | Testing F1-score | Cross-validation accuracy | Cross-validation precision | Cross-validation recall | Cross-validation F1-score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Unigram | 1000 | 78.91 | 78.91 | 78.91 | 78.48 | 82.85 ± 0.37 | 83.32 ± 0.49 | 82.89 ± 0.35 | 82.38 ± 0.56 |
| Unigram | 3000 | 80.31 | 80.09 | 80.31 | 80.06 | 87.09 ± 0.75 | 87.14 ± 0.66 | 87.15 ± 0.66 | 86.84 ± 0.66 |
| Bigram | 1000 | 78.34 | 78.32 | 78.34 | 77.96 | 82.17 ± 0.22 | 82.72 ± 0.29 | 82.09 ± 0.28 | 81.78 ± 0.26 |
| Bigram | 3000 | 81.13 | 80.91 | 81.13 | 80.88 | 85.76 ± 0.5 | 85.86 ± 0.45 | 85.86 ± 0.47 | 85.59 ± 0.58 |
| Trigram | 1000 | 77.92 | 77.92 | 77.92 | 77.53 | 82.23 ± 0.53 | 82.84 ± 0.52 | 82.25 ± 0.48 | 81.93 ± 0.47 |
| Trigram | 3000 | 80.31 | 80.1 | 80.31 | 80.09 | 86.23 ± 0.87 | 86.13 ± 0.86 | 86.09 ± 0.81 | 85.98 ± 0.87 |
| Four-gram | 1000 | 77.97 | 77.96 | 77.97 | 77.6 | 81.32 ± 0.59 | 81.93 ± 0.43 | 81.37 ± 0.49 | 81.14 ± 0.53 |
| Four-gram | 3000 | 80.37 | 80.15 | 80.37 | 80.09 | 85.73 ± 0.75 | 85.66 ± 0.74 | 85.73 ± 0.82 | 85.45 ± 0.72 |