Research Article
An Earthquake Emergency Web Data Cleaning and Classification Method Based on Word Frequency and Position Weighting
Table 6
Comparison of model training results and efficiency.
| N-gram model (N) | Number of grams | Feature factor selection in P-TF-IDF algorithm | Accuracy | Training time (s) | Percentage of words retained after P-TF-IDF filtering (%) | Word count |
| --- | --- | --- | --- | --- | --- | --- |
| N = 2 (Bi-gram) | 2.2 × 10^9 | Word frequency factor and location factor added | 0.9026 | 27.52 | 25 | 47071 |
| N = 3 (Tri-gram) | 1.0 × 10^14 | Word frequency factor and location factor added | 0.9015 | 34.47 | 25 | 47071 |
| N = 4 (4-gram) | 4.9 × 10^18 | Word frequency factor and location factor added | 0.9015 | 47.17 | 25 | 47071 |
| N = 5 (5-gram) | 2.3 × 10^23 | Word frequency factor and location factor added | 0.8997 | 52.55 | 25 | 47071 |
| N = 2 (Bi-gram) | 8.4 × 10^9 | Word frequency factor and location factor added | 0.9097 | 20.69 | 50 | 91711 |
| N = 3 (Tri-gram) | 7.7 × 10^14 | Word frequency factor and location factor added | 0.9073 | 31.22 | 50 | 91711 |
| N = 4 (4-gram) | 7.0 × 10^19 | Word frequency factor and location factor added | 0.9056 | 33.64 | 50 | 91711 |
| N = 5 (5-gram) | 6.4 × 10^24 | Word frequency factor and location factor added | 0.9038 | 45.79 | 50 | 91711 |
| N = 2 (Bi-gram) | 1.7 × 10^15 | Word frequency factor and location factor added | 0.9021 | 23.52 | 75 | 132660 |
| N = 3 (Tri-gram) | 2.3 × | Word frequency factor and location factor added | 0.9021 | 36.67 | 75 | 132660 |
| N = 4 (4-gram) | 3.0 × | Word frequency factor and location factor added | 0.9003 | 48.61 | 75 | 132660 |
| N = 5 (5-gram) | 4.1 × | Word frequency factor and location factor added | 0.8985 | 58.81 | 75 | 132660 |
| N = 2 (Bi-gram) | 3.6 × | None | 0.9021 | 28.26 | 100 | 191297 |
| N = 3 (Tri-gram) | 7.0 × | None | 0.9011 | 36.31 | 100 | 191297 |
| N = 4 (4-gram) | 1.3 × | None | 0.9001 | 49.59 | 100 | 191297 |
| N = 5 (5-gram) | 2.5 × 10^26 | None | 0.8937 | 68.58 | 100 | 191297 |
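The "Number of grams" values for the 25% and 50% vocabularies appear to match the size of the possible N-gram space, that is, the word count raised to the power N. This reading is an assumption inferred from those rows; the source does not state how the column was computed. A minimal Python sketch of that calculation, using the hypothetical helpers `ngram_space` and `sci`:

```python
def ngram_space(word_count: int, n: int) -> int:
    """Size of the possible N-gram space, word_count ** n.

    Assumed interpretation of the 'Number of grams' column,
    not a formula given in the source.
    """
    if word_count < 0 or n < 1:
        raise ValueError("word_count must be >= 0 and n must be >= 1")
    return word_count ** n


def sci(value: int) -> str:
    """Format an integer as 'm × 10^e' with a one-decimal mantissa."""
    mantissa, exponent = f"{value:.1e}".split("e")
    return f"{mantissa} × 10^{int(exponent)}"


if __name__ == "__main__":
    # Vocabulary retained at the 25% filtering level, taken from the table.
    for n in range(2, 6):
        print(f"N = {n}: {sci(ngram_space(47071, n))}")
```

Under this assumption, the count grows by a factor equal to the word count for each increment of N, which is consistent with the longer training times reported for larger N.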