Research Article

An Earthquake Emergency Web Data Cleaning and Classification Method Based on Word Frequency and Position Weighting

Table 6

Comparison of model training results and efficiency.

| N-gram model (N) | Number of grams | Feature factor selection in P-TF-IDF algorithm | Accuracy | Training time (s) | Percentage of words retained after P-TF-IDF filtering (%) | Word count |
|---|---|---|---|---|---|---|
| N = 2 (Bi-gram) | 2.2 × 10^9 | Add word frequency factor and location factor | 0.9026 | 27.52 | 25 | 47071 |
| N = 3 (Tri-gram) | 1.0 × 10^14 | Add word frequency factor and location factor | 0.9015 | 34.47 | 25 | 47071 |
| N = 4 (4-gram) | 4.9 × 10^18 | Add word frequency factor and location factor | 0.9015 | 47.17 | 25 | 47071 |
| N = 5 (5-gram) | 2.3 × 10^23 | Add word frequency factor and location factor | 0.8997 | 52.55 | 25 | 47071 |
| N = 2 (Bi-gram) | 8.4 × 10^9 | Add word frequency factor and location factor | 0.9097 | 20.69 | 50 | 91711 |
| N = 3 (Tri-gram) | 7.7 × 10^14 | Add word frequency factor and location factor | 0.9073 | 31.22 | 50 | 91711 |
| N = 4 (4-gram) | 7.0 × 10^19 | Add word frequency factor and location factor | 0.9056 | 33.64 | 50 | 91711 |
| N = 5 (5-gram) | 6.4 × 10^24 | Add word frequency factor and location factor | 0.9038 | 45.79 | 50 | 91711 |
| N = 2 (Bi-gram) | 1.7 × 10^15 | Add word frequency factor and location factor | 0.9021 | 23.52 | 75 | 132660 |
| N = 3 (Tri-gram) | 2.3 × 10^? | Add word frequency factor and location factor | 0.9021 | 36.67 | 75 | 132660 |
| N = 4 (4-gram) | 3.0 × 10^? | Add word frequency factor and location factor | 0.9003 | 48.61 | 75 | 132660 |
| N = 5 (5-gram) | 4.1 × 10^? | Add word frequency factor and location factor | 0.8985 | 58.81 | 75 | 132660 |
| N = 2 (Bi-gram) | 3.6 × 10^? | None | 0.9021 | 28.26 | 100 | 191297 |
| N = 3 (Tri-gram) | 7.0 × 10^? | None | 0.9011 | 36.31 | 100 | 191297 |
| N = 4 (4-gram) | 1.3 × 10^? | None | 0.9001 | 49.59 | 100 | 191297 |
| N = 5 (5-gram) | 2.5 × 10^26 | None | 0.8937 | 68.58 | 100 | 191297 |
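
The gram counts reported for the 25% and 50% groups look consistent with the retained word count raised to the power N. The table does not state this relationship; it is inferred here from the numbers. The Python sketch below works under that assumption and reproduces the gram-count entries for those two groups. The names `possible_ngrams` and `sci` are hypothetical helpers, and the one-decimal truncation (rather than rounding) is likewise an assumption chosen to match the printed values.

```python
# Assumption (inferred, not stated in the source):
#   number of N-grams = (word count after filtering) ** N

def possible_ngrams(word_count: int, n: int) -> int:
    """Number of distinct N-grams that can be formed from `word_count` words."""
    return word_count ** n


def sci(value: int) -> str:
    """Format a positive integer as 'm.m * 10^e', truncating the mantissa to one decimal."""
    exponent = len(str(value)) - 1               # power of ten of the leading digit
    leading_two = value // 10 ** (exponent - 1)  # first two significant digits
    return f"{leading_two // 10}.{leading_two % 10} * 10^{exponent}"


if __name__ == "__main__":
    # Word counts for the 25% and 50% groups, taken from the table.
    for word_count in (47071, 91711):
        for n in (2, 3, 4, 5):
            print(f"words={word_count} N={n}: {sci(possible_ngrams(word_count, n))}")
```

Because the count grows as a power of the word count, filtering the vocabulary before building the N-gram model shrinks the search space by several orders of magnitude for larger N.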