Research Article
Effective Preprocessing and Normalization Techniques for COVID-19 Twitter Streams with POS Tagging via Lightweight Hidden Markov Model
Table 11
Comparison of the proposed method with existing techniques for normalizing abbreviations, repeated characters, and misspelled words.
| Techniques | Abbreviations | Repeated characters | Misspelled words |
| --- | --- | --- | --- |
| Ground truth | 308 | 672 | 1128 |
| Regular expression | — | 462 | — |
| Replace() function using WordNet | 108 | 253 | 749 |
| Expanding abbreviations by CSV file replacement | 247 | — | — |
| NLTK library | 210 | 319 | 561 |
| Proposed model | 281 | 590 | 1036 |
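The counts in Table 11 are easier to compare once each is expressed relative to the ground-truth total for its category. The following is a minimal sketch of that arithmetic, assuming each cell is the number of items a technique normalized correctly out of the ground-truth count (the table itself does not state this). The names `GROUND_TRUTH`, `RESULTS`, and `share_of_ground_truth` are illustrative and do not come from the article.

```python
# Counts transcribed from Table 11; None marks cells with no reported value.
GROUND_TRUTH = {
    "Abbreviations": 308,
    "Repeated characters": 672,
    "Misspelled words": 1128,
}

RESULTS = {
    "Regular expression": {
        "Abbreviations": None, "Repeated characters": 462, "Misspelled words": None},
    "Replace() function using WordNet": {
        "Abbreviations": 108, "Repeated characters": 253, "Misspelled words": 749},
    "Expanding abbreviations by CSV file replacement": {
        "Abbreviations": 247, "Repeated characters": None, "Misspelled words": None},
    "NLTK library": {
        "Abbreviations": 210, "Repeated characters": 319, "Misspelled words": 561},
    "Proposed model": {
        "Abbreviations": 281, "Repeated characters": 590, "Misspelled words": 1036},
}


def share_of_ground_truth(results, ground_truth):
    """Express each count as a percentage of its ground-truth total.

    Assumption: a cell counts correctly normalized items, so the ratio can be
    read as coverage of the ground truth. Unreported cells stay None.
    """
    shares = {}
    for technique, counts in results.items():
        shares[technique] = {
            category: None if count is None else 100.0 * count / ground_truth[category]
            for category, count in counts.items()
        }
    return shares


def format_share(value):
    """Render a percentage, or 'n/a' for an unreported cell."""
    return "n/a" if value is None else f"{value:.1f}%"


SHARES = share_of_ground_truth(RESULTS, GROUND_TRUTH)

for technique, row in SHARES.items():
    cells = ", ".join(f"{cat}: {format_share(val)}" for cat, val in row.items())
    print(f"{technique}: {cells}")
```

Under that reading, the proposed model covers roughly 91% of abbreviations, 88% of repeated characters, and 92% of misspelled words, which is the highest share in every column among the listed techniques.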