Research Article

Validation of Text Data Preprocessing Using a Neural Network Model

Table 1

Text preprocessing techniques.


| Category | Technique | Description | Pros | Cons |
|---|---|---|---|---|
| Normalization | Lowering | Conversion to lowercase | Search accuracy can be improved | Proper nouns composed of capital letters can be incorrectly classified as general nouns |
| Normalization | Stemming | Conversion to stems | Time efficiency can be improved by reducing the size of the text | Dilution of meaning can affect accuracy |
| Normalization | Lemmatization | Conversion to headwords | Part-of-speech information is converted into a preserved form, and search accuracy can be improved | Conversion time is long |
| Punctuation | Splitting | Word splitting | Meaning can be preserved | Different rules should be applied depending on the purpose, and the rules are complicated |
| Punctuation | Merging | Word merging | | |
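
The lowering, stemming, and punctuation-splitting entries of Table 1 can be illustrated with a minimal Python sketch. This is not the implementation used in the article: the function names, the suffix list, and the splitting rule are all assumptions chosen for illustration, and the stemmer is a toy suffix stripper rather than an established algorithm such as Porter's.

```python
import re

def lowercase(tokens):
    """Lowering: map every token to lowercase."""
    return [t.lower() for t in tokens]

# Hypothetical suffix list for a toy stemmer (an assumption, not Porter's rules).
SUFFIXES = ("ing", "ed", "es", "s")

def toy_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def split_punctuation(text):
    """Splitting: separate runs of word characters from punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

tokens = split_punctuation("The US team, us included, was running.")
lowered = lowercase(tokens)             # "US" and "us" become the same token
stems = [toy_stem(t) for t in lowered]  # "running" is reduced to "runn"
```

In this sketch, lowering merges the proper noun "US" with the pronoun "us", and the toy stemmer reduces "running" to the non-word "runn". These correspond to the cons listed for lowering and stemming, respectively.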