Mathematical Problems in Engineering
Volume 2016, Article ID 7819626, 12 pages
http://dx.doi.org/10.1155/2016/7819626
Research Article

Improved Feature Weight Algorithm and Its Application to Text Classification

1 School of Computer Science, Communication University of China, Beijing 100024, China
2 School of Computer, Faculty of Science and Engineering, Communication University of China, Beijing 100024, China

Received 2 November 2015; Revised 1 February 2016; Accepted 3 March 2016

Academic Editor: Andrzej Swierniak

Copyright © 2016 Songtao Shang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Text preprocessing is one of the key problems in pattern recognition and plays an important role in text classification. It has two pivotal steps: feature selection and feature weighting. Because the preprocessing results directly affect a classifier’s accuracy and performance, choosing appropriate algorithms for feature selection and feature weighting can greatly improve classification performance. Based on Gini Index theory, this paper proposes an Improved Gini Index algorithm that constructs a new feature selection and feature weighting function. Experimental results show that the algorithm effectively improves classifier performance. The algorithm is also applied to a sensitive information identification system, where it achieves good results: its precision and recall are higher than those of traditional algorithms, and it effectively identifies sensitive information on the Internet.