ISRN Artificial Intelligence
Volume 2013 (2013), Article ID 829630, 17 pages
http://dx.doi.org/10.1155/2013/829630
Research Article

Gamma-Poisson Distribution Model for Text Categorization

Department of Information Science, Faculty of Arts and Sciences, Showa University, 4562 Kamiyoshida, Fujiyoshida City, Yamanashi 403-0005, Japan

Received 29 January 2013; Accepted 4 March 2013

Academic Editors: K. W. Chau, C. Chen, G. L. Foresti, and M. Loog

Copyright © 2013 Hiroshi Ogura et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

We introduce a new model for describing word frequency distributions in documents for automatic text classification tasks. The model uses the gamma-Poisson probability distribution to achieve better text modeling. The modeling framework and its application to text categorization are demonstrated, together with practical techniques for parameter estimation and vector normalization. To investigate the efficiency of our model, text categorization experiments were performed on the 20 Newsgroups, Reuters-21578, Industry Sector, and TechTC-100 datasets. The results show that the model achieves performance comparable to that of the support vector machine and clearly exceeding that of the multinomial and Dirichlet-multinomial models. The time complexity of the proposed classifier and its advantages in practical applications are also discussed.
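As a rough illustration of why a gamma-Poisson distribution may suit word counts, the following minimal sketch evaluates the textbook gamma-Poisson mixture (a Poisson count whose rate is gamma distributed, equivalent to a negative binomial) and compares it with a plain Poisson of the same mean. The function names, the shape/rate parameterization, and the values of `alpha` and `beta` are assumptions chosen for illustration; they are not taken from the article, whose parameterization and estimation procedure may differ.

```python
import math


def gamma_poisson_log_pmf(x, alpha, beta):
    """Log P(x) when x | lam ~ Poisson(lam) and lam ~ Gamma(shape=alpha, rate=beta).

    Standard closed form of the mixture (a negative binomial); this is an
    assumed parameterization, not necessarily the one used in the article.
    """
    return (math.lgamma(x + alpha) - math.lgamma(alpha) - math.lgamma(x + 1)
            + alpha * math.log(beta / (beta + 1.0))
            - x * math.log(beta + 1.0))


def poisson_log_pmf(x, lam):
    """Log P(x) for a plain Poisson distribution with rate lam."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


# Hypothetical parameters: mean count alpha / beta = 2 occurrences per document.
alpha, beta = 0.5, 0.25
mean = alpha / beta

# The mixture is a proper distribution: its probabilities sum to 1.
total = sum(math.exp(gamma_poisson_log_pmf(x, alpha, beta)) for x in range(2000))

# Compare the mass at zero and in the tail against a Poisson of equal mean.
gp_zero = math.exp(gamma_poisson_log_pmf(0, alpha, beta))
po_zero = math.exp(poisson_log_pmf(0, mean))
gp_tail = math.exp(gamma_poisson_log_pmf(10, alpha, beta))
po_tail = math.exp(poisson_log_pmf(10, mean))

print(f"sum of pmf          : {total:.6f}")
print(f"P(x=0)  gamma-Poisson vs Poisson: {gp_zero:.4f} vs {po_zero:.4f}")
print(f"P(x=10) gamma-Poisson vs Poisson: {gp_tail:.6f} vs {po_tail:.6f}")
```

With these illustrative parameters the mixture places more probability both on zero counts and on large counts than a Poisson with the same mean. This overdispersion is the kind of behavior commonly associated with word frequencies, where a term is absent from most documents but repeated many times in a few.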