ISRN Artificial Intelligence
Volume 2013 (2013), Article ID 829630, 17 pages
Gamma-Poisson Distribution Model for Text Categorization
Department of Information Science, Faculty of Arts and Sciences, Showa University, 4562 Kamiyoshida, Fujiyoshida City, Yamanashi 403-0005, Japan
Received 29 January 2013; Accepted 4 March 2013
Academic Editors: K. W. Chau, C. Chen, G. L. Foresti, and M. Loog
Copyright © 2013 Hiroshi Ogura et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
- K. Church and W. A. Gale, “Inverse Document Frequency (IDF): a measure of deviations from Poisson,” in Proceedings of the 3rd Workshop on Very Large Corpora, pp. 121–130, 1995.
- H. Ogura, H. Amano, and M. Kondo, “Feature selection with a measure of deviations from Poisson in text categorization,” Expert Systems with Applications, vol. 36, no. 3, pp. 6826–6832, 2009.
- H. Ogura, H. Amano, and M. Kondo, “Distinctive characteristics of a metric using deviations from Poisson for feature selection,” Expert Systems with Applications, vol. 37, no. 3, pp. 2273–2281, 2010.
- H. Ogura, H. Amano, and M. Kondo, “Comparison of metrics for feature selection in imbalanced text classification,” Expert Systems with Applications, vol. 38, no. 5, pp. 4978–4989, 2011.
- K. Church and W. A. Gale, “Poisson mixtures,” Natural Language Engineering, vol. 1, pp. 163–190, 1995.
- A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin, Bayesian Data Analysis (Texts in Statistical Science), Chapman and Hall/CRC, 2nd edition, 2003.
- R. E. Madsen, D. Kauchak, and C. Elkan, “Modeling word burstiness using the Dirichlet distribution,” in Proceedings of the 22nd International Conference on Machine Learning (ICML '05), pp. 545–552, August 2005.
- S. Clinchant and E. Gaussier, “The BNB distribution for text modeling,” in Proceedings of the Advances in Information Retrieval. 30th European Conference on IR Research, pp. 150–161, 2008.
- B. Allison, “An improved hierarchical Bayesian Model of Language for document classification,” in Proceedings of the 22nd International Conference on Computational Linguistics, pp. 25–32, 2008.
- S. Eyheramendy, D. Lewis, and D. Madigan, “On the naive Bayes model for text categorization,” in Proceedings of the 9th International Workshop on Artificial Intelligence and Statistics, pp. 332–339, 2003.
- T. Mitchell, Machine Learning, McGraw Hill, 1997.
- S. Kim, K. Han, H. Rim, and H. Myaeng, “Some effective techniques for naive Bayes text classification,” IEEE Transactions on Knowledge and Data Engineering, vol. 18, pp. 1457–1466, 2006.
- T. Minka, “Estimating a Dirichlet distribution,” 2003, http://research.microsoft.com/en-us/um/people/minka/papers/dirichlet/.
- T. Joachims, Learning to Classify Text Using Support Vector Machines [Ph.D. thesis], Kluwer, 2002.
- J. Grim, J. Novovičová, and P. Somol, “Structural Poisson mixtures for classification of documents,” in Proceedings of the 19th International Conference on Pattern Recognition (ICPR '08), pp. 1–4, December 2008.
- A. Greenwood and D. Durand, “Aids for fitting the Gamma distribution by Maximum Likelihood,” Technometrics, vol. 2, p. 55, 1960.
- T. Minka, “Estimating a Gamma distribution,” 2002, http://research.microsoft.com/en-us/um/people/minka/papers/.
- J. D. M. Rennie, L. Shih, J. Teevan, and D. Karger, “Tackling the poor assumptions of naive Bayes text classifiers,” in Proceedings of the 20th International Conference on Machine Learning, pp. 616–623, August 2003.
- K. Lang, “NewsWeeder: learning to filter netnews,” in Proceedings of the 12th International Machine Learning Conference, pp. 331–339, Morgan Kaufmann, 1995.
- D. Davidov, E. Gabrilovich, and S. Markovitch, “Parameterized generation of labeled datasets for text categorization based on a hierarchical directory,” in Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 250–257, July 2004.
- T. Joachims, “A probabilistic analysis of the Rocchio algorithm with TF-IDF for text categorization,” in Proceedings of the 14th International Conference on Machine Learning, pp. 143–151, 1997.
- A. M. Kibriya, E. Frank, B. Pfahringer, and G. Holmes, “Multinomial naive Bayes for text categorization revisited,” in Proceedings of the 17th Australian Joint Conference on Artificial Intelligence, pp. 488–499, 2005.
- N. Slonim, G. Bejerano, S. Fine, and N. Tishby, “Discriminative feature selection via multiclass variable memory Markov model,” Eurasip Journal on Applied Signal Processing, vol. 2003, no. 2, pp. 93–102, 2003.
- N. A. Syed, H. Liu, and K. K. Sung, “Incremental learning with support vector machines,” in Proceedings of the Workshop on Support Vector Machines at the International Joint Conference on Artificial Intelligence, pp. 317–321, 1999.
- C. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.