ISRN Artificial Intelligence
Volume 2013 (2013), Article ID 829630, 17 pages
http://dx.doi.org/10.1155/2013/829630
Research Article

Gamma-Poisson Distribution Model for Text Categorization

Department of Information Science, Faculty of Arts and Sciences, Showa University, 4562 Kamiyoshida, Fujiyoshida City, Yamanashi 403-0005, Japan

Received 29 January 2013; Accepted 4 March 2013

Academic Editors: K. W. Chau, C. Chen, G. L. Foresti, and M. Loog

Copyright © 2013 Hiroshi Ogura et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

References

  1. K. Church and W. A. Gale, “Inverse Document Frequency (IDF): a measure of deviations from Poisson,” in Proceedings of the 3rd Workshop on Very Large Corpora, pp. 121–130, 1995.
  2. H. Ogura, H. Amano, and M. Kondo, “Feature selection with a measure of deviations from Poisson in text categorization,” Expert Systems with Applications, vol. 36, no. 3, pp. 6826–6832, 2009.
  3. H. Ogura, H. Amano, and M. Kondo, “Distinctive characteristics of a metric using deviations from Poisson for feature selection,” Expert Systems with Applications, vol. 37, no. 3, pp. 2273–2281, 2010.
  4. H. Ogura, H. Amano, and M. Kondo, “Comparison of metrics for feature selection in imbalanced text classification,” Expert Systems with Applications, vol. 38, no. 5, pp. 4978–4989, 2011.
  5. K. Church and W. A. Gale, “Poisson mixtures,” Natural Language Engineering, vol. 1, pp. 163–190, 1995.
  6. A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin, Bayesian Data Analysis (Texts in Statistical Science), Chapman and Hall/CRC, 2nd edition, 2003.
  7. R. E. Madsen, D. Kauchak, and C. Elkan, “Modeling word burstiness using the Dirichlet distribution,” in Proceedings of the 22nd International Conference on Machine Learning (ICML '05), pp. 545–552, August 2005.
  8. S. Clinchant and E. Gaussier, “The BNB distribution for text modeling,” in Advances in Information Retrieval: Proceedings of the 30th European Conference on IR Research (ECIR '08), pp. 150–161, 2008.
  9. B. Allison, “An improved hierarchical Bayesian model of language for document classification,” in Proceedings of the 22nd International Conference on Computational Linguistics (COLING '08), pp. 25–32, 2008.
  10. S. Eyheramendy, D. Lewis, and D. Madigan, “On the Naive Bayes model for text categorization,” in Proceedings of the 9th International Workshop on Artificial Intelligence and Statistics, pp. 332–339, 2003.
  11. T. Mitchell, Machine Learning, McGraw Hill, 1997.
  12. S. Kim, K. Han, H. Rim, and H. Myaeng, “Some effective techniques for naive Bayes text classification,” IEEE Transactions on Knowledge and Data Engineering, vol. 18, pp. 1457–1466, 2006.
  13. T. Minka, “Estimating a Dirichlet distribution,” 2003, http://research.microsoft.com/en-us/um/people/minka/papers/dirichlet/.
  14. T. Joachims, Learning to Classify Text Using Support Vector Machines [Ph.D. thesis], Kluwer, 2002.
  15. J. Grim, J. Novovičová, and P. Somol, “Structural Poisson mixtures for classification of documents,” in Proceedings of the 19th International Conference on Pattern Recognition (ICPR '08), pp. 1–4, December 2008.
  16. A. Greenwood and D. Durand, “Aids for fitting the Gamma distribution by maximum likelihood,” Technometrics, vol. 2, no. 1, pp. 55–65, 1960.
  17. T. Minka, “Estimating a Gamma distribution,” 2002, http://research.microsoft.com/en-us/um/people/minka/papers/.
  18. J. D. M. Rennie, L. Shih, J. Teevan, and D. Karger, “Tackling the poor assumptions of Naive Bayes text classifiers,” in Proceedings of the 20th International Conference on Machine Learning (ICML '03), pp. 616–623, August 2003.
  19. K. Lang, “NewsWeeder: learning to filter netnews,” in Proceedings of the 12th International Machine Learning Conference, pp. 331–339, Morgan Kaufmann, 1995.
  20. D. Davidov, E. Gabrilovich, and S. Markovitch, “Parameterized generation of labeled datasets for text categorization based on a hierarchical directory,” in Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '04), pp. 250–257, July 2004.
  21. T. Joachims, “A probabilistic analysis of the Rocchio algorithm with TF-IDF for text categorization,” in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 143–151, 1997.
  22. A. M. Kibriya, E. Frank, B. Pfahringer, and G. Holmes, “Multinomial Naive Bayes for text categorization revisited,” in Proceedings of the 17th Australian Joint Conference on Artificial Intelligence, pp. 488–499, 2005.
  23. N. Slonim, G. Bejerano, S. Fine, and N. Tishby, “Discriminative feature selection via multiclass variable memory Markov model,” EURASIP Journal on Applied Signal Processing, vol. 2003, no. 2, pp. 93–102, 2003.
  24. N. A. Syed, H. Liu, and K. K. Sung, “Incremental learning with support vector machines,” in Proceedings of the Workshop on Support Vector Machines at the International Joint Conference on Artificial Intelligence, pp. 317–321, 1999.
  25. C. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.