Computational Intelligence and Neuroscience
Volume 2016, Article ID 1096271, 11 pages
http://dx.doi.org/10.1155/2016/1096271
Research Article

Using SVD on Clusters to Improve Precision of Interdocument Similarity Measure

1Center on Big Data Sciences, Beijing University of Chemical Technology, Beijing 100039, China
2Institute of Policy and Management, Chinese Academy of Sciences, Beijing 100190, China

Received 2 March 2016; Accepted 8 June 2016

Academic Editor: Toshihisa Tanaka

Copyright © 2016 Wen Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. C. White, “Consolidating, accessing and analyzing unstructured data,” http://www.b-eye-network.com/view/2098.
  2. R. Rahimi, A. Shakery, and I. King, “Extracting translations from comparable corpora for Cross-Language Information Retrieval using the language modeling framework,” Information Processing & Management, vol. 52, no. 2, pp. 299–318, 2016.
  3. M. W. Berry, S. T. Dumais, and G. W. O'Brien, “Using linear algebra for intelligent information retrieval,” SIAM Review, vol. 37, no. 4, pp. 573–595, 1995.
  4. M. T. Hassan, A. Karim, J.-B. Kim, and M. Jeon, “CDIM: document clustering by discrimination information maximization,” Information Sciences, vol. 316, no. 20, pp. 87–106, 2015.
  5. S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, “Indexing by latent semantic analysis,” Journal of the American Society for Information Science, vol. 41, no. 6, pp. 391–407, 1990.
  6. C. Laclau and M. Nadif, “Hard and fuzzy diagonal co-clustering for document-term partitioning,” Neurocomputing, vol. 193, pp. 133–147, 2016.
  7. G. H. Golub and C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press, 3rd edition, 1996.
  8. L. Yue, W. Zuo, T. Peng, Y. Wang, and X. Han, “A fuzzy document clustering approach based on domain-specified ontology,” Data and Knowledge Engineering, vol. 100, pp. 148–166, 2015.
  9. R. K. Ando, “Latent semantic space: iterative scaling improves precision of inter-document similarity measurement,” in Proceedings of the 23rd ACM International SIGIR Conference on Research and Development in Information Retrieval (SIGIR '00), pp. 216–223, Athens, Greece, July 2000.
  10. H. Yan, W. I. Grosky, and F. Fotouhi, “Augmenting the power of LSI in text retrieval: singular value rescaling,” Data and Knowledge Engineering, vol. 65, no. 1, pp. 108–125, 2008.
  11. F. Jiang and M. L. Littman, “Approximate dimension equalization in vector-based information retrieval,” in Proceedings of the 17th International Conference on Machine Learning (ICML '00), pp. 423–430, Stanford, Calif, USA, 2000.
  12. T. G. Kolda and D. P. O'Leary, “A semidiscrete matrix decomposition for latent semantic indexing in information retrieval,” ACM Transactions on Information Systems, vol. 16, no. 4, pp. 322–346, 1998.
  13. X. He, D. Cai, H. Liu, and W. Y. Ma, “Locality preserving indexing for document representation,” in Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 218–225, 2004.
  14. E. P. Jiang and M. W. Berry, “Information filtering using the Riemannian SVD (R-SVD),” in Solving Irregularly Structured Problems in Parallel: 5th International Symposium, IRREGULAR'98 Berkeley, California, USA, August 9–11, 1998 Proceedings, vol. 1457 of Lecture Notes in Computer Science, pp. 386–395, 2005.
  15. M. Welling, Fisher Linear Discriminant Analysis, http://www.ics.uci.edu/~welling/classnotes/papers_class/Fisher-LDA.pdf.
  16. J. Gao and J. Zhang, “Clustered SVD strategies in latent semantic indexing,” Information Processing and Management, vol. 41, no. 5, pp. 1051–1063, 2005.
  17. V. Castelli, A. Thomasian, and C.-S. Li, “CSVD: clustering and singular value decomposition for approximate similarity search in high-dimensional spaces,” IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 3, pp. 671–685, 2003.
  18. M. W. Berry, “Large scale singular value computations,” International Journal of Supercomputer Applications, vol. 6, pp. 13–49, 1992.
  19. C. D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing, The MIT Press, 4th edition, 2001.
  20. G. Salton, A. Wang, and C. S. Yang, “A vector space model for information retrieval,” Journal of American Society for Information Science, vol. 18, no. 11, pp. 613–620, 1975.
  21. L. Jiang, C. Li, S. Wang, and L. Zhang, “Deep feature weighting for naive Bayes and its application to text classification,” Engineering Applications of Artificial Intelligence, vol. 52, pp. 26–39, 2016.
  22. T. Van Phan and M. Nakagawa, “Combination of global and local contexts for text/non-text classification in heterogeneous online handwritten documents,” Pattern Recognition, vol. 51, pp. 112–124, 2016.
  23. H. Zha, O. Marques, and H. D. Simon, “Large scale SVD and subspace-based methods for information retrieval,” in Proceedings of the 5th International Symposium on Solving Irregularly Structured Problems in Parallel (IRREGULAR '98), pp. 29–42, Berkeley, Calif, USA, August 1998.
  24. J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, Boston, Mass, USA, 2nd edition, 2006.
  25. T. Kohonen, Self-Organization and Associative Memory, vol. 8 of Springer Series in Information Sciences, Springer, New York, NY, USA, 2nd edition, 1988.
  26. M. Steinbach, G. Karypis, and V. Kumar, “A comparison of document clustering techniques,” in Proceedings of the KDD Workshop on Text Mining, pp. 109–110, 2000.
  27. Y. M. Yang and X. Liu, “A re-examination of text categorization methods,” in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '99), pp. 42–49, Berkeley, Calif, USA, August 1999.
  28. R. F. Corrêa and T. B. Ludermir, “Improving self-organization of document collections by semantic mapping,” Neurocomputing, vol. 70, no. 1–3, pp. 62–69, 2006.
  29. T. Hofmann, “Learning the similarity of documents: an information-geometric approach to document retrieval and categorization,” in Advances in Neural Information Processing Systems 12, pp. 914–920, The MIT Press, 2000.
  30. D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.