The Scientific World Journal
Volume 2014, Article ID 437162, 17 pages
http://dx.doi.org/10.1155/2014/437162
Research Article

A Grammar-Based Semantic Similarity Algorithm for Natural Language Sentences

1Department of Computer and Communication Engineering, Ming Chuan University, Taoyuan 333, Taiwan
2Department of Engineering Science, National Cheng Kung University, Tainan 701, Taiwan
3Department of Visual Communication Design, Hsuan Chuang University, Hsinchu 300, Taiwan

Received 17 December 2013; Accepted 10 March 2014; Published 10 April 2014

Academic Editors: J. G. Duque, J. T. Fernandez-Breis, and P. Melin

Copyright © 2014 Ming Che Lee et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. G. Salton, Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer, Addison-Wesley, Wokingham, UK, 1989.
  2. G. Salton, The SMART Retrieval System-Experiments in Automatic Document Processing, Prentice Hall, Englewood Cliffs, NJ, USA, 1971.
  3. G. Salton and M. E. Lesk, “Computer evaluation of indexing and text processing,” Journal of the ACM, vol. 15, no. 1, pp. 8–36, 1968.
  4. C.-H. Wu, J.-F. Yeh, and Y.-S. Lai, “Semantic segment extraction and matching for internet FAQ retrieval,” IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 7, pp. 930–940, 2006.
  5. H. Chim and X. Deng, “Efficient phrase-based document similarity for clustering,” IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 9, pp. 1217–1229, 2008.
  6. O. Zamir and O. Etzioni, “Web document clustering: a feasibility demonstration,” in Proceedings of the 19th International ACM SIGIR Conference SIGIR Forum, pp. 46–54, 1998.
  7. O. Zamir, O. Etzioni, O. Madani, and R. M. Karp, “Fast and intuitive clustering of web documents,” in Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, pp. 287–290, 1997.
  8. Y. Li, D. McLean, Z. A. Bandar, J. D. O'Shea, and K. Crockett, “Sentence similarity based on semantic nets and corpus statistics,” IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 8, pp. 1138–1150, 2006.
  9. D. Michie, “Return of the imitation game,” Electronic Transactions on Artificial Intelligence, vol. 6, no. 2, pp. 203–221, 2001.
  10. J. Atkinson-Abutridy, C. Mellish, and S. Aitken, “Combining information extraction with genetic algorithms for text mining,” IEEE Intelligent Systems, vol. 19, no. 3, pp. 22–30, 2004.
  11. G. Erkan and D. R. Radev, “LexRank: graph-based lexical centrality as salience in text summarization,” Journal of Artificial Intelligence Research, vol. 22, pp. 457–479, 2004.
  12. Y. Ko, J. Park, and J. Seo, “Improving text categorization using the importance of sentences,” Information Processing and Management, vol. 40, no. 1, pp. 65–79, 2004.
  13. Y. Liu and C. Q. Zong, “Example-based Chinese-English MT,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '04), pp. 6093–6096, October 2004.
  14. P. W. Foltz, W. Kintsch, and T. K. Landauer, “The measurement of textual coherence with latent semantic analysis,” Discourse Processes, vol. 25, no. 2-3, pp. 285–307, 1998.
  15. T. K. Landauer, P. W. Foltz, and D. Laham, “Introduction to latent semantic analysis,” Discourse Processes, vol. 25, no. 2-3, pp. 259–284, 1998.
  16. T. K. Landauer, D. Laham, B. Rehder, and M. E. Schreiner, “How well can passage meaning be derived without using word order? A comparison of latent semantic analysis and humans,” in Proceedings of the 19th Annual Meeting of the Cognitive Science Society, pp. 412–417, 1997.
  17. T. Hofmann, “Probabilistic latent semantic indexing,” in Proceedings of the International ACM SIGIR Conference (SIGIR ’99), pp. 50–57, 1999.
  18. J. R. Bellegarda, “Exploiting latent semantic information in statistical language modeling,” Proceedings of the IEEE, vol. 88, no. 8, pp. 1279–1296, 2000.
  19. T.-C. Chou and M. C. Chen, “Using incremental PLSI for threshold-resilient online event analysis,” IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 3, pp. 289–299, 2008.
  20. J. C. Valle-Lisboa and E. Mizraji, “The uncovering of hidden structures by Latent Semantic Analysis,” Information Sciences, vol. 177, no. 19, pp. 4122–4147, 2007.
  21. H. Kim and J. Seo, “Cluster-based FAQ retrieval using latent term weights,” IEEE Intelligent Systems, vol. 23, no. 2, pp. 58–65, 2008.
  22. T. A. Letsche and M. W. Berry, “Large-scale information retrieval with latent semantic indexing,” Information Sciences, vol. 100, no. 1–4, pp. 105–137, 1997.
  23. C.-H. Wu, C.-C. Hsia, J.-F. Chen, and J.-F. Wang, “Variable-length unit selection in TTS using structural syntactic cost,” IEEE Transactions on Audio, Speech and Language Processing, vol. 15, no. 4, pp. 1227–1235, 2007.
  24. K. Zhou, X. Gui-Rong, Q. Yang, and Y. Yu, “Learning with positive and unlabeled examples using topic-sensitive PLSA,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 1, pp. 46–58, 2010.
  25. C. Burgess, K. Livesay, and K. Lund, “Explorations in context space: words, sentences, discourse,” Discourse Processes, vol. 25, no. 2-3, pp. 211–257, 1998.
  26. T. R. Gruber, “A translation approach to portable ontology specifications,” Knowledge Acquisition, vol. 5, no. 2, pp. 199–220, 1993.
  27. T. R. Gruber, “Toward principles for the design of ontologies used for knowledge sharing?” International Journal of Human-Computer Studies, vol. 43, no. 5-6, pp. 907–928, 1995.
  28. T. Berners-Lee, J. Hendler, and O. Lassila, “The semantic web,” Scientific American, vol. 284, no. 5, pp. 34–43, 2001.
  29. A. Formica, “Ontology-based concept similarity in Formal Concept Analysis,” Information Sciences, vol. 176, no. 18, pp. 2624–2641, 2006.
  30. P. Moreda, H. Llorens, E. Saquete, and M. Palomar, “Combining semantic information in question answering systems,” Information Processing and Management, vol. 47, no. 6, pp. 870–885, 2011.
  31. V. Janev and S. Vraneš, “Applicability assessment of Semantic Web technologies,” Information Processing and Management, vol. 47, no. 4, pp. 507–517, 2011.
  32. L. K. Albert, YMIR: an ontology for engineering design [Ph.D. thesis], University of Twente, Twente, The Netherlands, 1993.
  33. N. Guarino, “Understanding, building and using ontologies,” International Journal of Human-Computer Studies, vol. 46, no. 2-3, pp. 293–310, 1997.
  34. G. van Heijst, A. T. Schreiber, and B. J. Wielinga, “Using explicit ontologies in KBS development,” International Journal of Human-Computer Studies, vol. 46, no. 2-3, pp. 183–292, 1997.
  35. N. Guarino and P. Giaretta, “Ontologies and knowledge bases: towards a terminological clarification,” in Towards Very Large Knowledge Bases: Knowledge Building and Knowledge Sharing, pp. 25–32, 1995.
  36. A. T. Schreiber, B. J. Wielinga, and W. H. J. Jansweijer, “The KACTUS view on the “O” Word,” in Proceedings of the IJCAI Workshop on Basic Ontological Issues in Knowledge Sharing, pp. 159–168, 1995.
  37. M. Uschold and M. Gruninger, “Ontologies: principles, methods and applications,” Knowledge Engineering Review, vol. 11, no. 2, pp. 93–136, 1996.
  38. RDF, http://www.w3.org/RDF/.
  39. RDF-Schema, http://www.w3.org/TR/rdf-schema/.
  40. XML, http://www.w3.org/XML/.
  41. OWL-REF, http://www.w3.org/TR/owl-ref/.
  42. WordNet, http://wordnet.princeton.edu/.
  43. K. U. Chen, J. C. Zhao, W. L. Zuo, F. L. He, and Y. H. Chen, “Automatic table integration by domain-specific ontology,” International Journal of Digital Content Technology and its Applications, vol. 5, no. 1, pp. 218–225, 2011.
  44. D. Pan, “An integrative framework for continuous knowledge discovery,” Journal of Convergence Information Technology, vol. 5, no. 3, pp. 46–53, 2010.
  45. M. C. Lee, H. H. Chen, and Y. S. Li, “FCA based concept constructing and similarity measurement algorithms,” International Journal of Advancements in Computing Technology, vol. 3, no. 1, pp. 97–105, 2011.
  46. M. Ruiz-Casado, E. Alfonseca, and P. Castells, “Automatising the learning of lexical patterns: an application to the enrichment of WordNet by extracting semantic relationships from Wikipedia,” Data and Knowledge Engineering, vol. 61, no. 3, pp. 484–499, 2007.
  47. H.-C. Seo, H. Chung, H.-C. Rim, S. H. Myaeng, and S.-H. Kim, “Unsupervised word sense disambiguation using WordNet relatives,” Computer Speech and Language, vol. 18, no. 3, pp. 253–273, 2004.
  48. Z. Ye, C. Guoqiang, and J. Dongyue, “A modified method for concepts similarity calculation,” Journal of Convergence Information Technology, vol. 6, no. 1, pp. 34–40, 2011.
  49. W. Zhang, T. Yoshida, and X. Tang, “Using ontology to improve precision of terminology extraction from documents,” Expert Systems with Applications, vol. 36, no. 5, pp. 9333–9339, 2009.
  50. Z. Wu and M. Palmer, “Verb semantics and lexical selection,” in Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, pp. 133–138, 1994.
  51. D. Sleator and D. Temperley, “Parsing English with a link grammar,” Carnegie Mellon University Computer Science Technical Report CMU-CS-91-196, 1991.
  52. Abiword Project, http://www.abisource.com/projects/link-grammar/.
  53. C. Quirk, C. Brockett, and W. Dolan, “Monolingual machine translation for paraphrase generation,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '04), D. Lin and D. Wu, Eds., pp. 142–149, 2004.
  54. J. O'Shea, Z. Bandar, K. Crockett, and D. McLean, “A comparative study of two short text semantic similarity measures,” in Agent and Multi-Agent Systems: Technologies and Applications, vol. 4953 of Lecture Notes in Artificial Intelligence, pp. 172–181, 2008.
  55. A. Islam and D. Inkpen, “Semantic text similarity using corpus-based word similarity and string similarity,” ACM Transactions on Knowledge Discovery from Data, vol. 2, no. 2, article 10, 2008.
  56. J. Oliva, J. I. Serrano, M. D. del Castillo, and Á. Iglesias, “SyMSS: a syntax-based measure for short-text semantic similarity,” Data and Knowledge Engineering, vol. 70, no. 4, pp. 390–405, 2011.
  57. G. Tsatsaronis, I. Varlamis, and M. Vazirgiannis, “Text relatedness based on a word thesaurus,” Journal of Artificial Intelligence Research, vol. 37, pp. 1–39, 2010.
  58. S. Wan, M. Dras, R. Dale, and C. Paris, “Using dependency-based features to take the parafarce out of paraphrase,” in Proceedings of the Australasian Language Technology Workshop, pp. 131–138, 2006.
  59. L. Qiu, M.-Y. Kan, and T.-S. Chua, “Paraphrase recognition via dissimilarity significance classification,” in Proceedings of the 11th Conference on Empirical Methods in Natural Language Processing (EMNLP '06), pp. 18–26, July 2006.
  60. H. Rubenstein and J. B. Goodenough, “Contextual correlates of synonymy,” Communications of the ACM, vol. 8, no. 10, pp. 627–633, 1965.
  61. J. Sinclair, Collins Cobuild English Dictionary for Advanced Learners, Harper Collins, 3rd edition, 2001.
  62. P. Turney, “Mining the web for synonyms: PMI-IR versus LSA on TOEFL,” in Proceedings of the 12th European Conference on Machine Learning (ECML '01), pp. 491–502, 2001.
  63. J. Jiang and D. Conrath, “Semantic similarity based on corpus statistics and lexical taxonomy,” in Proceedings of the International Conference Research on Computational Linguistics (ROCLING '96), pp. 19–33, 1996.
  64. C. Leacock, G. A. Miller, and M. Chodorow, “Using corpus statistics and WordNet relations for sense identification,” Computational Linguistics, vol. 24, no. 1, pp. 146–165, 1998.
  65. D. Lin, “An information-theoretic definition of similarity,” in Proceedings of the 15th International Conference on Machine Learning (ICML '98), pp. 296–304, 1998.
  66. P. Resnik, “Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language,” Journal of Artificial Intelligence Research, vol. 11, pp. 95–130, 1999.
  67. P. Resnik, “Using information content to evaluate semantic similarity,” in Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI '95), pp. 448–453, 1995.
  68. M. Lesk, “Automated sense disambiguation using machine-readable dictionaries: how to tell a pine cone from an ice cream cone,” in Proceedings of the 5th Annual International Conference on Systems Documentation (SIGDOC '86), pp. 24–26, 1986.
  69. R. Mihalcea, C. Corley, and C. Strapparava, “Corpus-based and knowledge-based measures of text semantic similarity,” in Proceedings of the 21st National Conference on Artificial Intelligence and the 18th Innovative Applications of Artificial Intelligence Conference (AAAI '06), pp. 775–780, July 2006.
  70. Y. Zhang and J. Patrick, “Paraphrase identification by text canonicalization,” in Proceedings of the Australasian Language Technology Workshop, pp. 160–166, 2005.
  71. V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.
  72. C. Corley and R. Mihalcea, “Measures of text semantic similarity,” in Proceedings of the ACL workshop on Empirical Modeling of Semantic Equivalence, Ann Arbor, Mich, USA, 2005.