The Scientific World Journal
Volume 2014, Article ID 437162, 17 pages
http://dx.doi.org/10.1155/2014/437162
Research Article

A Grammar-Based Semantic Similarity Algorithm for Natural Language Sentences

¹ Department of Computer and Communication Engineering, Ming Chuan University, Taoyuan 333, Taiwan
² Department of Engineering Science, National Cheng Kung University, Tainan 701, Taiwan
³ Department of Visual Communication Design, Hsuan Chuang University, Hsinchu 300, Taiwan

Received 17 December 2013; Accepted 10 March 2014; Published 10 April 2014

Academic Editors: J. G. Duque, J. T. Fernandez-Breis, and P. Melin

Copyright © 2014 Ming Che Lee et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

This paper presents a similarity algorithm for natural language sentences that is based on grammar and a semantic corpus. Natural language, as opposed to “artificial language” such as computer programming languages, is the language the general public uses for everyday communication. Traditional information retrieval approaches, such as vector models, LSA, HAL, or even ontology-based approaches that extend these to compare concept similarity rather than co-occurring terms/words, may fail to find the correct match when there is no obvious relation or concept overlap between two natural language sentences. This paper proposes a sentence similarity algorithm that combines a corpus-based ontology with grammatical rules to overcome these problems. Experiments on two well-known benchmarks demonstrate that the proposed algorithm significantly improves performance on sentences/short texts with arbitrary syntax and structure.
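To make the limitation of term-overlap methods concrete, the following minimal Python sketch computes cosine similarity over raw term-frequency vectors, a simple instance of the vector model mentioned above. It is not the proposed grammar-based algorithm. The tokenizer (lowercase whitespace split), the function name `cosine_bow`, and the example sentences are illustrative assumptions.

```python
from collections import Counter
import math


def cosine_bow(s1: str, s2: str) -> float:
    """Cosine similarity of raw term-frequency vectors (a basic vector model).

    Assumption: tokens are produced by lowercasing and splitting on whitespace;
    no stemming, stop-word removal, or semantic resources are used.
    """
    v1 = Counter(s1.lower().split())
    v2 = Counter(s2.lower().split())
    # Only terms shared by both sentences contribute to the dot product.
    dot = sum(v1[t] * v2[t] for t in v1.keys() & v2.keys())
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


a = "the cat sat on the mat"
b = "the cat sat on the rug"
c = "a kitten rested upon a carpet"  # paraphrase of `a` with no shared words

print(round(cosine_bow(a, b), 3))  # → 0.875
print(cosine_bow(a, c))            # → 0.0
```

Because sentences `a` and `c` share no surface terms, the vector model assigns them zero similarity even though they express nearly the same meaning. Closing this gap through concept-level and grammatical information is what motivates the proposed approach.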