Advances in Artificial Intelligence
Volume 2011 (2011), Article ID 107310, 12 pages
http://dx.doi.org/10.1155/2011/107310
Research Article

Reducing Excessive Amounts of Data: Multiple Web Queries for Generation of Pun Candidates

¹Otaru University of Commerce, Midori 3-5-21, Otaru 047-8501, Japan
²High-Tech Research Center, Intelligent Techniques Laboratory 6, Hokkai-Gakuen University, Minami 26, Nishi 11, Chuo-ku, Sapporo 064-0926, Japan
³Department of Information and Management Science, Otaru University of Commerce, Midori 3-5-21, Otaru 047-8501, Japan

Received 12 July 2011; Revised 25 November 2011; Accepted 6 December 2011

Academic Editor: Srinivas Bangalore

Copyright © 2011 Pawel Dybala et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Humor processing remains an understudied problem in both NLP and AI, and this paper contributes to the field. In our previous research we showed that adding a simple pun generator to a chatterbot can significantly improve its performance. That generator, however, produced only puns based on words (not phrases). Here we introduce the next stage of the system's development: an algorithm that generates phrasal pun candidates. We show that puns based on complex phrases can be generated using only the Internet, without any hand-made humor-oriented lexicons. Because the output list is often excessively long, we also propose a method for reducing the number of candidates by comparing two web-query-based rankings. In the evaluation experiment, the system achieved an accuracy of 72.5% in finding proper candidates in general, and the reduction method significantly shortened the candidate list. The parameters of the reduction algorithm are adjustable, so the balance between the number of candidates and the quality of the output can be tuned as needed.
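The idea of shortening a candidate list by comparing two rankings can be illustrated with a minimal sketch. The abstract does not state how the two web-query-based rankings are compared, so the code below assumes one simple reading: a candidate is kept only if it falls within the top `k` positions of both rankings. The function names, the cutoff parameter `k`, the placeholder candidates `c1` to `c4`, and their hit counts are all invented for illustration and are not taken from the paper.

```python
from typing import Dict, List


def rank_by_hits(hit_counts: Dict[str, int]) -> List[str]:
    """Order candidates from most to fewest hits; ties are broken alphabetically."""
    return sorted(hit_counts, key=lambda cand: (-hit_counts[cand], cand))


def reduce_candidates(hits_a: Dict[str, int],
                      hits_b: Dict[str, int],
                      k: int) -> List[str]:
    """Keep candidates that appear in the top k of BOTH rankings.

    This intersection rule is an assumption made for this sketch,
    not the comparison method described in the paper.
    """
    ranking_a = rank_by_hits(hits_a)
    top_b = set(rank_by_hits(hits_b)[:k])
    # Preserve the order of the first ranking in the output.
    return [cand for cand in ranking_a[:k] if cand in top_b]


# Hypothetical hit counts from two different kinds of web query.
hits_a = {"c1": 900, "c2": 500, "c3": 120, "c4": 40}
hits_b = {"c1": 30, "c2": 700, "c3": 650, "c4": 5}

strict = reduce_candidates(hits_a, hits_b, k=2)   # shorter list
lenient = reduce_candidates(hits_a, hits_b, k=3)  # longer list
```

Under this assumed rule, a smaller `k` yields a shorter list at the risk of discarding usable candidates, while a larger `k` keeps more of them. This mirrors the adjustable balance between list length and output quality that the abstract mentions.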