The Scientific World Journal
Volume 2014, Article ID 196574, 10 pages
http://dx.doi.org/10.1155/2014/196574
Research Article

iSentenizer-μ: Multilingual Sentence Boundary Detection Model

NLP2CT Laboratory, Department of Computer and Information Science, University of Macau, Macau

Received 30 August 2013; Accepted 25 March 2014; Published 15 April 2014

Academic Editors: J. Shu and F. Yu

Copyright © 2014 Derek F. Wong et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. C. D. Manning, “Part-of-speech tagging from 97% to 100%: is it time for some linguistics?” in Computational Linguistics and Intelligent Text Processing, pp. 171–189, Springer, New York, NY, USA, 2011.
  2. L. Zhu, D. F. Wong, and L. S. Chao, “Unsupervised chunking based on graph propagation from bilingual corpus,” The Scientific World Journal, vol. 2014, Article ID 401943, 10 pages, 2014.
  3. X. Zeng, D. F. Wong, L. S. Chao, I. Trancoso, L. He, and Q. Huang, “Lexicon expansion for latent variable grammars,” Pattern Recognition Letters, vol. 42, pp. 47–55, 2014.
  4. F. Wong, M. Dong, and D. Hu, “Machine translation based on translation corresponding tree structure,” Tsinghua Science and Technology, vol. 11, no. 1, pp. 25–31, 2006.
  5. L.-Y. Wang, D. F. Wong, and L. S. Chao, “TQDL: integrated models for cross-language document retrieval,” International Journal of Computational Linguistics and Chinese Language Processing, vol. 17, no. 4, pp. 15–31, 2012.
  6. G. Grefenstette and P. Tapanainen, “What is a word, what is a sentence?: problems of Tokenisation,” in Proceedings of the 3rd International Conference on Computational Lexicography (COMPLEX '94), pp. 79–87, 1994.
  7. D. D. Palmer and M. A. Hearst, “Adaptive multilingual sentence boundary disambiguation,” Computational Linguistics, vol. 23, no. 2, pp. 240–267, 1997.
  8. A. Mikheev, “Periods, capitalized words, etc.,” Computational Linguistics, vol. 28, no. 3, pp. 289–318, 2002.
  9. J. C. Reynar and A. Ratnaparkhi, “A maximum entropy approach to identifying sentence boundaries,” in Proceedings of the 5th Conference on Applied Natural Language Processing, pp. 16–19, 1997.
  10. K. Evang, V. Basile, G. Chrupała, and J. Bos, “Elephant: sequence labeling for word and sentence segmentation,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '13), pp. 1422–1426, Seattle, Wash, USA, 2013.
  11. M. Fares, S. Oepen, and Y. Zhang, “Machine learning for high-quality tokenization replicating variable tokenization schemes,” in Proceedings of the Computational Linguistics and Intelligent Text Processing, pp. 231–244, Springer, 2013.
  12. K. Tomanek, J. Wermter, and U. Hahn, “Sentence and token splitting based on conditional random fields,” in Proceedings of the 10th Conference of the Pacific Association for Computational Linguistics, pp. 49–57, 2007.
  13. P. Koehn, “Europarl: a parallel corpus for statistical machine translation,” in MT Summit, vol. 5, 2005.
  14. F. Wong and S. Chao, “iSentenizer: an incremental sentence boundary classifier,” in Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE '10), August 2010.
  15. S. Chao and F. Wong, “An incremental decision tree learning methodology regarding attributes in medical data mining,” in Proceedings of the 8th International Conference on Machine Learning and Cybernetics, pp. 1694–1699, China, July 2009.
  16. C. N. Silla Jr., J. Dalla Valle Jr., and C. A. A. Kaestner, “Detecção automática de sentenças com o uso de expressões regulares,” in Proceedings of the Brazilian Congress of Computation (CBComp '03), pp. 548–560, 2003.
  17. C. N. Silla and C. A. A. Kaestner, “An analysis of sentence boundary detection systems for English and Portuguese documents,” in Lecture Notes in Computer Science, pp. 135–141, 2004.
  18. T. Kiss and J. Strunk, “Unsupervised multilingual sentence boundary detection,” Computational Linguistics, vol. 32, no. 4, pp. 485–525, 2006.
  19. H. R. Bittencourt and R. T. Clarke, “Use of classification and regression trees (CART) to classify remotely-sensed digital images,” in Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS '03), vol. 6, pp. 3751–3753, July 2003.
  20. S. R. Safavian and D. Landgrebe, “A survey of decision tree classifier methodology,” IEEE Transactions on Systems, Man and Cybernetics, vol. 21, no. 3, pp. 660–674, 1991.
  21. J. H. Friedman, “A recursive partitioning decision rule for nonparametric classification,” IEEE Transactions on Computers, vol. 100, no. 4, pp. 404–408, 1977.
  22. P. Utgoff and J. Clouse, “A Kolmogorov-Smirnoff metric for decision tree induction,” Technical Report 96-3, University of Massachusetts, Department of Computer Science, Amherst, Mass, USA, 1996.
  23. P. E. Utgoff, N. C. Berkman, and J. A. Clouse, “Decision tree induction based on efficient tree restructuring,” Machine Learning, vol. 29, no. 1, pp. 5–44, 1997.
  24. E. Stamatatos, N. Fakotakis, and G. Kokkinakis, “Automatic extraction of rules for sentence boundary disambiguation,” in Proceedings of the Workshop on Machine Learning in Human Language Technology, pp. 88–92, 1999.
  25. D. G. Lee and H. C. Rim, “Towards language-independent sentence boundary detection,” in Computational Linguistics and Intelligent Text Processing, pp. 142–145, 2004.
  26. M. Stevenson and R. Gaizauskas, “Experiments on sentence boundary detection,” in Proceedings of the 6th Conference on Applied Natural Language Processing, pp. 84–89, 2000.
  27. N. Agarwal, K. H. Ford, and M. Shneider, “Sentence boundary detection using a maxEnt classifier,” 2005.
  28. A. Martin, G. Doddington, T. Kamm, M. Ordowski, and M. Przybocki, “The DET curve in assessment of detection task performance,” in Proceedings of the 5th European Conference on Speech Communication and Technology, 1997.
  29. Y. Liu and E. Shriberg, “Comparing evaluation metrics for sentence boundary detection,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07), vol. 4, pp. 185–188, USA, April 2007.
  30. M. P. Marcus, M. A. Marcinkiewicz, and B. Santorini, “Building a large annotated corpus of English: the Penn Treebank,” Computational Linguistics, vol. 19, no. 2, pp. 313–330, 1993.
  31. C. Galves and H. Britto, “A construção do corpus anotado do Português histórico Tycho Brahe: o sistema de anotação morfológica,” in Proceedings of the 4th International Conference on Computational Processing of Portuguese Language (PROPOR '99), pp. 55–67, Universidade de Évora, Évora, Portugal, 1999.
  32. H. Britto, C. Galves, I. Ribeiro, M. Augusto, and A. P. Scher, “Morphological annotation system for automatic tagging of electronic textual corpora: from English to Romance languages,” in Proceedings of the 6th International Symposium of Social Communication, pp. 582–589, 1999.