The Scientific World Journal
Volume 2014, Article ID 745485, 10 pages
http://dx.doi.org/10.1155/2014/745485
Research Article

A Systematic Comparison of Data Selection Criteria for SMT Domain Adaptation

Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory, Department of Computer and Information Science, University of Macau, Macau, China

Received 30 August 2013; Accepted 4 December 2013; Published 10 February 2014

Academic Editors: J. Shu and F. Yu

Copyright © 2014 Longyue Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. P. F. Brown, V. J. D. Pietra, S. A. D. Pietra, and R. L. Mercer, “The mathematics of statistical machine translation: parameter estimation,” Computational Linguistics, vol. 19, no. 2, pp. 263–311, 1993.
  2. A. Axelrod, X. He, and J. Gao, “Domain adaptation via pseudo in-domain data selection,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '11), pp. 355–362, July 2011.
  3. P. Koehn, F. J. Och, and D. Marcu, “Statistical phrase-based translation,” in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (NAACL '03), vol. 1, pp. 48–54, 2003.
  4. P. Koehn, A. Axelrod, A. B. Mayne, C. Callison-Burch, M. Osborne, and D. Talbot, “Edinburgh system description for the 2005 IWSLT speech translation evaluation,” in Proceedings of the International Workshop on Spoken Language Translation (IWSLT '05), pp. 68–75, 2005.
  5. P. Koehn, “Pharaoh: a beam search decoder for phrase-based statistical machine translation models,” in Proceedings of the Conference of the Association for Machine Translation in the Americas (AMTA '04), pp. 115–124, Springer, Berlin, Germany, 2004.
  6. Y. Lü, J. Huang, and Q. Liu, “Improving statistical machine translation performance by training data selection and optimization,” in Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL '07), pp. 343–350, June 2007.
  7. K. Papineni, S. Roukos, T. Ward, and W. J. Zhu, “BLEU: a method for automatic evaluation of machine translation,” in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL '02), pp. 311–318, Philadelphia, Pa, USA, 2002.
  8. H. Daumé III and J. Jagarlamudi, “Domain adaptation for machine translation by mining unseen words,” in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT '11), pp. 407–412, June 2011.
  9. S. Mansour and H. Ney, “A simple and effective weighted phrase extraction for machine translation adaptation,” in Proceedings of the 9th International Workshop on Spoken Language Translation (IWSLT '12), pp. 193–200, 2012.
  10. P. Koehn and B. Haddow, “Towards effective use of training data in statistical machine translation,” in Proceedings of the 7th ACL Workshop on Statistical Machine Translation, pp. 317–321, 2012.
  11. J. Civera and A. Juan, “Domain adaptation in statistical machine translation with mixture modeling,” in Proceedings of the 2nd ACL Workshop on Statistical Machine Translation, pp. 177–180, 2007.
  12. G. Foster and R. Kuhn, “Mixture-model adaptation for SMT,” in Proceedings of the 2nd ACL Workshop on Statistical Machine Translation, pp. 128–136, 2007.
  13. V. Eidelman, J. Boyd-Graber, and P. Resnik, “Topic models for dynamic translation model adaptation,” in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers (ACL '12), vol. 2, pp. 115–119, 2012.
  14. S. Matsoukas, A. V. I. Rosti, and B. Zhang, “Discriminative corpus weight estimation for machine translation,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '09), vol. 2, pp. 708–717, August 2009.
  15. A. S. Hildebrand, M. Eck, S. Vogel, and A. Waibel, “Adaptation of the translation model for statistical machine translation based on information retrieval,” in Proceedings of the 10th Annual Conference on European Association for Machine Translation (EAMT '05), pp. 133–142, May 2005.
  16. S. C. Lin, C. L. Tsai, L. F. Chien, K. J. Chen, and L. S. Lee, “Chinese language model adaptation based on document classification and multiple domain-specific language models,” in Proceedings of the 5th European Conference on Speech Communication and Technology, 1997.
  17. J. Gao, J. Goodman, M. Li, and K. F. Lee, “Toward a unified approach to statistical language modeling for Chinese,” ACM Transactions on Asian Language Information Processing, vol. 1, pp. 3–33, 2002.
  18. R. C. Moore and W. Lewis, “Intelligent selection of language model training data,” in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10), pp. 220–224, July 2010.
  19. V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions and reversals,” Soviet Physics Doklady, vol. 10, p. 707, 1966.
  20. P. Koehn and J. Senellart, “Convergence of translation memory and statistical machine translation,” in Proceedings of AMTA Workshop on MT Research and the Translation Industry, pp. 21–31, 2010.
  21. J. Leveling, D. Ganguly, S. Dandapat, and G. J. F. Jones, “Approximate sentence retrieval for scalable and efficient example-based machine translation,” in Proceedings of the 24th International Conference on Computational Linguistics (COLING '12), pp. 1571–1586, 2012.
  22. T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning, vol. 1, Springer, New York, NY, USA, 2001.
  23. H. Wu and H. Wang, “Pivot language approach for phrase-based statistical machine translation,” Machine Translation, vol. 21, no. 3, pp. 165–181, 2007.
  24. P. Koehn and J. Schroeder, “Experiments in domain adaptation for statistical machine translation,” in Proceedings of the 2nd ACL Workshop on Statistical Machine Translation, pp. 224–227, 2007.
  25. A. Stolcke, “SRILM—an extensible language modeling toolkit,” in Proceedings of the International Conference on Spoken Language Processing, pp. 901–904, 2002.
  26. S. F. Chen and J. Goodman, “An empirical study of smoothing techniques for language modeling,” in Proceedings of the 34th Annual Meeting on Association for Computational Linguistics (ACL '96), pp. 310–318, 1996.
  27. P. Koehn, “Europarl: a parallel corpus for statistical machine translation,” in Proceedings of the 10th Conference of Machine Translation Summit (AAMT '05), vol. 5, pp. 79–86, Phuket, Thailand, 2005.
  28. H. P. Zhang, H. K. Yu, D. Y. Xiong, and Q. Liu, “HHMM-based Chinese lexical analyzer ICTCLAS,” in Proceedings of the 2nd SIGHAN Workshop on Chinese Language Processing, vol. 17, pp. 184–187, 2003.
  29. P. Koehn, H. Hoang, A. Birch et al., “Moses: open source toolkit for statistical machine translation,” in Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions (ACL '07), pp. 177–180, 2007.
  30. F. J. Och and H. Ney, “A systematic comparison of various statistical alignment models,” Computational Linguistics, vol. 29, no. 1, pp. 19–51, 2003.
  31. F. J. Och, “Minimum error rate training in statistical machine translation,” in Proceedings of the 41st Annual Meeting on Association for Computational Linguistics (ACL '03), vol. 1, pp. 160–167, 2003.
  32. M. Federico, N. Bertoldi, and M. Cettolo, “IRSTLM: an open source toolkit for handling large scale language models,” in Proceedings of the 9th Annual Conference of the International Speech Communication Association (INTERSPEECH '08), pp. 1618–1621, September 2008.