Advances in Artificial Intelligence
Volume 2012, Article ID 484580, 15 pages
http://dx.doi.org/10.1155/2012/484580
Research Article

Learning to Translate: A Statistical and Computational Analysis

1European Commission-Joint Research Centre (JRC), IPSC, GlobeSec, Via Fermi 2749, 21020 Ispra, Italy
2Intelligent Systems Laboratory, University of Bristol, MVB, Woodland Road, Bristol BS8 1UB, UK
3Interactive Language Technologies, National Research Council Canada, 283 Boulevard Alexandre-Taché, Gatineau, QC, Canada J8X 3X7

Received 15 July 2011; Accepted 30 January 2012

Academic Editor: Peter Tino

Copyright © 2012 Marco Turchi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. W. Locke and A. Booth, Machine Translation of Languages, MIT Press, Cambridge, Mass, USA, 1955.
  2. P. F. Brown, S. D. Pietra, V. J. D. Pietra, and R. L. Mercer, “The mathematics of statistical machine translation: parameter estimation,” Computational Linguistics, vol. 19, no. 2, pp. 263–311, 1993.
  3. P. Koehn, F. J. Och, and D. Marcu, “Statistical phrase-based translation,” in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pp. 48–54, Edmonton, Canada, 2003.
  4. R. Zens, F.-J. Och, and H. Ney, “Phrase-based statistical machine translation,” in Proceedings of the 25th Annual German Conference on AI (KI ’02), pp. 18–32, Springer, London, UK, 2002.
  5. P. Koehn, H. Hoang, A. Birch et al., “Moses: open source toolkit for statistical machine translation,” in Proceedings of the Annual Meeting of the Association for Computational Linguistics, Demonstration Session, pp. 177–180, Columbus, Ohio, USA, 2007.
  6. N. Ueffing, M. Simard, S. Larkin, and J. H. Johnson, “NRC’s PORTAGE system for WMT,” in Proceedings of ACL 2nd Workshop on Statistical Machine Translation, pp. 185–188, Prague, Czech Republic, 2007.
  7. P. Koehn, “Europarl: a parallel corpus for statistical machine translation,” in Proceedings of the 10th Machine Translation Summit, pp. 79–86, Phuket, Thailand, 2005.
  8. C. Callison-Burch, P. Koehn, C. Monz, and J. Schroeder, “Findings of the 2009 Workshop on Statistical Machine Translation,” in Proceedings of the Fourth Workshop on Statistical Machine Translation, C. Callison-Burch, P. Koehn, C. Monz, and J. Schroeder, Eds., pp. 263–311, Association for Computational Linguistics, Athens, Greece, 2009.
  9. M. Turchi, T. De Bie, and N. Cristianini, “Learning performance of a machine translation system: a statistical and computational analysis,” in Proceedings of the 3rd Workshop on Statistical Machine Translation, pp. 35–43, Columbus, Ohio, USA, 2008.
  10. Y. Al-Onaizan, J. Curin, M. Jahr et al., “Statistical machine translation: final report,” Tech. Rep., Johns Hopkins University, Summer Workshop on Language Engineering, Center for Speech and Language Processing, Baltimore, Md, USA, 1999.
  11. F. J. Och and H. Weber, “Improving statistical natural language translation with categories and rules,” in Proceedings of the 17th International Conference on Computational Linguistics, vol. 2, pp. 985–989, Stroudsburg, Pa, USA, 1998.
  12. F. J. Och and H. Ney, “Discriminative training and maximum entropy models for statistical machine translation,” in Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 295–302, Philadelphia, Pa, USA, 2001.
  13. Y. Liu, Q. Liu, and S. Lin, “Discriminative word alignment by linear modeling,” Computational Linguistics, vol. 36, no. 3, pp. 303–339, 2010.
  14. R. C. Moore, W.-T. Yih, and A. Bode, “Improved discriminative bilingual word alignment,” in Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics (ACL '06), pp. 513–520, Sydney, Australia, 2006.
  15. A. Fraser and D. Marcu, “Measuring word alignment quality for statistical machine translation,” Computational Linguistics, vol. 33, no. 3, pp. 293–303, 2007.
  16. K. Papineni, S. Roukos, T. Ward, and W. J. Zhu, “Bleu: a method for automatic evaluation of machine translation,” in Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 311–318, Philadelphia, Pa, USA, 2001.
  17. F. J. Och, “Minimum error rate training in statistical machine translation,” in Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, pp. 160–167, Sapporo, Japan, 2003.
  18. W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C++, Cambridge University Press, Cambridge, Mass, USA, 2002.
  19. A. Arun and P. Koehn, “Online learning methods for discriminative training of phrase based statistical machine translation,” in Proceedings of the 11th Machine Translation Summit, Copenhagen, Denmark, 2007.
  20. D. Chiang, Y. Marton, and P. Resnik, “Online large-margin training of syntactic and structural translation features,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '08), pp. 224–233, Association for Computational Linguistics, Stroudsburg, Pa, USA, 2008.
  21. B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap, CRC Press, Boca Raton, Fla, USA, 1994.
  22. P. Koehn, “Statistical significance tests for machine translation evaluation,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 388–395, Barcelona, Spain, 2004.
  23. A. Stolcke, “Srilm—an extensible language modeling toolkit,” in Proceedings of the International Conference on Spoken Language Processing, Denver, Colo, USA, 2002.
  24. F. J. Och and H. Ney, “A systematic comparison of various statistical alignment models,” Computational Linguistics, vol. 29, no. 1, pp. 19–51, 2003.
  25. G. Doddington, “Automatic evaluation of machine translation quality using n-gram co-occurrence statistics,” in Proceedings of the 2nd International Conference on Human Language Technology Research, pp. 138–145, Morgan Kaufmann Publishers, San Francisco, Calif, USA, 2002.
  26. S. Banerjee and A. Lavie, “Meteor: an automatic metric for mt evaluation with improved correlation with human judgments,” in Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, Ann Arbor, Mich, USA, 2005.
  27. A. Lavie and A. Agarwal, “Meteor: an automatic metric for mt evaluation with high levels of correlation with human judgments,” in Proceedings of 45th Annual Meeting of the Association for Computational Linguistics, pp. 228–231, Association for Computational Linguistics, Prague, Czech Republic, 2007.
  28. M. Snover, B. Dorr, R. Schwartz, L. Micciulla, and J. Makhoul, “A study of translation edit rate with targeted human annotation,” in Proceedings of the 7th Conference of the Association for Machine Translation in the Americas, pp. 223–231, Association for Machine Translation in the Americas, Los Angeles, Calif, USA, 2006.
  29. R. Steinberger, B. Pouliquen, A. Widiger et al., “The jrc-acquis: a multilingual aligned parallel corpus with 20+ languages,” in Proceedings of the 5th International Conference on Language Resources and Evaluation, pp. 2142–2147, Genova, Italy, 2006.
  30. T. Brants, A. C. Popat, P. Xu, F. J. Och, and J. Dean, “Large language models in machine translation,” in Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL '07), pp. 858–867, Association for Computational Linguistics, Prague, Czech Republic, 2007.
  31. N. Draper and H. Smith, Applied Regression Analysis, John Wiley & Sons, New York, NY, USA, 1981.
  32. F. J. Och, “Statistical machine translation: foundations and recent advances,” in Proceedings of the Tutorial at MT Summit, 2005.
  33. M. Bloodgood and C. Callison-Burch, “Bucking the trend: large-scale cost-focused active learning for statistical machine translation,” in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10), pp. 854–864, Association for Computational Linguistics, Stroudsburg, Pa, USA, 2010.
  34. P. Koehn and H. Hoang, “Factored translation models,” in Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL '07), pp. 868–876, 2007.
  35. J. Blatz, E. Fitzgerald, G. Foster et al., “Confidence estimation for machine translation,” in Proceedings of the 20th International Conference on Computational Linguistics (COLING '04), vol. 315, Association for Computational Linguistics, Morristown, NJ, USA, 2004.
  36. L. Specia, M. Turchi, Z. Wang, J. Shawe-Taylor, and C. Saunders, “Improving the confidence of machine translation quality estimates,” in Proceedings of the 12th Machine Translation Summit, 2009.
  37. N. Ueffing and H. Ney, “Word-level confidence estimation for machine translation using phrase-based translation models,” in Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT '05), pp. 763–770, Association for Computational Linguistics, Morristown, NJ, USA, 2005.
  38. G. Wisniewski, A. Allauzen, and F. Yvon, “Assessing phrase-based translation models with oracle decoding,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP ’10), pp. 933–943, Association for Computational Linguistics, Stroudsburg, Pa, USA, 2010.
  39. D. Vilar, J. Xu, L. F. D’Haro, and H. Ney, “Error analysis of statistical machine translation output,” in Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC '06), Genova, Italy, 2006.