The Scientific World Journal
Volume 2014, Article ID 438106, 13 pages
http://dx.doi.org/10.1155/2014/438106
Research Article

A Relationship: Word Alignment, Phrase Table, and Translation Quality

Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory, Department of Computer and Information Science, University of Macau, Macau

Received 30 August 2013; Accepted 10 March 2014; Published 16 April 2014

Academic Editors: J. Shu and F. Yu

Copyright © 2014 Liang Tian et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.