Course title: 機器翻譯實作 (Machine Translation Lab)    Classroom: 資電 326
Course number: ISA 663500
Instructor: 張俊盛
The course consists of a set of small exercises on machine translation based on the statistical approach.
The purpose is to give students the opportunity to gain hands-on experience by working with problems and
data available from the NIST Open Machine Translation (MT) evaluation series (1). The course will cover
the fundamentals of machine translation, open-source tools for developing MT systems, and hands-on
sessions. Each hands-on session will start with an explanation of the background, the experimental data, and
snippets of code. Students are required to do the assignments in class. The instructor and teaching
assistants will be on hand to help. The list of topics planned for Fall 2007 is as follows.
The official programming language is Python.

Topics
1. Introduction
2. Machine translation and statistical machine translation
3. The NIST Open Machine Translation (MT) evaluation series
4. Subsentential alignment
5. Word alignment
6. Phrase alignment using parallel corpus
7. Phrase alignment using non-parallel corpus
8. Class-based word and phrase alignment
9. Word-based and phrase-based decoders
10. Syntax-directed SMT
11. Parsing and bilingual parsing
12. Integration of Word Sense Disambiguation into MT
13. Web as corpus (monolingual or bilingual)
14. Web Mining of bilingual texts (paragraphs, phrases, word translations)
15. Machine translation evaluation
16. System integration
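
As an illustration of the kind of code snippet used in the hands-on sessions, the word alignment topic
can be sketched with a minimal EM trainer in the style of IBM Model 1. This is only a sketch under stated
assumptions: the function names (train_ibm_model1, best_alignment) and the three-sentence toy corpus are
hypothetical, the NULL word is omitted for brevity, and real exercises would use tools such as Giza++ on
NIST data.

```python
from collections import defaultdict


def train_ibm_model1(bitext, iterations=10):
    """Estimate lexical translation probabilities t[(f, e)] = P(f | e) with EM.

    bitext is a list of (source_tokens, target_tokens) pairs. Simplification
    (an assumption of this sketch): no NULL word is added to the source side.
    """
    target_vocab = {f for _, target in bitext for f in target}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # start from a uniform table

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e aligned to anything
        # E-step: collect fractional counts over all possible alignments.
        for source, target in bitext:
            for f in target:
                z = sum(t[(f, e)] for e in source)  # normalizer for this f
                for e in source:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize the expected counts into probabilities.
        t = {(f, e): count[(f, e)] / total[e] for (f, e) in count}
    return t


def best_alignment(t, source, target):
    """For each target word, return the index of its most probable source word."""
    return [
        max(range(len(source)), key=lambda i: t.get((f, source[i]), 0.0))
        for f in target
    ]


# Toy parallel corpus (hypothetical data, for illustration only).
toy_bitext = [
    (["the", "house"], ["das", "Haus"]),
    (["the", "book"], ["das", "Buch"]),
    (["a", "book"], ["ein", "Buch"]),
]

if __name__ == "__main__":
    table = train_ibm_model1(toy_bitext, iterations=10)
    for (f, e), p in sorted(table.items(), key=lambda kv: -kv[1])[:4]:
        print(f"P({f} | {e}) = {p:.3f}")
    print(best_alignment(table, ["the", "house"], ["das", "Haus"]))
```

The table is keyed by (target word, source word) pairs so that the E-step and M-step read directly as
count collection and renormalization. Adding a NULL word or alignment (distortion) probabilities would
be a natural extension.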


Reading

1. Machine Translation Workbook, by Kevin Knight.
www.isi.edu/~knight/ and www.isi.edu/natural-language/mt/wkbk.rtf
2. What is new in Statistical Machine Translation by Kevin Knight.
l2r.cs.uiuc.edu/~danr/Teaching/CS598-05/Lectures/knight.ppt.
3. Lecture Notes of the Second Machine Translation Marathon, Berlin, Germany, May 12-20, 2008.
4. Related links for the event:
schedule: http://euromatrix.net/events/second-machine-translation-marathon/
wiki: http://www.statmt.org/mtm2/?n=Main.HomePage

References:
5. NIST Open Machine Translation Home, http://www.nist.gov/speech/tests/mt/.
6. Speech and Language Processing: An Introduction to Natural Language Processing,
Computational Linguistics, and Speech Recognition, Jurafsky and Martin, Ch. 25, MT,
http://www.cs.colorado.edu/~martin/slp2.html#Chapter24
7. Using collocations from comparable corpora to find translation equivalents, S Sharoff, B Babych, A
Hartley, Proceedings of the International Conference on Language, 2006.
8. ASSIST: Automated semantic assistance for translators, S Sharoff, B Babych, P Rayson, O Mudraya, S
Piao, Proceedings of EACL, 2006.
9. A hierarchical phrase-based model for statistical machine translation, D Chiang - Proceedings of
the 43rd Annual Meeting on Association for Computational Linguistics, 2005.
10. Fast and optimal decoding for machine translation, U Germann, M Jahr, K Knight, D Marcu, K
Yamada - Artificial Intelligence, 2004.
11. The alignment template approach to statistical machine translation, FJ Och, H Ney, Computational
Linguistics, 2004, MIT Press.
12. Integer programming decoder for machine translation, K Knight, K Yamada - US Patent 7,177,792,
2007 - Google Patents.
13. Clause Restructuring for Statistical Machine Translation - M Collins, P Koehn, I Kucerova – Annual
Meeting of the ACL, 2005.
14. Novel reordering approaches in phrase-based statistical machine translation - S Kanthak, D Vilar, E
Matusov, R Zens, H Ney – Workshop on Building and Using Parallel Texts: Data-Driven Machine
Translation and Beyond, 2005.
15. Integration of POS tag-based source reordering into SMT decoding by an extended search graph,
JM Crego, JB Marino – Conference of the Association for Machine Translation in the Americas, 2006.
16. Local search with very large-scale neighborhoods for optimal permutations in machine translation,
J Eisner, RW Tromble - Proceedings of the Human Language Technology (HLT), 2006.
17. Filtering multilingual Web content using fuzzy logic and self-organizing maps - R Chau, CH Yeh -
Neural Computing & Applications, 2004 - Springer
18. Using bilingual comparable corpora and semi-supervised clustering for topic tracking. F
Fukumoto, Y Suzuki - Proceedings of the COLING/ACL, 2006.
19. Large Language Models in Machine Translation - T Brants, A Popat, P Xu, F Och, J Dean -
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and
Computational Natural Language Learning, 2007.
20. Monolingual machine translation for paraphrase generation - C Quirk, C Brockett, W Dolan -
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, 2004.
21. Translation Exercise Assistant: Automated Generation of Translation Exercises for Native-Arabic
Speakers, J Burstein, D Marcu - Proceedings of HLT/EMNLP on Interactive Demonstrations, 2005.
22. SPMT: Statistical Machine Translation with Syntactified Target Language Phrases, D Marcu, W Wang,
A Echihabi, K Knight - Proceedings of EMNLP, Sydney, Australia, 2006.
23. Improving Machine Translation Performance by Exploiting Non-Parallel Corpora - DS Munteanu, D
Marcu - Computational Linguistics, 2005
24. Improved statistical machine translation using paraphrases - C Callison-Burch, P Koehn, M
Osborne - Proceedings of the main conference on Human Language Technology, 2006
25. Scaling phrase-based statistical machine translation to larger corpora and longer phrases - C
Callison-Burch, C Bannard, J Schroeder - Proceedings of the 43rd Annual Meeting on Association for
Computational Linguistics, 2005
26. Named entity translation, K Knight, Y Al-Onaizan - US Patent 7,249,013, 2007 - patentstorm.us
27. Machine Reading - O Etzioni, M Banko, MJ Cafarella - 2007 AAAI Spring Symposium on Machine
Reading, 2007.
28. A Comparative Study on Compositional Translation Estimation using a Domain/Topic-Specific
Corpus collected from the Web - M Tonoike, et al. Proc. 2nd International Workshop on Web as Corpus,
2006.
29. Mining translations of OOV terms from the web through cross-lingual query expansion - Y Zhang,
F Huang, S Vogel - Proceedings of the 28th annual international ACM SIGIR, 2005
30. The Wikipedia xml corpus - L Denoyer, P Gallinari - ACM SIGIR Forum, 2006.


Available Software

1. Giza++
2. Pharaoh
3. Moses

(Robert C. Moore, Microsoft Research)

1. Bilingual Sentence Aligner: aligns the sentences of parallel documents, since not all sentences are
translated one-for-one when people translate documents from one language to another. Implemented in Perl.
2. Context-Free Parsing Algorithms: implementations of several parsing algorithms.
3. Unification Grammar Sentence Realization Algorithms: Prolog implementations of two versions of
the unification grammar sentence realization algorithm.

5. Syllabus

6. Evaluation

Grading is based mainly on quizzes, in-class lab assignments, and progress reports.

7. Related web pages

Prof. 張俊盛's web page: http://nlp.cs.nthu.edu.tw/nlplab/teacher.htm
Prof. 張智星's web page: http://www.cs.nthu.edu.tw/~jang