I. Course Description

This course covers the fundamentals of natural language processing, with
an emphasis on using statistical techniques, machine learning, and very large
corpora to handle unrestricted text data, in particular online information,
electronic communication, and the World Wide Web.

We will explore problems and solutions in the lexical, syntactic, and
semantic analysis of text and present techniques that are useful
for a wide range of applications, including machine translation,
information retrieval, and text categorization.


II. Textbooks


III. References

Foundations of Statistical Natural Language Processing (FSNLP),
Christopher Manning and Hinrich Schütze, The MIT Press.

Introduction
1. Fasold, Ralph and Jeff Connor-Linton. 2006. Introduction, Chapter 1, An
Introduction to Language and Linguistics
2. Mani, Inderjeet. 2006. Computational Linguistics, Chapter 14, Computational
Linguistics.

Corpora
3. J93-1001: Kenneth W. Church; Robert L. Mercer. Introduction to the Special Issue
on Computational Linguistics Using Large Corpora
http://acl.ldc.upenn.edu/J/J93/J93-1001.pdf
4. J03-3001: Adam Kilgarriff; Gregory Grefenstette. Introduction to the Special Issue
on the Web as Corpus http://acl.ldc.upenn.edu/J/J03/J03-3001.pdf

Math
5. FSNLP. Chapter 2 Mathematical Foundations

Language Model
6. FSNLP. Chapter 6 Statistical Inference: n-gram Models over Sparse Data

Hidden Markov Model
7. FSNLP. Chapter 9 Markov Models

Machine Translation
8. Kevin Knight. 2005. Tutorial on Machine Translation
9. "Syntax-based Language Models for Machine Translation" (E. Charniak, K. Knight,
and K. Yamada), Proc. MT Summit IX, 2003.
http://www.isi.edu/natural-language/projects/rewrite/mtsummit03.pdf.
10. "Syntax-based Alignment of Multiple Translations: Extracting Paraphrases and
Generating New Sentences" (B. Pang, K. Knight, and D. Marcu), Proc. NAACL-HLT, 2003.
http://www.isi.edu/natural-language/projects/rewrite/bopang.pdf
11. Christoph Tillmann; Hermann Ney. Word Reordering and a Dynamic Programming
Beam Search Algorithm for Statistical Machine Translation.
http://acl.ldc.upenn.edu/J/J03/J03-1005.pdf

Parsing
12. Eugene Charniak. 1997. Statistical Techniques for Natural Language Parsing, AI
Magazine, 18:4. 33-44.

Collocation
13. Smadja, McKeown, and Hatzivassiloglou. Translating Collocations for Bilingual
Lexicons. acl.ldc.upenn.edu/J/J96/J96-1001.pdf
14. Lin, Dekang and Pantel, Patrick. 2001. DIRT – Discovery of Inference Rules from Text.
acl.ldc.upenn.edu/I/I05/I05-5011.pdf
15. Rebecca Green, Bonnie J. Dorr, and Philip Resnik. Inducing Frame Semantic Verb
Classes from WordNet and LDOCE.
acl.ldc.upenn.edu/acl2004/main/pdf/264_pdf_2-col.pdf

Word Sense Disambiguation
16. David Yarowsky. 1992. Word-Sense Disambiguation Using Statistical Models of
Roget's Categories. acl.ldc.upenn.edu/C/C92/C92-2070.pdf
17. David Yarowsky. 1995. Unsupervised Word Sense Disambiguation Rivaling
Supervised Methods. acl.ldc.upenn.edu/P/P95/P95-1026.pdf
18. Dominic Widdows, Stanley Peters, Scott Cederberg, Chiu-Ki Chan. 2003.
Unsupervised Monolingual and Bilingual Word-Sense Disambiguation of Medical
Documents using UMLS. acl.ldc.upenn.edu/acl2003/nlbio/pdf/Widdows.pdf
19. Lucy Vanderwende. 1994. Algorithm for automatic interpretation of noun
sequences. Proceedings of COLING, acl.ldc.upenn.edu/C/C94/C94-2125.pdf

Question Answering and Information Extraction
20. X98-1016: Roman Yangarber; Ralph Grishman. Transforming Examples into
Patterns for Information Extraction. acl.ldc.upenn.edu/X/X98/X98-1016.pdf
21. Hang Li. Information Retrieval by Type
http://lamda.nju.edu.cn/conf/mla05/reports/HangLi.Machine%20Learning%20Approaches%20to%20Information%20Retrieval.pdf#search=%22information%20%22by%20type%22%20%22hang%20li%22%20msra%22
22. Learning to find answers on the Web
www.cs.columbia.edu/~eugene/papers/toit2002.pdf
23. Mitkov, R. & Ha, L.A. (2003). Computer-Aided Generation of Multiple-Choice Tests.
In Proceedings of the HLT-NAACL 2003 Workshop On Building Educational Applications
Using Natural Language Processing, Edmonton, Canada, May, pp. 17-22.

Misc

1. Statistical Natural Language Processing Reading List
ciir.cs.umass.edu/~fuchun/readlist_all/readlist.pdf


IV. Teaching Method


V. Syllabus

1. Introduction to NLP
2. Mathematical fundamentals: probability theory and information theory
3. Linguistics Essentials: parts of speech, phrase structure, semantics
4. Corpus-based NLP research
5. Word Sense Disambiguation
6. Lexical Acquisition
7. Markov Models
8. Part-of-speech Tagging
9. Probabilistic Context Free Grammar and Parsing
10. Text and translation alignment and Statistical Machine Translation
11. Clustering of words, use of word class in n-gram models
12. Information access and document retrieval
13. Text categorization


VI. Evaluation

Assignments
Midterm examination
Term project


VII. Related Web Links

http://nlplab.cs.nthu.edu.tw/