Course Description:

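In recent years, a growing number of graduate theses at universities and
colleges have taken language analysis as their topic, with a particular
emphasis on semantic analysis. The corpora they draw on are rich and varied,
and the applications of the results are often novel and impressive. This
growing interest in semantic analysis among departments outside linguistics
has also drawn researchers who specialize in linguistics away from traditional
research methods and into the current wave of work that applies computational
techniques to natural language semantic processing. This semester's seminar
reviews how different disciplines currently carry out semantic analysis on
natural language data, and provides hands-on practice with mature,
well-established software packages. The course is suited to students looking
for a forward-looking direction for their thesis.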


Keywords:

Language analysis, semantic analysis, natural language, computation



Required Textbook:

Barriere, Caroline. (2016). Natural language understanding in a semantic web
context. Springer eBooks. [National Tsing Hua University Library e-book,
ISBN 9783319413372]



Teaching Method:

Lectures and seminar discussions



Course Schedule:

Part I: How to Discover the Hidden Topics from Given Documents using Latent
Semantic Analysis in Python

Week 1: Topic Modeling
Week 2: Text Classification
Week 3: Latent Semantic Analysis
Week 4: Using Gensim
Week 5: Determining the Optimum Number of Topics in a Document
Week 6: Pros and Cons of LSA
Week 7: Cases of Topic Modeling (1)
Week 8: Cases of Topic Modeling (2)
Week 9: Review
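
As a rough preview of what Part I builds toward, the following is a minimal
sketch of Latent Semantic Analysis. It is an illustration only, not course
material: it uses NumPy's SVD rather than Gensim, and the four toy documents,
the function names, and the choice of k=2 topics are all assumptions made for
this example.

```python
import numpy as np

# Toy corpus (hypothetical): two documents about animals, two about finance.
docs = [
    "cat dog pet animal",
    "dog cat animal",
    "stock market trade",
    "market trade stock price price",
]

def term_document_matrix(texts):
    """Build a raw-count term-document matrix (rows: terms, columns: documents)."""
    vocab = sorted({word for text in texts for word in text.split()})
    index = {word: i for i, word in enumerate(vocab)}
    X = np.zeros((len(vocab), len(texts)))
    for j, text in enumerate(texts):
        for word in text.split():
            X[index[word], j] += 1
    return vocab, X

def lsa_document_vectors(X, k):
    """Project each document onto the top-k latent dimensions via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T  # shape: (n_documents, k)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

vocab, X = term_document_matrix(docs)
doc_vecs = lsa_document_vectors(X, k=2)

print("animal vs. animal document:", round(cosine(doc_vecs[0], doc_vecs[1]), 3))
print("animal vs. finance document:", round(cosine(doc_vecs[0], doc_vecs[2]), 3))
```

In this toy setting the two animal documents should come out nearly identical
in the latent space, while an animal document and a finance document should be
close to orthogonal. Real corpora share vocabulary across topics, so the
separation is far less clean, which is part of what Weeks 5 and 6 address.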

Part II: Working with Corpora

Week 10: Mutual Information and Collocations
Week 11: Estimating Sequence Probabilities
Week 12: Gathering N-Gram Probabilities from Corpora
Week 13: Building a Domain-Independent Corpus
Week 14: Issues in N-Gram Estimation
Week 15: Word Sense Disambiguation
Week 16: Bag-of-Words Content: Looking at Text Cohesion
Week 17: Bag-of-Words Comparison
Week 18: Gold Standard and Evaluation
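
To give a sense of the corpus statistics covered in Weeks 10 to 12, here is a
minimal sketch that estimates a bigram probability and pointwise mutual
information (PMI) from raw counts. The toy sentence and the function names are
assumptions made for this example, and the estimates are plain maximum
likelihood with no smoothing, which is one of the issues Week 14 concerns.

```python
import math
from collections import Counter

# Toy corpus (hypothetical), already tokenized by whitespace.
tokens = (
    "new york is big and new york is busy but the new car is small"
).split()

unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
n_tokens = len(tokens)
n_bigrams = len(tokens) - 1

def bigram_probability(w1, w2):
    """Maximum-likelihood estimate of P(w2 | w1) = count(w1 w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

def pmi(w1, w2):
    """PMI in bits: log2( P(w1, w2) / (P(w1) * P(w2)) ).

    Returns -inf for a pair that never occurs, since no smoothing is applied.
    """
    if bigrams[(w1, w2)] == 0:
        return float("-inf")
    p_xy = bigrams[(w1, w2)] / n_bigrams
    p_x = unigrams[w1] / n_tokens
    p_y = unigrams[w2] / n_tokens
    return math.log2(p_xy / (p_x * p_y))

print("P(york | new) =", bigram_probability("new", "york"))
print("PMI(new, york) =", pmi("new", "york"))
print("PMI(is, new) =", pmi("is", "new"))
```

With a corpus this small, PMI values are dominated by rare words, so the
numbers are only meaningful as a demonstration of the arithmetic.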



Grading:

(1) Class and discussion participation 20%
(2) Data collection 25%
(3) Data analysis 25%
(4) Term project report 30%



Related Links:

https://github.com/adashofdata/nlp-in-python-tutorial