Course Contents & Slides (Last update: 2001/11/07)

Double-Click the Underlined Document Numbers to Download the Slides

Copyright Statements

All documents linked from this page are copyrighted by their respective authors.
The course contents in the slides are intended only for personal, non-profit
educational use. Distributing these documents for any other purpose may be
unlawful and may expose the distributor to legal liability; please do not do so.

Courses: [NLP]

Natural Language Processing (#215021)

Text Books & References

(*) The textbook is yet to be decided: [1] has many minor but distracting errors, as I found in last year's course. I am considering [2] or [3], plus some of my own handouts, as the replacement.

[1] Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by Daniel Jurafsky & James H. Martin, Prentice Hall, 2000.
    Website: http://www.cs.colorado.edu/~martin/slp.html
    Errata: http://www.cs.colorado.edu/~martin/SLP/slp-errata.html (Local copy)
    Errata: http://www.cs.colorado.edu/~martin/SLP/New_Pages/pg455.pdf (Local copy) - Probabilistic CYK Algorithm

    [*] Slides & Add-ons:
        Preface
        Chapter 1: Introduction, Historical Review
        Chapter 2: Words, Regular Expression, Fast Matching (04/09)
        Chapter 2+: Chinese Words, Word Segmentation (03/27)
        Chapter 3: Morphology & Finite State Transducers (04/10)
        Chapter 6: N-Gram (Model, Parameter Estimation & Smoothing) (04/17)
            Good-Turing Smoothing & Backoff (04/17)
        Chapter 7: HMM (Hidden Markov Model) (slides: courtesy of Prof. Sheng-Wen Shih)
        Chapter 9: Context-Free Grammars (and a Simple Grammar for English) (06/05)
        Chapter 10: Parsing Algorithms -- (todo ...)
            Generic Chart Parsing (todo ...)
            CYK (todo ...)
            Earley (todo ...)
            Left-Corner Parsing (todo ...)
            LR Parser with Augmentation (todo ...)
        Chapter 12: PCFG, Trainable Grammars, and Lexicalization
            [I] Trainable Grammars:
                1. Inside/Outside Probabilities
                2. Estimation of Rule Probabilities
                3. Finding Best Parse
            [IIa] GPSM: PCFG enhanced with Context Sensitivity, Lexicalization and Normalization
            [IIb] Lexicalized PCFG (todo)
        Todo ...

    [*] Figures of the book:
        Figs: http://www.cs.colorado.edu/~martin/SLP/Figures/PDF/slp-figs.tar.gz (Local copy)
        Other Figs: http://www.cs.colorado.edu/~martin/SLP/Figures/fixedtrees.tar (Local copy)

[2] Foundations of Statistical Natural Language Processing, by Christopher D. Manning and Hinrich Schütze, MIT Press, 1999.
[3] Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, by XueDong Huang, Alex Acero and Hsiao-Wen Hon, Prentice Hall PTR, Upper Saddle River, NJ 07458, USA, 2001. http://www.phptr.com
    (Ch. 1, Sec. 2.3-2.5, Ch. 3, Ch. 4, Sec. 5.8, Ch. 8, Ch. 11, Ch. 12, Ch. 13, Ch. 14 and Sec. 17.3-17.5 are of particular interest for statistical NLP.)

[4] Advanced NLP Issues ... (beyond algorithmic/application points of view ...)
    [1] Modeling Problems
        [1.1] Features
        [1.2] Dependency
    [2] Estimation Problems
        [2.1] Performance Metrics
        [2.2] Discrimination
        [2.3] Robustness
        [2.4] Adaptive Training
        [2.5] Supervised vs. Unsupervised Training - Why and When

Conference Papers

[11] Chang, J.-S., Y.-F. Luo and K.-Y. Su, "GPSM: A Generalized Probabilistic Semantic Model for Ambiguity Resolution," Proceedings of ACL-92, pp. 177--184, 30th Annual Meeting of the Association for Computational Linguistics, University of Delaware, Newark, DE, USA, 28 June--2 July, 1992.

[13] Tung-Hui Chiang, Jing-Shin Chang, Ming-Yu Lin and Keh-Yih Su, "Statistical Models for Word Segmentation and Unknown Word Resolution," Proceedings of ROCLING-V, ROC Computational Linguistics Conference V, pp. 123--146, National Taiwan University, Taipei, Taiwan, ROC, Sep. 18--20, 1992. (PDF version)

Journals and Books

[6] Tung-Hui Chiang, Jing-Shin Chang, Ming-Yu Lin and Keh-Yih Su, "Statistical Word Segmentation," in C.-R. Huang, K.-J. Chen and Benjamin K. T'sou (eds.): Journal of Chinese Linguistics, Monograph Series Number 9, Readings in Chinese Natural Language Processing, pp. 147--173, University of California, Berkeley, 1996.

[7] K.-Y. Su, Tung-Hui Chiang, and Jing-Shin Chang, "An Overview of Corpus-Based Statistics-Oriented (CBSO) Techniques for Natural Language Processing," International Journal of Computational Linguistics & Chinese Language Processing (CLCLP), vol. 1, no. 1, pp. 101--157, August 1996.

[8] Yu-Ling Una Hsu, Jing-Shin Chang, and Keh-Yih Su, "Computational Tools and Resources for Linguistics Studies," International Journal of Computational Linguistics & Chinese Language Processing (CLCLP), vol. 2, no. 1, pp. 1--39, 1997.

Miscellaneous

Su, K.-Y., T.-H. Chiang and J.-S. Chang, "Introduction to Corpus-based Statistics-oriented (CBSO) Techniques," Pre-Conference Workshop on Corpus-based NLP, ROC Computational Linguistics Conference VII, National Tsing-Hua Univ., Taiwan, ROC, Aug. 1994.
    Part I: Introduction (PDF/4) (PS/4) (PDF) (PS)
    Part II: Basic Concepts (PDF/4) (PS/4) (PDF) (PS)
    Part III: Techniques (PDF/4) (PS/4) (PDF) (PS)
    Errata: Corrections to Parts I-III (TXT)

"RFC 1922: Chinese Character Encoding for Internet Messages" (2nd Rev.), March 1995. I was invited to be a co-author of this RFC for my technical revisions to the ROC part of the ISO-2022-conformant encoding standard, a.k.a. ISO-2022-CN (which originated from the Chinese Character Subworking Group of the I18N/L10N Working Group of the Asian Pacific Networking Group, APNG-CC).
    Reports related to RFC1922
