Summer Tutorial on Statistical Natural Language Processing

By Dr. Keh-Yih Su and Prof. Jing-Shin Chang


 

---

The Natural Language Computing Group at Microsoft Research Asia (MSRA) is very pleased to announce that Dr. Keh-Yih Su, a world-renowned scholar in Natural Language Processing, and his colleague, Prof. Jing-Shin Chang, will visit MSRA to hold a summer tutorial on Statistical Natural Language Processing (SNLP).

We cordially invite everyone interested in SNLP to join us for this tutorial, a valuable opportunity to explore fundamental and advanced topics in the field.

Expenses:
1. This tutorial is free of charge, but attendees must cover their own travel and living expenses.
2. Microsoft Research Asia will provide the lecture materials, drinks, and lunch.

For more details of the tutorial and registration information, please click Registration.

Note:
Please note that there is an upper limit on the number of attendees. If registrations exceed the capacity of the lecture room, priority will be given to university students and teachers.



Short Bios of Dr. Keh-Yih Su and Prof. Jing-Shin Chang

Dr. Su received his Ph.D. from the University of Washington, Seattle, USA, in 1984. After graduation, he became a professor at National Tsing-Hua University in Taiwan. In 1985, he started an English-to-Chinese Machine Translation project (later named BehaviorTran) and became its director. In 1988, the project was transferred to Behavior Design Corporation (BDC), founded in the Science-Based Industrial Park (SBIP), Hsinchu, Taiwan, to support the long-term R&D of the BehaviorTran Machine Translation System. Since then, the company has provided translation services to many well-known international corporations. In 1998, Dr. Su left the university to join Behavior Design Corp., where he is now the general manager.

Dr. Su has been a leading figure in the field of Natural Language Processing since 1986. In 1991, at the Machine Translation Summit III conference, he proposed and presented the Corpus-Based Statistics-Oriented (CBSO) approach, which adopts a two-way training mechanism and avoids the problems induced by purely statistical approaches (e.g., IBM 1990). He has published over 100 technical papers, has served on the editorial boards of several international journals, and has been an invited speaker at numerous MT-related international conferences.


Professor Jing-Shin Chang received his BS degree from the Department of Electrical Engineering of National Tsing-Hua University (NTHU), Hsinchu, Taiwan, in 1984. In 1986, he joined the Machine Translation Research Group of NTHU, which had been led by Professor Keh-Yih Su of the EE Department since 1985, and he became the group's project leader in 1987. During this period, he was the principal designer of a new-generation MT parser and participated in most of the major research and development work on the ArchTran (now known as BehaviorTran) Machine Translation System. In 1988, he began studying for his MS degree at National Chiao-Tung University (NCTU), Hsinchu, Taiwan, and subsequently for his PhD at NTHU, while maintaining a close cooperative relationship with Behavior Design Corporation in all aspects of MT R&D. He received his MS degree from NCTU in 1990 and his PhD from NTHU in 1997. From 1997, he was a senior researcher at Behavior Design Corporation, working on the next-generation CBSO (Corpus-Based Statistics-Oriented) MT system. In 2000, he became an Assistant Professor in the Department of Computer Science and Information Engineering, National Chi-Nan University (NCNU), Puli, Nantou, Taiwan, where he continues to work on various challenging MT research topics.

The Two-Day Program:

Aug. 17: Introduction to Statistical Natural Language Processing (Mainly Covering Supervised Learning)

 Part I: Introduction (1)
 Problems and Characteristics of Natural Language Processing
 Part II: Introduction (2)
 What, When, and Why for Statistical Approach
 Part III: Basic Concepts and Background
 Feature Space, Probability, Estimator, Stochastic Process, Data Set Classification, and Performance Measure
 Part IV: Typical Applications
 Word Segmentation, Tagging, Selecting Parse Tree, Aligning Bilingual Corpus
 Part V: Techniques for Improving Performance
 Smoothing, Class-Based Model, Adaptive Learning, Tips for Checking
 Part VI: Advanced Topics
 Support Vector Machine, Maximum Entropy Models
 Appendix: Related Techniques
 Parameter Estimation, Fractional Factorial Experiment Design, Decision Tree
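Of the performance-improving techniques listed under Part V, smoothing is among the simplest to illustrate. Below is a minimal sketch, not taken from the tutorial materials (the toy corpus and function name are invented for illustration), of add-k smoothing for bigram probabilities, with k=1 giving classic add-one (Laplace) smoothing:

```python
from collections import Counter

def bigram_prob(corpus, vocab_size, w1, w2, k=1.0):
    """Add-k smoothed bigram probability P(w2 | w1); k=1 is Laplace smoothing.

    Unseen bigrams receive a small nonzero probability instead of zero,
    at the cost of slightly discounting the bigrams actually observed.
    """
    bigrams = Counter(zip(corpus, corpus[1:]))
    unigrams = Counter(corpus)
    return (bigrams[(w1, w2)] + k) / (unigrams[w1] + k * vocab_size)

corpus = "the cat sat on the mat".split()   # toy corpus for illustration
vocab = len(set(corpus))

p_seen = bigram_prob(corpus, vocab, "the", "cat")   # observed bigram
p_unseen = bigram_prob(corpus, vocab, "cat", "on")  # unobserved bigram
print(p_seen, p_unseen)  # the unseen bigram still gets probability mass
```

Without smoothing, any sentence containing a single unseen bigram would be assigned probability zero; smoothing trades a little accuracy on seen events for robustness on unseen ones.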

Aug. 18: Unsupervised Learning for Natural Language Processing
 Part I: Introduction
 What and When for Unsupervised Learning, Why It Is Getting More Popular
 Part II: Basic Concepts and Background (using EM as an example)
 Incomplete Data Space
 Learnability
 Part III: Typical Unsupervised Learning Algorithms: Viterbi & EM
 Procedures, Characteristics
 Part IV: Potential Traps & Source of Problems
 Various Mismatches, Model Deficiencies, Local Maximum, and Over-fitting
 Part V: Suggested Strategies for Better Performance
 Lessons Learned from Past Experience
 Recommended Procedures for Unsupervised Learning
 Part VI: Co-Training
 Basic Principle
 Example: Chinese Compound Noun Extraction
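For readers unfamiliar with the EM algorithm covered in Part III above, the following is a minimal sketch of EM on a classic incomplete-data problem: several batches of coin flips are observed, each batch produced by one of two coins with unknown biases, but which coin generated which batch is hidden. All data and names here are invented for illustration and are not taken from the lecture materials.

```python
# Each trial: 10 flips of one of two coins; we record the number of heads.
# Which coin was used (the "incomplete" part of the data) is hidden.
trials = [9, 8, 7, 2, 1, 3, 9, 2]  # heads out of n = 10 flips per trial
n = 10

def likelihood(heads, p):
    # Binomial likelihood of `heads` heads in n flips; the constant
    # binomial coefficient is omitted since it cancels in the E-step.
    return (p ** heads) * ((1 - p) ** (n - heads))

p_a, p_b = 0.6, 0.4  # initial guesses for the two coin biases
for _ in range(50):
    heads_a = flips_a = heads_b = flips_b = 0.0
    for h in trials:
        # E-step: posterior probability that this trial came from coin A.
        la, lb = likelihood(h, p_a), likelihood(h, p_b)
        w_a = la / (la + lb)
        heads_a += w_a * h
        flips_a += w_a * n
        heads_b += (1 - w_a) * h
        flips_b += (1 - w_a) * n
    # M-step: re-estimate both biases from the expected counts.
    p_a, p_b = heads_a / flips_a, heads_b / flips_b

print(round(p_a, 2), round(p_b, 2))  # biases recovered from incomplete data
```

Each iteration is guaranteed not to decrease the incomplete-data likelihood, but as Part IV notes, the procedure only reaches a local maximum, so the quality of the initial guesses matters.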