大規模機器學習實務 (Large-Scale Machine Learning)
Credits: 3

一、 課程說明 (Course Description)
This course gives students a practical guide to large-scale data analysis with open-source tools. We bring machine learning theory, tools, and
real-world datasets together to teach students how to analyze massive data effectively and efficiently.

This course is divided into three parts. In the first part, we review the mathematics required for machine learning. In the second part, we introduce
fundamental machine learning concepts, models, and algorithms. In the third part, we discuss large-scale machine learning for big data and how it differs
from small-scale learning tasks.

This course emphasizes BOTH theory and coding. It is intended for senior undergraduate and graduate students who have a solid understanding of computer
programming, probability, calculus, and linear algebra. We will use Python as the main programming language throughout the course. Prior experience with
large-scale machine learning tools and libraries such as Spark and Theano is helpful but not required.

二、 指定用書 (Textbook)
[1]. Lecture Notes

三、 參考文獻 (References)
[1]. TBA

四、 教學方式 (Teaching Method)
Lecture

五、 教學進度 (Syllabus)
1. Introduction
• Python 101
MATH REVIEW
2. Linear algebra
• EDA, Scikit-learn, and PCA
3. Probability and information theory
• Decision tree and random forest
4. Optimization 1
• Perceptron and Adaline
5. Optimization 2
• Linear regression
MACHINE LEARNING
6. Model capacity and regularization
• Ridge and LASSO
7. Maximum likelihood estimation
• Logistic regression
8. Support vector machines and experiments
• Scikit-learn pipeline
LARGE-SCALE ML
9. What’s the difference? (model and optimization)
• Spark and MLlib
10. Approximate inference
• Monte Carlo Simulation + Yahoo Finance
11. Neural networks and deep learning
• MNIST
12. NN training and regularization
13. Convolutional NN
• Theano and Keras + ILSVRC
14. Recurrent NN
• NLP, Image Captioning + MSCOCO

六、 成績考核 (Evaluation)
Midterm exam: 20%
Assignments: 30%
Projects: 20%