Credits: 3

I. Course Description

This class provides a practical guide for students to perform large-scale data analysis with open-source tools. We bring machine learning theory, tools, and real-world datasets together to teach students how to analyze massive data effectively and efficiently.

This course is divided into three parts. In the first part, we review the mathematics required for machine learning. In the second part, we introduce fundamental machine learning concepts, models, and algorithms. In the third part, we discuss large-scale machine learning for big data and how it differs from small-scale learning tasks.

This course emphasizes both theory and coding. It is intended for senior undergraduate and graduate students who have a solid understanding of computer programming, probability, calculus, and linear algebra. We will use Python as the main programming language throughout the course. Background knowledge of large-scale machine learning tools and libraries such as Spark and Theano is helpful but not required.

II. Textbook

[1]. Lecture Notes

III. References

[1]. TBA

IV. Teaching Method

Lecture

V. Syllabus

1. Introduction

• Python 101

MATH REVIEW

2. Linear algebra

• EDA, Scikit-learn, and PCA

3. Probability and information theory

• Decision tree and random forest

4. Optimization 1

• Perceptron and Adaline

5. Optimization 2

• Linear regression

MACHINE LEARNING

6. Model capacity and regularization

• Ridge and LASSO

7. Maximum likelihood estimation

• Logistic regression

8. Support vector machines and experiments

• Scikit-learn pipeline

LARGE-SCALE ML

9. What’s the difference? (model and optimization)

• Spark and MLlib

10. Approximate inference

• Monte Carlo Simulation + Yahoo Finance

11. Neural networks and deep learning

• MNIST

12. NN training and regularization

13. Convolutional NN

• Theano and Keras + ILSVRC

14. Recurrent NN

• NLP, Image Captioning + MSCOCO

VI. Evaluation

Midterm exam: 20%

Assignments: 30%

Projects: 20%