統計與機器學習概論一 (Introduction of Statistics and Machine Learning (I)) is designed
for machine learning beginners; many of its examples (but not all) are tailored to
students with a biomedical background. The class lays the mathematical foundations
for its sequel, 統計與機器學習概論二 (Introduction of Statistics and Machine Learning
(II)), which we recommend students not take unless they have taken (I) or a similar
class. The relevant mathematical concepts and details include, but are not limited
to: the relation between statistics and probability, the concept of and inference
from sampling, distributions, linear algebra, linear transformations, regression and
correlation, goodness-of-fit tests, discriminant analysis, training methods, maximum
likelihood and Bayesian parameter estimation, ODEs applied to biomedical research,
decision trees, estimators, decisions, and machine learning basics. A Python workshop
is provided along with the class; successful completion of the two computational
homework assignments requires programming skills.

Teachers: 楊立威、張筱涵、洪樂文、李祈均老師 (Profs LW Yang, HH Chang, YW Hong, CC Lee)

Textbooks:
1. The Analysis of Biological Data, Whitlock & Schluter:
https://www.amazon.com/Analysis-Biological-Data-Michael-Whitlock/dp/0981519407 (@NTHU library)
2. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic
Acids, Durbin, Eddy, Krogh & Mitchison (@NTHU library)

Weeks:
1. 9/14. Starting from things you already know (to some extent): mean, SD,
variance, and how statistics relates to probability. Comparing two means (doing it
with your MS Excel sheet; see the t-test sketch after the schedule). Why do
statistics/probability constitute the basics of machine learning (examples on drug
development and exercise prescription)? (LW Yang)

9/21, 9/28 holidays – Self-study on math chapters

2. 10/5. Reading the math formulas: indices, the difference between probability
(a discrete function) and probability density (a continuous function), and learning
the summation/multiplication signs through games that align two protein sequences
(LW Yang)
3. 10/12. Distributions and the rules of probability: addition/multiplication
rules, dependency, conditional probability, marginal probability, and Bayes'
theorem (see the Bayes sketch after the schedule) (LW Yang)
4. 10/19. Definitions of important terms, sampling biases and rules, inference
from samples A (w1, HH Chang)
5. 10/26. Inference from samples B (w4, HH Chang)
6. 11/2. Goodness-of-fit test, contingency analysis, and one-sample inference
(w6/7, HH Chang)
7. 11/9. Comparing two means (continued); (relative) entropy, information
content, "distance" between distributions, the Boltzmann relation, and
normalization: frequency vs. proportion, normalization by controls, by ranking, by
probability (see the relative-entropy sketch after the schedule) (LW Yang)
8. 11/16. Correlation and Regression (HH Chang)
9. 11/23. ODEs, the chain rule, and back-propagation through layers of
perceptrons (see the back-propagation sketch after the schedule) (YW Liu)
10. 11/30. Detection Theory (YW Hong)
11. 12/7. Estimation Theory (YW Hong)
12. 12/14. Moving Towards Data-Driven Techniques (YW Hong)
13. 12/21. Introduction to Machine Learning (CC Lee)
14. 12/28. Manners of Learning (Supervised, Unsupervised, Semi-supervised) (CC Lee)
15. 1/4. ML Experiment Setup (Training, Testing, Cross-validation) (CC Lee)
16. 1/11. Final Exam
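
Code sketch for week 1 (comparing two means): the in-class exercise uses MS Excel,
but the same test is one call in Python. This is a minimal sketch; the numbers and
the drug-vs-placebo framing are made up for illustration, not course data.

    # Welch's two-sample t-test in Python (week 1; done in Excel in class).
    from scipy import stats

    # Hypothetical measurements for two groups (e.g., drug vs. placebo).
    group_a = [5.1, 4.9, 5.6, 5.2, 4.8, 5.4]
    group_b = [4.2, 4.6, 4.1, 4.8, 4.3, 4.5]

    # equal_var=False gives Welch's test, which does not assume equal variances.
    t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
    print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

A p-value below the chosen significance level (commonly 0.05) would lead us to
reject the hypothesis that the two group means are equal.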
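
Code sketch for week 3 (Bayes' theorem): a small worked example, with made-up
numbers, of computing a posterior P(disease | positive test) from a prior and a
test's sensitivity and false-positive rate.

    # Bayes' theorem with illustrative numbers (not course data):
    # P(D|+) = P(+|D) P(D) / [P(+|D) P(D) + P(+|not D) P(not D)]
    p_d  = 0.01   # prior: disease prevalence
    sens = 0.99   # P(+ | disease), sensitivity
    fpr  = 0.05   # P(+ | no disease), false-positive rate

    posterior = sens * p_d / (sens * p_d + fpr * (1 - p_d))
    print(f"P(disease | positive) = {posterior:.3f}")  # ~0.167

Even with a 99%-sensitive test, a low prior keeps the posterior modest, which is
the classic point of the theorem.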
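
Code sketch for week 7 (relative entropy / information content): the
Kullback-Leibler divergence measures the "distance" between two discrete
distributions. The nucleotide frequencies below are hypothetical, chosen only to
illustrate the formula D(p||q) = sum_i p_i * log2(p_i / q_i).

    import numpy as np

    def relative_entropy(p, q):
        """Kullback-Leibler divergence D(p||q), in bits, between two
        discrete distributions given as arrays of probabilities."""
        p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
        mask = p > 0  # terms with p_i = 0 contribute zero by convention
        return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

    # Hypothetical nucleotide frequencies (A, C, G, T) vs. a uniform background.
    observed   = [0.4, 0.1, 0.1, 0.4]
    background = [0.25, 0.25, 0.25, 0.25]
    print(relative_entropy(observed, background))  # ~0.28 bits

Note that D(p||q) is not symmetric, so it is a "distance" only in quotes, exactly
as the week 7 title writes it.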
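
Code sketch for week 9 (chain rule and back-propagation): back-propagation through
one sigmoid perceptron is just the chain rule written out factor by factor, as in
the toy example below. The data and learning rate are made up; a real multilayer
network repeats the same pattern layer by layer.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    x = rng.normal(size=(100, 2))              # toy inputs
    y = (x[:, 0] + x[:, 1] > 0).astype(float)  # toy labels
    w, b, lr = np.zeros(2), 0.0, 0.5

    for _ in range(200):
        a = sigmoid(x @ w + b)                 # forward pass
        # Squared-error loss L = mean((a - y)^2); the chain rule gives
        # dL/dw = (dL/da) * (da/dz) * (dz/dw), computed term by term:
        dL_da = 2 * (a - y) / len(y)
        da_dz = a * (1 - a)
        delta = dL_da * da_dz
        w -= lr * (x.T @ delta)                # dz/dw = x
        b -= lr * delta.sum()                  # dz/db = 1

    print("training accuracy:", ((sigmoid(x @ w + b) > 0.5) == y).mean())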

Grading:
Two math exams (10% each), two statistics/probability homework assignments (10%
each), and two computational homework assignments (15% each: a protein sequence
aligner and a CpG-island predictor using log-odds; see the sketch below), due on
1/16, 2022.
Final exam (30%)
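
Code sketch for the CpG-island homework: the log-odds idea from Durbin et al.
(textbook 2) scores a sequence by summing log2 of the ratio of dinucleotide
transition probabilities inside vs. outside CpG islands. The two toy tables below
cover only the C<->G transitions, with illustrative values; the actual homework
would use the full 4x4 transition tables and its own training data.

    import math

    # Toy transition probabilities P(next | current) inside ("+") and
    # outside ("-") CpG islands; illustrative values, C<->G entries only.
    plus  = {('C', 'G'): 0.274, ('G', 'C'): 0.339}
    minus = {('C', 'G'): 0.078, ('G', 'C'): 0.246}

    def log_odds(seq):
        """Sum log2[P+(pair) / P-(pair)] over adjacent pairs; pairs missing
        from the toy tables contribute 0. Positive totals are CpG-island-like."""
        score = 0.0
        for pair in zip(seq, seq[1:]):
            if pair in plus:
                score += math.log2(plus[pair] / minus[pair])
        return score

    print(log_odds("GCGCGC"))  # strongly positive, i.e. CpG-island-like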

Programming workshop:
Python programming taught over 10+ weeks, plus one session on basic Linux