I. Course Description
This course provides an overview of current cloud database management
systems and explains how they differ from traditional database systems.
The goals are 1) to familiarize students with some state-of-the-art
implementations (e.g., Google BigTable, Google Megastore, and Google
Spanner); and, more importantly, 2) to help students make better decisions
on design trade-offs when they build their own database systems with a
particular set of target applications/tenants in mind.

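A solid understanding of Java/OOP programming and data structures is required.
Priority is given to students who have recently completed Operating Systems
and Algorithms and who are interested in large-scale software systems.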

II. Textbook
Lecture Notes

III. References
[1] Database Management Systems, 3rd ed., by Raghu Ramakrishnan et al., ISBN: 0072465638
[2] Database System Concepts, 6th ed., by Abraham Silberschatz et al., ISBN: 0073523321
[3] Database Design and Implementation, by Edward Sciore, ISBN: 0471757160
[4] Principles of Distributed Database Systems, 3rd ed., by M. Tamer Ozsu, ISBN: 1441988335

IV. Teaching Method
Lecture and Lab

V. Syllabus
PART-I FUNDAMENTALS
1. Introduction to cloud databases
A. Relational model, SQL, transactions, and ACID (with quiz)
B. Scalability, availability, and elasticity
2. Query engine
A. JDBC and DB server
B. Relational algebra
C. Query plans, scans, and the storage interface
D. Parsing
E. Planning
3. Storage engine
A. Disk and file management
B. Memory management
C. Transaction management (+OCC)
D. Record management
E. Metadata management
4. Benchmarking with TPC-C
5. Optimizations
A. Indexing
B. Materialization and sorting
C. Multi-buffer plans
D. Query optimization

PART-II CLOUD DATABASES
6. Distributed databases
A. Partitioning and 2PC (for atomicity)
B. Replication: eager/lazy, master/slave, and group communication
(replication tutorial)
i. 2PL/OCC, 2PC for availability
C. Distributed data independence
D. Query processing
7. The CAP theorem and Paxos
8. Usage analysis: operational and analytic workloads
9. Analytics
A. Google File System
B. MapReduce
C. Graph models and Google Pregel (if time allows)
10. NoSQL operations
A. Google BigTable and GFSv2
B. Megastore
C. Key-value and document stores (if time allows)
11. NewSQL operations
A. Google Cloud SQL
B. Google Spanner and F1
C. Deterministic DBs

PART-III PRACTICE
12. Deadlock avoidance and conservative locking
13. Selinger-style query optimizer
14. High scalability: determinism with Paxos

VI. Evaluation
Midterm Exam 20%
Homework 60%
Term Project 20%