Intended Students:
CS graduate students, or undergraduate students in their 3rd or 4th year

Objectives:
To teach students the theory, algorithms, Python programming, Verilog
implementation, and ASIC optimization of contemporary neural networks in terms
of performance, accuracy, model size, and energy efficiency

Prerequisites:
Digital Logic Design and Hardware Lab
Computer Architecture
Familiarity with synthesizable Verilog

References:
Selected papers
Handouts

Grading:
Machine problems
Term Project
Final Presentation

Contents:
1. Introduction to deep learning
2. Python implementation of the LeNet CNN
3. Optimization Opportunities
   Trade-Offs Among Accuracy, Model Size, Speed, and Energy
4. Architecture-Level Optimization
   Operation Scheduling, Resource Sharing, and Data Movement
5. Synthesizable Verilog + FPGA Implementation
6. Comparison with software approaches
7. State-of-the-art hardware accelerator research
8. Term project demo