Course description:

Heterogeneous computing environments, which combine different types of computation units such as
GPUs, FPGAs, and other accelerators, have become a major architectural design, not only in the
high-performance computing community but also in commodity hardware. The challenge for software
and system development is how to orchestrate these different components to achieve better
performance, lower power consumption, or other goals. The purpose of this class is to point out
those challenges and to survey current solutions. The class will focus on one or two modern
heterogeneous computing environments and programming methods to gain a deeper understanding of
these issues and solutions.

Grading:
1. 50% for 5 programming assignments
2. 50% final project (proposal, presentation, and report)

Tentative schedule:
1. Introduction to heterogeneous computing (1 week)
2. CUDA programming language (3 weeks)
2.A. Basic language elements
2.B. Thread, block, and grid
2.C. Parallel algorithms for summation and prefix sum
3. Basic performance optimization techniques (3 weeks)
3.A. Memory access model
3.B. Program execution model
3.C. Performance measurement and tuning tools
3.D. Parallel algorithms for matrix transpose and multiplication
4. Advanced performance optimization techniques (4 weeks)
4.A. Data streaming and data compression
4.B. Data structures and algorithms
4.C. Multiple GPUs with MPI or MapReduce
4.D. Parallel algorithms for ray tracing
5. Other accelerators and programming languages (4 weeks)
5.A. CUDA in MATLAB, Python, Fortran, etc.
5.B. OpenCL, OpenACC, etc.
5.C. RedDragon, IBM Cell, Intel MIC, FPGA, etc.
6. Case studies and project presentations (2 weeks)