Course description:

Heterogeneous computing environments, which combine different types of
computation units such as GPUs, FPGAs, and other accelerators, have become a
major architectural design, not only in the high-performance computing community
but also in commodity hardware. The challenge for software and system
development is how to orchestrate those different components to achieve better
performance, lower power consumption, or other goals. The purpose of this class
is to point out those challenges and to survey current solutions. One or two
modern heterogeneous computing environments and programming methods will be
studied in depth to gain a deeper understanding of those issues and solutions.

Grading:
1. 50%: five programming assignments
2. 50%: final project (proposal, presentation, and report)

Tentative schedule:
1. Introduction to heterogeneous computing (1 week)
2. CUDA programming language (3 weeks)
2.A. Basic language elements
2.B. Thread, block, and grid
2.C. Parallel algorithms for summation and prefix sum
3. Basic performance optimization techniques (3 weeks)
3.A. Memory access model
3.B. Program execution model
3.C. Performance measurement and tuning tools
3.D. Parallel algorithms for matrix transpose and multiplication
4. Advanced performance optimization techniques (4 weeks)
4.A. Data streaming and data compression
4.B. Data structures and algorithms
4.C. Multiple GPUs with MPI or MapReduce
4.D. Parallel algorithms for ray tracing
5. Other accelerators and programming languages (4 weeks)
5.A. CUDA in MATLAB, Python, Fortran, etc.
5.B. OpenCL, OpenACC, etc.
5.C. RedDragon, IBM Cell, Intel MIC, FPGA, etc.
6. Case studies and project presentations (2 weeks)