Course Description
New commodity parallel computing devices, including graphics processing units (GPUs) and the IBM Cell processor, bring high performance computing, once available only to an elite few, within reach of the general public. To program and accelerate applications on these devices, we must understand both their computational architecture and the principles of program optimization. This course discusses GPU and IBM Cell hardware, as well as concepts and techniques for optimizing general-purpose computing on these new architectures.
Modern GPUs are high-performance parallel computing devices. Their floating-point performance has far outpaced that of conventional CPUs, stimulating interest in using them for general-purpose computing. However, General-Purpose Computing on GPUs (GPGPU) has traditionally been difficult, requiring a mix of graphics-specific languages and unfamiliar programming paradigms. To lower the barrier to entry, NVIDIA and AMD have released CUDA (Compute Unified Device Architecture) and CTM (Close To Metal), respectively, both of which are C APIs for programming GPUs. This course will introduce the fundamental organization of GPU hardware, discuss the architectures and programming models defined by CUDA and CTM, and compare these architectures with general-purpose processors.
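To illustrate the CUDA programming model, the sketch below emulates in plain C how a kernel launch divides work across a grid of thread blocks. Each thread derives a global index from its block index, the block size, and its thread index (in CUDA, `blockIdx.x * blockDim.x + threadIdx.x`). This is not CUDA code: the function names `vec_add_thread` and `vec_add_launch` are hypothetical, and the serial loops only stand in for threads that would run in parallel on the GPU.

```c
/* Body of a hypothetical "vector add" kernel: one thread handles one element.
   In CUDA the index would come from blockIdx.x * blockDim.x + threadIdx.x;
   here those values are passed in as ordinary parameters. */
static void vec_add_thread(int block_idx, int block_dim, int thread_idx,
                           const float *a, const float *b, float *c, int n)
{
    int i = block_idx * block_dim + thread_idx;  /* global thread index */
    if (i < n)                                   /* guard: last block may be partial */
        c[i] = a[i] + b[i];
}

/* Serial stand-in for a kernel launch with a 1-D grid of 1-D blocks. */
void vec_add_launch(const float *a, const float *b, float *c, int n, int block_dim)
{
    int grid_dim = (n + block_dim - 1) / block_dim;  /* ceiling division */
    for (int blk = 0; blk < grid_dim; ++blk)
        for (int t = 0; t < block_dim; ++t)
            vec_add_thread(blk, block_dim, t, a, b, c, n);
}
```

For example, with `n = 5` and `block_dim = 2`, the launch uses three blocks, and the bounds check discards the sixth thread, which has no element to process.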
In addition to GPU hardware and architecture, another major theme of this course is program optimization. We will discuss general optimization concepts, including mapping an algorithm onto the hardware's computational resources, efficiently using the unconventional memory hierarchy, and profiling and optimizing a design. The course also emphasizes using CUDA and CTM to program and optimize computationally demanding applications on these new platforms.
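A common way to exploit a memory hierarchy is tiling (blocking): the computation is reorganized so that a small block of data is reused many times while it resides in fast memory. The C sketch below applies this idea to matrix multiplication with a CPU cache in mind; under CUDA, the analogous technique stages tiles in on-chip shared memory. The tile size `TILE = 32` and the function name `matmul_tiled` are illustrative assumptions, and the best tile size depends on the target hardware.

```c
#define TILE 32  /* assumed tile edge; tuned to cache or shared-memory size in practice */

/* C = A * B for n x n row-major matrices, computed tile by tile so that each
   TILE x TILE block of A and B is reused while it is still in fast memory. */
void matmul_tiled(const float *A, const float *B, float *C, int n)
{
    for (int i = 0; i < n * n; ++i)
        C[i] = 0.0f;
    for (int ii = 0; ii < n; ii += TILE)
        for (int kk = 0; kk < n; kk += TILE)
            for (int jj = 0; jj < n; jj += TILE)
                /* multiply one pair of tiles; the "< n" bounds handle n not divisible by TILE */
                for (int i = ii; i < ii + TILE && i < n; ++i)
                    for (int k = kk; k < kk + TILE && k < n; ++k) {
                        float a = A[i * n + k];
                        for (int j = jj; j < jj + TILE && j < n; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

The tiled loop nest computes the same result as the untiled one; only the order of memory accesses changes, which is the kind of trade-off the optimization topics in this course examine.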
The coursework will include homework, exams, and a final project. Students must have a strong command of programming in C or C++ to do well in this course. By taking this course, students will deepen their understanding of the interaction between software and hardware and gain hands-on experience using and designing cutting-edge technology at the frontier of high performance computing.
Course Information
Course Number: ELEG 455/655
Course Title: Programming Modern Graphics Cards
Time: MWF 10:10am-11:00am
Location: 122 Sharp Lab
Prerequisites: Computer architecture. Parallel programming is recommended but not required.
Instructor Information
Instructor:
- Xiaoming Li <xli@ece.udel.edu>
Office Hours: Mon/Wed 11:00am-12:00pm and by appointment
Office Location: 308 DuPont
Textbooks
No textbook is required. However, the following book is recommended and will help with your projects.
- GPU Gems 3, edited by Hubert Nguyen (Chapters 29-41)
Topics Covered
- Parallel programming (overview)
- Program optimization (overview)
- Numerical analysis (overview)
- Graphics Processing Unit (GPU) hardware and architecture
- CUDA, CTM and Cell
- CUDA programming model
- Applying GPUs to general-purpose computing
- Performance measurement
- Application optimization for GPU
Grading
- Midterm Exam (15%): This exam will test the concepts covered in the first half of the semester, including parallel computing, numerical analysis, and FPGA, GPU, and Cell programming. The exam will be closed book and closed notes.
- Final Exam (15%): This exam will test the concepts covered over the whole semester, including parallel computing, numerical analysis, and FPGA, GPU, and Cell programming. The exam will be closed book and closed notes.
- Individual Projects (20%): There will be 2-3 individual projects in the first half of the semester. Their goal is to give students extensive training in program optimization techniques and practice in CUDA and Cell programming.
- Paper Presentation (5%): Midway through the semester, each group will present a research paper closely related to the group's project. The presentation should also discuss the background of the chosen project topic.
- Project Proposal and Design Presentation (10%): Midway through the semester, each group will submit a proposal for its project. This 8-10 page document should describe the algorithm to be implemented, the approach for solving the problem, the expected results, and the strategy for evaluating both the correctness and the performance of the implementation. In particular, the proposal should describe in detail which design choices were made and justify each of them. The proposal is understood to represent only the initial design, which may change as the project proceeds; nevertheless, you should demonstrate a principled, well-thought-out approach to your design choices. The Design Presentation presents the design described in your proposal.
- Final Project (30%): Students will perform a final project as described below. The grade for this project will be based on the project implementation, final reports and accompanying final presentation.
- Peer Evaluation (5%): At the end of the semester, every team member will review the contribution and effort of every other team member.
Late Policy
Late submissions are penalized on an hourly scale:
- Up to 1 hour late: -15%
- Up to 2 hours late: -40%
- Up to 3 hours late: -70%
- More than 3 hours late: zero grade
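The hourly scale above can be expressed as a small lookup. This sketch assumes lateness is counted in whole minutes and that "up to N hours" includes exactly N hours; the function name `late_penalty_percent` is hypothetical, and how the deducted percentage is applied to a raw score is not specified here.

```c
/* Percentage deducted under the hourly late policy.
   Assumes lateness is measured in whole minutes and "up to N hours" includes
   exactly N hours. A return value of 100 means the submission receives zero. */
int late_penalty_percent(int minutes_late)
{
    if (minutes_late <= 0)   return 0;    /* on time */
    if (minutes_late <= 60)  return 15;   /* up to 1 hour late */
    if (minutes_late <= 120) return 40;   /* up to 2 hours late */
    if (minutes_late <= 180) return 70;   /* up to 3 hours late */
    return 100;                           /* more than 3 hours: zero grade */
}
```

Under these assumptions, a submission 90 minutes late loses 40%, and one 181 minutes late receives zero.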
Project
The class will break up into groups of two, each implementing a single computationally intensive algorithm on a GPU. The algorithm must be approved by the instructor. The specific deliverables for this project are:
- An 8-10 page document describing the algorithm the group will implement, its approach to solving the problem, and the expected results (due at mid-semester)
- A presentation describing the problem being solved and the initial design (due at mid-semester)
- A final report documenting the project, including the design decisions made, the results, and a discussion of alternative approaches, delivered as a written document with an accompanying presentation (due near the end of the semester)
Suggested Project Topics
Application tuning:
- Eigenvalue solver
- Blocked LU Decomposition
- Ray Tracing
- Discrete Cosine Transform (DCT)
- Wiener Filtering
- Finite Difference (heat, static EM)
- QR Decomposition
Architecture research:
- Micro-benchmarks for GPU
- Performance models for GPU