Programming Graphics Processing Units for High Performance Computing
The next Information Research Group meeting will be in E11_A3 - Arts Lecture Theatre 3 from 1 to 2pm on Friday, 30th October 2009. The focus will be on programming Graphics Processing Units for High Performance Computing.
The Graphics Processing Unit (GPU) is a recent computer hardware development that provides high-performance graphics rendering. These highly parallel processors are now being applied to computationally intensive problems. The GPU architecture is well suited to parallel computations that need minimal inter-process communication. Tasks are divided into many threads that run independently on individual processing elements, in the same way that separate regions of an image are rendered independently.
Now that GPUs have evolved into fully programmable devices, they have become an ideal resource for accelerating many arithmetic-intensive and memory-bandwidth-intensive scientific applications. GPUs are typically composed of groups of single-instruction, multiple-thread (SIMT) processing units. Parallel machines in the past failed to achieve their full performance potential because of memory access conflicts and the divergence of execution paths at conditional branches. GPU design ameliorates these problems by using hardware multithreading, clusters of small processing units and virtualized processors.
The CUDA development environment makes programming GPUs accessible to a wide group of users. Thread allocation and memory management are simplified, data can be read from arbitrary addresses in memory, and fast on-chip shared memory can be used as a user-managed cache, enabling higher effective bandwidth. The CUDA programming model is based on the decomposition of work into grids and thread blocks. A grid decomposes a large problem into thread blocks, which are executed concurrently by the pool of available multiprocessors. Each thread block typically contains from 64 to 512 threads, which are executed concurrently by the processors within a single multiprocessor. The multiprocessor runs each thread block as groups of 32 threads, known as warps, and the threads within a warp execute in lockstep.
At the meeting we will look at coding practical algorithms for high-performance implementation and identify problems to be developed further at the UNE Summer CUDA Code Camp.
