Warped-MC: An Efficient Memory Controller Scheme for Massively Parallel Processors

The performance of GPU external memory is becoming more critical, since a modern GPU runs thousands of concurrent threads that demand a huge volume of data. To use the resources of the memory hierarchy more efficiently, GPUs employ a memory coalescing scheme that reduces the number of demand requests created by a group of threads (i.e., a warp). However, memory coalescing does not work well for applications with irregular memory access patterns, so a single warp can generate multiple memory transactions. Because these requests are serviced by different hierarchy levels and/or memory partitions, the outstanding requests of a single warp exhibit divergent fetch latencies. Since the execution time of a load warp is determined by its slowest memory transaction, memory latency divergence within a warp is a critical performance factor for load warps. In this paper, we propose a warp-aware memory controller scheme, called Warped-MC, to mitigate memory latency divergence. Based on an in-depth analysis, we show that memory latency divergence within a warp is mainly caused by the GPU memory controllers. While the conventional FR-FCFS memory controller can maximize the effective bandwidth of DRAM channels, its scheduling scheme can exacerbate the memory latency divergence of a warp. Warped-MC employs a warp-aware scheduling scheme to alleviate this divergence, so it can tackle the long tail of load warp execution time and improve the performance of memory-intensive applications. We implement Warped-MC on GPGPU-Sim configured with a modern GPU architecture, and our evaluation results show that Warped-MC improves the performance of memory-intensive applications by 8.9% on average, with a maximum of 45.8%.
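The point that a load warp finishes only when its slowest memory transaction returns can be illustrated with a toy model. The sketch below is not the Warped-MC algorithm and does not model FR-FCFS, DRAM banks, or row buffers. It assumes a single queue with a fixed service time per request, and the function names and request orders are hypothetical. It only shows how the order in which a controller services requests changes per-warp completion time and latency divergence.

```python
def warp_completion_times(service_order, service_time=1):
    """Return {warp_id: time at which the warp's last request completes}.

    service_order lists the warp ID of each request in the order the
    (hypothetical) controller services it, one request at a time.
    """
    finish = {}
    clock = 0
    for warp_id in service_order:
        clock += service_time
        # A later request overwrites the entry: the slowest request decides.
        finish[warp_id] = clock
    return finish


def latency_divergence(service_order, service_time=1):
    """Return {warp_id: last completion time - first completion time}."""
    first, last = {}, {}
    clock = 0
    for warp_id in service_order:
        clock += service_time
        first.setdefault(warp_id, clock)
        last[warp_id] = clock
    return {w: last[w] - first[w] for w in last}


# Two warps, two requests each (assumed workload).
interleaved = [0, 1, 0, 1]  # requests of different warps alternate
grouped = [0, 0, 1, 1]      # requests of the same warp are serviced together

print(warp_completion_times(interleaved), latency_divergence(interleaved))
print(warp_completion_times(grouped), latency_divergence(grouped))
```

In this toy case, grouping the requests of a warp lets warp 0 finish earlier without delaying warp 1, and the gap between each warp's first and last completed request shrinks. A real controller must also weigh such grouping against DRAM bandwidth efficiency, which this sketch ignores.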

International Conference on Parallel Processing (ICPP)
