Computer System Architecture Lab
Scheduling
Linebacker: Preserving Victim Cache Lines in Idle Register Files of GPUs
Modern GPUs suffer from cache contention due to the limited cache size that is shared across tens of concurrently running warps. To …
Yunho Oh, Gunjae Koo, Murali Annavaram, Won Woo Ro
CTA-Aware Prefetching and Scheduling for GPU
Although GPUs are supposed to tolerate the long latency of data fetch operations, we observe that L1 cache misses occur in a bursty …
Gunjae Koo, Hyeran Jeon, Zhenhong Liu, Nam Sung Kim, Murali Annavaram
Warped-Preexecution: A GPU Pre-Execution Approach for Improving Latency Hiding
This paper presents a pre-execution approach for improving GPU performance, called P-mode (pre-execution mode). GPUs utilize a number …
Keunsoo Kim, Sangpil Lee, Myung Kuk Yoon, Gunjae Koo, Won Woo Ro, Murali Annavaram
Revealing Critical Loads and Hidden Data Locality in GPGPU Applications
In graphics processing units (GPUs), memory access latency is one of the most critical performance hurdles. Several warp schedulers and …
Gunjae Koo, Hyeran Jeon, Murali Annavaram