Scheduling

MOST: Memory Oversubscription-Aware Scheduling for Tensor Migration on GPU Unified Storage

Deep Neural Network (DNN) training demands large memory capacities that exceed the limits of current GPU onboard memory. Expanding GPU …

Junsu Kim, Jaebeom Jeon, Jaeyong Park, Sangun Choi, Minseong Gil, Seokin Hong, Gunjae Koo, Myung Kuk Yoon, Yunho Oh

TLP Balancer: Predictive Thread Allocation for Multitenant Inference in Embedded GPUs

This letter introduces a novel software technique to optimize thread allocation for merged and fused kernels in multitenant inference …

Minseong Gil, Jaebeom Jeon, Junsu Kim, Sangun Choi, Gunjae Koo, Myung Kuk Yoon, Yunho Oh

Linebacker: Preserving Victim Cache Lines in Idle Register Files of GPUs

Modern GPUs suffer from cache contention due to the limited cache size that is shared across tens of concurrently running warps. To …

Yunho Oh, Gunjae Koo, Murali Annavaram, Won Woo Ro

CTA-Aware Prefetching and Scheduling for GPU

Albeit GPUs are supposed to be tolerant to long latency of data fetch operation, we observe that L1 cache misses occur in a bursty …

Gunjae Koo, Hyeran Jeon, Zhenhong Liu, Nam Sung Kim, Murali Annavaram

Warped-Preexecution: A GPU Pre-Execution Approach for Improving Latency Hiding

This paper presents a pre-execution approach for improving GPU performance, called P-mode (pre-execution mode). GPUs utilize a number …

Keunsoo Kim, Sangpil Lee, Myung Kuk Yoon, Gunjae Koo, Won Woo Ro, Murali Annavaram

Revealing Critical Loads and Hidden Data Locality in GPGPU Applications

In graphics processing units (GPUs), memory access latency is one of the most critical performance hurdles. Several warp schedulers and …

Gunjae Koo, Hyeran Jeon, Murali Annavaram