
SIMD acronym stands for Single Instruction Multiple Data, a class of parallel computers. SIMD architecture and conditional branch execution problem in GPUs Section VII concludes the paper with the discussion of the results.Ī. Section VI contains information on related works. Section V contains an experimental evaluation of the transformation's effect on a benchmarking application. Section IV describes the proof-of-concept LLVM Transformation Pass implementing the routine. Section Ill introduces the actual transformation routine. In Section II, we describe some concepts used in this work, such as control flow graph and natural loop. In the rest of the Section I we describe the SIMD branching model and the problem of nested loops in greater detail. We develop a proof-of-concept version of the transformation for the LLVM compiler platform and apply it to a benchmark utility designed to simulate nested loops constructs that can be found in real-life applications.

In this paper, we build such a routine for the case of nested loops with data-dependent number of iterations.
Gw basic nested loop programs code#
However, the divergence problem can be solved for some types of branching constructs by applying a corresponding code transformation routine. This inefficiency stems from the fact that SIMD devices serialize execution of divergent branches. SIMD devices, such as GPUs, suffer significant performance loss when running an algorithm that relies on unpredictable conditional jumps. Keywords- Branch Divergence, Control Flow Analysis, LLVM, Nested Loops, SIMD. Depending on the dataset and nested loops parameters, the transformation reduces the worst-case running time of a specialized GPU benchmarking application up to 24 times. The routine is implemented as a Low-Level Virtual Machine (LLVM) Transformation Pass. The transformed program remains logically equivalent to the original one, while its branching pattern becomes better suited for execution on a SIMD device. The routine reduces loop nest level by merging the inner loop with the outer loop.
A specialized compile-time Control Flow Graph (CFG) transformation routine can solve this problem. Due to inefficient execution of divergent branches, SIMD devices can lose performance on nested loops with data-dependent exit conditions. Flattening of Data-Dependent Nested Loops for Compile-Time Optimization of GPU ProgramsĪbstract - Modern Graphics Processing Units (GPUs) belong to the "Single Instruction Multiple Data" (SIMD) computational architecture class.
