Parallel Optimization of Dataflow Programs for GPUs - Master's Thesis in Computer Application Technology

  • About 42,900 characters
  • About 52 pages
  • Published 2019-05-08 in Shanghai


Huazhong University of Science and Technology, Master's Thesis

Abstract

Graphics processing units (GPUs) are inexpensive yet offer high computing capacity, which has made them increasingly popular in high-performance computing in recent years. Programming languages such as CUDA and OpenCL have made GPU programming widely accepted, but it remains a complex task for two reasons. First, designing an algorithm for a specific GPU is time-consuming and requires the programmer to be familiar with both the algorithm itself and the underlying architecture. Second, the resulting code lacks performance portability: code that performs well on one GPU rarely maintains that performance on another, and it often has to be modified to do so.

To reduce the difficulty of GPU programming and make code portable, we propose a framework that maps dataflow programs to GPUs efficiently. Its input is COStream code, and its output is OpenCL code that has been parallel-optimized for the GPU. The system optimizes programs at two levels: software-pipelining scheduling, and dataflow-program optimization tailored to the characteristics of the GPU platform. METIS is used to partition the tasks, taking into account both load balancing and communication overhead. In the optimization process, we introduce a variable called the expansion factor, which not only makes full use of GPU computing resources but also effectively reduces the number of synchronizations between threads, and thus the synchronization overhead. Pinned memory can be used to accelerate data transfer between the CPU and the GPU, and efficiently overlapping computation with communication further helps performance. In addition, the rational use of local memory not only allows global memory to be accessed in a coale
