ECE498ALProgrammingMassivelyParallel.ppt

Published 2017-04-30 (Tianjin)


ECE 498AL: Programming Massively Parallel Processors
Lecture 6: CUDA Memories, Part 2

Slide titles:
  Every Md and Nd element is used exactly twice in generating a 2x2 tile of P
  Each phase of a thread block uses one tile from Md and one from Nd
  Tiled Matrix Multiplication Kernel
  CUDA Code: Kernel Execution Configuration
  First-order Size Considerations in G80
  G80 Shared Memory and Threading
  Tiled Matrix Multiplication Kernel
  Tiling Size Effects

[Figures: matrix tile diagrams; surviving labels are Md and Pd1,0]

Program outline:
  Global variable declarations
    __host__, __device__, ... __global__, __constant__, __texture__
  Function prototypes
    __global__ void kernelOne(…)
    float handyFunction(…)
  main()
    allocate memory space on the device: cudaMalloc(&d_GlblVarPtr, bytes)
    transfer data from host to device: cudaMemcpy(d_GlblVarPtr, h_Gl…)
    execution configuration setup
    kernel call: kernelOne<<<execution configuration>>>(args…);
    transfer results from device to host: cudaMemcpy(h_GlblVarPtr, …)
    optional: compare against golden (host-computed) solution
  Kernel: void kernelOne(type args, …)
    variable declarations: __local__, __shared__
      automatic variables are transparently assigned to registers or local memory
    __syncthreads() …
  Other functions
    float handyFunction(int inVar…);
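The tiled kernel and the host-side steps listed above (allocate, copy in, configure, launch, copy back, compare against a host-computed result) can be sketched roughly as follows. This is a minimal sketch, not the lecture's own code: the names matMulTiled, tiledMatMulMatchesHost, Mds and Nds, the tile size of 16, and the test data are all assumptions made for illustration, and the kernel assumes square matrices whose width is a multiple of TILE_WIDTH.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

#define TILE_WIDTH 16  // assumed tile size: 16x16 = 256 threads per block

// Each thread block computes one TILE_WIDTH x TILE_WIDTH tile of Pd.
// In each phase the block stages one tile of Md and one tile of Nd
// in shared memory (2 * 16 * 16 * 4 bytes = 2 KB per block).
__global__ void matMulTiled(const float* Md, const float* Nd, float* Pd, int width)
{
    __shared__ float Mds[TILE_WIDTH][TILE_WIDTH];
    __shared__ float Nds[TILE_WIDTH][TILE_WIDTH];

    int tx = threadIdx.x, ty = threadIdx.y;
    int row = blockIdx.y * TILE_WIDTH + ty;
    int col = blockIdx.x * TILE_WIDTH + tx;

    float pValue = 0.0f;
    for (int m = 0; m < width / TILE_WIDTH; ++m) {
        // Cooperative load: each thread fetches one element of each tile.
        Mds[ty][tx] = Md[row * width + (m * TILE_WIDTH + tx)];
        Nds[ty][tx] = Nd[(m * TILE_WIDTH + ty) * width + col];
        __syncthreads();  // tiles fully loaded before anyone reads them

        for (int k = 0; k < TILE_WIDTH; ++k)
            pValue += Mds[ty][k] * Nds[k][tx];
        __syncthreads();  // everyone done before the tiles are overwritten
    }
    Pd[row * width + col] = pValue;
}

// Hypothetical host driver following the outline's main() steps.
// Returns true when the device result equals the host ("golden") result.
bool tiledMatMulMatchesHost(int width)
{
    if (width <= 0 || width % TILE_WIDTH != 0) return false;  // kernel assumption

    size_t count = (size_t)width * width;
    size_t bytes = count * sizeof(float);
    std::vector<float> M(count), N(count), P(count);
    // Small integer values keep every partial sum exactly representable.
    for (size_t i = 0; i < count; ++i) {
        M[i] = (float)(i % 3);
        N[i] = (float)(i % 5);
    }

    // Allocate memory space on the device.
    float *Md = nullptr, *Nd = nullptr, *Pd = nullptr;
    bool ok = cudaMalloc((void**)&Md, bytes) == cudaSuccess
           && cudaMalloc((void**)&Nd, bytes) == cudaSuccess
           && cudaMalloc((void**)&Pd, bytes) == cudaSuccess;

    if (ok) {
        // Transfer data from host to device.
        cudaMemcpy(Md, M.data(), bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(Nd, N.data(), bytes, cudaMemcpyHostToDevice);

        // Execution configuration setup and kernel call.
        dim3 dimBlock(TILE_WIDTH, TILE_WIDTH);
        dim3 dimGrid(width / TILE_WIDTH, width / TILE_WIDTH);
        matMulTiled<<<dimGrid, dimBlock>>>(Md, Nd, Pd, width);
        ok = cudaGetLastError() == cudaSuccess;

        // Transfer results from device to host (also waits for the kernel).
        ok = ok && cudaMemcpy(P.data(), Pd, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(Md);
    cudaFree(Nd);
    cudaFree(Pd);
    if (!ok) return false;

    // Compare against the golden (host-computed) solution.
    for (int r = 0; r < width; ++r)
        for (int c = 0; c < width; ++c) {
            float sum = 0.0f;
            for (int k = 0; k < width; ++k)
                sum += M[r * width + k] * N[k * width + c];
            if (sum != P[r * width + c]) return false;
        }
    return true;
}
```

Calling tiledMatMulMatchesHost(64) from a main() on a machine with a CUDA-capable GPU runs a 4x4 grid of 16x16 blocks. With a tile width of 16, each element loaded from global memory into shared memory is reused 16 times within the block, which is the point of tiling.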