Web, Information Retrieval: pcchap2.ppt


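Chapter 2 Parallel Programming Platforms

Limitations of memory system performance

The efficiency of program execution depends not only on the speed of the processor but also on the speed of memory. A memory system consists of DRAM and multiple levels of caches.

Example 2.2: The CPU runs at 1 GHz ($10^9$ cycles per second), has two add units, and can execute four instructions in each cycle, so its peak computation rate is 4 GFLOPS. The DRAM access time is 100 ns (= 100 cycles), so the processor must wait 100 ns before it can process the data. For the dot product $A \cdot B = a_1 b_1 + a_2 b_2 + \dots + a_n b_n$, every data fetch takes 100 ns, so the achievable rate is 1 G / 100 = 10 MFLOPS, far below the computational peak. The remedy is to use a cache, reading a block of data into fast cache memory.

Strided access: data are fetched from interleaved (non-contiguous) locations in memory.

Dichotomy of parallel computing platforms

Classified by control structure: SIMD (single instruction, multiple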
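The arithmetic of Example 2.2 can be checked with a short Python sketch. It assumes, as the example does, that there is no cache and that each 100 ns DRAM fetch supports exactly one floating-point operation. The function name `memory_bound_flops` and the constants are illustrative and not part of the slides.

```python
def memory_bound_flops(latency_ns: float, flops_per_fetch: float) -> float:
    """Return FLOP/s when every data fetch stalls the processor for latency_ns."""
    fetches_per_second = 1e9 / latency_ns  # 1e9 ns in one second
    return fetches_per_second * flops_per_fetch


# Values taken from Example 2.2.
PEAK_FLOPS = 4e9          # 1 GHz clock x 4 instructions per cycle
DRAM_LATENCY_NS = 100.0   # one DRAM access = 100 cycles at 1 GHz

# Assumption: the dot product performs one operation per fetched word.
effective = memory_bound_flops(DRAM_LATENCY_NS, flops_per_fetch=1.0)
fraction_of_peak = effective / PEAK_FLOPS

print(f"peak rate:      {PEAK_FLOPS / 1e6:.0f} MFLOPS")
print(f"effective rate: {effective / 1e6:.0f} MFLOPS")
print(f"fraction of peak: {fraction_of_peak:.4f}")
```

Under these assumptions the memory-bound rate is 1 GHz / 100 = 10 MFLOPS, a small fraction of the 4 GFLOPS peak, which is why the example turns to caching.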
