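Blocking

Chapter contents: Cache memory system > Cache performance > Improving cache performance > Reducing the miss rate > Compiler optimizations

/* Before */
for (i = 0; i < N; i = i+1)
    for (j = 0; j < N; j = j+1) {
        r = 0;
        for (k = 0; k < N; k = k+1)
            r = r + y[i][k]*z[k][j];
        x[i][j] = r;
    }

[Figure: access patterns of the three arrays. x is indexed by (i, j), y by (i, k), and z by (k, j); shading distinguishes newer accesses from older accesses.]

Each iteration of the i loop reads all N×N elements of matrix z, repeatedly accesses the N elements in one row of matrix y, and writes the N elements in one row of matrix x.

Blocking (2 of 3)

/* After */
for (jj = 0; jj < N; jj = jj+B)
    for (kk = 0; kk < N; kk = kk+B)
        for (i = 0; i < N; i = i+1)
            for (j = jj; j < min(jj+B,N); j = j+1) {    /* columns jj .. jj+B-1 */
                r = 0;
                for (k = kk; k < min(kk+B,N); k = k+1)  /* rows kk .. kk+B-1 of z */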
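The "After" listing breaks off before the body of its innermost loop. The following is a minimal sketch of how the blocked version might be completed, not the slide's own code. It assumes the inner loop accumulates the same product `y[i][k]*z[k][j]` as the unblocked version, that each partial sum `r` is added into `x[i][j]`, and that `x` is zeroed beforehand. The names `matmul_naive`, `matmul_blocked`, and `min_int`, and the size `N = 6`, are illustrative assumptions.

```c
#include <assert.h>

#define N 6  /* matrix dimension; illustrative value, not from the slides */

/* Hypothetical helper standing in for the min() used in the listing. */
static int min_int(int a, int b) { return a < b ? a : b; }

/* Unblocked version: x = y * z, following the "Before" listing. */
void matmul_naive(double x[N][N], double y[N][N], double z[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double r = 0;
            for (int k = 0; k < N; k++)
                r += y[i][k] * z[k][j];
            x[i][j] = r;
        }
}

/* Blocked version with blocking factor b (B in the listing). */
void matmul_blocked(double x[N][N], double y[N][N], double z[N][N], int b)
{
    assert(b > 0);

    /* Assumption: x starts at zero, because every (jj, kk) block
       contributes only a partial sum to each x[i][j]. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            x[i][j] = 0;

    for (int jj = 0; jj < N; jj += b)
        for (int kk = 0; kk < N; kk += b)
            for (int i = 0; i < N; i++)
                for (int j = jj; j < min_int(jj + b, N); j++) {
                    double r = 0;
                    /* Partial dot product over rows kk .. kk+b-1 of z. */
                    for (int k = kk; k < min_int(kk + b, N); k++)
                        r += y[i][k] * z[k][j];
                    x[i][j] += r;  /* accumulate across kk blocks */
                }
}
```

Calling both functions on the same inputs should give the same product; only the order in which elements of `y` and `z` are visited changes. The blocking factor would typically be chosen so that the b×b sub-block of `z` and the matching b-element strip of a `y` row stay resident in the cache while they are reused.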