HDU4920 Matrix multiplication (how the CPU cache affects a program)
Source: Internet  Editor: 程序博客网  Time: 2024/06/05 06:40
Problem Description
Given two matrices A and B of size n×n, find the product of them.
bobo hates big integers. So you are only asked to find the result modulo 3.
Input
The input consists of several tests. For each test:
The first line contains n (1≤n≤800). Each of the following n lines contains n integers -- the description of the matrix A. The j-th integer in the i-th line equals Aij. The next n lines describe the matrix B in similar format (0≤Aij,Bij≤10^9).
Output
For each test:
Print n lines. Each of them contains n integers -- the matrix A×B in similar format.
Sample Input
1
0
1
2
0 1
2 3
4 5
6 7
Sample Output
0
0 1
2 1
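In the classic matrix multiplication the third (innermost) loop runs over k, so b[k][j] walks through b column by column. A two-dimensional array is stored in memory contiguously, one row after another, so every step of a column-wise walk jumps over a whole row: 1000*4 bytes when a row holds 1000 ints. Consecutive accesses therefore land on different cache lines, and given the CPU cache's replacement policy this causes a large number of cache misses.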
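The jump size can be checked directly. The following is a minimal sketch, not part of the original post: it declares an array with the same 810-column bound as the solution's arrays and measures how far apart two neighbouring elements are, first along a row and then along a column. The helper names byte_distance, row_step and column_step are made up for this illustration. Assuming a 4-byte int, the column step comes out as 810*4 = 3240 bytes.

```cpp
#include <cstddef>

const int maxn = 810;        // same bound as the arrays in the solution
static int b[maxn][maxn];    // row-major: b[k][0..maxn-1] is one contiguous run

// Distance in bytes from one element to another (hypothetical helper).
std::ptrdiff_t byte_distance(const int* from, const int* to) {
    return reinterpret_cast<const char*>(to) -
           reinterpret_cast<const char*>(from);
}

// Step taken when the inner loop moves along a row (j -> j + 1).
std::ptrdiff_t row_step() {
    return byte_distance(&b[0][0], &b[0][1]);
}

// Step taken when the inner loop moves down a column (k -> k + 1).
std::ptrdiff_t column_step() {
    return byte_distance(&b[0][0], &b[1][0]);
}
```

The row step is sizeof(int), while the column step is maxn * sizeof(int), which is why a column-wise inner loop touches a new region of memory on every iteration.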
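Therefore square2.cpp swaps the j loop and the k loop. With j innermost, a[i][k] stays fixed while c[i][j] and b[k][j] are both traversed along a row, which ensures that the statement
c[i][j] += a[i][k] * b[k][j];
accesses memory contiguously. This raises the cache hit rate and greatly speeds up the program.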
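Swapping the two loops changes only the order in which the products are accumulated, not the result. The sketch below is an illustration under assumptions rather than the code from square2.cpp: it stores each matrix in one flat row-major buffer (the Matrix alias and both function names are invented here) and implements the two loop orders side by side so they can be compared.

```cpp
#include <vector>

// Row-major n x n matrix in one contiguous buffer:
// element (i, j) lives at index i * n + j.
using Matrix = std::vector<int>;

// Classic i-j-k order: in the inner loop k changes,
// so the index into b advances by n ints per step.
Matrix multiply_ijk(const Matrix& a, const Matrix& b, int n) {
    Matrix c(n * n, 0);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}

// Interchanged i-k-j order: in the inner loop j changes,
// so the indices into b and c advance by one int per step.
Matrix multiply_ikj(const Matrix& a, const Matrix& b, int n) {
    Matrix c(n * n, 0);
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++)
            for (int j = 0; j < n; j++)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}
```

Both functions return the same matrix for the same input; only the memory access pattern differs, and how much faster the second one runs depends on the machine and on n.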
For details, see the example: http://blog.csdn.net/a775700879/article/details/11750703
The code is as follows:
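#include <iostream>
#include <cstdio>
#include <cstring>
using namespace std;

const int maxn = 810;
int a[maxn][maxn], b[maxn][maxn], c[maxn][maxn];
int n;

int main() {
    while (~scanf("%d", &n)) {
        int i, j, k;
        // Read A, reduce it modulo 3, and clear C.
        for (i = 0; i < n; i++) {
            for (j = 0; j < n; j++) {
                scanf("%d", &a[i][j]);
                a[i][j] %= 3;
                c[i][j] = 0;
            }
        }
        // Read B and reduce it modulo 3.
        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++) {
                scanf("%d", &b[i][j]);
                b[i][j] %= 3;
            }
        // i-k-j order: the inner loop walks rows of b and c contiguously.
        for (i = 0; i < n; i++)
            for (k = 0; k < n; k++)
                for (j = 0; j < n; j++)
                    c[i][j] = c[i][j] + a[i][k] * b[k][j];
        // Each sum is at most 800*2*2 = 3200, so reducing once at output is enough.
        for (i = 0; i < n; i++) {
            for (j = 0; j < n - 1; j++)
                printf("%d ", c[i][j] % 3);
            printf("%d\n", c[i][n - 1] % 3);
        }
    }
    return 0;
}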