HDU4920 Matrix multiplication (the effect of the CPU cache on program performance)

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 06:40
Problem Description
Given two matrices A and B of size n×n, find the product of them.

bobo hates big integers. So you are only asked to find the result modulo 3.
 

Input
The input consists of several tests. For each tests:

The first line contains n (1 ≤ n ≤ 800). Each of the following n lines contains n integers -- the description of the matrix A. The j-th integer in the i-th line equals A_ij. The next n lines describe the matrix B in similar format (0 ≤ A_ij, B_ij ≤ 10^9).
 

Output
For each test:

Print n lines. Each of them contains n integers -- the matrix A×B in similar format.
 

Sample Input
1
0
1
2
0 1
2 3
4 5
6 7
 

Sample Output
0
0 1
2 1


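In the classic matrix multiplication the third (innermost) loop runs over k, so b[k][j] is read column by column. A two-dimensional array is stored in memory row by row, so each step down a column jumps one full row: 1000*4 bytes for an int array with 1000 columns, or maxn*4 = 810*4 bytes for the arrays in the code below. Each access therefore touches a different cache line, and under the cache's replacement policy many of those lines are evicted before they are reused, so there are a large number of cache misses.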

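square2.cpp (in the example linked below) therefore swaps the j loop and the k loop, so that j becomes the innermost loop. This ensures that the statement

c[i][j] += a[i][k] * b[k][j];

accesses memory contiguously: c[i][j] and b[k][j] both move along a row while a[i][k] stays fixed. That raises the cache hit rate and greatly speeds up the program.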

For details, see the example: http://blog.csdn.net/a775700879/article/details/11750703

The code is as follows:

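#include <iostream>
#include <cstdio>
#include <cstring>
using namespace std;

const int maxn = 810;
int a[maxn][maxn], b[maxn][maxn], c[maxn][maxn];
int n;

int main() {
    // Read test cases until EOF.
    while (~scanf("%d", &n)) {
        int i, j, k;
        // Read A, reduce it modulo 3, and clear C.
        for (i = 0; i < n; i++) {
            for (j = 0; j < n; j++) {
                scanf("%d", &a[i][j]);
                a[i][j] %= 3;
                c[i][j] = 0;
            }
        }
        // Read B and reduce it modulo 3.
        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++) {
                scanf("%d", &b[i][j]);
                b[i][j] %= 3;
            }
        // i-k-j order: the innermost loop walks c[i][*] and b[k][*] along rows.
        for (i = 0; i < n; i++)
            for (k = 0; k < n; k++)
                for (j = 0; j < n; j++)
                    c[i][j] = c[i][j] + a[i][k] * b[k][j];
        // Print each row modulo 3, with no trailing space.
        for (i = 0; i < n; i++) {
            for (j = 0; j < n - 1; j++)
                printf("%d ", c[i][j] % 3);
            printf("%d\n", c[i][n - 1] % 3);
        }
    }
    return 0;
}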

