HDU4920 Matrix multiplication (how the CPU cache affects a program)

Source: Internet; Editor: 程序博客网; Time: 2024/06/05 06:40

Problem Description

Given two matrices A and B of size n×n, find the product of them.

bobo hates big integers. So you are only asked to find the result modulo 3.

Input

The input consists of several tests. For each test:

The first line contains n (1≤n≤800). Each of the following n lines contains n integers -- the description of the matrix A. The j-th integer in the i-th line equals A_ij. The next n lines describe the matrix B in a similar format (0≤A_ij,B_ij≤10⁹).

Output

For each test:

Print n lines. Each of them contains n integers -- the matrix A×B in a similar format.

Sample Input

1
0
1
2
0 1
2 3
4 5
6 7

Sample Output

0
0 1
2 1

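In the classic matrix multiplication, the third (innermost) loop runs over k, so b[k][j] is read down a column of b. A two-dimensional array is stored in memory row by row, so each step of a column-wise walk jumps a whole row: 1000*4 bytes for an int array with 1000 columns (810*4 bytes for the arrays in the code below). Consecutive accesses therefore rarely fall in the same cache line, and under the CPU cache's replacement policy a line is often evicted before it is touched again, which causes a large number of cache misses.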
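The access pattern described above can be sketched as follows. This is only an illustration under stated assumptions: the function names multiplyIJK and columnStrideBytes are hypothetical, the matrices are held in one contiguous row-major buffer rather than the static arrays used later in this post, and the entries are assumed to be already reduced modulo 3.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major n x n matrix in one contiguous buffer: element (i, j) lives at i * n + j.
// Hypothetical helper, not taken from the original post.
std::vector<int> multiplyIJK(const std::vector<int>& a,
                             const std::vector<int>& b,
                             std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<int> c(n * n, 0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            int sum = 0;
            // Innermost loop over k: a is read along a row (stride 1),
            // but b is read down a column (stride n ints per step).
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum % 3;
        }
    }
    return c;
}

// Distance in bytes between vertically adjacent elements of a row-major int array
// whose rows hold rowLength elements.
constexpr std::size_t columnStrideBytes(std::size_t rowLength) {
    return rowLength * sizeof(int);
}
```

With a 4-byte int, columnStrideBytes(1000) is the 1000*4 bytes mentioned above, and columnStrideBytes(810) is the stride for arrays declared with 810 columns.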

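square2.cpp therefore swaps the j loop and the k loop, which guarantees that the statement

c[i][j] += a[i][k] * b[k][j];

accesses memory contiguously: with j innermost, a[i][k] stays fixed while c[i][j] and b[k][j] both move along a row. This raises the cache hit rate and makes the program run much faster.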
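A minimal sketch of the swapped loop order, written as a standalone function so it can be checked on small inputs. The name multiplyIKJ and the flat row-major buffers are assumptions made for illustration; the post's own submission uses static two-dimensional arrays. Entries are assumed to be already reduced modulo 3, so with n ≤ 800 each accumulated sum stays at most 800*2*2 = 3200 and cannot overflow an int.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// i-k-j loop order on row-major buffers: element (i, j) lives at i * n + j.
// Hypothetical helper, not taken from the original post.
std::vector<int> multiplyIKJ(const std::vector<int>& a,
                             const std::vector<int>& b,
                             std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<int> c(n * n, 0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const int aik = a[i * n + k];  // constant for the whole inner loop
            // Innermost loop over j: both c and b are walked along a row (stride 1).
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
    // Reduce once at the end instead of inside the hot loop.
    for (int& value : c) {
        value %= 3;
    }
    return c;
}
```

The arithmetic is the same as in the i-j-k order; only the order in which the n³ products are accumulated changes, so the result is identical while the inner loop touches adjacent memory.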

For details, see this example: http://blog.csdn.net/a775700879/article/details/11750703

The code is as follows:

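#include <iostream>
#include <cstdio>
#include <cstring>
using namespace std;

const int maxn = 810;
int a[maxn][maxn], b[maxn][maxn], c[maxn][maxn];
int n;

int main() {
    // Read test cases until scanf reports end of input (EOF).
    while (~scanf("%d", &n)) {
        int i, j, k;
        // Read A reduced modulo 3 and clear C.
        for (i = 0; i < n; i++) {
            for (j = 0; j < n; j++) {
                scanf("%d", &a[i][j]);
                a[i][j] %= 3;
                c[i][j] = 0;
            }
        }
        // Read B reduced modulo 3.
        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++) {
                scanf("%d", &b[i][j]);
                b[i][j] %= 3;
            }
        // i-k-j order: the innermost loop walks rows of b and c contiguously.
        for (i = 0; i < n; i++)
            for (k = 0; k < n; k++)
                for (j = 0; j < n; j++)
                    c[i][j] = c[i][j] + a[i][k] * b[k][j];
        // Print each row modulo 3, with no trailing space.
        for (i = 0; i < n; i++) {
            for (j = 0; j < n - 1; j++)
                printf("%d ", c[i][j] % 3);
            printf("%d\n", c[i][n - 1] % 3);
        }
    }
    return 0;
}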

0 0