Multiplying Two Matrices Concurrently with Multiple Threads
Source: Internet  Editor: 程序博客网  Time: 2024/04/25 01:03
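The task is simple: compute the product of two matrices. To speed it up, the pthread library is used to run the computation concurrently.
The basic idea is shown in the figure below.
Suppose two threads compute the product A * B. Matrix A is split into two parts by rows; in the figure, the parts are rows 0, 2, 4 and rows 1, 3, 5.
Thread 1 multiplies rows 0, 2, 4 by every column of B, and thread 2 multiplies rows 1, 3, 5 by every column of B.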
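The interleaved row split described above can be sketched as follows. This is only an illustration: the helper names `rows_for_worker` and `print_assignment` are hypothetical and do not appear in the original program.

```c
#include <stdio.h>

/* Hypothetical helper: count the rows a worker handles when worker
   `index` takes rows index, index + step, index + 2 * step, ... */
int rows_for_worker(int index, int step, int dim) {
    int count = 0;
    for (int i = index; i < dim; i += step) {
        count++;
    }
    return count;
}

/* Hypothetical helper: print which rows of A each thread multiplies. */
void print_assignment(int thread_count, int dim) {
    for (int t = 0; t < thread_count; t++) {
        printf("thread %d:", t);
        for (int i = t; i < dim; i += thread_count) {
            printf(" %d", i);
        }
        printf("\n");
    }
}
```

Calling `print_assignment(2, 6)` lists rows 0, 2, 4 for the first thread and rows 1, 3, 5 for the second. Because each row of the result is written by exactly one thread, the workers need no locking on the output buffer.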
The idea is straightforward. The final code is as follows:
// mul.c: multiply two square matrices, single-threaded and with pthreads.
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <pthread.h>

int THREADS_COUNT = 4;
pthread_t threads[16];
int PRINT = 0;

void test(int dim);
int * multiplyPthread(int * a, int* b, int dim);
int * multiplySimple(int * a, int* b, int dim);

// Arguments handed to each worker thread.
struct THREAD_PARAM {
    int * a;
    int * b;
    int * buffer;   // shared result matrix
    int dim;
    int index;      // first row computed by this thread
    int step;       // distance between two rows of the same thread
};
struct THREAD_PARAM params[16];

void output(int * buf, int dim) {
    for (int i=0; i<dim; i++) {
        for (int j=0; j<dim; j++) {
            printf("%d ", buf[i * dim + j]);
        }
        printf("\n");
    }
    printf("============\n");
}

int main(int argc, char* argv[])
{
    int size[] = {4, 16, 32, 48, 64, 128, 256, 512, 1024, 2048, 4096};
    // Only the first 8 sizes (up to 512) are run.
    for (int i=0; i<8; i++) {
        test(size[i]);
    }
    return 0;
}

void test(int dim) {
    // Note: for large dim the int products overflow; only the timings are compared.
    int * a = (int*)malloc(dim * dim * sizeof(int));
    for (int i=0; i<dim; i++) {
        for (int j=0; j<dim; j++) {
            a[i * dim + j] = i * j + 1;
        }
    }
    int * b = (int*)malloc(dim * dim * sizeof(int));
    for (int i=0; i<dim; i++) {
        for (int j=0; j<dim; j++) {
            b[i * dim + j] = i * j + 2;
        }
    }

    struct timeval start0, start1;
    struct timeval end0, end1;
    long diff0, diff1;

    gettimeofday(&start0, NULL);
    int * result = multiplySimple(a, b, dim);
    gettimeofday(&end0, NULL);

    gettimeofday(&start1, NULL);
    int * resultPthread = multiplyPthread(a, b, dim);
    gettimeofday(&end1, NULL);

    if (PRINT) {
        output(a, dim);
        output(b, dim);
        output(result, dim);
        output(resultPthread, dim);
    }

    // Elapsed time in microseconds.
    diff0 = 1000000 * (end0.tv_sec - start0.tv_sec) + end0.tv_usec - start0.tv_usec;
    diff1 = 1000000 * (end1.tv_sec - start1.tv_sec) + end1.tv_usec - start1.tv_usec;
    printf("(%d) the difference for simple is %ld\n", dim, diff0);
    printf("(%d) the difference for threaded is %ld\n", dim, diff1);

    free(result);
    free(resultPthread);
    free(a);
    free(b);
}

int * multiplySimple(int* a, int* b, int dim) {
    int * result = (int*)malloc(dim * dim * sizeof(int));
    int sum = 0;
    for (int i=0; i<dim; i++) {
        for (int j=0; j<dim; j++) {
            sum = 0;
            for (int k=0; k<dim; k++) {
                sum += (a[dim * i + k] * b[dim * k + j]);
            }
            result[dim * i + j] = sum;
        }
    }
    return result;
}

// Worker: computes rows index, index + step, index + 2 * step, ... of the result.
void *Calculate(void *param) {
    struct THREAD_PARAM *p = (struct THREAD_PARAM*)param;
    int dim = p->dim;
    int index = p->index;
    int *a = p->a;
    int *b = p->b;
    int *result = p->buffer;
    int step = p->step;
    int sum = 0;
    for (int i=index; i<dim; i+=step) {
        for (int j=0; j<dim; j+=1) {
            sum = 0;
            for (int k=0; k<dim; k++) {
                sum += (a[dim * i + k] * b[dim * k + j]);
            }
            result[dim * i + j] = sum;
        }
    }
    return NULL;
}

int * multiplyPthread(int* a, int* b, int dim) {
    int * result = (int*)malloc(dim * dim * sizeof(int));
    for (int i=0; i<THREADS_COUNT; i++) {
        params[i].buffer = result;
        params[i].index = i;
        params[i].dim = dim;
        params[i].step = THREADS_COUNT;
        params[i].a = a;
        params[i].b = b;
        int rc = pthread_create(&threads[i], NULL, Calculate, (void *)(&params[i]));
        if (rc) {
            printf("ERROR; return code from pthread_create() is %d\n", rc);
            exit(1);
        }
    }
    // Wait for every worker before handing back the result.
    for (int t=0; t<THREADS_COUNT; t++) {
        void* status;
        int rc = pthread_join(threads[t], &status);
        if (rc) {
            printf("ERROR; return code from pthread_join() is %d\n", rc);
        }
    }
    return result;
}

Compile options:
# clang -O1 -lpthread -Wall mul.c
Environment:
CentOS 7 on a 4-core CPU, so 4 threads are used here.
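The original program hard-codes the thread count to match the 4 cores. As a rough sketch of one alternative, the count could be derived from the number of online cores. The helper name `pick_thread_count` is hypothetical, and `_SC_NPROCESSORS_ONLN` is a widely available extension (glibc on Linux, macOS) rather than a name POSIX guarantees, so treat this as an assumption about the platform.

```c
#include <unistd.h>

/* Hypothetical helper: pick a thread count equal to the number of online
   cores, clamped to [1, max_threads]. Assumes the platform provides
   _SC_NPROCESSORS_ONLN (an extension, not guaranteed by POSIX). */
int pick_thread_count(int max_threads) {
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    if (cores < 1) {
        cores = 1;              /* sysconf failed or reported nothing usable */
    }
    if (cores > max_threads) {
        cores = max_threads;    /* stay within the fixed-size thread arrays */
    }
    return (int)cores;
}
```

With the arrays in the program above sized at 16, a call such as `THREADS_COUNT = pick_thread_count(16);` at the start of `main` would keep the count within bounds.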
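Analysis of the results.
For small matrices, plain single-threaded multiplication is much faster than the multithreaded version. This is expected, because creating threads takes time.
At size 64, the multithreaded and single-threaded times are roughly equal.
Above size 64, the multithreaded version is clearly faster.
At sizes 256 and 512, the multithreaded version is roughly 4 times as fast as the single-threaded one (about 3.8x and 4.0x), which is close to ideal on 4 cores. Times are in microseconds: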
(4) the difference for simple is 1
(4) the difference for threaded is 420
(16) the difference for simple is 10
(16) the difference for threaded is 117
(32) the difference for simple is 63
(32) the difference for threaded is 151
(48) the difference for simple is 177
(48) the difference for threaded is 196
(64) the difference for simple is 379
(64) the difference for threaded is 329
(128) the difference for simple is 4456
(128) the difference for threaded is 2376
(256) the difference for simple is 40366
(256) the difference for threaded is 10581
(512) the difference for simple is 387046
(512) the difference for threaded is 97153
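To make the comparison explicit, the speedup for each size can be computed as the single-threaded time divided by the multithreaded time. The sketch below copies the timings from the output above; the function names `speedup` and `print_speedups` are hypothetical and are not part of the original program.

```c
#include <stdio.h>

/* Speedup = single-threaded time / multithreaded time (same units). */
double speedup(long simple_us, long threaded_us) {
    return (double)simple_us / (double)threaded_us;
}

/* Print the speedup for every measured size, using the timings
   (in microseconds) reported in the output above. */
void print_speedups(void) {
    static const int dims[] = {4, 16, 32, 48, 64, 128, 256, 512};
    static const long simple_us[] = {1, 10, 63, 177, 379, 4456, 40366, 387046};
    static const long threaded_us[] = {420, 117, 151, 196, 329, 2376, 10581, 97153};
    for (int i = 0; i < 8; i++) {
        printf("(%d) speedup %.2f\n", dims[i],
               speedup(simple_us[i], threaded_us[i]));
    }
}
```

A value below 1 means the threaded version is slower. With these numbers the ratio crosses 1 at size 64 and approaches the core count of 4 at sizes 256 and 512.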