String Matching on the GPU: a CUDA Implementation

Source: Internet | Published by: c语言里n次方 | Editor: 程序博客网 | Time: 2024/06/05 05:24

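I downloaded a demo from CSDN that does string matching on the GPU, but ran into several problems getting it to run. This post records how I worked through them.
(Download: http://download.csdn.net/download/lllmcy/2585869)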

CUDA source files use the .cu extension and are compiled with nvcc, which is invoked much like gcc. For example: nvcc test.cu -o test
Official nvcc documentation: http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#axzz4fRUn88M1

1. The first error
fatal error: cutil.h: No such file or directory
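Some people say the file can be found under sample/common/inc, but it was not there on my machine, and none of the answers online about locating cutil worked for me.
Since CUDA 5.0, cutil.h has been removed and later releases no longer support it. Reportedly the file can still be downloaded, but I chose not to: NVIDIA presumably had its reasons for dropping it, and the same functionality should be available through other APIs.
I simply commented out the #include of cutil.h, which then gave: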

kmp.cu(62): error: identifier "cutCreateTimer" is undefined
kmp.cu(63): error: identifier "cutStartTimer" is undefined
kmp.cu(73): error: identifier "cutStopTimer" is undefined
kmp.cu(74): error: identifier "cutGetTimerValue" is undefined
kmp.cu(75): error: identifier "cutDeleteTimer" is undefined
5 errors detected in the compilation of "/tmp/tmpxft_00001b6b_00000000-9_kmp.cpp1.ii".

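So the five functions cutCreateTimer, cutStartTimer, cutStopTimer, cutGetTimerValue, and cutDeleteTimer were evidently provided by cutil.h, and a search confirmed it. Their job is to measure the program's running time, so the fix is to use another CUDA timing facility.
CUDA events can do this. See the official documentation: http://docs.nvidia.com/cuda/index.html#axzz4fRUn88M1
Programming Guide -> 3.2.5.6 Events
Here is how I used them: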

cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);
dim3 block(16,16);
dim3 grid(t_len/256+1,1);
cudaEventRecord(start, 0);
kmp_kernel<<<grid,block>>>(dd,d_len,t_len,d_text,d_num);
cudaMemcpy(num,d_num,sizeof(int),cudaMemcpyDeviceToHost);
cudaEventRecord(stop, 0);
//synchronize
cudaEventSynchronize(start); //optional
cudaEventSynchronize(stop);  //wait for the event to be executed!
//calculate time
float dt_ms;
cudaEventElapsedTime(&dt_ms, start, stop);
printf("GPU processing time: %f (ms)\n", dt_ms);
cudaEventDestroy(start);
cudaEventDestroy(stop);

Adapted from: http://blog.csdn.net/jdhanhua/article/details/4843653
That resolved the undefined-identifier errors.

2. Next, an error at link time

/tmp/tmpxft_00001c27_00000000-29_kmp_kernel.o: In function `__device_stub__Z10kmp_kernelPciiS_Pi(char*, int, int, char*, int*)':
tmpxft_00001c27_00000000-9_kmp_kernel.cudafe1.cpp:(.text+0x63): multiple definition of `__device_stub__Z10kmp_kernelPciiS_Pi(char*, int, int, char*, int*)'
/tmp/tmpxft_00001c27_00000000-21_kmp.o:tmpxft_00001c27_00000000-4_kmp.cudafe1.cpp:(.text+0x267): first defined here
/tmp/tmpxft_00001c27_00000000-29_kmp_kernel.o: In function `kmp_kernel(char*, int, int, char*, int*)':
tmpxft_00001c27_00000000-9_kmp_kernel.cudafe1.cpp:(.text+0x13c): multiple definition of `kmp_kernel(char*, int, int, char*, int*)'
/tmp/tmpxft_00001c27_00000000-21_kmp.o:tmpxft_00001c27_00000000-4_kmp.cudafe1.cpp:(.text+0x340): first defined here
collect2: error: ld returned 1 exit status

This happens because kmp_kernel is defined twice.
I have three files, kmp.cu, kmp_kernel.cu, and test_file.h. kmp.cu #includes the other two, and the build command was

nvcc kmp.cu kmp_kernel.cu -o test

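which triggers the error: the #include already pastes kmp_kernel.cu into kmp.cu, so the kernel ends up compiled into both object files and the linker sees two definitions.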

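If a source file is #included, do not also list it on the compile command line.
If a source file is listed on the compile command line, do not also #include it.
Here, nvcc kmp.cu -o test is enough.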

Reference: http://stackoverflow.com/questions/27446690/getting-multiple-definition-errors-with-simple-device-function-in-cuda-c
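If the kernel ever needs its own source file, the usual C-family arrangement is to #include only a declaration and compile the definition exactly once. The sketch below shows that split in plain C, collapsed into a single listing so it compiles on its own. The names match.h, match.c, and count_char are made up for illustration and are not part of the downloaded demo.

```c
#include <assert.h>
#include <stddef.h>

/* ---- what would go in a header, e.g. match.h (hypothetical name) ---- */
#ifndef MATCH_H
#define MATCH_H
/* Declaration only: any number of source files may include this. */
size_t count_char(const char *text, char c);
#endif /* MATCH_H */

/* ---- what would go in exactly one source file, e.g. match.c ---- */
/* The single definition. This file is listed once on the command line
   and is never #included anywhere. */
size_t count_char(const char *text, char c)
{
    assert(text != NULL);
    size_t n = 0;
    for (; *text != '\0'; text++)
        if (*text == c)
            n++;
    return n;
}
```

With the two parts in separate files, every source file goes on the command line and only the header is #included. For the CUDA demo, I would expect the analogous header to hold just the __global__ prototype of kmp_kernel, with the body staying in kmp_kernel.cu.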

3. Comparing the GPU version against a CPU string-matching program
The single-threaded CPU version is:

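#include <stdio.h>
#include <string.h>
#include <time.h>

char text[] = "put a large block of text here to serve as the search corpus";  // text to search

int main(void)
{
    int n = 0;                   // number of matches found
    char d[50];                  // pattern entered by the user
    printf("dst:");
    if (scanf("%49s", d) != 1)   // pass the array itself and bound the read to its size
        return 1;
    int d_num = strlen(d);
    int t_num = strlen(text);
    int i = 0, j = 0;
    struct timespec tpstart;     // start time
    struct timespec tpend;       // end time
    long timedif;                // elapsed time in milliseconds
    clock_gettime(CLOCK_MONOTONIC, &tpstart);
    while (i < t_num) {
        if (d[j] == text[i]) {
            i++;
            j++;
            if (j >= d_num) {    // full match: count it and start a new attempt
                j = 0;
                n++;
            }
        } else {
            i = i - j + 1;       // restart one past where this attempt began,
            j = 0;               // otherwise matches such as "ab" in "aab" are missed
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &tpend);
    timedif = 1000*(tpend.tv_sec - tpstart.tv_sec) + (tpend.tv_nsec - tpstart.tv_nsec)/1000000;
    printf("the result is: %d\n", n);
    printf("time: %ldms\n", timedif);
    return 0;
}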

Add the code above to the main program.
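The CPU loop above is a plain left-to-right scan rather than KMP proper. For a closer comparison with a KMP kernel, a failure-table version could look like the sketch below. The function names build_failure and kmp_count are my own and do not come from the downloaded demo, and the choice to count non-overlapping matches is an assumption made to mirror the loop above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* fail[q] = length of the longest proper prefix of pat[0..q]
   that is also a suffix of pat[0..q]. */
static void build_failure(const char *pat, size_t m, size_t *fail)
{
    size_t k = 0;
    fail[0] = 0;
    for (size_t q = 1; q < m; q++) {
        while (k > 0 && pat[k] != pat[q])
            k = fail[k - 1];
        if (pat[k] == pat[q])
            k++;
        fail[q] = k;
    }
}

/* Count non-overlapping occurrences of pat in text.
   Returns 0 for an empty pattern or if allocation fails. */
size_t kmp_count(const char *text, const char *pat)
{
    assert(text != NULL && pat != NULL);
    size_t n = strlen(text), m = strlen(pat);
    size_t count = 0, j = 0;
    if (m == 0 || m > n)
        return 0;

    size_t *fail = malloc(m * sizeof *fail);
    if (fail == NULL)
        return 0;
    build_failure(pat, m, fail);

    for (size_t i = 0; i < n; i++) {
        /* On a mismatch, fall back in the pattern; i never moves backwards. */
        while (j > 0 && pat[j] != text[i])
            j = fail[j - 1];
        if (pat[j] == text[i])
            j++;
        if (j == m) {
            count++;
            j = 0;  /* start fresh so matches do not overlap */
        }
    }
    free(fail);
    return count;
}
```

Calling kmp_count(text, d) between the two clock_gettime calls would time this version in place of the hand-written loop. Because the text index never backs up, the scan stays linear in the text length even for repetitive patterns.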

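I won't publish my final version of the program, since most of it is someone else's code that I downloaded. The download link is given at the top of this post.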

0 0