字符串匹配Sunday算法实现

来源:互联网 发布:sql的查询语句怎么写 编辑:程序博客网 时间:2024/05/18 05:58

  今天稍稍度娘了一下匹配字符串的算法,主要搜索结果还是源自CSDN。其中一篇《字符串匹配算法总结》介绍了好几种字符串匹配算法,写的非常好。我挑了其中作者也比较推崇,比较新,效能比较好,算法又比较简单的Sunday算法,稍加揣摩,然后发现贴的实现代码,居然和算法描述的从右往左比对是冲突的,因此我断定贴的代码是错的。仔细看就能发现从左向右比对,很容易会跳过一种匹配,如从a1a1a2寻找a1a2,就会匹配失败(?也许也是对的)。

  下面是我做的实现,融入到我的‘类string’类的find实现当中了。对于其他网贴的实现方式略有调整。对于JAssert,data_->len() 这种未解释的小东西,相信读者您都明白是啥。

//#define SupportLongSubString#ifdef SupportLongSubString  //用于支持较长的sub字符串    typedef unsigned short OCCType;#else    typedef unsigned char OCCType;#endifsize_t JinString::find(const char *sub, size_t startPos){    if(!data_ || data_->len()<startPos)return STRNPOS;    ///[Sun 90] D.M. Sunday: A Very Fast Substring Search Algorithm. Communications of the ACM, 33, 8, 132-142 (1990)    /// http://www-igm.univ-mlv.fr/~lecroq/string/    /// http://www.iti.fh-flensburg.de/lang/algorithmen/pattern/sundayen.htm    /// http://blog.csdn.net/force_eagle/article/details/10340907    ///[Sunday算法]根据网上的算法,纠错、调整、改进    //预处理和strlen一起做,原本算法这个数组是要预计算移位数,但是我不先计算长度    //所以重新约定occ[]值v含义:len-(v-1)= len-v+1 = 需移动的字节数. v=0表示无法匹配需要移动len+1字节    OCCType occ[256];    memset(occ,0,256);    int subLen=0;    unsigned char ch;    while((ch = (unsigned char)sub[subLen])){        occ[ch] = (OCCType)++subLen;    }    JAssert(subLen<(1<<(sizeof(OCCType)*8)));  //断言subString长度不可越界    if(subLen+startPos>data_->len())return STRNPOS;    for(int matchAt=1;matchAt<=subLen;matchAt++)    {        if(data_->buf()[startPos + subLen - matchAt]                != sub[subLen - matchAt])        {  //如果不匹配,按算法移动startPos重新匹配           startPos += subLen - occ[(unsigned char) (data_->buf()[startPos + subLen])] + 1;           if(subLen+startPos>data_->len())return STRNPOS;           matchAt = 0;        }    }    return startPos;}

  代码通过了以下的测试:

void TestJinString::unitTest(){    JinString strs,strm,strjzy;    strs = "abcdefghijklmn";    strm = "MMMMM";    strjzy = "jzyajzybjzyajzyz";    JinString subjzy("jzyajzyz");    //JinString str("abcdefghijklmnMMMMMjzyajzybjzyajzyzMMMMMabcdefghijklmn")    JinString str;    str += strs;    str = str + strm + strjzy + strm + strs;    JAssert(str.find(subjzy) ==  strs.length() + strm.length() + 8);    JAssert(str.find(strs) == 0);    JAssert(str.find(strm) == strs.length());    JAssert(str.find(strs,str.find(strm)) == strs.length() + strm.length()*2 + strjzy.length());    JAssert(str.find("jzyajzybjzyc")==STRNPOS);}

文章首发地址 http://blog.csdn.net/d0cking/article/details/40689685 也是俺写滴……

0 0
原创粉丝点击