Opencup 010352 Problem 5. Autocomplete: hashing + bitset

Source: Internet   Editor: 程序博客网   Time: 2024/06/07 00:54

Problem 5. Autocomplete

Input file: input.txt
Output file: output.txt
Time limit: 1 second
Memory limit: 256 megabytes


Two words are considered similar if they are equal when compared in a case-insensitive way, but in a case-sensitive comparison they differ in no more than K positions.
A dictionary containing W words as well as Q query words is given. For each query word, print a single integer: the number of similar words in the dictionary.


Input

The first line of the input file contains an integer K, the maximum number of positions at which the words can differ by case (0 <= K <= 5).
The second line contains an integer W, the number of words in the dictionary (1 <= W <= 1 000).
The following W lines contain the dictionary, one word per line. Each line consists of small and capital Latin letters. All words are non-empty and are no longer than 2 000 symbols.
The following line contains an integer Q, the number of queries (1 <= Q <= 1 000).
The next Q lines contain queries, one word per line. Same as with the words in the dictionary, each query consists of capital and small Latin letters; all queries are non-empty and no longer than 2 000 symbols each.


Output

For each of the Q queries from the input file, print a single integer: the number of similar words in the dictionary. Answers to the queries must be printed in the same order as the queries are listed in the input.


Example

input.txt
2
5
theword
TheWord
THEWORD
thewordandsomeletters
theword
4
theword
The
theword
TheWordAndSomeLetters

output.txt
3
0
3
0


Source

XVII Open Cup named after E.V. Pankratiev. GP of Eurasia.

Opencup 010352


My Solution

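Problem summary: two strings are called "similar" if they are identical when case is ignored, and differ in at most k positions when case is taken into account. We are given w (1 <= w <= 1000) pattern strings of length at most 2000, followed by q (1 <= q <= 1000) queries; each query is a string of length at most 2000, and we must report how many of the w pattern strings are similar to it.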
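To pin down the definition before optimizing, here is a minimal brute-force checker, offered as an illustrative sketch rather than the author's code; `similar` and `bruteForceCount` are hypothetical names. It costs O(W*L) character comparisons per query, so roughly 1000 * 1000 * 2000 = 2e9 comparisons over all queries in the worst case, which is likely too slow for a 1-second limit but is handy for checking a faster solution on small inputs.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: checks the definition of "similar" directly.
// Equal ignoring case, and at most k positions where only the case differs.
bool similar(const std::string& a, const std::string& b, int k) {
    if (a.size() != b.size()) return false;  // different lengths can never match
    int caseDiffs = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        unsigned char x = a[i], y = b[i];
        if (std::tolower(x) != std::tolower(y)) return false;  // different letters
        if (x != y) ++caseDiffs;                                // same letter, other case
    }
    return caseDiffs <= k;
}

// Hypothetical brute force for one query: O(W * L) character comparisons.
int bruteForceCount(const std::vector<std::string>& dict, const std::string& query, int k) {
    int ans = 0;
    for (const std::string& w : dict)
        if (similar(w, query, k)) ++ans;
    return ans;
}
```

On the sample (K = 2), `bruteForceCount` returns 3, 0, 3, 0 for the four queries, matching the expected output.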


Hashing + bitset

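The dictionary can hold up to 2e6 characters in total, so a trie over both cases would need on the order of 2e6 * (26 + 26) child slots; an Aho-Corasick automaton is therefore essentially ruled out. Storing only lowercase letters would barely fit in memory, but it would still quite likely hit MLE, and it would not solve the problem anyway, because the case information is exactly what has to be compared.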
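The memory argument above can be made concrete with a back-of-the-envelope calculation. This sketch assumes at most W * L = 2e6 trie nodes and one 4-byte `int` child index per letter, and it ignores fail links and any other per-node data, so the real footprint would be larger; the constant names are hypothetical.

```cpp
#include <cstdint>

// Back-of-the-envelope trie size; assumptions: at most W * L nodes in total,
// one 4-byte int child index per alphabet letter, nothing else counted
// (Aho-Corasick would add fail links on top).
constexpr std::int64_t kNodes = 1000LL * 2000;           // W * L upper bound on nodes
constexpr std::int64_t kBytesPerChild = 4;               // sizeof(int) on typical judges
constexpr std::int64_t kMemLimit = 256LL * 1024 * 1024;  // 256 megabytes, read as MiB

constexpr std::int64_t trieBytes(std::int64_t alphabet) {
    return kNodes * alphabet * kBytesPerChild;
}

// 52 letters (both cases): 416,000,000 bytes, over the limit.
static_assert(trieBytes(52) > kMemLimit, "case-sensitive trie does not fit");
// 26 letters (lowercase only): 208,000,000 bytes, under the limit but close to it.
static_assert(trieBytes(26) < kMemLimit, "lowercase-only trie barely fits");
```

Reading "256 megabytes" as 256e6 bytes instead of MiB does not change either conclusion.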

So we turn to hashing.

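First, for each pattern string, compute in O(length) a case-insensitive hash value hashval[i]; then, also in O(length), build a bitset<2016> hashmp[i] in which hashmp[i][j] = 1 when s[j] is an uppercase letter and 0 otherwise.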
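As a sketch of this preprocessing step, the hypothetical `encode` function below produces both pieces in a single pass. It assumes the same base p = 1e12 + 7, the same letter mapping (a..z to 1..26) and the same right-to-left order as the solution's `HASH` struct, so its hash should equal `hh.get_Hash(0, len)` on the lowercased word; it simply skips the suffix and power arrays, which the full-string hash does not need.

```cpp
#include <bitset>
#include <cctype>
#include <cstring>
#include <utility>

typedef unsigned long long ull;
const int MAXLEN = 2016;         // same bitset width as the post's bitset<2*MAXN>
const ull P = 1000000000007ULL;  // the post's base p = 1e12 + 7; overflow wraps mod 2^64

// Hypothetical helper: one pass over a word producing
//   first  = case-insensitive polynomial hash (letters mapped to 1..26),
//   second = mask with bit j set iff w[j] is uppercase.
// The loop runs right to left, matching Hash[0] in the post's suffix-hash code.
std::pair<ull, std::bitset<MAXLEN> > encode(const char* w) {
    std::bitset<MAXLEN> upper;
    ull h = 0;
    int len = (int)std::strlen(w);
    for (int j = len - 1; j >= 0; --j) {
        unsigned char c = (unsigned char)w[j];
        if (std::isupper(c)) upper[j] = 1;
        h = h * P + (ull)(std::tolower(c) - 'a' + 1);
    }
    return std::make_pair(h, upper);
}
```

For example, "TheWord" and "theword" get the same hash, while the mask of "TheWord" has exactly bits 0 and 3 set.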

Then, for each query string s, compute in the same way its case-insensitive hash value tocmpv and a bitset<2016> tocmp recording which positions are uppercase.

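Finally, scan all pattern strings. Whenever a pattern has the same length and the same hash value as s, the two strings are (barring a hash collision) equal ignoring case. Then res = hashmp[i] ^ tocmp has a 1 exactly at the positions where their case differs, so res.count() is the number of differing positions; if res.count() <= k, increment ans.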
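The comparison step can be sketched as follows. `Encoded`, `encodeWord` and `countByHash` are hypothetical names; the hash and mask are built the same way as in the solution, and the length check is an extra guard added here, not something the original relies on.

```cpp
#include <bitset>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

typedef unsigned long long ull;
const int MAXLEN = 2016;
const ull P = 1000000000007ULL;

// Hypothetical record for one word: length, case-insensitive hash, uppercase mask.
struct Encoded {
    int len;
    ull hash;
    std::bitset<MAXLEN> upper;  // default-constructed to all zeros
};

Encoded encodeWord(const std::string& w) {
    Encoded e;
    e.len = (int)w.size();
    e.hash = 0;
    for (int j = e.len - 1; j >= 0; --j) {
        unsigned char c = (unsigned char)w[j];
        if (std::isupper(c)) e.upper[j] = 1;
        e.hash = e.hash * P + (ull)(std::tolower(c) - 'a' + 1);
    }
    return e;
}

// Equal length and equal hash means "equal ignoring case" (up to hash collisions);
// XOR marks the positions whose case differs, and count() is compared with k.
int countByHash(const std::vector<Encoded>& dict, const Encoded& q, int k) {
    int ans = 0;
    for (std::size_t i = 0; i < dict.size(); ++i) {
        const Encoded& d = dict[i];
        if (d.len == q.len && d.hash == q.hash && (int)(d.upper ^ q.upper).count() <= k)
            ++ans;
    }
    return ans;
}
```

On the sample with K = 2, the query "theword" matches "theword" twice (0 differences) and "TheWord" (2 differences), but not "THEWORD" (7 differences), giving 3.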

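Time complexity: O(w*q*length/32). With w = q = 1000 and length = 2000, that is about 2e9/32 ≈ 6e7 word operations in the worst case (every pair reaching the bitset comparison); the hashing itself only costs O(total input length).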

Space complexity: O(w*length/32) machine words for the case bitsets.
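The worst-case figures can be checked with a little arithmetic. This sketch assumes 64-bit machine words (so the count is half of the 32-bit estimate) and that every query matches every dictionary word by hash; the constant names are hypothetical.

```cpp
#include <cstdint>

// Worst case for the comparison loop, assuming every query matches every
// dictionary word by hash, so every pair pays for one XOR and one popcount.
constexpr std::int64_t W = 1000, Q = 1000;                  // dictionary size, queries
constexpr std::int64_t BITS = 2016;                         // bitset width used in the post
constexpr std::int64_t kPairs = W * Q;                      // 1,000,000 comparisons
constexpr std::int64_t kWordsPerMask = (BITS + 63) / 64;    // 32 64-bit words per bitset
constexpr std::int64_t kWordOps = kPairs * kWordsPerMask;   // 32,000,000 word operations
// Space: W masks of BITS bits each (the real std::bitset may round up a little).
constexpr std::int64_t kMaskBytes = W * BITS / 8;           // 252,000 bytes
```

About 3.2e7 64-bit operations and roughly a quarter of a megabyte of masks sit comfortably inside the 1-second and 256 MB limits.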


#include <cstdio>
#include <cstring>
#include <cctype>
#include <bitset>
using namespace std;

typedef unsigned long long ull;
const int MAXN = 1e3 + 8;

bitset<2*MAXN> hashmp[MAXN], tocmp, res;   // hashmp[i][j] = 1 iff word i is uppercase at position j
ull hashval[MAXN], tocmpv;                 // case-insensitive hash of each word / of the current query
int wordlen[MAXN];                         // word lengths, an extra guard against hash collisions
char s[2*MAXN];
const ull p = 1e12 + 7;                    // hash base; arithmetic wraps modulo 2^64

struct HASH{
    ull Hash[2*MAXN], xp[2*MAXN];
    void init1(int sz){                    // powers of the base: xp[i] = p^i
        xp[0] = 1;
        for(int i = 1; i <= sz; i++) xp[i] = xp[i-1] * p;
    }
    void init2(char *s){                   // suffix hashes, 0-based; expects lowercase input
        int sz = strlen(s);
        Hash[sz] = 0;
        for(int i = sz - 1; i >= 0; i--){
            Hash[i] = Hash[i+1] * p + (s[i] - 'a' + 1);
        }
    }
    ull get_Hash(int st, int len){         // hash of s[st .. st+len-1]
        return Hash[st] - Hash[st + len] * xp[len];
    }
} hh;

int main(){
    #ifdef LOCAL
    freopen("e.txt", "r", stdin);
    #endif // LOCAL
    int k, n, q, len, i, j, ans;
    scanf("%d%d", &k, &n);
    for(i = 0; i < n; i++){
        scanf("%s", s);
        len = strlen(s);
        wordlen[i] = len;
        for(j = 0; j < len; j++){
            // remember the case of each letter, then fold it to lowercase for hashing
            if(isupper((unsigned char)s[j])) hashmp[i][j] = 1, s[j] = tolower((unsigned char)s[j]);
        }
        hh.init1(len);
        hh.init2(s);
        hashval[i] = hh.get_Hash(0, len);
    }
    scanf("%d", &q);
    while(q--){
        scanf("%s", s);
        len = strlen(s);
        tocmp.reset();
        for(j = 0; j < len; j++){
            if(isupper((unsigned char)s[j])) tocmp[j] = 1, s[j] = tolower((unsigned char)s[j]);
        }
        hh.init1(len);
        hh.init2(s);
        tocmpv = hh.get_Hash(0, len);
        ans = 0;
        for(i = 0; i < n; i++){
            // same length and same case-insensitive hash => equal ignoring case (barring collisions)
            if(wordlen[i] == len && hashval[i] == tocmpv){
                res = hashmp[i] ^ tocmp;           // 1 bits = positions whose case differs
                if((int)res.count() <= k) ans++;
            }
        }
        printf("%d\n", ans);
    }
    return 0;
}


Thank you!

------from ProLights
