HDU2532 - String Matching


Engine

Time Limit: 5000/1000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Others)
Total Submission(s): 267 Accepted Submission(s): 57

Problem Description

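Search engines such as Google and Baidu have become an indispensable part of the Internet. In this problem your task is also to design a search engine, one that searches papers; of course, the requirements here are far fewer than in practice.
The input first gives a series of papers; for each paper the title comes first, followed by the number of times it has been cited. Then comes a series of search queries, each asking which papers have titles that contain specific keywords.
A query may contain several keywords; you need to find the papers whose titles contain all of them.
"Contain" means that some word in the title is exactly the given keyword, ignoring case.
For each query, output the titles of the matching papers in decreasing order of citation count. Papers with the same citation count are listed in input order, earlier papers first.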

Input

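The input contains multiple test cases.
Each test case starts with a line containing an integer N (1<=N<=1000), the number of papers; N=0 marks the end of input. For each paper, the first line is the title, made up of letters (upper or lower case) and spaces, with at most 10 words, at most 20 characters per word, and at most 250 characters in total. The second line is an integer K (0<=K<=10^8), the number of times the paper has been cited. After the paper information comes a line containing an integer M (1<=M<=100), the number of queries. Each of the next M lines is one query consisting of L (1<=L<=10) words separated by spaces, each word at most 20 characters long.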

Output

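For each query, output the titles of the matching papers in the order specified above; if no paper matches, output nothing. After the output of each query print a line containing "***", and after the output of each test case print a line containing "---".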

Sample Input

6
Finding the Shortest Path
120
Finding the k Shortest Path
80
Find Augmenting Path in General Graph
80
Matching in Bipartite Graph
200
Finding kth Shortest Path
50
Graph Theory and its Applications
40
6
shortest path
k shortest path
graph
path
find
application
0

Sample Output

Finding the Shortest Path
Finding the k Shortest Path
Finding kth Shortest Path
***
Finding the k Shortest Path
***
Matching in Bipartite Graph
Find Augmenting Path in General Graph
Graph Theory and its Applications
***
Finding the Shortest Path
Finding the k Shortest Path
Find Augmenting Path in General Graph
Finding kth Shortest Path
***
Find Augmenting Path in General Graph
***
***
---

Source

The 6th UESTC Programming Contest

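Approach:

The problem is easy to understand: for each query, output the titles of the matching papers in decreasing order of citation count, and when counts are equal keep the input order, earlier papers first. So each title can be stored in a struct node (id, reference, char title[300]) and sorted by reference first, then by id when the counts are equal. Titles may be irregular, for example " find The shortest Path ", where neither spacing nor case follows any rule, so they need to be preprocessed into the form "keyword keyword keyword " (all lowercase, with exactly one space after each word). What remains is plain string matching, which can be done with the C library's strstr(char*, char*) or with a hand-written KMP, BM, BF and so on. The code ran in 953 ms against the 1000 ms limit, which was nearly a disaster.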

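A few pitfalls to watch for (all of which I ran into while coding):

1. Do not read the count with scanf("%d ", &n) in order to skip the trailing \n, spaces and so on. The next line may start with spaces, and the trailing whitespace in the format string swallows those leading spaces as well. This cost me one PE (Presentation Error).

2. Pay attention to upper and lower case around the sort; this is a minor point.

3. Leave one space at the end of both the text and the pattern (a small trick) to avoid this kind of WA: Find and Finding share a prefix but are not the same keyword. With a trailing space, "Find " and "Finding " no longer match each other, which is exactly what the problem requires.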

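/*
 * strstr: finds the first occurrence of one string inside another.
 * Usage: char *strstr(const char *str1, const char *str2);
 * Lowercasing is done character by character with tolower.
 */
#include <cstdio>
#include <cstring>
#include <cctype>
#include <algorithm>
using namespace std;

struct node {
    char title[300];
    int id;         // position in the input, used to break ties
    int reference;  // citation count
} text[1010];

char src[300], dst[300];

// More citations first; equal counts keep input order.
bool cmp(const node& x, const node& y) {
    if (x.reference == y.reference)
        return x.id < y.id;
    return x.reference > y.reference;
}

// Read one line without its line ending (gets no longer exists in modern C++).
void readLine(char *buf, int size) {
    if (!fgets(buf, size, stdin)) buf[0] = '\0';
    buf[strcspn(buf, "\r\n")] = '\0';
}

// Normalize in place: drop extra spaces, lowercase everything,
// and end every word with one space. O(n)
void format(char *dst) {
    int i = 0, j, len = 0, n = strlen(dst);
    while (i < n) {
        while (i < n && isspace(dst[i])) ++i;
        j = i;
        while (j < n && !isspace(dst[j]))
            dst[len++] = tolower(dst[j++]);
        if (j != i) {
            dst[len++] = ' ';
            i = j;
        }
    }
    dst[len] = '\0';
}

int main() {
#ifndef ONLINE_JUDGE
    freopen("2.txt", "r", stdin);
#endif
    int n, k, i, j;
    while (scanf("%d", &n) == 1 && n != 0) {
        getchar();  // consume only the newline, not the next line's leading spaces
        for (i = 0; i < n; ++i) {
            readLine(text[i].title, sizeof text[i].title);
            scanf("%d", &text[i].reference);
            getchar();
            text[i].id = i;
        }
        sort(text, text + n, cmp);
        scanf("%d", &k);
        getchar();
        for (i = 0; i < k; ++i) {
            readLine(src, sizeof src);
            format(src);
            for (j = 0; j < n; ++j) {
                strcpy(dst, text[j].title);  // normalize a copy, print the original
                format(dst);
                if (strstr(dst, src))
                    puts(text[j].title);
            }
            puts("***");
        }
        puts("---");
    }
    return 0;
}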

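The code above is too slow at 953 ms, most likely because every query copies and re-normalizes every title. Normalizing each title once, while it is being read, moves the format call out of the query loop. Run time dropped from 953 ms to 100 ms, while memory rose from 564K to 860K.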

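/*
 * strstr: finds the first occurrence of one string inside another.
 * Usage: char *strstr(const char *str1, const char *str2);
 * strlwr (lowercase a string in place) is not part of standard C/C++,
 * so format lowercases with tolower instead.
 */
#include <cstdio>
#include <cstring>
#include <cctype>
#include <algorithm>
using namespace std;

struct node {
    char title[300];  // original title, printed as-is
    char low[300];    // normalized copy, used for matching
    int id;
    int reference;
} text[1010];

char src[300], dst[300];

// More citations first; equal counts keep input order.
bool cmp(const node& x, const node& y) {
    if (x.reference == y.reference)
        return x.id < y.id;
    return x.reference > y.reference;
}

// Read one line without its line ending (gets no longer exists in modern C++).
void readLine(char *buf, int size) {
    if (!fgets(buf, size, stdin)) buf[0] = '\0';
    buf[strcspn(buf, "\r\n")] = '\0';
}

// Normalize in place: drop extra spaces, lowercase everything,
// and end every word with one space. O(n)
void format(char *dst) {
    int i = 0, j, len = 0, n = strlen(dst);
    while (i < n) {
        while (i < n && dst[i] == ' ') ++i;
        j = i;
        while (j < n && dst[j] != ' ')
            dst[len++] = tolower(dst[j++]);
        if (j != i) {
            dst[len++] = ' ';
            i = j;
        }
    }
    dst[len] = '\0';
}

int main() {
#ifndef ONLINE_JUDGE
    freopen("2.txt", "r", stdin);
#endif
    int n, k, i, j;
    while (scanf("%d", &n) == 1 && n != 0) {
        getchar();
        for (i = 0; i < n; ++i) {
            readLine(text[i].title, sizeof text[i].title);
            strcpy(text[i].low, text[i].title);
            format(text[i].low);  // normalize once, at read time
            scanf("%d", &text[i].reference);
            getchar();
            text[i].id = i;
        }
        sort(text, text + n, cmp);
        scanf("%d", &k);
        getchar();
        for (i = 0; i < k; ++i) {
            readLine(src, sizeof src);
            format(src);
            for (j = 0; j < n; ++j) {
                // strcpy(dst, text[j].title);  // too slow: repeated for every query
                // format(dst);
                if (strstr(text[j].low, src))
                    puts(text[j].title);
            }
            puts("***");
        }
        puts("---");
    }
    return 0;
}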

Finally, I tried a KMP version, which ran in 400 ms; the C library's strstr is still the stronger choice here.

/*
 * Same as the previous version, with strstr replaced by a hand-written KMP.
 * strlwr is not part of standard C/C++, so format lowercases with tolower.
 */
#include <cstdio>
#include <cstring>
#include <cctype>
#include <algorithm>
using namespace std;

struct node {
    char title[300];  // original title, printed as-is
    char low[300];    // normalized copy, used for matching
    int id;
    int reference;
} text[1010];

char src[300], dst[300];

// More citations first; equal counts keep input order.
bool cmp(const node& x, const node& y) {
    if (x.reference == y.reference)
        return x.id < y.id;
    return x.reference > y.reference;
}

// Failure table; named nxt because next can clash with std::next.
int nxt[10005];

void getNext(const char *pattern) {
    int i = 0, j = -1, pLen = strlen(pattern);  // length of pattern, not of the global src
    nxt[0] = -1;
    while (i < pLen) {
        if (j == -1 || pattern[i] == pattern[j]) {
            ++i, ++j;
            nxt[i] = j;
        } else {
            j = nxt[j];
        }
    }
}

// Returns the 1-based position of the first match, or -1 if there is none.
int KMP(const char *str, const char *pattern) {
    int i = 0, j = 0, sLen = strlen(str), pLen = strlen(pattern);
    getNext(pattern);
    while (i < sLen && j < pLen) {
        if (j == -1 || str[i] == pattern[j]) {
            ++i;
            ++j;
        } else {
            j = nxt[j];
        }
    }
    if (j == pLen)
        return i - j + 1;
    return -1;
}

// Read one line without its line ending (gets no longer exists in modern C++).
void readLine(char *buf, int size) {
    if (!fgets(buf, size, stdin)) buf[0] = '\0';
    buf[strcspn(buf, "\r\n")] = '\0';
}

// Normalize in place: drop extra spaces, lowercase everything,
// and end every word with one space. O(n)
void format(char *dst) {
    int i = 0, j, len = 0, n = strlen(dst);
    while (i < n) {
        while (i < n && isspace(dst[i])) ++i;
        j = i;
        while (j < n && !isspace(dst[j]))
            dst[len++] = tolower(dst[j++]);
        if (j != i) {
            dst[len++] = ' ';
            i = j;
        }
    }
    dst[len] = '\0';
}

int main() {
#ifndef ONLINE_JUDGE
    freopen("2.txt", "r", stdin);
#endif
    int n, k, i, j;
    while (scanf("%d", &n) == 1 && n != 0) {
        getchar();
        for (i = 0; i < n; ++i) {
            readLine(text[i].title, sizeof text[i].title);
            strcpy(text[i].low, text[i].title);
            format(text[i].low);
            scanf("%d", &text[i].reference);
            getchar();
            text[i].id = i;
        }
        sort(text, text + n, cmp);
        scanf("%d", &k);
        getchar();
        for (i = 0; i < k; ++i) {
            readLine(src, sizeof src);
            format(src);
            for (j = 0; j < n; ++j) {
                // strcpy(dst, text[j].title);
                // format(dst);
                if (KMP(text[j].low, src) != -1)
                    puts(text[j].title);
            }
            puts("***");
        }
        puts("---");
    }
    return 0;
}

0 0