KMP (string matching)
Source: Internet   Editor: 程序博客网   Time: 2024/06/16 17:51
The KMP algorithm determines whether a pattern string occurs in a main (text) string and, if it does, at which position.
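kmp_nextt: self-matching of the pattern. nextt[1] = 0, and for j >= 2, nextt[j] - 1 is the length of the longest proper prefix of b[1..j-1] that is also a suffix of b[1..j-1]. When b[j] does not match the current text character, the search compares that same text character with b[nextt[j]] next; reaching j = 0 means "move on to the next text character and start again at b[1]".
j         1 2 3 4 5 6 7 8
pattern   a b a a b c a c
nextt[j]  0 1 1 2 2 3 1 2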
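To check the table mechanically, here is a minimal sketch that wraps the same loop as kmp_nextt() in a function. The name build_next and the use of std::string / std::vector are assumptions for illustration; as in the post, the table is 1-indexed and slot 0 is unused.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds the 1-indexed nextt table for `pattern`.
// nxt[1] = 0 (set by initialization); nxt[0] is unused.
std::vector<int> build_next(const std::string& pattern) {
    int m = static_cast<int>(pattern.size());
    std::string b = " " + pattern;          // shift so that b[1..m] is the pattern
    std::vector<int> nxt(m + 1, 0);
    int i = 1, j = 0;
    while (i < m) {
        if (j == 0 || b[i] == b[j]) {
            ++i, ++j;
            nxt[i] = j;                     // border of b[1..i-1] has length j - 1
        } else {
            j = nxt[j];                     // fall back to a shorter border
        }
    }
    return nxt;
}
```

Calling build_next("abaabcac") should give {0, 0, 1, 1, 2, 2, 3, 1, 2}, i.e. the nextt row of the table after the unused slot 0.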
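get_nextval() is an optimization of kmp_nextt. If b[j] == b[nextt[j]], then after a mismatch at b[j] the comparison with b[nextt[j]] is certain to fail as well, so nextval[j] takes the value nextval[nextt[j]] and skips that useless comparison.
Example:
j           1 2 3 4 5
pattern     a a a a b
nextt[j]    0 1 2 3 4
nextval[j]  0 0 0 0 4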
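The same loop as get_nextval() can be sketched as a standalone function to reproduce the nextval row above. build_nextval is a hypothetical name, and std::string / std::vector replace the post's global arrays for self-containment.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: 1-indexed nextval table, mirroring get_nextval().
std::vector<int> build_nextval(const std::string& pattern) {
    int m = static_cast<int>(pattern.size());
    std::string b = " " + pattern;          // b[1..m] is the pattern
    std::vector<int> nv(m + 1, 0);          // nv[1] = 0 by initialization
    int i = 1, j = 0;
    while (i < m) {
        if (j == 0 || b[i] == b[j]) {
            ++i, ++j;
            // If b[i] == b[j], a mismatch at i would also mismatch at j,
            // so reuse j's fallback directly.
            if (b[i] != b[j]) nv[i] = j;
            else nv[i] = nv[j];
        } else {
            j = nv[j];
        }
    }
    return nv;
}
```

For "aaaab" this should yield {0, 0, 0, 0, 0, 4}; for the earlier pattern "abaabcac" it gives {0, 0, 1, 0, 2, 1, 3, 0, 2}.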
Code:
#include <stdio.h>
#include <string.h>

const int N = 10100;
char a[N], b[N];   // a: text (main string), b: pattern; both stored from index 1
int nextt[N];
int nextval[N];

void kmp_nextt(int blen)   // self-matching of pattern b: fills nextt[1..blen]
{
    int i = 1, j = 0;
    nextt[1] = 0;
    while (i < blen) {
        if (j == 0 || b[i] == b[j]) {
            ++i, ++j;
            nextt[i] = j;
        }
        else j = nextt[j];
    }
}

void get_nextval(int blen)   // optimized self-matching: fills nextval[1..blen]
{
    int i = 1, j = 0;
    nextval[1] = 0;
    while (i < blen) {
        if (j == 0 || b[i] == b[j]) {
            ++i, ++j;
            if (b[i] != b[j]) nextval[i] = j;
            else nextval[i] = nextval[j];   // same character would fail again; skip ahead
        }
        else j = nextval[j];
    }
}

// Returns the 1-based position of the first occurrence of pattern b in text a
// starting at or after position pos, or 0 if there is none.
// (nextval can be used in place of nextt here.)
int kmp(int alen, int blen, int pos)
{
    int i = pos, j = 1;
    while (i <= alen && j <= blen) {
        if (j == 0 || a[i] == b[j]) { ++i, ++j; }
        else j = nextt[j];
    }
    if (j > blen) return i - blen;   // match found
    else return 0;                   // no match
}

int main()
{
    scanf("%s %s", a + 1, b + 1);    // text, then pattern; indices start at 1
    int alen = strlen(a + 1);        // text length
    int blen = strlen(b + 1);        // pattern length
    kmp_nextt(blen);
    printf("%d\n", kmp(alen, blen, 1));
    return 0;
}
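A self-contained variant of kmp() working on std::string is sketched below so that its result can be compared with std::string::find, which returns a 0-based index (so find(...) + 1 should equal the value here when a match exists). kmp_find, its default pos = 1 and the std::string interface are assumptions for this sketch, not part of the original code.

```cpp
#include <string>
#include <vector>

// Same 1-indexed next table as kmp_nextt(); b[0] is a dummy character.
static std::vector<int> build_next(const std::string& b) {
    int m = static_cast<int>(b.size()) - 1;
    std::vector<int> nxt(m + 1, 0);
    int i = 1, j = 0;
    while (i < m) {
        if (j == 0 || b[i] == b[j]) { ++i, ++j; nxt[i] = j; }
        else j = nxt[j];
    }
    return nxt;
}

// Hypothetical wrapper: 1-based position of the first occurrence of pattern
// in text at or after 1-based position pos, or 0 if there is none.
int kmp_find(const std::string& text, const std::string& pattern, int pos = 1) {
    std::string a = " " + text, b = " " + pattern;   // 1-indexed copies
    int alen = static_cast<int>(text.size());
    int blen = static_cast<int>(pattern.size());
    std::vector<int> nxt = build_next(b);
    int i = pos, j = 1;
    while (i <= alen && j <= blen) {
        if (j == 0 || a[i] == b[j]) { ++i, ++j; }    // advance both on a match
        else j = nxt[j];                             // text index i never moves back
    }
    return j > blen ? i - blen : 0;
}
```

For example, kmp_find("ababcabcacbab", "abcac") should return 6, and kmp_find("abab", "ab", 2) should return 3.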
Below are a few problems to test the code.
Testing kmp_nextt:
Power Strings
Time Limit: 3000MS   Memory Limit: 65536K   Total Submissions: 52257   Accepted: 21771
Description
Given two strings a and b we define a*b to be their concatenation. For example, if a = "abc" and b = "def" then a*b = "abcdef". If we think of concatenation as multiplication, exponentiation by a non-negative integer is defined in the normal way: a^0 = "" (the empty string) and a^(n+1) = a*(a^n).
Input
Each test case is a line of input representing s, a string of printable characters. The length of s will be at least 1 and will not exceed 1 million characters. A line containing a period follows the last test case.
Output
For each s you should print the largest n such that s = a^n for some string a.
Sample Input
abcd
aaaa
ababab
.
Sample Output
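1
4
3
Problem summary:
Find the largest number of repeating units s can be split into. The answer is at least 1 (s itself).
Approach:
Run the self-matching one step further than usual (loop while i <= blen) so that nextt[blen+1] is also computed. Then nextt[blen+1] - 1 is the length of the longest proper prefix of s that is also a suffix of s, and p = blen - (nextt[blen+1] - 1) is the length of the shortest period of s. If p divides blen, s is blen / p copies of its first p characters and that is the answer; otherwise the answer is 1.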
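The period argument can be sketched as a single function. max_power is a hypothetical name; the loop is kmp_nextt() with the bound changed to i <= blen so that nxt[blen + 1] exists, and std::string stands in for the global array.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: largest n such that s = t^n for some string t.
int max_power(const std::string& s) {
    int blen = static_cast<int>(s.size());
    std::string b = " " + s;                 // b[1..blen] is s
    std::vector<int> nxt(blen + 2, 0);
    int i = 1, j = 0;
    while (i <= blen) {                      // note <=: also fills nxt[blen + 1]
        if (j == 0 || b[i] == b[j]) { ++i, ++j; nxt[i] = j; }
        else j = nxt[j];
    }
    int border = nxt[blen + 1] - 1;          // longest proper prefix that is also a suffix
    int p = blen - border;                   // shortest period
    return blen % p == 0 ? blen / p : 1;
}
```

On the sample input this gives 1 for "abcd", 4 for "aaaa" and 3 for "ababab"; a string such as "ababa" has shortest period 2, which does not divide 5, so the answer is 1.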
Code:
#include <stdio.h>
#include <string.h>

const int N = 1001000;
char b[N];
int nextt[N];

// Self-matching of the string. The loop runs while i <= blen,
// so nextt[blen + 1] is filled in as well.
void kmp_nextt(int blen)
{
    int i = 1, j = 0;
    nextt[1] = 0;
    while (i <= blen) {
        if (j == 0 || b[i] == b[j]) {
            ++i, ++j;
            nextt[i] = j;
        }
        else j = nextt[j];
    }
}

int main()
{
    // Read strings (stored from index 1) until the line containing only "."
    while (scanf("%s", b + 1) == 1 && strcmp(b + 1, ".") != 0) {
        int blen = strlen(b + 1);
        kmp_nextt(blen);
        int p = blen - (nextt[blen + 1] - 1);   // shortest period
        int ans = 1;
        if (blen % p == 0)
            ans = blen / p;
        printf("%d\n", ans);
    }
    return 0;
}
Testing kmp:
Oulipo
Time Limit: 3000/1000 MS (Java/Others)   Memory Limit: 32768/32768 K (Java/Others)   Total Submission(s): 16251   Accepted Submission(s): 6479
Problem Description
The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:
Tout avait l’air normal, mais tout s’affirmait faux. Tout avait l’air normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : sur son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…
Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.
So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.
Input
The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:
One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.
Output
For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.
Sample Input
3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN
Sample Output
1
3
0
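Problem summary: in each test case the first line is the pattern W and the second is the text T; count how many times W occurs in T, overlapping occurrences included.
Code:
#include <stdio.h>
#include <string.h>

const int N = 1000100;
char a[N], b[N];   // a: text, b: pattern; both stored from index 1
int nextt[N];

// Self-matching of the pattern. Runs while i <= blen so that nextt[blen + 1]
// is filled in too; the search resumes from it after each complete match.
void kmp_nextt(int blen)
{
    int i = 1, j = 0;
    nextt[1] = 0;
    while (i <= blen) {
        if (j == 0 || b[i] == b[j]) { ++i, ++j; nextt[i] = j; }
        else j = nextt[j];
    }
}

// Counts (possibly overlapping) occurrences of b[1..blen] in a[pos..alen].
int kmp_count(int alen, int blen, int pos)
{
    int i = pos, j = 1, cnt = 0;
    while (i <= alen) {
        if (j == 0 || a[i] == b[j]) {
            ++i, ++j;
            if (j > blen) {            // full match ending at a[i - 1]
                cnt++;
                j = nextt[blen + 1];   // keep the longest border so overlaps are found
            }
        }
        else j = nextt[j];
    }
    return cnt;
}

int main()
{
    int n;
    scanf("%d", &n);
    while (n--) {
        scanf("%s %s", b + 1, a + 1);   // pattern W first, then text T
        int alen = strlen(a + 1);       // text length
        int blen = strlen(b + 1);       // pattern length
        kmp_nextt(blen);
        printf("%d\n", kmp_count(alen, blen, 1));
    }
    return 0;
}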
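As a self-contained sketch of the counting step, assuming std::string inputs, the function below extends the next table to index blen + 1 and, after each complete match, resumes from that entry instead of restarting at b[1]. This keeps overlapping matches and also handles one-character patterns such as W = "A", T = "AA". count_occurrences is a hypothetical name, and the empty-pattern guard is an assumption (the problem guarantees |W| >= 1).

```cpp
#include <string>
#include <vector>

// Hypothetical helper: number of (possibly overlapping) occurrences of W in T.
int count_occurrences(const std::string& W, const std::string& T) {
    int blen = static_cast<int>(W.size());
    int alen = static_cast<int>(T.size());
    if (blen == 0) return 0;                 // assumption: not needed for valid input
    std::string b = " " + W, a = " " + T;    // 1-indexed copies
    std::vector<int> nxt(blen + 2, 0);
    int i = 1, j = 0;
    while (i <= blen) {                      // also fills nxt[blen + 1]
        if (j == 0 || b[i] == b[j]) { ++i, ++j; nxt[i] = j; }
        else j = nxt[j];
    }
    int cnt = 0;
    i = 1;
    j = 1;
    while (i <= alen) {
        if (j == 0 || a[i] == b[j]) {
            ++i, ++j;
            if (j > blen) {                  // W matched, ending at a[i - 1]
                ++cnt;
                j = nxt[blen + 1];           // continue from the longest border
            }
        } else {
            j = nxt[j];
        }
    }
    return cnt;
}
```

On the three sample cases this should print 1, 3 and 0.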
I will add a problem that exercises get_nextval once I find a suitable one.
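Until such a problem turns up, one way to exercise get_nextval is to run the same search with either table and count character comparisons. This is a sketch only: build_table, kmp_search and the comparison counter are hypothetical additions, not part of the original code. On the text "aaabaaaab" with pattern "aaaab", both tables should report position 5, and the nextval version should need fewer comparisons because it jumps straight from j = 4 to j = 0.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: 1-indexed nextt (use_nextval == false) or
// nextval (use_nextval == true) table; b[0] is a dummy character.
static std::vector<int> build_table(const std::string& b, bool use_nextval) {
    int m = static_cast<int>(b.size()) - 1;
    std::vector<int> t(m + 1, 0);
    int i = 1, j = 0;
    while (i < m) {
        if (j == 0 || b[i] == b[j]) {
            ++i, ++j;
            // nextval: reuse t[j] when b[i] == b[j]; plain next: store j.
            t[i] = (use_nextval && b[i] == b[j]) ? t[j] : j;
        } else {
            j = t[j];
        }
    }
    return t;
}

// Hypothetical KMP search that also reports how many character
// comparisons it performed. Returns the 1-based position or 0.
int kmp_search(const std::string& text, const std::string& pattern,
               bool use_nextval, long long* comparisons) {
    std::string a = " " + text, b = " " + pattern;
    int alen = static_cast<int>(text.size());
    int blen = static_cast<int>(pattern.size());
    std::vector<int> t = build_table(b, use_nextval);
    int i = 1, j = 1;
    *comparisons = 0;
    while (i <= alen && j <= blen) {
        if (j == 0) { ++i, ++j; continue; }  // restart at b[1] with the next character
        ++*comparisons;
        if (a[i] == b[j]) { ++i, ++j; }
        else j = t[j];
    }
    return j > blen ? i - blen : 0;
}
```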