子字符串查找

来源:互联网 发布:淘宝怎样取消退款 编辑:程序博客网 时间:2024/05/29 03:45

一、KMP算法

1、KMP算法实现(p500):

public class KMP {private String pat;private int[][] dfa;//确定有限状态自动机public void dfa(String pat)//构造DFA{this.pat = pat;int M = pat.length();int R = 256;dfa = new int[R][M];dfa[pat.charAt(0)][0] = 1;for (int X = 0, j = 1; j < M; j++){for (int c = 0; c < R; c++)dfa[c][j] = dfa[c][X];dfa[pat.charAt(j)][j] = j + 1;X = dfa[pat.charAt(j)][X];}}public int search(String txt) {int i,j,M = pat.length(),N = txt.length();for (i = 0, j = 0; i < N && j < M; i ++)j = dfa[txt.charAt(i)][j];if (j == M) return i - M;else return N;}public static void main(String[] args) {String pat = "ABABAC";String txt = "ABABCABABAC";KMP kmp = new KMP();kmp.dfa(pat);System.out.println(kmp.search(txt));// TODO Auto-generated method stub}}

2、对于长度为M的模式字符串和长度为N的文本,KMP字符串查找算法访问的字符不会超过M+N个。

3、KMP算法由于不回退,适用于长度不确定的输入流中进行查找,以及在重复性很高的文本中查找重复性很高的模式。

二、Boyer-Moore字符串查找算法(启发式的处理不匹配字符)

1、实现(p504)

public class BoyerMoore//启发式处理不匹配字符 {private String pat;private int[] right;//记录每个字符在模式中出现的最右位置BoyerMoore(String pat){this.pat = pat;int R = 256;int M = pat.length();right = new int[R];for (int c = 0; c < R; c++)right[c] = -1;for (int j = 0; j < M; j++)right[pat.charAt(j)] = j;}public int search(String txt){int N = txt.length();int M = pat.length();int skip;for (int i = 0; i <= N-M; i += skip){skip = 0;for (int j = M - 1; j >= 0; j--)if (txt.charAt(i + j) != pat.charAt(j)){skip = j - right[txt.charAt(i+j)];if (skip < 0) skip = 1;break;}if (skip == 0) return i;}return N;}public static void main(String[] args) {// TODO Auto-generated method stubString pat = "ABABAC";String txt = "ABABCABABAC";BoyerMoore bm = new BoyerMoore(pat);System.out.println(bm.search(txt));}}

2、在一般情况下,对于长度为N的文本和长度为M的模式字符串,使用了Boyer-Moore的子字符串查找算法通过启发式处理不匹配的字符需要~N/M次字符串比较。

三、Rabin-Krap指纹字符串查找算法(基于散列的字符串查找算法)

1、实现(p508):

public class RabinKarp{private String pat;private int R = 256;private long Q;private long RM;private long patHash;private int M;public RabinKarp(String pat){this.pat = pat;this.M = pat.length();Q = 97;RM = 1;for (int i = 1; i <= M - 1; i++)RM = (RM * R) % Q;patHash = hash(pat, M);}private long hash(String pat, int M){long h = 0;for (int i = 0; i < M; i++){h = (R * h + pat.charAt(i)) % Q;}return h;}public boolean check(int i){return true;}public int search(String txt){int N = txt.length();long txtHash = hash(txt, M);if (patHash == txtHash&& check(0)) return 0;for (int i = M; i < N; i++){txtHash = (txtHash + Q - RM * txt.charAt(i-M) % Q) % Q;txtHash = (txtHash * R + txt.charAt(i)) % Q;if (patHash == txtHash && check(i - M + 1)) return i - M + 1;}return N;}public static void main(String[] args) {// TODO Auto-generated method stubString pat = "ABABAC";String txt = "ABABCABABAC";RabinKarp rk = new RabinKarp(pat);System.out.println(rk.search(txt));}}
2、使用蒙特卡洛算法的Rabin-Karp子字符串查找算法的运行时间是线性级别且出错概率极小;使用拉斯维加斯算法的Rabin-Karp算法能够保证正确性且性能及其接近线性级别。

四、各种字符串查找算法实现成本总结:

算法
版本最坏情况操作次数一般情况操作次数在文中是否回退正确性格外空间需要Knuth-Morris-Pratt完整的DFA2N1.1N否是MRBoyer-Moore算法启发式查找不匹配的字符MNN/M是是RRabin-Krap算法蒙特卡洛算法
拉斯维加斯算法7N
7N*7N否
是是1暴力算法  MN1.1N是是1


0 0
原创粉丝点击