Longest Repeated Substring
来源:互联网 发布:北京化工大学c语言 编辑:程序博客网 时间:2024/05/24 03:12
http://nriverwang.blogspot.hk/
This problem is from codeeval.com.
Problem Description:
You are to find the longest repeated substring in a given text. Repeated substrings may not overlap. If more than one substring is repeated with the same length, print the first one you find.(starting from the beginning of the text). NOTE: The substrings can't be all spaces.Input Sample:
Your program should accept as its first argument a path to a filename. The input file contains several lines. Each line is one test case. Each line contains a test string. eg.
bananaabc
Output Sample:For each set of input produce a single line of output which is the longest repeated substring. If there is none, print out the string NONE. eg.
anNONE
Analysis:
The input text is actually a string and suppose the length of string is n.Brute Force:
The brute force method is easy: for each index pair [i, j] (1 <= i, j <= n) of the string, count the length of non-overlapping repeated substring starting from this two indices . And then get the first repeated substring with the longest length. Although the non-overlapping constraint can help to reduce the character comparison times while checking two substrings, the running time in the worst case is still O(n^3) (can be proved, but omitted here). Obviously the brute force method is not a good choice.
Suffix Array:
Another method is to use suffix array. For the input string str, we first sort all of its suffixes in the suffix array. eg.
str = banana, its suffixes are:bananaananananaananaa
after sort, the suffix array looks like:aanaananabananananana
Then for each two adjacent suffixes on the sorted suffix array, count the length of the common prefix (Note: should avoid counting the overlapped part). The longest repeated prefix appearing first on the original input string is the result. e.g.aanaananabanananananaThe red prefixes an and green prefixes na both have the same longest length 2, but an is the result because it appears before na in str = banana.Similar to brute force method, the suffixes comparison in sorting and repeated prefix length counting make the running time be O(n^2lgn) (suppose an O(nlgn) sorting algorithm is used). The length of suffix array is n and it may just need to store the indices pointing to the suffixes in the input string rather than the suffixes themselves. Therefore the space complexity is O(n).
Dynamic Programming:
O(n^2lgn) looks good, however, we can solve this problem in O(n^2) time and O(n^2) space with DP. Suppose we first don't consider the constraint of no overlap. The idea is to find the longest repeated suffix for all pairs of prefixes of the string str. The length of the longest repeated suffix for prefixes str[1..i] and str[1..j] is:
c(i, j) = 1) c(i-1, j-1) + 1 if str[i] == str[j]; 2) 0 otherwise.The maximum of c(i, j) (1 <= i, j <= n) is the length of longest repeated substring (suffix) of str. The longest repeated substring should be stored and dynamically updated during the calculating of c(i, j).
Now consider the non-overlapping constraint, otherwise the above process will give us the string itself. Look at prefixes str[1..i] and str[1..j], the end indices of their suffixes are i and j (for simplicity, suppose i <= j), respectively. If their common suffix is longer than j-i, the two suffixes must overlap. The overlapped part should not be counted and the above recursion should be modified as:
c(i, j) = 1) c(i-1, j-1) + 1 if str[i] == str[j] && abs(i-j) > c(i-1, j-1); 2) 0 otherwise.If more than one repeated substring have the same longest length, we just need to take the first one in the input string. More details can be found in the code.
Code (C++)
Bellow is the code of DP method: std::string repeated_substring(std::string &str) { int len = str.length(); int **c = new int*[len + 1]; for (int i = 0; i <= len; ++i) c[i] = new int[len + 1]; for (int i = 0; i <= len; ++i) { c[i][0] = 0; c[0][i] = 0; } int max_len = 0, index = len + 1; for (int i = 1; i <= len; ++i) { for (int j = 1; j <= len; ++j) { if (str[i-1] == str[j-1] && abs(i-j) > c[i-1][j-1]) { c[i][j] = c[i-1][j-1] + 1; if (c[i][j] > max_len) { max_len = c[i][j]; index = std::min(i, j); } } else { c[i][j] = 0; } } } for (int i = 0; i <= len; ++i) delete[] c[i]; delete[] c; if (max_len > 0) { std::string ret = str.substr(index - max_len, max_len); for (int i = 0; i < max_len; ++i) if(ret[i] != ' ') return ret; } return "NONE";}
0 0
- Longest Repeated Substring
- Longest Repeated Substring zoj3199
- Longest Repeated Substring
- ZOJ 3199 Longest Repeated Substring
- [LeetCode] Longest substring without repeated characters
- ZOJ 3199 Longest Repeated Substring 后缀数组 + RMQ预处理
- Leetcode Problem 3: Longest non-repeated substring(LNRS).
- 459. Repeated Substring Pattern
- 459. Repeated Substring Pattern
- Repeated Substring Pattern
- 459. Repeated Substring Pattern*
- 459. Repeated Substring Pattern
- Repeated Substring Pattern
- 459. Repeated Substring Pattern
- 459. Repeated Substring Pattern
- 459. Repeated Substring Pattern
- Repeated Substring Pattern
- Leetcode Repeated Substring Pattern
- html5实现刮刮卡效果
- Linux中关于文件权限的一些小结
- Android AVD创建及设置中各参数详解
- Java 关键字 final 和static 的区别
- ThreadPoolExecutor 理解
- Longest Repeated Substring
- iar for arm 6.3以上打开旧工程字体改变的解决
- MYSQL自动备份策略的选择与实践
- 文档兼容性设置 X-UA-compatible
- PHP预定义变量
- VC6.0 中如何对齐零乱的代码
- Hashtable,Hashmap,Propeties
- Kinect学习DAY2:OpenNI
- linux命令 sed Makefile学习