poj-3261 Milk Patterns: a simple suffix array problem

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 09:50

Farmer John has noticed that the quality of milk given by his cows varies from day to day. On further investigation, he discovered that although he can’t predict the quality of milk from one day to the next, there are some regular patterns in the daily milk quality.

To perform a rigorous study, he has invented a complex classification scheme by which each milk sample is recorded as an integer between 0 and 1,000,000 inclusive, and has recorded data from a single cow over N (1 ≤ N ≤ 20,000) days. He wishes to find the longest pattern of samples which repeats identically at least K (2 ≤ K ≤ N) times. This may include overlapping patterns – 1 2 3 2 3 2 3 1 repeats 2 3 2 3 twice, for example.

Help Farmer John by finding the longest repeating subsequence in the sequence of samples. It is guaranteed that at least one subsequence is repeated at least K times.

Input
Line 1: Two space-separated integers: N and K
Lines 2.. N+1: N integers, one per line, the quality of the milk on day i appears on the ith line.
Output
Line 1: One integer, the length of the longest pattern which occurs at least K times
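A template problem.
Problem:
Find the length of the longest substring that occurs at least k times in a sequence.
Analysis:
Problems about repeated substrings suggest the height array of a suffix array, where height[i] is the longest common prefix (LCP) of the suffixes ranked i-1 and i. We also use a theorem: for suffixes j and k with rank[j] < rank[k], their LCP equals the minimum of height[rank[j]+1], height[rank[j]+2], ..., height[rank[k]] (easy to see by drawing a picture). Two consequences follow. 1. For the suffixes between j and k in lexicographic order, the minimum LCP over that interval corresponds to a unique substring: the prefix shared by every suffix in the interval. 2. If the interval cannot be extended without lowering its minimum, then no suffix l outside it shares a prefix that long with any suffix inside it.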
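The theorem can be checked by brute force on a small input. The sketch below is only an illustration: the helper names (naive_sa, direct_lcp, naive_height, lcp_by_height) are made up for this example, and the suffix array is built by plain sorting rather than by the doubling algorithm used in the solution.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Naive suffix array: sort suffix start positions by comparing the suffixes directly.
std::vector<int> naive_sa(const std::vector<int>& s) {
    std::vector<int> sa(s.size());
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return std::lexicographical_compare(s.begin() + a, s.end(),
                                            s.begin() + b, s.end());
    });
    return sa;
}

// Length of the common prefix of the suffixes starting at a and b.
int direct_lcp(const std::vector<int>& s, int a, int b) {
    int len = 0, n = (int)s.size();
    while (a + len < n && b + len < n && s[a + len] == s[b + len]) len++;
    return len;
}

// height[i] = LCP of the suffixes ranked i-1 and i; height[0] = 0.
std::vector<int> naive_height(const std::vector<int>& s, const std::vector<int>& sa) {
    std::vector<int> h(sa.size(), 0);
    for (int i = 1; i < (int)sa.size(); i++) h[i] = direct_lcp(s, sa[i - 1], sa[i]);
    return h;
}

// LCP of suffixes j and k via the theorem: minimum of height over the
// ranks strictly above the smaller rank, up to and including the larger one.
int lcp_by_height(const std::vector<int>& sa, const std::vector<int>& h, int j, int k) {
    assert(j != k);
    std::vector<int> rnk(sa.size());
    for (int i = 0; i < (int)sa.size(); i++) rnk[sa[i]] = i;
    int lo = std::min(rnk[j], rnk[k]), hi = std::max(rnk[j], rnk[k]);
    return *std::min_element(h.begin() + lo + 1, h.begin() + hi + 1);
}
```

For the sample sequence 1 2 3 2 3 2 3 1, lcp_by_height agrees with direct_lcp for every pair of suffixes; for instance the suffixes starting at positions 1 and 3 (0-based) share the prefix 2 3 2 3.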
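      Method 1: The occurrences of any given common prefix therefore always form one contiguous interval of the suffix array; they can never be split into separate pieces. This allows binary search: binary-search the LCP length from 1 to n, and for each candidate length scan the height array for at least k-1 consecutive values that are all at least that length (k-1 adjacent height values join k suffixes).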
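Method 1 might be sketched as follows, assuming the height array has already been computed with height[0] unused and k >= 2. The function names occurs_k_times and longest_by_binary_search are hypothetical, chosen for this example only.

```cpp
#include <vector>

// True if height[1..n-1] contains k-1 consecutive entries that are all >= len,
// i.e. some k suffixes adjacent in the suffix array share a prefix of length len.
bool occurs_k_times(const std::vector<int>& height, int k, int len) {
    int run = 0;
    for (int i = 1; i < (int)height.size(); i++) {
        run = height[i] >= len ? run + 1 : 0;
        if (run >= k - 1) return true;
    }
    return false;
}

// Largest len for which occurs_k_times holds. Feasibility is monotone:
// if a length works, every shorter length works too.
int longest_by_binary_search(const std::vector<int>& height, int k) {
    int lo = 0, hi = (int)height.size();   // the answer lies in [lo, hi]
    while (lo < hi) {
        int mid = lo + (hi - lo + 1) / 2;  // round up so the loop always progresses
        if (occurs_k_times(height, k, mid)) lo = mid;
        else hi = mid - 1;
    }
    return lo;
}
```

For the sample sequence 1 2 3 2 3 2 3 1 the height array is 0 1 0 2 4 0 1 3, so k = 2 gives 4 and k = 3 gives 2.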
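      Method 2: Alternatively, compute directly the maximum, over all windows of k consecutive suffixes (that is, k-1 adjacent height values), of the minimum height in the window. Range minimum queries (RMQ) do this.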
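Because every window in Method 2 has the same width, the window minima can also be obtained with a monotonic deque in place of an RMQ table. The sketch below uses that substitute technique, not the sparse table from the solution; longest_by_window_min is a hypothetical name, and it assumes height[0] is unused and k >= 2.

```cpp
#include <algorithm>
#include <deque>
#include <vector>

// Maximum, over all windows of k-1 consecutive entries of height[1..n-1],
// of the minimum value inside the window.
int longest_by_window_min(const std::vector<int>& height, int k) {
    int w = k - 1;                 // k adjacent suffixes span k-1 height values
    int n = (int)height.size();
    int best = 0;
    std::deque<int> dq;            // indices; their heights increase front to back
    for (int i = 1; i < n; i++) {
        // Drop entries that can never be a window minimum again.
        while (!dq.empty() && height[dq.back()] >= height[i]) dq.pop_back();
        dq.push_back(i);
        // The current window is [i-w+1, i]; discard a front index that left it.
        if (dq.front() <= i - w) dq.pop_front();
        // The first full window ends at i = w.
        if (i >= w) best = std::max(best, height[dq.front()]);
    }
    return best;
}
```

This runs in O(n) after the suffix array is built, whereas the sparse table costs O(n log n) to build but also supports windows of arbitrary width.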
     

#include <algorithm>
#include <cstring>
#include <iostream>
#define maxn 20001
using namespace std;

int s[maxn];
int sa[maxn], t[maxn], t2[maxn], c[maxn], n;
int rnk[maxn], height[maxn];   // rnk rather than rank: avoids a clash with std::rank
int k;
int H[1000001], tt[1000001];
int dp[maxn][22];

// Doubling algorithm; every s[i] must lie in [0, m).
void build_sa(int m) {
    int i, *x = t, *y = t2;
    // radix sort on the first character
    for (i = 0; i < m; i++) c[i] = 0;
    for (i = 0; i < n; i++) c[x[i] = s[i]]++;
    for (i = 1; i < m; i++) c[i] += c[i - 1];
    for (i = n - 1; i >= 0; i--) sa[--c[x[i]]] = i;
    for (int k = 1; k <= n; k <<= 1) {
        int p = 0;
        // order by second key; suffixes with an empty second half come first
        for (i = n - k; i < n; i++) y[p++] = i;
        for (i = 0; i < n; i++) if (sa[i] >= k) y[p++] = sa[i] - k;
        // stable counting sort by first key
        for (i = 0; i < m; i++) c[i] = 0;
        for (i = 0; i < n; i++) c[x[y[i]]]++;
        for (i = 1; i < m; i++) c[i] += c[i - 1];   // start at 1; c[-1] is out of bounds
        for (i = n - 1; i >= 0; i--) sa[--c[x[y[i]]]] = y[i];
        // recompute the ranks x from sa and the old ranks y
        swap(x, y);
        p = 1; x[sa[0]] = 0;
        for (i = 1; i < n; i++) {
            int a = sa[i - 1], b = sa[i];
            // a missing second half counts as -1 instead of reading past the array
            int ya = a + k < n ? y[a + k] : -1;
            int yb = b + k < n ? y[b + k] : -1;
            x[b] = (y[a] == y[b] && ya == yb) ? p - 1 : p++;
        }
        if (p >= n) break;   // all ranks distinct, sa is final
        m = p;               // alphabet size for the next round
    }
}

void getH() {
    int i, k = 0;
    for (i = 0; i < n; i++) rnk[sa[i]] = i;
    for (i = 0; i < n; i++) {
        if (rnk[i] == 0) { height[0] = 0; k = 0; continue; }   // no predecessor
        if (k) k--;
        int j = sa[rnk[i] - 1];
        while (i + k < n && j + k < n && s[i + k] == s[j + k]) k++;
        height[rnk[i]] = k;
    }
}

// Sparse table over height[1..n-1], the valid entries.
void RMQ_init() {
    for (int i = 1; i < n; i++) dp[i][0] = height[i];
    for (int j = 1; (1 << j) < n; j++)
        for (int i = 1; i + (1 << j) - 1 < n; i++)
            dp[i][j] = min(dp[i][j - 1], dp[i + (1 << (j - 1))][j - 1]);
}

int RMQ(int L, int R) {
    int lg = 0;
    while ((1 << (lg + 1)) <= R - L + 1) lg++;
    return min(dp[L][lg], dp[R - (1 << lg) + 1][lg]);
}

int main() {
    while (cin >> n >> k) {
        for (int i = 0; i < n; i++) cin >> s[i];
        memset(H, 0, sizeof(H));
        memset(tt, 0, sizeof(tt));
        // compress values from 0..1000000 into 1..n, preserving order
        for (int i = 0; i < n; i++) tt[s[i]] = H[s[i]]++;
        for (int i = 1; i < 1000001; i++) H[i] += (H[i - 1] - tt[i]);
        for (int i = 0; i < n; i++) s[i] = H[s[i]];
        build_sa(n + 1);
        getH();
        RMQ_init();
        int ans = 0;
        // k adjacent suffixes span the k-1 height values i..i+k-2
        for (int i = 1; i + k - 2 < n; i++)
            ans = max(RMQ(i, i + k - 2), ans);
        cout << ans << endl;
    }
    return 0;
}