Distinct Subsequences

Source: Internet | Editor: 程序博客网 | Time: 2024/06/05 21:10

A subsequence of a given sequence is just the given sequence with some elements (possibly none) left out. Formally, given a sequence $X = x_1 x_2 \ldots x_m$, another sequence $Z = z_1 z_2 \ldots z_k$ is a subsequence of $X$ if there exists a strictly increasing sequence $\langle i_1, i_2, \ldots, i_k \rangle$ of indices of $X$ such that for all $j = 1, 2, \ldots, k$, we have $x_{i_j} = z_j$. For example, $Z = \mathrm{bcdb}$ is a subsequence of $X = \mathrm{abcbdab}$ with corresponding index sequence $\langle 2, 3, 5, 7 \rangle$.

In this problem your job is to write a program that counts the number of occurrences of Z in X as a subsequence such that each has a distinct index sequence.

The LeetCode problem is as follows:

Given a string S and a string T, count the number of distinct subsequences of T in S.

A subsequence of a string is a new string which is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (i.e., "ACE" is a subsequence of "ABCDE" while "AEC" is not).

Here is an example:
S = "rabbbit", T = "rabbit"

Return 3.
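To make "distinct index sequence" concrete, here is a brute-force sketch that lists every index sequence (0-based) for the example above. The names `collect` and `indexSequences` are hypothetical helpers introduced only for illustration; the search is exponential, so it is only meant for tiny inputs.

```cpp
#include <string>
#include <vector>
using namespace std;

// Hypothetical helper (not from the original post): collect every strictly
// increasing index sequence idx such that S[idx[k]] == T[k] for all k.
// The number of sequences collected is exactly the answer to the problem.
void collect(const string& S, const string& T, size_t from, vector<int>& cur,
             vector<vector<int>>& out) {
    if (cur.size() == T.size()) {        // every character of T has been matched
        out.push_back(cur);
        return;
    }
    char want = T[cur.size()];           // next character of T to match
    for (size_t i = from; i < S.size(); ++i) {
        if (S[i] == want) {
            cur.push_back((int)i);
            collect(S, T, i + 1, cur, out);
            cur.pop_back();              // backtrack and try the next position
        }
    }
}

// Hypothetical wrapper: all index sequences of T inside S, in lexicographic order.
vector<vector<int>> indexSequences(const string& S, const string& T) {
    vector<vector<int>> out;
    vector<int> cur;
    collect(S, T, 0, cur, out);
    return out;
}
```

For S = "rabbbit" and T = "rabbit" it returns exactly three sequences, {0, 1, 2, 3, 5, 6}, {0, 1, 2, 4, 5, 6} and {0, 1, 3, 4, 5, 6}, which differ only in which two of the three b's are used; that is why the answer is 3.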

Approach 1: Recursion (TLE, Time Limit Exceeded)

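If the current characters match, add the number of ways to match the rest of S and T after those positions.

If the current characters differ, move the pointer into S forward and recurse.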


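#include <string>
using namespace std;

class Solution {
private:
    int cnt;
    int len_s;
    int len_t;
public:
    Solution() : cnt(0) {}

    // Count the ways to match T[idx_ts..] using characters of S[idx_ss..].
    // Strings are passed by const reference to avoid copying on every call.
    void Count(const string& S, const string& T, int idx_ss, int idx_ts) {
        if (idx_ts == len_t) {   // all of T has been matched: one more subsequence
            cnt++;
            return;
        }
        // Try every position i in S that can serve as the match for T[idx_ts].
        for (int i = idx_ss; i < len_s; i++) {
            if (S[i] == T[idx_ts]) {
                Count(S, T, i + 1, idx_ts + 1);
            }
        }
    }

    int numDistinct(string S, string T) {
        cnt = 0;                 // reset so the same object can be reused
        len_s = S.length();
        len_t = T.length();
        Count(S, T, 0, 0);
        return cnt;
    }
};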
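The recursion above times out because the same pair (idx_ss, idx_ts) is reached along many different paths. A minimal sketch of one possible fix, assuming the extra O(len_s × len_t) memory is acceptable: cache the result for each pair so it is computed only once. `MemoSolution`, `count` and `memo` are hypothetical names, and keeping the counts in `long long` is an assumption about overflow headroom.

```cpp
#include <string>
#include <vector>
using namespace std;

// Sketch (not from the original post): the same search as the recursive
// solution, but each state (idx_ss, idx_ts) is computed once and cached.
class MemoSolution {
    vector<vector<long long>> memo;   // -1 means "not computed yet"
    string S, T;

    // Ways to match T[idx_ts..] using characters taken from S[idx_ss..].
    long long count(int idx_ss, int idx_ts) {
        if (idx_ts == (int)T.size()) return 1;          // all of T matched
        long long& res = memo[idx_ss][idx_ts];
        if (res != -1) return res;                       // reuse a cached result
        res = 0;
        for (int i = idx_ss; i < (int)S.size(); ++i) {
            if (S[i] == T[idx_ts]) {
                res += count(i + 1, idx_ts + 1);
            }
        }
        return res;
    }

public:
    long long numDistinct(const string& s, const string& t) {
        S = s;
        T = t;
        // idx_ss can reach S.size() and idx_ts can reach T.size(), hence the +1
        memo.assign(S.size() + 1, vector<long long>(T.size() + 1, -1));
        return count(0, 0);
    }
};
```

Because of the inner loop, each cached state can still cost up to O(len_s) work, so this is O(len_s² × len_t) overall: polynomial, but slower than the dynamic-programming approach that follows.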

Approach 2: Dynamic programming (DP)

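Let dp[i][j] be the number of distinct subsequences of S[0..i] that equal T[0..j].

If the current characters match (S[i] == T[j]), dp[i][j] is the sum of the number of ways that use S[i] (dp[i-1][j-1]) and the number of ways that do not use S[i] (dp[i-1][j]).

If the current characters differ, dp[i][j] = dp[i-1][j].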


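#include <string>
#include <vector>
using namespace std;

class Solution {
private:
    int len_s;
    int len_t;
public:
    // dp[i][j] = number of distinct subsequences of S[0..i] equal to T[0..j]
    long long Count(const string& S, const string& T){
        int i,j;
        // std::vector replaces the original "int dp[len_s][len_t]", a variable-length
        // array that standard C++ does not allow; every entry starts at 0.
        vector<vector<long long>> dp(len_s, vector<long long>(len_t, 0));

        // Column 0: occurrences of T[0] in S[0..i]
        if (S[0]==T[0]) {
            dp[0][0] = 1;
        }
        for(i=1;i<len_s;i++){
            dp[i][0] = dp[i-1][0];
            if (T[0]==S[i]) {
                dp[i][0]++;
            }
        }

        for (i=1; i<len_s; i++) {
            // for j > i, T[0..j] is longer than S[0..i], so dp[i][j] stays 0
            for (j=1; j<len_t && j<=i; j++) {
                if (S[i]!=T[j]) {
                    dp[i][j] = dp[i-1][j];
                }
                else{
                    dp[i][j] = dp[i-1][j-1] + dp[i-1][j];
                    //dp[i-1][j-1]: use S[i], as S[i]==T[j]
                    //dp[i-1][j]  : don't use S[i]
                }
            }
        }
        return dp[len_s-1][len_t-1];
    }

    int numDistinct(string S, string T) {
        len_s = S.length();
        len_t = T.length();
        if (len_t == 0) return 1;  // the empty string occurs exactly once
        if (len_s == 0) return 0;  // a non-empty T cannot occur in an empty S
        return (int)Count(S, T);
    }
};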
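The table above keeps all len_s × len_t entries, but row i only reads row i-1. A minimal sketch of the same recurrence kept in a single row, under two assumptions of mine: indices are shifted by one so the empty prefix of T gets its own slot g[0] = 1 (which also handles empty strings without special cases), and `numDistinctRolling` is a hypothetical name. Note the direction of j: it must run from right to left here, so that g[j-1] still holds the previous row's value when g[j] is updated.

```cpp
#include <string>
#include <vector>
using namespace std;

// Sketch: the forward DP kept in one row.
// g[j] = number of occurrences of T[0..j-1] in the prefix of S processed so far;
// g[0] = 1 because the empty prefix of T occurs exactly once.
long long numDistinctRolling(const string& S, const string& T) {
    vector<long long> g(T.size() + 1, 0);
    g[0] = 1;
    for (size_t i = 0; i < S.size(); ++i) {
        // Right to left: g[j-1] has not been updated for this i yet, so it
        // still equals dp[i-1][j-1] from the 2-D version.
        for (size_t j = T.size(); j >= 1; --j) {
            if (S[i] == T[j - 1]) {
                g[j] += g[j - 1];   // "use S[i]" + "don't use S[i]"
            }
        }
    }
    return g[T.size()];
}
```

For the example, numDistinctRolling("rabbbit", "rabbit") returns 3, using O(len_t) memory instead of O(len_s × len_t).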


I found the following solution (a DP approach) on Stack Overflow:

From LeetCode


I came across a very good DP solution, but I'm having a hard time understanding it. Can anybody explain how this DP works?

#include <cstdio>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int numDistinct(string S, string T) {
    vector<int> f(T.size()+1);
    // set the last element, f[T.size()], to 1
    f[T.size()]=1;
    for(int i=(int)S.size()-1; i>=0; --i){
        for(int j=0; j<(int)T.size(); ++j){
            f[j]+=(S[i]==T[j])*f[j+1];
            printf("%d\t", f[j] );
        }
        cout<<"\n";
    }
    return f[0];
}

Accepted answer:

First, try to solve the problem yourself to come up with a naive implementation:

Let's say that S.length = m and T.length = n. Let's write S{i} for the substring of S starting at index i (that is, the suffix of S from position i). For example, if S = "abcde", then S{0} = "abcde", S{4} = "e", and S{5} = "". We use a similar definition for T.

Let N[i][j] be the number of distinct subsequences of S{i} that equal T{j}, i.e., the number of times T{j} occurs in S{i} as a subsequence. We are interested in N[0][0] (because those are both the full strings).

There are two easy cases: N[i][n] for any i, and N[m][j] for j < n. How many times does "" occur as a subsequence of some string S? Exactly once. How many times does a non-empty T occur in ""? Zero times.

Now, given some arbitrary i and j, we need to find a recursive formula. There are two cases.

If S[i] != T[j], we know that N[i][j] = N[i+1][j] (I hope you can verify this for yourself, I aim to explain the cryptic algorithm above in detail, not this naive version).

If S[i] = T[j], we have a choice. We can either 'match' these characters and go on with the next characters of both S and T, or we can ignore the match (as in the case that S[i] != T[j]). Since we have both choices, we need to add the counts there: N[i][j] = N[i+1][j] + N[i+1][j+1].
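The two cases translate directly into a recursive function. This is only a sketch of the naive version the answer alludes to: `naiveN` is a hypothetical name mirroring N[i][j], and without memoization it takes exponential time in the worst case, which is why the table is filled bottom-up instead.

```cpp
#include <string>
using namespace std;

// naiveN(S, T, i, j) = N[i][j] = number of occurrences of the suffix T{j}
// as a subsequence of the suffix S{i}.
long long naiveN(const string& S, const string& T, size_t i, size_t j) {
    if (j == T.size()) return 1;   // N[i][n] = 1: "" occurs exactly once
    if (i == S.size()) return 0;   // N[m][j] = 0 for j < n: non-empty T{j} not in ""
    if (S[i] != T[j]) {
        return naiveN(S, T, i + 1, j);                 // N[i+1][j]
    }
    // S[i] == T[j]: ignore the match, or match and advance both
    return naiveN(S, T, i + 1, j) + naiveN(S, T, i + 1, j + 1);
}
```

Calling naiveN(S, T, 0, 0) gives N[0][0], e.g. 3 for S = "rabbbit", T = "rabbit".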


In order to find N[0][0] using dynamic programming, we need to fill the N table. We first need to set the boundary of the table:

N[m][j] = 0, for 0 <= j < n    // row m
N[i][n] = 1, for 0 <= i <= m   // column n

Because of the dependencies in the recursive relation (N[i][j] depends only on row i+1), we can fill the rest of the table looping i backwards and j forwards:

for (int i = m-1; i >= 0; i--) {
    for (int j = 0; j < n; j++) {
        if (S[i] == T[j]) {
            N[i][j] = N[i+1][j] + N[i+1][j+1];
        } else {
            N[i][j] = N[i+1][j];
        }
    }
}
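For reference, here is a sketch that puts the boundary conditions and the loop above into one self-contained function. `numDistinct2D` is a hypothetical name, and `long long` entries are an assumption to leave some room before overflow. The table has (m+1) × (n+1) entries so that row m and column n can hold the boundary values.

```cpp
#include <string>
#include <vector>
using namespace std;

// Sketch: bottom-up fill of N[i][j] = occurrences of T{j} in S{i}.
long long numDistinct2D(const string& S, const string& T) {
    int m = static_cast<int>(S.size());
    int n = static_cast<int>(T.size());
    vector<vector<long long>> N(m + 1, vector<long long>(n + 1, 0));
    for (int i = 0; i <= m; i++) N[i][n] = 1;   // column n: "" occurs once
    // row m (N[m][j] = 0 for j < n) is already zero from the constructor
    for (int i = m - 1; i >= 0; i--) {
        for (int j = 0; j < n; j++) {
            if (S[i] == T[j]) {
                N[i][j] = N[i + 1][j] + N[i + 1][j + 1];
            } else {
                N[i][j] = N[i + 1][j];
            }
        }
    }
    return N[0][0];
}
```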

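We can now use the most important trick of the algorithm: we can use a 1-dimensional array f, with the invariant f = N[i+1] at the start of each iteration of the outer loop. This is possible because of the way the table is filled: when f[j] is updated, it still holds N[i+1][j], and f[j+1] has not yet been overwritten in the current pass (j runs forwards), so it still holds N[i+1][j+1]. If we apply this to my algorithm, this gives: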

f[j] = 0, for 0 <= j < n
f[n] = 1
for (int i = m-1; i >= 0; i--) {
    for (int j = 0; j < n; j++) {
        if (S[i] == T[j]) {
            f[j] = f[j] + f[j+1];
        } else {
            f[j] = f[j];
        }
    }
}

We're almost at the algorithm you gave. First of all, we don't need to initialize f[j] = 0, because vector<int> f(T.size()+1) already value-initializes every element to 0. Second, we don't need assignments of the type f[j] = f[j].

Since this is C++ code, where a bool converts to 1 (true) or 0 (false) in arithmetic, we can rewrite the snippet

if (S[i] == T[j]) {
    f[j] += f[j+1];
}

to

f[j] += (S[i] == T[j]) * f[j+1];

and that's all. This yields the algorithm:

f[n] = 1;
for (int i = m-1; i >= 0; i--) {
    for (int j = 0; j < n; j++) {
        f[j] += (S[i] == T[j]) * f[j+1];
    }
}
thanks for explanation, hope I can vote more times. –  J.W. Dec 11 '13 at 4:06
 
can you explain this "N[i][n] = 1, for 0 <= i <= m"??? –  nyus2006 Apr 28 at 0:17
 
@S.H. you can think of it as for(int i = 0; i <= m; i++) { N[i][n] = 1; }. The big difference is that that way is operational: I provide an 'algorithm' how to set the values, whereas the way in the post is declarative: I only care about the values, not about how to achieve them. That's a more mathematical way of writing it. –  Vincent van der Weele Apr 28 at 6:19 
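The answer's key claim is the invariant f = N[i+1] at the top of each outer iteration. The following sketch, with a hypothetical helper `oneDimInvariantHolds`, runs the 2-D table and the 1-D array side by side on a given input and checks that claim at every step; it is a sanity check, not part of the solution.

```cpp
#include <string>
#include <vector>
using namespace std;

// Sketch: fill the 2-D table N and the 1-D array f in lockstep and verify
// that f equals row N[i+1] before each outer iteration.
bool oneDimInvariantHolds(const string& S, const string& T) {
    int m = static_cast<int>(S.size());
    int n = static_cast<int>(T.size());
    vector<vector<long long>> N(m + 1, vector<long long>(n + 1, 0));
    for (int i = 0; i <= m; i++) N[i][n] = 1;     // boundary column n
    vector<long long> f(n + 1, 0);
    f[n] = 1;                                     // f starts out equal to row N[m]
    for (int i = m - 1; i >= 0; i--) {
        if (f != N[i + 1]) return false;          // invariant at the top of the loop
        for (int j = 0; j < n; j++) {
            // 2-D update
            N[i][j] = N[i + 1][j] + (S[i] == T[j] ? N[i + 1][j + 1] : 0);
            // 1-D update: f[j+1] has not been overwritten yet in this pass,
            // so it still equals N[i+1][j+1]
            if (S[i] == T[j]) f[j] += f[j + 1];
        }
    }
    return f == N[0];                             // after the loop, f holds row N[0]
}
```

It returns true, for example, for ("rabbbit", "rabbit") and ("babgbag", "bag").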


#include <iostream>
#include <vector>
#include <string>
#include <cstdlib>
#include <cstdio>

// The zero initialization is specified in the standard as default
// zero initialization/value initialization for built-in types,
// primarily to support just this type of case in template use.
//
// Note that this behavior is different from a local variable such as
// int x; which leaves the value uninitialized (as in the C language,
// from which that behavior is inherited).
using namespace std;

int numDistinct(string S, string T) {
    vector<int> f(T.size() + 1); // by default, every element of the vector is initialized to 0
    // set the last element to 1
    f[T.size()] = 1;
    for (int i = (int)S.size() - 1; i >= 0; --i) {
        cout << "i = " << i << "\t";
        // traverse the T string and compare with S
        for (int j = 0; j < (int)T.size(); ++j) {
            f[j] += (S[i] == T[j]) * f[j+1];
            printf("%d\t", f[j]);
        }
        cout << "\n";
    }
    return f[0];
}

int main() {
    string S = "rabbbitr";
    string T = "rabit";
    cout << numDistinct(S, T) << endl;
}

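The output is as follows (one row of f[0..4] per value of i, then the final answer):

i = 7   0   0   0   0   0
i = 6   0   0   0   0   1
i = 5   0   0   0   1   1
i = 4   0   0   1   1   1
i = 3   0   0   2   1   1
i = 2   0   0   3   1   1
i = 1   0   3   3   1   1
i = 0   3   3   3   1   1
3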