Longest Common Subsequence (LCS)


Reference: http://blog.csdn.net/v_JULY_v/article/details/6110269

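A dynamic programming (DP) solution to the longest common subsequence problem:

The problem has optimal substructure, so it can be reduced to smaller subproblems.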

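Notation:

Xi = <x1, ⋯, xi> is the prefix made of the first i characters of X (1 ≤ i ≤ m).
Yj = <y1, ⋯, yj> is the prefix made of the first j characters of Y (1 ≤ j ≤ n).

Suppose Z = <z1, ⋯, zk> ∈ LCS(X, Y).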

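If xm = yn (the last characters match), it is easy to show by contradiction that this character must be the last character of every longest common subsequence Z (of length k) of X and Y, i.e. zk = xm = yn. It follows that Zk-1 ∈ LCS(Xm-1, Yn-1): the prefix Zk-1 of Z is a longest common subsequence of Xm-1 and Yn-1. The problem therefore reduces to finding an LCS of Xm-1 and Yn-1, and the length of LCS(X, Y) equals the length of LCS(Xm-1, Yn-1) plus 1.
If xm ≠ yn, it is likewise easy to show by contradiction that either Z ∈ LCS(Xm-1, Y) or Z ∈ LCS(X, Yn-1). At least one of zk ≠ xm and zk ≠ yn must hold: if zk ≠ xm then Z ∈ LCS(Xm-1, Y), and similarly, if zk ≠ yn then Z ∈ LCS(X, Yn-1). The problem then reduces to finding an LCS of Xm-1 and Y and an LCS of X and Yn-1, and the length of LCS(X, Y) is max{length of LCS(Xm-1, Y), length of LCS(X, Yn-1)}.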

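In the case xm ≠ yn, the two subproblems (the length of LCS(Xm-1, Y) and the length of LCS(X, Yn-1)) are not independent: both need the length of LCS(Xm-1, Yn-1). In addition, an LCS of two sequences contains an LCS of prefixes of those sequences, so the problem has optimal substructure. Overlapping subproblems together with optimal substructure make dynamic programming the natural approach.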

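In other words, solving the LCS problem involves three things: 1. LCS(Xm-1, Yn-1) + 1, for the case xm = yn; 2. LCS(Xm-1, Y) and LCS(X, Yn-1), for the case xm ≠ yn; 3. max{LCS(Xm-1, Y), LCS(X, Yn-1)}, which combines the two results in 2.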

2.1 Structure of a longest common subsequence

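The structure of a longest common subsequence can be stated as follows:

Let Z=<z₁, z₂, …, z_k> be a longest common subsequence of the sequences X=<x₁, x₂, …, x_m> and Y=<y₁, y₂, …, y_n>. Then:

If x_m=y_n, then z_k=x_m=y_n and Z_k-1 is a longest common subsequence of X_m-1 and Y_n-1;
If x_m≠y_n and z_k≠x_m, then Z is a longest common subsequence of X_m-1 and Y;
If x_m≠y_n and z_k≠y_n, then Z is a longest common subsequence of X and Y_n-1.

Here X_m-1=<x₁, x₂, …, x_m-1>, Y_n-1=<y₁, y₂, …, y_n-1>, and Z_k-1=<z₁, z₂, …, z_k-1>.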

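This gives the DP recurrence. Writing c[i][j] for the LCS length of X_i and Y_j:

c[i][j] = 0, if i = 0 or j = 0
c[i][j] = c[i-1][j-1] + 1, if i, j > 0 and x_i = y_j
c[i][j] = max(c[i][j-1], c[i-1][j]), if i, j > 0 and x_i ≠ y_j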

Quoting another blog post:

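Maximum subarray: the maximum subarray problem asks for the contiguous subsequence with the largest sum in a one-dimensional array of numbers. For example, the maximum subarray of {5,-3,4,2} is {5,-3,4,2} itself, with sum 8, while the maximum subarray of {5,-6,4,2} is {4,2}, with sum 6. As you may have noticed, the method is simple: keep extending the current subsequence as long as its running sum has not dropped below 0; otherwise discard it and start a new one. Record the sum of each candidate along the way, and report the largest at the end. For more, see Chapter 7 of the series 程序员编程艺术, on finding the maximum sum of a contiguous subarray.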
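The scan described above can be sketched in a few lines of C++. This is a minimal illustration, not code from the quoted post: the function name maxSubarraySum is made up here, and it assumes a non-empty input and reports only the sum, not the subarray itself.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Largest sum of any non-empty contiguous run in a.
// Hypothetical helper for illustration; assumes a is non-empty.
long long maxSubarraySum(const std::vector<int>& a)
{
    assert(!a.empty());
    long long best = a[0];   // best sum seen so far
    long long running = 0;   // sum of the run currently being extended
    for (int x : a) {
        if (running < 0) running = 0;  // a negative run can only hurt: start a new one
        running += x;
        best = std::max(best, running);
    }
    return best;
}
```

With the two arrays from the text, maxSubarraySum({5, -3, 4, 2}) evaluates to 8 and maxSubarraySum({5, -6, 4, 2}) to 6.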
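Longest common substring: find the longest common substring of two strings, where the substring must be contiguous in both original strings. This is again a sequential decision problem and can be solved with dynamic programming. We use a two-dimensional matrix to record intermediate results. How is the matrix built? Take an example: "bab" and "caba" (at a glance, the longest common substring is "ba" or "ab"). Put a 1 in a cell when its row character and column character match, and a 0 otherwise: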
　　 b　　a　　b
c　　0　　0　　0
a　　0　　1　　0
b　　1　　0　　1
a　　0　　1　　0
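The longest diagonal run of 1s in the matrix gives the longest common substring.
However, searching a two-dimensional matrix for the longest diagonal of 1s is itself tedious and slow, so we improve the scheme: whenever a cell would be filled with 1, set it to the value of its upper-left neighbor plus 1 instead.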
　　 b　　a　　b
c　　0　　0　　0
a　　0　　1　　0
b　　1　　0　　2
a　　0　　2　　0
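The largest element of the matrix is now the length of the longest common substring.
While building the matrix, a row is no longer needed once the next row has been computed, so a program can replace the matrix with a one-dimensional array.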

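The code:

#include <algorithm>
#include <cstdio>
#include <cstring>
using namespace std;

int dp[201][201];  // dp[i][j]: length of the common substring ending at str1[i-1] and str2[j-1]
char str1[201];
char str2[201];

int main()
{
    while (scanf("%200s%200s", str1, str2) == 2)
    {
        int len1 = strlen(str1);
        int len2 = strlen(str2);
        int best = 0;  // the answer is the largest cell, not the bottom-right one
        // Row 0 and column 0 stay 0: dp is a zero-initialized global and the loops never write them.
        for (int i = 1; i <= len1; i++) {
            for (int j = 1; j <= len2; j++) {
                if (str1[i - 1] == str2[j - 1]) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;  // extend the diagonal run
                } else {
                    dp[i][j] = 0;
                }
                best = max(best, dp[i][j]);
            }
        }
        printf("%d\n", best);
    }
    return 0;
}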
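The remark that one row suffices can be made concrete. The sketch below is one way to do it, not the original author's code: the name longestCommonSubstring is hypothetical, and each row is traversed from right to left so that the entry at j - 1 still holds the previous row's value when cell j is computed.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Length of the longest common substring of a and b, using a single row.
// Hypothetical helper for illustration.
int longestCommonSubstring(const std::string& a, const std::string& b)
{
    std::vector<int> row(b.size() + 1, 0);  // row[0] stays 0 (empty prefix of b)
    int best = 0;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        // Right to left: row[j - 1] has not been overwritten yet, so it is
        // still the upper-left neighbor of cell (i, j).
        for (std::size_t j = b.size(); j >= 1; --j) {
            row[j] = (a[i - 1] == b[j - 1]) ? row[j - 1] + 1 : 0;
            best = std::max(best, row[j]);
        }
    }
    return best;
}
```

For the example strings, longestCommonSubstring("caba", "bab") evaluates to 2, matching the largest entry of the improved matrix. The memory cost drops from one cell per character pair to one cell per character of b.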

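The longest common subsequence (LCS) problem: unlike the longest common substring, the longest common subsequence need not be contiguous in the original strings. For example, the longest common subsequence of ADE and ABCDE is ADE.
We again approach the problem with dynamic programming. The first step is to find the state transition equation.
Notation: C1 is the rightmost character of S1, C2 is the rightmost character of S2, S1' is S1 with C1 removed, and S2' is S2 with C2 removed.
LCS(S1,S2) is the longest of the following:

(1) LCS(S1, S2')
(2) LCS(S1', S2)
(3) LCS(S1', S2') if C1 is not equal to C2; LCS(S1', S2') + C1 if C1 equals C2.

Boundary condition: if S1 or S2 is the empty string, the result is the empty string.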
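The three cases and the boundary condition translate directly into a recursive function. The version below is a sketch that computes only the length and adds a memo table so that each pair of prefixes is solved once; the name lcsLength and the use of -1 as an "unknown" marker are choices made for this illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Top-down form of the recurrence. memo[i][j] caches the LCS length of the
// first i characters of s1 and the first j characters of s2 (-1 = unknown).
int lcsLength(const std::string& s1, const std::string& s2,
              std::size_t i, std::size_t j,
              std::vector<std::vector<int>>& memo)
{
    if (i == 0 || j == 0) return 0;  // boundary: one prefix is empty
    int& cached = memo[i][j];
    if (cached != -1) return cached;
    int best = std::max(lcsLength(s1, s2, i, j - 1, memo),   // (1) LCS(S1, S2')
                        lcsLength(s1, s2, i - 1, j, memo));  // (2) LCS(S1', S2)
    int diag = lcsLength(s1, s2, i - 1, j - 1, memo);        // (3) LCS(S1', S2')
    if (s1[i - 1] == s2[j - 1]) diag += 1;                   //     plus C1 when C1 == C2
    cached = std::max(best, diag);
    return cached;
}

// Convenience wrapper that allocates the memo table.
int lcsLength(const std::string& s1, const std::string& s2)
{
    std::vector<std::vector<int>> memo(
        s1.size() + 1, std::vector<int>(s2.size() + 1, -1));
    return lcsLength(s1, s2, s1.size(), s2.size(), memo);
}
```

For instance, lcsLength("ADE", "ABCDE") evaluates to 3. Without the memo table the same recursion would revisit the same prefix pairs many times over, which is why a table is worth building.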

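As before, we build a matrix to store the solutions of the subproblems. Each number in the matrix is the LCS length of the prefixes that end at that row and that column. Matching the state transition equation just derived, each cell is filled with the maximum of the following three values:

(1) the number in the cell above
(2) the number in the cell to the left
(3) the number in the upper-left cell if C1 is not equal to C2; the number in the upper-left cell plus 1 if C1 equals C2

An example: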

　　　　 G　　C　　T　　A
　　 0　　0　　0　　0　　0
G　 0　　1　　1　　1　　1
B　 0　　1　　1　　1　　1
T　 0　　1　　1　　2　　2
A　 0　　1　　1　　2　　3

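When filling in the last number, it should be the largest of these three:

(1) the number above, 2
(2) the number to the left, 2
(3) the upper-left number plus 1, i.e. 2+1=3, because here C1==C2

So the final result is 3.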

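While filling in the table we also record which cell each number came from, so that we can backtrack at the end to recover the longest common subsequence. Sometimes more than one of upper-left, left, and up attains the maximum; any of them may be chosen, but the same priority must be applied throughout. In my code the priority is upper-left > left > up.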

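[Figure: backtracking through the table to recover the LCS]

The following program fills in the table and prints the LCS length:

#include <algorithm>
#include <cstdio>
#include <cstring>
using namespace std;

int dp[201][201];  // dp[i][j]: LCS length of the first i chars of str1 and the first j chars of str2
char str1[201];
char str2[201];

int main()
{
    while (scanf("%200s%200s", str1, str2) == 2)
    {
        int len1 = strlen(str1);  // computed once instead of on every loop test
        int len2 = strlen(str2);
        // Row 0 and column 0 stay 0: dp is a zero-initialized global and the loops never write them.
        for (int i = 1; i <= len1; i++) {
            for (int j = 1; j <= len2; j++) {
                if (str1[i - 1] == str2[j - 1]) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;             // upper-left + 1
                } else {
                    dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]);  // best of up and left
                }
            }
        }
        printf("%d\n", dp[len1][len2]);
    }
    return 0;
}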
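The program above reports only the length. To recover an actual subsequence, one option is to walk back from the bottom-right cell. The sketch below is an assumption about how that could look, not the author's code: instead of storing a direction per cell it re-derives the direction from the table, using the priority upper-left, then left, then up. The name lcsString is hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns one longest common subsequence of a and b.
// Hypothetical helper for illustration.
std::string lcsString(const std::string& a, const std::string& b)
{
    const std::size_t m = a.size(), n = b.size();
    std::vector<std::vector<int>> dp(m + 1, std::vector<int>(n + 1, 0));
    for (std::size_t i = 1; i <= m; ++i)
        for (std::size_t j = 1; j <= n; ++j)
            dp[i][j] = (a[i - 1] == b[j - 1])
                           ? dp[i - 1][j - 1] + 1
                           : std::max(dp[i - 1][j], dp[i][j - 1]);

    // Backtrack from the bottom-right cell toward row 0 or column 0.
    std::string out;
    std::size_t i = m, j = n;
    while (i > 0 && j > 0) {
        if (a[i - 1] == b[j - 1]) {                 // came from the upper-left
            out.push_back(a[i - 1]);
            --i;
            --j;
        } else if (dp[i][j - 1] >= dp[i - 1][j]) {  // tie goes to the left
            --j;
        } else {                                    // otherwise go up
            --i;
        }
    }
    std::reverse(out.begin(), out.end());  // characters were collected back to front
    return out;
}
```

For the table in the example, lcsString("GBTA", "GCTA") returns "GTA". When several subsequences share the maximum length, a different tie-breaking priority may return a different one of them.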

Here is an example problem:

L - Common Subsequence

Time Limit:1000MS Memory Limit:10000KB 64bit IO Format:%I64d & %I64u


Description

A subsequence of a given sequence is the given sequence with some elements (possibly none) left out. Given a sequence X = < x1, x2, ..., xm > another sequence Z = < z1, z2, ..., zk > is a subsequence of X if there exists a strictly increasing sequence < i1, i2, ..., ik > of indices of X such that for all j = 1,2,...,k, x_{i_j} = z_j. For example, Z = < a, b, f, c > is a subsequence of X = < a, b, c, f, b, c > with index sequence < 1, 2, 4, 6 >. Given two sequences X and Y the problem is to find the length of the maximum-length common subsequence of X and Y.

Input

The program input is from the std input. Each data set in the input contains two strings representing the given sequences. The sequences are separated by any number of white spaces. The input data are correct.

Output

For each set of data the program prints on the standard output the length of the maximum-length common subsequence from the beginning of a separate line.

Sample Input

abcfbc         abfcab
programming    contest
abcd           mnp

Sample Output

4
2
0

A solution, using the same LCS-length program as before:

#include <algorithm>
#include <cstdio>
#include <cstring>
using namespace std;

int dp[201][201];  // dp[i][j]: LCS length of the first i chars of str1 and the first j chars of str2
char str1[201];
char str2[201];

int main()
{
    // scanf with %s skips any amount of whitespace between the two sequences.
    while (scanf("%200s%200s", str1, str2) == 2) {
        int len1 = strlen(str1);
        int len2 = strlen(str2);
        // Row 0 and column 0 stay 0: dp is a zero-initialized global and the loops never write them.
        for (int i = 1; i <= len1; i++) {
            for (int j = 1; j <= len2; j++) {
                if (str1[i - 1] == str2[j - 1]) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        printf("%d\n", dp[len1][len2]);
    }
    return 0;
}

0 0