POJ3080_Blue Jeans_KMP_Longest common substring


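Problem summary:

Given m DNA sequences (2 <= m <= 10), each 60 bases long, find and print the longest contiguous DNA substring common to all of them. If the longest common substring has fewer than three bases, print "no significant commonalities".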

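Solution:

The input is small: at most 10 sequences of 60 characters each. At first I was tempted to solve this by pure brute force, but since I am learning KMP I used it as practice instead.

Some solutions online use binary search plus a suffix array (which I have not really studied yet). That looked too advanced, so I did not read them. The approach here is close to brute force.


First, enumerate every contiguous substring of the first sequence whose length is at least 3.

Then match each such substring against every remaining sequence with KMP. If it is found in all of them, it is a candidate answer. Among all candidates, pick the longest one, and among candidates of equal length pick the one that comes first in lexicographic order.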
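The steps above can be sketched with the C standard library's `strstr` standing in for KMP. This is only a minimal sketch of the enumerate-and-check idea, not the submitted solution: the function name `longest_common` and the constants `SEQ_LEN` and `MIN_LEN` are made up for illustration, and the sequences are assumed to be NUL-terminated strings of at most 60 characters.

```c
#include <string.h>

#define SEQ_LEN 60 /* assumed maximum sequence length */
#define MIN_LEN 3  /* shortest substring that counts as an answer */

/* Hypothetical helper: copies into out the longest substring of seqs[0]
 * that occurs in all m sequences (ties broken lexicographically) and
 * returns its length, or 0 if nothing of length >= MIN_LEN is shared. */
int longest_common(char seqs[][SEQ_LEN + 1], int m, char *out)
{
    char cand[SEQ_LEN + 1];
    int n = (int)strlen(seqs[0]);
    int best = 0;

    out[0] = '\0';
    for (int start = 0; start + MIN_LEN <= n; start++) {
        for (int len = MIN_LEN; start + len <= n; len++) {
            int i;

            /* Candidate: seqs[0][start .. start+len-1]. */
            memcpy(cand, seqs[0] + start, (size_t)len);
            cand[len] = '\0';

            /* strstr is used here in place of KMP. */
            for (i = 1; i < m; i++)
                if (strstr(seqs[i], cand) == NULL)
                    break;
            if (i < m)
                break; /* every longer candidate from this start contains cand */

            if (len > best || (len == best && strcmp(cand, out) < 0)) {
                strcpy(out, cand);
                best = len;
            }
        }
    }
    return best;
}
```

A caller would print `out` when the return value is nonzero and "no significant commonalities" otherwise. Swapping `strstr` for a KMP search changes only the inner membership test.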



Problem statement:

Blue Jeans
Time Limit: 1000MS
Memory Limit: 65536K
Total Submissions: 9757
Accepted: 4109

Description

The Genographic Project is a research partnership between IBM and The National Geographic Society that is analyzing DNA from hundreds of thousands of contributors to map how the Earth was populated.

As an IBM researcher, you have been tasked with writing a program that will find commonalities amongst given snippets of DNA that can be correlated with individual survey information to identify new genetic markers.

A DNA base sequence is noted by listing the nitrogen bases in the order in which they are found in the molecule. There are four bases: adenine (A), thymine (T), guanine (G), and cytosine (C). A 6-base DNA sequence could be represented as TAGACC.

Given a set of DNA base sequences, determine the longest series of bases that occurs in all of the sequences.

Input

Input to this problem will begin with a line containing a single integer n indicating the number of datasets. Each dataset consists of the following components:
  • A single positive integer m (2 <= m <= 10) indicating the number of base sequences in this dataset.
  • m lines each containing a single base sequence consisting of 60 bases.

Output

For each dataset in the input, output the longest base subsequence common to all of the given base sequences. If the longest common subsequence is less than three bases in length, display the string "no significant commonalities" instead. If multiple subsequences of the same longest length exist, output only the subsequence that comes first in alphabetical order.

Sample Input

3
2
GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
3
GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA
GATACTAGATACTAGATACTAGATACTAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA
GATACCAGATACCAGATACCAGATACCAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA
3
CATCATCATCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
ACATCATCATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AACATCATCATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

Sample Output

no significant commonalities
AGATAC
CATCATCAT


Run ID: 11781253
User: chengtbf
Problem: 3080
Result: Accepted
Memory: 168K
Time: 16MS
Language: C++
Code Length: 2256B
Submit Time: 2013-07-14 14:27:58


Code:


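#include<stdio.h>
#include<string.h>

char child_array[63];   // the substring currently being matched
char ans[63];           // the answer
int len_child, len_ans; // lengths of the two strings above
char str[11][63];
int next[63];
int m;                  // number of sequences

void get_next() // build the next array for the current substring
{
    int pos = 2;
    int cnd = 0;
    memset(next, 0, sizeof(next));
    next[0] = -1;
    next[1] = 0;
    while (pos <= len_child)
    {
        if (child_array[pos - 1] == child_array[cnd])
        {
            cnd++;
            next[pos] = cnd;
            pos++;
        }
        else if (cnd > 0)
        {
            cnd = next[cnd];
        }
        else
        {
            next[pos] = 0;
            pos++;
        }
    }
}

int kmp_search(int i) // returns 1 if child_array occurs in str[i], otherwise 0
{
    int pos = 0;
    int j = 0;
    while (pos + j < 60)
    {
        if (str[i][pos + j] == child_array[j])
        {
            if (j == len_child - 1)
            {
                return 1;
            }
            else
            {
                j++;
            }
        }
        else
        {
            pos = pos + j - next[j];
            if (next[j] > -1)
            {
                j = next[j];
            }
            else
            {
                j = 0;
            }
        }
    }
    return 0;
}

int main()
{
    int i, start, end, k, flag; // flag marks whether the current substring matched every sequence
    int t;
    char temp;
    scanf("%d", &t);
    while (t--)
    {
        scanf("%d", &m);
        for (i = 1; i <= m; i++)
        {
            scanf("%c", &temp); // consume the pending newline (%s would skip it anyway)
            scanf("%s", str[i]);
        }
        ans[0] = '\0'; // initialize the answer
        for (start = 0; start <= 57; start++) // start is the left end of the substring
        {
            for (len_child = 3; len_child + start <= 60; len_child++) // try lengths from the shortest up
            {
                flag = 0; // assume the current substring does not match
                end = len_child + start;
                for (k = start, i = 0; k < end; k++, i++)
                {
                    child_array[i] = str[1][k];
                }
                child_array[i] = '\0';
                // child_array now holds the substring of length len_child starting at start
                get_next(); // build the next array for it
                for (i = 2; i <= m; i++)
                {
                    if (kmp_search(i) == 0)
                    {
                        break;
                    }
                }
                if (i == m + 1) // found in every other sequence
                {
                    flag = 1;
                    len_ans = strlen(ans);
                    // Case 1: the substring is longer, so replace. ans starts with
                    // length 0, so the first substring that qualifies is always taken.
                    // Case 2: same length but earlier in lexicographic order, so replace.
                    if (len_child > len_ans || (len_child == len_ans && strcmp(child_array, ans) < 0))
                    {
                        strcpy(ans, child_array);
                        len_ans = len_child;
                    }
                }
                if (flag == 0)
                {
                    // Every longer substring from this start contains the current one
                    // as a prefix, so none of them can match either.
                    // A small optimization; with this input size it passes without it.
                    break;
                }
            }
        }
        if (strlen(ans) == 0)
        {
            printf("no significant commonalities\n");
        }
        else
        {
            printf("%s\n", ans);
        }
    }
    return 0;
}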
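To see what get_next produces, here is a small sketch of the same table construction written as a standalone function that takes the pattern as a parameter instead of reading the globals. The name `build_table` and its signature are assumptions made for this illustration. The loop follows the same three branches as get_next.

```c
#include <string.h>

/* Hypothetical standalone variant of get_next.
 * Fills table[0..len], where len = strlen(pattern):
 *   table[0] = -1, and for pos >= 1, table[pos] is the length of the
 *   longest proper prefix of pattern[0..pos-1] that is also its suffix.
 * The caller must provide room for len + 1 ints. */
void build_table(const char *pattern, int *table)
{
    int len = (int)strlen(pattern);
    int pos = 2; /* next table entry to fill */
    int cnd = 0; /* length of the prefix currently being extended */

    table[0] = -1;
    if (len >= 1)
        table[1] = 0;

    while (pos <= len) {
        if (pattern[pos - 1] == pattern[cnd]) {
            /* The prefix grows by one character. */
            cnd++;
            table[pos] = cnd;
            pos++;
        } else if (cnd > 0) {
            /* Fall back to the next shorter candidate prefix. */
            cnd = table[cnd];
        } else {
            /* No prefix matches here. */
            table[pos] = 0;
            pos++;
        }
    }
}
```

For the pattern "CATCAT" this fills the table with -1, 0, 0, 0, 1, 2, 3: after matching "CATC" the search can resume with one character already matched, and after the full "CATCAT" with three.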