POJ3080_Blue Jeans_KMP_Finding the longest common substring
Source: Internet  Editor: 程序博客网  Time: 2024/05/07 15:34
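Problem summary:
Given m DNA sequences, each 60 characters long, find and print the longest DNA substring common to all of them; if there is none, print "no significant commonalities".
Solution:
The data is small: at most 10 sequences of 60 characters each. At first I was tempted to solve this by pure brute force, but since I am learning KMP I used it for practice.
I have seen solutions online that use binary search plus a suffix array (which I have not really studied yet); they looked too advanced, so I did not read them. The method used here is almost brute force.
First, enumerate every contiguous substring of the first sequence whose length is at least 3.
Then match each such substring against every remaining sequence in turn. If it matches all of them, it is a candidate answer. Among all candidates, pick the longest one, and among candidates of equal length pick the one that comes first in lexicographic order.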
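The enumeration described above can be sketched without KMP at all. The sketch below is not the author's code: it swaps in the standard library's `strstr` for the matching step, and the names `longest_common`, `SEQ_LEN` and `MIN_LEN` are introduced here purely for illustration. It assumes every sequence fits in `SEQ_LEN` characters.

```c
#include <assert.h>
#include <string.h>

#define SEQ_LEN 60   /* assumed maximum sequence length (60 in this problem) */
#define MIN_LEN 3    /* shorter common substrings do not count */

/* Hypothetical helper: writes into out the longest substring of seqs[0]
 * that occurs in all m sequences, or "" if none has length >= MIN_LEN.
 * Ties in length go to the lexicographically smaller candidate. */
void longest_common(char seqs[][SEQ_LEN + 1], int m, char *out)
{
    char cand[SEQ_LEN + 1];
    int n, start, len, i, best = 0;

    assert(m >= 1);
    n = (int)strlen(seqs[0]);
    out[0] = '\0';
    for (start = 0; start + MIN_LEN <= n; start++) {
        for (len = MIN_LEN; start + len <= n; len++) {
            /* candidate = seqs[0][start .. start+len-1] */
            memcpy(cand, seqs[0] + start, (size_t)len);
            cand[len] = '\0';
            for (i = 1; i < m; i++) {
                if (strstr(seqs[i], cand) == NULL)
                    break;
            }
            if (i < m)
                break;  /* longer candidates from this start fail too */
            if (len > best || (len == best && strcmp(cand, out) < 0)) {
                strcpy(out, cand);
                best = len;
            }
        }
    }
}
```

Calling `longest_common(seqs, 3, out)` on the second sample dataset should leave `AGATAC` in `out`, matching the sample output. With at most 10 sequences of 60 characters, this plain version is probably fast enough too; KMP only replaces the `strstr` call.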
Problem:
Blue Jeans
Time Limit: 1000MS
Memory Limit: 65536K
Total Submissions: 9757
Accepted: 4109
Description
The Genographic Project is a research partnership between IBM and The National Geographic Society that is analyzing DNA from hundreds of thousands of contributors to map how the Earth was populated.
As an IBM researcher, you have been tasked with writing a program that will find commonalities amongst given snippets of DNA that can be correlated with individual survey information to identify new genetic markers.
A DNA base sequence is noted by listing the nitrogen bases in the order in which they are found in the molecule. There are four bases: adenine (A), thymine (T), guanine (G), and cytosine (C). A 6-base DNA sequence could be represented as TAGACC.
Given a set of DNA base sequences, determine the longest series of bases that occurs in all of the sequences.
Input
Input to this problem will begin with a line containing a single integer n indicating the number of datasets. Each dataset consists of the following components:
- A single positive integer m (2 <= m <= 10) indicating the number of base sequences in this dataset.
- m lines each containing a single base sequence consisting of 60 bases.
Output
For each dataset in the input, output the longest base subsequence common to all of the given base sequences. If the longest common subsequence is less than three bases in length, display the string "no significant commonalities" instead. If multiple subsequences of the same longest length exist, output only the subsequence that comes first in alphabetical order.
Sample Input
3
2
GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
3
GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA
GATACTAGATACTAGATACTAGATACTAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA
GATACCAGATACCAGATACCAGATACCAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA
3
CATCATCATCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
ACATCATCATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AACATCATCATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
Sample Output
no significant commonalities
AGATAC
CATCATCAT
Code:
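#include <stdio.h>
#include <string.h>

char child_array[63];    // substring currently being matched
char ans[63];            // best answer so far
int len_child, len_ans;  // lengths of the two strings above
char str[11][63];        // input sequences, stored from index 1
int next[63];
int m;                   // number of sequences

// Build the next table for the current substring child_array.
void get_next(void)
{
    int pos = 2;
    int cnd = 0;
    memset(next, 0, sizeof(next));
    next[0] = -1;
    next[1] = 0;
    while (pos <= len_child)
    {
        if (child_array[pos - 1] == child_array[cnd])
        {
            cnd++;
            next[pos] = cnd;
            pos++;
        }
        else if (cnd > 0)
        {
            cnd = next[cnd];
        }
        else
        {
            next[pos] = 0;
            pos++;
        }
    }
}

// Return 1 if child_array occurs in str[i], otherwise 0.
int kmp_search(int i)
{
    int pos = 0;  // start of the current alignment in str[i]
    int j = 0;    // number of characters matched so far
    while (pos + j < 60)
    {
        if (str[i][pos + j] == child_array[j])
        {
            if (j == len_child - 1)
            {
                return 1;
            }
            else
            {
                j++;
            }
        }
        else
        {
            pos = pos + j - next[j];
            if (next[j] > -1)
            {
                j = next[j];
            }
            else
            {
                j = 0;
            }
        }
    }
    return 0;
}

int main(void)
{
    int i, start, end, k;
    int flag;  // marks whether the substring of the current length matched every sequence
    int t;
    scanf("%d", &t);
    while (t--)
    {
        scanf("%d", &m);
        for (i = 1; i <= m; i++)
        {
            scanf("%s", str[i]);  // %s skips the preceding newline by itself
        }
        ans[0] = '\0';  // reset the answer
        for (start = 0; start <= 57; start++)  // start is the left end of the substring
        {
            for (len_child = 3; len_child + start <= 60; len_child++)  // try lengths from the shortest upward
            {
                flag = 0;  // assume this length does not match
                end = len_child + start;
                for (k = start, i = 0; k < end; k++, i++)
                {
                    child_array[i] = str[1][k];
                }
                child_array[i] = '\0';
                // child_array now holds the substring of length len_child starting at start
                get_next();  // build the next table for this substring
                for (i = 2; i <= m; i++)
                {
                    if (kmp_search(i) == 0)
                    {
                        break;
                    }
                }
                if (i == m + 1)  // matched every sequence
                {
                    flag = 1;  // a solution of this length exists
                    len_ans = (int)strlen(ans);
                    // Case 1: the new substring is longer, so replace. ans starts
                    // empty, so the first match always replaces it.
                    // Case 2: same length but lexicographically smaller, so replace as well.
                    if (len_child > len_ans || (len_child == len_ans && strcmp(child_array, ans) < 0))
                    {
                        strcpy(ans, child_array);
                        len_ans = len_child;
                    }
                }
                if (flag == 0)
                {
                    // If this length fails from this start position, every longer
                    // substring from the same start fails too, because substrings
                    // are contiguous. This is only an optimization; with input
                    // this small the solution passes either way.
                    break;
                }
            }
        }
        if (strlen(ans) == 0)
        {
            printf("no significant commonalities\n");
        }
        else
        {
            printf("%s\n", ans);
        }
    }
    return 0;
}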
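To see what the next table holds, here is a small standalone sketch that follows the same construction as get_next but takes the pattern as a parameter. The name `build_next` and its signature are assumptions made for this illustration only. Under this convention next[0] is -1, and for j >= 1 next[j] is the length of the longest proper prefix of the first j characters that is also a suffix of them.

```c
#include <assert.h>

/* Hypothetical helper mirroring get_next: fills next[0..len] for pat.
 * The caller must provide room for len + 1 entries. */
void build_next(const char *pat, int len, int *next)
{
    int pos = 2;  /* index of the entry being computed */
    int cnd = 0;  /* length of the prefix currently being extended */

    assert(len >= 1);
    next[0] = -1;
    next[1] = 0;
    while (pos <= len) {
        if (pat[pos - 1] == pat[cnd]) {
            next[pos++] = ++cnd;   /* prefix grows by one character */
        } else if (cnd > 0) {
            cnd = next[cnd];       /* fall back to a shorter prefix */
        } else {
            next[pos++] = 0;       /* no prefix matches here */
        }
    }
}
```

For the pattern CATCAT (length 6) this fills next with -1 0 0 0 1 2 3. On a mismatch at pattern index j, kmp_search moves the alignment forward by j - next[j] and resumes comparing at pattern index next[j] (or 0 when next[j] is -1), so characters already matched are not compared again.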