Algorithm Study Notes --- DNA Sorting

Source: Internet  Editor: 程序博客网  Time: 2024/05/16 06:08

DNA Sorting

Time Limit: 1000MS  Memory Limit: 10000K  Total Submissions: 79623  Accepted: 31994

Description

One measure of ``unsortedness'' in a sequence is the number of pairs of entries that are out of order with respect to each other. For instance, in the letter sequence ``DAABEC'', this measure is 5, since D is greater than four letters to its right and E is greater than one letter to its right. This measure is called the number of inversions in the sequence. The sequence ``AACEDGG'' has only one inversion (E and D)---it is nearly sorted---while the sequence ``ZWQM'' has 6 inversions (it is as unsorted as can be---exactly the reverse of sorted).

You are responsible for cataloguing a sequence of DNA strings (sequences containing only the four letters A, C, G, and T). However, you want to catalog them, not in alphabetical order, but rather in order of ``sortedness'', from ``most sorted'' to ``least sorted''. All the strings are of the same length.

Input

The first line contains two integers: a positive integer n (0 < n <= 50) giving the length of the strings; and a positive integer m (0 < m <= 100) giving the number of strings. These are followed by m lines, each containing a string of length n.

Output

Output the list of input strings, arranged from ``most sorted'' to ``least sorted''. Since two strings can be equally sorted, then output them according to the original order.

Sample Input

10 6
AACATGAAGG
TTTTGGCCAA
TTTGGCCAAA
GATCAGATTT
CCCGGGGGGA
ATCGATGCAT

Sample Output

CCCGGGGGGA
AACATGAAGG
GATCAGATTT
ATCGATGCAT
TTTTGGCCAA
TTTGGCCAAA

Source

East Central North America 1998

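Analysis:

When solving this problem I first tried storing the strings in a two-dimensional array, but that turned out to be very awkward to work with. I then switched to a struct, and in C++ the string type can be used for the text itself.

Solution 1: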

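#include <iostream>
#include <string>
#include <algorithm>
using namespace std;

struct DNA
{
    string str;   // the DNA string
    int count;    // its number of inversions
};

// Orders strings by inversion count, fewest inversions first
bool compare(const DNA& a, const DNA& b)
{
    return a.count < b.count;
}

int main()
{
    DNA a[100];
    int n, m;     // n: length of each string, m: number of strings
    cin >> n >> m;
    for (int i = 0; i < m; i++)
    {
        cin >> a[i].str;
        a[i].count = 0;
        // Count every pair (k, j) with k < j whose characters are out of order
        for (int j = 1; j < n; j++)
        {
            for (int k = 0; k < j; k++)
            {
                if (a[i].str[j] < a[i].str[k])
                    a[i].count++;
            }
        }
    }

    // stable_sort keeps equally sorted strings in input order, as the problem requires
    stable_sort(a, a + m, compare);
    for (int i = 0; i < m; i++)
    {
        cout << a[i].str << endl;
    }
    return 0;
}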

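Solution 2:

I came across this solution posted by someone online. It counts the inversions of one string in O(n) time, which is faster, so I am borrowing it here:

Address: http://blog.csdn.net/china8848/article/details/2227131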

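What is the inversion number:

It is the total number of pairs that appear in the opposite order to the standard sequence. For example, if the standard sequence is 1 2 3 4 5, the inversion number of 5 4 3 2 1 is computed as follows: look at the second element, 4. There is one 5 before it, and 5 comes after 4 in the standard sequence, so count 1. Similarly, the third element, 3, is preceded by 4 and 5, both of which come after 3 in the standard sequence, so count 2. In the same way, 2 has 3 such elements before it and 1 has 4.

Adding these up gives the inversion number = 1+2+3+4 = 10.

Another example: 2 4 3 1 5. There are 0 such elements before 4, 1 before 3, 3 before 1, and 0 before 5.

So the inversion number is 1+3 = 4.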
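The definition above can be checked directly by comparing every pair. The following is a minimal sketch, not part of the original post; the function name countInversionsBrute is an assumption chosen for illustration. It works on any indexable sequence, so it covers both the number examples here and the letter examples from the problem statement.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: counts pairs (i, j) with i < j and s[j] < s[i].
// Runs in O(n^2), which is fine for the short sequences in this problem.
template <typename Seq>
int countInversionsBrute(const Seq& s) {
    int inversions = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        for (std::size_t j = i + 1; j < s.size(); ++j) {
            if (s[j] < s[i]) ++inversions;  // a later element is smaller
        }
    }
    return inversions;
}
```

Called on the sequences 5 4 3 2 1 and 2 4 3 1 5 it returns 10 and 4, and on the string DAABEC from the problem statement it returns 5.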

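The idea of counting inversions with merge sort (MergeSort):

Comparing every pair of numbers takes O(n^2) time, which is unacceptable for large n. MergeSort runs in O(n log n).

During the merge step, one pointer i points at an element l in the left half and another pointer j points at an element r in the right half. When r is less than l, every element of the left half from i to mid forms an inversion with r, so it is enough to add mid-i+1 to the running total. For an application of this, see the previous solution report.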
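The merge step described above can be sketched as follows. This is an illustrative version rather than code from the original post: the names mergeCount and countInversionsMerge are assumptions, and it uses half-open ranges [lo, mid) and [mid, hi), so the amount added per step is mid - i instead of mid-i+1.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: sorts a[lo, hi) and returns the number of inversions
// inside that range. tmp is scratch space with the same size as a.
long long mergeCount(std::vector<int>& a, std::vector<int>& tmp,
                     std::size_t lo, std::size_t hi) {
    if (hi - lo < 2) return 0;  // zero or one element: nothing to count
    std::size_t mid = lo + (hi - lo) / 2;
    long long inv = mergeCount(a, tmp, lo, mid) + mergeCount(a, tmp, mid, hi);

    std::size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi) {
        if (a[j] < a[i]) {
            // a[i], ..., a[mid-1] are all greater than a[j]
            inv += static_cast<long long>(mid - i);
            tmp[k++] = a[j++];
        } else {
            tmp[k++] = a[i++];
        }
    }
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi) tmp[k++] = a[j++];
    for (k = lo; k < hi; ++k) a[k] = tmp[k];
    return inv;
}

// Takes the input by value so the caller's data is left unsorted.
long long countInversionsMerge(std::vector<int> a) {
    std::vector<int> tmp(a.size());
    return mergeCount(a, tmp, 0, a.size());
}
```

For the two sequences used earlier, 5 4 3 2 1 and 2 4 3 1 5, this returns 10 and 4, the same values as the pairwise count.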

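For this problem the characters in a string come from a fixed set, only A C G T, so the inversion number can be computed in O(n). The method is to insert the characters from back to front. Four parameters, left_A left_C left_G left_T, record how much the inversion number would grow if an A, C, G or T were added on the left (left_A always stays 0, because A is the smallest letter). First initialize them to 0, then insert the characters from back to front:

If the inserted character is A: any C, G or T inserted in front of it later will add an inversion, so: left_C++; left_G++; left_T++;

If the inserted character is C: add left_C to the inversion number. Any G or T inserted in front of it later will add an inversion, so: left_G++; left_T++;

If the inserted character is G: add left_G to the inversion number. Any T inserted in front of it later will add an inversion, so: left_T++;

If the inserted character is T: add left_T to the inversion number. No character inserted in front of it later will add an inversion.

This gives the following code: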

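// The function wrapper and its name are assumed; the source gives only the loop.
int countInversions(const string& str)
{
    // left_X: how much the count grows if X is inserted on the left
    int left_C = 0, left_G = 0, left_T = 0;
    int count = 0;
    for (int i = (int)str.length() - 1; i >= 0; i--)
    {
        switch (str[i])
        {
        case 'A':
            left_C++;
            left_G++;
            left_T++;
            break;
        case 'C':
            left_G++;
            left_T++;
            count += left_C;
            break;
        case 'G':
            left_T++;
            count += left_G;
            break;
        case 'T':
            count += left_T;
            break;
        }
    }
    return count;
}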
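To show how the linear-time count fits into the whole problem, here is a minimal sketch that pairs it with a stable sort. It is not taken from the original post: the names dnaInversions and sortBySortedness are assumptions, and the counters are written as "letters already seen to the right", which is an equivalent restatement of the left_C, left_G, left_T bookkeeping. It assumes the input contains only the letters A, C, G and T.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: O(n) inversion count for strings over A, C, G, T.
int dnaInversions(const std::string& s) {
    int a = 0, c = 0, g = 0;  // counts of A, C, G to the right of position i
    int inversions = 0;
    for (std::size_t i = s.size(); i-- > 0;) {  // scan from back to front
        switch (s[i]) {
            case 'A': ++a; break;                       // A is never the larger letter
            case 'C': inversions += a; ++c; break;      // C > every A to its right
            case 'G': inversions += a + c; ++g; break;  // G > every A and C
            case 'T': inversions += a + c + g; break;   // T > every A, C and G
        }
    }
    return inversions;
}

// Returns the strings ordered from most sorted to least sorted.
// std::stable_sort keeps ties in their original input order.
std::vector<std::string> sortBySortedness(const std::vector<std::string>& input) {
    std::vector<std::pair<int, std::string>> keyed;
    keyed.reserve(input.size());
    for (const std::string& s : input) keyed.emplace_back(dnaInversions(s), s);

    std::stable_sort(keyed.begin(), keyed.end(),
                     [](const std::pair<int, std::string>& x,
                        const std::pair<int, std::string>& y) {
                         return x.first < y.first;  // compare by inversion count only
                     });

    std::vector<std::string> out;
    out.reserve(keyed.size());
    for (const auto& p : keyed) out.push_back(p.second);
    return out;
}
```

On the six sample strings the inversion counts are 10, 36, 37, 11, 9 and 17 in input order, so the result matches the sample output. Comparing only the count in the lambda, rather than the whole pair, is what lets the stable sort preserve the original order for ties.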

0 0