Computing GC Content

Source: Internet  Editor: 程序博客网  Time: 2024/05/17 08:57

In Rosalind’s implementation, a string in FASTA format will be labeled by the ID “Rosalind_xxxx”, where “xxxx” denotes a four-digit code between 0000 and 9999.

Given: At most 10 DNA strings in FASTA format (of length at most 1 kbp each).

Return: The ID of the string having the highest GC-content, followed by the GC-content of that string. Rosalind allows for a default error of 0.001 in all decimal answers unless otherwise stated; please see the note on absolute error below.
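As a rough illustration of the two quantities in the statement above, the sketch below computes a GC-content percentage and checks an answer against the 0.001 absolute error. The names gc_content and within_tolerance and the test string "AGCTATAG" are made up for this example; they are not part of Rosalind or of the solution further down.

```python
def gc_content(dna):
    """Return the percentage of bases in dna that are G or C."""
    if not dna:
        raise ValueError("empty sequence")
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


def within_tolerance(answer, expected, tol=0.001):
    """True if answer differs from expected by at most tol (absolute error)."""
    return abs(answer - expected) <= tol


# "AGCTATAG" has 2 G and 1 C out of 8 bases.
print(gc_content("AGCTATAG"))                # 37.5
print(within_tolerance(60.9195, 60.919540))  # True
```

With this tolerance, printing the percentage to four or more decimal places should be enough.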

Sample Dataset

>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT

Sample Output

Rosalind_0808
60.919540
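The sample can be checked with a short self-contained sketch that parses the dataset from a string instead of a file. The helper names parse_fasta and highest_gc are assumptions made for this illustration; it is one possible way to reproduce the output above, not necessarily how Rosalind computes it.

```python
SAMPLE = """\
>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT
"""


def parse_fasta(text):
    """Map each FASTA ID to its sequence, joining wrapped lines."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            current = line[1:]          # drop the leading ">"
            records[current] = ""
        elif line and current is not None:
            records[current] += line    # continuation of the current record
    return records


def highest_gc(records):
    """Return (ID, GC percentage) of the record with the highest GC-content."""
    def gc(seq):
        return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

    best = max(records, key=lambda name: gc(records[name]))
    return best, gc(records[best])


best_id, best_gc = highest_gc(parse_fasta(SAMPLE))
print(best_id)             # Rosalind_0808
print("%.6f" % best_gc)    # 60.919540
```

Rosalind_0808 has 53 G or C bases out of 87, and 53/87 is about 0.6091954, which matches the sample output.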

Code

import os

# Author's working directory; change it to wherever rosalind_gc.txt lives.
os.chdir("/home/owht/R/Rosalind")

def calcuGC(seq):
    # Percentage of G and C bases in seq.
    noCG = seq.count("G") + seq.count("C")
    return float(noCG) / len(seq) * 100

index = []      # FASTA IDs, in file order
seqlist = []    # one joined sequence per ID

with open("rosalind_gc.txt") as file:
    for line in file:
        line = line.strip()
        if line.startswith(">"):
            # Header line: start a new record.
            index.append(line[1:])
            seqlist.append("")
        elif line and seqlist:
            # Sequence line: append it to the current record.
            seqlist[-1] += line

result = [calcuGC(seq) for seq in seqlist]
SeqGC = max(result)
SeqID = index[result.index(SeqGC)]

# Write the ID and the GC-content on separate lines.
with open("result.txt", "w") as file:
    file.write(SeqID + "\n")
    file.write(str(SeqGC) + "\n")