Bioinformatics Script Practice (9): Merging Files ②

Source: Internet   Edited by: 程序博客网   Time: 2024/06/14 01:50

This exercise is also about merging files, but this time only two files are merged.

Here is the sample information (tab-separated):

```
MegaID      Chr  Star      Ref  Alt  Depth
170602C304  1    861302    G    A    635
170602C304  1    906303    G    T    290
170602C304  1    985841    C    T    56
170602C304  1    985866    A    G    376
170602C304  1    1120344   C    T    439
170602C304  2    21255445  G    A    1009
```

Below is an excerpt of the annotation file. The task is to append the matching annotation to the end of each line of the sample file:

```
Chr  Star    End     Ref  Alt  Cosmic
1    69345   69345   C    A    ID=COSM911918;OCCURENCE=1(endometrium)
1    69523   69523   G    T    ID=COSM426644;OCCURENCE=1(breast)
1    69538   69538   G    A    ID=COSM75742;OCCURENCE=1(ovary)
1    69539   69539   T    C    ID=COSM1343690;OCCURENCE=1(large_intestine)
1    69540   69540   G    T    ID=COSM1560546;OCCURENCE=1(large_intestine)
1    69569   69569   T    C    ID=COSM1599955;OCCURENCE=2(central_nervous_system)
1    69591   69591   C    T    ID=COSM3419425;OCCURENCE=1(large_intestine)
1    565326  565326  G    A    ID=COSN228104;OCCURENCE=1(skin)
```

The final result should look like this. Rows with no match get "-":

```
MegaID      Chr  Star      Ref  Alt  Depth  Cosmic
170602C304  1    861302    G    A    635    ID=COSN213381;OCCURENCE=1(breast)
170602C304  1    906303    G    T    290    -
170602C304  1    985841    C    T    56     ID=COSM3401108;OCCURENCE=1(central_nervous_system)
170602C304  1    985866    A    G    376    ID=COSM3978005;OCCURENCE=1(lung)
170602C304  1    1120344   C    T    439    ID=COSM234136;OCCURENCE=1(skin)
170602C304  2    21255445  G    A    1009   -
```
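The matching columns sit at different positions in the two files. In the sample file, Chr, Star, Ref and Alt are the 2nd to 5th columns, after MegaID. In the annotation file they are the 1st, 2nd, 4th and 5th columns, because End sits between Star and Ref. The sketch below pulls a comparable join key out of each layout. It assumes tab-separated lines. `sample_key` and `db_key` are hypothetical helper names and are not part of the original exercise:

```python
def sample_key(line):
    """Join key from a sample line: MegaID Chr Star Ref Alt Depth."""
    fields = line.rstrip("\r\n").split("\t")
    return (fields[1], fields[2], fields[3], fields[4])  # Chr, Star, Ref, Alt


def db_key(line):
    """Join key from an annotation line: Chr Star End Ref Alt Cosmic."""
    fields = line.rstrip("\r\n").split("\t")
    return (fields[0], fields[1], fields[3], fields[4])  # skip End (index 2)


print(sample_key("170602C304\t1\t861302\tG\tA\t635\n"))
# ('1', '861302', 'G', 'A')
print(db_key("1\t69345\t69345\tC\tA\tID=COSM911918;OCCURENCE=1(endometrium)\n"))
# ('1', '69345', 'C', 'A')
```

A sample row and an annotation row describe the same variant exactly when their two keys are equal.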

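Solution:
First build a dictionary from the sample information. Each sample line is a key, and every value starts out as "-". Then, for every key that matches a row in the annotation file, replace its "-" with that row's Cosmic annotation. Pretty clever, right? ^_^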
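You can try this idea on in-memory lines before working with real files. The sketch below assumes the lines are already read into lists, without header lines or trailing newlines. It fills every sample line with "-" via `dict.fromkeys`. It then overwrites a value only where the annotation has an exact (Chr, Star, Ref, Alt) match, using a direct dictionary lookup instead of a substring test. `fill_annotations` is a hypothetical name. The annotation row in the usage part is reconstructed from the expected output, since the real row is not among the excerpt lines shown:

```python
def fill_annotations(sample_lines, db_lines):
    """Map each sample line to its Cosmic entry, or "-" if none matches.

    Assumes both lists hold tab-separated lines with no header line
    and no trailing newline.
    """
    # Step 1: every sample line starts with the placeholder "-"
    result = dict.fromkeys(sample_lines, "-")

    # Annotation lookup: (Chr, Star, Ref, Alt) -> Cosmic column
    cosmic = {}
    for line in db_lines:
        chrom, start, _end, ref, alt, info = line.split("\t")
        cosmic[(chrom, start, ref, alt)] = info

    # Step 2: overwrite "-" only where an exact match exists
    for line in result:
        _mega_id, chrom, start, ref, alt, _depth = line.split("\t")
        key = (chrom, start, ref, alt)
        if key in cosmic:
            result[line] = cosmic[key]
    return result


samples = [
    "170602C304\t1\t985841\tC\tT\t56",
    "170602C304\t2\t21255445\tG\tA\t1009",
]
# Hypothetical annotation row, reconstructed from the expected output above
db = ["1\t985841\t985841\tC\tT\tID=COSM3401108;OCCURENCE=1(central_nervous_system)"]
for line, info in fill_annotations(samples, db).items():
    print(line + "\t" + info)
```

Here the 985841 row picks up the COSM3401108 entry, and the 21255445 row keeps its "-".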

```python
# Read the sample file and drop its header line
with open('c:/Test4_sample', "r") as f:
    array = f.readlines()[1:]

# Every sample line becomes a key whose value starts out as "-"
dictt = {}
for arr in array:
    dictt[arr] = "-"
#print(dictt)

# Read the annotation file (Chr Star End Ref Alt Cosmic), skipping the header and blank lines
with open('c:/Test4_database', "r") as f:
    b = [i.strip().split("\t") for i in f.readlines()[1:] if i.strip()]
#print(b)

# Key "\tChr\tStar\tRef\tAlt\t": the tabs on both sides make it match whole
# fields only, so position 61302 cannot match inside 861302, and the
# chromosome has to agree as well
dic = {}
for row in b:
    dic["\t" + "\t".join([row[0], row[1], row[3], row[4]]) + "\t"] = row[5]
#print(dic)

for k, v in dic.items():
    for key in dictt:
        if k in key:
            dictt[key] = v   # The key trick: replace the trailing "-" with the COSMIC ID
print(dictt)

with open('t4.txt', "w") as f:
    f.write("MegaID\tChr\tStar\tRef\tAlt\tDepth\tCosmic\n")
    for k, v in dictt.items():
        f.write(k.strip() + '\t' + v.strip() + "\n")
```
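The script checks every annotation key against every sample line. Its work therefore grows with the product of the two file sizes, which gets slow when the annotation file is a full COSMIC table. It also uses whole sample lines as dictionary keys, so two identical sample lines would be merged into one. Below is a sketch of a single-pass alternative. It assumes both files are tab-separated and have exactly one header line each. It builds a dictionary keyed by (Chr, Star, Ref, Alt) from the annotation file once, then streams the sample file and looks up each row with `dict.get(key, "-")`. That default plays the same role as the "-" placeholder in the original. `annotate_lines` and `main` are hypothetical names, and the paths are copied from the script:

```python
def annotate_lines(sample_lines, db_lines):
    """Yield output lines: each sample line plus its Cosmic entry or "-"."""
    # One pass over the annotation file builds the lookup table
    db_iter = iter(db_lines)
    next(db_iter)  # skip header: Chr Star End Ref Alt Cosmic
    cosmic = {}
    for line in db_iter:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 6:  # ignore blank or truncated lines
            cosmic[(fields[0], fields[1], fields[3], fields[4])] = fields[5]

    sample_iter = iter(sample_lines)
    yield next(sample_iter).rstrip("\r\n") + "\tCosmic\n"  # extend the header
    for line in sample_iter:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 5:  # ignore blank or truncated lines
            continue
        key = (fields[1], fields[2], fields[3], fields[4])  # Chr, Star, Ref, Alt
        yield "\t".join(fields) + "\t" + cosmic.get(key, "-") + "\n"


def main():
    # Paths copied from the script; adjust them for your machine
    with open("c:/Test4_sample") as src, open("c:/Test4_database") as db, \
            open("t4.txt", "w") as dst:
        dst.writelines(annotate_lines(src, db))
```

Calling `main()` writes `t4.txt` with the same header and column order as the original script. Because the output follows the sample file line by line, the original row order is kept, and so are any duplicate rows.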