October 11: Python Bioinformatics Data Management

Source: Internet | Editor: 程序博客网 | Time: 2024/06/05 08:02

Chapter 6 notes

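1. Removing elements from a list:

pop() returns the value of the element it removes. By default it removes the last element. To remove the element at a specific position, pass its index in the parentheses, e.g. data.pop(5) removes the element at index 5.

The del statement also works: del data[i] deletes the element at index i.

To remove a specific value, use the remove() method: data.remove(5) deletes the value 5 from the list. It removes only the first occurrence and raises a ValueError if the value is not in the list.

All of the above modify the original list in place.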
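A small sketch of the three ways to remove list elements, run on a made-up list so the in-place changes are easy to follow:

```python
# Arbitrary sample values
data = [10, 20, 30, 40, 50, 60, 70]

last = data.pop()     # removes and returns the last element (70)
sixth = data.pop(5)   # removes and returns the element at index 5 (60)
del data[0]           # deletes the element at index 0 (10); returns nothing
data.remove(40)       # deletes the first element equal to 40

print(last, sixth, data)  # 70 60 [20, 30, 50]
```

pop() on an empty list raises IndexError, and remove() raises ValueError when the value is absent, so check first if either case is possible.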

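2. Removing elements from a dictionary:

pop() also works, but the argument to d.pop() must be a key. It removes that key and returns its value.

del also works: del d['a'], where 'a' is the key.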
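A minimal example of both approaches on a throwaway dictionary. The second pop() call uses the optional default argument, which the notes do not mention: it returns the default instead of raising KeyError when the key is missing.

```python
d = {'a': 1, 'b': 2, 'c': 3}

value = d.pop('b')          # removes key 'b' and returns its value (2)
missing = d.pop('z', None)  # key absent: returns the default instead of raising KeyError
del d['a']                  # removes key 'a'; raises KeyError if the key is absent

print(value, missing, d)    # 2 None {'c': 3}
```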

3. Removing specific lines from a text file

```python
in_file = open('text.txt')
out_file = open('new.txt', 'w')
index = 0
indices_to_remove = [1, 2, 5, 6]  # line numbers to drop, counted from 1
for line in in_file:
    index = index + 1
    if index not in indices_to_remove:
        out_file.write(line)
in_file.close()
out_file.close()
```
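The same idea can be sketched with `with` (files are closed automatically), `enumerate(..., start=1)` instead of a manual counter, and a set for faster membership checks. The function names `remove_lines` and `remove_lines_from_file` are hypothetical; the core logic works on any iterable of lines, so it can be tried without touching the disk.

```python
def remove_lines(lines, indices_to_remove):
    """Return the lines whose 1-based position is not in indices_to_remove."""
    to_remove = set(indices_to_remove)  # set lookup instead of list lookup
    return [line for number, line in enumerate(lines, start=1)
            if number not in to_remove]


def remove_lines_from_file(in_path, out_path, indices_to_remove):
    # 'with' closes both files even if an error occurs
    with open(in_path) as in_file, open(out_path, 'w') as out_file:
        out_file.writelines(remove_lines(in_file, indices_to_remove))


# Example with in-memory lines
sample = ['l1\n', 'l2\n', 'l3\n', 'l4\n', 'l5\n', 'l6\n', 'l7\n']
print(remove_lines(sample, [1, 2, 5, 6]))  # ['l3\n', 'l4\n', 'l7\n']
```

With real files this would be called as `remove_lines_from_file('text.txt', 'new.txt', [1, 2, 5, 6])`.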

4. Removing duplicates while preserving order

```python
input_file = open('UniprotID.txt')
output_file = open('UniprotID-unique.txt', 'w')
unique = []  # lines already written, in order of first appearance
for line in input_file:
    if line not in unique:
        output_file.write(line)
        unique.append(line)
input_file.close()
output_file.close()
```
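Checking `line not in unique` scans the whole list each time. A common variant keeps a set of already-seen lines for fast lookups and a list for the order. This is a sketch; `unique_in_order` is a hypothetical name and the IDs are made up.

```python
def unique_in_order(lines):
    """Return lines with duplicates removed, keeping the first occurrence."""
    seen = set()   # fast membership test
    result = []    # preserves the original order
    for line in lines:
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result


ids = ['P12345\n', 'Q67890\n', 'P12345\n', 'A11111\n', 'Q67890\n']
print(unique_in_order(ids))  # ['P12345\n', 'Q67890\n', 'A11111\n']
```

Since Python 3.7 dictionaries keep insertion order, so `list(dict.fromkeys(ids))` gives the same result in one line.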

5. Sets: an unordered collection of unique objects. Sets do not support indexing or slicing.

Creating a set:

```python
>>> s = set([1, 2, 3])
>>> s
{1, 2, 3}
>>> s.add(4)
>>> s
{1, 2, 3, 4}
>>> s.add(4)
>>> s
{1, 2, 3, 4}
```
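A few more properties of sets that follow from the definition above: duplicates vanish when a list is converted, an empty set must be written `set()` because `{}` is an empty dictionary, and indexing raises a TypeError.

```python
# Duplicates disappear when a list is converted to a set
s = set([1, 2, 2, 3, 3, 3])
print(s)  # {1, 2, 3}

# {} creates an empty dict, not an empty set; use set() instead
empty = set()
print(type({}), type(empty))  # <class 'dict'> <class 'set'>

# Sets are unordered, so they cannot be indexed
try:
    s[0]
except TypeError as err:
    print('TypeError:', err)
```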

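Set methods:

update() adds several elements to a set at once, modifying it in place, e.g. s1.update(['a','b']).

s1.union(s2) returns the union of s1 and s2;

s1.intersection(s2) returns the intersection of s1 and s2;

s1.symmetric_difference(s2) returns the elements that are in s1 or in s2 but not in both, called the symmetric difference;

s1.difference(s2) returns the elements that are in s1 but not in s2.

Unlike update(), these four methods return a new set and leave s1 and s2 unchanged.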
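The methods listed above can be tried on two small sets; the values are arbitrary. `sorted()` is used only to get a stable display order, since set iteration order is not guaranteed. Each method also has an operator form (`|`, `&`, `^`, `-`).

```python
s1 = {'a', 'b', 'c'}
s2 = {'b', 'c', 'd'}

union = s1.union(s2)                           # in s1 or s2
common = s1.intersection(s2)                   # in both
either_not_both = s1.symmetric_difference(s2)  # in exactly one of them
only_s1 = s1.difference(s2)                    # in s1 but not in s2

print(sorted(union), sorted(common), sorted(either_not_both), sorted(only_s1))
# ['a', 'b', 'c', 'd'] ['b', 'c'] ['a', 'd'] ['a']

# Equivalent operators
print(union == s1 | s2, common == s1 & s2,
      either_not_both == s1 ^ s2, only_s1 == s1 - s2)  # True True True True

# update() adds every element of an iterable to s1 in place
s1.update(['x', 'y'])
print(sorted(s1))  # ['a', 'b', 'c', 'x', 'y']
```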

Self-test exercises:

6.1

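```python
fasta_file = open('SwissProt.fasta', 'r')
out_file = open('SwissProt1.fasta', 'w')
seq = ''
header = ''
for line in fasta_file:
    if line[0] == '>' and seq == '':
        # header line while no sequence has been collected yet
        header = line
    elif line[0] != '>':
        # sequence line: append it without the trailing newline
        seq = seq + line.strip()
    elif line[0] == '>' and seq != '':
        # a new header means the previous record is complete
        print(seq)
        print(header)
        if seq[0] == 'M':
            # SwissProt header: >sp|accession|name ...; field[1] is the accession
            field = header.split('|')
            out_file.write(field[1] + ' ')
        seq = ''
        header = line
# the last record is not followed by a header, so check it after the loop
if seq != '' and seq[0] == 'M':
    field = header.split('|')
    out_file.write(field[1] + ' ')
fasta_file.close()
out_file.close()
```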
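An alternative structure for exercise 6.1 is to split the file into (header, sequence) records first and filter afterwards, so the last record is handled exactly like the others. This is a sketch: `read_fasta` and `accessions_starting_with_m` are hypothetical names, the sample records are invented, and it assumes SwissProt-style headers of the form `>sp|ACCESSION|NAME ...`.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq_parts = None, []
    for line in lines:
        line = line.strip()
        if line.startswith('>'):
            if header is not None:
                yield header, ''.join(seq_parts)
            header, seq_parts = line, []
        elif line:
            seq_parts.append(line)
    if header is not None:  # emit the last record too
        yield header, ''.join(seq_parts)


def accessions_starting_with_m(lines):
    # field [1] of '>sp|AC|NAME ...' is the accession
    return [header.split('|')[1]
            for header, seq in read_fasta(lines)
            if seq.startswith('M')]


sample = [
    '>sp|AC0001|PROT1_TEST test protein 1\n',
    'MKTAYIAK\n',
    'QRQISFVK\n',
    '>sp|AC0002|PROT2_TEST test protein 2\n',
    'GSHMLEDP\n',
    '>sp|AC0003|PROT3_TEST test protein 3\n',
    'MAAAA\n',
]
print(accessions_starting_with_m(sample))  # ['AC0001', 'AC0003']
```

On the real file: `with open('SwissProt.fasta') as f: print(' '.join(accessions_starting_with_m(f)))`.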

6.2

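```python
input_file = open('SwissProt.fasta', 'r')
output_file = open('习题6.1.txt', 'w')  # file name means "exercise 6.1.txt"
count_line = 0
for line in input_file:
    count_line += 1
    # keep only the even-numbered lines (2nd, 4th, 6th, ...)
    if count_line % 2 == 0:
        output_file.write(line)
input_file.close()
output_file.close()
```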
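The even-line filter can also be written with `itertools.islice`, which takes every second item starting at index 1 (the 2nd line). This is a sketch with a hypothetical function name and made-up lines; it only picks out sequences if each record occupies exactly two lines.

```python
from itertools import islice


def even_numbered_lines(lines):
    """Return the 2nd, 4th, 6th, ... lines (1-based numbering)."""
    # islice(iterable, start, stop, step): start at index 1, no stop, step 2
    return list(islice(lines, 1, None, 2))


sample = ['>header1\n', 'SEQ1\n', '>header2\n', 'SEQ2\n', '>header3\n']
print(even_numbered_lines(sample))  # ['SEQ1\n', 'SEQ2\n']
```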

6.3

```python
input_file1 = open('1.txt', 'r')
input_file2 = open('2.txt', 'r')
f1 = input_file1.readlines()
f2 = input_file2.readlines()
input_file1.close()
input_file2.close()
# strip newlines so that 'abc\n' and 'abc' compare equal
data1 = []
data2 = []
for line in f1:
    data1.append(line.strip())
for line in f2:
    data2.append(line.strip())
only_1data = []
only_2data = []
common_data = []
common_ = 0
only_1 = 0
only_2 = 0
for fig in data1:
    if fig in data2:
        common_ += 1
        common_data.append(fig)
    else:
        only_1 += 1
        only_1data.append(fig)
for fig in data2:
    # lines of file 2 that do not occur in file 1
    if fig not in data1:
        only_2 += 1
        only_2data.append(fig)
print('Lines in both files:', common_, common_data)
print('Lines only in file 1:', only_1, only_1data)
print('Lines only in file 2:', only_2, only_2data)
```
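Using the set operations from section 5, the comparison in exercise 6.3 shrinks to a few lines. This is a sketch with a hypothetical function name and invented input. One behavioral difference: sets count each distinct line once, whereas the list version above counts a repeated line every time it appears.

```python
def compare_lines(lines1, lines2):
    """Return (common, only_in_1, only_in_2) as sorted lists of stripped lines."""
    set1 = {line.strip() for line in lines1}
    set2 = {line.strip() for line in lines2}
    return sorted(set1 & set2), sorted(set1 - set2), sorted(set2 - set1)


common, only_1, only_2 = compare_lines(['A\n', 'B\n', 'C\n'], ['B\n', 'C\n', 'D\n'])
print('Lines in both files:', len(common), common)   # Lines in both files: 2 ['B', 'C']
print('Lines only in file 1:', len(only_1), only_1)  # Lines only in file 1: 1 ['A']
print('Lines only in file 2:', len(only_2), only_2)  # Lines only in file 2: 1 ['D']
```

With the real files: `with open('1.txt') as a, open('2.txt') as b: common, only_1, only_2 = compare_lines(a, b)`.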

Reference: Using dict and set (使用dict和set) - Liao Xuefeng's official website https://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000/00143167793538255adf33371774853a0ef943280573f4d000
