生信脚本练习(12)求fasta文件各序列长度并统计作图

来源:互联网 发布:如何安装办公软件 编辑:程序博客网 时间:2024/06/05 11:16

题目要求是要从一个fasta文件中统计出每条序列的长度分布,并作图。

代码如下:

import osimport getpassimport matplotlib.pyplot as pltusr = getpass.getuser()os.chdir('c:/Users/' + usr + '/Desktop')seq_len = {}# 把fasta文件全部读取做成字典,键是带‘>’的那一行,值是序列def readfasta(filename):    fa = open(filename, 'r')    res = {}    ID = ''    for line in fa:        if line.startswith('>'):                       ID = line.strip('\n')            res[ID] = ''        else:                       res[ID] += line.strip('\n')    return resseq = readfasta('Test1.fa')# 做成另外一个字典字典,键是带‘>’的那一行,值是序列长度:for k,v in seq.items():    seq_len[k] = len(v)seq_len_sort = sorted(seq_len.items(), key=lambda x: x[1])seq_len_sort = dict(seq_len_sort)location = {}lenth_a = 0lenth_b = 500bar = 500# 通过循环的方式找到每个窗口区间内的长度分布,这里选的窗口是500# 这里的4500是排序之后根据最大值看出来的while lenth_b < 4500:    count = 0    ran = range(lenth_a,lenth_b)    for k,v in seq_len_sort.items():        if v in ran:            count += 1    location[(str(lenth_a)+ '-'+ str(lenth_b))] = count    lenth_a += bar    lenth_b += bar#print(location)f = open('t1.txt','w')f.write('lenth' + '\t' + 'number'+ '\n')lenth = []number = []for k,v in location.items():    f.write(str(k)+ '\t' + str(v) + '\n')    lenth.append(k)    number.append(v)f.close()plt.bar(range(len(lenth)),number)# 最后matplotlib画图#xlabels = [x[index] for index in lenth]#plt.xticks(number, lenth, rotation='vertical')plt.show()

然而我还不会更改x轴坐标值。因为我用的字符串的形式。。。
这里写图片描述

画图代码参照matplotlib官网的barh_demo改了一下:

import matplotlib.pyplot as pltplt.rcdefaults()fig, ax = plt.subplots()y_pos = np.arange(len(lenth))error = np.random.rand(len(lenth))ax.barh(y_pos, number, xerr=error, align='center',        color='lightblue', ecolor='black')ax.set_yticks(y_pos)ax.set_yticklabels(lenth)ax.invert_yaxis()  # labels read top-to-bottomax.set_xlabel('nummber')ax.set_title('lenth location')plt.show()

errorbar 怎么也去不掉???
这里写图片描述

下面是我用R画的

这里写图片描述

library(ggplot2)x <- c('0-500', '500-1000', '1000-1500', '1500-2000', '2000-2500', '2500-3000', '3000-3500', '3500-4000')y <- c( 4, 16, 10, 11, 3, 5, 0, 1)dt = data.frame(length= x, number = y)dt$length = factor(dt$length, levels=c('0-500', '500-1000', '1000-1500', '1500-2000', '2000-2500', '2500-3000', '3000-3500', '3500-4000'))ggplot(dt, aes(x = length, y = number, fill = length)) + geom_bar(stat = "identity") +theme_set(theme_bw())# 谁能告诉我为什么下面这样的写法就画不出来呢??library(ggplot2)setwd("c:\\Desktop") dt <- read.table("t1.txt",header = T) print(dt)dt$length =factor(dt$length, levels=c('0-500', '500-1000', '1000-1500', '1500-2000', '2000-2500', '2500-3000', '3000-3500', '3500-4000'))ggplot(dt, aes(length = length, number = number, fill = length)) + geom_bar(stat = "identity") +theme_set(theme_bw())# 报错:Error in `$<-.data.frame`(`*tmp*`, "length", value = structure(integer(0), .Label = c("0-500", : replacement has 0 rows, data has 8Traceback:1. `$<-`(`*tmp*`, "length", value = structure(integer(0), .Label = c("0-500",  . "500-1000", "1000-1500", "1500-2000", "2000-2500", "2500-3000",  . "3000-3500", "3500-4000"), class = "factor"))2. `$<-.data.frame`(`*tmp*`, "length", value = structure(integer(0), .Label = c("0-500",  . "500-1000", "1000-1500", "1500-2000", "2000-2500", "2500-3000",  . "3000-3500", "3500-4000"), class = "factor"))3. stop(sprintf(ngettext(N, "replacement has %d row, data has %d",  .     "replacement has %d rows, data has %d"), N, nrows), domain = NA)
阅读全文
1 0
原创粉丝点击