GFF3 TO GTF
The GFF3 file was produced by GMAP.
The input GFF3 file looks like this:
chr1A	IWGSCv1.0_gmap	gene	11740	12074	.	+	.	ID=TRIAE_CS42_1AS_TGACv1_023354_AA0082670.1.path1;Name=TRIAE_CS42_1AS_TGACv1_023354_AA0082670.1
chr1A	IWGSCv1.0_gmap	mRNA	11740	12074	.	+	.	ID=TRIAE_CS42_1AS_TGACv1_023354_AA0082670.1.mrna1;Name=TRIAE_CS42_1AS_TGACv1_023354_AA0082670.1;Parent=TRIAE_CS42_1AS_TGACv1_023354_AA0082670.1.path1;coverage=100.0;identity=100.0;matches=335;mismatches=0;indels=0;unknowns=0
chr1A	IWGSCv1.0_gmap	exon	11740	12074	100	+	.	ID=TRIAE_CS42_1AS_TGACv1_023354_AA0082670.1.mrna1.exon1;Name=TRIAE_CS42_1AS_TGACv1_023354_AA0082670.1;Parent=TRIAE_CS42_1AS_TGACv1_023354_AA0082670.1.mrna1;Target=TRIAE_CS42_1AS_TGACv1_023354_AA0082670.1 1 335 +
chr1A	IWGSCv1.0_gmap	gene	22427	24851	.	-	.	ID=TRIAE_CS42_1AS_TGACv1_024449_AA0082770.1.path1;Name=TRIAE_CS42_1AS_TGACv1_024449_AA0082770.1
chr1A	IWGSCv1.0_gmap	mRNA	22427	24851	.	-	.	ID=TRIAE_CS42_1AS_TGACv1_024449_AA0082770.1.mrna1;Name=TRIAE_CS42_1AS_TGACv1_024449_AA0082770.1;Parent=TRIAE_CS42_1AS_TGACv1_024449_AA0082770.1.path1;coverage=100.0;identity=100.0;matches=2425;mismatches=0;indels=0;unknowns=0
chr1A	IWGSCv1.0_gmap	exon	22427	24851	100	-	.	ID=TRIAE_CS42_1AS_TGACv1_024449_AA0082770.1.mrna1.exon1;Name=TRIAE_CS42_1AS_TGACv1_024449_AA0082770.1;Parent=TRIAE_CS42_1AS_TGACv1_024449_AA0082770.1.mrna1;Target=TRIAE_CS42_1AS_TGACv1_024449_AA0082770.1 1 2425 +
chr1A	IWGSCv1.0_gmap	gene	28794	39054	.	+	.	ID=TRIAE_CS42_1AS_TGACv1_021338_AA0081570.1.path1;Name=TRIAE_CS42_1AS_TGACv1_021338_AA0081570.1
chr1A	IWGSCv1.0_gmap	mRNA	28794	39054	.	+	.	ID=TRIAE_CS42_1AS_TGACv1_021338_AA0081570.1.mrna1;Name=TRIAE_CS42_1AS_TGACv1_021338_AA0081570.1;Parent=TRIAE_CS42_1AS_TGACv1_021338_AA0081570.1.path1;coverage=100.0;identity=100.0;matches=1624;mismatches=0;indels=0;unknowns=0
chr1A	IWGSCv1.0_gmap	exon	28794	28929	100	+	.	ID=TRIAE_CS42_1AS_TGACv1_021338_AA0081570.1.mrna1.exon1;Name=TRIAE_CS42_1AS_TGACv1_021338_AA0081570.1;Parent=TRIAE_CS42_1AS_TGACv1_021338_AA0081570.1.mrna1;Target=TRIAE_CS42_1AS_TGACv1_021338_AA0081570.1 1 136 +
chr1A	IWGSCv1.0_gmap	exon	37567	39054	100	+	.	ID=TRIAE_CS42_1AS_TGACv1_021338_AA0081570.1.mrna1.exon2;Name=TRIAE_CS42_1AS_TGACv1_021338_AA0081570.1;Parent=TRIAE_CS42_1AS_TGACv1_021338_AA0081570.1.mrna1;Target=TRIAE_CS42_1AS_TGACv1_021338_AA0081570.1 137 1624 +
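In these records, column 9 holds `key=value` pairs separated by `;`, and the `ID`/`Parent` attributes chain each exon to its mRNA (`.mrna1`) and each mRNA to its gene-level path (`.path1`). The conversion script pulls values out of this column by position and fixed-length slicing; a minimal sketch of parsing it into a dict instead might look like this. `parse_gff3_attributes` is a hypothetical helper name, and only the Python standard library is used.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9):
    """Parse a GFF3 column 9 string ("key=value;key=value") into a dict.

    GFF3 percent-encodes reserved characters such as ';' (%3B) and '=' (%3D)
    inside values, so each value is decoded after splitting. Multi-valued
    attributes (comma-separated, e.g. several Parents) are kept as one string.
    """
    attrs = {}
    for field in column9.strip().strip(';').split(';'):
        if not field:
            continue
        key, _, value = field.partition('=')
        attrs[key] = unquote(value)
    return attrs


# The first exon record of the sample input (columns are tab-separated).
record = ('chr1A\tIWGSCv1.0_gmap\texon\t11740\t12074\t100\t+\t.\t'
          'ID=TRIAE_CS42_1AS_TGACv1_023354_AA0082670.1.mrna1.exon1;'
          'Name=TRIAE_CS42_1AS_TGACv1_023354_AA0082670.1;'
          'Parent=TRIAE_CS42_1AS_TGACv1_023354_AA0082670.1.mrna1;'
          'Target=TRIAE_CS42_1AS_TGACv1_023354_AA0082670.1 1 335 +')
attrs = parse_gff3_attributes(record.split('\t')[8])
print(attrs['Parent'])  # TRIAE_CS42_1AS_TGACv1_023354_AA0082670.1.mrna1
print(attrs['Target'])  # TRIAE_CS42_1AS_TGACv1_023354_AA0082670.1 1 335 +
```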
Converted to GTF, the output looks like this (the excerpt runs a few records past the input shown above):
chr1A	IWGSCv1.0_gmap	transcript	11740	12074	.	+	.	transcript_id "TRIAE_CS42_1AS_TGACv1_023354_AA0082670.1"; gene_id "TRIAE_CS42_1AS_TGACv1_023354_AA0082670";
chr1A	IWGSCv1.0_gmap	exon	11740	12074	100	+	.	transcript_id "TRIAE_CS42_1AS_TGACv1_023354_AA0082670.1"; gene_id "TRIAE_CS42_1AS_TGACv1_023354_AA0082670"; exon_number 1;
chr1A	IWGSCv1.0_gmap	transcript	22427	24851	.	-	.	transcript_id "TRIAE_CS42_1AS_TGACv1_024449_AA0082770.1"; gene_id "TRIAE_CS42_1AS_TGACv1_024449_AA0082770";
chr1A	IWGSCv1.0_gmap	exon	22427	24851	100	-	.	transcript_id "TRIAE_CS42_1AS_TGACv1_024449_AA0082770.1"; gene_id "TRIAE_CS42_1AS_TGACv1_024449_AA0082770"; exon_number 1;
chr1A	IWGSCv1.0_gmap	transcript	28794	39054	.	+	.	transcript_id "TRIAE_CS42_1AS_TGACv1_021338_AA0081570.1"; gene_id "TRIAE_CS42_1AS_TGACv1_021338_AA0081570";
chr1A	IWGSCv1.0_gmap	exon	28794	28929	100	+	.	transcript_id "TRIAE_CS42_1AS_TGACv1_021338_AA0081570.1"; gene_id "TRIAE_CS42_1AS_TGACv1_021338_AA0081570"; exon_number 1;
chr1A	IWGSCv1.0_gmap	exon	37567	39054	100	+	.	transcript_id "TRIAE_CS42_1AS_TGACv1_021338_AA0081570.1"; gene_id "TRIAE_CS42_1AS_TGACv1_021338_AA0081570"; exon_number 2;
chr1A	IWGSCv1.0_gmap	transcript	59624	60578	.	-	.	transcript_id "TRIAE_CS42_1AS_TGACv1_021658_AA0082030.1"; gene_id "TRIAE_CS42_1AS_TGACv1_021658_AA0082030";
chr1A	IWGSCv1.0_gmap	exon	59624	60578	99	-	.	transcript_id "TRIAE_CS42_1AS_TGACv1_021658_AA0082030.1"; gene_id "TRIAE_CS42_1AS_TGACv1_021658_AA0082030"; exon_number 1;
chr1A	IWGSCv1.0_gmap	transcript	86763	89148	.	-	.	transcript_id "TRIAE_CS42_1AS_TGACv1_021895_AA0082240.1"; gene_id "TRIAE_CS42_1AS_TGACv1_021895_AA0082240";
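Each GTF line keeps the first eight columns and replaces column 9 with `key value;` pairs. A small formatter could be sketched as follows, assuming the convention visible in this output: string values in double quotes, each attribute closed by `;`, one space between attributes, and integers such as `exon_number` left unquoted. `format_gtf_attributes` and `gtf_line` are hypothetical names.

```python
def format_gtf_attributes(pairs):
    """Build a GTF column 9 string from (key, value) pairs.

    Follows the layout of the sample output: each attribute ends with ';',
    attributes are separated by one space, string values are double-quoted
    and integers (e.g. exon_number) are written bare.
    """
    parts = []
    for key, value in pairs:
        if isinstance(value, int):
            parts.append('%s %d;' % (key, value))
        else:
            parts.append('%s "%s";' % (key, value))
    return ' '.join(parts)


def gtf_line(fields8, pairs):
    """Join the first eight GTF columns and the attribute column with tabs."""
    return '\t'.join(list(fields8) + [format_gtf_attributes(pairs)])


line = gtf_line(
    ['chr1A', 'IWGSCv1.0_gmap', 'exon', '11740', '12074', '100', '+', '.'],
    [('transcript_id', 'TRIAE_CS42_1AS_TGACv1_023354_AA0082670.1'),
     ('gene_id', 'TRIAE_CS42_1AS_TGACv1_023354_AA0082670'),
     ('exon_number', 1)],
)
print(line)
```

Keeping the attribute order as a list of pairs, rather than a dict, makes it explicit that `transcript_id` and `gene_id` come first, as in the sample.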
The conversion script:
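#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Convert GMAP GFF3 (gene/mRNA/exon records) to GTF, printing to stdout.
__author__ = 'shengwei ma'
__author_email__ = 'shengweima@icloud.com'

with open('TGACv1.cdna.gff3', 'r') as f:
    for line in f:
        lin = line.rstrip('\n').split('\t')
        # skip directives/comments (lines starting with '#') and non-9-column lines
        if line.startswith('#') or len(lin) < 9:
            continue
        attrs = lin[8].split(';')
        # attrs[1] is "Name=<transcript>"; gene_id = Name up to the first '.'
        gene_id = attrs[1].split('.')[0][5:]
        if lin[2] == 'gene':
            # "ID=<transcript>.path1" -> "<transcript>"
            transcript_id = attrs[0][3:].rsplit('.', 1)[0]
            print('\t'.join(lin[:2] + ['transcript'] + lin[3:8]) +
                  '\ttranscript_id "%s"; gene_id "%s";' % (transcript_id, gene_id))
        elif lin[2] == 'exon':
            # "ID=<transcript>.mrna1.exonN" -> "<transcript>" and "N";
            # rsplit copes with multi-digit exon numbers (exon10 and up)
            transcript_id, _, exon_part = attrs[0][3:].rsplit('.', 2)
            exon_number = exon_part[len('exon'):]
            print('\t'.join(lin[:8]) +
                  '\ttranscript_id "%s"; gene_id "%s"; exon_number %s;'
                  % (transcript_id, gene_id, exon_number))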
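The script derives IDs by trimming suffixes such as `.path1` and `.mrna1.exonN` off the `ID` attribute, which ties it closely to GMAP's naming scheme. An alternative sketch, not the author's method, follows the `Parent` links instead: it emits the `transcript` line from each `mRNA` record and looks up that transcript for every exon. It assumes, as in GMAP output, that an mRNA record precedes its exons, and it keeps the script's gene_id rule (the `Name` value up to the first `.`). `parse_attrs` and `gff3_to_gtf` are hypothetical names.

```python
import re
from urllib.parse import unquote


def parse_attrs(column9):
    """GFF3 column 9 -> dict, with percent-encoded values decoded."""
    attrs = {}
    for field in column9.strip().strip(';').split(';'):
        if field:
            key, _, value = field.partition('=')
            attrs[key] = unquote(value)
    return attrs


def gff3_to_gtf(lines):
    """Convert GMAP-style GFF3 lines (gene/mRNA/exon) into GTF lines.

    Exons are tied to their transcript through the Parent attribute.
    Assumes each mRNA record appears before its exons.
    """
    transcripts = {}   # mRNA ID -> (transcript_id, gene_id)
    exon_counts = {}   # mRNA ID -> exons seen so far (fallback numbering)
    out = []
    for line in lines:
        cols = line.rstrip('\n').split('\t')
        if line.startswith('#') or len(cols) != 9:
            continue  # skip directives, comments and malformed lines
        attrs = parse_attrs(cols[8])
        if cols[2] == 'mRNA':
            tid = attrs['Name']
            gid = tid.split('.')[0]  # same rule as the script: Name up to first '.'
            transcripts[attrs['ID']] = (tid, gid)
            out.append('\t'.join(cols[:2] + ['transcript'] + cols[3:8]) +
                       '\ttranscript_id "%s"; gene_id "%s";' % (tid, gid))
        elif cols[2] == 'exon':
            parent = attrs['Parent']
            tid, gid = transcripts[parent]
            exon_counts[parent] = exon_counts.get(parent, 0) + 1
            # Prefer GMAP's own numbering from the '.exonN' ID suffix.
            match = re.search(r'\.exon(\d+)$', attrs.get('ID', ''))
            number = int(match.group(1)) if match else exon_counts[parent]
            out.append('\t'.join(cols[:8]) +
                       '\ttranscript_id "%s"; gene_id "%s"; exon_number %d;'
                       % (tid, gid, number))
    return out


sample = [
    'chr1A\tIWGSCv1.0_gmap\tgene\t28794\t39054\t.\t+\t.\t'
    'ID=TRIAE_CS42_1AS_TGACv1_021338_AA0081570.1.path1;'
    'Name=TRIAE_CS42_1AS_TGACv1_021338_AA0081570.1',
    'chr1A\tIWGSCv1.0_gmap\tmRNA\t28794\t39054\t.\t+\t.\t'
    'ID=TRIAE_CS42_1AS_TGACv1_021338_AA0081570.1.mrna1;'
    'Name=TRIAE_CS42_1AS_TGACv1_021338_AA0081570.1;'
    'Parent=TRIAE_CS42_1AS_TGACv1_021338_AA0081570.1.path1',
    'chr1A\tIWGSCv1.0_gmap\texon\t28794\t28929\t100\t+\t.\t'
    'ID=TRIAE_CS42_1AS_TGACv1_021338_AA0081570.1.mrna1.exon1;'
    'Parent=TRIAE_CS42_1AS_TGACv1_021338_AA0081570.1.mrna1',
    'chr1A\tIWGSCv1.0_gmap\texon\t37567\t39054\t100\t+\t.\t'
    'ID=TRIAE_CS42_1AS_TGACv1_021338_AA0081570.1.mrna1.exon2;'
    'Parent=TRIAE_CS42_1AS_TGACv1_021338_AA0081570.1.mrna1',
]
for gtf in gff3_to_gtf(sample):
    print(gtf)
```

To convert a whole file, an open file object can be passed directly because it yields lines, e.g. `with open('TGACv1.cdna.gff3') as f: print('\n'.join(gff3_to_gtf(f)))`. Taking the transcript line from the mRNA record instead of the gene record gives the same coordinates here, since in the sample input each gene and its mRNA span the same interval.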