The FASTQ Format Explained

Source: Internet  Editor: 程序博客网  Date: 2024/05/16 05:17

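FASTQ is a text format that stores biological sequences (usually nucleotide sequences) together with a quality score for each base.

It is currently the de facto standard format for high-throughput sequencing data.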

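In FASTQ, every four lines describe one sequencing read:

Line 1 begins with '@', followed by the sequence ID. This mirrors the header line of the FASTA format, except that FASTA uses '>' as the marker.

Line 2 is the sequence itself.

Line 3 begins with '+' and may optionally be followed by a description of the sequence.

Line 4 holds the quality scores for the sequence on line 2. It has exactly as many characters as line 2, and each character corresponds to the base at the same position.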
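The four-line layout described above can be read with a few lines of Python. This is only a minimal sketch: the names FastqRecord, read_fastq and phred_scores are made up for illustration, the reader assumes every record occupies exactly four lines (no wrapped sequences), and the score decoding assumes the common Phred+33 ASCII encoding, which the text above does not specify.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str   # line 1 without the leading '@'
    sequence: str     # line 2
    description: str  # line 3 without the leading '+', often empty
    quality: str      # line 4, same length as the sequence


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a stream that uses exactly four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError(f"line 1 must start with '@': {header!r}")
        if not plus.startswith("+"):
            raise ValueError(f"line 3 must start with '+': {plus!r}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:], sequence, plus[1:], quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a quality string, ASSUMING Phred+offset ASCII encoding."""
    return [ord(ch) - offset for ch in quality]


# Usage with an in-memory example (the read itself is invented).
example = "@read1\nGATTACA\n+\nIIIIII#\n"
for record in read_fastq(io.StringIO(example)):
    print(record.identifier, record.sequence, phred_scores(record.quality))
```

Older Illumina pipelines used an offset of 64 instead of 33, so the offset is left as a parameter rather than hard-coded.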



Note: the sequence ID uniquely identifies the read. It contains the following information:

Example 1: @HWUSI-EAS100R:6:73:941:1973#0/1

HWUSI-EAS100R    the unique instrument name
6                flowcell lane
73               tile number within the flowcell lane
941              'x'-coordinate of the cluster within the tile
1973             'y'-coordinate of the cluster within the tile
#0               index number for a multiplexed sample (0 for no indexing)
/1               the member of a pair, /1 or /2 (paired-end or mate-pair reads only)
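The field breakdown of Example 1 can be turned into a small parser. The sketch below is illustrative only: OldStyleId and parse_old_style_id are hypothetical names, and the code assumes the ID follows exactly the layout shown in Example 1 (five colon-separated fields, then '#index', then an optional '/member').

```python
from typing import NamedTuple, Optional


class OldStyleId(NamedTuple):
    instrument: str
    lane: int
    tile: int
    x: int
    y: int
    index: str                  # text after '#', "0" when there is no indexing
    pair_member: Optional[int]  # 1 or 2, None when there is no '/N' suffix


def parse_old_style_id(header: str) -> OldStyleId:
    """Split an ID laid out like Example 1 into its named fields."""
    text = header[1:] if header.startswith("@") else header
    coords, _, rest = text.partition("#")
    index, _, member = rest.partition("/")
    instrument, lane, tile, x, y = coords.split(":")
    return OldStyleId(
        instrument=instrument,
        lane=int(lane),
        tile=int(tile),
        x=int(x),
        y=int(y),
        index=index,
        pair_member=int(member) if member else None,
    )


print(parse_old_style_id("@HWUSI-EAS100R:6:73:941:1973#0/1"))
```

The unpacking of coords.split(":") raises ValueError when the number of fields is not five, which acts as a cheap format check.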

Example 2: @EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG

EAS139     the unique instrument name
136        the run id
FC706VJ    the flowcell id
2          flowcell lane
2104       tile number within the flowcell lane
15343      'x'-coordinate of the cluster within the tile
197393     'y'-coordinate of the cluster within the tile
1          the member of a pair, 1 or 2 (paired-end or mate-pair reads only)
Y          Y if the read fails filter (read is bad), N otherwise
18         0 when none of the control bits are on, otherwise it is an even number
ATCACG     index sequence
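The layout of Example 2 differs from Example 1 in that a space separates the cluster coordinates from the read attributes. A minimal sketch of a parser follows, assuming the ID has exactly seven colon-separated fields before the space and four after it, as in the example. NewStyleId and parse_new_style_id are hypothetical names.

```python
from typing import NamedTuple


class NewStyleId(NamedTuple):
    instrument: str
    run_id: int
    flowcell_id: str
    lane: int
    tile: int
    x: int
    y: int
    pair_member: int      # 1 or 2
    fails_filter: bool    # True when the field is 'Y' (read is bad)
    control_bits: int     # 0 when no control bits are on
    index_sequence: str


def parse_new_style_id(header: str) -> NewStyleId:
    """Split an ID laid out like Example 2 into its named fields."""
    text = header[1:] if header.startswith("@") else header
    location, attributes = text.split(" ", 1)
    instrument, run_id, flowcell_id, lane, tile, x, y = location.split(":")
    member, filtered, control, index_sequence = attributes.split(":")
    if filtered not in ("Y", "N"):
        raise ValueError(f"filter flag must be Y or N, got {filtered!r}")
    return NewStyleId(
        instrument=instrument,
        run_id=int(run_id),
        flowcell_id=flowcell_id,
        lane=int(lane),
        tile=int(tile),
        x=int(x),
        y=int(y),
        pair_member=int(member),
        fails_filter=(filtered == "Y"),
        control_bits=int(control),
        index_sequence=index_sequence,
    )


print(parse_new_style_id("@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"))
```

Note that 'Y' means the read failed the filter, so downstream code that wants only good reads would keep records where fails_filter is False.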
