Learning the SAM Data Format, Part 2: Understanding FLAG

Source: Internet  Editor: 程序博客网  Time: 2024/06/04 18:51

SAM format:


1. FLAG description:

Each bit in the FLAG field is defined as:

Flag    Chr  Description
0x0001  p    the read is paired in sequencing
0x0002  P    the read is mapped in a proper pair
0x0004  u    the query sequence itself is unmapped
0x0008  U    the mate is unmapped
0x0010  r    strand of the query (1 for reverse)
0x0020  R    strand of the mate
0x0040  1    the read is the first read in a pair
0x0080  2    the read is the second read in a pair
0x0100  s    the alignment is not primary
0x0200  f    the read fails platform/vendor quality checks
0x0400  d    the read is either a PCR or an optical duplicate
0x0800  S    the alignment is supplementary

where the second column gives the string representation of the FLAG field.

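2. Interpretation: the 0x prefix denotes hexadecimal. Each value in the table is a single bit with its own specific meaning, and the FLAG column of a SAM record stores the sum of the set bits as a decimal integer.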
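The "one bit, one meaning" idea can be sketched in Python. This is only an illustration: the names `FLAG_BITS`, `set_bits` and `decode_flag` are made up here and are not part of SAMtools; the descriptions are copied from the table in section 1.

```python
# Bit values and descriptions copied from the FLAG table.
# FLAG_BITS, set_bits and decode_flag are illustrative names, not a SAMtools API.
FLAG_BITS = [
    (0x0001, "the read is paired in sequencing"),
    (0x0002, "the read is mapped in a proper pair"),
    (0x0004, "the query sequence itself is unmapped"),
    (0x0008, "the mate is unmapped"),
    (0x0010, "strand of the query (1 for reverse)"),
    (0x0020, "strand of the mate"),
    (0x0040, "the read is the first read in a pair"),
    (0x0080, "the read is the second read in a pair"),
    (0x0100, "the alignment is not primary"),
    (0x0200, "the read fails platform/vendor quality checks"),
    (0x0400, "the read is either a PCR or an optical duplicate"),
    (0x0800, "the alignment is supplementary"),
]


def set_bits(flag):
    """Return the bit values that are set in a decimal FLAG."""
    return [bit for bit, _ in FLAG_BITS if flag & bit]


def decode_flag(flag):
    """Return the table descriptions of all bits set in a decimal FLAG."""
    return [desc for bit, desc in FLAG_BITS if flag & bit]


# 99 is the FLAG of read1 in the example of section 3.
for bit, desc in zip(set_bits(99), decode_flag(99)):
    print(f"{bit:>4} (0x{bit:04x})  {desc}")
```

Because every table entry is a distinct power of two, the set bits of a FLAG always add back up to the FLAG itself, which is why a decomposition such as 99 = 64 + 32 + 2 + 1 is unique.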



3. Example:

read1:

@chrUn_KN707963v1_decoy_19393_19870_2:0:0_0:0:0_0/1
CCATTTGATTCCATTCCTTTGGATTCCATTCCATTGTATTGGATTGCATTGGATTCCATTCCATTCTATT
+
2222222222222222222222222222222222222222222222222222222222222222222222

read2:

@chrUn_KN707963v1_decoy_19393_19870_2:0:0_0:0:0_0/2
TCAAAGGGAATAGAATCGAATGAAATAGAATCTAATGGAATGGAATGGAATGGAATGGAATGGAATGGAA
+
2222222222222222222222222222222222222222222222222222222222222222222222

The resulting SAM alignment (partial):

@SQ	SN:HLA-DRB1*15:03:01:01	LN:11567
@SQ	SN:HLA-DRB1*15:03:01:02	LN:11569
@SQ	SN:HLA-DRB1*16:02:01	LN:11005
@PG	ID:bwa	PN:bwa	VN:0.7.13-r1126	CL:bwa sampe ../hs38DH.fa hs38DHPE1L100F1.sai hs38DHPE1L100F2.sai hs38DHPE1L100F1.fq hs38DHPE1L100F2.fq
chrUn_KN707963v1_decoy_19393_19870_2:0:0_0:0:0_0	99	chrUn_KN707963v1_decoy	19393	60	70M	=	19801	478	CCATTTGATTCCATTCCTTTGGATTCCATTCCATTGTATTGGATTGCATTGGATTCCATTCCATTCTATT	2222222222222222222222222222222222222222222222222222222222222222222222	XT:A:U	NM:i:2	SM:i:37	AM:i:37	X0:i:1	X1:i:0	XM:i:2	XO:i:0	XG:i:0	MD:Z:40C4C24
chrUn_KN707963v1_decoy_19393_19870_2:0:0_0:0:0_0	147	chrUn_KN707963v1_decoy	19801	60	70M	=	19393	-478	TTCCATTCCATTCCATTCCATTCCATTCCATTCCATTAGATTCTATTTCATTCGATTCTATTCCCTTTGA	2222222222222222222222222222222222222222222222222222222222222222222222	XT:A:U	NM:i:0	SM:i:37	AM:i:37	X0:i:1	X1:i:0	XM:i:0	XO:i:0	XG:i:0	MD:Z:70
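To pick the FLAG out of an alignment line, split the line on tabs: FLAG is the second of the eleven mandatory columns. The following is a minimal sketch that assumes tab-separated input; `SAM_FIELDS` and `parse_sam_line` are hypothetical names chosen for illustration, and the optional tags are kept as raw strings. The read1 record is rebuilt here with only three of its tags to keep the example short.

```python
# Names of the 11 mandatory SAM columns, in order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_sam_line(line):
    """Split one tab-separated alignment line into a dict (hypothetical helper)."""
    cols = line.rstrip("\n").split("\t")
    rec = dict(zip(SAM_FIELDS, cols[:11]))
    # These mandatory columns hold integers.
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        rec[key] = int(rec[key])
    # Anything after column 11 is an optional TAG:TYPE:VALUE field.
    rec["TAGS"] = cols[11:]
    return rec


# The read1 record from the example, with a subset of its optional tags.
read1_line = "\t".join([
    "chrUn_KN707963v1_decoy_19393_19870_2:0:0_0:0:0_0",
    "99", "chrUn_KN707963v1_decoy", "19393", "60", "70M", "=", "19801", "478",
    "CCATTTGATTCCATTCCTTTGGATTCCATTCCATTGTATTGGATTGCATTGGATTCCATTCCATTCTATT",
    "2" * 70,
    "XT:A:U", "NM:i:2", "MD:Z:40C4C24",
])

rec = parse_sam_line(read1_line)
print(rec["FLAG"], rec["POS"], rec["PNEXT"], rec["TLEN"])
```

In this record TLEN is consistent with the other columns: the mate starts at 19801 and is 70 bases long, so the fragment spans 19801 + 70 - 19393 = 478 bases.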


Here 99 is the FLAG of read1: 99 = 64 + 32 + 2 + 1

64 (0x0040): the read is the first read in a pair

32 (0x0020): strand of the mate (the bit is set, so the mate maps to the reverse strand)

2 (0x0002): the read is mapped in a proper pair

1 (0x0001): the read is paired in sequencing

And 147 is the FLAG of read2: 147 = 128 + 16 + 2 + 1

128 (0x0080): the read is the second read in a pair

16 (0x0010): strand of the query (1 for reverse), i.e. the query sequence is reported on the reverse strand.

The sequence as originally generated:

TCAAAGGGAATAGAATCGAATGAAATAGAATCTAATGGAATGGAATGGAATGGAATGGAATGGAATGGAA

The sequence as reported after alignment:

TTCCATTCCATTCCATTCCATTCCATTCCATTCCATTAGATTCTATTTCATTCGATTCTATTCCCTTTGA

The two sequences are reverse complements of each other. The original read starts with TCAAAGGG, and the aligned sequence ends with CCCTTTGA; read from the end backwards that is AGTTTCCC, which pairs base by base with TCAAAGGG.

2 and 1 have the same meaning as for read1.
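The reverse-strand relationship for read2 can be checked with a few lines of Python. This is a small sketch; `reverse_complement` is an illustrative helper written for this post, not a function from any SAM library.

```python
# Translation table mapping each base to its complement (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


# read2 as it appears in the FASTQ file.
fastq_seq = "TCAAAGGGAATAGAATCGAATGAAATAGAATCTAATGGAATGGAATGGAATGGAATGGAATGGAATGGAA"
# read2 as it appears in the SEQ column of the SAM record with FLAG 147.
sam_seq = "TTCCATTCCATTCCATTCCATTCCATTCCATTCCATTAGATTCTATTTCATTCGATTCTATTCCCTTTGA"

# Bit 0x10 of the FLAG says whether SEQ is reported on the reverse strand.
is_reverse = bool(147 & 0x10)

print("reverse strand bit set:", is_reverse)
print("SEQ is the reverse complement of the FASTQ read:",
      reverse_complement(fastq_seq) == sam_seq)
```

For read1 (FLAG 99) the 0x10 bit is not set, which matches the fact that its SEQ column is identical to the FASTQ sequence.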


References:

[1] The Sequence Alignment/Map format and SAMtools


