Gene data processing 63: using SNAP on reads longer than 400 bp after changing its default setting


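After changing 400 to 4000 in Read.h, SNAP can run on these reads, but the fraction of reads it aligns is very low: about 90% of the 500 bp reads are aligned, but only about 8% of the 1000 bp reads. BWA-MEM does well on the same data; that is recorded in the next post.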

xubo@xubo:~/xubo/data/alignment/cs-bwamem$ snap-aligner single snapindex/ g38l500N10000.fq -o g38l500N10000.snap1.sam
Welcome to SNAP version 1.0beta.23.
Loading index from directory... 32s.  248957422 bases, seed size 20
Aligning.
Total Reads    Aligned, MAPQ >= 10    Aligned, MAPQ < 10     Unaligned              Too Short/Too Many Ns     Reads/s   Time in Aligner (s)
10,000         8,933 (89.33%)         98 (0.98%)             969 (9.69%)            0 (0.00%)                 4,068     2

xubo@xubo:~/xubo/data/alignment/cs-bwamem$ snap-aligner single snapindex/ g38l1000N10000.fq -o g38l1000N10000.snap1.sam
Welcome to SNAP version 1.0beta.23.
Loading index from directory... 33s.  248957422 bases, seed size 20
Aligning.
Total Reads    Aligned, MAPQ >= 10    Aligned, MAPQ < 10     Unaligned              Too Short/Too Many Ns     Reads/s   Time in Aligner (s)
10,000         796 (7.96%)            8 (0.08%)              9,196 (91.96%)         0 (0.00%)                 2,608     4

xubo@xubo:~/xubo/data/alignment/cs-bwamem$ snap-aligner snapindex/ g38l500N100000.fq -o g38l500N100000.snap.sam
Welcome to SNAP version 1.0beta.23.
Invalid command: snapindex/
Usage: snap-aligner <command> [<options>]
Commands:
   index    build a genome index
   single   align single-end reads
   paired   align paired-end reads
   daemon   run in daemon mode--accept commands remotely
Type a command without arguments to see its help.

xubo@xubo:~/xubo/data/alignment/cs-bwamem$ snap-aligner single  snapindex/ g38l500N100000.fq -o g38l500N100000.snap.sam
Welcome to SNAP version 1.0beta.23.
Loading index from directory... 34s.  248957422 bases, seed size 20
Aligning.
Total Reads    Aligned, MAPQ >= 10    Aligned, MAPQ < 10     Unaligned              Too Short/Too Many Ns     Reads/s   Time in Aligner (s)
100,000        88,891 (88.89%)        1,083 (1.08%)          10,026 (10.03%)        0 (0.00%)                 4,200     24

xubo@xubo:~/xubo/data/alignment/cs-bwamem$ snap-aligner single snapindex/ g38l1000N100000.fq -o g38l1000N100000.snap1.sam
Welcome to SNAP version 1.0beta.23.
Loading index from directory... 33s.  248957422 bases, seed size 20
Aligning.
Total Reads    Aligned, MAPQ >= 10    Aligned, MAPQ < 10     Unaligned              Too Short/Too Many Ns     Reads/s   Time in Aligner (s)
100,000        7,786 (7.79%)          67 (0.07%)             92,145 (92.14%)        2 (0.00%)                 2,390     42

xubo@xubo:~/xubo/data/alignment/cs-bwamem$ snap-aligner single snapindex/ g38l1000N1000000.fq -o g38l1000N1000000.snap1.sam
Welcome to SNAP version 1.0beta.23.
Loading index from directory... 32s.  248957422 bases, seed size 20
Aligning.
Total Reads    Aligned, MAPQ >= 10    Aligned, MAPQ < 10     Unaligned              Too Short/Too Many Ns     Reads/s   Time in Aligner (s)
1,000,000      78,762 (7.88%)         602 (0.06%)            920,610 (92.06%)       26 (0.00%)                2,420     413
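
To compare the runs above side by side, the SNAP summary rows can be parsed into numbers. The following is a minimal sketch, not part of SNAP: the helper names `parse_snap_row` and `aligned_fraction` are made up for illustration, and the row layout (seven numeric columns, percentages in parentheses) is assumed from the output printed above.

```python
import re

# Summary rows copied from the SNAP runs above, keyed by input file stem.
SNAP_ROWS = {
    "g38l500N10000": "10,000 8,933 (89.33%) 98 (0.98%) 969 (9.69%) 0 (0.00%) 4,068 2",
    "g38l1000N10000": "10,000 796 (7.96%) 8 (0.08%) 9,196 (91.96%) 0 (0.00%) 2,608 4",
    "g38l500N100000": "100,000 88,891 (88.89%) 1,083 (1.08%) 10,026 (10.03%) 0 (0.00%) 4,200 24",
    "g38l1000N100000": "100,000 7,786 (7.79%) 67 (0.07%) 92,145 (92.14%) 2 (0.00%) 2,390 42",
    "g38l1000N1000000": "1,000,000 78,762 (7.88%) 602 (0.06%) 920,610 (92.06%) 26 (0.00%) 2,420 413",
}

FIELDS = (
    "total",
    "aligned_mapq_ge_10",
    "aligned_mapq_lt_10",
    "unaligned",
    "too_short_or_many_ns",
    "reads_per_s",
    "seconds",
)


def parse_snap_row(row):
    """Turn one SNAP summary row into a dict of integers (hypothetical helper)."""
    # Drop the parenthesised percentages, then strip thousands separators.
    without_pct = re.sub(r"\([^)]*\)", "", row)
    numbers = [int(tok.replace(",", "")) for tok in without_pct.split()]
    if len(numbers) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(numbers)}")
    return dict(zip(FIELDS, numbers))


def aligned_fraction(stats):
    """Fraction of reads aligned at any MAPQ."""
    aligned = stats["aligned_mapq_ge_10"] + stats["aligned_mapq_lt_10"]
    return aligned / stats["total"]


if __name__ == "__main__":
    for name, row in SNAP_ROWS.items():
        stats = parse_snap_row(row)
        print(f"{name}: {aligned_fraction(stats):.2%} aligned")
```

Adding the two "Aligned" columns gives the number that should match the mapped count reported by samtools flagstat for the same SAM file.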

Statistics:

xubo@xubo:~/xubo/data/alignment/cs-bwamem$ samtools flagstat g38l500N10000.snap1.sam
10000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
9031 + 0 mapped (90.31% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

xubo@xubo:~/xubo/data/alignment/cs-bwamem$ samtools flagstat g38l500N100000.snap1.sam
[E::hts_open_format] fail to open file 'g38l500N100000.snap1.sam'
samtools flagstat: Cannot open input file "g38l500N100000.snap1.sam": No such file or directory

xubo@xubo:~/xubo/data/alignment/cs-bwamem$ samtools flagstat g38l500N100000.snap.sam
100000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
89974 + 0 mapped (89.97% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

xubo@xubo:~/xubo/data/alignment/cs-bwamem$ samtools flagstat g38l1000N10000.snap1.sam
10000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
804 + 0 mapped (8.04% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

xubo@xubo:~/xubo/data/alignment/cs-bwamem$ samtools flagstat g38l1000N100000.snap1.sam
100000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
7853 + 0 mapped (7.85% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

xubo@xubo:~/xubo/data/alignment/cs-bwamem$ samtools flagstat g38l1000N1000000.snap1.sam
1000000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
79364 + 0 mapped (7.94% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
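
The mapped rate can also be pulled out of the flagstat text programmatically, which is convenient when comparing many SAM files. This is a minimal sketch under the assumption that the report uses the "N + M label" line layout shown above; `parse_flagstat` is a hypothetical helper name, not a samtools API, and the sample text is the first report from this post.

```python
import re

# First flagstat report from above (g38l500N10000.snap1.sam), abbreviated.
SAMPLE = """\
10000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
9031 + 0 mapped (90.31% : N/A)
0 + 0 paired in sequencing
"""

LINE_RE = re.compile(r"(\d+) \+ (\d+) (.*)")


def parse_flagstat(text):
    """Return (total, mapped) QC-passed read counts from flagstat text."""
    total = mapped = None
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank lines and error messages
        passed, label = int(match.group(1)), match.group(3)
        if label.startswith("in total"):
            total = passed
        elif label.startswith("mapped"):
            # Only the plain "mapped" line; "with mate mapped ..." does not match.
            mapped = passed
    if total is None or mapped is None:
        raise ValueError("flagstat text lacks an 'in total' or 'mapped' line")
    return total, mapped


if __name__ == "__main__":
    total, mapped = parse_flagstat(SAMPLE)
    print(f"{mapped}/{total} = {mapped / total:.2%} mapped")
```

For the five reports above, the mapped counts (9031, 89974, 804, 7853, 79364) equal the sum of SNAP's two "Aligned" columns for the corresponding run, so the two tools agree on how many reads were placed.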


Research output:

【1】 [BIBM] Bo Xu, Changlong Li, Hang Zhuang, Jiali Wang, Qingfeng Wang, Chao Wang, and Xuehai Zhou, "Distributed Gene Clinical Decision Support System Based on Cloud Computing", in IEEE International Conference on Bioinformatics and Biomedicine. (BIBM 2017, CCF B)
【2】 [IEEE CLOUD] Bo Xu, Changlong Li, Hang Zhuang, Jiali Wang, Qingfeng Wang, Xuehai Zhou. Efficient Distributed Smith-Waterman Algorithm Based on Apache Spark (CLOUD 2017, CCF-C).
【3】 [CCGrid] Bo Xu, Changlong Li, Hang Zhuang, Jiali Wang, Qingfeng Wang, Jinhong Zhou, Xuehai Zhou. DSA: Scalable Distributed Sequence Alignment System Using SIMD Instructions. (CCGrid 2017, CCF-C).
【4】 more: https://github.com/xubo245/Publications

Help

If you have any questions or suggestions, please open an issue in this project or send me an e-mail: xubo245@mail.ustc.edu.cn
WeChat: xu601450868
QQ: 601450868