基因数据处理69之bowtie安装与使用
来源:互联网 发布:js正则匹配网址 编辑:程序博客网 时间:2024/05/16 05:34
1.下载:
hadoop@Master:~/xubo/tools$ git clone https://github.com/BenLangmead/bowtie2.gitCloning into 'bowtie2'...remote: Counting objects: 7503, done.remote: Total 7503 (delta 0), reused 0 (delta 0), pack-reused 7503Receiving objects: 100% (7503/7503), 143.80 MiB | 403.00 KiB/s, done.Resolving deltas: 100% (4949/4949), done.Checking connectivity... done.
2.安装:
hadoop@Master:~/xubo/tools/bowtie2$ makeg++ -O3 -m64 -msse2 -funroll-loops -g3 -DCOMPILER_OPTIONS="\"-O3 -m64 -msse2 -funroll-loops -g3 -DPOPCNT_CAPABILITY\"" -DPOPCNT_CAPABILITY \ -fno-strict-aliasing -DBOWTIE2_VERSION="\"`cat VERSION`\"" -DBUILD_HOST="\"`hostname`\"" -DBUILD_TIME="\"`date`\"" -DCOMPILER_VERSION="\"`g++ -v 2>&1 | tail -1`\"" -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DBOWTIE_MM -DBOWTIE2 -DNDEBUG -Wall \ -I third_party \ -o bowtie2-build-s bt2_build.cpp \ ccnt_lut.cpp ref_read.cpp alphabet.cpp shmem.cpp edit.cpp bt2_idx.cpp bt2_io.cpp bt2_util.cpp reference.cpp ds.cpp multikey_qsort.cpp limit.cpp random_source.cpp tinythread.cpp diff_sample.cpp bowtie_build_main.cpp \ -lpthread g++ -O3 -m64 -msse2 -funroll-loops -g3 -DCOMPILER_OPTIONS="\"-O3 -m64 -msse2 -funroll-loops -g3 -DPOPCNT_CAPABILITY\"" -DPOPCNT_CAPABILITY \ -fno-strict-aliasing -DBOWTIE2_VERSION="\"`cat VERSION`\"" -DBUILD_HOST="\"`hostname`\"" -DBUILD_TIME="\"`date`\"" -DCOMPILER_VERSION="\"`g++ -v 2>&1 | tail -1`\"" -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DBOWTIE_MM -DBOWTIE2 -DBOWTIE_64BIT_INDEX -DNDEBUG -Wall \ -I third_party \ -o bowtie2-build-l bt2_build.cpp \ ccnt_lut.cpp ref_read.cpp alphabet.cpp shmem.cpp edit.cpp bt2_idx.cpp bt2_io.cpp bt2_util.cpp reference.cpp ds.cpp multikey_qsort.cpp limit.cpp random_source.cpp tinythread.cpp diff_sample.cpp bowtie_build_main.cpp \ -lpthread g++ -O3 -m64 -msse2 -funroll-loops -g3 -DCOMPILER_OPTIONS="\"-O3 -m64 -msse2 -funroll-loops -g3 -DPOPCNT_CAPABILITY\"" -DPOPCNT_CAPABILITY \ -fno-strict-aliasing -DBOWTIE2_VERSION="\"`cat VERSION`\"" -DBUILD_HOST="\"`hostname`\"" -DBUILD_TIME="\"`date`\"" -DCOMPILER_VERSION="\"`g++ -v 2>&1 | tail -1`\"" -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DBOWTIE_MM -DBOWTIE2 -DNDEBUG -Wall \ -I third_party \ -o bowtie2-align-s bt2_search.cpp \ ccnt_lut.cpp ref_read.cpp alphabet.cpp shmem.cpp edit.cpp bt2_idx.cpp bt2_io.cpp bt2_util.cpp reference.cpp ds.cpp multikey_qsort.cpp limit.cpp random_source.cpp tinythread.cpp qual.cpp pat.cpp sam.cpp read_qseq.cpp aligner_seed_policy.cpp aligner_seed.cpp aligner_seed2.cpp aligner_sw.cpp aligner_sw_driver.cpp aligner_cache.cpp aligner_result.cpp ref_coord.cpp mask.cpp pe.cpp aln_sink.cpp dp_framer.cpp scoring.cpp presets.cpp unique.cpp simple_func.cpp random_util.cpp aligner_bt.cpp sse_util.cpp aligner_swsse.cpp outq.cpp aligner_swsse_loc_i16.cpp aligner_swsse_ee_i16.cpp aligner_swsse_loc_u8.cpp aligner_swsse_ee_u8.cpp aligner_driver.cpp bowtie_main.cpp \ -lpthread g++ -O3 -m64 -msse2 -funroll-loops -g3 -DCOMPILER_OPTIONS="\"-O3 -m64 -msse2 -funroll-loops -g3 -DPOPCNT_CAPABILITY\"" -DPOPCNT_CAPABILITY \ -fno-strict-aliasing -DBOWTIE2_VERSION="\"`cat VERSION`\"" -DBUILD_HOST="\"`hostname`\"" -DBUILD_TIME="\"`date`\"" -DCOMPILER_VERSION="\"`g++ -v 2>&1 | tail -1`\"" -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DBOWTIE_MM -DBOWTIE2 -DBOWTIE_64BIT_INDEX -DNDEBUG -Wall \ -I third_party \ -o bowtie2-align-l bt2_search.cpp \ ccnt_lut.cpp ref_read.cpp alphabet.cpp shmem.cpp edit.cpp bt2_idx.cpp bt2_io.cpp bt2_util.cpp reference.cpp ds.cpp multikey_qsort.cpp limit.cpp random_source.cpp tinythread.cpp qual.cpp pat.cpp sam.cpp read_qseq.cpp aligner_seed_policy.cpp aligner_seed.cpp aligner_seed2.cpp aligner_sw.cpp aligner_sw_driver.cpp aligner_cache.cpp aligner_result.cpp ref_coord.cpp mask.cpp pe.cpp aln_sink.cpp dp_framer.cpp scoring.cpp presets.cpp unique.cpp simple_func.cpp random_util.cpp aligner_bt.cpp sse_util.cpp aligner_swsse.cpp outq.cpp aligner_swsse_loc_i16.cpp aligner_swsse_ee_i16.cpp aligner_swsse_loc_u8.cpp aligner_swsse_ee_u8.cpp aligner_driver.cpp bowtie_main.cpp \ -lpthread g++ -O3 -m64 -msse2 -funroll-loops -g3 \ -DCOMPILER_OPTIONS="\"-O3 -m64 -msse2 -funroll-loops -g3 -DPOPCNT_CAPABILITY\"" -DPOPCNT_CAPABILITY \ -fno-strict-aliasing -DBOWTIE2_VERSION="\"`cat VERSION`\"" -DBUILD_HOST="\"`hostname`\"" -DBUILD_TIME="\"`date`\"" -DCOMPILER_VERSION="\"`g++ -v 2>&1 | tail -1`\"" -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DBOWTIE_MM -DBOWTIE2 -DBOWTIE_INSPECT_MAIN -Wall \ -I third_party -I . \ -o bowtie2-inspect-s bt2_inspect.cpp \ ccnt_lut.cpp ref_read.cpp alphabet.cpp shmem.cpp edit.cpp bt2_idx.cpp bt2_io.cpp bt2_util.cpp reference.cpp ds.cpp multikey_qsort.cpp limit.cpp random_source.cpp tinythread.cpp \ -lpthread g++ -O3 -m64 -msse2 -funroll-loops -g3 \ -DCOMPILER_OPTIONS="\"-O3 -m64 -msse2 -funroll-loops -g3 -DPOPCNT_CAPABILITY\"" -DPOPCNT_CAPABILITY \ -fno-strict-aliasing -DBOWTIE2_VERSION="\"`cat VERSION`\"" -DBUILD_HOST="\"`hostname`\"" -DBUILD_TIME="\"`date`\"" -DCOMPILER_VERSION="\"`g++ -v 2>&1 | tail -1`\"" -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DBOWTIE_MM -DBOWTIE2 -DBOWTIE_INSPECT_MAIN -DBOWTIE_64BIT_INDEX -Wall \ -I third_party -I . \ -o bowtie2-inspect-l bt2_inspect.cpp \ ccnt_lut.cpp ref_read.cpp alphabet.cpp shmem.cpp edit.cpp bt2_idx.cpp bt2_io.cpp bt2_util.cpp reference.cpp ds.cpp multikey_qsort.cpp limit.cpp random_source.cpp tinythread.cpp \ -lpthread hadoop@Master:~/xubo/tools/bowtie2$ make installmkdir -p /usr/local/binfor file in bowtie2-build-s bowtie2-build-l bowtie2-align-s bowtie2-align-l bowtie2-inspect-s bowtie2-inspect-l bowtie2-inspect bowtie2-build bowtie2 ; do \ cp -f $file /usr/local/bin ; \ donecp: cannot create regular file ‘/usr/local/bin/bowtie2-build-s’: Permission deniedcp: cannot create regular file ‘/usr/local/bin/bowtie2-build-l’: Permission deniedcp: cannot create regular file ‘/usr/local/bin/bowtie2-align-s’: Permission deniedcp: cannot create regular file ‘/usr/local/bin/bowtie2-align-l’: Permission deniedcp: cannot create regular file ‘/usr/local/bin/bowtie2-inspect-s’: Permission deniedcp: cannot create regular file ‘/usr/local/bin/bowtie2-inspect-l’: Permission deniedcp: cannot create regular file ‘/usr/local/bin/bowtie2-inspect’: Permission deniedcp: cannot create regular file ‘/usr/local/bin/bowtie2-build’: Permission deniedcp: cannot create regular file ‘/usr/local/bin/bowtie2’: Permission deniedmake: *** [install] Error 1hadoop@Master:~/xubo/tools/bowtie2$ sudo make install[sudo] password for hadoop: mkdir -p /usr/local/binfor file in bowtie2-build-s bowtie2-build-l bowtie2-align-s bowtie2-align-l bowtie2-inspect-s bowtie2-inspect-l bowtie2-inspect bowtie2-build bowtie2 ; do \ cp -f $file /usr/local/bin ; \ done
3.使用:
(1)建立索引:
hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ bowtie2-build GRCH38BWAindex/GRCH38chr1L3556522.fasta bowtie2/indexSettings: Output files: "bowtie2/index.*.bt2" Line rate: 6 (line is 64 bytes) Lines per side: 1 (side is 64 bytes) Offset rate: 4 (one in 16) FTable chars: 10 Strings: unpacked Max bucket size: default Max bucket size, sqrt multiplier: default Max bucket size, len divisor: 4 Difference-cover sample period: 1024 Endianness: little Actual local endianness: little Sanity checking: disabled Assertions: disabled Random seed: 0 Sizeofs: void*:8, int:4, long:8, size_t:8Input files DNA, FASTA: GRCH38BWAindex/GRCH38chr1L3556522.fastaBuilding a SMALL indexReading reference sizes Time reading reference sizes: 00:00:03Calculating joined lengthWriting headerReserving space for joined stringJoining reference sequences Time to join reference sequences: 00:00:03bmax according to bmaxDivN setting: 57620253Using parameters --bmax 43215190 --dcv 1024 Doing ahead-of-time memory usage test Passed! Constructing with these parameters: --bmax 43215190 --dcv 1024Constructing suffix-array element generatorBuilding DifferenceCoverSample Building sPrime Building sPrimeOrder V-Sorting samples V-Sorting samples time: 00:00:05 Allocating rank array Ranking v-sort output Ranking v-sort output time: 00:00:01 Invoking Larsson-Sadakane on ranks Invoking Larsson-Sadakane on ranks time: 00:00:02 Sanity-checking and returningBuilding samplesReserving space for 12 sample suffixesGenerating random suffixesQSorting 12 sample offsets, eliminating duplicatesQSorting sample offsets, eliminating duplicates time: 00:00:00Multikey QSorting 12 samples (Using difference cover) Multikey QSorting samples time: 00:00:00Calculating bucket sizesSplitting and merging Splitting and merging time: 00:00:00Avg bucket size: 3.29259e+07 (target: 43215189)Converting suffix-array elements to index imageAllocating ftab, absorbFtabEntering Ebwt loopGetting block 1 of 7 Reserving size (43215190) for bucket 1 Calculating Z arrays for bucket 1 Entering block accumulator loop for bucket 1: bucket 1: 10% bucket 1: 20% bucket 1: 30% bucket 1: 40% bucket 1: 50% bucket 1: 60% bucket 1: 70% bucket 1: 80% bucket 1: 90% bucket 1: 100% Sorting block of length 41559493 for bucket 1 (Using difference cover) Sorting block time: 00:00:16Returning block of 41559494 for bucket 1Getting block 2 of 7 Reserving size (43215190) for bucket 2 Calculating Z arrays for bucket 2 Entering block accumulator loop for bucket 2: bucket 2: 10% bucket 2: 20% bucket 2: 30% bucket 2: 40% bucket 2: 50% bucket 2: 60% bucket 2: 70% bucket 2: 80% bucket 2: 90% bucket 2: 100% Sorting block of length 36821901 for bucket 2 (Using difference cover) Sorting block time: 00:00:13Returning block of 36821902 for bucket 2Getting block 3 of 7 Reserving size (43215190) for bucket 3 Calculating Z arrays for bucket 3 Entering block accumulator loop for bucket 3: bucket 3: 10% bucket 3: 20% bucket 3: 30% bucket 3: 40% bucket 3: 50% bucket 3: 60% bucket 3: 70% bucket 3: 80% bucket 3: 90% bucket 3: 100% Sorting block of length 41919124 for bucket 3 (Using difference cover) Sorting block time: 00:00:16Returning block of 41919125 for bucket 3Getting block 4 of 7 Reserving size (43215190) for bucket 4 Calculating Z arrays for bucket 4 Entering block accumulator loop for bucket 4: bucket 4: 10% bucket 4: 20% bucket 4: 30% bucket 4: 40% bucket 4: 50% bucket 4: 60% bucket 4: 70% bucket 4: 80% bucket 4: 90% bucket 4: 100% Sorting block of length 24681605 for bucket 4 (Using difference cover) Sorting block time: 00:00:09Returning block of 24681606 for bucket 4Getting block 5 of 7 Reserving size (43215190) for bucket 5 Calculating Z arrays for bucket 5 Entering block accumulator loop for bucket 5: bucket 5: 10% bucket 5: 20% bucket 5: 30% bucket 5: 40% bucket 5: 50% bucket 5: 60% bucket 5: 70% bucket 5: 80% bucket 5: 90% bucket 5: 100% Sorting block of length 41918922 for bucket 5 (Using difference cover) Sorting block time: 00:00:15Returning block of 41918923 for bucket 5Getting block 6 of 7 Reserving size (43215190) for bucket 6 Calculating Z arrays for bucket 6 Entering block accumulator loop for bucket 6: bucket 6: 10% bucket 6: 20% bucket 6: 30% bucket 6: 40% bucket 6: 50% bucket 6: 60% bucket 6: 70% bucket 6: 80% bucket 6: 90% bucket 6: 100% Sorting block of length 8244387 for bucket 6 (Using difference cover) Sorting block time: 00:00:03Returning block of 8244388 for bucket 6Getting block 7 of 7 Reserving size (43215190) for bucket 7 Calculating Z arrays for bucket 7 Entering block accumulator loop for bucket 7: bucket 7: 10% bucket 7: 20% bucket 7: 30% bucket 7: 40% bucket 7: 50% bucket 7: 60% bucket 7: 70% bucket 7: 80% bucket 7: 90% bucket 7: 100% Sorting block of length 35335574 for bucket 7 (Using difference cover) Sorting block time: 00:00:13Returning block of 35335575 for bucket 7Exited Ebwt loopfchr[A]: 0fchr[C]: 67070277fchr[G]: 115125320fchr[T]: 163236848fchr[$]: 230481012Exiting Ebwt::buildToDisk()Returning from initFromVectorWrote 81023576 bytes to primary EBWT file: bowtie2/index.1.bt2Wrote 57620260 bytes to secondary EBWT file: bowtie2/index.2.bt2Re-opening _in1 and _in2 as input streamsReturning from Ebwt constructorHeaders: len: 230481012 bwtLen: 230481013 sz: 57620253 bwtSz: 57620254 lineRate: 6 offRate: 4 offMask: 0xfffffff0 ftabChars: 10 eftabLen: 20 eftabSz: 80 ftabLen: 1048577 ftabSz: 4194308 offsLen: 14405064 offsSz: 57620256 lineSz: 64 sideSz: 64 sideBwtSz: 48 sideBwtLen: 192 numSides: 1200422 numLines: 1200422 ebwtTotLen: 76827008 ebwtTotSz: 76827008 color: 0 reverse: 0Total time for call to driver() for forward index: 00:02:50Reading reference sizes Time reading reference sizes: 00:00:02Calculating joined lengthWriting headerReserving space for joined stringJoining reference sequences Time to join reference sequences: 00:00:02 Time to reverse reference sequence: 00:00:01bmax according to bmaxDivN setting: 57620253Using parameters --bmax 43215190 --dcv 1024 Doing ahead-of-time memory usage test Passed! Constructing with these parameters: --bmax 43215190 --dcv 1024Constructing suffix-array element generatorBuilding DifferenceCoverSample Building sPrime Building sPrimeOrder V-Sorting samples V-Sorting samples time: 00:00:05 Allocating rank array Ranking v-sort output Ranking v-sort output time: 00:00:01 Invoking Larsson-Sadakane on ranks Invoking Larsson-Sadakane on ranks time: 00:00:02 Sanity-checking and returningBuilding samplesReserving space for 12 sample suffixesGenerating random suffixesQSorting 12 sample offsets, eliminating duplicatesQSorting sample offsets, eliminating duplicates time: 00:00:00Multikey QSorting 12 samples (Using difference cover) Multikey QSorting samples time: 00:00:00Calculating bucket sizesSplitting and merging Splitting and merging time: 00:00:00Split 2, merged 6; iterating...Splitting and merging Splitting and merging time: 00:00:00Split 1, merged 1; iterating...Splitting and merging Splitting and merging time: 00:00:00Split 1, merged 1; iterating...Splitting and merging Splitting and merging time: 00:00:00Split 1, merged 1; iterating...Splitting and merging Splitting and merging time: 00:00:00Avg bucket size: 2.88101e+07 (target: 43215189)Converting suffix-array elements to index imageAllocating ftab, absorbFtabEntering Ebwt loopGetting block 1 of 8 Reserving size (43215190) for bucket 1 Calculating Z arrays for bucket 1 Entering block accumulator loop for bucket 1: bucket 1: 10% bucket 1: 20% bucket 1: 30% bucket 1: 40% bucket 1: 50% bucket 1: 60% bucket 1: 70% bucket 1: 80% bucket 1: 90% bucket 1: 100% Sorting block of length 28292839 for bucket 1 (Using difference cover) Sorting block time: 00:00:10Returning block of 28292840 for bucket 1Getting block 2 of 8 Reserving size (43215190) for bucket 2 Calculating Z arrays for bucket 2 Entering block accumulator loop for bucket 2: bucket 2: 10% bucket 2: 20% bucket 2: 30% bucket 2: 40% bucket 2: 50% bucket 2: 60% bucket 2: 70% bucket 2: 80% bucket 2: 90% bucket 2: 100% Sorting block of length 39729933 for bucket 2 (Using difference cover) Sorting block time: 00:00:15Returning block of 39729934 for bucket 2Getting block 3 of 8 Reserving size (43215190) for bucket 3 Calculating Z arrays for bucket 3 Entering block accumulator loop for bucket 3: bucket 3: 10% bucket 3: 20% bucket 3: 30% bucket 3: 40% bucket 3: 50% bucket 3: 60% bucket 3: 70% bucket 3: 80% bucket 3: 90% bucket 3: 100% Sorting block of length 14119472 for bucket 3 (Using difference cover) Sorting block time: 00:00:05Returning block of 14119473 for bucket 3Getting block 4 of 8 Reserving size (43215190) for bucket 4 Calculating Z arrays for bucket 4 Entering block accumulator loop for bucket 4: bucket 4: 10% bucket 4: 20% bucket 4: 30% bucket 4: 40% bucket 4: 50% bucket 4: 60% bucket 4: 70% bucket 4: 80% bucket 4: 90% bucket 4: 100% Sorting block of length 41568163 for bucket 4 (Using difference cover) Sorting block time: 00:00:15Returning block of 41568164 for bucket 4Getting block 5 of 8 Reserving size (43215190) for bucket 5 Calculating Z arrays for bucket 5 Entering block accumulator loop for bucket 5: bucket 5: 10% bucket 5: 20% bucket 5: 30% bucket 5: 40% bucket 5: 50% bucket 5: 60% bucket 5: 70% bucket 5: 80% bucket 5: 90% bucket 5: 100% Sorting block of length 37733383 for bucket 5 (Using difference cover) Sorting block time: 00:00:14Returning block of 37733384 for bucket 5Getting block 6 of 8 Reserving size (43215190) for bucket 6 Calculating Z arrays for bucket 6 Entering block accumulator loop for bucket 6: bucket 6: 10% bucket 6: 20% bucket 6: 30% bucket 6: 40% bucket 6: 50% bucket 6: 60% bucket 6: 70% bucket 6: 80% bucket 6: 90% bucket 6: 100% Sorting block of length 23633964 for bucket 6 (Using difference cover) Sorting block time: 00:00:09Returning block of 23633965 for bucket 6Getting block 7 of 8 Reserving size (43215190) for bucket 7 Calculating Z arrays for bucket 7 Entering block accumulator loop for bucket 7: bucket 7: 10% bucket 7: 20% bucket 7: 30% bucket 7: 40% bucket 7: 50% bucket 7: 60% bucket 7: 70% bucket 7: 80% bucket 7: 90% bucket 7: 100% Sorting block of length 34621654 for bucket 7 (Using difference cover) Sorting block time: 00:00:13Returning block of 34621655 for bucket 7Getting block 8 of 8 Reserving size (43215190) for bucket 8 Calculating Z arrays for bucket 8 Entering block accumulator loop for bucket 8: bucket 8: 10% bucket 8: 20% bucket 8: 30% bucket 8: 40% bucket 8: 50% bucket 8: 60% bucket 8: 70% bucket 8: 80% bucket 8: 90% bucket 8: 100% Sorting block of length 10781597 for bucket 8 (Using difference cover) Sorting block time: 00:00:03Returning block of 10781598 for bucket 8Exited Ebwt loopfchr[A]: 0fchr[C]: 67070277fchr[G]: 115125320fchr[T]: 163236848fchr[$]: 230481012Exiting Ebwt::buildToDisk()Returning from initFromVectorWrote 81023576 bytes to primary EBWT file: bowtie2/index.rev.1.bt2Wrote 57620260 bytes to secondary EBWT file: bowtie2/index.rev.2.bt2Re-opening _in1 and _in2 as input streamsReturning from Ebwt constructorHeaders: len: 230481012 bwtLen: 230481013 sz: 57620253 bwtSz: 57620254 lineRate: 6 offRate: 4 offMask: 0xfffffff0 ftabChars: 10 eftabLen: 20 eftabSz: 80 ftabLen: 1048577 ftabSz: 4194308 offsLen: 14405064 offsSz: 57620256 lineSz: 64 sideSz: 64 sideBwtSz: 48 sideBwtLen: 192 numSides: 1200422 numLines: 1200422 ebwtTotLen: 76827008 ebwtTotSz: 76827008 color: 0 reverse: 1Total time for backward call to driver() for mirror index: 00:03:36
(2)匹配:
hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ bowtie2 -x bowtie2/index -U g38L100c10000000Nhs20.fq -S bowtie2/g38L100c10000000Nhs20.bowtie2.sam9257464 reads; of these: 9257464 (100.00%) were unpaired; of these: 4401 (0.05%) aligned 0 times 7741394 (83.62%) aligned exactly 1 time 1511669 (16.33%) aligned >1 times99.95% overall alignment rate
居然没有统计时间。。。。
统计结果:
hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ samtools flagstat bowtie2/g38L100c10000000Nhs20.bowtie2.sam 9257464 + 0 in total (QC-passed reads + QC-failed reads)0 + 0 secondary0 + 0 supplementary0 + 0 duplicates9253063 + 0 mapped (99.95% : N/A)0 + 0 paired in sequencing0 + 0 read10 + 0 read20 + 0 properly paired (N/A : N/A)0 + 0 with itself and mate mapped0 + 0 singletons (N/A : N/A)0 + 0 with mate mapped to a different chr0 + 0 with mate mapped to a different chr (mapQ>=5)
参考:
【1】http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#getting-started-with-bowtie-2-lambda-phage-example
【2】http://blog.sciencenet.cn/blog-830496-750216.html
参考片段in 【1】:
Getting started with Bowtie 2: Lambda phage exampleBowtie 2 comes with some example files to get you started. The example files are not scientifically significant; we use the Lambda phage reference genome simply because it's short, and the reads were generated by a computer program, not a sequencer. However, these files will let you start running Bowtie 2 and downstream tools right away.First follow the manual instructions to obtain Bowtie 2. Set the BT2_HOME environment variable to point to the new Bowtie 2 directory containing the bowtie2, bowtie2-build and bowtie2-inspect binaries. This is important, as the BT2_HOME variable is used in the commands below to refer to that directory.Indexing a reference genomeTo create an index for the Lambda phage reference genome included with Bowtie 2, create a new temporary directory (it doesn't matter where), change into that directory, and run:$BT2_HOME/bowtie2-build $BT2_HOME/example/reference/lambda_virus.fa lambda_virusThe command should print many lines of output then quit. When the command completes, the current directory will contain four new files that all start with lambda_virus and end with .1.bt2, .2.bt2, .3.bt2, .4.bt2, .rev.1.bt2, and .rev.2.bt2. These files constitute the index - you're done!You can use bowtie2-build to create an index for a set of FASTA files obtained from any source, including sites such as UCSC, NCBI, and Ensembl. When indexing multiple FASTA files, specify all the files using commas to separate file names. For more details on how to create an index with bowtie2-build, see the manual section on index building. You may also want to bypass this process by obtaining a pre-built index. See using a pre-built index below for an example.Aligning example readsStay in the directory created in the previous step, which now contains the lambda_virus index files. Next, run:$BT2_HOME/bowtie2 -x lambda_virus -U $BT2_HOME/example/reads/reads_1.fq -S eg1.samThis runs the Bowtie 2 aligner, which aligns a set of unpaired reads to the Lambda phage reference genome using the index generated in the previous step. The alignment results in SAM format are written to the file eg1.sam, and a short alignment summary is written to the console. (Actually, the summary is written to the "standard error" or "stderr" filehandle, which is typically printed to the console.)To see the first few lines of the SAM output, run:head eg1.sam
时间比较:
hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem/bowtie2$ ./run.sh 9257464 reads; of these: 9257464 (100.00%) were unpaired; of these: 4401 (0.05%) aligned 0 times 7741394 (83.62%) aligned exactly 1 time 1511669 (16.33%) aligned >1 times99.95% overall alignment rate1=> RunTime:2480.267404 s********************************************9257464 reads; of these: 9257464 (100.00%) were unpaired; of these: 4401 (0.05%) aligned 0 times 7741394 (83.62%) aligned exactly 1 time 1511669 (16.33%) aligned >1 times99.95% overall alignment rate2=> RunTime:2405.534448 s********************************************9257464 reads; of these: 9257464 (100.00%) were unpaired; of these: 4401 (0.05%) aligned 0 times 7741394 (83.62%) aligned exactly 1 time 1511669 (16.33%) aligned >1 times99.95% overall alignment rate3=> RunTime:2396.183261 s********************************************
参考
【1】https://github.com/xubo245/AdamLearning【2】https://github.com/bigdatagenomics/adam/ 【3】https://github.com/xubo245/SparkLearning【4】http://spark.apache.org【5】http://stackoverflow.com/questions/28166667/how-to-pass-d-parameter-or-environment-variable-to-spark-job 【6】http://stackoverflow.com/questions/28840438/how-to-override-sparks-log4j-properties-per-driver
研究成果:
【1】 [BIBM] Bo Xu, Changlong Li, Hang Zhuang, Jiali Wang, Qingfeng Wang, Chao Wang, and Xuehai Zhou, "Distributed Gene Clinical Decision Support System Based on Cloud Computing", in IEEE International Conference on Bioinformatics and Biomedicine. (BIBM 2017, CCF B)【2】 [IEEE CLOUD] Bo Xu, Changlong Li, Hang Zhuang, Jiali Wang, Qingfeng Wang, Xuehai Zhou. Efficient Distributed Smith-Waterman Algorithm Based on Apache Spark (CLOUD 2017, CCF-C).【3】 [CCGrid] Bo Xu, Changlong Li, Hang Zhuang, Jiali Wang, Qingfeng Wang, Jinhong Zhou, Xuehai Zhou. DSA: Scalable Distributed Sequence Alignment System Using SIMD Instructions. (CCGrid 2017, CCF-C).【4】more: https://github.com/xubo245/Publications
Help
If you have any questions or suggestions, please write it in the issue of this project or send an e-mail to me: xubo245@mail.ustc.edu.cnWechat: xu601450868QQ: 601450868
阅读全文
0 0
- 基因数据处理69之bowtie安装与使用
- 基因数据处理3之bwakit安装和使用
- 基因数据处理26之bcftools安装和使用
- 基因数据处理40之bedtools的安装和使用
- 基因数据处理18之基因序列生成工具wgsim安装和使用
- 基因数据处理25之avocado安装
- 基因数据处理36之qc-metrics安装
- 基因数据处理39之mango安装记录
- 基因数据处理72之GATK安装成功
- 基因数据处理41之mango使用失败
- 基因数据处理48之ART使用实例
- 基因数据处理27之FastQC在linux下安装运行
- 基因数据处理44之cloud-scale-bwamem安装
- 基因数据处理70之Picard安装没成功
- 基因数据处理70之Picard安装没成功
- 基因数据处理1之mapping_to_cram
- 基因数据处理45之cloud-scale-bwamem安装(compile.pl安装有问题)
- 基因数据处理46之cloud-scale-bwamem安装(compile.pl安装没问题)
- 基因数据处理65之bwa处理500bp和1000bp的记录
- 基因数据处理66之avocado集群运行
- 基因数据处理67之bwa建立索引时间
- 基因数据处理68之avocado的配置文件默认无法从hdfs读取
- Scrapy_part1_初识SCRAPY和安装
- 基因数据处理69之bowtie安装与使用
- ll -bash: ls: command not found
- 02爬虫---浏览器的模拟Headers属性
- 基因数据处理70之Picard安装没成功
- 基因数据处理71之GRCH38 的chr14提取
- LeetCode28. Implement strStr()
- 关于支持向量机(SVM)
- Sublime Text 无法使用Package Control或插件安装失败的解决方法
- 基因数据处理72之GATK安装成功