Software Usage Summary

Source: Internet  Time: 2024/06/05 22:35
Software 1: MITObim - mitochondrial baiting and iterative mapping
VERSION: 1.9 (stable; depends on MIRA 4.0.2)
Further examples are available at https://github.com/chrishah/MITObim/tree/master/examples
Prerequisites
-------------
- GNU utilities
- Perl
- A running version of MIRA
MIRA 4.0.2     http://sourceforge.net/projects/mira-assembler/files/MIRA/stable/
Precompiled MIRA binaries are available for Linux and OSX. See also "The Definitive Guide to MIRA": http://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html

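The MITObim procedure (mitochondrial baiting and iterative mapping)
is a highly efficient approach to assembling novel mitochondrial genomes of non-model organisms directly from NGS reads derived from total genomic DNA.
The script performs three steps and repeats them iteratively:
(i) a reference sequence is derived from the previous mapping assembly,
(ii) in silico baiting is performed using the newly derived reference,
(iii) the previously baited reads are mapped to the newly derived reference, leading to an extended reference sequence.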
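The loop over these three steps ends at the requested end iteration or, as the tutorial log further down reports, once the read number becomes stationary. The following is only a minimal sketch of such a stopping rule, not MITObim's own code: the function name `first_stationary_iteration` is hypothetical and the read counts in the demo are invented for illustration.

```bash
# Sketch of a "stationary read number" stopping rule.
# Arguments: the number of baited reads after each iteration, in order.
# Prints the first iteration whose count is no larger than the previous one.
first_stationary_iteration() {
  prev=-1
  i=0
  for n in "$@"; do
    i=$((i + 1))
    if [ "$n" -le "$prev" ]; then
      echo "$i"
      return 0
    fi
    prev=$n
  done
  echo "not stationary after $i iterations"
  return 1
}

# Invented read counts, for illustration only
first_stationary_iteration 1200 3400 5100 6000 6000
```

With these made-up counts the read pool stops growing between the fourth and fifth value, so the function reports iteration 5.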
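The directory containing the MIRA executables must be in your PATH for MITObim.pl to run successfully.
If you cannot or do not want to do that, you can tell MITObim where to find the correct MIRA binaries via the --mirapath option.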
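A small helper can decide between the two cases before launching MITObim. This is a sketch under assumptions: the function name `mira_path_option` and the directory `/opt/mira/bin` are hypothetical placeholders; only the `--mirapath` option itself comes from the MITObim usage text.

```bash
# Print nothing if `mira` is already on PATH, otherwise print a
# --mirapath option pointing at the directory given as $1.
mira_path_option() {
  if command -v mira >/dev/null 2>&1; then
    printf ''
  else
    printf '%s %s\n' "--mirapath" "$1"
  fi
}

# /opt/mira/bin is a placeholder; replace it with your MIRA bin directory
echo "extra MITObim option: $(mira_path_option /opt/mira/bin)"
```

The printed option can then be appended to the MITObim.pl command line.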
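- Download the MITObim wrapper script and testdata from Github, e.g. download the whole MITObim repository as a zip archive (using the button on the Github page) or use git on the command line (`git clone --recursive git://github.com/chrishah/MITObim.git`)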

Installing docker on Ubuntu should be as simple as:
```bash
sudo apt-get install docker.io
```
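You can then specify a working directory on your computer that will be synced with the /home/data directory in the image, and enter the self-contained shell environment to run MITObim:
```bash
WORKING_DIR=/your/desired/working/directory
sudo docker run -i -t -v $WORKING_DIR/:/home/data chrishah/mitobim /bin/bash
```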


If you found MITObim useful, please cite:
Hahn C, Bachmann L and Chevreux B. (2013) Reconstructing mitochondrial genomes directly from genomic next-generation sequencing reads - a baiting and iterative mapping approach. Nucl. Acids Res. 41(13):e129. doi: 10.1093/nar/gkt371

*************************************************************************************************************************************************************************
usage: ./MITObim.pl <parameters>
   
parameters:
  -start <int>  iteration to start with (default=0, when using '-quick' reference)
  -end <int>  iteration to end with (default=startiteration, i.e. if not specified otherwise stop after 1 iteration)
  -sample <string> sampleID (please don't use '.' in the sampleID). If resuming, the sampleID needs to be identical to that of the previous iteration / MIRA assembly.
  -ref <string>  referenceID. If resuming, use the same as in previous iteration/initial MIRA assembly.
  -readpool <FILE> readpool in fastq format (*.gz is also allowed). read pairs need to be interleaved for full functionality of the '-pair' option below.
  -quick <FILE>  reference sequence to be used as bait in fasta format
  -maf <FILE>  extracts reference from maf file created by previous MITObim iteration/MIRA assembly (resume)
  
optional:
  --kbait <int>  set kmer for baiting stringency (default: 31)
  --platform  specify sequencing platform (default: 'solexa'; other options: 'iontor', '454', 'pacbio')
  --denovo  runs MIRA in denovo mode
  --pair   extend readpool to contain full read pairs, even if only one member was baited (relies on /1 and /2 header convention for read pairs) (default: no).
  --verbose  show detailed output of MIRA modules (default: no)
  --split   split reference at positions with more than 5N (default: no)
  --help   shows this helpful information
  --clean                 retain only the last 2 iteration directories (default: no)
  --trimreads  trim data (default: no; we recommend to trim beforehand and feed MITObim with pre trimmed data)
  --trimoverhang  trim overhang up- and downstream of reference, i.e. don't extend the bait, just re-assemble (default: no)
  --mismatch <int> number of allowed mismatches in mapping - only for illumina data (default: 15% of avg. read length)
  --min_cov <int>  minimum average coverage of contigs to be retained (default: 0 - off)
  --min_len <int>  minimum length of contig to be retained as backbone (default: 0 - off)
  --mirapath <string>     full path to MIRA binaries (only needed if MIRA is not in PATH)
  --redirect_tmp  redirect temporary output to this location (useful in case you are running MITObim on an NFS mount)
  --NFS_warn_only  allow MIRA to run on NFS mount without aborting -  warn only (expert option - see MIRA documentation 'check_nfs')
  --version  display MITObim version
  
examples:
  ./MITObim.pl -start 1 -end 5 -sample StrainX -ref reference-mt -readpool illumina_readpool.fastq -maf initial_assembly.maf
  ./MITObim.pl -end 10 -quick reference.fasta -sample StrainY -ref reference-mt -readpool illumina_readpool.fastq
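
The usage text asks for an interleaved `-readpool` when `--pair` is used. If the pairs arrive as two separate files, they can be interleaved first. The following is a minimal sketch, assuming plain (uncompressed) four-line FASTQ records in the same order in both files and no tab characters in the read headers; the function name `interleave_fastq` and the file names in the usage comment are hypothetical.

```bash
# Interleave two FASTQ files record by record (R1 record, then its R2 mate).
# paste joins line N of both files with a tab; awk buffers four lines
# (one FASTQ record) per file and prints the two records back to back.
interleave_fastq() {
  paste "$1" "$2" | awk -F '\t' '
    { r1[NR % 4] = $1; r2[NR % 4] = $2 }
    NR % 4 == 0 {
      print r1[1]; print r1[2]; print r1[3]; print r1[0]
      print r2[1]; print r2[2]; print r2[3]; print r2[0]
    }'
}

# Usage (hypothetical file names):
#   interleave_fastq reads_1.fastq reads_2.fastq > reads_interleaved.fastq
```

The `/1` and `/2` header suffixes that `--pair` relies on are not added by this helper; they need to be present in the input already.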


TUTORIAL I: reconstruction of a mitochondrial genome using a two step procedure
a. Initial mapping assembly using MIRA:
 -bash-4.1$ mkdir tutorial1
 -bash-4.1$ cd tutorial1
 -bash-4.1$ cp /PATH/TO/testdata1/Tthymallus-150bp-300sd50-interleaved.fastq initial-mapping-testpool-to-Salpinus-mt_in.solexa.fastq
 -bash-4.1$ cp /PATH/TO/testdata1/Salpinus-mt-genome-NC_000861.fasta initial-mapping-testpool-to-Salpinus-mt_backbone_in.fasta
 -bash-4.1$ ln -s /PATH/TO/testdata1/Tthymallus-150bp-300sd50-interleaved.fastq reads.fastq
 -bash-4.1$ ln -s /PATH/TO/testdata1/Salpinus-mt-genome-NC_000861.fasta reference.fa
 -bash-4.1$ echo -e "\n#manifest file for basic mapping assembly with illumina data using MIRA 4\n\nproject = initial-mapping-testpool-to-Salpinus-mt\n\njob=genome,mapping,accurate\n\nparameters = -NW:mrnl=0 -AS:nop=1 SOLEXA_SETTINGS -CO:msr=no\n\nreadgroup\nis_reference\ndata = reference.fa\nstrain = Salpinus-mt-genome\n\nreadgroup = reads\ndata = reads.fastq\ntechnology = solexa\nstrain = testpool\n" > manifest.conf
 -bash-4.1$ head -n 20 manifest.conf
 -bash-4.1$ mira manifest.conf
          (the output of this run was saved in a.txt)
 -bash-4.1$ ls -hlrt
 -bash-4.1$ ls -hlrt initial-mapping-testpool-to-Salpinus-mt_assembly/
          The newly constructed reference is contained in the file `initial-mapping-testpool-to-Salpinus-mt_out.maf` in the `initial-mapping-testpool-to-Salpinus-mt_d_results` directory.
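The long `echo -e` one-liner in step a is hard to proofread. As a sketch, the same manifest can be written with a here-document; the content below is copied from that command, and only the heredoc form is new.

```bash
# Write the MIRA manifest with a quoted here-document, so nothing inside
# is expanded by the shell and every setting sits on its own line.
cat > manifest.conf <<'EOF'

#manifest file for basic mapping assembly with illumina data using MIRA 4

project = initial-mapping-testpool-to-Salpinus-mt

job=genome,mapping,accurate

parameters = -NW:mrnl=0 -AS:nop=1 SOLEXA_SETTINGS -CO:msr=no

readgroup
is_reference
data = reference.fa
strain = Salpinus-mt-genome

readgroup = reads
data = reads.fastq
technology = solexa
strain = testpool

EOF
```

Running `head -n 20 manifest.conf` afterwards shows the file, as in the tutorial.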
b. Baiting and iterative mapping using the MITObim.pl script:
 -bash-4.1$ /PATH/TO/MITObim.pl -start 1 -end 10 -sample testpool -ref Salpinus_mt_genome -readpool reads.fastq -maf initial-mapping-testpool-to-Salpinus-mt_assembly/initial-mapping-testpool-to-Salpinus-mt_d_results/initial-mapping-testpool-to-Salpinus-mt_out.maf &> log
          pwd:/home/cainana/MITObim/dir/tutorial2
          running mapping assembly using MIRA
  readpool contains 6000 reads
  assembly contains 1 contig(s)
  contig length: 16664
  MITObim has reached a stationary read number after 5 iterations!!
  Final assembly result will be written to file: /home/cainana/MITObim/dir/tutorial2/iteration5/testpool_Salpinus_mt_genome-it5_noIUPAC.fasta
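
Because the run is redirected to `log`, the location of the final FASTA can be pulled out of that file instead of being searched for by hand. This is a minimal sketch that assumes the log contains a line worded exactly like the "Final assembly result will be written to file:" line shown above; the function name `final_assembly_path` is hypothetical.

```bash
# Print the path announced on the last "Final assembly result" line of a log.
final_assembly_path() {
  sed -n 's/^.*Final assembly result will be written to file: *//p' "$1" | tail -n 1
}

# Usage:
#   final_assembly_path log
```

If the line is missing (for example because the run stopped early), the function prints nothing.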
TUTORIAL II - direct reconstruction without prior mapping assembly using the --quick option (*approximate runtime: 4 min*)
 
 -bash-4.1$ mkdir tutorial3
 -bash-4.1$ cd tutorial3
 -bash-4.1$ /PATH/TO/MITObim.pl -start 1 -end 30 -sample testpool -ref Salpinus_mt_genome -readpool ~/PATH/TO/testdata1/Tthymallus-150bp-300sd50-interleaved.fastq --quick ~/PATH/TO/testdata1/Salpinus-mt-genome-NC_000861.fasta &> log
             
  Result: MITObim reconstructs the mitochondrial genome and reaches a stationary number of mitochondrial reads only after 14 iterations.
  Size on disk: 324.1MB
TUTORIAL III - reconstructing mt genomes from mt barcode seeds (mt = mitochondrial) (*approximate runtime: 20 min*)
 
 -bash-4.1$ mkdir tutorial4
 -bash-4.1$ cd tutorial4
 -bash-4.1$ ~/PATH/TO/MITObim.pl -sample testpool -ref Tthymallus-COI -readpool ~/PATH/TO/testdata1/Tthymallus-150bp-300sd50-interleaved.fastq --quick ~/PATH/TO/testdata1/Tthymallus-COI-partial-HQ961018.fasta -end 100 --clean &> log
  
  MITObim reconstructs the mitochondrial genome in 82 iterations. Runtime: 6 min.
  The `--clean` option tells MITObim to keep only the latest two iteration directories, to save space.
  `ls tutorial4` shows only iteration81, iteration82 and log (85.1MB).
For "well behaved" datasets, _de novo_ assembly (`--denovo` flag) and read pair information (`--paired` flag) can further speed up the reconstruction
 (*approximate runtime: 10 min*)
 
 -bash-4.1$ mkdir tutorial3-denovo
 -bash-4.1$ cd tutorial3-denovo
 -bash-4.1$ ~/PATH/TO/MITObim.pl -sample testpool -ref Tthymallus-COI -readpool ~/PATH/TO/testdata1/Tthymallus-150bp-300sd50-interleaved.fastq --quick ~/PATH/TO/testdata1/Tthymallus-COI-partial-HQ961018.fasta -end 50 --denovo --paired --clean &> log
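  During the run, tutorial4-denovo is continuously updated and holds the latest 3 directories plus the log. At the end: iteration30 and iteration31.
  Runtime: 3 min. Size on disk: 93.9MB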
~/MITObim/MITObim.pl --sample SRR831234 -ref GQ368662 -readpool ~/F/genomes/SRR831234.fastq --quick ~/F/genomes/GQ368662.fasta -end 5 --denovo --paired --clean &>log
  
  Fatal error (may be due to problems of the input data or parameters):
 
   ********************************************************************************
   * Tmp directory is on a NFS mount ... but we don't want that.                  *
   ********************************************************************************
  (This is a controlled program stop. It is best not to put temporary files on a mounted disk; it is very slow.)
cainana@cainana-VirtualBox:~/genomes/reconstruction/work2$ ~/MITObim/MITObim.pl --sample SRR831234 -ref GQ368662 -readpool ~/genomes/SRR831234.fastq --quick ~/genomes/GQ368662.fasta -end 5 --denovo --paired --clean &>log
               Result: error
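
Before starting a run it can help to check whether the working directory sits on an NFS mount, since the usage text offers `--redirect_tmp` and `--NFS_warn_only` for exactly that case. This is a sketch under assumptions: `df -PT` as used here is the GNU coreutils form, and the helper names `fstype_of` and `is_nfs_fstype` are hypothetical.

```bash
# Print the filesystem type of the mount that holds the given path.
# Assumes GNU df, where -T adds a "Type" column as the second field.
fstype_of() {
  df -PT "$1" | awk 'NR == 2 { print $2 }'
}

# Succeed if the given filesystem type name looks like NFS (nfs, nfs3, nfs4, ...).
is_nfs_fstype() {
  case "$1" in
    nfs|nfs[0-9]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   if is_nfs_fstype "$(fstype_of .)"; then
#     echo "working directory is on NFS: consider --redirect_tmp or --NFS_warn_only"
#   fi
```

The check only reports the situation; which of the two options fits is a judgment call, and the MIRA documentation on `check_nfs` explains the risks.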
********************************
Technical workflow
1. Alignment: Bowtie2-2.2.9
 $ ./bowtie2-build AJ492192.fasta AJ492192
 $ ./bowtie2 -x AJ492192 -U SRR831234.fastq -S SRR831234_bowtie.sam
2. Format conversion: Picard
 $ java -jar picard.jar SortSam I=SRR831234_bowtie.sam O=SRR831234_bowtie_s.sam SORT_ORDER=coordinate
 $ java -jar picard.jar SamToFastq I=SRR831234_bowtie_s.sam FASTQ=SRR831234_bowtie_pic.fastq
3. Assembly: MITObim_1.8
 $ nohup ./MITObim_1.8.pl -start 1 -end 30 -sample SRR831234 -ref AJ492192 -readpool SRR831234_bowtie_pic_QC.fastq \
     --quick AJ492192.fasta --mirapath /home/su/cesar/mira_4.0.2_linux-gnu_x86_64_static/bin --NFS_warn_only >log &
   (kbait = 31, the default)
4. Circularization: BLAST+
5. Duplicate checking: Notepad++
6. Annotation: BioEdit
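
File names in steps 1 and 2 are easy to mistype between commands (for example in the case of a suffix). One way to avoid that is to derive every name from two variables and print the commands as a dry run before executing them. This is a sketch only: the variable names and `print_steps` are hypothetical, the commands mirror the ones in the notes, and bowtie2 writes its SAM output with `-S`.

```bash
# Dry run: print the commands for steps 1-2 with every file name derived
# from two variables, so the names cannot drift apart between steps.
sample=SRR831234   # read set used in the notes
ref=AJ492192       # reference accession used in the notes

sam="${sample}_bowtie.sam"
sorted_sam="${sample}_bowtie_s.sam"
fastq="${sample}_bowtie_pic.fastq"

print_steps() {
  echo "bowtie2-build ${ref}.fasta ${ref}"
  echo "bowtie2 -x ${ref} -U ${sample}.fastq -S ${sam}"
  echo "java -jar picard.jar SortSam I=${sam} O=${sorted_sam} SORT_ORDER=coordinate"
  echo "java -jar picard.jar SamToFastq I=${sorted_sam} FASTQ=${fastq}"
}

print_steps
```

Once the printed commands look right, they can be run one at a time or piped to a shell.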