Usage of several tools for cleaning sequencing read data

==============================================================
I. Read quality control and preprocessing: FastQC, PRINSEQ
1. FastQC (input can be FASTQ files, compressed or uncompressed, as well as SAM and BAM files)
fastqc *.fq

2. PRINSEQ (slow!)
#To create an HTML report, you need two commands.
#step 1: generate a temporary graph data file:
prinseq-lite.pl -fastq Xoo-wild_1.fq -phred64 -out_good null -out_bad null -graph_data graph
#step 2: generate the HTML report with the plots:
prinseq-graphs.pl -i graph -html_all -o QCreport
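#The commands in these notes pass -phred64, which only matches Phred+64 quality encoding (older Illumina pipelines); newer Illumina data use Phred+33. Below is a rough sketch of a check to run before choosing the flag. guess_phred is a hypothetical helper, not part of FastQC or PRINSEQ, and the cutoff at ASCII 59 is a heuristic that can misjudge a file containing only high-quality bases.

```shell
# guess_phred [FILE]: print 33 or 64, judged from the lowest quality character
# seen in the first 10000 reads. Assumes four-line FASTQ records.
# Reads standard input when no file is given.
guess_phred() {
    awk '
        BEGIN {
            # awk has no ord(), so build a character -> ASCII code table
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
            min = 127
        }
        NR > 40000 { exit }
        NR % 4 == 0 {
            sub(/\r$/, "")                      # tolerate Windows line endings
            n = length($0)
            for (j = 1; j <= n; j++) {
                c = ord[substr($0, j, 1)]
                if (c < min) min = c
            }
        }
        # Characters below ";" (ASCII 59) cannot occur in a +64 encoding
        END { if (min < 59) print 33; else print 64 }
    ' "$@"
}
```

#Usage: guess_phred Xoo-wild_1.fq, or gzip -dc reads1.fastq.gz | guess_phred. An answer of 64 from a small or uniformly high-quality file is only a guess.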

II. Filtering tools: PRINSEQ, Trimmomatic, FASTX-Toolkit, Cutadapt
1. PRINSEQ (supports paired-end data)
#Command to discard paired-end reads whose mean quality is below 20:
prinseq-lite.pl -fastq reads1.fastq -fastq2 reads2.fastq -phred64 -min_qual_mean 20 -out_good qual_filtered -out_bad null -no_qual_header -log -verbose
#Note on the Trimmomatic commands in the next subsection: Trimmomatic checks whether each read's mate survived and writes the properly paired reads to paired1.fq.gz and paired2.fq.gz. The output files unpaired1.fq.gz and unpaired2.fq.gz contain the reads that lost their mate.

2. Trimmomatic (supports paired-end data, multithreaded, fast)
#Command to discard paired-end reads whose mean quality is below 20:
java -jar trimmomatic-0.32.jar PE -phred64 /home/hlb/xoo_tran/Xoo-wild_1.fq /home/hlb/xoo_tran/Xoo-wild_2.fq paired1.fq.gz unpaired1.fq.gz paired2.fq.gz unpaired2.fq.gz AVGQUAL:20
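#Both -min_qual_mean 20 and AVGQUAL:20 drop a read when the mean of its per-base quality scores is below 20. The sketch below shows that rule on a single-end file. It is an illustration only: mean_qual_filter is a hypothetical helper, it assumes four-line FASTQ records, and it does no paired-end bookkeeping, which the real tools handle.

```shell
# mean_qual_filter FILE OFFSET MIN: print the reads whose mean quality is >= MIN.
# OFFSET is the quality encoding offset (64 for -phred64 data, 33 otherwise).
mean_qual_filter() {
    awk -v off="$2" -v min="$3" '
        BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
        { rec[NR % 4] = $0 }                    # buffer the current record
        NR % 4 == 0 {                           # quality line closes the record
            n = length($0); sum = 0
            for (j = 1; j <= n; j++) sum += ord[substr($0, j, 1)] - off
            if (n > 0 && sum / n >= min)
                print rec[1] "\n" rec[2] "\n" rec[3] "\n" rec[0]
        }
    ' "$1"
}
```

#Usage: mean_qual_filter Xoo-wild_1.fq 64 20 > kept.fq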

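#Remove adapters (ILLUMINACLIP), trim leading and trailing bases with quality below 3, cut where the mean quality in a 4-base sliding window drops below 15, and discard reads shorter than 36 bases:
java -jar trimmomatic-0.32.jar PE s_1_1_sequence.txt.gz s_1_2_sequence.txt.gz lane1_forward_paired.fq.gz lane1_forward_unpaired.fq.gz lane1_reverse_paired.fq.gz lane1_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36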

#Trim 3′ bases with quality below 20, then discard reads shorter than 50 bases:
java -jar trimmomatic-0.32.jar PE -phred64 reads1.fastq.gz reads2.fastq.gz paired1.fq.gz unpaired1.fq.gz paired2.fq.gz unpaired2.fq.gz TRAILING:20 MINLEN:50
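#After any of the filtering commands above it helps to check how many reads ended up in each output file. A minimal sketch follows; count_reads is a hypothetical helper and assumes four-line FASTQ records (no wrapped sequence lines).

```shell
# count_reads FILE: print the number of reads in a plain or gzip-compressed FASTQ file
count_reads() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | awk 'END { print int(NR / 4) }'
}
```

#Usage: count_reads paired1.fq.gz; count_reads unpaired1.fq.gz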

III. Trimming tools: PRINSEQ, FASTX, Trimmomatic, Cutadapt
#The FastX quality trimmer trims bases from the 3′ end, whereas PRINSEQ and Trimmomatic can trim reads from both ends.
#PRINSEQ, Trimmomatic, and Cutadapt have paired-end support.
1. Trimmomatic
#Trimmomatic command for paired-end data: TRAILING:20 removes bases from the 3′ end for as long as their quality is below 20 (the Trimmomatic documentation recommends the SLIDINGWINDOW or MAXINFO steps instead). MINLEN:50 then discards reads shorter than 50 bases.
java -jar trimmomatic-0.32.jar PE -phred64 reads1.fastq.gz reads2.fastq.gz paired1.fq.gz unpaired1.fq.gz paired2.fq.gz unpaired2.fq.gz TRAILING:20 MINLEN:50
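#To make the TRAILING rule concrete, the sketch below applies it to a single quality string and reports how many bases would remain; a read would then be dropped by MINLEN:50 if that number is below 50. trailing_keep_len is a hypothetical helper written for illustration, not Trimmomatic code.

```shell
# trailing_keep_len QUAL OFFSET MINQ: print how many bases remain after
# removing 3' bases for as long as their quality is below MINQ.
trailing_keep_len() {
    printf '%s\n' "$1" | awk -v off="$2" -v minq="$3" '
        BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
        {
            keep = length($0)
            # walk back from the 3-prime end until a base reaches MINQ
            while (keep > 0 && ord[substr($0, keep, 1)] - off < minq) keep--
            print keep
        }
    '
}
```

#Usage: trailing_keep_len 'hhhBhBB' 64 20 prints 5, because only the last two bases (quality 2 in Phred+64) are removed; the low-quality base in the middle of the read stays.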
#Sliding-window method: Trimmomatic slides the window from the beginning (5′ end) of the read.
#The following Trimmomatic command slides a 3-base window from the 5′ end and cuts reads when the mean quality falls below 20 (SLIDINGWINDOW:3:20). It also filters out reads which are shorter than 50 bases (MINLEN:50) after trimming. If too few reads survive, the window size can be increased, for example from 3 to 7.
java -jar trimmomatic-0.32.jar PE -phred64 reads1.fastq.gz reads2.fastq.gz paired1.fq.gz unpaired1.fq.gz paired2.fq.gz unpaired2.fq.gz SLIDINGWINDOW:3:20 MINLEN:50
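#The idea behind a 5′ sliding window can be sketched on a single quality string as below. This is a simplified version that cuts at the start of the first window whose mean quality falls below the threshold; Trimmomatic's own implementation differs in details, and window_keep_len is a hypothetical helper.

```shell
# window_keep_len QUAL OFFSET WINDOW MINQ: print how many 5' bases are kept when
# the read is cut at the first WINDOW-base window with mean quality below MINQ.
window_keep_len() {
    printf '%s\n' "$1" | awk -v off="$2" -v w="$3" -v minq="$4" '
        BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
        {
            n = length($0); keep = n
            for (s = 1; s + w - 1 <= n; s++) {      # s = window start (1-based)
                sum = 0
                for (j = s; j < s + w; j++) sum += ord[substr($0, j, 1)] - off
                if (sum / w < minq) { keep = s - 1; break }
            }
            print keep
        }
    '
}
```

#Usage: window_keep_len 'hhhhhBBB' 64 3 20 prints 4. The window starting at base 5 (qualities 40, 2, 2) is the first one whose mean is below 20, so this sketch keeps bases 1 to 4.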
#MAXINFO method: balances read length against read quality. It takes two parameters, target read length and strictness, and trims reads from the 3′ end, calculating a score at each base.
#The strictness parameter controls this balance; it takes a value between 0 and 1, and higher values favor read correctness. The following MAXINFO trimming command sets target length = 50 and strictness = 0.7.
java -jar trimmomatic-0.32.jar PE -phred64 reads1.fastq.gz reads2.fastq.gz paired1.fq.gz unpaired1.fq.gz paired2.fq.gz unpaired2.fq.gz MAXINFO:50:0.7 MINLEN:50

2. PRINSEQ
#Sliding-window method: PRINSEQ lets you choose which end of the read the scan starts from.
#The following PRINSEQ command slides a 3-base window from the opposite end (the 3′ end) and trims reads if the mean base quality in the window is less than (lt) 20. It also filters out reads which are shorter than 50 bases after trimming.
prinseq-lite.pl -phred64 -trim_qual_window 3 -trim_qual_type mean -trim_qual_right 20 -trim_qual_rule lt -fastq reads1.fastq -fastq2 reads2.fastq -out_good window -out_bad null -verbose -min_len 50 -no_qual_header
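#The 3′ variant can be sketched the same way on a single quality string: look at the last few bases, and while their mean quality is below the threshold, remove one base and look again. This is an illustration under the assumption of a one-base step, not PRINSEQ's own code, and right_window_keep_len is a hypothetical helper.

```shell
# right_window_keep_len QUAL OFFSET WINDOW MINQ: print how many bases remain after
# trimming from the 3' end while the mean of the last WINDOW bases is below MINQ.
right_window_keep_len() {
    printf '%s\n' "$1" | awk -v off="$2" -v w="$3" -v minq="$4" '
        BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
        {
            keep = length($0)
            while (keep > 0) {
                start = keep - w + 1
                if (start < 1) start = 1            # shorter window near the 5-prime end
                sum = 0
                for (j = start; j <= keep; j++) sum += ord[substr($0, j, 1)] - off
                if (sum / (keep - start + 1) >= minq) break
                keep--                              # assumed step of one base
            }
            print keep
        }
    '
}
```

#Usage: right_window_keep_len 'hhhhhBBB' 64 3 20 prints 6. Trimming stops once the last three bases (qualities 40, 40, 2) average at least 20, so a mean-based window can leave one low-quality base at the end.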


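IV. Deduplication tool: PRINSEQ
#-derep 1 removes exact duplicates; with -derep_min 101, only sequences that occur more than 100 times are removed: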
prinseq-lite.pl -fastq reads1.fastq -derep 1 -derep_min 101 -log -verbose -out_good dupfiltered -out_bad null -no_qual_header
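#Before choosing a -derep_min value it can help to see how often exact duplicates actually occur. The sketch below lists duplicated sequences with their counts. exact_dup_counts is a hypothetical helper; it assumes four-line FASTQ records, compares only the forward sequence strings, and sorts the whole file, so it is only practical for modest file sizes.

```shell
# exact_dup_counts FILE: print "count<TAB>sequence" for every sequence that
# occurs more than once, most frequent first.
exact_dup_counts() {
    awk 'NR % 4 == 2' "$1" |        # keep only the sequence lines
        sort | uniq -c |            # count identical sequences
        sort -k1,1nr |              # highest counts first
        awk '$1 > 1 { print $1 "\t" $2 }'
}
```

#Usage: exact_dup_counts reads1.fastq | head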