Gene Data Processing 61: Running cs-bwamem in IDEA on single-end data (one 100bp read)
Code:
package cs.ucla.edu.bwaspark

import java.text.SimpleDateFormat
import java.util.Date

import cs.ucla.edu.bwaspark.FastMap._
import cs.ucla.edu.bwaspark.commandline.{BWAMEMCommand, UploadFASTQCommand}
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

/**
 * Created by xubo on 2016/6/4.
 */
object BWAMEMSparkSuite {
  def main(args: Array[String]) {
    // Timestamp suffix keeps each run's output directory unique.
    val iString = new SimpleDateFormat("yyyyMMddHHmmssSSS").format(new Date())
    // cs-bwamem arguments; the option names the parser maps them to are
    // echoed in the run log below (e.g. -bfn -> batchedFolderNum,
    // -sbatch -> subBatchSize, -oChoice -> outputChoice).
    val argArr = Array("cs-bwamem",
      "-bfn", "1",
      "-bPSW", "1",
      "-sbatch", "10",
      "-bPSWJNI", "1",
      "-oChoice", "2",
      "-oPath", "file/alignment/output/test64" + iString,
      "-localRef", "1",
      "-isSWExtBatched", "1",
      "0", // isPairEnd = 0: single-end input
      "file/alignment/input/GRCH38BWAindex/GRCH38chr1L3556522.fasta",
      "hdfs://219.219.220.149:9000/xubo/data/alignment/cs-bwamem/fastq/g38L100c1Nhs20upload.fastq")
    val argsList = argArr.toList
    var bwamemArgs = new BWAMEMCommand
    bwamemArgs = BWAMEMSpark.bwamemCmdLineParser(argsList.tail)

    val conf = new SparkConf().setMaster("local[4]")
      .setAppName(this.getClass().getSimpleName().filter(!_.equals('$')))
    val sc = new SparkContext(conf)

    println("align start")
    memMain(sc, bwamemArgs) // run the alignment (memMain comes from FastMap)
    println("CS-BWAMEM Finished!!!")
    // BWAMEMSpark.main(argArr) // alternative: drive the run via the CLI entry point
    println("align end")

    // Read back the ADAM Parquet records written under <outputPath>/0
    // and show the single aligned read.
    val sqlContext = new SQLContext(sc)
    val df3 = sqlContext.read.option("mergeSchema", "true")
      .parquet("file/alignment/output/test64" + iString + "/0")
    df3.show()
    println(df3.count())
    println("end")
  }
}
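The flag string is easy to get wrong, so it can help to check how it is parsed before launching a full run. A minimal sketch, reusing only the parser call from the listing above; ParseOnly is a hypothetical wrapper, and the option map echoed at the top of the run log below is what this parsing step produces:

package cs.ucla.edu.bwaspark

import cs.ucla.edu.bwaspark.commandline.BWAMEMCommand

// Hedged sketch: parse the cs-bwamem argument list without starting Spark
// or calling memMain, so the batching and output options can be verified first.
object ParseOnly {
  def main(args: Array[String]) {
    val argArr = Array("cs-bwamem",
      "-bfn", "1", "-bPSW", "1", "-sbatch", "10", "-bPSWJNI", "1",
      "-oChoice", "2", "-oPath", "file/alignment/output/testParse",
      "-localRef", "1", "-isSWExtBatched", "1",
      "0", // isPairEnd = 0: single-end
      "file/alignment/input/GRCH38BWAindex/GRCH38chr1L3556522.fasta",
      "hdfs://219.219.220.149:9000/xubo/data/alignment/cs-bwamem/fastq/g38L100c1Nhs20upload.fastq")
    // Same parser the listing uses; per the listing it returns a
    // populated BWAMEMCommand.
    val cmd: BWAMEMCommand = BWAMEMSpark.bwamemCmdLineParser(argArr.toList.tail)
    // Prints the default toString unless BWAMEMCommand overrides it;
    // inspect the class in the repo for the exact field names.
    println(cmd)
  }
}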
Run log:
cs.ucla.edu.bwaspark.BWAMEMSparkSuite
Map('isPSWJNI -> 1, 'localRef -> 1, 'batchedFolderNum -> 1, 'isPSWBatched -> 1, 'subBatchSize -> 10, 'inFASTQPath -> hdfs://219.219.220.149:9000/xubo/data/alignment/cs-bwamem/fastq/g38L100c1Nhs20upload.fastq, 'inFASTAPath -> file/alignment/input/GRCH38BWAindex/GRCH38chr1L3556522.fasta, 'outputPath -> file/alignment/output/test6420160604172207095, 'isSWExtBatched -> 1, 'isPairEnd -> 0, 'outputChoice -> 2)
CS-BWAMEM command line arguments: false file/alignment/input/GRCH38BWAindex/GRCH38chr1L3556522.fasta hdfs://219.219.220.149:9000/xubo/data/alignment/cs-bwamem/fastq/g38L100c1Nhs20upload.fastq 1 true 10 true ./target/jniNative.so 2 file/alignment/output/test6420160604172207095
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/D:/1win7/java/otherJar/spark-assembly-1.5.2-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/Users/xubo/.m2/repository/org/slf4j/slf4j-log4j12/1.7.5/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2016-06-04 17:22:09 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-06-04 17:22:11 WARN MetricsSystem:71 - Using default name DAGScheduler for source because spark.app.id is not set.
align start
2016-06-04 17:22:12 WARN :139 - Your hostname, xubo-PC resolves to a loopback/non-reachable address: fe80:0:0:0:0:5efe:c0a8:168%19, but we couldn't find any external IP address!
HDFS master: hdfs://219.219.220.149:9000
Input HDFS folder number: 1
Head line: @RG ID:foo SM:bar
Read Group ID: foo
Load Index Files
Load BWA-MEM options
Output choice: 2
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
[WARNING] Avro: Invalid default for field comment: null not a "bytes"
[WARNING] Avro: Invalid default for field comment: null not a "bytes"
[WARNING] Avro: Invalid default for field comment: null not a "bytes"
2016-06-04 17:22:32 WARN MemoryStore:71 - Not enough space to cache broadcast_0 in memory! (computed 415.5 MB so far)
CS-BWAMEM Finished!!!
align end
2016-06-04 17:24:08 WARN ParquetRecordReader:193 - Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
+--------------------+---------+---------+----+--------+--------------------+--------------------+-----+---------------------+-------------------+----------+----------+----------+----------+-----------+------------+-------------------------+-------------+------------------+------------------+----------------+------------------+----------------------+--------------------+--------+--------------------+---------------+---------------------------+----------------------+-----------------------+--------------------+----------------------+------------------+------------------------------------+-------------------+-----------------------+-----------------+------------------+----------------+----------+
| contig| start| end|mapq|readName| sequence| qual|cigar|basesTrimmedFromStart|basesTrimmedFromEnd|readPaired|properPair|readMapped|mateMapped|firstOfPair|secondOfPair|failedVendorQualityChecks|duplicateRead|readNegativeStrand|mateNegativeStrand|primaryAlignment|secondaryAlignment|supplementaryAlignment|mismatchingPositions|origQual| attributes|recordGroupName|recordGroupSequencingCenter|recordGroupDescription|recordGroupRunDateEpoch|recordGroupFlowOrder|recordGroupKeySequence|recordGroupLibrary|recordGroupPredictedMedianInsertSize|recordGroupPlatform|recordGroupPlatformUnit|recordGroupSample|mateAlignmentStart|mateAlignmentEnd|mateContig|
+--------------------+---------+---------+----+--------+--------------------+--------------------+-----+---------------------+-------------------+----------+----------+----------+----------+-----------+------------+-------------------------+-------------+------------------+------------------+----------------+------------------+----------------------+--------------------+--------+--------------------+---------------+---------------------------+----------------------+-----------------------+--------------------+----------------------+------------------+------------------------------------+-------------------+-----------------------+-----------------+------------------+----------------+----------+
|[chr1,248956422,n...|225496693|225496793| 60| chr1-1|CATATTTACCAATTAAA...|@C@D@FFDFHHHHIJ.J...| 100M| 0| 0| false| false| true| false| false| false| false| false| false| false| true| false| false| 61A38| null|NM:i:1 AS:i:95 XS...| foo| null| null| null| null| null| null| null| null| null| bar| null| null| null|
+--------------------+---------+---------+----+--------+--------------------+--------------------+-----+---------------------+-------------------+----------+----------+----------+----------+-----------+------------+-------------------------+-------------+------------------+------------------+----------------+------------------+----------------------+--------------------+--------+--------------------+---------------+---------------------------+----------------------+-----------------------+--------------------+----------------------+------------------+------------------------------------+-------------------+-----------------------+-----------------+------------------+----------------+----------+
2016-06-04 17:24:09 WARN ParquetRecordReader:193 - Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
1
end
2016-06-04 17:24:10 WARN QueuedThreadPool:145 - 7 threads could not be stopped
Process finished with exit code 0
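The show() table above carries ADAM's full AlignmentRecord schema, so the one aligned read is hard to pick out among the ~30 null record-group columns. Since -oChoice 2 writes ADAM-formatted Parquet (which is why the listing reads it back with sqlContext.read.parquet), the result can be re-opened afterwards and narrowed to the informative fields. A minimal sketch; InspectAlignmentOutput is a hypothetical helper, and the output directory is the timestamped path printed in this run, so adjust it for your own run:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hedged sketch: re-open the ADAM Parquet output and project only the
// columns that summarize the alignment.
object InspectAlignmentOutput {
  def main(args: Array[String]) {
    val conf = new SparkConf().setMaster("local[4]").setAppName("InspectAlignmentOutput")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    // Same Parquet directory that BWAMEMSparkSuite wrote above.
    val df = sqlContext.read.parquet("file/alignment/output/test6420160604172207095/0")
    // For this run the projection shows the read mapped to
    // chr1:225496693-225496793 with MAPQ 60, CIGAR 100M and MD tag 61A38.
    df.select("contig", "start", "end", "mapq", "readName", "cigar", "mismatchingPositions")
      .show(false)
    sc.stop()
  }
}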