基因数据处理61之idea运行cs-bwamem处理single-end(1条100bp的reads)

来源:互联网 发布:食用油 知乎 编辑:程序博客网 时间:2024/06/01 09:49

代码:

    package cs.ucla.edu.bwaspark    import java.text.SimpleDateFormat    import java.util.Date    import cs.ucla.edu.bwaspark.FastMap._    import cs.ucla.edu.bwaspark.commandline.{BWAMEMCommand, UploadFASTQCommand}    import org.apache.spark.sql.SQLContext    import org.apache.spark.{SparkContext, SparkConf}    /**      * Created by xubo on 2016/6/4.      */    object BWAMEMSparkSuite {      def main(args: Array[String]) {        val iString = new SimpleDateFormat("yyyyMMddHHmmssSSS").format(new Date())        val argArr = Array("cs-bwamem", "-bfn", "1", "-bPSW", "1", "-sbatch", "10",          "-bPSWJNI", "1", "-oChoice", "2",          "-oPath", "file/alignment/output/test64" + iString, "-localRef", "1",          "-isSWExtBatched", "1", "0", "file/alignment/input/GRCH38BWAindex/GRCH38chr1L3556522.fasta",          "hdfs://219.219.220.149:9000/xubo/data/alignment/cs-bwamem/fastq/g38L100c1Nhs20upload.fastq")        //    argArr.foreach(println)        val argsList = argArr.toList        var bwamemArgs = new BWAMEMCommand        bwamemArgs = BWAMEMSpark.bwamemCmdLineParser(argsList.tail)        //    val conf = new SparkConf().setAppName("Cloud-Scale BWAMEM: cs-bwamem").setMaster("local[4]")        val conf = new SparkConf().setMaster("local[4]").setAppName(this.getClass().getSimpleName().filter(!_.equals('$')))        val sc = new SparkContext(conf)        println("align start")        memMain(sc, bwamemArgs)        println("CS-BWAMEM Finished!!!")        //    BWAMEMSpark.main(argArr)        println("align end")        //    val file = "hdfs://219.219.220.149:9000/xubo/16.adam/0"        //    val sc = new SparkContext(conf)        val sqlContext = new SQLContext(sc)        import sqlContext._        val df3 = sqlContext.read.option("mergeSchema", "true").parquet("file/alignment/output/test64" + iString + "/0")        //    df3.printSchema()        df3.show()        println(df3.count())        println("end")      }    }

运行记录:

    cs.ucla.edu.bwaspark.BWAMEMSparkSuite    Map('isPSWJNI -> 1, 'localRef -> 1, 'batchedFolderNum -> 1, 'isPSWBatched -> 1, 'subBatchSize -> 10, 'inFASTQPath -> hdfs://219.219.220.149:9000/xubo/data/alignment/cs-bwamem/fastq/g38L100c1Nhs20upload.fastq, 'inFASTAPath -> file/alignment/input/GRCH38BWAindex/GRCH38chr1L3556522.fasta, 'outputPath -> file/alignment/output/test6420160604172207095, 'isSWExtBatched -> 1, 'isPairEnd -> 0, 'outputChoice -> 2)    CS- BWAMEM command line arguments: false file/alignment/input/GRCH38BWAindex/GRCH38chr1L3556522.fasta hdfs://219.219.220.149:9000/xubo/data/alignment/cs-bwamem/fastq/g38L100c1Nhs20upload.fastq 1 true 10 true ./target/jniNative.so 2 file/alignment/output/test6420160604172207095    SLF4J: Class path contains multiple SLF4J bindings.    SLF4J: Found binding in [jar:file:/D:/1win7/java/otherJar/spark-assembly-1.5.2-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]    SLF4J: Found binding in [jar:file:/C:/Users/xubo/.m2/repository/org/slf4j/slf4j-log4j12/1.7.5/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]    2016-06-04 17:22:09 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable    2016-06-04 17:22:11 WARN  MetricsSystem:71 - Using default name DAGScheduler for source because spark.app.id is not set.    align start    2016-06-04 17:22:12 WARN  :139 - Your hostname, xubo-PC resolves to a loopback/non-reachable address: fe80:0:0:0:0:5efe:c0a8:168%19, but we couldn't find any external IP address!    HDFS master: hdfs://219.219.220.149:9000    Input HDFS folder number: 1    Head line: @RG  ID:foo  SM:bar    Read Group ID: foo    Load Index Files    Load BWA-MEM options    Output choice: 2    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".    SLF4J: Defaulting to no-operation (NOP) logger implementation    SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.    [WARNING] Avro: Invalid default for field comment: null not a "bytes"    [WARNING] Avro: Invalid default for field comment: null not a "bytes"    [WARNING] Avro: Invalid default for field comment: null not a "bytes"    2016-06-04 17:22:32 WARN  MemoryStore:71 - Not enough space to cache broadcast_0 in memory! (computed 415.5 MB so far)    CS-BWAMEM Finished!!!    align end    2016-06-04 17:24:08 WARN  ParquetRecordReader:193 - Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl    +--------------------+---------+---------+----+--------+--------------------+--------------------+-----+---------------------+-------------------+----------+----------+----------+----------+-----------+------------+-------------------------+-------------+------------------+------------------+----------------+------------------+----------------------+--------------------+--------+--------------------+---------------+---------------------------+----------------------+-----------------------+--------------------+----------------------+------------------+------------------------------------+-------------------+-----------------------+-----------------+------------------+----------------+----------+    |              contig|    start|      end|mapq|readName|            sequence|                qual|cigar|basesTrimmedFromStart|basesTrimmedFromEnd|readPaired|properPair|readMapped|mateMapped|firstOfPair|secondOfPair|failedVendorQualityChecks|duplicateRead|readNegativeStrand|mateNegativeStrand|primaryAlignment|secondaryAlignment|supplementaryAlignment|mismatchingPositions|origQual|          attributes|recordGroupName|recordGroupSequencingCenter|recordGroupDescription|recordGroupRunDateEpoch|recordGroupFlowOrder|recordGroupKeySequence|recordGroupLibrary|recordGroupPredictedMedianInsertSize|recordGroupPlatform|recordGroupPlatformUnit|recordGroupSample|mateAlignmentStart|mateAlignmentEnd|mateContig|    +--------------------+---------+---------+----+--------+--------------------+--------------------+-----+---------------------+-------------------+----------+----------+----------+----------+-----------+------------+-------------------------+-------------+------------------+------------------+----------------+------------------+----------------------+--------------------+--------+--------------------+---------------+---------------------------+----------------------+-----------------------+--------------------+----------------------+------------------+------------------------------------+-------------------+-----------------------+-----------------+------------------+----------------+----------+    |[chr1,248956422,n...|225496693|225496793|  60|  chr1-1|CATATTTACCAATTAAA...|@C@D@FFDFHHHHIJ.J...| 100M|                    0|                  0|     false|     false|      true|     false|      false|       false|                    false|        false|             false|             false|            true|             false|                 false|               61A38|    null|NM:i:1    AS:i:95 XS...|            foo|                       null|                  null|                   null|                null|                  null|              null|                                null|               null|                   null|              bar|              null|            null|      null|    +--------------------+---------+---------+----+--------+--------------------+--------------------+-----+---------------------+-------------------+----------+----------+----------+----------+-----------+------------+-------------------------+-------------+------------------+------------------+----------------+------------------+----------------------+--------------------+--------+--------------------+---------------+---------------------------+----------------------+-----------------------+--------------------+----------------------+------------------+------------------------------------+-------------------+-----------------------+-----------------+------------------+----------------+----------+    2016-06-04 17:24:09 WARN  ParquetRecordReader:193 - Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl    1    end    2016-06-04 17:24:10 WARN  QueuedThreadPool:145 - 7 threads could not be stopped    Process finished with exit code 0

参考

【1】https://github.com/xubo245/AdamLearning【2】https://github.com/bigdatagenomics/adam/ 【3】https://github.com/xubo245/SparkLearning【4】http://spark.apache.org【5】http://stackoverflow.com/questions/28166667/how-to-pass-d-parameter-or-environment-variable-to-spark-job  【6】http://stackoverflow.com/questions/28840438/how-to-override-sparks-log4j-properties-per-driver

研究成果:

【1】 [BIBM] Bo Xu, Changlong Li, Hang Zhuang, Jiali Wang, Qingfeng Wang, Chao Wang, and Xuehai Zhou, "Distributed Gene Clinical Decision Support System Based on Cloud Computing", in IEEE International Conference on Bioinformatics and Biomedicine. (BIBM 2017, CCF B)【2】 [IEEE CLOUD] Bo Xu, Changlong Li, Hang Zhuang, Jiali Wang, Qingfeng Wang, Xuehai Zhou. Efficient Distributed Smith-Waterman Algorithm Based on Apache Spark (CLOUD 2017, CCF-C).【3】 [CCGrid] Bo Xu, Changlong Li, Hang Zhuang, Jiali Wang, Qingfeng Wang, Jinhong Zhou, Xuehai Zhou. DSA: Scalable Distributed Sequence Alignment System Using SIMD Instructions. (CCGrid 2017, CCF-C).【4】more: https://github.com/xubo245/Publications

Help

If you have any questions or suggestions, please write it in the issue of this project or send an e-mail to me: xubo245@mail.ustc.edu.cnWechat: xu601450868QQ: 601450868
阅读全文
0 0
原创粉丝点击