Gene Data Processing 75: Reading a VCF File from HDFS and Saving It as an ADAM Parquet File (Success)

Source: Internet  Editor: 程序博客网  Time: 2024/06/06 02:31

1. Reference:

package org.bdgenomics.adam.cli

class FlattenSuite extends ADAMFunSuite {
  val loader = Thread.currentThread().getContextClassLoader
  val inputPath = loader.getResource("small.vcf").getPath
  val outputFile = File.createTempFile("adam-cli.FlattenSuite", ".adam")
  val outputPath = outputFile.getAbsolutePath
  val flatFile = File.createTempFile("adam-cli.FlattenSuite", ".adam-flat")
  val flatPath = flatFile.getAbsolutePath
  assert(outputFile.delete(), "Couldn't delete (empty) temp file")
  assert(flatFile.delete(), "Couldn't delete (empty) temp file")

  val argLine = "%s %s".format(inputPath, outputPath).split("\\s+")
  val args: Vcf2ADAMArgs = Args4j.apply[Vcf2ADAMArgs](argLine)
  val vcf2Adam = new Vcf2ADAM(args)
  vcf2Adam.run(sc)
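
The reference test creates temp files and immediately deletes them. Presumably this is to obtain a unique path that does not exist yet, so that the conversion step can create the output itself. A minimal standalone sketch of that idiom, using only `java.io.File`; the names `TempPathSketch` and `freshPath` are hypothetical and not part of ADAM:

```scala
import java.io.File

// Hypothetical helper, not part of ADAM: returns a unique path that does not exist yet.
object TempPathSketch {
  def freshPath(prefix: String, suffix: String): String = {
    // createTempFile reserves a unique name by creating an empty file.
    val f = File.createTempFile(prefix, suffix)
    val path = f.getAbsolutePath
    // Delete the empty file so that a later writer can create the path itself.
    assert(f.delete(), "Couldn't delete (empty) temp file")
    path
  }
}
```

For example, `TempPathSketch.freshPath("adam-cli.FlattenSuite", ".adam")` would give a path comparable to `outputPath` in the test.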

2. Code:

package org.gcdss.cli.load

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}
import org.bdgenomics.adam.cli.{Vcf2ADAMArgs, Vcf2ADAM}
import org.bdgenomics.adam.rdd.ADAMContext
import org.bdgenomics.adam.rdd.ADAMContext._
import org.bdgenomics.utils.cli.Args4j

//import org.bdgenomics.avocado.AvocadoFunSuite

object Callvcf2Adam {
  //  def resourcePath(path: String) = ClassLoader.getSystemClassLoader.getResource(path).getFile
  //  def tmpFile(path: String) = Files.createTempDirectory("").toAbsolutePath.toString + "/" + path
  //  def apply(local: Boolean, fqFile: String, faFile: String, configFile: String, output: String) {
  def main(args: Array[String]) {
    println("start:")
    var conf = new SparkConf().setAppName(this.getClass().getSimpleName().filter(!_.equals('$'))).setMaster("spark://219.219.220.149:7077")
    //    var conf = new SparkConf().setAppName("AvocadoSuite").setMaster("local[4]")
    val sc = new SparkContext(conf)
    val startTime = System.currentTimeMillis()
    val path = "hdfs://219.219.220.149:9000/xubo/callVariant/vcf/All_20160407.vcf"
    val output = "/xubo/callVariant/vcf/All_20160407.adam"
    val argLine = "%s %s".format(path, output).split("\\s+")
    val args: Vcf2ADAMArgs = Args4j.apply[Vcf2ADAMArgs](argLine)
    //    val arr=Array(argLine)
    val vcf2Adam = new Vcf2ADAM(args)
    vcf2Adam.run(sc)
    val saveTime = System.currentTimeMillis()
    println("run time:" + (saveTime - startTime) + " ms")
    println("*************end*************")
    sc.stop()
  }
}
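
Two small idioms in the code above can be tried without Spark or ADAM: building the positional argument array from a formatted string, and stripping the trailing `$` from a Scala object's class name to get the application name. The sketch below isolates both; `ArgLineSketch`, `buildArgLine` and `appName` are hypothetical names introduced only for illustration.

```scala
// Standalone sketch (no Spark or ADAM needed); all names here are hypothetical.
object ArgLineSketch {
  // Joins input and output with one space, then splits on whitespace,
  // mirroring "%s %s".format(path, output).split("\\s+") in the post.
  def buildArgLine(input: String, output: String): Array[String] =
    "%s %s".format(input, output).split("\\s+")

  // A Scala object compiles to a class whose simple name ends in '$';
  // filtering that character out leaves a clean application name.
  def appName(obj: AnyRef): String =
    obj.getClass.getSimpleName.filter(!_.equals('$'))

  def main(args: Array[String]): Unit = {
    val argLine = buildArgLine(
      "hdfs://219.219.220.149:9000/xubo/callVariant/vcf/All_20160407.vcf",
      "/xubo/callVariant/vcf/All_20160407.adam")
    argLine.foreach(println) // one positional argument per line
    println(appName(ArgLineSketch))
  }
}
```

Because the string is split on whitespace, a path containing a space would be broken into extra arguments; passing `Array(path, output)` directly would avoid that.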

3. Result:
202 small files

Time:
211898 ms (about 3.5 minutes)
That is fairly fast.

Reading the output back through adam-shell gives 0 records:

scala> val rdd = sc.loadVariantAnnotations("/xubo/callVariant/vcf/All_20160407.adam")
scala> print(rdd.count)

0
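
A count of 0 could mean that the directory holds no records, or that the records were written with a different schema than this loader expects; the post does not say which. One way to tell the two apart is to read the Parquet directory with plain Spark SQL and look at its schema and row count. This sketch assumes the adam-shell session provides `sc` and runs a Spark version that has the DataFrame reader (1.4 or later).

```scala
import org.apache.spark.sql.SQLContext

// Assumes `sc` is the SparkContext provided by adam-shell.
val sqlContext = new SQLContext(sc)

// Same output path as in the job above.
val df = sqlContext.read.parquet("/xubo/callVariant/vcf/All_20160407.adam")

df.printSchema()    // shows which record type was actually written
println(df.count()) // rows stored, independent of ADAM's own loaders
```

If the row count is also 0 here, the conversion itself produced no records; if it is non-zero, the loader used for reading is the more likely cause.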


Research publications:

【1】 [BIBM] Bo Xu, Changlong Li, Hang Zhuang, Jiali Wang, Qingfeng Wang, Chao Wang, and Xuehai Zhou, "Distributed Gene Clinical Decision Support System Based on Cloud Computing", in IEEE International Conference on Bioinformatics and Biomedicine. (BIBM 2017, CCF B)
【2】 [IEEE CLOUD] Bo Xu, Changlong Li, Hang Zhuang, Jiali Wang, Qingfeng Wang, Xuehai Zhou. Efficient Distributed Smith-Waterman Algorithm Based on Apache Spark (CLOUD 2017, CCF-C).
【3】 [CCGrid] Bo Xu, Changlong Li, Hang Zhuang, Jiali Wang, Qingfeng Wang, Jinhong Zhou, Xuehai Zhou. DSA: Scalable Distributed Sequence Alignment System Using SIMD Instructions. (CCGrid 2017, CCF-C).
【4】 more: https://github.com/xubo245/Publications

Help

If you have any questions or suggestions, please open an issue in this project or send an e-mail to me: xubo245@mail.ustc.edu.cn
Wechat: xu601450868
QQ: 601450868